Thread: executor stats / page reclaims

executor stats / page reclaims

From
Uwe Bartels
Date:
Hi,

I'm seeing widely varying response times for some complex pgsql functions: anywhere from 20 ms to 500 ms, and sometimes up to 20 s.
I should add that the complete database fits in memory (64GB).
shared_buffers is set to 16GB. The rest is used by the fs cache and connections/work_mem.
The server runs Linux (RHEL 5) and PostgreSQL 8.4.5.
The filesystem is ext3 due to the lack of XFS support by Red Hat.

- For one function I get a response time of 20 ms with no shared blocks read.
- If there are shared blocks to be read, I immediately get a response time of at least 80 ms and up to 200 ms.
- If I see page reclaims, I always get response times above 400 ms.
- I'm guessing that the 20 s response times coincide with I/O.

As far as I have read, page reclaims probably occur here because the fs cache has to free memory for allocations by the client. Am I right?
So how can I prevent page reclaims?

What do the numbers within the brackets mean, e.g. 0/3330 [0/4269] page faults/reclaims?
Or is this output explained somewhere? I didn't find anything.

best regards,
Uwe


this is one output of an execution without page reclaims:
LOG:  EXECUTOR STATISTICS
DETAIL:  ! system usage stats:
!       0.071247 elapsed 0.053992 user 0.016998 system sec
!       [0.056991 user 0.018997 sys total]
!       0/0 [0/0] filesystem blocks in/out
!       0/3330 [0/4269] page faults/reclaims, 0 [0] swaps
!       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!       0/0 [8/0] voluntary/involuntary context switches
! buffer usage stats:
!       Shared blocks:          1 read,          0 written, buffer hit rate = 99.97%
!       Local  blocks:          0 read,          0 written, buffer hit rate = 0.00%
!       Direct blocks:          0 read,          0 written
Time: 73.154 ms


this is one output of an execution with page reclaims:
LOG:  EXECUTOR STATISTICS
DETAIL:  ! system usage stats:
!       0.627502 elapsed 0.461930 user 0.075988 system sec
!       [0.465929 user 0.078987 sys total]
!       0/0 [0/0] filesystem blocks in/out
!       0/20941 [0/21893] page faults/reclaims, 0 [0] swaps
!       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
!       12/7 [20/7] voluntary/involuntary context switches
! buffer usage stats:
!       Shared blocks:         48 read,          0 written, buffer hit rate = 99.72%
!       Local  blocks:          0 read,          0 written, buffer hit rate = 0.00%
!       Direct blocks:          0 read,          0 written
Time: 629.823 ms



Re: executor stats / page reclaims

From
Robert Haas
Date:
On Thu, Nov 18, 2010 at 5:10 AM, Uwe Bartels <uwe.bartels@gmail.com> wrote:
> I'm seeing widely varying response times for some complex pgsql
> functions: anywhere from 20 ms to 500 ms, and sometimes up to 20 s.
> I should add that the complete database fits in memory (64GB).
> shared_buffers is set to 16GB. The rest is used by the fs cache and
> connections/work_mem.
> The server runs Linux (RHEL 5) and PostgreSQL 8.4.5.
> The filesystem is ext3 due to the lack of XFS support by Red Hat.
>
> - For one function I get a response time of 20 ms with no shared blocks
> read.
> - If there are shared blocks to be read, I immediately get a response time
> of at least 80 ms and up to 200 ms.
> - If I see page reclaims, I always get response times above 400 ms.
> - I'm guessing that the 20 s response times coincide with I/O.
>
> As far as I have read, page reclaims probably occur here because the fs
> cache has to free memory for allocations by the client. Am I right?
> So how can I prevent page reclaims?
>
> What do the numbers within the brackets mean, e.g. 0/3330 [0/4269] page
> faults/reclaims?
> Or is this output explained somewhere? I didn't find anything.

I think you're probably going about this the wrong way.  Rather than
mess around with those executor stats, which I think are telling you
almost nothing, I'd enable log_min_duration_statement or load up
auto_explain and try to find out the specific queries that are
performing badly, and the plans for those queries.  Post the queries
that are performing badly and the EXPLAIN ANALYZE output for those
queries, and you'll get a lot more help.
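
To illustrate the first suggestion: with log_min_duration_statement enabled, the server log should contain lines of the form "LOG:  duration: ... ms  statement: ...". The following is only a minimal Python sketch for pulling the slowest statements out of such a log. The function name slow_statements, the 400 ms threshold, and the my_func calls in the sample lines are hypothetical. It assumes each statement fits on one log line and ignores any log_line_prefix.

```python
import re

# Matches the "duration: ... ms  statement: ..." form written when
# log_min_duration_statement is enabled. Anything before "LOG:" (for example
# a log_line_prefix) is skipped because search() is used instead of match().
DURATION_LINE = re.compile(r"LOG:\s+duration: ([\d.]+) ms\s+statement: (.*)")


def slow_statements(log_lines, threshold_ms):
    """Return (duration_ms, statement) pairs at or above threshold_ms, slowest first."""
    hits = []
    for line in log_lines:
        m = DURATION_LINE.search(line)
        if m is None:
            continue  # not a duration line, e.g. "LOG:  EXECUTOR STATISTICS"
        duration_ms = float(m.group(1))
        if duration_ms >= threshold_ms:
            hits.append((duration_ms, m.group(2).strip()))
    return sorted(hits, reverse=True)


if __name__ == "__main__":
    # Hypothetical sample lines; my_func is a placeholder, not a name from this thread.
    sample = [
        "LOG:  duration: 73.154 ms  statement: SELECT my_func(1);",
        "LOG:  duration: 629.823 ms  statement: SELECT my_func(2);",
        "LOG:  EXECUTOR STATISTICS",
    ]
    for duration_ms, statement in slow_statements(sample, 400):
        print(duration_ms, statement)
```

The statements found this way are the ones worth running under EXPLAIN ANALYZE and posting.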

As for the numbers in brackets, a quick glance at the source code
suggests that the bracketed numbers are cumulative since program start
and the unbracketed numbers are deltas.
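
Under that reading, a line such as "0/20941 [0/21893] page faults/reclaims" would mean 20941 reclaims during this execution and 21893 since the backend process started. The following is a small Python sketch that splits such a line accordingly. The function name parse_page_stats and the dictionary layout are hypothetical, and the delta/cumulative interpretation is the one suggested above rather than documented behavior.

```python
import re

# Matches a line such as:
#   "!       0/20941 [0/21893] page faults/reclaims, 0 [0] swaps"
PAGE_LINE = re.compile(r"(\d+)/(\d+)\s+\[(\d+)/(\d+)\]\s+page faults/reclaims")


def parse_page_stats(line):
    """Split a 'page faults/reclaims' line into delta and cumulative counts.

    Returns None if the line is not a page faults/reclaims line.
    """
    m = PAGE_LINE.search(line)
    if m is None:
        return None
    d_faults, d_reclaims, c_faults, c_reclaims = (int(g) for g in m.groups())
    return {
        # Unbracketed numbers: assumed to be the change during this execution.
        "delta": {"faults": d_faults, "reclaims": d_reclaims},
        # Bracketed numbers: assumed to be totals since the process started.
        "cumulative": {"faults": c_faults, "reclaims": c_reclaims},
    }


if __name__ == "__main__":
    # The two lines below are taken from the outputs posted in this thread.
    fast = parse_page_stats("!       0/3330 [0/4269] page faults/reclaims, 0 [0] swaps")
    slow = parse_page_stats("!       0/20941 [0/21893] page faults/reclaims, 0 [0] swaps")
    print(fast["delta"]["reclaims"], slow["delta"]["reclaims"])
```

Comparing the delta values across executions, rather than the bracketed totals, is what shows how the runs differ.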

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company