Re: [Proposal] Add accumulated statistics for wait event

From: Jehan-Guillaume de Rorthais
Subject: Re: [Proposal] Add accumulated statistics for wait event
Date:
Msg-id: 20210614161032.60172b5b@firost
In reply to: Re: [Proposal] Add accumulated statistics for wait event  (Andres Freund <andres@anarazel.de>)
Responses: Re: [Proposal] Add accumulated statistics for wait event  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
Hi Andres, Hi all,

First, thank you for your feedback!

Please find attached a patch implementing accumulated wait event stats from
the backend point of view only. As I wrote when I reviewed and rebased the
existing patch, I was uncomfortable with the global approach. I still volunteer
to work on or review the original approach if required.

See below for comments and some more explanations about what I think might be
improvements over the previous patch.

On Fri, 11 Jun 2021 12:18:07 -0700
Andres Freund <andres@anarazel.de> wrote:

> On 2021-06-05 00:53:44 +0200, Jehan-Guillaume de Rorthais wrote:
> > From 88c2779679c5c9625ca5348eec0543daab5ccab4 Mon Sep 17 00:00:00 2001
> > From: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
> > Date: Tue, 1 Jun 2021 13:25:57 +0200
> > Subject: [PATCH 1/2] Add pg_stat_waitaccum view.
> >
> > pg_stat_waitaccum shows counts and durations of each wait event.
> > Each backend/background process counts and measures the time of wait
> > events in every pgstat_report_wait_start and pgstat_report_wait_end,
> > stores that info in local variables, and sends it to the Statistics
> > Collector, from which it can be queried.
> >
> > To reduce overhead, I implemented a static hash instead of a dynamic
> > hash. I also implemented track_wait_timing, which determines whether
> > wait event durations are collected or not.
>
> I object to adding this overhead. The whole selling point for wait
> events was that they are low overhead. I since spent time reducing the
> overhead further, because even just the branches for checking if
> track_activity is enabled are measurable (225a22b19ed).

Agreed. I rebased the previous patch only to review it and reopen this
discussion. I even added a small FIXME about your work in
pgstat_report_wait_start and pgstat_report_wait_end:

  //FIXME: recent patch to speed up this call.

In the attached patch, I tried to fix this by using a kind of internal hook
for pgstat_report_wait_start and pgstat_report_wait_end. This allows wait
events to be instrumented only when required, on the fly, dynamically.

Moreover, I replaced the hash structure with a simple static array, for faster
access.
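
To make the idea concrete, here is a minimal standalone sketch of the
hook + static array mechanism. All names here (wa_entries, wait_start_hook,
wa_set_instrumentation, ...) are hypothetical illustrations, not the
identifiers used in the attached patch:

  /*
   * Minimal standalone sketch of the hook + static array idea; all
   * names are hypothetical, not those from the attached patch.
   */
  #include <stdint.h>
  #include <time.h>

  #define NUM_WAIT_EVENTS 256     /* assumed upper bound on event ids */

  typedef struct WaitAccumEntry
  {
      uint64_t    calls;          /* times this event was waited on */
      uint64_t    elapsed_ns;     /* accumulated wait time */
  } WaitAccumEntry;

  /* simple static array instead of a hash: direct indexing by event id */
  static WaitAccumEntry wa_entries[NUM_WAIT_EVENTS];
  static struct timespec wa_start;

  /* no-op versions: the only cost while instrumentation is disabled */
  static void wait_start_noop(uint32_t event) { (void) event; }
  static void wait_end_noop(uint32_t event)   { (void) event; }

  /* the "internal hooks": report_wait_start/end jump through these */
  static void (*wait_start_hook)(uint32_t) = wait_start_noop;
  static void (*wait_end_hook)(uint32_t) = wait_end_noop;

  /* instrumented versions, installed only when stats are wanted */
  static void
  wait_start_instr(uint32_t event)
  {
      (void) event;
      clock_gettime(CLOCK_MONOTONIC, &wa_start);
  }

  static void
  wait_end_instr(uint32_t event)
  {
      struct timespec now;

      clock_gettime(CLOCK_MONOTONIC, &now);
      wa_entries[event].calls++;
      wa_entries[event].elapsed_ns +=
          (uint64_t) (now.tv_sec - wa_start.tv_sec) * 1000000000u
          + (uint64_t) (now.tv_nsec - wa_start.tv_nsec);
  }

  /* enabling instrumentation on the fly is just swapping the pointers */
  static void
  wa_set_instrumentation(int on)
  {
      wait_start_hook = on ? wait_start_instr : wait_start_noop;
      wait_end_hook = on ? wait_end_instr : wait_end_noop;
  }

The point is that while instrumentation is off, the call sites pay only an
indirect call to a no-op, with no timing code or extra branches on the hot
path.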

> > From ddb1adc5cd9acc9bc9de16d0cf057124b09fe1e3 Mon Sep 17 00:00:00 2001
> > From: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
> > Date: Fri, 4 Jun 2021 18:14:51 +0200
> > Subject: [PATCH 2/2] [POC] Change measuring method of wait event time from
> >  INSTR_TIME to rdtsc.
> >
> > This patch changes the measuring method of wait event time from
> > INSTR_TIME (which uses gettimeofday or clock_gettime) to rdtsc. This
> > might reduce the measurement overhead.
> >
> > Support for things like converting clock cycles to actual time, or
> > error handling, is not currently implemented.
>
> rdtsc is a serializing (*) instruction - that's the expensive part. On linux
> clock_gettime() doesn't actually need a syscall. While the vdso call
> implies a bit of overhead over a raw rdtsc, it's a relatively small part
> of the overhead.  See
> https://www.postgresql.org/message-id/20200612232810.f46nbqkdhbutzqdg%40alap3.anarazel.de

I chose to remove the whole rdtsc part from my patch, as it wasn't clear how
much faster it is compared to the simpler vDSO functions, nor how to
accurately convert its output to human-readable time.
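
For reference, a hedged sketch of why the rdtsc route is awkward (this assumes
x86-64 and the GCC/Clang __rdtsc() builtin; discovering the TSC frequency is
exactly the part the PoC hand-waved):

  #include <stdint.h>
  #include <x86intrin.h>        /* __rdtsc(): GCC/Clang builtin, x86 only */

  /*
   * Reading the TSC is cheap, but it yields CPU ticks, not time.
   * Converting ticks to seconds needs the TSC frequency, which is not
   * portably exposed to userspace and is not invariant on all hardware;
   * that conversion and its error handling are what the PoC left out.
   */
  static double
  ticks_to_seconds(uint64_t ticks, double tsc_hz)  /* tsc_hz: assumed known */
  {
      return (double) ticks / tsc_hz;
  }

  /* usage sketch:
   *   uint64_t t0 = __rdtsc();
   *   ... wait here ...
   *   double s = ticks_to_seconds(__rdtsc() - t0, tsc_hz);
   */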


Regarding my take on $subject: for the sake of simplicity of this PoC, I added
the instrumentation to log_statement_stats. Despite the per-query context of
the reported log, these really are accumulated stats.

The patch updates pg_stat_get_waitaccum() as well, so the accumulated wait
events can be reported from an interactive or batch session (through the
pg_stat_waitaccum view).

So, using my previous fe-time demo client, you can test it with:

  PGOPTIONS="--log_statement_stats=on" ./fe-time 100

In the logs, I now get (notice the last line):

  LOG:  duration: 3837.194 ms  statement: SELECT * FROM pgbench_accounts
  LOG:  QUERY STATISTICS
  DETAIL:  ! system usage stats:
           !       0.087444 s user, 0.002106 s system, 3.837202 s elapsed
           !       [0.087444 s user, 0.003974 s system total]
           !       25860 kB max resident size
           !       0/0 [0/0] filesystem blocks in/out
           !       0/303 [0/697] page faults/reclaims, 0 [0] swaps
           !       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
           !       4/18 [5/18] voluntary/involuntary context switches
           !       Client/ClientWrite 4 calls, 3747102 us elapsed


Using pgbench scale factor 10, the copy query for pgbench_accounts looks like:

  LOG:  duration: 2388.081 ms  statement: copy pgbench_accounts from stdin
  LOG:  QUERY STATISTICS
  DETAIL:  ! system usage stats:
           !       1.373756 s user, 0.252860 s system, 2.388100 s elapsed
           !       [1.397015 s user, 0.264951 s system total]
           !       37788 kB max resident size
           !       0/641584 [0/642056] filesystem blocks in/out
           !       194/4147 [195/4728] page faults/reclaims, 0 [0] swaps
           !       0 [0] signals rcvd, 0/0 [0/0] messages rcvd/sent
           !       3263/92 [3390/102] voluntary/involuntary context switches
           !       LWLock/WALBufMapping 3 calls, 654 us elapsed
           !       LWLock/WALWrite 3 calls, 60680 us elapsed
           !       LWLock/CheckpointerComm 3 calls, 318 us elapsed
           !       Client/ClientRead 1 calls, 151 us elapsed
           !       IO/DataFileExtend 16397 calls, 94000 us elapsed
           !       IO/DataFileWrite 14346 calls, 45911 us elapsed
           !       IO/WALInitSync 6 calls, 334840 us elapsed
           !       IO/WALInitWrite 6 calls, 48040 us elapsed
           !       IO/WALSync 17 calls, 353334 us elapsed
           !       IO/WALWrite 8362 calls, 38401 us elapsed

Now, I'm on the fence about the user-facing interaction. I'll keep thinking
about it and report back this week. In the meantime, any feedback about the
current implementation backbone is welcome.

Thanks!

Regards,

