Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

Поиск
Список
Период
Сортировка
От Melanie Plageman
Тема Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Дата
Msg-id CAAKRu_YKc9e_=CZ4RB97Krx9TQb-zfNMoqdP45AYs5uZFqXeSA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)  (Andres Freund <andres@anarazel.de>)
Ответы Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
Hi,

In the attached patch set, I've added in missing IO operations for
certain IO Paths as well as enumerating in the commit message which IO
Paths and IO Operations are not currently counted and or not possible.

There is a TODO in HandleWalWriterInterrupts() about removing
pgstat_report_wal() since it is immediately before a proc_exit()

I was wondering if LocalBufferAlloc() should increment the counter or if
I should wait until GetLocalBufferStorage() to increment the counter.

I also realized that I am not differentiating between IOPATH_SHARED and
IOPATH_STRATEGY for IOOP_FSYNC. But, given that we don't know what type
of buffer we are fsync'ing by the time we call register_dirty_segment(),
I'm not sure how we would fix this.

On Wed, Jul 6, 2022 at 3:20 PM Andres Freund <andres@anarazel.de> wrote:
On 2022-07-05 13:24:55 -0400, Melanie Plageman wrote:
> From 2d089e26236c55d1be5b93833baa0cf7667ba38d Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Tue, 28 Jun 2022 11:33:04 -0400
> Subject: [PATCH v22 1/3] Add BackendType for standalone backends
>
> All backends should have a BackendType to enable statistics reporting
> per BackendType.
>
> Add a new BackendType for standalone backends, B_STANDALONE_BACKEND (and
> alphabetize the BackendTypes). Both the bootstrap backend and single
> user mode backends will have BackendType B_STANDALONE_BACKEND.
>
> Author: Melanie Plageman <melanieplageman@gmail.com>
> Discussion: https://www.postgresql.org/message-id/CAAKRu_aaq33UnG4TXq3S-OSXGWj1QGf0sU%2BECH4tNwGFNERkZA%40mail.gmail.com
> ---
>  src/backend/utils/init/miscinit.c | 17 +++++++++++------
>  src/include/miscadmin.h           |  5 +++--
>  2 files changed, 14 insertions(+), 8 deletions(-)
>
> diff --git a/src/backend/utils/init/miscinit.c b/src/backend/utils/init/miscinit.c
> index eb43b2c5e5..07e6db1a1c 100644
> --- a/src/backend/utils/init/miscinit.c
> +++ b/src/backend/utils/init/miscinit.c
> @@ -176,6 +176,8 @@ InitStandaloneProcess(const char *argv0)
>  {
>       Assert(!IsPostmasterEnvironment);

> +     MyBackendType = B_STANDALONE_BACKEND;

Hm. This is used for singleuser mode as well as bootstrap. Should we
split those? It's not like bootstrap mode really matters for stats, so
I'm inclined not to.


I have no opinion currently.
It depends on how commonly you think developers might want separate
bootstrap and single user mode IO stats.
 

> @@ -375,6 +376,8 @@ BootstrapModeMain(int argc, char *argv[], bool check_only)
>        * out the initial relation mapping files.
>        */
>       RelationMapFinishBootstrap();
> +     // TODO: should this be done for bootstrap?
> +     pgstat_report_io_ops();

Hm. Not particularly useful, but also not harmful. But we don't need an
explicit call, because it'll be done at process exit too. At least I
think, it could be that it's different for bootstrap.



I've removed this and other occurrences which were before proc_exit()
(and thus redundant). (Though I did not explicitly check if it was
different for bootstrap.)
 

> diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
> index 2e146aac93..e6dbb1c4bb 100644
> --- a/src/backend/postmaster/autovacuum.c
> +++ b/src/backend/postmaster/autovacuum.c
> @@ -1712,6 +1712,9 @@ AutoVacWorkerMain(int argc, char *argv[])
>               recentXid = ReadNextTransactionId();
>               recentMulti = ReadNextMultiXactId();
>               do_autovacuum();
> +
> +             // TODO: should this be done more often somewhere in do_autovacuum()?
> +             pgstat_report_io_ops();
>       }

Don't think you need all these calls before process exit - it'll happen
via pgstat_shutdown_hook().

IMO it'd be a good idea to add pgstat_report_io_ops() to
pgstat_report_vacuum()/analyze(), so that the stats for a longrunning
autovac worker get updated more regularly.

noted and fixed.
 


> diff --git a/src/backend/postmaster/bgwriter.c b/src/backend/postmaster/bgwriter.c
> index 91e6f6ea18..87e4b9e9bd 100644
> --- a/src/backend/postmaster/bgwriter.c
> +++ b/src/backend/postmaster/bgwriter.c
> @@ -242,6 +242,7 @@ BackgroundWriterMain(void)

>               /* Report pending statistics to the cumulative stats system */
>               pgstat_report_bgwriter();
> +             pgstat_report_io_ops();

>               if (FirstCallSinceLastCheckpoint())
>               {

How about moving the pgstat_report_io_ops() into
pgstat_report_bgwriter(), pgstat_report_autovacuum() etc? Seems
unnecessary to have multiple pgstat_* calls in these places.



noted and fixed.
 

> +/*
> + * Flush out locally pending IO Operation statistics entries
> + *
> + * If nowait is true, this function returns false on lock failure. Otherwise
> + * this function always returns true. Writer processes are mutually excluded
> + * using LWLock, but readers are expected to use change-count protocol to avoid
> + * interference with writers.
> + *
> + * If nowait is true, this function returns true if the lock could not be
> + * acquired. Otherwise return false.
> + *
> + */
> +bool
> +pgstat_flush_io_ops(bool nowait)
> +{
> +     PgStat_IOPathOps *dest_io_path_ops;
> +     PgStatShared_BackendIOPathOps *stats_shmem;
> +
> +     PgBackendStatus *beentry = MyBEEntry;
> +
> +     if (!have_ioopstats)
> +             return false;
> +
> +     if (!beentry || beentry->st_backendType == B_INVALID)
> +             return false;
> +
> +     stats_shmem = &pgStatLocal.shmem->io_ops;
> +
> +     if (!nowait)
> +             LWLockAcquire(&stats_shmem->lock, LW_EXCLUSIVE);
> +     else if (!LWLockConditionalAcquire(&stats_shmem->lock, LW_EXCLUSIVE))
> +             return true;

Wonder if it's worth making the lock specific to the backend type?

I've added another Lock into PgStat_IOPathOps so that each BackendType
can be locked separately. But, I've also kept the lock in
PgStatShared_BackendIOPathOps so that reset_all and snapshot could be
done easily.
 


> +     dest_io_path_ops =
> +             &stats_shmem->stats[backend_type_get_idx(beentry->st_backendType)];
> +

This could be done before acquiring the lock, right?


> +void
> +pgstat_io_ops_snapshot_cb(void)
> +{
> +     PgStatShared_BackendIOPathOps *stats_shmem = &pgStatLocal.shmem->io_ops;
> +     PgStat_IOPathOps *snapshot_ops = pgStatLocal.snapshot.io_path_ops;
> +     PgStat_IOPathOps *reset_ops;
> +
> +     PgStat_IOPathOps *reset_offset = stats_shmem->reset_offset;
> +     PgStat_IOPathOps reset[BACKEND_NUM_TYPES];
> +
> +     pgstat_copy_changecounted_stats(snapshot_ops,
> +                     &stats_shmem->stats, sizeof(stats_shmem->stats),
> +                     &stats_shmem->changecount);

This doesn't make sense - with multiple writers you can't use the
changecount approach (and you don't in the flush part above).


> +     LWLockAcquire(&stats_shmem->lock, LW_SHARED);
> +     memcpy(&reset, reset_offset, sizeof(stats_shmem->stats));
> +     LWLockRelease(&stats_shmem->lock);

Which then also means that you don't need the reset offset stuff. It's
only there because with the changecount approach we can't take a lock to
reset the stats (since there is no lock). With a lock you can just reset
the shared state.

Yes, I believe I have cleaned up all of this embarrassing mess. I use the
lock in PgStatShared_BackendIOPathOps for reset all and snapshot and the
locks in PgStat_IOPathOps for flush.
 


> +void
> +pgstat_count_io_op(IOOp io_op, IOPath io_path)
> +{
> +     PgStat_IOOpCounters *pending_counters = &pending_IOOpStats.data[io_path];
> +     PgStat_IOOpCounters *cumulative_counters =
> +                     &cumulative_IOOpStats.data[io_path];

the pending_/cumultive_ prefix before an uppercase-first camelcase name
seems ugly...

> +     switch (io_op)
> +     {
> +             case IOOP_ALLOC:
> +                     pending_counters->allocs++;
> +                     cumulative_counters->allocs++;
> +                     break;
> +             case IOOP_EXTEND:
> +                     pending_counters->extends++;
> +                     cumulative_counters->extends++;
> +                     break;
> +             case IOOP_FSYNC:
> +                     pending_counters->fsyncs++;
> +                     cumulative_counters->fsyncs++;
> +                     break;
> +             case IOOP_WRITE:
> +                     pending_counters->writes++;
> +                     cumulative_counters->writes++;
> +                     break;
> +     }
> +
> +     have_ioopstats = true;
> +}

Doing two math ops / memory accesses every time seems off. Seems better
to maintain cumultive_counters whenever reporting stats, just before
zeroing pending_counters?

I've gone ahead and cut the cumulative counters concept.
 


> +/*
> + * Report IO operation statistics
> + *
> + * This works in much the same way as pgstat_flush_io_ops() but is meant for
> + * BackendTypes like bgwriter for whom pgstat_report_stat() will not be called
> + * frequently enough to keep shared memory stats fresh.
> + * Backends not typically calling pgstat_report_stat() can invoke
> + * pgstat_report_io_ops() explicitly.
> + */
> +void
> +pgstat_report_io_ops(void)
> +{

This shouldn't be needed - the flush function above can be used.

Fixed.
 


> +     PgStat_IOPathOps *dest_io_path_ops;
> +     PgStatShared_BackendIOPathOps *stats_shmem;
> +
> +     PgBackendStatus *beentry = MyBEEntry;
> +
> +     Assert(!pgStatLocal.shmem->is_shutdown);
> +     pgstat_assert_is_up();
> +
> +     if (!have_ioopstats)
> +             return;
> +
> +     if (!beentry || beentry->st_backendType == B_INVALID)
> +             return;

Is there a case where this may be called where we have no beentry?

Why not just use MyBackendType?

Fixed.
 


> +     stats_shmem = &pgStatLocal.shmem->io_ops;
> +
> +     dest_io_path_ops =
> +             &stats_shmem->stats[backend_type_get_idx(beentry->st_backendType)];
> +
> +     pgstat_begin_changecount_write(&stats_shmem->changecount);

A mentioned before, the changecount stuff doesn't apply here. You need a
lock.

Fixed.
 


> +PgStat_IOPathOps *
> +pgstat_fetch_backend_io_path_ops(void)
> +{
> +     pgstat_snapshot_fixed(PGSTAT_KIND_IOOPS);
> +     return pgStatLocal.snapshot.io_path_ops;
> +}
> +
> +PgStat_Counter
> +pgstat_fetch_cumulative_io_ops(IOPath io_path, IOOp io_op)
> +{
> +     PgStat_IOOpCounters *counters = &cumulative_IOOpStats.data[io_path];
> +
> +     switch (io_op)
> +     {
> +             case IOOP_ALLOC:
> +                     return counters->allocs;
> +             case IOOP_EXTEND:
> +                     return counters->extends;
> +             case IOOP_FSYNC:
> +                     return counters->fsyncs;
> +             case IOOP_WRITE:
> +                     return counters->writes;
> +             default:
> +                     elog(ERROR, "IO Operation %s for IO Path %s is undefined.",
> +                                     pgstat_io_op_desc(io_op), pgstat_io_path_desc(io_path));
> +     }
> +}

There's currently no user for this, right? Maybe let's just defer the
cumulative stuff until we need it?

Removed.
 


> +const char *
> +pgstat_io_path_desc(IOPath io_path)
> +{
> +     const char *io_path_desc = "Unknown IO Path";
> +

This should be unreachable, right?

Changed it to an error.
 


> From f2b5b75f5063702cbc3c64efdc1e7ef3cf1acdb4 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Mon, 4 Jul 2022 15:44:17 -0400
> Subject: [PATCH v22 3/3] Add system view tracking IO ops per backend type

> Add pg_stat_buffers, a system view which tracks the number of IO
> operations (allocs, writes, fsyncs, and extends) done through each IO
> path (e.g. shared buffers, local buffers, unbuffered IO) by each type of
> backend.

I think I like pg_stat_io a bit better? Nearly everything in here seems
to fit better in that.

I guess we could split out buffers allocated, but that's actually
interesting in the context of the kind of IO too.

changed it to pg_stat_io
 

> +CREATE VIEW pg_stat_buffers AS
> +SELECT
> +       b.backend_type,
> +       b.io_path,
> +       b.alloc,
> +       b.extend,
> +       b.fsync,
> +       b.write,
> +       b.stats_reset
> +FROM pg_stat_get_buffers() b;

Do we want to expose all data to all users? I guess pg_stat_bgwriter
does? But this does split things out a lot more...


I didn't see another similar example limiting access.


>  DROP TABLE trunc_stats_test, trunc_stats_test1, trunc_stats_test2, trunc_stats_test3, trunc_stats_test4;
>  DROP TABLE prevstats;
> +SELECT pg_stat_reset_shared('buffers');
> + pg_stat_reset_shared
> +----------------------
> +
> +(1 row)
> +
> +SELECT pg_stat_force_next_flush();
> + pg_stat_force_next_flush
> +--------------------------
> +
> +(1 row)
> +
> +SELECT write = 0 FROM pg_stat_buffers WHERE io_path = 'Shared' and backend_type = 'checkpointer';
> + ?column?
> +----------
> + t
> +(1 row)


Don't think you can rely on that. The lookup of the view, functions
might have needed to load catalog data, which might have needed to evict
buffers.  I think you can do something more reliable by checking that
there's more written buffers after a checkpoint than before, or such.


Yes, per an off list suggestion by you, I have changed the tests to use a
sum of writes. I've also added a test for IOPATH_LOCAL and fixed some of
the missing calls to count IO Operations for IOPATH_LOCAL and
IOPATH_STRATEGY.

I struggled to come up with a way to test writes for a particular
type of backend are counted correctly since a dirty buffer could be
written out by another type of backend before the target BackendType has
a chance to write it out.

I also struggled to come up with a way to test IO operations for
background workers. I'm not sure of a way to deterministically have a
background worker do a particular kind of IO in a test scenario.

I'm not sure how to cause a strategy "extend" for testing.
 

Would be nice to have something testing that the ringbuffer stats stuff
does something sensible - that feels not entirely trivial.


I've added a test to test that reused strategy buffers are counted as
allocs. I would like to add a test which checks that if a buffer in the
ring is pinned and thus not reused, that it is not counted as a strategy
alloc, but I found it challenging without a way to pause vacuuming, pin
a buffer, then resume vacuuming.

Thanks,
Melanie
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Masahiko Sawada
Дата:
Сообщение: Re: [BUG] Logical replication failure "ERROR: could not map filenode "base/13237/442428" to relation OID" with catalog modifying txns
Следующее
От: Kyotaro Horiguchi
Дата:
Сообщение: Re: DropRelFileLocatorBuffers