Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Дата
Msg-id 20211209191752.dupgg4omrc7gck2l@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)  (Melanie Plageman <melanieplageman@gmail.com>)
Ответы Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)  (Melanie Plageman <melanieplageman@gmail.com>)
Список pgsql-hackers
Hi,

On 2021-12-03 15:02:24 -0500, Melanie Plageman wrote:
> From e0f7f3dfd60a68fa01f3c023bcdb69305ade3738 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Mon, 11 Oct 2021 16:15:06 -0400
> Subject: [PATCH v17 1/7] Read-only atomic backend write function
> 
> For counters in shared memory which can be read by any backend but only
> written to by one backend, an atomic is still needed to protect against
> torn values, however, pg_atomic_fetch_add_u64() is overkill for
> incrementing the counter. pg_atomic_inc_counter() is a helper function
> which can be used to increment these values safely but without
> unnecessary overhead.
>
> Author: Thomas Munro
> ---
>  src/include/port/atomics.h | 11 +++++++++++
>  1 file changed, 11 insertions(+)
> 
> diff --git a/src/include/port/atomics.h b/src/include/port/atomics.h
> index 856338f161..39ffff24dd 100644
> --- a/src/include/port/atomics.h
> +++ b/src/include/port/atomics.h
> @@ -519,6 +519,17 @@ pg_atomic_sub_fetch_u64(volatile pg_atomic_uint64 *ptr, int64 sub_)
>      return pg_atomic_sub_fetch_u64_impl(ptr, sub_);
>  }
>  
> +/*
> + * On modern systems this is really just *counter++.  On some older systems
> + * there might be more to it, due to inability to read and write 64 bit values
> + * atomically.
> + */
> +static inline void
> +pg_atomic_inc_counter(pg_atomic_uint64 *counter)
> +{
> +    pg_atomic_write_u64(counter, pg_atomic_read_u64(counter) + 1);
> +}

I wonder if it's worth putting something in the name indicating that this is
not actual atomic RMW operation. Perhaps adding _unlocked?



> From b0e193cfa08f0b8cf1be929f26fe38f06a39aeae Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Wed, 24 Nov 2021 10:32:56 -0500
> Subject: [PATCH v17 2/7] Add IO operation counters to PgBackendStatus
> 
> Add an array of counters in PgBackendStatus which count the buffers
> allocated, extended, fsynced, and written by a given backend. Each "IO
> Op" (alloc, fsync, extend, write) is counted per "IO Path" (direct,
> local, shared, or strategy). "local" and "shared" IO Path counters count
> operations on local and shared buffers. The "strategy" IO Path counts
> buffers alloc'd/written/read/fsync'd as part of a BufferAccessStrategy.
> The "direct" IO Path counts blocks of IO which are read, written, or
> fsync'd using smgrwrite/extend/immedsync directly (as opposed to through
> [Local]BufferAlloc()).
> 
> With this commit, all backends increment a counter in their
> PgBackendStatus when performing an IO operation. This is in preparation
> for future commits which will persist these stats upon backend exit and
> use the counters to provide observability of database IO operations.
> 
> Note that this commit does not add code to increment the "direct" path.
> A separate proposed patch [1] which would add wrappers for smgrwrite(),
> smgrextend(), and smgrimmedsync() would provide a good location to call
> pgstat_inc_ioop() for unbuffered IO and avoid regressions for future
> users of these functions.
>
> [1] https://www.postgresql.org/message-id/CAAKRu_aw72w70X1P%3Dba20K8iGUvSkyz7Yk03wPPh3f9WgmcJ3g%40mail.gmail.com

On longer thread it's nice for committers to already have Reviewed-By: in the
commit message.

> diff --git a/src/backend/utils/activity/backend_status.c b/src/backend/utils/activity/backend_status.c
> index 7229598822..413cc605f8 100644
> --- a/src/backend/utils/activity/backend_status.c
> +++ b/src/backend/utils/activity/backend_status.c
> @@ -399,6 +399,15 @@ pgstat_bestart(void)
>      lbeentry.st_progress_command = PROGRESS_COMMAND_INVALID;
>      lbeentry.st_progress_command_target = InvalidOid;
>      lbeentry.st_query_id = UINT64CONST(0);
> +    for (int io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++)
> +    {
> +        IOOps       *io_ops = &lbeentry.io_path_stats[io_path];
> +
> +        pg_atomic_init_u64(&io_ops->allocs, 0);
> +        pg_atomic_init_u64(&io_ops->extends, 0);
> +        pg_atomic_init_u64(&io_ops->fsyncs, 0);
> +        pg_atomic_init_u64(&io_ops->writes, 0);
> +    }
>  
>      /*
>       * we don't zero st_progress_param here to save cycles; nobody should

nit: I think we nearly always have a blank line before loops


> diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
> index 646126edee..93f1b4bcfc 100644
> --- a/src/backend/utils/init/postinit.c
> +++ b/src/backend/utils/init/postinit.c
> @@ -623,6 +623,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
>          RegisterTimeout(CLIENT_CONNECTION_CHECK_TIMEOUT, ClientCheckTimeoutHandler);
>      }
>  
> +    pgstat_beinit();
>      /*
>       * Initialize local process's access to XLOG.
>       */

nit: same with multi-line comments.


> @@ -649,6 +650,7 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
>           */
>          CreateAuxProcessResourceOwner();
>  
> +        pgstat_bestart();
>          StartupXLOG();
>          /* Release (and warn about) any buffer pins leaked in StartupXLOG */
>          ReleaseAuxProcessResources(true);
> @@ -676,7 +678,6 @@ InitPostgres(const char *in_dbname, Oid dboid, const char *username,
>      EnablePortalManager();
>  
>      /* Initialize status reporting */
> -    pgstat_beinit();

I'd like to see changes like moving this kind of thing around broken around
and committed separately. It's much easier to pinpoint breakage if the CF
breaks after moving just pgstat_beinit() around, rather than when committing
this considerably larger patch. And reordering subsystem initialization has
the habit of causing problems...


> +/* ----------
> + * IO Stats reporting utility types
> + * ----------
> + */
> +
> +typedef enum IOOp
> +{
> +    IOOP_ALLOC,
> +    IOOP_EXTEND,
> +    IOOP_FSYNC,
> +    IOOP_WRITE,
> +} IOOp;
> [...]
> +/*
> + * Structure for counting all types of IOOps for a live backend.
> + */
> +typedef struct IOOps
> +{
> +    pg_atomic_uint64 allocs;
> +    pg_atomic_uint64 extends;
> +    pg_atomic_uint64 fsyncs;
> +    pg_atomic_uint64 writes;
> +} IOOps;

To me IOop and IOOps sound to much alike - even though they're really kind of
separate things. s/IOOps/IOOpCounters/ maybe?


> @@ -3152,6 +3156,14 @@ pgstat_shutdown_hook(int code, Datum arg)
>  {
>      Assert(!pgstat_is_shutdown);
>  
> +    /*
> +     * Only need to send stats on IO Ops for IO Paths when a process exits.
> +     * Users requiring IO Ops for both live and exited backends can read from
> +     * live backends' PgBackendStatus and sum this with totals from exited
> +     * backends persisted by the stats collector.
> +     */
> +    pgstat_send_buffers();

Perhaps something like this comment belongs somewhere at the top of the file,
or in the header, or ...? It's a fairly central design piece, and it's not
obvious one would need to look in the shutdown hook for it?


> +/*
> + * Before exiting, a backend sends its IO op statistics to the collector so
> + * that they may be persisted.
> + */
> +void
> +pgstat_send_buffers(void)
> +{
> +    PgStat_MsgIOPathOps msg;
> +
> +    PgBackendStatus *beentry = MyBEEntry;
> +
> +    /*
> +     * Though some backends with type B_INVALID (such as the single-user mode
> +     * process) do initialize and increment IO operations stats, there is no
> +     * spot in the array of IO operations for backends of type B_INVALID. As
> +     * such, do not send these to the stats collector.
> +     */
> +    if (!beentry || beentry->st_backendType == B_INVALID)
> +        return;

Why does single user mode use B_INVALID? That doesn't seem quite right.


> +    memset(&msg, 0, sizeof(msg));
> +    msg.backend_type = beentry->st_backendType;
> +
> +    pgstat_sum_io_path_ops(msg.iop.io_path_ops,
> +                           (IOOps *) &beentry->io_path_stats);
> +
> +    pgstat_setheader(&msg.m_hdr, PGSTAT_MTYPE_IO_PATH_OPS);
> +    pgstat_send(&msg, sizeof(msg));
> +}

It seems worth having a path skipping sending the message if there was no IO?



> +/*
> + * Helper function to sum all live IO Op stats for all IO Paths (e.g. shared,
> + * local) to those in the equivalent stats structure for exited backends. Note
> + * that this adds and doesn't set, so the destination stats structure should be
> + * zeroed out by the caller initially. This would commonly be used to transfer
> + * all IO Op stats for all IO Paths for a particular backend type to the
> + * pgstats structure.
> + */
> +void
> +pgstat_sum_io_path_ops(PgStatIOOps *dest, IOOps *src)
> +{
> +    for (int io_path = 0; io_path < IOPATH_NUM_TYPES; io_path++)
> +    {

Sacriligeous, but I find io_path a harder to understand variable name for the
counter than i (or io_path_off or ...) ;)


> +static void
> +pgstat_recv_io_path_ops(PgStat_MsgIOPathOps *msg, int len)
> +{
> +    PgStatIOOps *src_io_path_ops;
> +    PgStatIOOps *dest_io_path_ops;
> +
> +    /*
> +     * Subtract 1 from message's BackendType to get a valid index into the
> +     * array of IO Ops which does not include an entry for B_INVALID
> +     * BackendType.
> +     */
> +    Assert(msg->backend_type > B_INVALID);

Probably worth also asserting the upper boundary?



> From f972ea87270feaed464a74fb6541ac04b4fc7d98 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Wed, 24 Nov 2021 11:39:48 -0500
> Subject: [PATCH v17 4/7] Add "buffers" to pgstat_reset_shared_counters
> 
> Backends count IO operations for various IO paths in their PgBackendStatus.
> Upon exit, they send these counts to the stats collector. Prior to this commit,
> these IO Ops stats would have been reset when the target was "bgwriter".
> 
> With this commit, target "bgwriter" no longer will cause the IO operations
> stats to be reset, and the IO operations stats can be reset with new target,
> "buffers".
> ---
>  doc/src/sgml/monitoring.sgml                |  2 +-
>  src/backend/postmaster/pgstat.c             | 83 +++++++++++++++++++--
>  src/backend/utils/activity/backend_status.c | 29 +++++++
>  src/include/pgstat.h                        |  8 +-
>  src/include/utils/backend_status.h          |  2 +
>  5 files changed, 117 insertions(+), 7 deletions(-)
> 
> diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
> index 62f2a3332b..bda3eef309 100644
> --- a/doc/src/sgml/monitoring.sgml
> +++ b/doc/src/sgml/monitoring.sgml
> @@ -3604,7 +3604,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
>         <structfield>stats_reset</structfield> <type>timestamp with time zone</type>
>        </para>
>        <para>
> -       Time at which these statistics were last reset
> +       Time at which these statistics were last reset.
>        </para></entry>
>       </row>
>      </tbody>

Hm?

Shouldn't this new reset target be documented?


> +/*
> + * Helper function to collect and send live backends' current IO operations
> + * stats counters when a stats reset is initiated so that they may be deducted
> + * from future totals.
> + */
> +static void
> +pgstat_send_buffers_reset(PgStat_MsgResetsharedcounter *msg)
> +{
> +    PgStatIOPathOps ops[BACKEND_NUM_TYPES];
> +
> +    memset(ops, 0, sizeof(ops));
> +    pgstat_report_live_backend_io_path_ops(ops);
> +
> +    /*
> +     * Iterate through the array of IO Ops for all IO Paths for each
> +     * BackendType. Because the array does not include a spot for BackendType
> +     * B_INVALID, add 1 to the index when setting backend_type so that there is
> +     * no confusion as to the BackendType with which this reset message
> +     * corresponds.
> +     */
> +    for (int backend_type_idx = 0; backend_type_idx < BACKEND_NUM_TYPES; backend_type_idx++)
> +    {
> +        msg->m_backend_resets.backend_type = backend_type_idx + 1;
> +        memcpy(&msg->m_backend_resets.iop, &ops[backend_type_idx],
> +                sizeof(msg->m_backend_resets.iop));
> +        pgstat_send(msg, sizeof(PgStat_MsgResetsharedcounter));
> +    }
> +}

Probably worth explaining why multiple messages are sent?


> @@ -5583,10 +5621,45 @@ pgstat_recv_resetsharedcounter(PgStat_MsgResetsharedcounter *msg, int len)
>  {
>      if (msg->m_resettarget == RESET_BGWRITER)
>      {
> -        /* Reset the global, bgwriter and checkpointer statistics for the cluster. */
> -        memset(&globalStats, 0, sizeof(globalStats));
> +        /*
> +         * Reset the global bgwriter and checkpointer statistics for the
> +         * cluster.
> +         */
> +        memset(&globalStats.checkpointer, 0, sizeof(globalStats.checkpointer));
> +        memset(&globalStats.bgwriter, 0, sizeof(globalStats.bgwriter));
>          globalStats.bgwriter.stat_reset_timestamp = GetCurrentTimestamp();
>      }

Oh, is this a live bug?


> +        /*
> +         * Subtract 1 from the BackendType to arrive at a valid index in the
> +         * array, as it does not contain a spot for B_INVALID BackendType.
> +         */

Instead of repeating a comment about +- 1 in a bunch of places, would it look
better to have two helper inline functions for this purpose?



> +/*
> +* When adding a new column to the pg_stat_buffers view, add a new enum
> +* value here above COLUMN_LENGTH.
> +*/
> +enum
> +{
> +    COLUMN_BACKEND_TYPE,
> +    COLUMN_IO_PATH,
> +    COLUMN_ALLOCS,
> +    COLUMN_EXTENDS,
> +    COLUMN_FSYNCS,
> +    COLUMN_WRITES,
> +    COLUMN_RESET_TIME,
> +    COLUMN_LENGTH,
> +};

COLUMN_LENGTH seems like a fairly generic name...



> From 9f22da9041e1e1fbc0ef003f5f78f4e72274d438 Mon Sep 17 00:00:00 2001
> From: Melanie Plageman <melanieplageman@gmail.com>
> Date: Wed, 24 Nov 2021 12:20:10 -0500
> Subject: [PATCH v17 6/7] Remove superfluous bgwriter stats
> 
> Remove stats from pg_stat_bgwriter which are now more clearly expressed
> in pg_stat_buffers.
> 
> TODO:
> - make pg_stat_checkpointer view and move relevant stats into it
> - add additional stats to pg_stat_bgwriter

When do you think it makes sense to tackle these wrt committing some of the
patches?


> diff --git a/src/backend/storage/buffer/bufmgr.c b/src/backend/storage/buffer/bufmgr.c
> index 6926fc5742..67447f997a 100644
> --- a/src/backend/storage/buffer/bufmgr.c
> +++ b/src/backend/storage/buffer/bufmgr.c
> @@ -2164,7 +2164,6 @@ BufferSync(int flags)
>              if (SyncOneBuffer(buf_id, false, &wb_context) & BUF_WRITTEN)
>              {
>                  TRACE_POSTGRESQL_BUFFER_SYNC_WRITTEN(buf_id);
> -                PendingCheckpointerStats.m_buf_written_checkpoints++;
>                  num_written++;
>              }
>          }
> @@ -2273,9 +2272,6 @@ BgBufferSync(WritebackContext *wb_context)
>       */
>      strategy_buf_id = StrategySyncStart(&strategy_passes, &recent_alloc);
>  
> -    /* Report buffer alloc counts to pgstat */
> -    PendingBgWriterStats.m_buf_alloc += recent_alloc;
> -
>      /*
>       * If we're not running the LRU scan, just stop after doing the stats
>       * stuff.  We mark the saved state invalid so that we can recover sanely
> @@ -2472,8 +2468,6 @@ BgBufferSync(WritebackContext *wb_context)
>              reusable_buffers++;
>      }
>  
> -    PendingBgWriterStats.m_buf_written_clean += num_written;
> -

Isn't num_written unused now, unless tracepoints are enabled? I'd expect some
compilers to warn... Perhaps we should just remove information from the
tracepoint?


Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: port conflicts when running tests concurrently on windows.
Следующее
От: John Naylor
Дата:
Сообщение: do only critical work during single-user vacuum?