Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)

Поиск
Список
Период
Сортировка
От Magnus Hagander
Тема Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)
Дата
Msg-id CABUevEw=TnHVtoJtD+A1KjPC8RefQGRGJh-LbCgdJ_pDbUw2Zw@mail.gmail.com
обсуждение исходный текст
Ответ на pg_stat_bgwriter.buffers_backend is pretty meaningless (and more?)  (Andres Freund <andres@anarazel.de>)
Ответы Re: pg_stat_bgwriter.buffers_backend is pretty meaningless (andmore?)
Список pgsql-hackers
On Fri, Jan 24, 2020 at 8:52 PM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> Currently pg_stat_bgwriter.buffers_backend is pretty useless to gauge
> whether backends are doing writes they shouldn't do. That's because it
> counts things that are either unavoidably or unlikely doable by other
> parts of the system (checkpointer, bgwriter).
> In particular extending the file can not currently be done by any
> another type of process, yet is counted. When using a buffer access
> strategy it is also very likely that writes have to be done by the
> 'dirtying' backend itself, as the buffer will be reused soon after (when
> not previously in s_b that is).

Yeah. That's quite annoying.


> Additionally pg_stat_bgwriter.buffers_backend also counts writes done by
> autovacuum et al.
>
>
> I think it'd make sense to at least split buffers_backend into
> buffers_backend_extend,
> buffers_backend_write,
> buffers_backend_write_strat
>
> but it could also be worthwhile to expand it into
> buffers_backend_extend,
> buffers_{backend,checkpoint,bgwriter,autovacuum}_write
> buffers_{backend,autovacuum}_write_stat

Given that these are individual global counters, I don't really see
any reason not to expand it to the bigger set of counters. It's easy
enough to add them up together later if needed.


> Possibly by internally, in contrast to SQL level, having just counter
> arrays indexed by backend types.
>
>
> It's also noteworthy that buffers_backend is accounted in an absurd
> manner. One might think that writes are accounted from backend -> shared
> memory or such. But instead it works like this:
>
> 1) backend flushes buffer in bufmgr.c, accounts for backend *write time*
> 2) mdwrite writes and  registers a sync request, which forwards the sync request to checkpointer
> 3) ForwardSyncRequest(), when not called by bgwriter, increments CheckpointerShmem->num_backend_writes
> 4) checkpointer, whenever doing AbsorbSyncRequests(), moves
>    CheckpointerShmem->num_backend_writes to
>    BgWriterStats.m_buf_written_backend (local memory!)
> 5) Occasionally it calls pgstat_send_bgwriter(), which sends the data to
>    pgstat (which bgwriter also does)
> 6) Which then updates the shared memory used by the display functions
>
> Worthwhile to note that backend buffer read/write *time* is accounted
> differently. That's done via pgstat_send_tabstat().
>
>
> I think there's very little excuse for the indirection via checkpointer,
> besides architectually being weird, it actually requires that we
> continue to wake up checkpointer over and over instead of optimizing how
> and when we submit fsync requests.
>
> As far as I can tell we're also simply not accounting at all for writes
> done outside of shared buffers. All writes done directly through
> smgrwrite()/extend() aren't accounted anywhere as far as I can tell.
>
>
> I think we also count things as writes that aren't writes: mdtruncate()
> is AFAICT counted as one backend write for each segment. Which seems
> weird to me.

It's at least slightly weird :) Might it be worth counting truncate
events separately?


> Lastly, I don't understand what the point of sending fixed size stats,
> like the stuff underlying pg_stat_bgwriter, through pgstats IPC. While
> I don't like it's architecture, we obviously need something like pgstat
> to handle variable amounts of stats (database, table level etc
> stats). But that doesn't at all apply to these types of global stats.

That part has annoyed me as well a few times. +1 for just moving that
into a global shared memory. Given that we don't really care about
things being in sync between those different counters *or* if we loose
a bit of data (which the stats collector is designed to do), we could
even do that without a lock?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Dean Rasheed
Дата:
Сообщение: Re: Greatest Common Divisor
Следующее
От: "曾文旌(义从)"
Дата:
Сообщение: Re: [Proposal] Global temporary tables