Thread: SLRU statistics


SLRU statistics

From
Tomas Vondra
Date:
Hi,

One of the things I occasionally wanted to know are stats for the SLRU
caches (we have a couple of those - clog, subtrans, ...). So here is a WIP
version of a patch adding that.

The implementation is fairly simple - the slru code updates counters in
local memory, and then sends them to the collector at the end of the
transaction (similarly to table/func stats). The collector stores them
similarly to global stats. And the collected stats are accessible
through pg_stat_slru.

The main issue is that we have multiple SLRU caches, and it seems handy
to have separate stats for each. OTOH the number of SLRU caches is not
fixed, e.g. extensions might define their own SLRU caches. But
handling a dynamic number of SLRU caches seems a bit hard (we'd need to
assign some sort of unique IDs etc.) so what I did was define a fixed
number of SLRU types

     typedef enum SlruType
     {
         SLRU_CLOG,
         SLRU_COMMIT_TS,
         SLRU_MULTIXACT_OFFSET,
         SLRU_MULTIXACT_MEMBER,
         SLRU_SUBTRANS,
         SLRU_ASYNC,
         SLRU_OLDSERXID,
         SLRU_OTHER
     } SlruType;

with one group of counters for each type. The last type (SLRU_OTHER) is
used to store stats for all SLRUs that are not predefined. It wouldn't
be that difficult to store a dynamic number of SLRUs, but I'm not sure how
to solve issues with identifying SLRUs etc. And there are probably very
few extensions adding custom SLRUs anyway.
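To make the catch-all behavior concrete, here is a minimal sketch of how
names could map to those fixed types, with anything not predefined falling
into SLRU_OTHER (the helper function is hypothetical, not part of the
patch):

```c
#include <string.h>

typedef enum SlruType
{
    SLRU_CLOG,
    SLRU_COMMIT_TS,
    SLRU_MULTIXACT_OFFSET,
    SLRU_MULTIXACT_MEMBER,
    SLRU_SUBTRANS,
    SLRU_ASYNC,
    SLRU_OLDSERXID,
    SLRU_OTHER              /* catch-all for extension-defined SLRUs */
} SlruType;

/* hypothetical helper: map an SLRU name to its fixed counter slot */
static SlruType
slru_type_from_name(const char *name)
{
    if (strcmp(name, "clog") == 0)
        return SLRU_CLOG;
    if (strcmp(name, "commit_timestamp") == 0)
        return SLRU_COMMIT_TS;
    if (strcmp(name, "multixact_offset") == 0)
        return SLRU_MULTIXACT_OFFSET;
    if (strcmp(name, "multixact_member") == 0)
        return SLRU_MULTIXACT_MEMBER;
    if (strcmp(name, "subtrans") == 0)
        return SLRU_SUBTRANS;
    if (strcmp(name, "async") == 0)
        return SLRU_ASYNC;
    if (strcmp(name, "oldserxid") == 0)
        return SLRU_OLDSERXID;
    return SLRU_OTHER;      /* anything not predefined */
}
```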

The one thing missing from the patch is a way to reset the SLRU stats,
similarly to how we can reset bgwriter stats.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments

RE: SLRU statistics

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
> One of the stats I occasionally wanted to know are stats for the SLRU
> stats (we have couple of those - clog, subtrans, ...). So here is a WIP
> version of a patch adding that.

How can users take advantage of this information?  I think we also need the ability to set the size of SLRU buffers.
(I want to be freed from the concern about the buffer shortage by setting the buffer size to its maximum.  For example,
CLOG would be only 1 GB.)


Regards
Takayuki Tsunakawa






Re: SLRU statistics

From
Tomas Vondra
Date:
On Mon, Jan 20, 2020 at 01:04:33AM +0000, tsunakawa.takay@fujitsu.com wrote:
>From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
>> One of the stats I occasionally wanted to know are stats for the SLRU
>> stats (we have couple of those - clog, subtrans, ...). So here is a WIP
>> version of a patch adding that.
>
>How can users take advantage of this information?  I think we also need
>the ability to set the size of SLRU buffers.  (I want to be freed from
>the concern about the buffer shortage by setting the buffer size to its
>maximum.  For example, CLOG would be only 1 GB.)
>

You're right that users can't really take advantage of this - my primary
motivation was providing feedback for devs, benchmarking etc. That
might have been done with DEBUG messages or something, but this seems
more convenient.

I think it's unclear how desirable / necessary it is to allow users to
tweak those caches. I don't think we should have a GUC for everything,
but maybe there's some sort of heuristics to determine the size. That
assumes we actually find practical workloads where the size of
these SLRUs is a performance issue.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: SLRU statistics

From
Alvaro Herrera
Date:
On 2020-Jan-20, Tomas Vondra wrote:

> On Mon, Jan 20, 2020 at 01:04:33AM +0000, tsunakawa.takay@fujitsu.com wrote:
> > From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
> > > One of the stats I occasionally wanted to know are stats for the SLRU
> > > stats (we have couple of those - clog, subtrans, ...). So here is a WIP
> > > version of a patch adding that.
> > 
> > How can users take advantage of this information?  I think we also need
> > the ability to set the size of SLRU buffers.  (I want to be freed from
> > the concern about the buffer shortage by setting the buffer size to its
> > maximum.  For example, CLOG would be only 1 GB.)
> 
> You're right the users can't really take advantage of this - my primary
> motivation was providing a feedback for devs, benchmarking etc. That
> might have been done with DEBUG messages or something, but this seems
> more convenient.

I think the stats are definitely needed if we keep the current code.
I've researched some specific problems in this code, such as the need
for more subtrans SLRU buffers; IIRC it was pretty painful to figure out
what the problem was without counters, and it'd have been trivial with
them.

> I think it's unclear how desirable / necessary it is to allow users to
> tweak those caches. I don't think we should have a GUC for everything,
> but maybe there's some sort of heuristics to determine the size. The
> assumption is we actually find practical workloads where the size of
> these SLRUs is a performance issue.

I expect we'll eventually realize the need for changes in this area.
Either configurability in the buffer pool sizes, or moving them to be
part of shared_buffers (IIRC Thomas Munro had a patch for this.)
Example: SLRUs like pg_commit and pg_subtrans have higher buffer
consumption as the range of open transactions increases; for many users
this is not a concern and they can live with the default values.

(I think when pg_commit (née pg_clog) buffers were increased, we should
have increased pg_subtrans buffers to match.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: SLRU statistics

From
Tomas Vondra
Date:
On Mon, Jan 20, 2020 at 03:01:36PM -0300, Alvaro Herrera wrote:
>On 2020-Jan-20, Tomas Vondra wrote:
>
>> On Mon, Jan 20, 2020 at 01:04:33AM +0000, tsunakawa.takay@fujitsu.com wrote:
>> > From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
>> > > One of the stats I occasionally wanted to know are stats for the SLRU
>> > > stats (we have couple of those - clog, subtrans, ...). So here is a WIP
>> > > version of a patch adding that.
>> >
>> > How can users take advantage of this information?  I think we also need
>> > the ability to set the size of SLRU buffers.  (I want to be freed from
>> > the concern about the buffer shortage by setting the buffer size to its
>> > maximum.  For example, CLOG would be only 1 GB.)
>>
>> You're right the users can't really take advantage of this - my primary
>> motivation was providing a feedback for devs, benchmarking etc. That
>> might have been done with DEBUG messages or something, but this seems
>> more convenient.
>
>I think the stats are definitely needed if we keep the current code.
>I've researched some specific problems in this code, such as the need
>for more subtrans SLRU buffers; IIRC it was pretty painful to figure out
>what the problem was without counters, and it'd have been trivial with
>them.
>

Right. Improving our ability to monitor/measure things is the goal of
this patch.

>> I think it's unclear how desirable / necessary it is to allow users to
>> tweak those caches. I don't think we should have a GUC for everything,
>> but maybe there's some sort of heuristics to determine the size. The
>> assumption is we actually find practical workloads where the size of
>> these SLRUs is a performance issue.
>
>I expect we'll eventually realize the need for changes in this area.
>Either configurability in the buffer pool sizes, or moving them to be
>part of shared_buffers (IIRC Thomas Munro had a patch for this.)
>Example: SLRUs like pg_commit and pg_subtrans have higher buffer
>consumption as the range of open transactions increases; for many users
>this is not a concern and they can live with the default values.
>
>(I think when pg_commit (née pg_clog) buffers were increased, we should
>have increased pg_subtrans buffers to match.)
>

Quite possibly, yes. All I'm saying is that it's not something I intend
to address with this patch. It's quite possible the solutions will be
different for each SLRU, and that will require more research.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



RE: SLRU statistics

From
"tsunakawa.takay@fujitsu.com"
Date:
From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
> You're right the users can't really take advantage of this - my primary
> motivation was providing a feedback for devs, benchmarking etc. That
> might have been done with DEBUG messages or something, but this seems
> more convenient.

Understood.  I'm in favor of adding performance information even if it doesn't make sense for users (like other DBMSs
sometimes do.)  One concern is that all the PostgreSQL performance statistics have been useful so far for tuning in some
way, and this may become the first exception.  Do we describe the SLRU stats view in the manual, or hide it only for PG
devs and support staff?


> I think it's unclear how desirable / necessary it is to allow users to
> tweak those caches. I don't think we should have a GUC for everything,
> but maybe there's some sort of heuristics to determine the size. The
> assumption is we actually find practical workloads where the size of
> these SLRUs is a performance issue.

I understood that the new performance statistics are expected to reveal what SLRUs need to be tunable and/or
implemented with a different mechanism like shared buffers.


Regards
Takayuki Tsunakawa






Re: SLRU statistics

From
Masahiko Sawada
Date:
On Tue, 21 Jan 2020 at 01:38, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>
> On Mon, Jan 20, 2020 at 01:04:33AM +0000, tsunakawa.takay@fujitsu.com wrote:
> >From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
> >> One of the stats I occasionally wanted to know are stats for the SLRU
> >> stats (we have couple of those - clog, subtrans, ...). So here is a WIP
> >> version of a patch adding that.

+1

> >
> >How can users take advantage of this information?  I think we also need
> >the ability to set the size of SLRU buffers.  (I want to be freed from
> >the concern about the buffer shortage by setting the buffer size to its
> >maximum.  For example, CLOG would be only 1 GB.)
> >
>
> You're right the users can't really take advantage of this - my primary
> motivation was providing a feedback for devs, benchmarking etc. That
> might have been done with DEBUG messages or something, but this seems
> more convenient.

I've not tested the performance impact, but perhaps we might want to
disable these counters by default, controlled by a GUC. And similarly
to buffer statistics it might be better to inline the
pgstat_count_slru_page_xxx functions for better performance.

Regards,

-- 
Masahiko Sawada            http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: SLRU statistics

From
Tomas Vondra
Date:
On Tue, Jan 21, 2020 at 05:09:33PM +0900, Masahiko Sawada wrote:
>On Tue, 21 Jan 2020 at 01:38, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
>>
>> On Mon, Jan 20, 2020 at 01:04:33AM +0000, tsunakawa.takay@fujitsu.com wrote:
>> >From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
>> >> One of the stats I occasionally wanted to know are stats for the SLRU
>> >> stats (we have couple of those - clog, subtrans, ...). So here is a WIP
>> >> version of a patch adding that.
>
>+1
>
>> >
>> >How can users take advantage of this information?  I think we also need
>> >the ability to set the size of SLRU buffers.  (I want to be freed from
>> >the concern about the buffer shortage by setting the buffer size to its
>> >maximum.  For example, CLOG would be only 1 GB.)
>> >
>>
>> You're right the users can't really take advantage of this - my primary
>> motivation was providing a feedback for devs, benchmarking etc. That
>> might have been done with DEBUG messages or something, but this seems
>> more convenient.
>
>I've not tested the performance impact but perhaps we might want to
>disable these counter by default and controlled by a GUC. And similar
>to buffer statistics it might be better to inline
>pgstat_count_slru_page_xxx function for better performance.
>

Hmmm, yeah. Inlining seems like a good idea, and maybe we should have
something like track_slru GUC.
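As a sketch of what the inlining would buy: with a static inline in a
header, the hot-path counter bump reduces to a single array increment
(the names and struct layout below are assumptions for illustration, not
the patch's actual definitions):

```c
#define SLRU_NUM_ELEMENTS 8

/* per-backend local counters, flushed to the stats collector later */
typedef struct SlruStats
{
    long        blks_hit;
    long        blks_read;
} SlruStats;

static SlruStats local_slru_stats[SLRU_NUM_ELEMENTS];

/* cheap enough to call from the SLRU buffer-hit path */
static inline void
pgstat_count_slru_page_hit(int slru_idx)
{
    local_slru_stats[slru_idx].blks_hit++;
}
```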


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: SLRU statistics

From
Tomas Vondra
Date:
On Tue, Jan 21, 2020 at 06:24:29AM +0000, tsunakawa.takay@fujitsu.com
wrote:
>From: Tomas Vondra <tomas.vondra@2ndquadrant.com>
>> You're right the users can't really take advantage of this - my
>> primary motivation was providing a feedback for devs, benchmarking
>> etc. That might have been done with DEBUG messages or something, but
>> this seems more convenient.
>
>Understood.  I'm in favor of adding performance information even if it
>doesn't make sense for users (like other DBMSs sometimes do.)  One
>concern is that all the PostgreSQL performance statistics have been
>useful so far for tuning in some way, and this may become the first
>exception.  Do we describe the SLRU stats view in the manual, or hide
>it only for PG devs and support staff?
>

Yes, the pg_stat_slru view should be described in the manual. That's
missing from the patch.

>
>> I think it's unclear how desirable / necessary it is to allow users
>> to tweak those caches. I don't think we should have a GUC for
>> everything, but maybe there's some sort of heuristics to determine
>> the size. The assumption is we actually find practical workloads
>> where the size of these SLRUs is a performance issue.
>
>I understood that the new performance statistics are expected to reveal
>what SLRUs need to be tunable and/or implemented with a different
>mechanism like shared buffers.
>

Right. It's certainly meant to provide information for further tuning.
I'm just saying it's targeted more at developers, at least initially.
Maybe we'll end up with GUCs, maybe we'll choose other approaches for
some SLRUs. I don't have an opinion on that yet.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: SLRU statistics

From
Alvaro Herrera
Date:
On 2020-Jan-21, Tomas Vondra wrote:

> On Tue, Jan 21, 2020 at 05:09:33PM +0900, Masahiko Sawada wrote:

> > I've not tested the performance impact but perhaps we might want to
> > disable these counter by default and controlled by a GUC. And similar
> > to buffer statistics it might be better to inline
> > pgstat_count_slru_page_xxx function for better performance.
> 
> Hmmm, yeah. Inlining seems like a good idea, and maybe we should have
> something like track_slru GUC.

I disagree with adding a GUC.  If a performance impact can be measured
let's turn the functions to static inline, as already proposed.  My
guess is that pgstat_count_slru_page_hit() is the only candidate for
that; all the other paths involve I/O or lock acquisition or even WAL
generation, so the impact won't be measurable anyhow.  We removed
track-enabling GUCs years ago.

BTW, this comment:
            /* update the stats counter of pages found in shared buffers */

is not strictly true, because we don't use what we normally call "shared
buffers"  for SLRUs.

Patch applies cleanly.  I suggest to move the page_miss() call until
after SlruRecentlyUsed(), for consistency with the other case.

I find SlruType pretty odd, and the accompanying "if" list in
pg_stat_get_slru() correspondingly so.  Would it be possible to have
each SLRU enumerate itself somehow?  Maybe add the name in SlruCtlData
and query that, somehow.  (I don't think we have an array of SlruCtlData
anywhere though, so this might be a useless idea.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: SLRU statistics

From
Tomas Vondra
Date:
On Fri, Feb 28, 2020 at 08:19:18PM -0300, Alvaro Herrera wrote:
>On 2020-Jan-21, Tomas Vondra wrote:
>
>> On Tue, Jan 21, 2020 at 05:09:33PM +0900, Masahiko Sawada wrote:
>
>> > I've not tested the performance impact but perhaps we might want to
>> > disable these counter by default and controlled by a GUC. And similar
>> > to buffer statistics it might be better to inline
>> > pgstat_count_slru_page_xxx function for better performance.
>>
>> Hmmm, yeah. Inlining seems like a good idea, and maybe we should have
>> something like track_slru GUC.
>
>I disagree with adding a GUC.  If a performance impact can be measured
>let's turn the functions to static inline, as already proposed.  My
>guess is that pgstat_count_slru_page_hit() is the only candidate for
>that; all the other paths involve I/O or lock acquisition or even WAL
>generation, so the impact won't be measurable anyhow.  We removed
>track-enabling GUCs years ago.
>

Did we actually remove track-enabling GUCs? I think we still have

  - track_activities
  - track_counts
  - track_io_timing
  - track_functions

But maybe I'm missing something?

That being said, I'm not sure we need to add a GUC. I'll do some
measurements and we'll see. Maybe the static inline will be enough.

>BTW, this comment:
>            /* update the stats counter of pages found in shared buffers */
>
>is not strictly true, because we don't use what we normally call "shared
>buffers"  for SLRUs.
>

Oh, right. Will fix.

>Patch applies cleanly.  I suggest to move the page_miss() call until
>after SlruRecentlyUsed(), for consistency with the other case.
>

OK.

>I find SlruType pretty odd, and the accompanying "if" list in
>pg_stat_get_slru() correspondingly so.  Would it be possible to have
>each SLRU enumerate itself somehow?  Maybe add the name in SlruCtlData
>and query that, somehow.  (I don't think we have an array of SlruCtlData
>anywhere though, so this might be a useless idea.)
>

Well, maybe. We could have a system to register SLRUs dynamically, but
the trick here is that having a fixed predefined number of SLRUs
simplifies serialization in pgstat.c and so on. I don't think the "if"
branches in pg_stat_get_slru() are particularly ugly, but maybe we could
replace the enum with a registry of structs, something like rmgrlist.h.
It seems like overkill to me, though.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: SLRU statistics

From
Alvaro Herrera
Date:
On 2020-Feb-29, Tomas Vondra wrote:

> Did we actually remove track-enabling GUCs? I think we still have
> 
>  - track_activities
>  - track_counts
>  - track_io_timing
>  - track_functions
> 
> But maybe I'm missing something?

Hm I remembered we removed the one for row-level stats
(track_row_stats), but what we really did is merge it with block-level
stats (track_block_stats) into track_counts -- commit 48f7e6439568. 
Funnily enough, if you disable that autovacuum won't work, so I'm not
sure it's a very useful tunable.  And it definitely has more overhead
than what this new GUC would have.

> > I find SlruType pretty odd, and the accompanying "if" list in
> > pg_stat_get_slru() correspondingly so.  Would it be possible to have
> > each SLRU enumerate itself somehow?  Maybe add the name in SlruCtlData
> > and query that, somehow.  (I don't think we have an array of SlruCtlData
> > anywhere though, so this might be a useless idea.)
> 
> Well, maybe. We could have a system to register SLRUs dynamically, but
> the trick here is that by having a fixed predefined number of SLRUs
> simplifies serialization in pgstat.c and so on. I don't think the "if"
> branches in pg_stat_get_slru() are particularly ugly, but maybe we could
> replace the enum with a registry of structs, something like rmgrlist.h.
> It seems like an overkill to me, though.

Yeah, maybe we don't have to fix that now.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: SLRU statistics

From
Tomas Vondra
Date:
On Sat, Feb 29, 2020 at 11:44:26AM -0300, Alvaro Herrera wrote:
>On 2020-Feb-29, Tomas Vondra wrote:
>
>> Did we actually remove track-enabling GUCs? I think we still have
>>
>>  - track_activities
>>  - track_counts
>>  - track_io_timing
>>  - track_functions
>>
>> But maybe I'm missing something?
>
>Hm I remembered we removed the one for row-level stats
>(track_row_stats), but what we really did is merge it with block-level
>stats (track_block_stats) into track_counts -- commit 48f7e6439568.
>Funnily enough, if you disable that autovacuum won't work, so I'm not
>sure it's a very useful tunable.  And it definitely has more overhead
>than what this new GUC would have.
>

OK

>> > I find SlruType pretty odd, and the accompanying "if" list in
>> > pg_stat_get_slru() correspondingly so.  Would it be possible to have
>> > each SLRU enumerate itself somehow?  Maybe add the name in SlruCtlData
>> > and query that, somehow.  (I don't think we have an array of SlruCtlData
>> > anywhere though, so this might be a useless idea.)
>>
>> Well, maybe. We could have a system to register SLRUs dynamically, but
>> the trick here is that by having a fixed predefined number of SLRUs
>> simplifies serialization in pgstat.c and so on. I don't think the "if"
>> branches in pg_stat_get_slru() are particularly ugly, but maybe we could
>> replace the enum with a registry of structs, something like rmgrlist.h.
>> It seems like an overkill to me, though.
>
>Yeah, maybe we don't have to fix that now.
>

IMO the current solution is sufficient for the purpose. I guess we could
just stick a name into the SlruCtlData (and remove SlruType entirely),
and use that to identify the stats entries. That might be enough, and in
fact we already have that - SimpleLruInit gets a name parameter and
copies that to the lwlock_tranche_name.

One of the main reasons why I opted to use the enum is that it makes
tracking, lookup and serialization pretty trivial - it's just an index
lookup, etc. But maybe it wouldn't be much more complex with the name, 
considering the name length is limited by SLRU_MAX_NAME_LENGTH. And we
probably don't expect many entries, so we could keep them in a simple
list, or maybe a simplehash.
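A name-based mapping along those lines could be a plain linear scan over
a small fixed array, with unknown names falling into a trailing "other"
slot. A sketch under that assumption (not the actual patch code):

```c
#include <string.h>

static const char *slru_names[] = {"async", "clog", "commit_timestamp",
                                   "multixact_offset", "multixact_member",
                                   "oldserxid", "subtrans",
                                   "other" /* catch-all, has to be last */};

#define SLRU_NUM_ELEMENTS (sizeof(slru_names) / sizeof(slru_names[0]))

/* map an SLRU name to its counter slot; unknown names land in "other" */
static int
slru_index(const char *name)
{
    int         i;

    for (i = 0; i < (int) SLRU_NUM_ELEMENTS - 1; i++)
    {
        if (strcmp(slru_names[i], name) == 0)
            return i;
    }
    return (int) SLRU_NUM_ELEMENTS - 1; /* "other" */
}
```

With so few entries a linear scan is effectively free, which is why a
hash table would be overkill here.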

I'm not sure what to do with data for SLRUs that might have disappeared
after a restart (e.g. because someone removed an extension). Until now
those would all be in the "other" entry.


The attached v2 fixes the issues in your first message:

- I moved the page_miss() call after SlruRecentlyUsed(), but then I
   realized it's entirely duplicate with the page_read() update done in
   SlruPhysicalReadPage(). I removed the call from SlruPhysicalReadPage()
   and renamed page_miss to page_read - that's more consistent with
   shared buffers stats, which also have buffers_hit and buffer_read.

- I've also implemented the reset. I ended up adding a new option to
   pg_stat_reset_shared, which always resets all SLRU entries. We track
   the reset timestamp for each SLRU entry, but the value is always the
   same. I admit this is a bit weird - I did it like this because (a) I'm
   not sure how to identify the individual entries and (b) the SLRU is
   shared, so pg_stat_reset_shared seems kinda natural.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments

Re: SLRU statistics

From
Tomas Vondra
Date:
Hi,

Attached is v3 of the patch with one big change and various small ones.

The main change is that it gets rid of the SlruType enum and the new
field in SlruCtlData. Instead, the patch now uses the name passed to
SimpleLruInit (which is then stored as LWLock tranche name).

The counters are still stored in a fixed-sized array, and there's a
simple name/index mapping. We don't have any registry of stable SLRU
IDs, so I can't think of anything better, and I think this is good
enough for now.

The other change is that I got rid of the io_error counter. We don't
have that for shared buffers etc. either, anyway.

I've also renamed the columns from "pages" to "blks" to make it
consistent with other similar stats (blks_hit, blks_read). I've
renamed the fields to "blks_written" and "blks_zeroed".

And finally, I've added the view to monitoring.sgml.

Barring objections, I'll get this committed in the next few days, after
reviewing the comments a bit.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments

Re: SLRU statistics

From
Tomas Vondra
Date:
Hi,

here is a bit improved version of the patch - I've been annoyed by how
the resetting works (per-entry timestamp, but resetting all entries) so
I've added a new function pg_stat_reset_slru() that allows resetting
either all entries or just one entry (identified by name). So

     SELECT pg_stat_reset_slru('clog');

resets just "clog" SLRU counters, while

     SELECT pg_stat_reset_slru(NULL);

resets all entries.
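The reset semantics can be sketched as follows: NULL wipes every entry, a
specific name wipes just that entry, and each entry keeps its own reset
timestamp (the struct layout and names here are illustrative assumptions,
not the patch's actual code):

```c
#include <string.h>
#include <time.h>

static const char *slru_names[] = {"async", "clog", "commit_timestamp",
                                   "multixact_offset", "multixact_member",
                                   "oldserxid", "subtrans", "other"};
#define SLRU_NUM_ELEMENTS 8

typedef struct SlruStatsEntry
{
    long        blks_hit;
    long        blks_read;
    time_t      stat_reset_timestamp;   /* per-entry, not global */
} SlruStatsEntry;

static SlruStatsEntry slru_stats[SLRU_NUM_ELEMENTS];

/* NULL resets all entries; a name resets just the matching one */
static void
slru_reset_sketch(const char *name, time_t now)
{
    for (int i = 0; i < SLRU_NUM_ELEMENTS; i++)
    {
        if (name == NULL || strcmp(slru_names[i], name) == 0)
        {
            memset(&slru_stats[i], 0, sizeof(slru_stats[i]));
            slru_stats[i].stat_reset_timestamp = now;
        }
    }
}
```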

I've also done a bit of benchmarking, to see if this has measurable
impact (in which case it might deserve a new GUC), and I think it's not
measurable. I've used a tiny unlogged table (single row).

     CREATE UNLOGGED TABLE t (a int);
     INSERT INTO t VALUES (1);

and then short pgbench runs with a single client, updating the row. I've
been unable to measure any regression; it's all well within 1%, so noise.
But perhaps there's some other benchmark that I should do?
But perhaps there's some other benchmark that I should do?


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments

Re: SLRU statistics

From
Tomas Vondra
Date:
Hi,

I've pushed this after some minor cleanup and improvements.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



RE: SLRU statistics

From
"Shinoda, Noriyoshi (PN Japan A&PS Delivery)"
Date:
Hi,

Thank you for developing great features.
The attached patch is a small fix to the committed documentation for the data type name of the blks_hit column.

Best regards,
Noriyoshi Shinoda

-----Original Message-----
From: Tomas Vondra [mailto:tomas.vondra@2ndquadrant.com]
Sent: Thursday, April 2, 2020 9:42 AM
To: Alvaro Herrera <alvherre@2ndquadrant.com>
Cc: Masahiko Sawada <masahiko.sawada@2ndquadrant.com>; tsunakawa.takay@fujitsu.com; pgsql-hackers@postgresql.org
Subject: Re: SLRU statistics

Hi,

I've pushed this after some minor cleanup and improvements.


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Attachments

Re: SLRU statistics

From
Tomas Vondra
Date:
On Thu, Apr 02, 2020 at 02:04:10AM +0000, Shinoda, Noriyoshi (PN Japan A&PS Delivery) wrote:
>Hi,
>
>Thank you for developing great features.
>The attached patch is a small fix to the committed documentation for the data type name of blks_hit column.
>

Thank you for the patch, pushed.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: SLRU statistics

From
Kuntal Ghosh
Date:
Hello Tomas,

On Thu, Apr 2, 2020 at 5:59 PM Tomas Vondra
<tomas.vondra@2ndquadrant.com> wrote:
> Thank you for the patch, pushed.
>
In SimpleLruReadPage_ReadOnly, we first try to find the SLRU page in
shared buffer under shared lock, then conditionally visit
SimpleLruReadPage if reading is necessary. IMHO, we should update
hit_count if we can find the buffer in SimpleLruReadPage_ReadOnly
directly. Am I missing something?

Attached a patch for the same.

-- 
Thanks & Regards,
Kuntal Ghosh
EnterpriseDB: http://www.enterprisedb.com

Attachments

Re: SLRU statistics

From
Tomas Vondra
Date:
On Tue, Apr 07, 2020 at 05:01:37PM +0530, Kuntal Ghosh wrote:
>Hello Tomas,
>
>On Thu, Apr 2, 2020 at 5:59 PM Tomas Vondra
><tomas.vondra@2ndquadrant.com> wrote:
>> Thank you for the patch, pushed.
>>
>In SimpleLruReadPage_ReadOnly, we first try to find the SLRU page in
>shared buffer under shared lock, then conditionally visit
>SimpleLruReadPage if reading is necessary. IMHO, we should update
>hit_count if we can find the buffer in SimpleLruReadPage_ReadOnly
>directly. Am I missing something?
>
>Attached a patch for the same.
>

Yes, I think that's correct - without this we fail to account for
(possibly) a quite significant number of hits. Thanks for the report,
I'll get this pushed later today.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: SLRU statistics

From
Fujii Masao
Date:

On 2020/04/02 9:41, Tomas Vondra wrote:
> Hi,
> 
> I've pushed this after some minor cleanup and improvements.

+static char *slru_names[] = {"async", "clog", "commit_timestamp",
+                              "multixact_offset", "multixact_member",
+                              "oldserxid", "pg_xact", "subtrans",
+                              "other" /* has to be last */};

When I tried pg_stat_slru, I found that it returns a row for "pg_xact".
But since there is no "pg_xact" slru ("clog" slru exists instead),
"pg_xact" should be removed? Patch attached.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments

Re: SLRU statistics

От
Tomas Vondra
Дата:
On Fri, May 01, 2020 at 03:02:59AM +0900, Fujii Masao wrote:
>
>
>On 2020/04/02 9:41, Tomas Vondra wrote:
>>Hi,
>>
>>I've pushed this after some minor cleanup and improvements.
>
>+static char *slru_names[] = {"async", "clog", "commit_timestamp",
>+                              "multixact_offset", "multixact_member",
>+                              "oldserxid", "pg_xact", "subtrans",
>+                              "other" /* has to be last */};
>
>When I tried pg_stat_slru, I found that it returns a row for "pg_xact".
>But since there is no "pg_xact" slru ("clog" slru exists instead),
>"pg_xact" should be removed? Patch attached.
>

Yeah, I think I got confused and accidentally added both "clog" and
"pg_xact". I'll get "pg_xact" removed.

>Regards,
>
>-- 
>Fujii Masao
>Advanced Computing Technology Center
>Research and Development Headquarters
>NTT DATA CORPORATION

>diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
>index 6562cc400b..ba6d8d2123 100644
>--- a/doc/src/sgml/monitoring.sgml
>+++ b/doc/src/sgml/monitoring.sgml
>@@ -3483,7 +3483,7 @@ SELECT pid, wait_event_type, wait_event FROM pg_stat_activity WHERE wait_event i
>        predefined list (<literal>async</literal>, <literal>clog</literal>,
>        <literal>commit_timestamp</literal>, <literal>multixact_offset</literal>,
>        <literal>multixact_member</literal>, <literal>oldserxid</literal>,
>-       <literal>pg_xact</literal>, <literal>subtrans</literal> and
>+       <literal>subtrans</literal> and
>        <literal>other</literal>) resets counters for only that entry.
>        Names not included in this list are treated as <literal>other</literal>.
>       </entry>
>diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
>index 50eea2e8a8..2ba3858d31 100644
>--- a/src/backend/postmaster/pgstat.c
>+++ b/src/backend/postmaster/pgstat.c
>@@ -152,7 +152,7 @@ PgStat_MsgBgWriter BgWriterStats;
>  */
> static char *slru_names[] = {"async", "clog", "commit_timestamp",
>                               "multixact_offset", "multixact_member",
>-                              "oldserxid", "pg_xact", "subtrans",
>+                              "oldserxid", "subtrans",
>                               "other" /* has to be last */};
>
> /* number of elemenents of slru_name array */


-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: SLRU statistics

From
Fujii Masao
Date:

On 2020/05/01 3:19, Tomas Vondra wrote:
> On Fri, May 01, 2020 at 03:02:59AM +0900, Fujii Masao wrote:
>>
>>
>> On 2020/04/02 9:41, Tomas Vondra wrote:
>>> Hi,
>>>
>>> I've pushed this after some minor cleanup and improvements.
>>
>> +static char *slru_names[] = {"async", "clog", "commit_timestamp",
>> +                              "multixact_offset", "multixact_member",
>> +                              "oldserxid", "pg_xact", "subtrans",
>> +                              "other" /* has to be last */};
>>
>> When I tried pg_stat_slru, I found that it returns a row for "pg_xact".
>> But since there is no "pg_xact" slru ("clog" slru exists instead),
>> "pg_xact" should be removed? Patch attached.
>>
> 
> Yeah, I think I got confused and accidentally added both "clog" and
> "pg_xact". I'll get "pg_xact" removed.

Thanks!

Another thing I found: shouldn't pgstat_send_slru() be called also by
processes other than backends? For example, since clog data is flushed
mostly by the checkpointer, the checkpointer seems to need to send SLRU
stats too. Otherwise, pg_stat_slru.flushes would never be updated.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: SLRU statistics

From
Tomas Vondra
Date:
On Fri, May 01, 2020 at 11:49:51AM +0900, Fujii Masao wrote:
>
>
>On 2020/05/01 3:19, Tomas Vondra wrote:
>>On Fri, May 01, 2020 at 03:02:59AM +0900, Fujii Masao wrote:
>>>
>>>
>>>On 2020/04/02 9:41, Tomas Vondra wrote:
>>>>Hi,
>>>>
>>>>I've pushed this after some minor cleanup and improvements.
>>>
>>>+static char *slru_names[] = {"async", "clog", "commit_timestamp",
>>>+                              "multixact_offset", "multixact_member",
>>>+                              "oldserxid", "pg_xact", "subtrans",
>>>+                              "other" /* has to be last */};
>>>
>>>When I tried pg_stat_slru, I found that it returns a row for "pg_xact".
>>>But since there is no "pg_xact" slru ("clog" slru exists instead),
>>>"pg_xact" should be removed? Patch attached.
>>>
>>
>>Yeah, I think I got confused and accidentally added both "clog" and
>>"pg_xact". I'll get "pg_xact" removed.
>
>Thanks!
>

OK, pushed. Thanks!

>Another thing I found is; pgstat_send_slru() should be called also by
>other processes than backend? For example, since clog data is flushed
>basically by checkpointer, checkpointer seems to need to send slru stats.
>Otherwise, pg_stat_slru.flushes would not be updated.
>

Hmmm, that's a good point. If I understand the issue correctly, the
checkpointer accumulates the stats but never really sends them because
it never calls pgstat_report_stat/pgstat_send_slru. That's only called
from PostgresMain, but not from CheckpointerMain.

I think we could simply add pgstat_send_slru() right after the existing
call in CheckpointerMain, right?

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: SLRU statistics

From
Fujii Masao
Date:

On 2020/05/02 9:08, Tomas Vondra wrote:
> On Fri, May 01, 2020 at 11:49:51AM +0900, Fujii Masao wrote:
>>
>>
>> On 2020/05/01 3:19, Tomas Vondra wrote:
>>> On Fri, May 01, 2020 at 03:02:59AM +0900, Fujii Masao wrote:
>>>>
>>>>
>>>> On 2020/04/02 9:41, Tomas Vondra wrote:
>>>>> Hi,
>>>>>
>>>>> I've pushed this after some minor cleanup and improvements.
>>>>
>>>> +static char *slru_names[] = {"async", "clog", "commit_timestamp",
>>>> +                              "multixact_offset", "multixact_member",
>>>> +                              "oldserxid", "pg_xact", "subtrans",
>>>> +                              "other" /* has to be last */};
>>>>
>>>> When I tried pg_stat_slru, I found that it returns a row for "pg_xact".
>>>> But since there is no "pg_xact" slru ("clog" slru exists instead),
>>>> "pg_xact" should be removed? Patch attached.
>>>>
>>>
>>> Yeah, I think I got confused and accidentally added both "clog" and
>>> "pg_xact". I'll get "pg_xact" removed.
>>
>> Thanks!
>>
> 
> OK, pushed. Thanks!

Thanks a lot!

But, as the patch attached to my previous email does, "pg_xact" should
also be removed from the description of pg_stat_reset_slru() in
monitoring.sgml.

>> Another thing I found is; pgstat_send_slru() should be called also by
>> other processes than backend? For example, since clog data is flushed
>> basically by checkpointer, checkpointer seems to need to send slru stats.
>> Otherwise, pg_stat_slru.flushes would not be updated.
>>
> 
> Hmmm, that's a good point. If I understand the issue correctly, the
> checkpointer accumulates the stats but never really sends them because
> it never calls pgstat_report_stat/pgstat_send_slru. That's only called
> from PostgresMain, but not from CheckpointerMain.

Yes.

> I think we could simply add pgstat_send_slru() right after the existing
> call in CheckpointerMain, right?

The checkpointer sends activity statistics to the stats collector in
two places, by calling pgstat_send_bgwriter(). What about calling
pgstat_send_slru() just after each pgstat_send_bgwriter() call?
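The proposed fix amounts to a one-line addition next to the existing bgwriter stats calls. The following standalone sketch mimics that control flow with stand-in senders; the flags and the loop function are illustrative stubs, while the real functions live in pgstat.c and checkpointer.c:

```c
#include <stdbool.h>

/*
 * Stand-ins for the collector messages; in PostgreSQL these are
 * pgstat_send_bgwriter() and pgstat_send_slru().  Here they just
 * record that they were invoked.
 */
static bool bgwriter_stats_sent = false;
static bool slru_stats_sent = false;

static void pgstat_send_bgwriter(void) { bgwriter_stats_sent = true; }
static void pgstat_send_slru(void)     { slru_stats_sent = true; }

/*
 * Sketch of one iteration of the checkpointer's main loop: wherever
 * bgwriter stats are shipped to the collector, ship the locally
 * accumulated SLRU stats as well, so counters such as
 * pg_stat_slru.flushes keep advancing.
 */
static void
checkpointer_loop_iteration(void)
{
    /* ... absorb fsync requests, maybe run a checkpoint ... */

    pgstat_send_bgwriter();
    pgstat_send_slru();     /* proposed addition */
}
```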

In my previous email, I mentioned the checkpointer just as an example.
So we probably need to investigate which processes, other than the
checkpointer, should send SLRU stats. I guess that at least the
autovacuum worker, the logical replication walsender, and parallel query
workers (though parallel query may already cover this with some
mechanism; sorry, I've not checked) would need to send their SLRU stats.

Atsushi-san reported another issue in pg_stat_slru.
Are you planning to work on that?
https://postgr.es/m/CACZ0uYFe16pjZxQYaTn53mspyM7dgMPYL3DJLjjPw69GMCC2Ow@mail.gmail.com

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: SLRU statistics

From
Tomas Vondra
Date:
On Sat, May 02, 2020 at 03:56:07PM +0900, Fujii Masao wrote:
>
>
>On 2020/05/02 9:08, Tomas Vondra wrote:
>>On Fri, May 01, 2020 at 11:49:51AM +0900, Fujii Masao wrote:
>>>
>>>
>>>On 2020/05/01 3:19, Tomas Vondra wrote:
>>>>On Fri, May 01, 2020 at 03:02:59AM +0900, Fujii Masao wrote:
>>>>>
>>>>>
>>>>>On 2020/04/02 9:41, Tomas Vondra wrote:
>>>>>>Hi,
>>>>>>
>>>>>>I've pushed this after some minor cleanup and improvements.
>>>>>
>>>>>+static char *slru_names[] = {"async", "clog", "commit_timestamp",
>>>>>+                              "multixact_offset", "multixact_member",
>>>>>+                              "oldserxid", "pg_xact", "subtrans",
>>>>>+                              "other" /* has to be last */};
>>>>>
>>>>>When I tried pg_stat_slru, I found that it returns a row for "pg_xact".
>>>>>But since there is no "pg_xact" slru ("clog" slru exists instead),
>>>>>"pg_xact" should be removed? Patch attached.
>>>>>
>>>>
>>>>Yeah, I think I got confused and accidentally added both "clog" and
>>>>"pg_xact". I'll get "pg_xact" removed.
>>>
>>>Thanks!
>>>
>>
>>OK, pushed. Thanks!
>
>Thanks a lot!
>
>But, like the patch that I attached in the previous email does,
>"pg_xact" should be removed from the description of pg_stat_reset_slru()
>in monitoring.sgml.
>

Whooops. My bad, will fix.

>>>Another thing I found is; pgstat_send_slru() should be called also by
>>>other processes than backend? For example, since clog data is flushed
>>>basically by checkpointer, checkpointer seems to need to send slru stats.
>>>Otherwise, pg_stat_slru.flushes would not be updated.
>>>
>>
>>Hmmm, that's a good point. If I understand the issue correctly, the
>>checkpointer accumulates the stats but never really sends them because
>>it never calls pgstat_report_stat/pgstat_send_slru. That's only called
>>from PostgresMain, but not from CheckpointerMain.
>
>Yes.
>
>>I think we could simply add pgstat_send_slru() right after the existing
>>call in CheckpointerMain, right?
>
>Checkpointer sends off activity statistics to the stats collector in
>two places, by calling pgstat_send_bgwriter(). What about calling
>pgstat_send_slru() just after pgstat_send_bgwriter()?
>

Yep, that's what I proposed.

>In previous email, I mentioned checkpointer just as an example.
>So probably we need to investigate what process should send slru stats,
>other than checkpointer. I guess that at least autovacuum worker,
>logical replication walsender and parallel query worker (maybe this has
>been already covered by parallel query some mechanisms. Sorry I've
>not checked that) would need to send its slru stats.
>

Probably. Do you have any other process type in mind?

>Atsushi-san reported another issue in pg_stat_slru.
>You're planning to work on that?
>https://postgr.es/m/CACZ0uYFe16pjZxQYaTn53mspyM7dgMPYL3DJLjjPw69GMCC2Ow@mail.gmail.com
>

Yes, I'll investigate.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: SLRU statistics

From
Tomas Vondra
Date:
On Sat, May 02, 2020 at 12:55:00PM +0200, Tomas Vondra wrote:
>On Sat, May 02, 2020 at 03:56:07PM +0900, Fujii Masao wrote:
>>
>> ...
>
>>>>Another thing I found is; pgstat_send_slru() should be called also by
>>>>other processes than backend? For example, since clog data is flushed
>>>>basically by checkpointer, checkpointer seems to need to send slru stats.
>>>>Otherwise, pg_stat_slru.flushes would not be updated.
>>>>
>>>
>>>Hmmm, that's a good point. If I understand the issue correctly, the
>>>checkpointer accumulates the stats but never really sends them because
>>>it never calls pgstat_report_stat/pgstat_send_slru. That's only called
>>>from PostgresMain, but not from CheckpointerMain.
>>
>>Yes.
>>
>>>I think we could simply add pgstat_send_slru() right after the existing
>>>call in CheckpointerMain, right?
>>
>>Checkpointer sends off activity statistics to the stats collector in
>>two places, by calling pgstat_send_bgwriter(). What about calling
>>pgstat_send_slru() just after pgstat_send_bgwriter()?
>>
>
>Yep, that's what I proposed.
>
>>In previous email, I mentioned checkpointer just as an example.
>>So probably we need to investigate what process should send slru stats,
>>other than checkpointer. I guess that at least autovacuum worker,
>>logical replication walsender and parallel query worker (maybe this has
>>been already covered by parallel query some mechanisms. Sorry I've
>>not checked that) would need to send its slru stats.
>>
>
>Probably. Do you have any other process type in mind?
>

I've looked at the places calling pgstat_send_* functions, and I found
these places:

src/backend/postmaster/bgwriter.c

- AFAIK it merely writes out dirty shared buffers, so likely irrelevant.

src/backend/postmaster/checkpointer.c

- This is what we're already discussing here.

src/backend/postmaster/pgarch.c

- Seems irrelevant.


I'm a bit puzzled why we're not sending any stats from walsender, which
I suppose could do various stuff during logical decoding.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: SLRU statistics

From
Fujii Masao
Date:

On 2020/05/03 1:59, Tomas Vondra wrote:
> On Sat, May 02, 2020 at 12:55:00PM +0200, Tomas Vondra wrote:
>> On Sat, May 02, 2020 at 03:56:07PM +0900, Fujii Masao wrote:
>>>
>>> ...
>>
>>>>> Another thing I found is; pgstat_send_slru() should be called also by
>>>>> other processes than backend? For example, since clog data is flushed
>>>>> basically by checkpointer, checkpointer seems to need to send slru stats.
>>>>> Otherwise, pg_stat_slru.flushes would not be updated.
>>>>>
>>>>
>>>> Hmmm, that's a good point. If I understand the issue correctly, the
>>>> checkpointer accumulates the stats but never really sends them because
>>>> it never calls pgstat_report_stat/pgstat_send_slru. That's only called
>>>> from PostgresMain, but not from CheckpointerMain.
>>>
>>> Yes.
>>>
>>>> I think we could simply add pgstat_send_slru() right after the existing
>>>> call in CheckpointerMain, right?
>>>
>>> Checkpointer sends off activity statistics to the stats collector in
>>> two places, by calling pgstat_send_bgwriter(). What about calling
>>> pgstat_send_slru() just after pgstat_send_bgwriter()?
>>>
>>
>> Yep, that's what I proposed.
>>
>>> In previous email, I mentioned checkpointer just as an example.
>>> So probably we need to investigate what process should send slru stats,
>>> other than checkpointer. I guess that at least autovacuum worker,
>>> logical replication walsender and parallel query worker (maybe this has
>>> been already covered by parallel query some mechanisms. Sorry I've
>>> not checked that) would need to send its slru stats.
>>>
>>
>> Probably. Do you have any other process type in mind?

No. For now, what I have in mind are just the checkpointer, autovacuum
worker, logical replication walsender, and parallel query worker. It seems
the logical replication worker and tablesync worker already send SLRU
stats via pgstat_report_stat().

> I've looked at places calling pgstat_send_* functions, and I found
> these places:
> 
> src/backend/postmaster/bgwriter.c
> 
> - AFAIK it merely writes out dirty shared buffers, so likely irrelevant.
> 
> src/backend/postmaster/checkpointer.c
> 
> - This is what we're already discussing here.
> 
> src/backend/postmaster/pgarch.c
> 
> - Seems irrelevant.
> 
> 
> I'm a bit puzzled why we're not sending any stats from walsender, which
> I suppose could do various stuff during logical decoding.

Not sure why, but that seems an oversight...


Also, I found another minor issue: SLRUStats is not initialized to 0,
which could cause the counters to be updated unexpectedly. The attached
patch fixes this issue.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments

Re: SLRU statistics

From
Fujii Masao
Date:

On 2020/05/07 13:47, Fujii Masao wrote:
> 
> 
> On 2020/05/03 1:59, Tomas Vondra wrote:
>> On Sat, May 02, 2020 at 12:55:00PM +0200, Tomas Vondra wrote:
>>> On Sat, May 02, 2020 at 03:56:07PM +0900, Fujii Masao wrote:
>>>>
>>>> ...
>>>
>>>>>> Another thing I found is; pgstat_send_slru() should be called also by
>>>>>> other processes than backend? For example, since clog data is flushed
>>>>>> basically by checkpointer, checkpointer seems to need to send slru stats.
>>>>>> Otherwise, pg_stat_slru.flushes would not be updated.
>>>>>>
>>>>>
>>>>> Hmmm, that's a good point. If I understand the issue correctly, the
>>>>> checkpointer accumulates the stats but never really sends them because
>>>>> it never calls pgstat_report_stat/pgstat_send_slru. That's only called
>>>>> from PostgresMain, but not from CheckpointerMain.
>>>>
>>>> Yes.
>>>>
>>>>> I think we could simply add pgstat_send_slru() right after the existing
>>>>> call in CheckpointerMain, right?
>>>>
>>>> Checkpointer sends off activity statistics to the stats collector in
>>>> two places, by calling pgstat_send_bgwriter(). What about calling
>>>> pgstat_send_slru() just after pgstat_send_bgwriter()?
>>>>
>>>
>>> Yep, that's what I proposed.
>>>
>>>> In previous email, I mentioned checkpointer just as an example.
>>>> So probably we need to investigate what process should send slru stats,
>>>> other than checkpointer. I guess that at least autovacuum worker,
>>>> logical replication walsender and parallel query worker (maybe this has
>>>> been already covered by parallel query some mechanisms. Sorry I've
>>>> not checked that) would need to send its slru stats.
>>>>
>>>
>>> Probably. Do you have any other process type in mind?
> 
> No. For now what I'm in mind are just checkpointer, autovacuum worker,
> logical replication walsender and parallel query worker. Seems logical
> replication worker and syncer have sent slru stats via pgstat_report_stat().
> 
>> I've looked at places calling pgstat_send_* functions, and I found
>> thsese places:
>>
>> src/backend/postmaster/bgwriter.c
>>
>> - AFAIK it merely writes out dirty shared buffers, so likely irrelevant.
>>
>> src/backend/postmaster/checkpointer.c
>>
>> - This is what we're already discussing here.
>>
>> src/backend/postmaster/pgarch.c
>>
>> - Seems irrelevant.
>>
>>
>> I'm a bit puzzled why we're not sending any stats from walsender, which
>> I suppose could do various stuff during logical decoding.
> 
> Not sure why, but that seems an oversight...
> 
> 
> Also I found another minor issue; SLRUStats has not been initialized to 0
> and which could update the counters unexpectedly. Attached patch fixes
> this issue.

This is a minor issue, but it's better to fix it before the v13 beta1
release. So, barring any objection, I will commit the patch.

+        values[8] = Int64GetDatum(stat.stat_reset_timestamp);

Also, I found another small issue: pg_stat_get_slru() returns the
timestamp at which pg_stat_slru was reset using Int64GetDatum(). This
happens to work because the timestamp is also an int64, but
TimestampTzGetDatum() should be used here instead. Patch attached.
Thoughts?
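To illustrate why the wrong macro "happens to work": TimestampTz is a 64-bit microsecond count, so both conversions produce the same bits, but the type-specific macro states intent and stays correct if the representation ever changes. The typedefs and conversion functions below are simplified stand-ins for the real definitions (which live in PostgreSQL's headers), assumed here for demonstration only:

```c
#include <stdint.h>

/* Simplified stand-ins for the PostgreSQL typedefs; the real Datum is
 * pointer-sized and TimestampTz counts microseconds since 2000-01-01. */
typedef int64_t Datum;
typedef int64_t TimestampTz;

/* Stand-in conversion routines modeled on the Datum macros. */
static Datum Int64GetDatum(int64_t x)            { return (Datum) x; }
static Datum TimestampTzGetDatum(TimestampTz x)  { return (Datum) x; }
```

On this representation the two conversions are bitwise identical, which is why the bug was invisible at runtime; the fix is purely about using the semantically correct macro.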

Regards,


-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION

Attachments

Re: SLRU statistics

From
Tomas Vondra
Date:
On Wed, May 13, 2020 at 04:10:30PM +0900, Fujii Masao wrote:
>
>
>On 2020/05/07 13:47, Fujii Masao wrote:
>>
>>
>>On 2020/05/03 1:59, Tomas Vondra wrote:
>>>On Sat, May 02, 2020 at 12:55:00PM +0200, Tomas Vondra wrote:
>>>>On Sat, May 02, 2020 at 03:56:07PM +0900, Fujii Masao wrote:
>>>>>
>>>>>...
>>>>
>>>>>>>Another thing I found is; pgstat_send_slru() should be called also by
>>>>>>>other processes than backend? For example, since clog data is flushed
>>>>>>>basically by checkpointer, checkpointer seems to need to send slru stats.
>>>>>>>Otherwise, pg_stat_slru.flushes would not be updated.
>>>>>>>
>>>>>>
>>>>>>Hmmm, that's a good point. If I understand the issue correctly, the
>>>>>>checkpointer accumulates the stats but never really sends them because
>>>>>>it never calls pgstat_report_stat/pgstat_send_slru. That's only called
>>>>>>from PostgresMain, but not from CheckpointerMain.
>>>>>
>>>>>Yes.
>>>>>
>>>>>>I think we could simply add pgstat_send_slru() right after the existing
>>>>>>call in CheckpointerMain, right?
>>>>>
>>>>>Checkpointer sends off activity statistics to the stats collector in
>>>>>two places, by calling pgstat_send_bgwriter(). What about calling
>>>>>pgstat_send_slru() just after pgstat_send_bgwriter()?
>>>>>
>>>>
>>>>Yep, that's what I proposed.
>>>>
>>>>>In previous email, I mentioned checkpointer just as an example.
>>>>>So probably we need to investigate what process should send slru stats,
>>>>>other than checkpointer. I guess that at least autovacuum worker,
>>>>>logical replication walsender and parallel query worker (maybe this has
>>>>>been already covered by parallel query some mechanisms. Sorry I've
>>>>>not checked that) would need to send its slru stats.
>>>>>
>>>>
>>>>Probably. Do you have any other process type in mind?
>>
>>No. For now what I'm in mind are just checkpointer, autovacuum worker,
>>logical replication walsender and parallel query worker. Seems logical
>>replication worker and syncer have sent slru stats via pgstat_report_stat().
>>
>>>I've looked at places calling pgstat_send_* functions, and I found
>>>these places:
>>>
>>>src/backend/postmaster/bgwriter.c
>>>
>>>- AFAIK it merely writes out dirty shared buffers, so likely irrelevant.
>>>
>>>src/backend/postmaster/checkpointer.c
>>>
>>>- This is what we're already discussing here.
>>>
>>>src/backend/postmaster/pgarch.c
>>>
>>>- Seems irrelevant.
>>>
>>>
>>>I'm a bit puzzled why we're not sending any stats from walsender, which
>>>I suppose could do various stuff during logical decoding.
>>
>>Not sure why, but that seems an oversight...
>>
>>
>>Also I found another minor issue; SLRUStats has not been initialized to 0
>>and which could update the counters unexpectedly. Attached patch fixes
>>this issue.
>
>This is minor issue, but basically it's better to fix that before
>v13 beta1 release. So barring any objection, I will commit the patch.
>
>+        values[8] = Int64GetDatum(stat.stat_reset_timestamp);
>
>Also I found another small issue: pg_stat_get_slru() returns the timestamp
>when pg_stat_slru was reset by using Int64GetDatum(). This works maybe
>because the timestamp is also int64. But TimestampTzGetDatum() should
>be used here, instead. Patch attached. Thought?
>

I agree with both fixes.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



Re: SLRU statistics

From
Fujii Masao
Date:

On 2020/05/13 17:21, Tomas Vondra wrote:
> On Wed, May 13, 2020 at 04:10:30PM +0900, Fujii Masao wrote:
>>
>>
>> On 2020/05/07 13:47, Fujii Masao wrote:
>>>
>>>
>>> On 2020/05/03 1:59, Tomas Vondra wrote:
>>>> On Sat, May 02, 2020 at 12:55:00PM +0200, Tomas Vondra wrote:
>>>>> On Sat, May 02, 2020 at 03:56:07PM +0900, Fujii Masao wrote:
>>>>>>
>>>>>> ...
>>>>>
>>>>>>>> Another thing I found is; pgstat_send_slru() should be called also by
>>>>>>>> other processes than backend? For example, since clog data is flushed
>>>>>>>> basically by checkpointer, checkpointer seems to need to send slru stats.
>>>>>>>> Otherwise, pg_stat_slru.flushes would not be updated.
>>>>>>>>
>>>>>>>
>>>>>>> Hmmm, that's a good point. If I understand the issue correctly, the
>>>>>>> checkpointer accumulates the stats but never really sends them because
>>>>>>> it never calls pgstat_report_stat/pgstat_send_slru. That's only called
>>>>>>> from PostgresMain, but not from CheckpointerMain.
>>>>>>
>>>>>> Yes.
>>>>>>
>>>>>>> I think we could simply add pgstat_send_slru() right after the existing
>>>>>>> call in CheckpointerMain, right?
>>>>>>
>>>>>> Checkpointer sends off activity statistics to the stats collector in
>>>>>> two places, by calling pgstat_send_bgwriter(). What about calling
>>>>>> pgstat_send_slru() just after pgstat_send_bgwriter()?
>>>>>>
>>>>>
>>>>> Yep, that's what I proposed.
>>>>>
>>>>>> In previous email, I mentioned checkpointer just as an example.
>>>>>> So probably we need to investigate what process should send slru stats,
>>>>>> other than checkpointer. I guess that at least autovacuum worker,
>>>>>> logical replication walsender and parallel query worker (maybe this has
>>>>>> been already covered by parallel query some mechanisms. Sorry I've
>>>>>> not checked that) would need to send its slru stats.
>>>>>>
>>>>>
>>>>> Probably. Do you have any other process type in mind?
>>>
>>> No. For now what I'm in mind are just checkpointer, autovacuum worker,
>>> logical replication walsender and parallel query worker. Seems logical
>>> replication worker and syncer have sent slru stats via pgstat_report_stat().
>>>
>>>> I've looked at places calling pgstat_send_* functions, and I found
>>>> these places:
>>>>
>>>> src/backend/postmaster/bgwriter.c
>>>>
>>>> - AFAIK it merely writes out dirty shared buffers, so likely irrelevant.
>>>>
>>>> src/backend/postmaster/checkpointer.c
>>>>
>>>> - This is what we're already discussing here.
>>>>
>>>> src/backend/postmaster/pgarch.c
>>>>
>>>> - Seems irrelevant.
>>>>
>>>>
>>>> I'm a bit puzzled why we're not sending any stats from walsender, which
>>>> I suppose could do various stuff during logical decoding.
>>>
>>> Not sure why, but that seems an oversight...
>>>
>>>
>>> Also I found another minor issue; SLRUStats has not been initialized to 0
>>> and which could update the counters unexpectedly. Attached patch fixes
>>> this issue.
>>
>> This is minor issue, but basically it's better to fix that before
>> v13 beta1 release. So barring any objection, I will commit the patch.
>>
>> +        values[8] = Int64GetDatum(stat.stat_reset_timestamp);
>>
>> Also I found another small issue: pg_stat_get_slru() returns the timestamp
>> when pg_stat_slru was reset by using Int64GetDatum(). This works maybe
>> because the timestamp is also int64. But TimestampTzGetDatum() should
>> be used here, instead. Patch attached. Thought?
>>
> 
> I agree with both fixes.

Pushed both. Thanks!

Regards,


-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: SLRU statistics

From
Tom Lane
Date:
Fujii Masao <masao.fujii@oss.nttdata.com> writes:
> On 2020/05/13 17:21, Tomas Vondra wrote:
>> On Wed, May 13, 2020 at 04:10:30PM +0900, Fujii Masao wrote:
>>> Also I found another minor issue; SLRUStats has not been initialized to 0
>>> and which could update the counters unexpectedly. Attached patch fixes
>>> this issue.

> Pushed both. Thanks!

Why is that necessary?  A static variable is defined by C to start off
as zeroes.

            regards, tom lane



Re: SLRU statistics

From
Fujii Masao
Date:

On 2020/05/13 23:26, Tom Lane wrote:
> Fujii Masao <masao.fujii@oss.nttdata.com> writes:
>> On 2020/05/13 17:21, Tomas Vondra wrote:
>>> On Wed, May 13, 2020 at 04:10:30PM +0900, Fujii Masao wrote:
>>>> Also I found another minor issue; SLRUStats has not been initialized to 0
>>>> and which could update the counters unexpectedly. Attached patch fixes
>>>> this issue.
> 
>> Pushed both. Thanks!
> 
> Why is that necessary?  A static variable is defined by C to start off
> as zeroes.

Because SLRUStats is not a static variable. No?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: SLRU statistics

From
Tomas Vondra
Date:
On Wed, May 13, 2020 at 10:26:39AM -0400, Tom Lane wrote:
>Fujii Masao <masao.fujii@oss.nttdata.com> writes:
>> On 2020/05/13 17:21, Tomas Vondra wrote:
>>> On Wed, May 13, 2020 at 04:10:30PM +0900, Fujii Masao wrote:
>>>> Also I found another minor issue; SLRUStats has not been initialized to 0
>>>> and which could update the counters unexpectedly. Attached patch fixes
>>>> this issue.
>
>> Pushed both. Thanks!
>
>Why is that necessary?  A static variable is defined by C to start off
>as zeroes.
>

But is it a static variable? It's not declared as 'static' but maybe we
can assume it inits to zeroes anyway? I see we do that for
BgWriterStats.



regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: SLRU statistics

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> On Wed, May 13, 2020 at 10:26:39AM -0400, Tom Lane wrote:
>> Why is that necessary?  A static variable is defined by C to start off
>> as zeroes.

> But is it a static variable? It's not declared as 'static' but maybe we
> can assume it inits to zeroes anyway? I see we do that for
> BgWriterStats.

Sorry, by "static" I meant "statically allocated", not "private to
this module".  I'm sure the C standard has some more precise terminology
for this distinction, but I forget what it is.

            regards, tom lane



Re: SLRU statistics

From
Tomas Vondra
Date:
On Wed, May 13, 2020 at 11:01:47AM -0400, Tom Lane wrote:
>Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> On Wed, May 13, 2020 at 10:26:39AM -0400, Tom Lane wrote:
>>> Why is that necessary?  A static variable is defined by C to start off
>>> as zeroes.
>
>> But is it a static variable? It's not declared as 'static' but maybe we
>> can assume it inits to zeroes anyway? I see we do that for
>> BgWriterStats.
>
>Sorry, by "static" I meant "statically allocated", not "private to
>this module".  I'm sure the C standard has some more precise terminology
>for this distinction, but I forget what it is.
>

Ah, I see. I'm no expert at reading the C standard (or any other
standard), but a quick Google search yielded this section of the C99
standard:

-------------------------------------------------------------------------
If an object that has static storage duration is not initialized
explicitly, then:

- if it has pointer type, it is initialized to a null pointer;

- if it has arithmetic type, it is initialized to (positive or unsigned)
   zero;

- if it is an aggregate, every member is initialized (recursively)
   according to these rules;

- if it is a union, the first named member is initialized (recursively)
   according to these rules
-------------------------------------------------------------------------

I assume the SLRU variable counts as an aggregate with members of
arithmetic type, in which case it really is initialized to 0.
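The rule is easy to demonstrate: a file-scope aggregate with no initializer has static storage duration and starts out all-zero, which is why BgWriterStats has worked for years without a memset. A minimal self-contained check (the struct below is a stand-in invented for illustration, not the real SLRU stats struct):

```c
#include <stdint.h>

/*
 * Stand-in for an SLRU counter struct.  Any object declared at file
 * scope has static storage duration, so C99 6.7.8 para 10 guarantees
 * every member (recursively) starts at zero without an initializer.
 */
typedef struct SlruCounters
{
    int64_t blocks_zeroed;
    int64_t blocks_hit;
    int64_t blocks_read;
    int64_t flushes;
} SlruCounters;

static SlruCounters counters[8];    /* no memset needed */
```

The same guarantee holds whether or not the variable also carries the `static` keyword; the keyword only affects linkage, not storage duration, which is the distinction being discussed in this subthread.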

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: SLRU statistics

From
Tomas Vondra
Date:
On Wed, May 13, 2020 at 11:46:30PM +0900, Fujii Masao wrote:
>
>
>On 2020/05/13 23:26, Tom Lane wrote:
>>Fujii Masao <masao.fujii@oss.nttdata.com> writes:
>>>On 2020/05/13 17:21, Tomas Vondra wrote:
>>>>On Wed, May 13, 2020 at 04:10:30PM +0900, Fujii Masao wrote:
>>>>>Also I found another minor issue; SLRUStats has not been initialized to 0
>>>>>and which could update the counters unexpectedly. Attached patch fixes
>>>>>this issue.
>>
>>>Pushed both. Thanks!
>>
>>Why is that necessary?  A static variable is defined by C to start off
>>as zeroes.
>
>Because SLRUStats is not a static variable. No?
>

I think it counts as a variable with "static storage duration" per 6.7.8
(para 10), see [1]. I wasn't aware of this either, but it probably means
the memset is unnecessary.

Also, it seems a bit strange/confusing to handle this differently from
BgWriterStats. And that worked fine without the init for years ...


[1] http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: SLRU statistics

From
Tom Lane
Date:
Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
> I think it counts as a variable with "static storage duration" per 6.7.8
> (para 10), see [1]. I wasn't aware of this either, but it probably means
> the memset is unnecessary.
> Also, it seems a bit strange/confusing to handle this differently from
> BgWriterStats. And that worked fine without the init for years ...

Yeah, exactly.

There might be merit in memsetting it if we thought that it could have
become nonzero in the postmaster during a previous shmem cycle-of-life.
But the postmaster really shouldn't be accumulating such counts; and
if it is, then we have a bigger problem, because child processes would
be inheriting those counts via fork.

I think this change is unnecessary and should be reverted to avoid
future confusion about whether somehow it is necessary.

            regards, tom lane



Re: SLRU statistics

From
Fujii Masao
Date:

On 2020/05/14 0:38, Tom Lane wrote:
> Tomas Vondra <tomas.vondra@2ndquadrant.com> writes:
>> I think it counts as a variable with "static storage duration" per 6.7.8
>> (para 10), see [1]. I wasn't aware of this either, but it probably means
>> the memset is unnecessary.
>> Also, it seems a bit strange/confusing to handle this differently from
>> BgWriterStats. And that worked fine without the init for years ...
> 
> Yeah, exactly.
> 
> There might be merit in memsetting it if we thought that it could have
> become nonzero in the postmaster during a previous shmem cycle-of-life.
> But the postmaster really shouldn't be accumulating such counts; and
> if it is, then we have a bigger problem, because child processes would
> be inheriting those counts via fork.

In my previous test, I thought I observed that the counters were already
updated at the beginning of some processes, so I thought the counters
needed to be initialized. Sorry, that's my fault...

So I tried a similar test again and found that the postmaster does seem
to be able to increment the counters, unless I'm missing something.
For example:

     frame #2: 0x000000010d93845f postgres`pgstat_count_slru_page_zeroed(ctl=0x000000010de27320) at pgstat.c:6739:2
     frame #3: 0x000000010d5922ba postgres`SimpleLruZeroPage(ctl=0x000000010de27320, pageno=0) at slru.c:290:2
     frame #4: 0x000000010d6b9ae2 postgres`AsyncShmemInit at async.c:568:12
     frame #5: 0x000000010d9da9a6 postgres`CreateSharedMemoryAndSemaphores at ipci.c:265:2
     frame #6: 0x000000010d93f679 postgres`reset_shared at postmaster.c:2664:2
     frame #7: 0x000000010d93d253 postgres`PostmasterMain(argc=3, argv=0x00007fad56402e00) at postmaster.c:1008:2

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: SLRU statistics

From
Alvaro Herrera
Date:
On 2020-May-14, Fujii Masao wrote:

> So I tried a similar test again and found that the postmaster does seem
> to be able to increment the counters, unless I'm missing something.
> For example,
> 
>     frame #2: 0x000000010d93845f postgres`pgstat_count_slru_page_zeroed(ctl=0x000000010de27320) at pgstat.c:6739:2
>     frame #3: 0x000000010d5922ba postgres`SimpleLruZeroPage(ctl=0x000000010de27320, pageno=0) at slru.c:290:2
>     frame #4: 0x000000010d6b9ae2 postgres`AsyncShmemInit at async.c:568:12
>     frame #5: 0x000000010d9da9a6 postgres`CreateSharedMemoryAndSemaphores at ipci.c:265:2
>     frame #6: 0x000000010d93f679 postgres`reset_shared at postmaster.c:2664:2
>     frame #7: 0x000000010d93d253 postgres`PostmasterMain(argc=3, argv=0x00007fad56402e00) at postmaster.c:1008:2

Umm.  I have the feeling that we'd rather avoid these updates in
postmaster, per our general rule that postmaster should not touch shared
memory.  However, it might be that it's okay in this case, as it only
happens just as shmem is being "created", so other processes have not
yet had any time to mess things up.  (IIRC only the Async module is
doing that.)

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: SLRU statistics

From
Fujii Masao
Date:

On 2020/05/14 1:14, Alvaro Herrera wrote:
> On 2020-May-14, Fujii Masao wrote:
> 
>> So I tried a similar test again and found that the postmaster does seem
>> to be able to increment the counters, unless I'm missing something.
>> For example,
>>
>>      frame #2: 0x000000010d93845f postgres`pgstat_count_slru_page_zeroed(ctl=0x000000010de27320) at pgstat.c:6739:2
>>      frame #3: 0x000000010d5922ba postgres`SimpleLruZeroPage(ctl=0x000000010de27320, pageno=0) at slru.c:290:2
>>      frame #4: 0x000000010d6b9ae2 postgres`AsyncShmemInit at async.c:568:12
>>      frame #5: 0x000000010d9da9a6 postgres`CreateSharedMemoryAndSemaphores at ipci.c:265:2
>>      frame #6: 0x000000010d93f679 postgres`reset_shared at postmaster.c:2664:2
>>      frame #7: 0x000000010d93d253 postgres`PostmasterMain(argc=3, argv=0x00007fad56402e00) at postmaster.c:1008:2
> 
> Umm.  I have the feeling that we'd rather avoid these updates in
> postmaster, per our general rule that postmaster should not touch shared
> memory.  However, it might be that it's okay in this case, as it only
> happens just as shmem is being "created", so other processes have not
> yet had any time to mess things up.

But since a counter incremented by the postmaster is propagated to
child processes via fork, shouldn't it be zeroed in the postmaster or at
the beginning of each child process? Otherwise that counter always
starts out non-zero in child processes.

> (IIRC only the Async module is
> doing that.)

Yes, as far as my testing shows.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: SLRU statistics

From
Tom Lane
Date:
Fujii Masao <masao.fujii@oss.nttdata.com> writes:
> But since a counter incremented by the postmaster is propagated to
> child processes via fork, shouldn't it be zeroed in the postmaster or at
> the beginning of each child process? Otherwise that counter always
> starts out non-zero in child processes.

Yes, if the postmaster is incrementing these counts then we would
have to reset them at the start of each child process.  I share
Alvaro's feeling that that's bad and we don't want to do it.

>> (IIRC only the Async module is doing that.)

Hm, maybe we can fix that.

            regards, tom lane



Re: SLRU statistics

From
Tom Lane
Date:
I wrote:
>>> (IIRC only the Async module is doing that.)

> Hm, maybe we can fix that.

Yeah, it's quite easy to make async.c postpone its first write to the
async SLRU.  This seems like a win all around, because many installations
don't use NOTIFY and so will never need to do that work at all.  In
installations that do use notify, this costs an extra instruction or
two per NOTIFY, but that's down in the noise.

I got through check-world with the assertion shown that we are not
counting any SLRU operations in the postmaster.  Don't know if we
want to commit that or not --- any thoughts?

            regards, tom lane

diff --git a/src/backend/commands/async.c b/src/backend/commands/async.c
index 0c9d20e..6ecea01 100644
--- a/src/backend/commands/async.c
+++ b/src/backend/commands/async.c
@@ -200,7 +200,10 @@ typedef struct QueuePosition
     } while (0)
 
 #define QUEUE_POS_EQUAL(x,y) \
-     ((x).page == (y).page && (x).offset == (y).offset)
+    ((x).page == (y).page && (x).offset == (y).offset)
+
+#define QUEUE_POS_IS_ZERO(x) \
+    ((x).page == 0 && (x).offset == 0)
 
 /* choose logically smaller QueuePosition */
 #define QUEUE_POS_MIN(x,y) \
@@ -515,7 +518,6 @@ void
 AsyncShmemInit(void)
 {
     bool        found;
-    int            slotno;
     Size        size;
 
     /*
@@ -562,13 +564,6 @@ AsyncShmemInit(void)
          * During start or reboot, clean out the pg_notify directory.
          */
         (void) SlruScanDirectory(AsyncCtl, SlruScanDirCbDeleteAll, NULL);
-
-        /* Now initialize page zero to empty */
-        LWLockAcquire(AsyncCtlLock, LW_EXCLUSIVE);
-        slotno = SimpleLruZeroPage(AsyncCtl, QUEUE_POS_PAGE(QUEUE_HEAD));
-        /* This write is just to verify that pg_notify/ is writable */
-        SimpleLruWritePage(AsyncCtl, slotno);
-        LWLockRelease(AsyncCtlLock);
     }
 }
 
@@ -1470,9 +1465,18 @@ asyncQueueAddEntries(ListCell *nextNotify)
      */
     queue_head = QUEUE_HEAD;
 
-    /* Fetch the current page */
+    /*
+     * If this is the first write since the postmaster started, we need to
+     * initialize the first page of the async SLRU.  Otherwise, the current
+     * page should be initialized already, so just fetch it.
+     */
     pageno = QUEUE_POS_PAGE(queue_head);
-    slotno = SimpleLruReadPage(AsyncCtl, pageno, true, InvalidTransactionId);
+    if (QUEUE_POS_IS_ZERO(queue_head))
+        slotno = SimpleLruZeroPage(AsyncCtl, pageno);
+    else
+        slotno = SimpleLruReadPage(AsyncCtl, pageno, true,
+                                   InvalidTransactionId);
+
     /* Note we mark the page dirty before writing in it */
     AsyncCtl->shared->page_dirty[slotno] = true;
 
diff --git a/src/backend/postmaster/pgstat.c b/src/backend/postmaster/pgstat.c
index 80a06e5..e3c808b 100644
--- a/src/backend/postmaster/pgstat.c
+++ b/src/backend/postmaster/pgstat.c
@@ -6729,6 +6729,8 @@ slru_entry(int slru_idx)
 {
     Assert((slru_idx >= 0) && (slru_idx < SLRU_NUM_ELEMENTS));
 
+    Assert(IsUnderPostmaster || !IsPostmasterEnvironment);
+
     return &SLRUStats[slru_idx];
 }


Re: SLRU statistics

From
Ranier Vilela
Date:
On Wed, May 13, 2020 at 11:46 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:


On 2020/05/13 23:26, Tom Lane wrote:
> Fujii Masao <masao.fujii@oss.nttdata.com> writes:
>> On 2020/05/13 17:21, Tomas Vondra wrote:
>>> On Wed, May 13, 2020 at 04:10:30PM +0900, Fujii Masao wrote:
>>>> Also I found another minor issue: SLRUStats has not been initialized to 0,
>>>> which could cause the counters to be updated unexpectedly. The attached
>>>> patch fixes this issue.
>
>> Pushed both. Thanks!
>
> Why is that necessary?  A static variable is defined by C to start off
> as zeroes.

Because SLRUStats is not a static variable. No?
IMHO, BgWriterStats has the same problem; shouldn't the same be done?

/* Initialize BgWriterStats to zero */
MemSet(&BgWriterStats, 0, sizeof(BgWriterStats));

/* Initialize SLRU statistics to zero */
memset(&SLRUStats, 0, sizeof(SLRUStats));

regards,
Ranier Vilela

Re: SLRU statistics

From
Fujii Masao
Date:

On 2020/05/14 2:44, Tom Lane wrote:
> I wrote:
>>>> (IIRC only the Async module is doing that.)
> 
>> Hm, maybe we can fix that.
> 
> Yeah, it's quite easy to make async.c postpone its first write to the
> async SLRU.  This seems like a win all around, because many installations
> don't use NOTIFY and so will never need to do that work at all.  In
> installations that do use notify, this costs an extra instruction or
> two per NOTIFY, but that's down in the noise.

Looks good to me. Thanks for the patch!

> I got through check-world with the assertion shown that we are not
> counting any SLRU operations in the postmaster.  Don't know if we
> want to commit that or not --- any thoughts?

+1 for adding this assertion, because it's basically not a good thing to
access the SLRU in the postmaster, and we may want to fix any place
where that's found to happen. At least if we get rid of the SLRUStats
initialization code, IMO it's better to add this assertion to ensure
that the postmaster doesn't update the SLRU stats counters.

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: SLRU statistics

From
Tom Lane
Date:
Fujii Masao <masao.fujii@oss.nttdata.com> writes:
> On 2020/05/14 2:44, Tom Lane wrote:
>> I got through check-world with the assertion shown that we are not
>> counting any SLRU operations in the postmaster.  Don't know if we
>> want to commit that or not --- any thoughts?

> +1 for adding this assertion, because it's basically not a good thing to
> access the SLRU in the postmaster, and we may want to fix any place
> where that's found to happen. At least if we get rid of the SLRUStats
> initialization code, IMO it's better to add this assertion to ensure
> that the postmaster doesn't update the SLRU stats counters.

Seems reasonable --- I'll include it.

It might be nice to have similar assertions protecting BgWriterStats.
But given that we've made that public to be hacked on directly by several
different modules, I'm not sure that there's any simple way to do that.

            regards, tom lane



Re: SLRU statistics

From
Fujii Masao
Date:

On 2020/05/07 13:47, Fujii Masao wrote:
> 
> 
> On 2020/05/03 1:59, Tomas Vondra wrote:
>> On Sat, May 02, 2020 at 12:55:00PM +0200, Tomas Vondra wrote:
>>> On Sat, May 02, 2020 at 03:56:07PM +0900, Fujii Masao wrote:
>>>>
>>>> ...
>>>
>>>>>> Another thing I found is; pgstat_send_slru() should be called also by
>>>>>> other processes than backend? For example, since clog data is flushed
>>>>>> basically by checkpointer, checkpointer seems to need to send slru stats.
>>>>>> Otherwise, pg_stat_slru.flushes would not be updated.
>>>>>>
>>>>>
>>>>> Hmmm, that's a good point. If I understand the issue correctly, the
>>>>> checkpointer accumulates the stats but never really sends them because
>>>>> it never calls pgstat_report_stat/pgstat_send_slru. That's only called
>>>>> from PostgresMain, but not from CheckpointerMain.
>>>>
>>>> Yes.
>>>>
>>>>> I think we could simply add pgstat_send_slru() right after the existing
>>>>> call in CheckpointerMain, right?
>>>>
>>>> Checkpointer sends off activity statistics to the stats collector in
>>>> two places, by calling pgstat_send_bgwriter(). What about calling
>>>> pgstat_send_slru() just after pgstat_send_bgwriter()?
>>>>
>>>
>>> Yep, that's what I proposed.
>>>
>>>> In previous email, I mentioned checkpointer just as an example.
>>>> So probably we need to investigate what process should send slru stats,
>>>> other than checkpointer. I guess that at least autovacuum worker,
>>>> logical replication walsender and parallel query worker (maybe this has
>>>> been already covered by parallel query some mechanisms. Sorry I've
>>>> not checked that) would need to send its slru stats.
>>>>
>>>
>>> Probably. Do you have any other process type in mind?
> 
> No. For now what I have in mind are just the checkpointer, autovacuum worker,
> logical replication walsender, and parallel query worker. It seems the logical
> replication worker and syncer already send SLRU stats via pgstat_report_stat().

Let me go back to this topic. Reading the code again, the logical
walsender reports its stats at exit via the pgstat_beshutdown_hook()
process-exit callback, but it doesn't report them while it's running.
This is not a problem only for SLRU stats; we would need to consider
how to handle the stats of the logical walsender separately from SLRU stats.

The autovacuum worker reports its stats at exit via pgstat_beshutdown_hook(),
too. Unlike the logical walsender, the autovacuum worker is not a process
that keeps running for the lifetime of the service; it exits after it finishes
a vacuum or analyze. So it's acceptable to report the stats only at exit in
the autovacuum worker case, and there is no need to add extra code for SLRU
stats reporting by the autovacuum worker.

A parallel worker is in the same situation as an autovacuum worker: its
lifetime is basically short, and its stats are reported at exit via
pgstat_beshutdown_hook().

pgstat_beshutdown_hook() reports the stats only when MyDatabaseId is valid.
The checkpointer calls pgstat_beshutdown_hook() at exit, but doesn't report
the stats because its MyDatabaseId is invalid. It also doesn't report the SLRU
stats while it's running. As we discussed upthread, we need to make the
checkpointer call pgstat_send_slru() just after pgstat_send_bgwriter().

However, even if we do this, the stats updated during the checkpointer's
last activity (e.g., the shutdown checkpoint) seem not to be reported,
because pgstat_beshutdown_hook() doesn't report the stats in the
checkpointer case. Do we need to address this issue? If yes, we would need
to change pgstat_beshutdown_hook() or register another checkpointer-exit
callback that sends the stats. Thoughts?

The startup process is in the same situation as the checkpointer: it reports
the stats neither at exit nor while it's running. But, as with the logical
walsender, this seems not to be a problem only for SLRU stats; we would need
to consider how to handle the stats of the startup process separately from
SLRU stats.

Therefore what we can do right now seems to be to make the checkpointer
report the SLRU stats while it's running. The other issues need more time
to investigate... Thoughts?

Regards,

-- 
Fujii Masao
Advanced Computing Technology Center
Research and Development Headquarters
NTT DATA CORPORATION



Re: SLRU statistics

From
Robert Haas
Date:
On Thu, May 14, 2020 at 2:27 AM Fujii Masao <masao.fujii@oss.nttdata.com> wrote:
> Therefore what we can do right now seems to be to make the checkpointer
> report the SLRU stats while it's running. The other issues need more time
> to investigate... Thoughts?

I'm confused by why SLRU statistics are reported by messages sent to
the stats collector rather than by just directly updating shared
memory. For database or table statistics there can be any number of
objects and we can't know in advance how many there will be, so we
can't set aside shared memory for the stats in advance. For SLRUs,
there's no such problem. Just having the individual backends
periodically merge their accumulated backend-local counters into the
shared counters seems like it would be way simpler and more
performant.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: SLRU statistics

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I'm confused by why SLRU statistics are reported by messages sent to
> the stats collector rather than by just directly updating shared
> memory.

It would be better to consider that as an aspect of the WIP stats
collector redesign, rather than inventing a bespoke mechanism for
SLRU stats that's outside the stats collector (and, no doubt,
would have its own set of bugs).  We don't need to invent even more
pathways for this sort of data.

            regards, tom lane