Обсуждение: Comments on Custom RMGRs

Поиск

Список

Период

Сортировка

Comments on Custom RMGRs

От

Simon Riggs

Дата:

11 мая 2022 г., 14:24:51

New rmgr stuff looks interesting. I've had a detailed look through it
and tried to think about how it might be used in practice.

Spotted a minor comment that needs adjustment for new methods...
[PATCH: rmgr_001.v1.patch]

I notice rm_startup() and rm_cleanup() presume that this only works in
recovery. If recovery is "not needed", there is no way to run anything
at all, which seems wrong because how do we know that? I would prefer
it if rm_startup() and rm_cleanup() were executed in all cases. Only 4
builtin index rmgrs have these anyway, and they are all quick, so I
suggest we run them always. This allows a greater range of startup
behavior for rmgrs.
[PATCH: rmgr_002.v1.patch]

It occurs to me that any use of WAL presumes that Checkpoints exist
and do something useful. However, the custom rmgr interface doesn't
allow you to specify any actions on checkpoint, so ends up being
limited in scope. So I think we also need an rm_checkpoint() call -
which would be a no-op for existing rmgrs.
[PATCH: rmgr_003.v1.patch]

The above turns out to be fairly simple, but extends the API to
something truly flexible.

Please let me know what you think?

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Вложения

Re: Comments on Custom RMGRs

От

Jeff Davis

Дата:

11 мая 2022 г., 16:39:48

On Wed, 2022-05-11 at 15:24 +0100, Simon Riggs wrote:
> [PATCH: rmgr_001.v1.patch]
> 
> [PATCH: rmgr_002.v1.patch]

Thank you. Both of these look like good ideas, and I will commit them
in a few days assuming that nobody else sees a problem.

> It occurs to me that any use of WAL presumes that Checkpoints exist
> and do something useful. However, the custom rmgr interface doesn't
> allow you to specify any actions on checkpoint, so ends up being
> limited in scope. So I think we also need an rm_checkpoint() call -
> which would be a no-op for existing rmgrs.
> [PATCH: rmgr_003.v1.patch]

I also like this idea, but can you describe the intended use case? I
looked through CheckPointGuts() and I'm not sure what else a custom AM
might want to do. Maybe sync special files in a way that's not handled
with RegisterSyncRequest()?

Regards,
    Jeff Davis

Re: Comments on Custom RMGRs

От

Andres Freund

Дата:

12 мая 2022 г., 03:40:10

Hi,

On 2022-05-11 09:39:48 -0700, Jeff Davis wrote:
> On Wed, 2022-05-11 at 15:24 +0100, Simon Riggs wrote:
> > [PATCH: rmgr_001.v1.patch]
> > 
> > [PATCH: rmgr_002.v1.patch]
> 
> Thank you. Both of these look like good ideas, and I will commit them
> in a few days assuming that nobody else sees a problem.

What exactly is the use case here? Without passing in information about
whether recovery will be performed etc, it's not at all clear how callbacks
could something useful?

I don't think we should allocate a bunch of memory contexts to just free them
immediately after?

> > It occurs to me that any use of WAL presumes that Checkpoints exist
> > and do something useful. However, the custom rmgr interface doesn't
> > allow you to specify any actions on checkpoint, so ends up being
> > limited in scope. So I think we also need an rm_checkpoint() call -
> > which would be a no-op for existing rmgrs.
> > [PATCH: rmgr_003.v1.patch]
> 
> I also like this idea, but can you describe the intended use case? I
> looked through CheckPointGuts() and I'm not sure what else a custom AM
> might want to do. Maybe sync special files in a way that's not handled
> with RegisterSyncRequest()?

I'm not happy with the idea of random code being executed in the middle of
CheckPointGuts(), without any documentation of what is legal to do at that
point. To actually be useful we'd likely need multiple calls to such an rmgr
callback, with a parameter where in CheckPointGuts() we are. Right now the
sequencing is explicit in CheckPointGuts(), but with the proposed callback,
that'd not be the case anymore.

Greetings,

Andres Freund

Re: Comments on Custom RMGRs

От

Andres Freund

Дата:

12 мая 2022 г., 23:42:07

Hi,

On 2022-05-12 22:26:51 +0100, Simon Riggs wrote:
> On Thu, 12 May 2022 at 04:40, Andres Freund <andres@anarazel.de> wrote:
> > I'm not happy with the idea of random code being executed in the middle of
> > CheckPointGuts(), without any documentation of what is legal to do at that
> > point.
> 
> The "I'm not happy.." ship has already sailed with pluggable rmgrs.

I don't agree. The ordering within a checkpoint is a lot more fragile than
what you do in an individual redo routine.

> Checkpoints exist for one purpose - as the starting place for recovery.
> 
> Why would we allow pluggable recovery without *also* allowing
> pluggable checkpoints?

Because one can do a lot of stuff with just pluggable WAL records, without
integrating into checkpoints?

Note that I'm *not* against making checkpoint extensible - I just think it
needs a good bit of design work around when the hook is called etc.

I definitely think it's too late in the cycle to add checkpoint extensibility
now.

> > To actually be useful we'd likely need multiple calls to such an rmgr
> > callback, with a parameter where in CheckPointGuts() we are. Right now the
> > sequencing is explicit in CheckPointGuts(), but with the proposed callback,
> > that'd not be the case anymore.
> 
> It is useful without the extra complexity you mention.

Shrug. The documentation work definitely is needed. Perhaps we can get away
without multiple callbacks within a checkpoint, I think it'll become more
apparent when writing information about the precise point in time the
checkpoint callback is called.

> I see multiple uses for the rm_checkpoint() point proposed and I've
> been asked multiple times for a checkpoint hook. Any rmgr that
> services crash recovery for a non-smgr based storage system would need
> this because the current checkpoint code only handles flushing to disk
> for smgr-based approaches. That is orthogonal to other code during
> checkpoint, so it stands alone quite well.

FWIW, for that there are much bigger problems than checkpoint
extensibility. Most importantly there's currently no good way to integrate
relation creation / drop with the commit / abort infrastructure...

Greetings,

Andres Freund

Re: Comments on Custom RMGRs

От

Jeff Davis

Дата:

13 мая 2022 г., 04:13:43

On Thu, 2022-05-12 at 22:26 +0100, Simon Riggs wrote:
> I see multiple uses for the rm_checkpoint() point proposed and I've
> been asked multiple times for a checkpoint hook.

Can you elaborate and/or link to a discussion?

Regards,
    Jeff Davis

Re: Comments on Custom RMGRs

От

Simon Riggs

Дата:

13 мая 2022 г., 12:31:13

On Fri, 13 May 2022 at 05:13, Jeff Davis <pgsql@j-davis.com> wrote:
>
> On Thu, 2022-05-12 at 22:26 +0100, Simon Riggs wrote:
> > I see multiple uses for the rm_checkpoint() point proposed and I've
> > been asked multiple times for a checkpoint hook.
>
> Can you elaborate and/or link to a discussion?

Those were conversations away from Hackers, but I'm happy to share.

The first was a discussion about a data structure needed by BDR about
4 years ago. In the absence of a pluggable checkpoint, the solution
was forced to use a normal table, which wasn't very satisfactory.

The second was a more recent conversation with Mike Stonebraker, at
the end of 2021.. He was very keen to remove the buffer manager
entirely, which requires that we have a new smgr, which then needs new
code to allow it to be written to disk at checkpoint time, which then
requires some kind of pluggable code at checkpoint time. (Mike was
also keen to remove WAL, but that's another story entirely!).

The last use case was unlogged indexes, which need to be read from
disk at startup or rebuilt after crash, which requires RmgrStartup to
work both with and without InRedo=true.

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Re: Comments on Custom RMGRs

От

Simon Riggs

Дата:

13 мая 2022 г., 12:46:58

On Fri, 13 May 2022 at 00:42, Andres Freund <andres@anarazel.de> wrote:
>
> On 2022-05-12 22:26:51 +0100, Simon Riggs wrote:
> > On Thu, 12 May 2022 at 04:40, Andres Freund <andres@anarazel.de> wrote:
> > > I'm not happy with the idea of random code being executed in the middle of
> > > CheckPointGuts(), without any documentation of what is legal to do at that
> > > point.
> >
> > The "I'm not happy.." ship has already sailed with pluggable rmgrs.
>
> I don't agree. The ordering within a checkpoint is a lot more fragile than
> what you do in an individual redo routine.

Example?


> > Checkpoints exist for one purpose - as the starting place for recovery.
> >
> > Why would we allow pluggable recovery without *also* allowing
> > pluggable checkpoints?
>
> Because one can do a lot of stuff with just pluggable WAL records, without
> integrating into checkpoints?
>
> Note that I'm *not* against making checkpoint extensible - I just think it
> needs a good bit of design work around when the hook is called etc.

When was any such work done previously for any other hook?? That isn't needed.

Checkpoints aren't complete until all data structures have
checkpointed, so there are no problems from a partial checkpoint being
written.

As a result, the order of actions in CheckpointGuts() is mostly
independent of each other. The SLRUs are all independent of each
other, as is CheckPointBuffers().

The use cases I'm trying to support aren't tricksy modifications of
existing code, they are just entirely new data structures which are
completely independent of other Postgres objects.


> I definitely think it's too late in the cycle to add checkpoint extensibility
> now.
>
>
> > > To actually be useful we'd likely need multiple calls to such an rmgr
> > > callback, with a parameter where in CheckPointGuts() we are. Right now the
> > > sequencing is explicit in CheckPointGuts(), but with the proposed callback,
> > > that'd not be the case anymore.
> >
> > It is useful without the extra complexity you mention.
>
> Shrug. The documentation work definitely is needed. Perhaps we can get away
> without multiple callbacks within a checkpoint, I think it'll become more
> apparent when writing information about the precise point in time the
> checkpoint callback is called.

You seem to be thinking in terms of modifying the existing actions in
CheckpointGuts(). I don't care about that. Anybody that wishes to do
that can work out the details of their actions.

There is nothing to document, other than "don't do things that won't
work". How can anyone enumerate all the things that wouldn't work??

There is no list of caveats for any other hook. Why is it needed here?

> > I see multiple uses for the rm_checkpoint() point proposed and I've
> > been asked multiple times for a checkpoint hook. Any rmgr that
> > services crash recovery for a non-smgr based storage system would need
> > this because the current checkpoint code only handles flushing to disk
> > for smgr-based approaches. That is orthogonal to other code during
> > checkpoint, so it stands alone quite well.
>
> FWIW, for that there are much bigger problems than checkpoint
> extensibility. Most importantly there's currently no good way to integrate
> relation creation / drop with the commit / abort infrastructure...

One at a time...

-- 
Simon Riggs                http://www.EnterpriseDB.com/

Re: Comments on Custom RMGRs

От

Jeff Davis

Дата:

13 мая 2022 г., 15:46:24

On Fri, 2022-05-13 at 13:31 +0100, Simon Riggs wrote:
> The first was a discussion about a data structure needed by BDR about
> 4 years ago. In the absence of a pluggable checkpoint, the solution
> was forced to use a normal table, which wasn't very satisfactory.

I'm interested to hear more about this case. Are you developing it into
a full table AM? In my experience with columnar, there's still a long
tail of things I wish I had in the backend to better support complete
table AMs.

> The second was a more recent conversation with Mike Stonebraker, at
> the end of 2021.. He was very keen to remove the buffer manager
> entirely, which requires that we have a new smgr, which then needs
> new
> code to allow it to be written to disk at checkpoint time, which then
> requires some kind of pluggable code at checkpoint time. (Mike was
> also keen to remove WAL, but that's another story entirely!).

I'm guessing that would be more of an experimental/ambitious project,
and based on modified postgres anyway.

> The last use case was unlogged indexes, which need to be read from
> disk at startup or rebuilt after crash, which requires RmgrStartup to
> work both with and without InRedo=true.

That sounds like a core feature, in which case we can just refactor
that for v16. It might be a nice cleanup for unlogged tables, too. I
don't think your 002-v2 patch is particularly risky, but any reluctance
at all probably pushes it to v16 given that it's so late in the cycle.

Regards,
    Jeff Davis

Re: Comments on Custom RMGRs

От

Robert Haas

Дата:

13 мая 2022 г., 16:25:21

On Fri, May 13, 2022 at 8:47 AM Simon Riggs
<simon.riggs@enterprisedb.com> wrote:
> > Note that I'm *not* against making checkpoint extensible - I just think it
> > needs a good bit of design work around when the hook is called etc.
>
> When was any such work done previously for any other hook?? That isn't needed.

I think almost every proposal to add a hook results in some discussion
about how usable the hook will be and whether it's being put in the
correct place and called with the correct arguments.

I think that's a good thing, too. Otherwise the code would be
cluttered with a bunch of hooks that seemed to someone like a good
idea at the time but are actually just a maintenance headache.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Comments on Custom RMGRs

От

Danil Anisimow

Дата:

26 февраля 2024 г., 16:29:26

Hi,

The checkpoint hook looks very useful, especially for extensions that have their own storage, like pg_stat_statements.
For example, we can keep work data in shared memory and save it only during checkpoints.
When recovering, we need to read all the data from the disk and then repeat the latest changes from the WAL.

On Mon, Feb 26, 2024 at 2:42 PM Simon Riggs <simon.riggs@enterprisedb.com> wrote:
> On Fri, 13 May 2022 at 00:42, Andres Freund <andres@anarazel.de> wrote:
> >
> > On 2022-05-12 22:26:51 +0100, Simon Riggs wrote:
> > > On Thu, 12 May 2022 at 04:40, Andres Freund <andres@anarazel.de> wrote:
> > > > I'm not happy with the idea of random code being executed in the middle of
> > > > CheckPointGuts(), without any documentation of what is legal to do at that
> > > > point.
> > >
> > > The "I'm not happy.." ship has already sailed with pluggable rmgrs.
> >
> > I don't agree. The ordering within a checkpoint is a lot more fragile than
> > what you do in an individual redo routine.
>
> Example?
>
>
> > > Checkpoints exist for one purpose - as the starting place for recovery.
> > >
> > > Why would we allow pluggable recovery without *also* allowing
> > > pluggable checkpoints?
> >
> > Because one can do a lot of stuff with just pluggable WAL records, without
> > integrating into checkpoints?
> >
> > Note that I'm *not* against making checkpoint extensible - I just think it
> > needs a good bit of design work around when the hook is called etc.
>
> When was any such work done previously for any other hook?? That isn't needed.
>
> Checkpoints aren't complete until all data structures have
> checkpointed, so there are no problems from a partial checkpoint being
> written.
>
> As a result, the order of actions in CheckpointGuts() is mostly
> independent of each other. The SLRUs are all independent of each
> other, as is CheckPointBuffers().
>
> The use cases I'm trying to support aren't tricksy modifications of
> existing code, they are just entirely new data structures which are
> completely independent of other Postgres objects.
>
>
> > I definitely think it's too late in the cycle to add checkpoint extensibility
> > now.
> >
> >
> > > > To actually be useful we'd likely need multiple calls to such an rmgr
> > > > callback, with a parameter where in CheckPointGuts() we are. Right now the
> > > > sequencing is explicit in CheckPointGuts(), but with the proposed callback,
> > > > that'd not be the case anymore.
> > >
> > > It is useful without the extra complexity you mention.
> >
> > Shrug. The documentation work definitely is needed. Perhaps we can get away
> > without multiple callbacks within a checkpoint, I think it'll become more
> > apparent when writing information about the precise point in time the
> > checkpoint callback is called.
>
> You seem to be thinking in terms of modifying the existing actions in
> CheckpointGuts(). I don't care about that. Anybody that wishes to do
> that can work out the details of their actions.
>
> There is nothing to document, other than "don't do things that won't
> work". How can anyone enumerate all the things that wouldn't work??
>
> There is no list of caveats for any other hook. Why is it needed here?

There are easily reproducible issues where rm_checkpoint() throws an ERROR.
When it occurs at the end-of-recovery checkpoint, the server fails to start with a message like this:
ERROR: Test error
FATAL: checkpoint request failed
HINT: Consult recent messages in the server log for details.

Even if we remove the broken extension from shared_preload_libraries, we get the following message in the server log:
FATAL: resource manager with ID 128 not registered
HINT: Include the extension module that implements this resource manager in shared_preload_libraries.

In both cases, with or without the extension in shared_preload_libraries, the server cannot start.

This seems like a programmer's problem, but what should the user do after receiving such messages?

Maybe it would be safer to use something like after_checkpoint_hook, which would be called after the checkpoint is completed?
This is enough for some cases when we only need to save shared memory to disk.

--
Regards,
Daniil Anisimov
Postgres Professional: http://postgrespro.com

Re: Comments on Custom RMGRs

От

Jeff Davis

Дата:

26 февраля 2024 г., 19:55:58

On Mon, 2024-02-26 at 23:29 +0700, Danil Anisimow wrote:
> Hi,
>
> The checkpoint hook looks very useful, especially for extensions that
> have their own storage, like pg_stat_statements.
> For example, we can keep work data in shared memory and save it only
> during checkpoints.
> When recovering, we need to read all the data from the disk and then
> repeat the latest changes from the WAL.

Let's pick this discussion back up, then. Where should the hook go?
Does it need to be broken into phases like resource owners? What
guidance can we provide to extension authors to use it correctly?

Simon's right that these things don't need to be 100% answered for
every hook we add; but I agree with Andres and Robert that this could
benefit from some more discussion about the details.

The proposal calls the hook right after CheckPointPredicate() and
before CheckPointBuffers(). Is that the right place for the use case
you have in mind with pg_stat_statements?

Regards,
    Jeff Davis

Re: Comments on Custom RMGRs

От

Danil Anisimow

Дата:

29 февраля 2024 г., 14:47:57

On Tue, Feb 27, 2024 at 2:56 AM Jeff Davis <pgsql@j-davis.com> wrote:
> Let's pick this discussion back up, then. Where should the hook go?
> Does it need to be broken into phases like resource owners? What
> guidance can we provide to extension authors to use it correctly?
>
> Simon's right that these things don't need to be 100% answered for
> every hook we add; but I agree with Andres and Robert that this could
> benefit from some more discussion about the details.
>
> The proposal calls the hook right after CheckPointPredicate() and
> before CheckPointBuffers(). Is that the right place for the use case
> you have in mind with pg_stat_statements?

Hello!

Answering your questions might take some time as I want to write a sample patch for pg_stat_statements and make some tests.
What do you think about putting the patch to commitfest as it closing in a few hours?

--
Regards,
Daniil Anisimov
Postgres Professional: http://postgrespro.com

Re: Comments on Custom RMGRs

От

Jeff Davis

Дата:

29 февраля 2024 г., 19:06:22

On Thu, 2024-02-29 at 21:47 +0700, Danil Anisimow wrote:
> Answering your questions might take some time as I want to write a
> sample patch for pg_stat_statements and make some tests.
> What do you think about putting the patch to commitfest as it closing
> in a few hours?

Added to March CF.

I don't have an immediate use case in mind for this, so please drive
that part of the discussion. I can't promise this for 17, but if the
patch is simple enough and a quick consensus develops, then it's
possible.

Regards,
    Jeff Davis

Re: Comments on Custom RMGRs

От

"Andrey M. Borodin"

Дата:

04 марта 2024 г., 08:31:37

> On 29 Feb 2024, at 19:47, Danil Anisimow <anisimow.d@gmail.com> wrote:
>
> Answering your questions might take some time as I want to write a sample patch for pg_stat_statements and make some
tests.
> What do you think about putting the patch to commitfest as it closing in a few hours?

I’ve switched the patch to “Waiting on Author” to indicate that currently patch is not available yet. Please, flip it
backwhen it’s available for review. 

Thanks!

Best regards, Andrey Borodin.

Re: Comments on Custom RMGRs

От

Danil Anisimow

Дата:

21 марта 2024 г., 12:47:10

On Fri, Mar 1, 2024 at 2:06 AM Jeff Davis <pgsql@j-davis.com> wrote:
> Added to March CF.
>
> I don't have an immediate use case in mind for this, so please drive
> that part of the discussion. I can't promise this for 17, but if the
> patch is simple enough and a quick consensus develops, then it's
> possible.

[pgss_001.v1.patch] adds a custom resource manager to the pg_stat_statements extension. The proposed patch is not a complete solution for pgss and may not work correctly with replication.

The 020_crash.pl test demonstrates server interruption by killing a backend. Without rm_checkpoint hook, the server restores pgss stats only after last CHECKPOINT. Data added to WAL before the checkpoint is not restored.

The rm_checkpoint hook allows saving shared memory data to disk at each checkpoint. However, for pg_stat_statements, it matters when the checkpoint occurred. When the server shuts down, pgss deletes the temporary file of query texts. In other cases, this is unacceptable.
To provide this capability, a flags parameter was added to the rm_checkpoint hook. The changes are presented in [rmgr_003.v2.patch].

--
Regards,
Daniil Anisimov
Postgres Professional: http://postgrespro.com

Вложения

Re: Comments on Custom RMGRs

От

Jeff Davis

Дата:

21 марта 2024 г., 19:02:03

On Thu, 2024-03-21 at 19:47 +0700, Danil Anisimow wrote:
> [pgss_001.v1.patch] adds a custom resource manager to the
> pg_stat_statements extension.

Did you consider moving the logic for loading the initial contents from
disk from pgss_shmem_startup to .rmgr_startup?

> The rm_checkpoint hook allows saving shared memory data to disk at
> each checkpoint. However, for pg_stat_statements, it matters when the
> checkpoint occurred. When the server shuts down, pgss deletes the
> temporary file of query texts. In other cases, this is unacceptable.
> To provide this capability, a flags parameter was added to the
> rm_checkpoint hook. The changes are presented in [rmgr_003.v2.patch].

Overall this seems fairly reasonable to me. I think this will work for
similar extensions, where the data being stored is independent from the
buffers.

My biggest concern is that it might not be quite right for a table AM
that has complex state that needs action to be taken at a slightly
different time, e.g. right after CheckPointBuffers().

Then again, the rmgr is a low-level API, and any extension using it
should be prepared to adapt to changes. If it works for pgss, then we
know it works for at least one thing, and we can always improve it
later. For instance, we might call the hook several times and pass it a
"phase" argument.

Regards,
    Jeff Davis

Re: Comments on Custom RMGRs

От

Jeff Davis

Дата:

21 марта 2024 г., 19:07:38

On Thu, 2024-03-21 at 19:47 +0700, Danil Anisimow wrote:
> The proposed patch is not a complete solution for pgss and may not
> work correctly with replication.

Also, what is the desired behavior during replication? Should queries
on the primary be represented in pgss on the replica? If the answer is
yes, should they be differentiated somehow so that you can know where
the slow queries are running?

Regards,
    Jeff Davis

Re: Comments on Custom RMGRs

От

Danil Anisimow

Дата:

29 марта 2024 г., 11:20:11

On Fri, Mar 22, 2024 at 2:02 AM Jeff Davis <pgsql@j-davis.com> wrote:
> On Thu, 2024-03-21 at 19:47 +0700, Danil Anisimow wrote:
> > [pgss_001.v1.patch] adds a custom resource manager to the
> > pg_stat_statements extension.
>
> Did you consider moving the logic for loading the initial contents from
> disk from pgss_shmem_startup to .rmgr_startup?

I tried it, but .rmgr_startup is not called if the system was shut down cleanly.

> My biggest concern is that it might not be quite right for a table AM
> that has complex state that needs action to be taken at a slightly
> different time, e.g. right after CheckPointBuffers().

> Then again, the rmgr is a low-level API, and any extension using it
> should be prepared to adapt to changes. If it works for pgss, then we
> know it works for at least one thing, and we can always improve it
> later. For instance, we might call the hook several times and pass it a
> "phase" argument.

In [rmgr_003.v3.patch] I added a phase argument to RmgrCheckpoint().
Currently it is only called in two places: before and after CheckPointBuffers().

--
Regards,
Daniil Anisimov
Postgres Professional: http://postgrespro.com

Вложения

rmgr_003.v3.patch

Re: Comments on Custom RMGRs

От

Jeff Davis

Дата:

29 марта 2024 г., 17:09:38

On Fri, 2024-03-29 at 18:20 +0700, Danil Anisimow wrote:
>
> In [rmgr_003.v3.patch] I added a phase argument to RmgrCheckpoint().
> Currently it is only called in two places: before and after
> CheckPointBuffers().

I am fine with this.

You've moved the discussion forward in two ways:

  1. Changes to pg_stat_statements to actually use the API; and
  2. The hook is called at multiple points.

Those at least partially address the concerns raised by Andres and
Robert. But given that there was pushback from multiple people on the
feature, I'd like to hear from at least one of them. It's very late in
the cycle so I'm not sure we'll get more feedback in time, though.

Regards,
    Jeff Davis

Re: Comments on Custom RMGRs

От

Robert Haas

Дата:

17 мая 2024 г., 18:56:53

On Fri, Mar 29, 2024 at 1:09 PM Jeff Davis <pgsql@j-davis.com> wrote:
> I am fine with this.
>
> You've moved the discussion forward in two ways:
>
>   1. Changes to pg_stat_statements to actually use the API; and
>   2. The hook is called at multiple points.
>
> Those at least partially address the concerns raised by Andres and
> Robert. But given that there was pushback from multiple people on the
> feature, I'd like to hear from at least one of them. It's very late in
> the cycle so I'm not sure we'll get more feedback in time, though.

In my seemingly-neverending pass through the July CommitFest, I
reached this patch. My comment is: it's possible that
rmgr_003.v3.patch is enough to be useful, but does anyone in the world
think they know that for a fact?

I mean, pgss_001.v1.patch purports to demonstrate that it can be used,
but that's based on rmgr_003.v2.patch, not the v3 patch, and the
emails seem to indicate that it may not actually work. I also think,
looking at it, that it looks much more like a POC than something we'd
consider ready for commit. It also seems very unclear that we'd want
pg_stat_statements to behave this way, and indeed "this way" isn't
really spelled out anywhere.

I think it would be nice if we had an example that uses the proposed
hook that we could actually commit. Maybe that's asking too much,
though. I think the minimum thing we need is a compelling rationale
for why this particular hook design is going to be good enough. That
could be demonstrated by means of (1) a well-commented example that
accomplishes some understandable goal and/or (2) a detailed
description of how a non-core table AM or index AM is expected to be
able to make use of this. Bonus points if the person providing that
rationale can say credibly that they've actually implemented this and
it works great with 100TB of production data.

The problem here is not only that we don't want to commit a hook that
does nothing useful. We also don't want to commit a hook that works
wonderfully for someone but we have no idea why. If we do that, then
we don't know whether it's OK to modify the hook in the future as the
code evolves, or more to the point, which kinds of modifications will
be acceptable. And also, the next person who wants to use it is likely
to have to figure out all on their own how to do so, instead of being
able to refer to comments or documentation or the commit message or at
least a mailing list post.

My basic position is not that this patch is a bad idea, but that it
isn't really finished. The idea is probably a pretty good one, but
whether this is a reasonable implementation of the idea doesn't seem
clear, at least not to me.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Comments on Custom RMGRs

От

Jeff Davis

Дата:

17 мая 2024 г., 20:20:19

On Fri, 2024-05-17 at 14:56 -0400, Robert Haas wrote:
> (2) a detailed
> description of how a non-core table AM or index AM is expected to be
> able to make use of this. Bonus points if the person providing that
> rationale can say credibly that they've actually implemented this and
> it works great with 100TB of production data.

That's a chicken-and-egg problem and we should be careful about setting
the bar too high for table AM improvements. Ultimately, AM authors will
benefit more from a steady stream of improvements that sometimes miss
the mark than complete stagnation, as long as we use reasonable
judgement.

There aren't a lot of table AMs, and to create a good one you need a
lot of internals knowledge. If it's an important AM, the developers are
surely going to try it out on mainline occasionally, and expect API
breaks. If the API breaks for them in some fundamental way, they can
complain and we still have time to fix it.

> The problem here is not only that we don't want to commit a hook that
> does nothing useful. We also don't want to commit a hook that works
> wonderfully for someone but we have no idea why. If we do that, then
> we don't know whether it's OK to modify the hook in the future as the
> code evolves, or more to the point, which kinds of modifications will
> be acceptable.

We have to have some kind of understanding between us and AM authors
that they need to participate in discussions when using these APIs, try
changes during development, be adaptable when they change from release
to release, and come back and tell us when something is wrong.

> And also, the next person who wants to use it is likely
> to have to figure out all on their own how to do so, instead of being
> able to refer to comments or documentation or the commit message or
> at
> least a mailing list post.

Obviously it would be better to have a nice example table AM in
/contrib, different enough from heap, but nobody has done that yet. You
could argue that we never should have exposed the API without something
like this in the first place, but that's also a big ask and we'd
probably still not have it.

Regarding this particular change: the checkpointing hook seems more
like a table AM feature, so I agree with you that we should have a good
idea how a real table AM might use this, rather than only
pg_stat_statements.

Regards,
    Jeff Davis

Re: Comments on Custom RMGRs

От

Robert Haas

Дата:

17 мая 2024 г., 20:25:15

On Fri, May 17, 2024 at 4:20 PM Jeff Davis <pgsql@j-davis.com> wrote:
> Regarding this particular change: the checkpointing hook seems more
> like a table AM feature, so I agree with you that we should have a good
> idea how a real table AM might use this, rather than only
> pg_stat_statements.

I would even be OK with a pg_stat_statements example that is fully
working and fully explained. I just don't want to have no example at
all. The original proposal has been changed twice because of
complaints that the hook wasn't quite useful enough, but I think that
only proves that v3 is closer to being useful than v1. If v1 is 40% of
the way to useful and v3 is 120% of the way to useful, wonderful! But
if v1 is 20% of the way to being useful and v3 is 60% of the way to
being useful, it's not time to commit anything yet. I don't know which
is the case, and I think if someone wants this to be committed, they
need to explain clearly why it's the first and not the second.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: Comments on Custom RMGRs

От

Michael Paquier

Дата:

27 мая 2024 г., 18:20:52

On Fri, May 17, 2024 at 04:25:15PM -0400, Robert Haas wrote:
> On Fri, May 17, 2024 at 4:20 PM Jeff Davis <pgsql@j-davis.com> wrote:
>> Regarding this particular change: the checkpointing hook seems more
>> like a table AM feature, so I agree with you that we should have a good
>> idea how a real table AM might use this, rather than only
>> pg_stat_statements.
>
> I would even be OK with a pg_stat_statements example that is fully
> working and fully explained. I just don't want to have no example at
> all. The original proposal has been changed twice because of
> complaints that the hook wasn't quite useful enough, but I think that
> only proves that v3 is closer to being useful than v1. If v1 is 40% of
> the way to useful and v3 is 120% of the way to useful, wonderful! But
> if v1 is 20% of the way to being useful and v3 is 60% of the way to
> being useful, it's not time to commit anything yet. I don't know which
> is the case, and I think if someone wants this to be committed, they
> need to explain clearly why it's the first and not the second.

Please note that I've been studying ways to have pg_stat_statements
being plugged in directly with the shared pgstat APIs to get it backed
by a dshash to give more flexibility and scaling, giving a way for
extensions to register their own stats kind.  In this case, the flush
of the stats would be controlled with a callback in the stats
registered by the extensions, conflicting with what's proposed here.
pg_stat_statements is all about stats, at the end.  I don't want this
argument to act as a barrier if a checkpoint hook is an accepted
consensus here,  but a checkpoint hook used for this code path is not
the most intuitive solution I can think of in the long-term.
--
Michael

Вложения

signature.asc

Re: Comments on Custom RMGRs

От

"Tristan Partin"

Дата:

27 мая 2024 г., 21:32:46

On Fri May 17, 2024 at 3:20 PM CDT, Jeff Davis wrote:
> ...
>
> Obviously it would be better to have a nice example table AM in
> /contrib, different enough from heap, but nobody has done that yet. You
> could argue that we never should have exposed the API without something
> like this in the first place, but that's also a big ask and we'd
> probably still not have it.

Not sure how useful it would be as an example, but MariaDB has
a blackhole storage engine[0], which has helped serve as a guide for me
previously.

[0]: https://mariadb.com/kb/en/blackhole/

--
Tristan Partin
https://tristan.partin.io

Re: Comments on Custom RMGRs

От

Heikki Linnakangas

Дата:

23 июля 2024 г., 13:21:53

On 27/05/2024 21:20, Michael Paquier wrote:
> On Fri, May 17, 2024 at 04:25:15PM -0400, Robert Haas wrote:
>> On Fri, May 17, 2024 at 4:20 PM Jeff Davis <pgsql@j-davis.com> wrote:
>>> Regarding this particular change: the checkpointing hook seems more
>>> like a table AM feature, so I agree with you that we should have a good
>>> idea how a real table AM might use this, rather than only
>>> pg_stat_statements.
>>
>> I would even be OK with a pg_stat_statements example that is fully
>> working and fully explained. I just don't want to have no example at
>> all. The original proposal has been changed twice because of
>> complaints that the hook wasn't quite useful enough, but I think that
>> only proves that v3 is closer to being useful than v1. If v1 is 40% of
>> the way to useful and v3 is 120% of the way to useful, wonderful! But
>> if v1 is 20% of the way to being useful and v3 is 60% of the way to
>> being useful, it's not time to commit anything yet. I don't know which
>> is the case, and I think if someone wants this to be committed, they
>> need to explain clearly why it's the first and not the second.
> 
> Please note that I've been studying ways to have pg_stat_statements
> being plugged in directly with the shared pgstat APIs to get it backed
> by a dshash to give more flexibility and scaling, giving a way for
> extensions to register their own stats kind.  In this case, the flush
> of the stats would be controlled with a callback in the stats
> registered by the extensions, conflicting with what's proposed here.
> pg_stat_statements is all about stats, at the end.  I don't want this
> argument to act as a barrier if a checkpoint hook is an accepted
> consensus here,  but a checkpoint hook used for this code path is not
> the most intuitive solution I can think of in the long-term.

On the topic of concrete uses for this API: We have a bunch of built-in 
resource managers that could be refactored to use this API. 
CheckPointGuts currently looks like this:

> /*
>  * Flush all data in shared memory to disk, and fsync
>  *
>  * This is the common code shared between regular checkpoints and
>  * recovery restartpoints.
>  */
> static void
> CheckPointGuts(XLogRecPtr checkPointRedo, int flags)
> {
>     CheckPointRelationMap();
>     CheckPointReplicationSlots(flags & CHECKPOINT_IS_SHUTDOWN);
>     CheckPointSnapBuild();
>     CheckPointLogicalRewriteHeap();
>     CheckPointReplicationOrigin();
> 
>     /* Write out all dirty data in SLRUs and the main buffer pool */
>     TRACE_POSTGRESQL_BUFFER_CHECKPOINT_START(flags);
>     CheckpointStats.ckpt_write_t = GetCurrentTimestamp();
>     CheckPointCLOG();
>     CheckPointCommitTs();
>     CheckPointSUBTRANS();
>     CheckPointMultiXact();
>     CheckPointPredicate();
> 
>     RmgrCheckpoint(flags, RMGR_CHECKPOINT_BEFORE_BUFFERS);
> 
>     CheckPointBuffers(flags);
> 
>     RmgrCheckpoint(flags, RMGR_CHECKPOINT_AFTER_BUFFERS);
> 
>     /* Perform all queued up fsyncs */
>     TRACE_POSTGRESQL_BUFFER_CHECKPOINT_SYNC_START();
>     CheckpointStats.ckpt_sync_t = GetCurrentTimestamp();
>     ProcessSyncRequests();
>     CheckpointStats.ckpt_sync_end_t = GetCurrentTimestamp();
>     TRACE_POSTGRESQL_BUFFER_CHECKPOINT_DONE();
> 
>     /* We deliberately delay 2PC checkpointing as long as possible */
>     CheckPointTwoPhase(checkPointRedo);
> }

Of these calls, CheckPointCLOG would be natural as the rmgr_callback of 
the clog rmgr. Similarly for CheckPointMultiXact and maybe a few others.

However, let's look at the pg_stat_statements patch (pgss_001.v1.patch):

It's now writing a new WAL record for every call to pgss_store(), 
turning even simple queries into WAL-logged operations. That's a 
non-starter. It will also not work on a standby. This needs to be 
redesigned so that the data is updated in memory, and written to disk 
and/or WAL-logged only periodically. Perhaps at checkpoints, but you 
could do it more frequently too.

I'm not convinced that the stats should be WAL-logged. Do you want them 
to be replicated and included in backups? Maybe, but maybe not. It's 
certainly a change to how it currently works.

If we don't WAL-log the stats, we don't really need a custom RMGR for 
it. We just need a checkpoint hook to flush the stats to disk, but we 
don't need a registered RMGR ID for it.

So, I got a feeling that adding this to the rmgr interface is not quite 
right. The rmgr callbacks are for things that run when WAL is 
*replayed*, while checkpoints are related to how WAL is generated. Let's 
design this as an independent hook, separate from rmgrs.

Another data point: In Neon, we actually had to add a little code to 
checkpoints, to WAL-log some exta data. That was a quick hack and might 
not be the right design in the first place, but these hooks would not 
have been useful for our purposes. We wanted to write a new WAL record 
at shutdown, and in CheckPointGuts(), it's already too late for that. It 
needs to be done earlier, before starting to the shutdown checkpoint. 
Similar to LogStandbySnapshot(), except that LogStandbySnapshot() is not 
called at shutdown like we wanted to. For a table AM, the point of a 
checkpoint hook is probably to fsync() data that is managed outside of 
the normal buffer manager and CheckPointGuts() is the right place for 
that, but other extensions might want to hook into checkpoints for other 
reasons.

-- 
Heikki Linnakangas
Neon (https://neon.tech)

Re: Comments on Custom RMGRs

От

Jeff Davis

Дата:

04 августа 2024 г., 15:36:28

On Tue, 2024-07-23 at 16:21 +0300, Heikki Linnakangas wrote:
> So, I got a feeling that adding this to the rmgr interface is not
> quite
> right. The rmgr callbacks are for things that run when WAL is
> *replayed*, while checkpoints are related to how WAL is generated.
> Let's
> design this as an independent hook, separate from rmgrs.

That's a good way to look at it, agreed.

Regards,
    Jeff Davis

Re: Comments on Custom RMGRs

От

Andrei Lepikhov

Дата:

14 октября 2025 г., 12:11:53

On 27/5/2024 20:20, Michael Paquier wrote:
> Please note that I've been studying ways to have pg_stat_statements
> being plugged in directly with the shared pgstat APIs to get it backed
> by a dshash to give more flexibility and scaling, giving a way for
> extensions to register their own stats kind.  In this case, the flush
> of the stats would be controlled with a callback in the stats
> registered by the extensions, conflicting with what's proposed here.
> pg_stat_statements is all about stats, at the end.  I don't want this
> argument to act as a barrier if a checkpoint hook is an accepted
> consensus here,  but a checkpoint hook used for this code path is not
> the most intuitive solution I can think of in the long-term.Let me continue this thread.
I wait for any kind of checkpoint cut-in machinery for extensions.

Typically, when collecting knowledge about the instance state, we store 
it in an extension's owned database table, incurring the costs 
associated with transactional mechanics, tuple format overhead, and so 
on. Usually, we don't need MVCC or rollback; we have fixed-length data, 
and it would be better to store data in hash tables. These hash tables 
should survive instances' restarts and crashes - that's the only feature 
needed.

The pg_stat_statements dumps its data to a file, but it is not reliable 
enough when we need consistent information, such as replication status 
or when logging update conflicts (see the Spock extension [1]). When we 
learn about query executions, we can't dump the hash table on each 
ExecutorEnd due to overhead, but we are okay with adding one more WAL 
record containing the hash table entry data - it may be done by the 
backend or by a separate background worker.

So, the primary reason for us is to have a moment to store the 
extension's state on disk, keeping in mind that we have registered RMGR, 
which allows us to restore the full state using this disk file and WAL 
records.

For me, the ideal place for such a hook is CheckPointGuts, right between 
the CheckPointBuffers call and fsyncs. I think that to demonstrate how 
this hook can work, the pg_stat_statements storage may need to be 
redesigned slightly.

[1] https://github.com/pgEdge/spock

-- 
regards, Andrei Lepikhov,
pgEdge

Re: Comments on Custom RMGRs

От

Andrei Lepikhov

Дата:

15 ноября 2025 г., 13:44:15

On 14/10/2025 11:11, Andrei Lepikhov wrote:
> For me, the ideal place for such a hook is CheckPointGuts, right between 
> the CheckPointBuffers call and fsyncs. I think that to demonstrate how 
> this hook can work, the pg_stat_statements storage may need to be 
> redesigned slightly.
There are two patches: 0001, which is the checkpoint hook itself, and 
0002, which includes an example and a trivial test.

During development, I attempted to apply it in my different modules and 
realised that the hook is preferred over an RMGR callback - I don't 
actually want to be forced to register RMGR in each project and have it 
loadable on an instance startup. In lightweight modules, I want to keep 
my knowledge base relatively close to the current state of the instance.
Nevertheless, the plan freezing extension (for example) needs to ensure 
that the user's query plan is 'frozen' after the function call. 
Therefore, we need to store the decision made in the WAL, which requires 
dumping the state into a file before performing the WAL cut. 
Additionally, I'd like to experiment with synchronising an extension 
state between master and replica through WAL records, as most 
optimisation recommendations are relevant to both instances.

Patch 0001 contains a hook that is called once after all checkpoint 
preparations have finished. I recall that people mentioned it might be 
helpful for AMs as well - feel free to propose changes to this patch.

Patch 2 adds an example to the test_dsm_registry module, as it is 
precisely the way I write the code: named DSM segment -> shared HTAB -> 
file dump. So, it looks natural and opens a room to extend this example 
by employing RMGR and xact callbacks to keep the extension state as 
close to the committed changes as possible.

The test looks pretty trivial so far - feel free to propose ideas on how 
to extend it.

-- 
regards, Andrei Lepikhov,
pgEdge

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Comments on Custom RMGRs

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения