Обсуждение: Some other things about contrib/bloom and generic_xlog.c

Поиск
Список
Период
Сортировка

Some other things about contrib/bloom and generic_xlog.c

От
Tom Lane
Дата:
1. It doesn't seem like generic_xlog.c has thought very carefully about
the semantics of the "hole" between pd_lower and pd_upper.  The mainline
XLOG code goes to some lengths to ensure that the hole stays all-zeroes;
for example RestoreBlockImage() explicitly zeroes the hole when restoring
from a full-page image that has a hole.  But generic_xlog.c's redo routine
does not do anything comparable, nor does GenericXLogFinish make any
effort to ensure that the "hole" is all-zeroes after normal application of
a generic update.  The reason this is of interest is that it means the
contents of the "hole" could diverge between master and slave, or differ
between the original state of a database and what it is after a crash and
recovery.  That would at least complicate forensic comparisons of pages,
and I think it might also break checksumming.  We thought that this was
important enough to take the trouble of explicitly zeroing holes during
mainline XLOG replay.  Shouldn't generic_xlog.c take the same trouble?

2. Unless I'm missing something, contrib/bloom is making no effort
to ensure that bloom index pages can be distinguished from other pages
by pg_filedump.  That's fine if your expectation is that bloom will always
be a toy with no use in production; but otherwise, not so much.  You need
to make sure that the last two bytes of the page special space contain a
uniquely identifiable code; compare spgist_page_id in SPGiST indexes.
        regards, tom lane



Re: Some other things about contrib/bloom and generic_xlog.c

От
Michael Paquier
Дата:
On Sun, Apr 10, 2016 at 1:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> 1. It doesn't seem like generic_xlog.c has thought very carefully about
> the semantics of the "hole" between pd_lower and pd_upper.  The mainline
> XLOG code goes to some lengths to ensure that the hole stays all-zeroes;
> for example RestoreBlockImage() explicitly zeroes the hole when restoring
> from a full-page image that has a hole.

Yes.

> But generic_xlog.c's redo routine
> does not do anything comparable, nor does GenericXLogFinish make any
> effort to ensure that the "hole" is all-zeroes after normal application of
> a generic update.  The reason this is of interest is that it means the
> contents of the "hole" could diverge between master and slave, or differ
> between the original state of a database and what it is after a crash and
> recovery.  That would at least complicate forensic comparisons of pages,
> and I think it might also break checksumming.  We thought that this was
> important enough to take the trouble of explicitly zeroing holes during
> mainline XLOG replay.  Shouldn't generic_xlog.c take the same trouble?

One look at checksum_impl.h shows up that the page hole is taken into
account in the checksum calculation, so we definitely has better zero
the page. And looking at generic_xlog.c, this code either logs in a
full page, or a delta.

Actually, as I look at this code for the first time, I find that
GenericXLogFinish() is a very confusing interface. It makes no
distinction between a page and the meta data associated to this page,
this is controlled by a flag isNew and buffer data is saved as some
delta. Actually, fullmage is not exactly true, this may refer to a
page that has a hole. It seems to me that we should not have one but
two routines: GenericXLogRegisterBuffer and
GenericXLogRegisterBufData, similar to what the normal XLOG routines
offer.
-- 
Michael



Re: Some other things about contrib/bloom and generic_xlog.c

От
Teodor Sigaev
Дата:
> mainline XLOG replay.  Shouldn't generic_xlog.c take the same trouble?
Yes, it should. Alexander now works on it.

>
> 2. Unless I'm missing something, contrib/bloom is making no effort
> to ensure that bloom index pages can be distinguished from other pages
> by pg_filedump.  That's fine if your expectation is that bloom will always
> be a toy with no use in production; but otherwise, not so much.  You need
> to make sure that the last two bytes of the page special space contain a
> uniquely identifiable code; compare spgist_page_id in SPGiST indexes.

Now contrib/bloom is a canonical example how to implement yourown index and how 
to use generic WAL interface. And it is an useful toy: in some cases it could 
help in production system, patch attached. I'm a little dissatisfied with patch 
because I had to add unused field to BloomPageOpaqueData, in opposite case size 
of BloomPageOpaqueData struct equals 6 bytes but special size is MAXALIGNED, so, 
last two bytes will be unused. If unused field is not a problem, I will push 
this patch.

-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Re: Some other things about contrib/bloom and generic_xlog.c

От
Alexander Korotkov
Дата:
Hi, Tom!

On Sun, Apr 10, 2016 at 7:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
1. It doesn't seem like generic_xlog.c has thought very carefully about
the semantics of the "hole" between pd_lower and pd_upper.  The mainline
XLOG code goes to some lengths to ensure that the hole stays all-zeroes;
for example RestoreBlockImage() explicitly zeroes the hole when restoring
from a full-page image that has a hole.  But generic_xlog.c's redo routine
does not do anything comparable, nor does GenericXLogFinish make any
effort to ensure that the "hole" is all-zeroes after normal application of
a generic update.  The reason this is of interest is that it means the
contents of the "hole" could diverge between master and slave, or differ
between the original state of a database and what it is after a crash and
recovery.  That would at least complicate forensic comparisons of pages,
and I think it might also break checksumming.  We thought that this was
important enough to take the trouble of explicitly zeroing holes during
mainline XLOG replay.  Shouldn't generic_xlog.c take the same trouble?

2. Unless I'm missing something, contrib/bloom is making no effort
to ensure that bloom index pages can be distinguished from other pages
by pg_filedump.  That's fine if your expectation is that bloom will always
be a toy with no use in production; but otherwise, not so much.  You need
to make sure that the last two bytes of the page special space contain a
uniquely identifiable code; compare spgist_page_id in SPGiST indexes.

Thank you for spotting these issues.
I'm going to prepare patches for fixing both of them.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Some other things about contrib/bloom and generic_xlog.c

От
Tom Lane
Дата:
Michael Paquier <michael.paquier@gmail.com> writes:
> Actually, as I look at this code for the first time, I find that
> GenericXLogFinish() is a very confusing interface. It makes no
> distinction between a page and the meta data associated to this page,
> this is controlled by a flag isNew and buffer data is saved as some
> delta. Actually, fullmage is not exactly true, this may refer to a
> page that has a hole. It seems to me that we should not have one but
> two routines: GenericXLogRegisterBuffer and
> GenericXLogRegisterBufData, similar to what the normal XLOG routines
> offer.

Hmm.  Maybe the documentation leaves something to be desired, but
I thought that the interface was reasonable if you grant that the
goal is to be easy to use rather than to have maximum performance.
The calling code only has to concern itself with making the actual
changes to the target pages, not with constructing WAL descriptions
of those changes.  Also, the fact that the changes don't have to be
made within a critical section is a bigger help than it might sound.

So I don't really have any problems with the API as such.  I tried
to improve the docs a day or two ago, but maybe that needs further
work.
        regards, tom lane



Re: Some other things about contrib/bloom and generic_xlog.c

От
Tom Lane
Дата:
Teodor Sigaev <teodor@sigaev.ru> writes:
>> 2. Unless I'm missing something, contrib/bloom is making no effort
>> to ensure that bloom index pages can be distinguished from other pages
>> by pg_filedump.  That's fine if your expectation is that bloom will always
>> be a toy with no use in production; but otherwise, not so much.  You need
>> to make sure that the last two bytes of the page special space contain a
>> uniquely identifiable code; compare spgist_page_id in SPGiST indexes.

> Now contrib/bloom is a canonical example how to implement yourown index and how 
> to use generic WAL interface. And it is an useful toy: in some cases it could 
> help in production system, patch attached. I'm a little dissatisfied with patch 
> because I had to add unused field to BloomPageOpaqueData, in opposite case size 
> of BloomPageOpaqueData struct equals 6 bytes but special size is MAXALIGNED, so, 
> last two bytes will be unused. If unused field is not a problem, I will push 
> this patch.

Yes, it will mean that the special space will have to be 8 bytes not 4.
But on the other hand, that only makes a difference on 32-bit machines;
if MAXALIGN is 8 then you won't be able to fit anything into those bytes
anyway.  And someday there might be a use for the other two bytes ...
so I'm not particularly concerned by the "wasted space" argument.
        regards, tom lane



Re: Some other things about contrib/bloom and generic_xlog.c

От
Michael Paquier
Дата:
On Mon, Apr 11, 2016 at 11:29 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> Actually, as I look at this code for the first time, I find that
>> GenericXLogFinish() is a very confusing interface. It makes no
>> distinction between a page and the meta data associated to this page,
>> this is controlled by a flag isNew and buffer data is saved as some
>> delta. Actually, fullmage is not exactly true, this may refer to a
>> page that has a hole. It seems to me that we should not have one but
>> two routines: GenericXLogRegisterBuffer and
>> GenericXLogRegisterBufData, similar to what the normal XLOG routines
>> offer.
>
> Hmm.  Maybe the documentation leaves something to be desired, but
> I thought that the interface was reasonable if you grant that the
> goal is to be easy to use rather than to have maximum performance.
> The calling code only has to concern itself with making the actual
> changes to the target pages, not with constructing WAL descriptions
> of those changes.  Also, the fact that the changes don't have to be
> made within a critical section is a bigger help than it might sound.
>
> So I don't really have any problems with the API as such.  I tried
> to improve the docs a day or two ago, but maybe that needs further
> work.

Well, I am not saying that the given interface does nothing, far from
that. Though I think that we could take advantage of the rmgr
RM_GENERIC_ID introduced by this interface to be able to generate
custom WAL records and pass them through a stream.

As given out now, we are able to do the following:
- Log a full page
- Log a delta of a full page, which is actually data associated to a page.
- At recovery, replay those full pages with a delta.

What I thought we should be able to do with that should not be only
limited to indexes, aka to:
- Be able to register and log full pages
- Be able to log data associated to a block
- Be able to have record data
- Use at recovery custom routines to apply those WAL records
The first 3 ones are done via the set of routines generating WAL
records for the rmgr RM_GENERIC_ID. The 4th one needs a hook in
generic_redo. Looking at the thread where this patch has been debated
I am not seeing a discussion regarding that.
-- 
Michael



Re: Some other things about contrib/bloom and generic_xlog.c

От
Tom Lane
Дата:
Michael Paquier <michael.paquier@gmail.com> writes:
> As given out now, we are able to do the following:
> - Log a full page
> - Log a delta of a full page, which is actually data associated to a page.
> - At recovery, replay those full pages with a delta.

Right.

> What I thought we should be able to do with that should not be only
> limited to indexes, aka to:
> - Be able to register and log full pages
> - Be able to log data associated to a block
> - Be able to have record data
> - Use at recovery custom routines to apply those WAL records

I'm not following your distinction between a "page" and a "block",
perhaps.  Also, the entire point here is that extensions DON'T
get to have custom apply routines, because we have no way for
replay to know about such apply routines.  (It surely cannot look
in the catalogs for them, but there's no other suitable infrastructure
either.)  In turn, that means there's little value in having any custom
data associated with the WAL records.

If we ever solve the registering-custom-replay-routines conundrum,
it'll be time to think about what the frontend API for that ought
to be.  But this patch is not pretending to solve that, and indeed is
mainly showing that it's possible to have a useful WAL extension API
that doesn't require solving it.

In any case, the existence of this API doesn't foreclose adding
other APIs (perhaps backed by other RM_GENERIC_ID WAL record types)
later on.  So I don't think we need to insist that it do everything
anyone will ever want.
        regards, tom lane



Re: Some other things about contrib/bloom and generic_xlog.c

От
Tom Lane
Дата:
... BTW, with respect to the documentation angle, it seems to me
that it'd be better if GenericXLogRegister were renamed to
GenericXLogRegisterBuffer, or perhaps GenericXLogRegisterPage.
I think this would make the documentation clearer, and it would
also make it easier to add other sorts of Register actions later,
if we ever think of some (which seems not unlikely, really).

Another thing to think about is whether we're going to regret
hard-wiring the third argument as a boolean.  Should we consider
making it a bitmask of flags, instead?  It's not terribly hard
to think of other flags we might want there in future; for example
maybe something to tell GenericXLogFinish whether it's worth trying
to identify data movement on the page rather than just doing the
byte-by-byte delta calculation.
        regards, tom lane



Re: Some other things about contrib/bloom and generic_xlog.c

От
Michael Paquier
Дата:
On Tue, Apr 12, 2016 at 9:21 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Michael Paquier <michael.paquier@gmail.com> writes:
>> What I thought we should be able to do with that should not be only
>> limited to indexes, aka to:
>> - Be able to register and log full pages
>> - Be able to log data associated to a block
>> - Be able to have record data
>> - Use at recovery custom routines to apply those WAL records
>
> I'm not following your distinction between a "page" and a "block",
> perhaps.

s/block/page/. Sorry for the confusion.

> Also, the entire point here is that extensions DON'T
> get to have custom apply routines, because we have no way for
> replay to know about such apply routines.  (It surely cannot look
> in the catalogs for them, but there's no other suitable infrastructure
> either.)  In turn, that means there's little value in having any custom
> data associated with the WAL records.

Well, yes, the startup process has no knowledge of that. That's why it
would need to go through a hook, but the startup process has no
knowledge of routines loaded via _PG_init perhaps? I recall that it
loaded them. The weakness with this interface is that one needs to be
sure that this hook is actually loaded properly.

> If we ever solve the registering-custom-replay-routines conundrum,
> it'll be time to think about what the frontend API for that ought
> to be.  But this patch is not pretending to solve that, and indeed is
> mainly showing that it's possible to have a useful WAL extension API
> that doesn't require solving it.

Yep. I am not saying the contrary. This patch solves its own category
of things with its strict page-level control.

> In any case, the existence of this API doesn't foreclose adding
> other APIs (perhaps backed by other RM_GENERIC_ID WAL record types)
> later on.  So I don't think we need to insist that it do everything
> anyone will ever want.

Perhaps I am just confused by the interface. Linking the idea of a
page delta with GenericXLogRegister is not that intuitive, knowing
that what it should actually be GenericXLogRegisterBuffer and we
should have GenericXLogAddDelta, that applies a page difference on top
of an existing one.
-- 
Michael



Re: Some other things about contrib/bloom and generic_xlog.c

От
Michael Paquier
Дата:
On Tue, Apr 12, 2016 at 9:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> ... BTW, with respect to the documentation angle, it seems to me
> that it'd be better if GenericXLogRegister were renamed to
> GenericXLogRegisterBuffer, or perhaps GenericXLogRegisterPage.
> I think this would make the documentation clearer, and it would
> also make it easier to add other sorts of Register actions later,
> if we ever think of some (which seems not unlikely, really).

Funny thing. I just suggested the same just above :) With a second
routine to generate a delta difference from a page to keep the
knowledge of this delta in its own code path.

> Another thing to think about is whether we're going to regret
> hard-wiring the third argument as a boolean.  Should we consider
> making it a bitmask of flags, instead?  It's not terribly hard
> to think of other flags we might want there in future; for example
> maybe something to tell GenericXLogFinish whether it's worth trying
> to identify data movement on the page rather than just doing the
> byte-by-byte delta calculation.

Yes. Definitely this interface needs more thoughts. I'd think of
GenericXLogFinish as a more generic entry point.
-- 
Michael



Re: Some other things about contrib/bloom and generic_xlog.c

От
Craig Ringer
Дата:
On 12 April 2016 at 08:36, Michael Paquier <michael.paquier@gmail.com> wrote:
 

> Also, the entire point here is that extensions DON'T
> get to have custom apply routines, because we have no way for
> replay to know about such apply routines.  (It surely cannot look
> in the catalogs for them, but there's no other suitable infrastructure
> either.)  In turn, that means there's little value in having any custom
> data associated with the WAL records.

Well, yes, the startup process has no knowledge of that. That's why it
would need to go through a hook, but the startup process has no
knowledge of routines loaded via _PG_init perhaps?

The only way we really have at the moment is shared_preload_libraries.

This got discussed much earlier, possibly on a previous iteration of the generic xlog discussions rather than this specific thread. IIRC the gist was that we need a way to:

- Flag the need for a redo routine so that replay stops if that redo routine isn't available
- Find out *which* redo routine the generic log message needs during redo
- Get a pointer to that routine to actually call it

... and also possibly allow the admin to force skip of redo calls for a given extension (but not others) in case of a major problem.

If you rely on shared_preload_libraries, then "oops, I forgot to put myextension in shared_preload_libraries on the downstream" becomes a Bad Thing. There's no way to go back and redo the work you've passed over. You have to recreate the standby. Worse, it's silently wrong. The server has no idea it was meant to do anything and no way to tell the user it couldn't do what it was meant to do.

We can't register redo routines in the catalogs because we don't have access to the syscaches etc during redo (right?).

So how do we look at a generic log record, say "ok, the upstream that wrote this says it needs to invoke registered generic xlog hook 42, which is <func> from <extension> at <ptr>" ?

Personally I'd be willing to accept the limitations of the s_p_l and hook approach to the point where I think we should add the hook and accept the caveats of its use ... but I understand the problems with it. I understand not having custom redo routine support in this iteration of the patch. It's somewhat orthogonal job to this patch anyway - while handy for specific relation generic xlog, custom redo routines make most sense when combined with generic xlog that isn't necessarily associated with a specific relation. This patch doesn't support that.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Some other things about contrib/bloom and generic_xlog.c

От
Michael Paquier
Дата:
On Tue, Apr 12, 2016 at 10:54 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 12 April 2016 at 08:36, Michael Paquier wrote:
> The only way we really have at the moment is shared_preload_libraries.

That's exactly my point.

> If you rely on shared_preload_libraries, then "oops, I forgot to put
> myextension in shared_preload_libraries on the downstream" becomes a Bad
> Thing. There's no way to go back and redo the work you've passed over. You
> have to recreate the standby. Worse, it's silently wrong. The server has no
> idea it was meant to do anything and no way to tell the user it couldn't do
> what it was meant to do.

Well, one playing with the generic WAL record facility is

> We can't register redo routines in the catalogs because we don't have access
> to the syscaches etc during redo (right?).

Yes, the startup process does not have that knowledge.

> So how do we look at a generic log record, say "ok, the upstream that wrote
> this says it needs to invoke registered generic xlog hook 42, which is
> <func> from <extension> at <ptr>" ?
>
> Personally I'd be willing to accept the limitations of the s_p_l and hook
> approach to the point where I think we should add the hook and accept the
> caveats of its use ... but I understand the problems with it.

Honestly, so do I, and I could accept the use of a hook in
generic_redo in this code path which is really... Generic. Any other
infrastructure is going to be one brick shy of a load, and we are
actually sure that those routines are getting loaded once we reach the
first redo routine as far as I know. We could for example force the
hook to set some validation flags that are then checked in the generic
redo routine, and mark in the WAL record itself that this record
should have used the hook. If the record is replayed and this hook is
missing, then do a FATAL and prevent recovery to move on.
-- 
Michael



Re: Some other things about contrib/bloom and generic_xlog.c

От
Craig Ringer
Дата:
On 12 April 2016 at 10:09, Michael Paquier <michael.paquier@gmail.com> wrote:
On Tue, Apr 12, 2016 at 10:54 AM, Craig Ringer <craig@2ndquadrant.com> wrote: 

> If you rely on shared_preload_libraries, then "oops, I forgot to put
> myextension in shared_preload_libraries on the downstream" becomes a Bad
> Thing. There's no way to go back and redo the work you've passed over. You
> have to recreate the standby. Worse, it's silently wrong. The server has no
> idea it was meant to do anything and no way to tell the user it couldn't do
> what it was meant to do.

Well, one playing with the generic WAL record facility is

Yeah... that's what people say about a lot of things.

If it's there, someone will shoot their foot off with it and blame us.

There's a point where you have to say "well, that was dumb, you had to take the safety off, remove the barrel plug attached to the giant sign saying 'do not remove', find and insert the bolt, and load the ammunition yourself first, maybe you shouldn't have done that?"

However, that doesn't mean we should make it easy for a simple oversight to have serious data correctness implications. I'm on the fence about whether this counts; enough so that I wouldn't fight hard to get it in even if a patch existed, which it doesn't, and we weren't past feature freeze, which we are.
 
 
Any other
infrastructure is going to be one brick shy of a load

I disagree. It's very useful, just not for what you (and I) want to use it for. It's still quite reasonable for custom index representations, and it can be extended later.
 
We could for example force the
hook to set some validation flags that are then checked in the generic
redo routine, and mark in the WAL record itself that this record
should have used the hook. If the record is replayed and this hook is
missing, then do a FATAL and prevent recovery to move on.


Right, but _which_ routine on the standby? How does the standby know which hook must be called? You can't just say "any hook"; there might be multiple ones interested in different generic WAL messages. You need a registration system of some sort and a way for the standby and master to agree on how to identify extensions that have redo hooks for generic WAL. Then you need to be able to look up that registration in some manner during redo.

The simplest registration system would be something like a function called at _PG_init time that passes a globally-unique integer identifier for the extension that maps to some kind of central web-based registry where we hand out IDs. Forever. Then the standby says "this xlog needs extension with generic xlog id 42 but it's missing". But we all know how well those generic global identifier systems work when people are testing and prototyping stuff, want to change versions incompatibly, etc....

So I disagree it's as simple as a validation flag.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Some other things about contrib/bloom and generic_xlog.c

От
Alexander Korotkov
Дата:
On Tue, Apr 12, 2016 at 3:08 AM, Michael Paquier <michael.paquier@gmail.com> wrote:
What I thought we should be able to do with that should not be only
limited to indexes, aka to:
- Be able to register and log full pages
- Be able to log data associated to a block
- Be able to have record data
- Use at recovery custom routines to apply those WAL records
The first 3 ones are done via the set of routines generating WAL
records for the rmgr RM_GENERIC_ID. The 4th one needs a hook in
generic_redo. Looking at the thread where this patch has been debated
I am not seeing a discussion regarding that.

I've designed generic xlog under following restrictions:
 - We don't want users to register their own redo functions since it could affect reliability. Unfortunately, I can't find original discussion now.
 - API should be as simple as possible for extension author.

However, I think we should add some way for custom redo functions one day.  If we will add pluggable storage engines (I hope one day we will), then we would face some engines which don't use our buffer manager.  For such pluggable storage engines, current generic WAL wouldn't work, but user-defined redo functions would.

Now, my view is that we will need kind of two-phase recovery in order to provide user-defined redo functions:
1) In the first phase, all built-in objects are recovered.  After this phase, we can use system catalog.
2) In the second phase, user-defined redo functions are called on the base of consistent system catalog.

I think we can implement such two-phase recovery even now on the base of generic logical messages.  We can create logical slot.  When extension is loaded it starts second phase of recovery by reading generic logical messages from this logical slot.  Logical slot position should be also advanced on checkpoint.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Some other things about contrib/bloom and generic_xlog.c

От
Robert Haas
Дата:
On Mon, Apr 11, 2016 at 9:54 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> If you rely on shared_preload_libraries, then "oops, I forgot to put
> myextension in shared_preload_libraries on the downstream" becomes a Bad
> Thing. There's no way to go back and redo the work you've passed over. You
> have to recreate the standby. Worse, it's silently wrong. The server has no
> idea it was meant to do anything and no way to tell the user it couldn't do
> what it was meant to do.

No, that's not right.  If you rely on shared_preload_libraries, then
you'd just have the server die like this:

FATAL: I don't know how to replay a record for extension rmgr "craigrules".
HINT: Add the necessary library to shared_preload_libraries and
restart the server.

That seems 100% fine, although maybe we could adopt the convention
that the user-visible extension rmgr name should match the shared
library name, just as we do for logical decoding plugins (IIRC), and
the server can try to load a library by that name and continue.

> We can't register redo routines in the catalogs because we don't have access
> to the syscaches etc during redo (right?).

Correct.

> So how do we look at a generic log record, say "ok, the upstream that wrote
> this says it needs to invoke registered generic xlog hook 42, which is
> <func> from <extension> at <ptr>" ?

Record enough information in WAL that the rmgrs can have names instead
of ID numbers.  Stick the list of extension rmgrs in use into each
checkpoint record and call it good.

> Personally I'd be willing to accept the limitations of the s_p_l and hook
> approach to the point where I think we should add the hook and accept the
> caveats of its use ... but I understand the problems with it. I understand
> not having custom redo routine support in this iteration of the patch. It's
> somewhat orthogonal job to this patch anyway - while handy for specific
> relation generic xlog, custom redo routines make most sense when combined
> with generic xlog that isn't necessarily associated with a specific
> relation. This patch doesn't support that.

Yeah.  And while this patch may (probably does) have technical defects
that need to be cured, I don't think that's an argument against this
approach.  For prototyping, this is totally fine.  For
performance-critical stuff, probably not so much.  But if you don't
give people a way to prototype stuff and extend the system, then they
won't be as likely to innovate and make new stuff, and then we won't
get as many new patches for core proposing new and innovative things.
I think it's great that we are finally getting close to make the
access method system truly extensible - that has been a clear need for
a long time, and I think we are going about it in the right way.  We
can add more in the future.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Some other things about contrib/bloom and generic_xlog.c

От
Alexander Korotkov
Дата:
On Sun, Apr 10, 2016 at 7:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
1. It doesn't seem like generic_xlog.c has thought very carefully about
the semantics of the "hole" between pd_lower and pd_upper.  The mainline
XLOG code goes to some lengths to ensure that the hole stays all-zeroes;
for example RestoreBlockImage() explicitly zeroes the hole when restoring
from a full-page image that has a hole.  But generic_xlog.c's redo routine
does not do anything comparable, nor does GenericXLogFinish make any
effort to ensure that the "hole" is all-zeroes after normal application of
a generic update.  The reason this is of interest is that it means the
contents of the "hole" could diverge between master and slave, or differ
between the original state of a database and what it is after a crash and
recovery.  That would at least complicate forensic comparisons of pages,
and I think it might also break checksumming.  We thought that this was
important enough to take the trouble of explicitly zeroing holes during
mainline XLOG replay.  Shouldn't generic_xlog.c take the same trouble?

Attached patch is intended to fix this.  It zeroes "hole" in both GenericXLogFinish() and generic_redo().

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Вложения

Re: Some other things about contrib/bloom and generic_xlog.c

От
Alexander Korotkov
Дата:
On Tue, Apr 12, 2016 at 3:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
... BTW, with respect to the documentation angle, it seems to me
that it'd be better if GenericXLogRegister were renamed to
GenericXLogRegisterBuffer, or perhaps GenericXLogRegisterPage.
I think this would make the documentation clearer, and it would
also make it easier to add other sorts of Register actions later,
if we ever think of some (which seems not unlikely, really).

Another thing to think about is whether we're going to regret
hard-wiring the third argument as a boolean.  Should we consider
making it a bitmask of flags, instead?  It's not terribly hard
to think of other flags we might want there in future; for example
maybe something to tell GenericXLogFinish whether it's worth trying
to identify data movement on the page rather than just doing the
byte-by-byte delta calculation.

I agree with both of these ideas. Patch is attached.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Вложения

Re: Some other things about contrib/bloom and generic_xlog.c

От
Craig Ringer
Дата:
On 12 April 2016 at 20:48, Robert Haas <robertmhaas@gmail.com> wrote:
 
> So how do we look at a generic log record, say "ok, the upstream that wrote
> this says it needs to invoke registered generic xlog hook 42, which is
> <func> from <extension> at <ptr>" ?

Record enough information in WAL that the rmgrs can have names instead
of ID numbers.  Stick the list of extension rmgrs in use into each
checkpoint record and call it good.

Repeating the mapping at each checkpoint sounds pretty reasonable and means we always know what we need. There's no need to bloat each record with an extension name and no need for any kind of ugly global registration. The mapping table would be small and simple. I like it.

Of course, it's all maybe-in-future stuff at this point, but I think that's a really good way to approach it.

There's no way around the fact that user defined redo functions can affect reliability. But then, so can user-defined data types, functions, bgworkers, plpython functions loading ctypes, plpython functions getting creative in the datadir, and admins who paste into the wrong window. The scope for problems is somewhat greater but not IMO prohibitively so.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Re: Some other things about contrib/bloom and generic_xlog.c

От
Tom Lane
Дата:
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> Attached patch is intended to fix this.  It zeroes "hole" in both
> GenericXLogFinish() and generic_redo().

Pushed.  I rewrote the comments a bit and modified GenericXLogFinish
so that it doesn't copy data only to overwrite it with zeroes.
        regards, tom lane



Re: Some other things about contrib/bloom and generic_xlog.c

От
Teodor Sigaev
Дата:
> I agree with both of these ideas. Patch is attached.

Hmm, you accidentally forget to change flag type of GenericXLogRegister in 
contrib/bloom signature.


-- 
Teodor Sigaev                                   E-mail: teodor@sigaev.ru
  WWW: http://www.sigaev.ru/
 



Re: Some other things about contrib/bloom and generic_xlog.c

От
Alexander Korotkov
Дата:
On Tue, Apr 12, 2016 at 6:34 PM, Teodor Sigaev <teodor@sigaev.ru> wrote:
I agree with both of these ideas. Patch is attached.

Hmm, you accidentally forget to change flag type of GenericXLogRegister in contrib/bloom signature.

Fixed.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company 
Вложения

Re: Some other things about contrib/bloom and generic_xlog.c

От
Tom Lane
Дата:
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> I agree with both of these ideas. Patch is attached.

Pushed with minor cleanup.
        regards, tom lane



Re: Some other things about contrib/bloom and generic_xlog.c

От
Alexander Korotkov
Дата:
On Tue, Apr 12, 2016 at 6:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Alexander Korotkov <a.korotkov@postgrespro.ru> writes:
> I agree with both of these ideas. Patch is attached.

Pushed with minor cleanup.

Great, thanks!

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

Re: Some other things about contrib/bloom and generic_xlog.c

От
Robert Haas
Дата:
On Tue, Apr 12, 2016 at 9:36 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> Repeating the mapping at each checkpoint sounds pretty reasonable and means
> we always know what we need. There's no need to bloat each record with an
> extension name and no need for any kind of ugly global registration. The
> mapping table would be small and simple. I like it.
>
> Of course, it's all maybe-in-future stuff at this point, but I think that's
> a really good way to approach it.
>
> There's no way around the fact that user defined redo functions can affect
> reliability. But then, so can user-defined data types, functions, bgworkers,
> plpython functions loading ctypes, plpython functions getting creative in
> the datadir, and admins who paste into the wrong window. The scope for
> problems is somewhat greater but not IMO prohibitively so.

I am in agreement with you on both the merits of this particular thing
and the general principle you are articulating regarding how to think
about these things.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company