Обсуждение: [PATCH] Send catalog_xmin separately in hot standby feedback

Поиск
Список
Период
Сортировка

[PATCH] Send catalog_xmin separately in hot standby feedback

От
Craig Ringer
Дата:
Hi all

Currently hot standby feedback sends GetOldestXmin()'s result to the
upstream as the required xmin. GetOldestXmin() returns a slot's
catalog_xmin if that's the lowest xmin on the system.

That's fine so long as we don't do logical decoding on standbys, but
if we start allowing logical slots on standbys it'll cause the master
to retain too much bloat since it'll pin the master's xmin (not
catalog_xmin) down based on the catalog_xmin of any slots on the
downstream.

To fix that, add new fields to the hot standby feedback protocol
message to carry a separate catalog_xmin.

This doesn't need any special care for backward compatibility because
the only thing that has any business sending hot standby feedback is a
physical standby and they're required to be the same major version as
the master. If someone tries to connect with a standby of the wrong
version they'll fail long before this, and even if they didn't they'd
just get an error saying there's not enough data in the message.
pg_basebackup, pg_recvlogical and pg_receivexlog don't send hot
standby feedback messages.

I'm posting this now because Petr was interested in it for his work on
logical replication. I'll be following it a subsequent patch to allow
logical slot creation on physical replicas if they're using a slot to
talk to the master and have hot_standby_feedback enabled, just so you
know the direction this is going in.

Passes 'make check', src/bin/pg_basebackup and src/test/recovery TAP
tests. I haven't added specific tests for this functionality since
there isn't (yet) a way to set catalog_xmin separately on a physical
standby without a dedicated test module.

The logical decoding timeline following patch[1] is also relevant for
this, since it is required for logical decoding on standby to survive
promotion.

Next steps will be:


* Expose information about whether or not a slot is in use from walreceiver.c

* Allow logical slots to be created on replicas if
hot_standby_feedback is enabled and a logical slot is in use. Return
null as the exported snapshot ID when creating over the walsender
protocol, since we can't export a snapshot on a standby due to the
need to allocate an xid. (That can be addressed separately).

* Now that recovery tests are possible, write the recovery test suite
for logical decoding on standby

* Auto-drop replication slots when dropping a database in dbase_redo

* Add a safety mechanism to stop users disabling hs feedback on the
replica or stopping using a physical slot to the upstream while
logical slots exist on the replica. Or mark such logical slots as
unusable using a new persistent field on the slot. Not trivial because
we must allow crash recovery without a slot to upstream (obviously),
and should preferably also allow fallback to archive recovery when
server with slot is temporarily unreachable. Must also consider
handling of physical slot with catalog_xmin set from cascading
physical replica with logical slots on it.

* Extend the logical replication patch to add support for following
physical failover using this functionality, likely in 11.0.

[1] https://commitfest.postgresql.org/10/779/


I'll add this to the next CF, but I realise the inability to test it
standalone may mean it can only be committed as part of a series along
with full support for logical decoding from standby.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: [PATCH] Send catalog_xmin separately in hot standby feedback

От
Craig Ringer
Дата:
On 5 September 2016 at 12:40, Craig Ringer <craig@2ndquadrant.com> wrote:
> Hi all
>
> Currently hot standby feedback sends GetOldestXmin()'s result to the
> upstream as the required xmin. GetOldestXmin() returns a slot's
> catalog_xmin if that's the lowest xmin on the system.


Note that this patch changes the API to GetOldestXmin(), adding a new
boolean to allow it to disregard the catalog_xmin of slots.

Per Simon's feedback I'm going to split that out into a separate
patch, so will post a follow-up split one soon as the series.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



Re: [PATCH] Send catalog_xmin separately in hot standby feedback

От
Craig Ringer
Дата:
On 5 September 2016 at 14:44, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 5 September 2016 at 12:40, Craig Ringer <craig@2ndquadrant.com> wrote:
>> Hi all
>>
>> Currently hot standby feedback sends GetOldestXmin()'s result to the
>> upstream as the required xmin. GetOldestXmin() returns a slot's
>> catalog_xmin if that's the lowest xmin on the system.
>
>
> Note that this patch changes the API to GetOldestXmin(), adding a new
> boolean to allow it to disregard the catalog_xmin of slots.
>
> Per Simon's feedback I'm going to split that out into a separate
> patch, so will post a follow-up split one soon as the series.

Now formatted a series:

1.      Send catalog_xmin in hot standby feedback protocol
2.      Make walsender respect catalog_xmin in hot standby feedback messages
3.      Allow GetOldestXmin(...) to optionally disregard the catalog_xmin
4.      Send catalog_xmin separately in hot_standby_feedback messages


Descriptions are in the patch headers.


1 adds the protocol field only. The value is at this point always sent
as 0 by walreceiver and ignored by walsender. There's need to handle
backward compatibility in the addition to the hot standby protocol
message here as only the same major version Pg has any business
sending us hot standby feedback. pg_receivexlog, pg_basebackup etc
don't use hs feedback. Includes protocol docs change.

2 makes walsender now pay attention to the sent catalog_xmin.
walreceiver doesn't set it yet and has no way to get it separately.

3 Provides a way to get the global xmin without considering the
catalog_xmin so walreceiver can use it.

4 makes walsender use the modified GetOldestXmin()


(3) needs additional attention:

By ignoring slot catalog_xmin in the GetOldestXmin() call then
separately calling ProcArrayGetReplicationSlotXmin() to get the
catalog_xmin to  we might produce a catalog xmin slightly later than
what was in the procarray at the time we previously called
GetOldestXmin() to examine backend/session state. ProcArrayLock is
released so it can be advanced in-between the calls. That's harmless -
it isn't necessary for the reported catalog_xmin to be exactly
consistent with backend state. If it advances it's safe to report the
new position since we know the confirmed positions are on-disk
locally.

The alternative here is to extend GetOldestXmin() to take an out-param
to report the slot catalog xmin. That really just duplicates  the
functionality of ProcArrayGetReplicationSlotXmin but means we can grab
it within a single ProcArray lock. Variants of GetOldestXmin and
ProcArrayGetReplicationSlotXmin that take an already-locked flag would
work too, but would hold the lock across parts of GetOldestXmin() that
currently don't retain it. I could also convert the current boolean
param ignoreVacuum into a flags argument instead of adding another
boolean. No real preference from me.

I cut out some comment changes to be submitted separately; otherwise
this series is much the same as the original patch upthread.

Also available at
https://github.com/2ndQuadrant/postgres/tree/dev/feedback-catalog-xmin
(and tagged dev/feedback-catalog-xmin). Branch subject to rebasing.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения

Re: [PATCH] Send catalog_xmin separately in hot standby feedback

От
Petr Jelinek
Дата:
Hi Craig,

On 05/09/16 11:28, Craig Ringer wrote:
> On 5 September 2016 at 14:44, Craig Ringer <craig@2ndquadrant.com> wrote:
>> On 5 September 2016 at 12:40, Craig Ringer <craig@2ndquadrant.com> wrote:
>>> Hi all
>>>
>>> Currently hot standby feedback sends GetOldestXmin()'s result to the
>>> upstream as the required xmin. GetOldestXmin() returns a slot's
>>> catalog_xmin if that's the lowest xmin on the system.
>>
>>
>> Note that this patch changes the API to GetOldestXmin(), adding a new
>> boolean to allow it to disregard the catalog_xmin of slots.
>>
>> Per Simon's feedback I'm going to split that out into a separate
>> patch, so will post a follow-up split one soon as the series.
> 

Here is my review of them.

> Now formatted a series:
> 
> 1.      Send catalog_xmin in hot standby feedback protocol

> +    xmin_epoch = nextEpoch;
>      if (nextXid < xmin)
> -        nextEpoch--;
> +        xmin_epoch --;
> +    catalog_xmin_epoch = nextEpoch;
> +    if (nextXid < catalog_xmin)
> +        catalog_xmin_epoch --;

Don't understand why you keep the nextEpoch here, it's not used anywhere
that I can see, you could just as well use the xmin_epoch directly if
that's how you want to name it.

> +    /*
> +     * A 10.0+ standby's walsender passes the lowest catalog xmin of any
> +     * replication slot up to the master.
> +     */
> +    feedbackCatalogXmin = pq_getmsgint(&reply_message, 4);
> +    feedbackCatalogEpoch = pq_getmsgint(&reply_message, 4);
>  

I'd be more interested to know why this is sent rather than it's sent
since version 10+ in this comment.

> 2.      Make walsender respect catalog_xmin in hot standby feedback messages

> +        if (TransactionIdIsNormal(feedbackCatalogXmin)
> +            && TransactionIdPrecedes(feedbackCatalogXmin, feedbackXmin))
> +        {
> +            MyPgXact->xmin = feedbackCatalogXmin;
> +        }
> +        else
> +        {
> +            MyPgXact->xmin = feedbackXmin;
> +        }

This not how we usually use the {} brackets (there are some more
instances of using them for one line of code in this particular commit).

> 3.      Allow GetOldestXmin(...) to optionally disregard the catalog_xmin
> By ignoring slot catalog_xmin in the GetOldestXmin() call then
> separately calling ProcArrayGetReplicationSlotXmin() to get the
> catalog_xmin to  we might produce a catalog xmin slightly later than
> what was in the procarray at the time we previously called
> GetOldestXmin() to examine backend/session state. ProcArrayLock is
> released so it can be advanced in-between the calls. That's harmless -
> it isn't necessary for the reported catalog_xmin to be exactly
> consistent with backend state. If it advances it's safe to report the
> new position since we know the confirmed positions are on-disk
> locally.
> 
> The alternative here is to extend GetOldestXmin() to take an out-param
> to report the slot catalog xmin. That really just duplicates  the
> functionality of ProcArrayGetReplicationSlotXmin but means we can grab
> it within a single ProcArray lock. Variants of GetOldestXmin and
> ProcArrayGetReplicationSlotXmin that take an already-locked flag would
> work too, but would hold the lock across parts of GetOldestXmin() that
> currently don't retain it. I could also convert the current boolean
> param ignoreVacuum into a flags argument instead of adding another
> boolean. No real preference from me.
> 

I would honestly prefer the change to GetOldestXmin to return the
catalog_xmin. It seems both cleaner and does less locking.

> 4.      Send catalog_xmin separately in hot_standby_feedback messages
> 

This looks okay (provided the change above).

In general it's simpler patch than I expected which is good. But it
would be good to have some tests.

--  Petr Jelinek                  http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services



Re: [PATCH] Send catalog_xmin separately in hot standby feedback

От
Craig Ringer
Дата:
On 25 October 2016 at 00:19, Petr Jelinek <petr@2ndquadrant.com> wrote:

>> Now formatted a series:
>>
>> 1.      Send catalog_xmin in hot standby feedback protocol
>
>> +     xmin_epoch = nextEpoch;
>>       if (nextXid < xmin)
>> -             nextEpoch--;
>> +             xmin_epoch --;
>> +     catalog_xmin_epoch = nextEpoch;
>> +     if (nextXid < catalog_xmin)
>> +             catalog_xmin_epoch --;
>
> Don't understand why you keep the nextEpoch here, it's not used anywhere
> that I can see, you could just as well use the xmin_epoch directly if
> that's how you want to name it.

Fixed, thanks.

>> +     /*
>> +      * A 10.0+ standby's walsender passes the lowest catalog xmin of any
>> +      * replication slot up to the master.
>> +      */
>> +     feedbackCatalogXmin = pq_getmsgint(&reply_message, 4);
>> +     feedbackCatalogEpoch = pq_getmsgint(&reply_message, 4);
>>
>
> I'd be more interested to know why this is sent rather than it's sent
> since version 10+ in this comment.

Removed. It's explained in a comment inside the if
(hot_standby_feedback) block in walreceiver anyway.

>> 2.      Make walsender respect catalog_xmin in hot standby feedback messages
>
>> +             if (TransactionIdIsNormal(feedbackCatalogXmin)
>> +                     && TransactionIdPrecedes(feedbackCatalogXmin, feedbackXmin))
>> +             {
>> +                     MyPgXact->xmin = feedbackCatalogXmin;
>> +             }
>> +             else
>> +             {
>> +                     MyPgXact->xmin = feedbackXmin;
>> +             }
>
> This not how we usually use the {} brackets (there are some more
> instances of using them for one line of code in this particular commit).

Whoops. Thanks. I find the Pg convention pretty ghastly when dealing
with multi-line 'if' conditions followed by a single-line statement,
but it's still the convention whether I like it or not.

>> 3.      Allow GetOldestXmin(...) to optionally disregard the catalog_xmin
>> By ignoring slot catalog_xmin in the GetOldestXmin() call then
>> separately calling ProcArrayGetReplicationSlotXmin() to get the
>> catalog_xmin to  we might produce a catalog xmin slightly later than
>> what was in the procarray at the time we previously called
>> GetOldestXmin() to examine backend/session state. ProcArrayLock is
>> released so it can be advanced in-between the calls. That's harmless -
>> it isn't necessary for the reported catalog_xmin to be exactly
>> consistent with backend state. If it advances it's safe to report the
>> new position since we know the confirmed positions are on-disk
>> locally.
>>
>> The alternative here is to extend GetOldestXmin() to take an out-param
>> to report the slot catalog xmin. That really just duplicates  the
>> functionality of ProcArrayGetReplicationSlotXmin but means we can grab
>> it within a single ProcArray lock. Variants of GetOldestXmin and
>> ProcArrayGetReplicationSlotXmin that take an already-locked flag would
>> work too, but would hold the lock across parts of GetOldestXmin() that
>> currently don't retain it. I could also convert the current boolean
>> param ignoreVacuum into a flags argument instead of adding another
>> boolean. No real preference from me.
>>
>
> I would honestly prefer the change to GetOldestXmin to return the
> catalog_xmin. It seems both cleaner and does less locking.

Fair enough. Done.

> In general it's simpler patch than I expected which is good. But it
> would be good to have some tests.

Agreed. OK, I've added basic tests for physical replication slots and
hot_standby_feedback to t/001_stream_rep.pl since it make sense to
test both along with stream_rep.pl's tests of cascading, etc and I
don't think a separate test is needed.

It's not actually practical to add tests for the catalog_xmin on
standby functionality until the next patch in the series (pending)
which enables logical decoding on standby. Currently you can't create
a slot on a standby so you can't cause the standby to hold down
catalog_xmin. But the tests show things work how they should within
the range of currently exposed functionality.

In the process I noticed how skeletal those tests still are. We have a
great framework now (thanks Michael!) and I'd like to start filling it
out with tests involving unclean shutdowns, promotions, etc. There's a
lot still to write to get solid coverage. Tests aren't hard. Who's
keen to write some? I'll happily help any volunteers out.

New patch series attached. 0001 is the new tests. The guts is patches
2-5. I'm not sure whether 2, 3, 4 and 5 should be squashed for commit
or not, but I left them separate for easier review.

For complete functionality this series will want to be coupled with
logical decoding timeline following and a pending patch to enable
logical decoding on standby.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Вложения