Re: Minimal logical decoding on standbys

Поиск
Список
Период
Сортировка
От Drouvot, Bertrand
Тема Re: Minimal logical decoding on standbys
Дата
Msg-id 26c6f320-98f0-253c-f8b5-df1e7c1f6315@amazon.com
обсуждение исходный текст
Ответ на Re: Minimal logical decoding on standbys  (Andres Freund <andres@anarazel.de>)
Ответы Re: [UNVERIFIED SENDER] Re: Minimal logical decoding on standbys  ("Drouvot, Bertrand" <bdrouvot@amazon.com>)
Список pgsql-hackers
Hi Andres,

On 4/8/21 5:47 AM, Andres Freund wrote:
> Hi,
>
> On 2021-04-07 13:32:18 -0700, Andres Freund wrote:
>> While working on this I found a, somewhat substantial, issue:
>>
>> When the primary is idle, on the standby logical decoding via walsender
>> will typically not process the records until further WAL writes come in
>> from the primary, or until a 10s lapsed.
>>
>> The problem is that WalSndWaitForWal() waits for the *replay* LSN to
>> increase, but gets woken up by walreceiver when new WAL has been
>> flushed. Which means that typically walsenders will get woken up at the
>> same time that the startup process will be - which means that by the
>> time the logical walsender checks GetXLogReplayRecPtr() it's unlikely
>> that the startup process already replayed the record and updated
>> XLogCtl->lastReplayedEndRecPtr.
>>
>> I think fixing this would require too invasive changes at this point. I
>> think we might be able to live with 10s delay issue for one release, but
>> it sure is ugly :(.
> This is indeed pretty painful. It's a lot more regularly occuring if you
> either have a slot disk, or you switch around the order of
> WakeupRecovery() and WalSndWakeup() XLogWalRcvFlush().
>
> - There's about which timeline to use. If you use pg_recvlogical and you
>    restart the server, you'll see errors like:
>
>    pg_recvlogical: error: unexpected termination of replication stream: ERROR:  requested WAL segment
000000000000000000000003has already been removed
 
>
>    the real filename is 000000010000000000000003 - i.e. the timeline is
>    0.
>
>    This isn't too hard to fix, but definitely needs fixing.

Thanks, nice catch!

 From what I have seen, we are not going through InitXLOGAccess() on a 
Standby and in some cases (like the one you mentioned) 
StartLogicalReplication() is called without IdentifySystem() being 
called previously: this lead to ThisTimeLineID still set to 0.

I am proposing a fix in the attached v18 by adding a check in 
StartLogicalReplication() and ensuring that ThisTimeLineID is retrieved.

>
> - ResolveRecoveryConflictWithLogicalSlots() is racy - potentially
>    leading us to drop a slot that has been created since we signalled a
>    recovery conflict.  See
>    https://www.postgresql.org/message-id/20210408020913.zzprrlvqyvlt5cyy%40alap3.anarazel.de
>    for some very similar issues.

I have rewritten this part by following the same logic as the one used 
in 96540f80f8 (the commit linked to the thread you mentioned).

>
> - Given the precedent of max_slot_wal_keep_size, I think it's wrong to
>    just drop the logical slots. Instead we should just mark them as
>    invalid, like InvalidateObsoleteReplicationSlots().

Makes fully sense and done that way in the attached patch.

I am setting the slot's data.xmin and data.catalog_xmin as 
InvalidTransactionId to mark the slot(s) as invalid in case of conflict.

> - There's no tests covering timeline switches, what happens if there's a
>    promotion if logical decoding is currently ongoing.

I'll now work on the tests.

>
> - The way ResolveRecoveryConflictWithLogicalSlots() builds the error
>    message is not good (and I've complained about it before...).

I changed it and made it more simple.

I also removed the details around mentioning xmin or catalog xmin (as I 
am not sure of the added value and they are currently also not mentioned 
during standby recovery snapshot conflict).

>
> Unfortunately I think the things I have found are too many for me to
> address within the given time. I'll send a version with a somewhat
> polished set of the changes I made in the next few days...

Thanks for the review and feedback.

Please find enclosed v18 with the changes I worked on.

I still need to have a look on the tests.

There is also the 10s delay to work on, do you already have an idea on 
how we should handle it?

Thanks

Bertrand


Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Masahiko Sawada
Дата:
Сообщение: Re: Failure in subscription test 004_sync.pl
Следующее
От: Thomas Munro
Дата:
Сообщение: Re: An out-of-date comment in nodeIndexonlyscan.c