RE: [PoC] pg_upgrade: allow to upgrade publisher node

Поиск
Список
Период
Сортировка
От Hayato Kuroda (Fujitsu)
Тема RE: [PoC] pg_upgrade: allow to upgrade publisher node
Дата
Msg-id TYAPR01MB5866D67857191A7F75A40916F5999@TYAPR01MB5866.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: [PoC] pg_upgrade: allow to upgrade publisher node  (Julien Rouhaud <rjuju123@gmail.com>)
Ответы Re: [PoC] pg_upgrade: allow to upgrade publisher node  (vignesh C <vignesh21@gmail.com>)
Re: [PoC] pg_upgrade: allow to upgrade publisher node  (vignesh C <vignesh21@gmail.com>)
Re: [PoC] pg_upgrade: allow to upgrade publisher node  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
Dear Julien,

> Sorry for the delay, I didn't had time to come back to it until this afternoon.

No issues, everyone is busy:-).

> I don't think that your analysis is correct.  Slots are guaranteed to be
> stopped after all the normal backends have been stopped, exactly to avoid such
> extraneous records.
>
> What is happening here is that the slot's confirmed_flush_lsn is properly
> updated in memory and ends up being the same as the current LSN before the
> shutdown.  But as it's a logical slot and those records aren't decoded, the
> slot isn't marked as dirty and therefore isn't saved to disk.  You don't see
> that behavior when doing a manual checkpoint before (per your script comment),
> as in that case the checkpoint also tries to save the slot to disk but then
> finds a slot that was marked as dirty and therefore saves it.
>
> In your script's scenario, when you restart the server the previous slot data
> is restored and the confirmed_flush_lsn goes backward, which explains those
> extraneous records.

So you meant to say that the key point was that some records which are not sent
to subscriber do not mark slots as dirty, hence the updated confirmed_flush was
not written into slot file. Is it right? LogicalConfirmReceivedLocation() is called
by walsender when the process gets reply from worker process, so your analysis
seems correct.

> It's probably totally harmless to throw away that value for now (and probably
> also doesn't lead to crazy amount of work after restart, I really don't know
> much about the logical slot code), but clearly becomes problematic with your
> usecase.  One easy way to fix this is to teach the checkpoint code to force
> saving the logical slots to disk even if they're not marked as dirty during a
> shutdown checkpoint, as done in the attached v1 patch (renamed as .txt to not
> interfere with the cfbot).  With this patch applied I reliably only see a final
> shutdown checkpoint record with your scenario.
>
> Now such a change will make shutdown a bit more expensive when using logical
> replication, even if in 99% of cases you will not need to save the
> confirmed_flush_lsn value, so I don't know if that's acceptable or not.

In any case we these records must be advanced. IIUC, currently such records are
read after rebooting but ingored, and this patch just skips them. I have not measured,
but there is a possibility that is not additional overhead, just a trade-off.

Currently I did not come up with another solution, so I have included your patch.
Please see 0002.

Additionally, I added a checking functions in 0003.
According to pg_resetwal and other functions, the length of CHECKPOINT_SHUTDOWN
record seems (SizeOfXLogRecord + SizeOfXLogRecordDataHeaderShort + sizeof(CheckPoint)).
Therefore, the function ensures that the difference between current insert position
and confirmed_flush_lsn is less than (above + page header).

Best Regards,
Hayato Kuroda
FUJITSU LIMITED


Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jim Jones
Дата:
Сообщение: Re: Tab completion for AT TIME ZONE
Следующее
От: Konstantin Knizhnik
Дата:
Сообщение: OOM in hash join