Re: persist logical slots to disk during shutdown checkpoint

Поиск
Список
Период
Сортировка
От Amit Kapila
Тема Re: persist logical slots to disk during shutdown checkpoint
Дата
Msg-id CAA4eK1+yQDOmW-9bPkE26UgPbWsWwdQKHKQPTPormCCGLf2QEw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: persist logical slots to disk during shutdown checkpoint  (Michael Paquier <michael@paquier.xyz>)
Ответы Re: persist logical slots to disk during shutdown checkpoint  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-hackers
On Tue, Sep 12, 2023 at 10:55 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Sep 11, 2023 at 02:49:49PM +0530, Amit Kapila wrote:
>
> Please see the v11 attached, that rewords all the places of the patch
> that need clarifications IMO.  I've found that the comment additions
> in CheckPointReplicationSlots() to be overcomplicated:
> - The key point to force a flush of a slot if its confirmed_lsn has
> moved ahead of the last LSN where it was saved is to make the follow
> up restart more responsive.
>

I don't think it will become more responsive in any way, not sure what
made you think like that. The key idea is that after restart we want
to ensure that all the WAL data up to the shutdown checkpoint record
is sent to downstream. As mentioned in the commit message, this will
help in ensuring that upgrades don't miss any data and then there is
another small advantage as mentioned in the commit message.

> - Not sure that there is any point to mention the other code paths in
> the tree where ReplicationSlotSave() can be called, and a slot can be
> saved in other processes than just WAL senders (like slot
> manipulations in normal backends, for one).  This was the last
> sentence in v10.
>

The point was the earlier sentence is no longer true and keeping it as
it is could be wrong or at least misleading. For example, earlier it
is okay to say, "This needn't actually be part of a checkpoint, ..."
but now that is no longer true as we want to invoke this at the time
of shutdown checkpoint for correctness. If we want to be precise, we
can say, "It is convenient to flush dirty replication slots at the
time of checkpoint. Additionally, .."

>
> +           if (s->data.invalidated == RS_INVAL_NONE &&
> +               s->data.confirmed_flush != s->last_saved_confirmed_flush)
>
> Actually this is incorrect, no?  Shouldn't we make sure that the
> confirmed_flush is strictly higher than the last saved LSN?
>

I can't see why it is incorrect. Do you see how (in what scenario) it
could go wrong? As per my understanding, confirmed_flush LSN will
always be greater than equal to last_saved_confirmed_flush but we
don't want to ensure that point here because we just want if the
latest value is not the same then we should mark the slot dirty and
flush it as that will be location we have ensured to update before
walsender shutdown. I think it is better to add an assert if you are
worried about any such case and we had thought of adding it as well
but then didn't do it because we don't have matching asserts to ensure
that we never assign prior LSN value to consfirmed_flush LSN.

+ /*
+ * LSN used to track the last confirmed_flush LSN where the slot's data
+ * has been flushed to disk.
+ */
+ XLogRecPtr last_saved_confirmed_flush;

I don't want to argue on such a point because it is a little bit of a
matter of personal choice but I find this comment unclear. It seems to
read that confirmed_flush LSN is some LSN position which is where we
flushed the slot's data and that is not true. I found the last comment
in the patch sent by me: "This value tracks the last confirmed_flush
LSN flushed which is used during a shutdown checkpoint to decide if
logical's slot data should be forcibly flushed or not." which I feel
we agreed upon is better.

--
With Regards,
Amit Kapila.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: vignesh C
Дата:
Сообщение: Re: CHECK Constraint Deferrable
Следующее
От: Dilip Kumar
Дата:
Сообщение: Re: trying again to get incremental backup