Re: persist logical slots to disk during shutdown checkpoint

Поиск
Список
Период
Сортировка
От Julien Rouhaud
Тема Re: persist logical slots to disk during shutdown checkpoint
Дата
Msg-id CAOBaU_aFueEACJNf5L2HxBW=D0T_9khL202Jt4F-nb2jEZYSuw@mail.gmail.com
обсуждение исходный текст
Ответ на persist logical slots to disk during shutdown checkpoint  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: persist logical slots to disk during shutdown checkpoint  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Sat, 19 Aug 2023, 14:16 Amit Kapila, <amit.kapila16@gmail.com> wrote:
It's entirely possible for a logical slot to have a confirmed_flush
LSN higher than the last value saved on disk while not being marked as
dirty.  It's currently not a problem to lose that value during a clean
shutdown / restart cycle but to support the upgrade of logical slots
[1] (see latest patch at [2]), we seem to rely on that value being
properly persisted to disk. During the upgrade, we need to verify that
all the data prior to shudown_checkpoint for the logical slots has
been consumed, otherwise, the downstream may miss some data. Now, to
ensure the same, we are planning to compare the confirm_flush LSN
location with the latest shudown_checkpoint location which means that
the confirm_flush LSN should be updated after restart.

I think this is inefficient even without an upgrade because, after the
restart, this may lead to decoding some data again. Say, we process
some transactions for which we didn't send anything downstream (the
changes got filtered) but the confirm_flush LSN is updated due to
keepalives. As we don't flush the latest value of confirm_flush LSN,
it may lead to processing the same changes again.

In most cases there shouldn't be a lot of records to decode after restart, but I agree it's better to avoid decoding those again. 

The idea discussed in the thread [1] is to always persist logical
slots to disk during the shutdown checkpoint. I have extracted the
patch to achieve the same from that thread and attached it here. This
could lead to some overhead during shutdown (checkpoint) if there are
many slots but it is probably a one-time work.

I couldn't think of better ideas but another possibility is to mark
the slot as dirty when we update the confirm_flush LSN (see
LogicalConfirmReceivedLocation()). However, that would be a bigger
overhead in the running server as it could be a frequent operation and
could lead to more writes.

Yeah I didn't find any better option either at that time. I still think that forcing persistence on shutdown is the best compromise. If we tried to always mark the slot as dirty, we would be sure to add regular overhead but we would probably end up persisting the slot on disk on shutdown anyway most of the time, so I don't think it would be a good compromise. 

My biggest concern was that some switchover scenario might be a bit slower in some cases, but if that really is a problem it's hard to imagine what workload would be possible without having to persist them anyway due to continuous activity needing to be sent just before the shutdown.

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Kapila
Дата:
Сообщение: persist logical slots to disk during shutdown checkpoint
Следующее
От: Dean Rasheed
Дата:
Сообщение: Re: [PATCH] Add function to_oct