Re: [Patch] ALTER SYSTEM READ ONLY

From Andres Freund
Subject Re: [Patch] ALTER SYSTEM READ ONLY
Date
Msg-id 20201210003428.sy5tx55v5x242hrf@alap3.anarazel.de
In response to Re: [Patch] ALTER SYSTEM READ ONLY  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: [Patch] ALTER SYSTEM READ ONLY  (Amul Sul <sulamul@gmail.com>)
List pgsql-hackers
Hi,

On 2020-12-09 16:13:06 -0500, Robert Haas wrote:
> That's not good. On a typical busy system, a system is going to be in
> the middle of a checkpoint most of the time, and the checkpoint will
> take a long time to finish - maybe minutes.

Or hours, even. Due to the cost of full-page writes (FPWs), which are
incurred again after every checkpoint, it can make a lot of sense to
checkpoint less frequently...


> We want this feature to respond within milliseconds or a few seconds,
> not minutes. So we need something better here.

Indeed.


> I'm inclined to think
> that we should try to CompleteWALProhibitChange() at the same places
> we AbsorbSyncRequests(). We know from experience that bad things
> happen if we fail to absorb sync requests in a timely fashion, so we
> probably have enough calls to AbsorbSyncRequests() to make sure that
> we always do that work in a timely fashion. So, if we do this work in
> the same place, then it will also be done in a timely fashion.

Sounds sane, without having looked in detail.
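
For illustration only, here is a minimal standalone sketch of the idea
quoted above: do the read-only transition at the same points where sync
requests are absorbed, so that a long checkpoint does not delay it. This
is not code from the patch or from PostgreSQL. Every name ending in
"Sketch", the two flags, the counter, and SimulateCheckpoint() are
hypothetical stand-ins, and the real functions have different bodies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for shared state; not PostgreSQL's real variables. */
static bool wal_prohibit_requested = false; /* someone asked for read-only */
static bool wal_prohibited = false;         /* transition has been completed */
static int  pending_sync_requests = 0;      /* queued fsync requests */

/* Stand-in for absorbing queued sync requests. */
static void
AbsorbSyncRequestsSketch(void)
{
    pending_sync_requests = 0;
}

/* Stand-in for finishing a pending read-only transition, if any. */
static void
CompleteWALProhibitChangeSketch(void)
{
    if (wal_prohibit_requested && !wal_prohibited)
        wal_prohibited = true;
}

/* Assumed combined hook: both jobs are done at the same call sites. */
static void
AbsorbAndCompleteSketch(void)
{
    AbsorbSyncRequestsSketch();
    CompleteWALProhibitChangeSketch();
}

/*
 * Simulate a checkpoint that writes nbuffers buffers, with a read-only
 * request arriving just before buffer number request_at is written.
 * Returns the buffer index at which the system became read-only, or -1
 * if no request arrived during the checkpoint.
 */
static int
SimulateCheckpoint(int nbuffers, int request_at)
{
    assert(nbuffers >= 0);

    wal_prohibit_requested = false;
    wal_prohibited = false;
    pending_sync_requests = 0;

    for (int i = 0; i < nbuffers; i++)
    {
        if (i == request_at)
            wal_prohibit_requested = true;

        pending_sync_requests++;    /* writing a buffer queues a sync request */
        AbsorbAndCompleteSketch();  /* called once per buffer in this sketch */

        if (wal_prohibited)
            return i;               /* transition done mid-checkpoint */
    }
    return -1;
}
```

In this toy model the request is honored at the very next absorb point
rather than at the end of the checkpoint. What the checkpoint itself
should do after that point is the open question discussed below.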


> I'm not 100% sure whether that introduces any other problems.
> Certainly, we're not going to be able to finish the checkpoint once
> we've gone read-only, so we'll fail when we try to write the WAL
> record for that, or maybe earlier if there's anything else that tries
> to write WAL. Either the checkpoint needs to error out, like any other
> attempt to write WAL, and we can attempt a new checkpoint if and when
> we go read/write, or else we need to finish writing stuff out to disk
> but not actually write the checkpoint completion record (or any other
> WAL) unless and until the system goes back into read/write mode - and
> then at that point the previously-started checkpoint will finish
> normally. The latter seems better if we can make it work, but the
> former is probably also acceptable. What you've got right now is not.

I mostly wonder which of those two has which implications for how many
FPWs we need to redo. Presumably stalling but not cancelling the current
checkpoint is better?

Greetings,

Andres Freund


