RFC: Giving bgworkers walsender-like grace during shutdown (for logical replication)

From: Craig Ringer
Msg-id: CAGRY4nzO7-0Q6UkzOHKRTrLLYVNbq8hnGaO-o9eSBmNhY-Jo=g@mail.gmail.com
List: pgsql-hackers

Hi folks

TL;DR: Does anyone object to a new bgworker flag that exempts background workers (such as logical apply workers) from the first round of fast shutdown signals, and instead lets them finish their currently in-progress txn and then exit?

This is a change proposal raised for comment before patch submission so please consider it. Explanation of why I think we need it comes first, then proposed implementation.

Rationale:

Currently a fast shutdown causes logical replication subscribers to abort their currently in-progress transaction and terminate along with user backends. This means they cannot finish receiving and flushing the currently in-progress transaction, possibly wasting a very large amount of work.

After restart the subscriber must reconnect. The whole txn then has to be decoded and reorder-buffered again from the restart_lsn up to the current confirmed_flush_lsn, sent over the wire all over again, and applied again locally. We don't currently spool received txn change-streams to disk on the subscriber and flush them, so we can't repeat just the local apply part (see the related thread "Logical archiving" for relevant discussion). If a big txn was in progress at the time, this can create a lot of bloat and a lot of excess WAL.

I'd like to add a bgworker flag that tells the postmaster to treat the logical apply bgworker (or extension equivalents) somewhat like a walsender for the purpose of fast shutdown. Instead of immediately terminating it like user backends on fast shutdown, the bgworker should be sent a ProcSignal warning that shutdown is pending and instructing it to finish receiving and applying its current transaction, then exit gracefully.

It's not quite the same as the walsender, since there we try to flush changes to downstreams up to the end of the last commit before shutting down. That doesn't make sense on a subscriber because the upstream is likely still generating txns. We just want to avoid wasting our effort on any in-flight txn.
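To make the intended worker-side behaviour concrete, here is a minimal sketch in plain C. It is a toy model, not PostgreSQL code: the message types, the `apply_until_safe_exit` function and the `shutdown_at` parameter are all made up for illustration. It only shows the rule "once shutdown is pending, finish the in-flight txn and stop at the next commit boundary; don't start a new txn".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified view of the change stream seen by an apply worker. */
typedef enum { MSG_BEGIN, MSG_CHANGE, MSG_COMMIT } MsgType;

/*
 * Process messages until it is safe to exit.
 *
 * shutdown_at models the point at which the "shutdown pending" notice becomes
 * visible: it is seen before processing msgs[shutdown_at] and every later
 * message.  Returns the number of messages consumed before exiting.
 */
size_t
apply_until_safe_exit(const MsgType *msgs, size_t n, size_t shutdown_at)
{
    size_t i = 0;
    bool   in_txn = false;

    assert(msgs != NULL || n == 0);

    while (i < n)
    {
        bool shutdown_requested = (i >= shutdown_at);

        /* Only exit between transactions; never abandon an in-flight txn. */
        if (shutdown_requested && !in_txn)
            break;

        switch (msgs[i])
        {
            case MSG_BEGIN:
                in_txn = true;
                break;
            case MSG_CHANGE:
                /* apply the change locally (omitted in this sketch) */
                break;
            case MSG_COMMIT:
                in_txn = false;
                break;
        }
        i++;
    }
    return i;
}
```

Note that, unlike the walsender case, the loop makes no attempt to catch up with the upstream: it stops at the first commit boundary after the notice arrives.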

Any objections?

Proposed implementation:

* Add new bgworker flag like BGW_DELAYED_SHUTDOWN

* Define new ProcSignal PROCSIG_SHUTDOWN_REQUESTED. On fast shutdown send this instead of a SIGTERM to bgworker backends flagged BGW_DELAYED_SHUTDOWN. On smart shutdown send it to all backends when the shutdown request arrives, since that could be handy for other uses too.

* A flagged bgworker is expected to finish its current txn and exit promptly. Impose a grace period after which it gets SIGTERM'd anyway. Also send a SIGTERM if the postmaster receives a second fast shutdown request.

* Defer sending PROCSIG_WALSND_INIT_STOPPING to walsenders until all BGW_DELAYED_SHUTDOWN flagged bgworkers have exited, so we can ensure that cascaded downstreams receive any txns applied from the upstream.

This doesn't look likely to be particularly complicated to implement.
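A rough sketch of the decision logic from the list above, again as standalone C rather than actual postmaster code. The enum values, the `WorkerInfo` struct and both function names are hypothetical; only BGW_DELAYED_SHUTDOWN and PROCSIG_SHUTDOWN_REQUESTED correspond to names proposed in this mail, and neither exists yet.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { SHUTDOWN_SMART, SHUTDOWN_FAST } ShutdownMode;

/* What the postmaster would send to one backend (hypothetical names). */
typedef enum
{
    ACT_SIGTERM,
    ACT_PROCSIG_SHUTDOWN_REQUESTED
} ShutdownAction;

typedef struct
{
    /* true if the worker was registered with the proposed BGW_DELAYED_SHUTDOWN */
    bool delayed_shutdown;
} WorkerInfo;

ShutdownAction
choose_shutdown_action(ShutdownMode mode, const WorkerInfo *w,
                       bool grace_expired, bool second_fast_request)
{
    assert(w != NULL);

    /* Smart shutdown: everyone just gets the advisory notice. */
    if (mode == SHUTDOWN_SMART)
        return ACT_PROCSIG_SHUTDOWN_REQUESTED;

    /* Fast shutdown: unflagged backends are terminated as today. */
    if (!w->delayed_shutdown)
        return ACT_SIGTERM;

    /* Flagged workers lose their grace on timeout or a repeated request. */
    if (grace_expired || second_fast_request)
        return ACT_SIGTERM;

    return ACT_PROCSIG_SHUTDOWN_REQUESTED;
}

/* Walsenders are only told to stop once no flagged worker remains. */
bool
walsenders_may_be_stopped(int flagged_workers_alive)
{
    return flagged_workers_alive == 0;
}
```

The grace-period and second-request checks are folded into one function here purely for brevity; where they would actually live in the postmaster state machine is left open.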

It might be better to use a flag in PGPROC rather than the bgworker struct, in case we want to extend this to other backend types in future. Also to make it easier for the postmaster to check the flag during shutdown. Could just claim a bit from statusFlags for the purpose. Thoughts?
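For the statusFlags variant, the postmaster-side check could look roughly like the following. This is a standalone sketch: `ProcEntry` is a stand-in for PGPROC, and the `PROC_DELAYED_SHUTDOWN` name and its bit value are assumptions chosen for illustration, not checked against the bits already allocated in proc.h.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag bit; the real value would have to be a free bit in proc.h. */
#define PROC_DELAYED_SHUTDOWN 0x40

/* Stand-in for the part of PGPROC that matters here. */
typedef struct
{
    uint8_t statusFlags;
} ProcEntry;

/*
 * Count the processes that still hold the delayed-shutdown flag.  The
 * postmaster would keep walsenders running while this is non-zero.
 */
size_t
count_delayed_shutdown_procs(const ProcEntry *procs, size_t nprocs)
{
    size_t count = 0;

    assert(procs != NULL || nprocs == 0);

    for (size_t i = 0; i < nprocs; i++)
    {
        if (procs[i].statusFlags & PROC_DELAYED_SHUTDOWN)
            count++;
    }
    return count;
}
```

Because the flag lives on the process entry rather than in the bgworker registration, the same scan would work unchanged if other backend types were allowed to set it later.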
