Re: Weird failure with latches in curculio on v15

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: Weird failure with latches in curculio on v15
Дата
Msg-id Y979NR3U5VnWrTwB@paquier.xyz
обсуждение исходный текст
Ответ на Re: Weird failure with latches in curculio on v15  (Nathan Bossart <nathandbossart@gmail.com>)
Ответы Re: Weird failure with latches in curculio on v15  (Nathan Bossart <nathandbossart@gmail.com>)
Re: Weird failure with latches in curculio on v15  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-hackers
On Sat, Feb 04, 2023 at 10:03:54AM -0800, Nathan Bossart wrote:
> Okay.  Michael, why don't we revert the shell_restore stuff for now?  Once
> the archive modules interface changes and the fix for this
> SIGTERM-during-system() problem are in, I will work through this feedback
> and give recovery modules another try.  I'm still hoping to have recovery
> modules ready in time for the v16 feature freeze.

Yes, at this stage a revert of the refactoring with shell_restore.c is
the best path forward.

From the discussion, I got the following things on top of my mind, for
reference:
- Should we include archive_cleanup_command into the recovery modules
at all?  We've discussed offloading that from the checkpointer, and it
makes the failure handling trickier when it comes to unexpected GUC
configurations, for one.  The same may actually apply to
restore_end_command.  Though it is done in the startup process now,
there may be an argument to offload that somewhere else based on the
timing of the end-of-recovery checkpoint.  My opinion on this stuff is
that only including restore_command in the modules would make most
users I know of happy enough as it removes the overhead of the command
invocation from the startup process, if able to replay things fast
enough so as the restore command is the bottleneck.
restore_end_command would be simple enough, but if there is a wish to
redesign the startup process to offload it somewhere else, then the
recovery module makes backward-compatibility concerns harder to think
about in the long-term.
- Do we need to reconsider the assumptions of the startup
process where SIGTERM enforces an immediate shutdown while running
system() for the restore command?  For example, the difference of
behavior when a restore_command uses a system sleep() that does not
react on signals from what I recall?
- Fixing the original issue of this thread may finish by impacting
what you are trying to do in this area, so fixing the original issue
first sounds  like a pre-requirement to me at the end because it may
impact the final design of the modules and their callbacks.  (I have
not looked at all the arguments raised about what to do with ~15,
still it does not look like we have a clear picture here yet.)
--
Michael

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Thomas Munro
Дата:
Сообщение: Re: File descriptors in exec'd subprocesses
Следующее
От: Michael Paquier
Дата:
Сообщение: Re: undersized unions