Re: Logical replication launcher did not automatically restart when got SIGKILL
От | Fujii Masao |
---|---|
Тема | Re: Logical replication launcher did not automatically restart when got SIGKILL |
Дата | |
Msg-id | CAHGQGwHF-PdUOgiXCH_8K5qBm8b13h0Qt=dSoFXZybXQdbf-tw@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Logical replication launcher did not automatically restart when got SIGKILL (shveta malik <shveta.malik@gmail.com>) |
Ответы |
Re: Logical replication launcher did not automatically restart when got SIGKILL
|
Список | pgsql-hackers |
On Fri, Jul 25, 2025 at 5:25 PM shveta malik <shveta.malik@gmail.com> wrote: > > On Fri, Jul 25, 2025 at 7:17 AM Fujii Masao <masao.fujii@gmail.com> wrote: > > > > On Thu, Jul 24, 2025 at 6:46 PM shveta malik <shveta.malik@gmail.com> wrote: > > > Sounds reasonable. > > > Thinking out loud, when cleaning up after a backend or background > > > worker crash, process_pm_child_exit() is invoked, which subsequently > > > calls both CleanupBackend() and HandleChildCrash(). After the cleanup > > > completes, process_pm_child_exit() calls PostmasterStateMachine() to > > > move to the next state. As part of that, PostmasterStateMachine() > > > invokes ResetBackgroundWorkerCrashTimes() (only in crash > > > scenarios/FatalError), to reset a few things. Since it also resets > > > rw_worker.bgw_notify_pid, it seems reasonable to reset the rw_pid as > > > well there. > > > > Thanks! > > Attached is a patch that fixes the issue by resetting rw_pid in > > ResetBackgroundWorkerCrashTimes(). > > > > The patch LGTM. Thanks to both ChangAo and Shveta for the review! I've pushed the patch and back-patched it to v18. > > We should probably add a regression test for this case, > > but I'd prefer to commit the fix first and work on the test separately. > > Andrey Rudometov proposed a test patch in thread [1], > > which we might use as a starting point. > > Sounds good. This proposed patch adds a new regression test file to verify background worker restarts when restart_after_crash is enabled. However, since we already have 013_crash_restart.pl for testing that scenario, I’m thinking it might be sufficient to check that the logical replication launcher, i.e., the background worker, restarts properly there. For example, we could add a check like the following to 013_crash_restart.pl. Thoughts? ----------------- is($node->safe_psql('postgres', "SELECT count(*) = 1 FROM pg_stat_activity WHERE backend_type = 'logical replication launcher'"), 't', 'logical replication launcher is running after crash'); ----------------- Regards, -- Fujii Masao
В списке pgsql-hackers по дате отправления: