Re: Logical replication launcher did not automatically restart when got SIGKILL
From | Fujii Masao |
---|---|
Subject | Re: Logical replication launcher did not automatically restart when got SIGKILL |
Date | |
Msg-id | 1e1746d4-6273-4eb7-a870-912bf39662ee@oss.nttdata.com |
In reply to | Re: Logical replication launcher did not automatically restart when got SIGKILL (shveta malik <shveta.malik@gmail.com>) |
Responses | Re: Logical replication launcher did not automatically restart when got SIGKILL |
List | pgsql-hackers |
On 2025/07/15 19:34, shveta malik wrote:
> On Tue, Jul 15, 2025 at 2:56 PM cca5507 <cca5507@qq.com> wrote:
>>
>> Hi, hackers
>>
>> I found the $SUBJECT, the main reason is that RegisteredBgWorker::rw_pid has not been cleaned.
>>
>> Attach a patch to fix it.

Thanks for the report!

This issue appears to have been introduced by commit 28a520c0b77.
As a result, not only the logical replication launcher but also
other background workers (like autoprewarm) may fail to restart
after a server crash.

> Thank You for reporting this. The problem exists and the patch works
> as expected.
>
> In the patch, we are resetting the PID during shared memory
> initialization. Is there a better place to handle PID reset in the
> case of a SIGKILL, possibly within a cleanup flow? For example, during
> a regular shutdown, we reset the launcher (background worker) PID in
> CleanupBackend(). Or is this the only possibility?

From a quick look at the code, it seems that the second half of
CleanupBackend() is responsible for cleaning up background workers
and resetting rw_pid to 0. However, in the crash case, the function
exits immediately after calling HandleChildCrash(), skipping that cleanup:

    if (crashed)
    {
        HandleChildCrash(bp_pid, exitstatus, procname);
        return;
    }

This could be the problem? Shouldn't the background worker cleanup
still happen even in the crash case?

Regards,

-- 
Fujii Masao
NTT DATA Japan Corporation
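
To make the control flow above concrete, here is a toy model of the suspected problem. It is not PostgreSQL source: ToyBgWorker and the toy_* functions are hypothetical names, and the model assumes that a worker is only eligible for restart once its recorded PID is back to 0. It only illustrates how an early return on the crash path can leave a stale PID behind.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a registered background worker entry.
 * Only the PID field is modeled; 0 means "not running". */
typedef struct ToyBgWorker
{
    int rw_pid;
} ToyBgWorker;

/* Models a cleanup routine that returns early when the child crashed,
 * so the PID reset in its second half is never reached. */
static void
toy_cleanup_early_return(ToyBgWorker *rw, bool crashed)
{
    if (crashed)
        return;         /* crash handling would go here; rw_pid stays stale */

    rw->rw_pid = 0;     /* only reached on a normal exit */
}

/* Models a cleanup routine that resets the PID on both paths. */
static void
toy_cleanup_always_reset(ToyBgWorker *rw, bool crashed)
{
    (void) crashed;     /* crash handling would still happen elsewhere */
    rw->rw_pid = 0;
}

/* Assumption of this sketch: a worker can be restarted only if no PID
 * is recorded for it. */
static bool
toy_can_restart(const ToyBgWorker *rw)
{
    return rw->rw_pid == 0;
}
```

In this model, a worker cleaned up with toy_cleanup_early_return() after a crash keeps its old PID and toy_can_restart() stays false, while toy_cleanup_always_reset() makes it eligible again. Whether the real fix belongs in the cleanup path or in shared memory initialization is exactly the open question in this thread.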