Re: Exit walsender before confirming remote flush in logical replication
| From | Vitaly Davydov |
|---|---|
| Subject | Re: Exit walsender before confirming remote flush in logical replication |
| Date | |
| Msg-id | e25567b4-9893-48bf-ac17-0e884f1acef9@postgrespro.ru |
| In reply to | Re: Exit walsender before confirming remote flush in logical replication (Fujii Masao <masao.fujii@gmail.com>) |
| Responses | Re: Exit walsender before confirming remote flush in logical replication |
| List | pgsql-hackers |
Dear Hackers,

I think I have reproduced the test failure. The test fails because the
walsender is stuck waiting in WalSndDoneImmediate -> ereport, with the
stack shown below. It appears to be trying to send the message to the
replica and flush it, but the replica is hung.

#0  0x00007a4b37f2a037 in epoll_wait
#1  0x000056855317a2e8 in WaitEventSetWaitBlock
#2  WaitEventSetWait
#3  0x0000568552feea8e in secure_write
#4  0x0000568552ff5666 in internal_flush_buffer
#5  0x0000568552ff5966 in internal_flush
#6  socket_flush ()
#7  socket_flush ()
#8  0x00005685532ff1b3 in send_message_to_frontend (edata=<optimized out>)
#9  EmitErrorReport ()
#10 0x00005685532ff6dd in errfinish
#11 0x000056855312cc9c in WalSndDoneImmediate () at walsender.c:3625

I would propose removing the ereport call from WalSndDoneImmediate.

With best regards,
Vitaly

On 1/19/26 15:41, Fujii Masao wrote:
> On Sun, Jan 18, 2026 at 1:20 AM Andrey Silitskiy
> <a.silitskiy@postgrespro.ru> wrote:
>>
>> On Jan 9, 2026 at 10:04 AM Fujii Masao
>> <masao(dot)fujii(at)gmail(dot)com> wrote:
>>> Why do we need to send a "done" message to the receiver here?
>>> Since delivery isn't guaranteed in immediate mode, it seems of limited
>>> value.
>>
>> It seems to me that it is better to send a message in cases where it is
>> possible, so as not to raise errors on the subscriber during a clean shutdown.
>> And when this is not possible, exit the process without waiting.
>>
>>> For the immediate mode, would it make sense to log that the walsender is
>>> terminating in immediate mode and that WAL replication may be incomplete,
>>> so users can more easily understand what happened?
>>
>> Added to the latest patch.
>
> Thanks for updating the patch!
>
> cfbot is reporting a test failure. Could you please look into it and
> fix the issue?
> https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F6234
>
> Regards,
>
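
For illustration only, the hang reported in this message (a flush toward a peer that has stopped reading) can be reproduced outside PostgreSQL with a plain socket pair. The sketch below is not PostgreSQL code and is not taken from the patch; the helper names send_best_effort and make_stuck_socket are hypothetical. It assumes a POSIX system and shows one possible way to send a final message on a best-effort basis: write in non-blocking mode and give up after a bounded wait instead of blocking until the peer drains its buffer.

```c
#include <errno.h>
#include <fcntl.h>
#include <poll.h>
#include <stdbool.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical helper (not a PostgreSQL function): try to hand the whole
 * message to the kernel. Whenever the socket is not writable, wait at most
 * timeout_ms for it to become writable, then give up. Returns true only if
 * every byte was written. Intended for a non-blocking fd.
 */
static bool
send_best_effort(int fd, const char *msg, size_t len, int timeout_ms)
{
    size_t sent = 0;

    while (sent < len)
    {
        ssize_t n = write(fd, msg + sent, len - sent);

        if (n > 0)
        {
            sent += (size_t) n;
            continue;
        }
        if (n < 0 && errno == EINTR)
            continue;
        if (n < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        {
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            int rc = poll(&pfd, 1, timeout_ms);

            /* Retry only if the socket became writable without an error. */
            if (rc > 0 && (pfd.revents & (POLLERR | POLLHUP)) == 0)
                continue;
            if (rc < 0 && errno == EINTR)
                continue;
            return false;       /* timed out, or the connection is broken */
        }
        return false;           /* any other write error */
    }
    return true;
}

/*
 * Hypothetical helper simulating a hung replica: create a socket pair and
 * fill the writer's send buffer while the peer never reads. Returns the
 * non-blocking writer fd (peer fd in *peer_fd), or -1 on failure.
 */
static int
make_stuck_socket(int *peer_fd)
{
    int sv[2];
    int flags;
    char chunk[4096] = {0};

    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    flags = fcntl(sv[0], F_GETFL, 0);
    if (flags < 0 || fcntl(sv[0], F_SETFL, flags | O_NONBLOCK) != 0)
    {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    /* Keep writing until the kernel refuses more data (EAGAIN). */
    for (;;)
    {
        ssize_t n = write(sv[0], chunk, sizeof(chunk));

        if (n < 0 && errno != EINTR)
            break;
    }

    *peer_fd = sv[1];
    return sv[0];
}
```

With a socket prepared by make_stuck_socket, a blocking write would wait indefinitely, which matches the epoll_wait frame at the top of the backtrace. send_best_effort returns false after the timeout instead, so the caller can proceed with shutdown. Whether a bounded wait or dropping the message entirely is preferable for the walsender is a design question for the patch discussion, not something this sketch settles.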