Re: [PATCH] Fix fragile walreceiver test.
| From | Xuneng Zhou |
|---|---|
| Subject | Re: [PATCH] Fix fragile walreceiver test. |
| Date | |
| Msg-id | CABPTF7WCWqQ2DrioSbUAShZk9Qm7Expf6NU6b9=97vQnNU7yGw@mail.gmail.com |
| In reply to | Re: [PATCH] Fix fragile walreceiver test. (Michael Paquier <michael@paquier.xyz>) |
| List | pgsql-hackers |

Hi,

On Wed, Nov 5, 2025 at 3:56 PM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Wed, Nov 05, 2025 at 03:30:30PM +0800, Xuneng Zhou wrote:
> > On Wed, Nov 5, 2025 at 2:50 PM Michael Paquier <michael@paquier.xyz> wrote:
> >> Timing issue then, the buildfarm has not been complaining on this one
> >> AFAIK, there have been no recoveryCheck failures reported:
> >> https://buildfarm.postgresql.org/cgi-bin/show_failures.pl
>
> drongo has just reported one failure, so I stand corrected:
> https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=drongo&dt=2025-11-05%2003%3A50%3A50
>
> And one log rotation should be enough before the restart.
>
> >> Hmm. The reason why I didn't use a PID matching check (mentioned at
> >> [1]) is that this is not entirely bullet-proof. On a very slow
> >> machine, one could assume that standby_1 generates some records and
> >> that these are replayed by standby_2 *before* the PID of the WAL
> >> receiver is retrieved. This could lead to false positives in some
> >> cases, and a bunch of buildfarm members are very slow. You have a
> >> point that these would unlikely happen in normal runs, so a PID
> >> matching check would be relevant most of the time anyway, even if the
> >> original PID has been fetched after the TLI jump has been processed in
> >> standby_2. I'd rather keep the log check, TBH, bypassing it with an
> >> extra rotate_logfile() before the restart of standby_2.
> >
> > I’ve also prepared a patch for this method.
>
> That's exactly what I have done a couple of minutes ago, and noticed
> your message before applying the fix so I've listed you are a
> co-author on this one.
>

Thanks.

> I have also kept the PID check after pondering a bit about it. A TLI
> jump could be replayed before we grab the initial PID, but in most
> cases it should be able to do its work correctly.

Checking the PID seems straightforward, and it mostly makes sense to me.

Best,
Xuneng