RE: How can end users know the cause of LR slot sync delays?
| From | Zhijie Hou (Fujitsu) |
|---|---|
| Subject | RE: How can end users know the cause of LR slot sync delays? |
| Date | |
| Msg-id | TY4PR01MB169070A4CA8D544ACDFB1191094D1A@TY4PR01MB16907.jpnprd01.prod.outlook.com |
| In reply to | RE: How can end users know the cause of LR slot sync delays? ("Hayato Kuroda (Fujitsu)" <kuroda.hayato@fujitsu.com>) |
| List | pgsql-hackers |
On Tuesday, November 25, 2025 6:30 PM Kuroda, Hayato <kuroda.hayato@fujitsu.com> wrote:
>
> Dear Hou, Amit,
>
> > Right, I agree. Here is the patch to release the slot at necessary places.
>
> Thanks for working on it. However, BF machines have not satisfied the fix yet.
> There are still two failures after 3df4df53b06 [1] [2].
>
> The reported issue was that standby server failed to synchronize the slot after
> the slot is re-created on the primary. According to [1], slots on standby has
> newer catalog xmin than primary. Like:
>
> ```
> LOG: could not synchronize replication slot "lsub1_slot"
> DETAIL: Synchronization could lead to data loss, because the remote slot
> needs WAL at LSN 0/030163A8 and catalog xmin 758, but the standby has
> LSN 0/030163A8 and catalog xmin 759.
> ```
>
> Per analysis, the newly created logical slot on primary has the initial
> catalog_xmin as 758 due to the physical slot holding catalog_xmin:758. The
> standby does not have slots, so the new slot will have the latest xid (759) as
> catalog_xmin.
>
> Anyway, I think this is a test issue.

The issue is that the physical slot on the primary retains a catalog_xmin of 758, so newly created slots inherit the same catalog_xmin. The standby, which has no slots, assigns an initial catalog_xmin of 759 to newly synced slots. Because the logical slot on the primary isn't being consumed, its catalog_xmin never advances, and the test times out.

Previously, we avoided this issue by intentionally preventing xid assignment during the slotsync tests, so that xmin/catalog_xmin remained static in most cases. However, the new test runs some DDL in between the tests, which causes this issue.

Rather than adding more wait events for control, we discussed relocating the test to the end, after promoting the standby, where syncing the slot successfully isn't necessary. Since the test's goal is solely to verify the slotsync skip statistics, this approach should suffice.

Here is the patch to modify the test.

> [1]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=scorpion&dt=2025-11-25%2009%3A03%3A17
> [2]: https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=grassquit&dt=2025-11-25%2009%3A01%3A08

Best Regards,
Hou zj
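
To make the quoted log message concrete, here is a minimal Python sketch of the comparison it implies: the sync is skipped when the remote (primary) slot needs older WAL or an older catalog xmin than the standby's local slot holds. This is a simplified model, not the actual slotsync code. `SlotState`, `parse_lsn`, `xid_precedes`, and `remote_is_behind` are hypothetical names introduced for illustration, and the xid comparison assumes normal (non-special) transaction ids compared modulo 2^32.

```python
from dataclasses import dataclass


@dataclass
class SlotState:
    restart_lsn: int   # WAL position, as a 64-bit integer
    catalog_xmin: int  # oldest xid whose catalog rows must be retained


def parse_lsn(text: str) -> int:
    """Convert an LSN written as 'hi/lo' in hex (e.g. '0/030163A8') to an integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def xid_precedes(a: int, b: int) -> bool:
    """Wraparound-aware 'a is older than b' for normal 32-bit xids.

    Assumption: special xids (below 3) are ignored in this sketch.
    """
    diff = (a - b) & 0xFFFFFFFF
    return diff >= 0x80000000  # signed 32-bit difference is negative


def remote_is_behind(remote: SlotState, local: SlotState) -> bool:
    """True when the remote slot needs WAL or catalog rows the standby may no longer hold."""
    return (remote.restart_lsn < local.restart_lsn
            or xid_precedes(remote.catalog_xmin, local.catalog_xmin))


# Values taken from the log message quoted in this thread.
lsn = parse_lsn("0/030163A8")
remote = SlotState(restart_lsn=lsn, catalog_xmin=758)  # slot on the primary
local = SlotState(restart_lsn=lsn, catalog_xmin=759)   # slot on the standby
print(remote_is_behind(remote, local))
```

With the values from the log, the LSNs are equal, so the catalog xmin alone (758 on the primary versus 759 on the standby) is what makes the check report the remote slot as behind. Until the primary's catalog_xmin advances past that point, the condition stays true, which matches the timeout described above.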
Attachments