Re: Improve pg_sync_replication_slots() to wait for primary to advance
From | shveta malik |
---|---|
Subject | Re: Improve pg_sync_replication_slots() to wait for primary to advance |
Date | |
Msg-id | CAJpy0uBHvq=r2uxeCrJ7vAcGM0qab8=ZtBtRk55ZE0c-yo+bAw@mail.gmail.com |
In response to | Re: Improve pg_sync_replication_slots() to wait for primary to advance (Ajin Cherian <itsajin@gmail.com>) |
Responses | Re: Improve pg_sync_replication_slots() to wait for primary to advance |
List | pgsql-hackers |

On Thu, Jul 31, 2025 at 3:11 PM Ajin Cherian <itsajin@gmail.com> wrote:
>
> Patch v3 attached.
>

Thanks for the patch. I tested it; please find a few comments:

1) It hits an assert (slotsync_reread_config()-->Assert(sync_replication_slots)) when the API is trying to sync and is in the wait loop while, in another session, I enable sync_replication_slots using:

ALTER SYSTEM SET sync_replication_slots = 'on';
SELECT pg_reload_conf();

Assert:

025-08-01 10:55:43.637 IST [118576] STATEMENT: SELECT pg_sync_replication_slots();
2025-08-01 10:55:51.730 IST [118563] LOG: received SIGHUP, reloading configuration files
2025-08-01 10:55:51.731 IST [118563] LOG: parameter "sync_replication_slots" changed to "on"
TRAP: failed Assert("sync_replication_slots"), File: "slotsync.c", Line: 1334, PID: 118576
postgres: shveta postgres [local] SELECT(ExceptionalCondition+0xbb)[0x61df0160e090]
postgres: shveta postgres [local] SELECT(+0x6520dc)[0x61df0133a0dc]
2025-08-01 10:55:51.739 IST [118666] ERROR: cannot synchronize replication slots concurrently
postgres: shveta postgres [local] SELECT(+0x6522b2)[0x61df0133a2b2]
postgres: shveta postgres [local] SELECT(+0x650664)[0x61df01338664]
postgres: shveta postgres [local] SELECT(+0x650cf8)[0x61df01338cf8]
postgres: shveta postgres [local] SELECT(+0x6513ea)[0x61df013393ea]
postgres: shveta postgres [local] SELECT(+0x6519df)[0x61df013399df]
postgres: shveta postgres [local] SELECT(SyncReplicationSlots+0xbb)[0x61df0133af60]
postgres: shveta postgres [local] SELECT(pg_sync_replication_slots+0x1b1)[0x61df01357e52]

2)
+ ereport(ERROR,
+         (errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
+          errmsg("cannot synchronize replication slots when"
+                 " standby promotion is ongoing")));

I think a better error message would be:
"exiting from slot synchronization as promotion is triggered"

This would also be better suited to the log file, where it follows wait statements such as the one below:

LOG: continuing to wait for remote slot "failover_slot" LSN (0/3000060) and catalog xmin (755) to pass local slot LSN (0/3000060) and catalog xmin (757)
STATEMENT: SELECT pg_sync_replication_slots();

3) The API dumps this when it is waiting for the primary:

----
LOG: could not synchronize replication slot "failover_slot2"
DETAIL: Synchronization could lead to data loss, because the remote slot needs WAL at LSN 0/03066E70 and catalog xmin 755, but the standby has LSN 0/03066E70 and catalog xmin 770.
STATEMENT: SELECT pg_sync_replication_slots();
LOG: waiting for remote slot "failover_slot2" LSN (0/3066E70) and catalog xmin (755) to pass local slot LSN (0/3066E70) and catalog xmin (770)
STATEMENT: SELECT pg_sync_replication_slots();
LOG: continuing to wait for remote slot "failover_slot2" LSN (0/3066E70) and catalog xmin (755) to pass local slot LSN (0/3066E70) and catalog xmin (770)
STATEMENT: SELECT pg_sync_replication_slots();
----

I am unsure whether we should still dump 'could not synchronize..' when the API is going to retry until it succeeds. That log message gives the impression that we are done trying and could not synchronize the slot. What do you think?

thanks
Shveta
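
To illustrate comment 1, here is a minimal, self-contained C sketch of how a config-reread routine might tell the two callers apart. It is not PostgreSQL code and not taken from the patch. The names SyncCaller, RereadAction, and reread_config_sketch are hypothetical, and the two booleans only model the value of sync_replication_slots before and after the reload. It assumes the failing assertion encodes "only the slot sync worker gets here", which no longer holds once the SQL function can sit in a wait loop and process a reload. Stopping the API call when the setting turns on is just one possible choice.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for illustration; these names do not exist in PostgreSQL. */
typedef enum SyncCaller
{
    SYNC_CALLER_WORKER,     /* the background slot sync worker */
    SYNC_CALLER_SQL_API     /* a backend running pg_sync_replication_slots() */
} SyncCaller;

typedef enum RereadAction
{
    REREAD_CONTINUE,        /* keep synchronizing or waiting */
    REREAD_STOP             /* stop; someone else owns synchronization now */
} RereadAction;

/*
 * Decide what to do after a configuration reload.
 * old_sync and new_sync model sync_replication_slots before and after it.
 */
static RereadAction
reread_config_sketch(SyncCaller caller, bool old_sync, bool new_sync)
{
    if (caller == SYNC_CALLER_WORKER)
    {
        /* The worker only runs while the setting is on, so this is safe. */
        assert(old_sync);
        return new_sync ? REREAD_CONTINUE : REREAD_STOP;
    }

    /*
     * SQL API caller: the setting may legitimately be off here, so no
     * assertion. If it was just turned on, assume the worker takes over
     * and the API call should stop waiting.
     */
    if (!old_sync && new_sync)
        return REREAD_STOP;

    return REREAD_CONTINUE;
}
```

In the scenario reported above, the call would be reread_config_sketch(SYNC_CALLER_SQL_API, false, true). In this sketch it returns REREAD_STOP rather than tripping an assertion.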