Re: speed up a logical replica setup
От | Alexander Lakhin |
---|---|
Тема | Re: speed up a logical replica setup |
Дата | |
Msg-id | 68de6498-0449-a113-dd03-e198dded0bac@gmail.com обсуждение исходный текст |
Ответ на | Re: speed up a logical replica setup (Amit Kapila <amit.kapila16@gmail.com>) |
Ответы |
Re: speed up a logical replica setup
|
Список | pgsql-hackers |
Hello Amit and Hou-San, 11.07.2024 13:21, Amit Kapila wrote: > > We don't wait for the xmin to catch up corresponding to this insert > and I don't know if there is a way to do that. So, we should move this > Insert to after the call to pg_sync_replication_slots(). It won't > impact the general test of pg_createsubscriber. > > Thanks to Hou-San for helping me in the analysis of this BF failure. Thank you for investigating that issue! May I ask you to look at another failure of the test occurred today [1]? The failure log contains: recovery_target_lsn = '0/30098D0' pg_createsubscriber: starting the subscriber ... pg_createsubscriber: server was started pg_createsubscriber: waiting for the target server to reach the consistent state ... 2024-07-11 07:40:10.837 UTC [2948531][postmaster][:0] LOG: received fast shutdown request Though what I'm seeing after a successful run is: recovery_target_lsn = '0/3009860' pg_createsubscriber: starting the subscriber ... pg_createsubscriber: server was started pg_createsubscriber: waiting for the target server to reach the consistent state ... 2024-07-11 15:19:40.733 UTC [207517] 040_pg_createsubscriber.pl LOG: statement: SELECT pg_catalog.pg_is_in_recovery() 2024-07-11 15:19:41.635 UTC [207514] LOG: recovery stopping after WAL location (LSN) "0/3009860" 2024-07-11 15:19:41.635 UTC [207514] LOG: redo done at 0/3009860 system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 21.00 s I've managed to reproduce the failure locally. With the bgwriter mod: -#define LOG_SNAPSHOT_INTERVAL_MS 15000 +#define LOG_SNAPSHOT_INTERVAL_MS 150 and wal_debug=on, when running 5 test instances with parallel, I get the failure with the following log: recovery_target_lsn = '0/3009A20' pg_createsubscriber: starting the subscriber 2024-07-11 14:40:04.551 UTC [205589:72][startup][33/0:0] LOG: REDO @ 0/30099E8; LSN 0/3009A20: prev 0/30099B0; xid 0; len 24 - Standby/RUNNING_XACTS: nextXid 747 latestCompletedXid 746 oldestRunningXid 747 # ^^^ the last REDO record in the log ... pg_createsubscriber: server was started pg_createsubscriber: waiting for the target server to reach the consistent state ... pg_createsubscriber: server was stopped pg_createsubscriber: error: recovery timed out ... [14:43:06.011](181.800s) not ok 30 - run pg_createsubscriber on node S [14:43:06.012](0.000s) [14:43:06.012](0.000s) # Failed test 'run pg_createsubscriber on node S' # at t/040_pg_createsubscriber.pl line 356. $ pg_waldump -p src/bin/pg_basebackup_1/tmp_check/t_040_pg_createsubscriber_node_s_data/pgdata/pg_wal/ 000000010000000000000003 000000010000000000000003 | tail -2 rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/030099B0, prev 0/03009948, desc: RUNNING_XACTS nextXid 747 latestCompletedXid 746 oldestRunningXid 747 rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/030099E8, prev 0/030099B0, desc: RUNNING_XACTS nextXid 747 latestCompletedXid 746 oldestRunningXid 747 Whilst $ pg_waldump -p src/bin/pg_basebackup_1/tmp_check/t_040_pg_createsubscriber_node_p_data/pgdata/pg_wal/ 000000010000000000000003 000000010000000000000003 | tail rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/030099B0, prev 0/03009948, desc: RUNNING_XACTS nextXid 747 latestCompletedXid 746 oldestRunningXid 747 rmgr: Standby len (rec/tot): 50/ 50, tx: 0, lsn: 0/030099E8, prev 0/030099B0, desc: RUNNING_XACTS nextXid 747 latestCompletedXid 746 oldestRunningXid 747 rmgr: Heap2 len (rec/tot): 60/ 60, tx: 747, lsn: 0/03009A20, prev 0/030099E8, desc: NEW_CID rel: 1663/16384/6104, tid: 0/1, cmin: 4294967295, cmax: 0, combo: 4294967295 rmgr: Heap len (rec/tot): 54/ 54, tx: 747, lsn: 0/03009A60, prev 0/03009A20, desc: DELETE xmax: 747, off: 1, infobits: [KEYS_UPDATED], flags: 0x00, blkref #0: rel 1663/16384/6104 blk 0 rmgr: Transaction len (rec/tot): 78/ 78, tx: 747, lsn: 0/03009A98, prev 0/03009A60, desc: INVALIDATION ; inval msgs: catcache 49 catcache 46 relcache 0 rmgr: Transaction len (rec/tot): 98/ 98, tx: 747, lsn: 0/03009AE8, prev 0/03009A98, desc: COMMIT 2024-07-11 14:43:05.994561 UTC; relcache init file inval dbid 16384 tsid 1663; inval msgs: catcache 49 catcache 46 relcache 0 rmgr: Heap2 len (rec/tot): 60/ 60, tx: 748, lsn: 0/03009B50, prev 0/03009AE8, desc: NEW_CID rel: 1663/16385/6104, tid: 0/1, cmin: 4294967295, cmax: 0, combo: 4294967295 rmgr: Heap len (rec/tot): 54/ 54, tx: 748, lsn: 0/03009B90, prev 0/03009B50, desc: DELETE xmax: 748, off: 1, infobits: [KEYS_UPDATED], flags: 0x00, blkref #0: rel 1663/16385/6104 blk 0 rmgr: Transaction len (rec/tot): 78/ 78, tx: 748, lsn: 0/03009BC8, prev 0/03009B90, desc: INVALIDATION ; inval msgs: catcache 49 catcache 46 relcache 0 rmgr: Transaction len (rec/tot): 98/ 98, tx: 748, lsn: 0/03009C18, prev 0/03009BC8, desc: COMMIT 2024-07-11 14:43:06.008619 UTC; relcache init file inval dbid 16385 tsid 1663; inval msgs: catcache 49 catcache 46 relcache 0 [1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=olingo&dt=2024-07-11%2007%3A25%3A12 Best regards, Alexander
В списке pgsql-hackers по дате отправления: