RE: Synchronizing slots from primary to standby

Поиск
Список
Период
Сортировка
От Zhijie Hou (Fujitsu)
Тема RE: Synchronizing slots from primary to standby
Дата
Msg-id OS0PR01MB57162B67D3CB01B2756FBA6D94062@OS0PR01MB5716.jpnprd01.prod.outlook.com
обсуждение исходный текст
Ответ на Re: Synchronizing slots from primary to standby  (Amit Kapila <amit.kapila16@gmail.com>)
Ответы Re: Synchronizing slots from primary to standby  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-hackers
On Thursday, April 4, 2024 5:37 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
> 
> BTW, while thinking on this one, I
> noticed that in the function LogicalConfirmReceivedLocation(), we first update
> the disk copy, see comment [1] and then in-memory whereas the same is not
> true in
> update_local_synced_slot() for the case when snapshot exists. Now, do we have
> the same risk here in case of standby? Because I think we will use these xmins
> while sending the feedback message (in XLogWalRcvSendHSFeedback()).
>
> * We have to write the changed xmin to disk *before* we change
> * the in-memory value, otherwise after a crash we wouldn't know
> * that some catalog tuples might have been removed already.

Yes, I think we have the risk on the standby, I can reproduce the case that if
the server crashes after updating the in-memory value and before saving them to
disk, the synced slot could be invalidated after restarting from crash, because
the necessary rows have been removed on the primary. The steps can be found in
[1].

I think we'd better fix the order in update_local_synced_slot() as well. I
tried to make the fix in 0002, 0001 is Shveta's patch to fix another issue in this thread. Since
they are touching the same function, so attach them together for review.

[1]
-- Primary:
SELECT 'init' FROM pg_create_logical_replication_slot('logicalslot', 'test_decoding', false, false, true);

-- Standby:
SELECT 'init' FROM pg_create_logical_replication_slot('standbylogicalslot', 'test_decoding', false, false, false);
SELECT pg_sync_replication_slots();

-- Primary:
CREATE TABLE test (a int);
INSERT INTO test VALUES(1);
DROP TABLE test;

SELECT txid_current();
SELECT txid_current();
SELECT txid_current();
SELECT pg_log_standby_snapshot();

SELECT pg_replication_slot_advance('logicalslot', pg_current_wal_lsn());

-- Standby:
- wait for standby to replay all the changes on the primary.

- this is to serialize snapshots.
SELECT pg_replication_slot_advance('standbylogicalslot', pg_last_wal_replay_lsn());

- Use gdb to stop at the place after calling ReplicationSlotsComputexx()
  functions and before calling ReplicationSlotSave().
SELECT pg_sync_replication_slots();

-- Primary:

- First, wait for the primary slot(the physical slot)'s catalog xmin to be
  updated to the same as the failover slot.

VACUUM FULL;

- Wait for VACUMM FULL to be replayed on standby.

-- Standby:

- For the process which is blocked by gdb, let the process crash (elog(PANIC,
  ...)).

After restarting the standby from crash, we can see the synced slot is invalidated.

LOG:  invalidating obsolete replication slot "logicalslot"
DETAIL:  The slot conflicted with xid horizon 741.
CONTEXT:  WAL redo at 0/3059B90 for Heap2/PRUNE_ON_ACCESS: snapshotConflictHorizon: 741, isCatalogRel: T, nplans: 0,
nredirected:0, ndead: 7, nunused: 0, dead: [22, 23, 24, 25, 26, 27, 28]; blkref #0: rel 1663/5/1249, blk 16
 


Best Regards,
Hou zj

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Etsuro Fujita
Дата:
Сообщение: [MASSMAIL]Comment about handling of asynchronous requests in postgres_fdw.c
Следующее
От: Alexander Lakhin
Дата:
Сообщение: Re: Add SPLIT PARTITION/MERGE PARTITIONS commands