Re: BUG #17695: Failed Assert in logical replication snapbuild.
От | Masahiko Sawada |
---|---|
Тема | Re: BUG #17695: Failed Assert in logical replication snapbuild. |
Дата | |
Msg-id | CAD21AoCsxvV3Mpzv8Q3hG7CdQZ7vSBWyEFLgY3QZARRDcgzVwA@mail.gmail.com обсуждение исходный текст |
Ответ на | BUG #17695: Failed Assert in logical replication snapbuild. (PG Bug reporting form <noreply@postgresql.org>) |
Список | pgsql-bugs |
On Thu, Nov 24, 2022 at 7:28 PM PG Bug reporting form <noreply@postgresql.org> wrote: > > The following bug has been logged on the website: > > Bug reference: 17695 > Logged by: 施博文 > Email address: zxwsbg@qq.com > PostgreSQL version: 14.6 > Operating system: centos > Description: > > In PG14 or higher version, I notice that SnapBuildRestore don't set > builder->next_phase_at=InvalidTransactionId . > > But in SnapBuildSerialize function, the assert check this condition. > > Assert(builder->next_phase_at == InvalidTransactionId); > > This would cause some problems, and I have repeat it with the perl test case > which I would update later, the problem is following: > > TRAP: FailedAssertion("builder->next_phase_at == InvalidTransactionId", > File: "snapbuild.c", Line: 1604, PID: 29974) > postgres: master: walsender postgres [local] > START_REPLICATION(ExceptionalCondition+0xb9)[0xb1c9bd] > postgres: master: walsender postgres [local] START_REPLICATION[0x8f548d] > postgres: master: walsender postgres [local] > START_REPLICATION(SnapBuildProcessRunningXacts+0x55)[0x8f4c5c] > postgres: master: walsender postgres [local] START_REPLICATION[0x8dd8be] > postgres: master: walsender postgres [local] > START_REPLICATION(LogicalDecodingProcessRecord+0xd1)[0x8dd243] > postgres: master: walsender postgres [local] START_REPLICATION[0x915eb1] > postgres: master: walsender postgres [local] START_REPLICATION[0x91520c] > postgres: master: walsender postgres [local] START_REPLICATION[0x913bdd] > postgres: master: walsender postgres [local] > START_REPLICATION(exec_replication_command+0x42c)[0x914593] > postgres: master: walsender postgres [local] > START_REPLICATION(PostgresMain+0x7be)[0x984ddf] > postgres: master: walsender postgres [local] START_REPLICATION[0x8c0d41] > postgres: master: walsender postgres [local] START_REPLICATION[0x8c06b3] > postgres: master: walsender postgres [local] START_REPLICATION[0x8bc9c4] > postgres: master: walsender postgres [local] > START_REPLICATION(PostmasterMain+0x117a)[0x8bc29b] > postgres: master: walsender postgres [local] START_REPLICATION[0x7bdaf9] > /lib64/libc.so.6(__libc_start_main+0xf5)[0x7f815dd92555] > postgres: master: walsender postgres [local] START_REPLICATION[0x488d09] > Thank you for reporting the issue! I could reproduce this issue with a small change and the following steps: 1. Add sleep after setting SNAPBUILD_BUILDING_SNAPSHOT. @@ -1409,6 +1409,9 @@ SnapBuildFindSnapshot(SnapBuild *builder, XLogRecPtr lsn, xl_running_xacts *runn running->xcnt, running->nextXid))); SnapBuildWaitSnapshot(running, running->nextXid); + + elog(LOG, "sleep 10s"); + pg_usleep(10 * 1000000); } 2. Create and start database instance and create a table and the replication slot 's1'. create table test (i int); select pg_logical_replication_slot('s1', 'test_decoding'); 3. Create another replication slot "s2" while a transaction is running. tx-1: begin; insert into test values (1); tx-2: select pg_logical_replication_slot('s2', 'test_decoding'); checkpoint; commit; Note that replication slot creation will wait for 10 sec after commiting the tx-1, please go to the next step once the slot has been created. 4. Start decoding on slot 's2'. select pg_logical_slot_get_changes('s2', null, null); The logical decoding will sleep. Please go to the next step after seeing the log "sleep 10s". 5. While the logical decoding started at step 4 is sleeping, start decoding on slot 's1'. select pg_logical_slot_get_changes('s1', null, null); 6. After the decoding on slot 's1' wakes up, it fails due to the assertion check. This scenario simulates the case where the logical decoding (re)starts where RUNNING_XACTS record having a running transactions and it restores the serialized snapshot when decoding the next RUNNING_XACTS record. Since we don't reset builder->next_phase_at when restoring a serialized snapshot, the assertion check in SnapBuildSerialize fails. Regarding the proposed patch, I've confirmed it fixes this issue. But I think it's better to reset builder->next_phase_at right after the following assertion check: /* consistent snapshots have no next phase */ Assert(ondisk.builder.next_phase_at == InvalidTransactionId); I could not reproduce this issue with your script in my environment. I think it's better to include the reproducible test case in the patch but I'm not sure how to do that without adding sleep/gdb attach. Regards, -- Masahiko Sawada Amazon Web Services: https://aws.amazon.com
В списке pgsql-bugs по дате отправления:
Предыдущее
От: Tom LaneДата:
Сообщение: Re: BUG #17696: Creation of deferrable initially deferred constraint breaks before transaction commit
Следующее
От: "Hayato Kuroda (Fujitsu)"Дата:
Сообщение: RE: BUG #17695: Failed Assert in logical replication snapbuild.