Discussion: Historic snapshot doesn't track txns committed in BUILDING_SNAPSHOT state


Historic snapshot doesn't track txns committed in BUILDING_SNAPSHOT state

From
"cca5507"
Date:
Hello hackers, I found that we currently don't track txns committed in
BUILDING_SNAPSHOT state because of the code in xact_decode():
    /*
     * If the snapshot isn't yet fully built, we cannot decode anything, so
     * bail out.
     */
    if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
        return;

This can cause a txn to take an incorrect historic snapshot and result in an
interruption of logical replication. Consider the following scenario:
(pub)create table t1 (id int primary key);
(pub)insert into t1 values (1);
(pub)create publication pub for table t1;
(sub)create table t1 (id int primary key);
(pub)begin; insert into t1 values (2); (txn1 in session1)
(sub)create subscription sub connection 'hostaddr=127.0.0.1 port=5432 user=xxx dbname=postgres' publication pub; (pub will switch to BUILDING_SNAPSHOT state soon)
(pub)begin; insert into t1 values (3); (txn2 in session2)
(pub)create table t2 (id int primary key); (session3)
(pub)commit; (commit txn1, and pub will switch to FULL_SNAPSHOT state soon)
(pub)begin; insert into t2 values (1); (txn3 in session3)
(pub)commit; (commit txn2, and pub will switch to CONSISTENT state soon)
(pub)commit; (commit txn3; replaying txn3 will fail because its snapshot cannot see table t2)

The output of pub's log:
ERROR: could not map filenumber "base/5/16395" to relation OID

Is this a bug? Should we also track the txns committed in BUILDING_SNAPSHOT state?
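
To make the effect of the early return easier to follow, here is a minimal toy model of the commit order in the scenario above. It is only a sketch: `ToyBuilder`, `toy_decode_commit()`, `toy_txn3_sees_t2()`, the `TOY_*` states and the xid values are all hypothetical names invented for illustration, not PostgreSQL code and not the contents of any patch. It also assumes, as a simplification, that visibility is decided purely by membership in a list of committed catalog-modifying xids.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins for the snapshot builder states named in this thread. */
typedef enum ToyState
{
    TOY_BUILDING_SNAPSHOT = 0,
    TOY_FULL_SNAPSHOT = 1,
    TOY_CONSISTENT = 2
} ToyState;

#define TOY_MAX_COMMITTED 16

/* Hypothetical, heavily simplified builder: a state plus committed catalog xids. */
typedef struct ToyBuilder
{
    ToyState    state;
    unsigned    committed[TOY_MAX_COMMITTED];
    size_t      ncommitted;
} ToyBuilder;

/*
 * Toy commit handler. With track_while_building == false it mirrors the
 * early return quoted from xact_decode(): commits seen before FULL_SNAPSHOT
 * are dropped. With true, such commits are remembered instead.
 */
static void
toy_decode_commit(ToyBuilder *b, unsigned xid, bool catalog_change,
                  bool track_while_building)
{
    if (b->state < TOY_FULL_SNAPSHOT && !track_while_building)
        return;                 /* commit is never recorded */
    if (!catalog_change)
        return;                 /* the toy only remembers catalog changes */
    assert(b->ncommitted < TOY_MAX_COMMITTED);
    b->committed[b->ncommitted++] = xid;
}

/* Simplified visibility check: is xid in the committed list? */
static bool
toy_xid_visible(const ToyBuilder *b, unsigned xid)
{
    for (size_t i = 0; i < b->ncommitted; i++)
        if (b->committed[i] == xid)
            return true;
    return false;
}

/*
 * Replays the commit order from the scenario and reports whether the
 * snapshot later used for txn3 treats CREATE TABLE t2 as committed.
 */
static bool
toy_txn3_sees_t2(bool track_while_building)
{
    enum { XID_TXN1 = 100, XID_TXN2 = 101, XID_CREATE_T2 = 102 };
    ToyBuilder  b = { TOY_BUILDING_SNAPSHOT, {0}, 0 };

    /* session3: CREATE TABLE t2 commits while still BUILDING_SNAPSHOT */
    toy_decode_commit(&b, XID_CREATE_T2, true, track_while_building);
    /* txn1 commits, after which the builder reaches FULL_SNAPSHOT */
    toy_decode_commit(&b, XID_TXN1, false, track_while_building);
    b.state = TOY_FULL_SNAPSHOT;
    /* txn2 commits, after which the builder reaches CONSISTENT */
    toy_decode_commit(&b, XID_TXN2, false, track_while_building);
    b.state = TOY_CONSISTENT;

    return toy_xid_visible(&b, XID_CREATE_T2);
}
```

In this model `toy_txn3_sees_t2(false)` returns false, which corresponds to txn3 being replayed with a snapshot that cannot see t2, while `toy_txn3_sees_t2(true)` returns true. The real snapshot builder also takes xmin/xmax and running-xacts records into account, so this only illustrates where the commit gets lost.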

--
Regards,
ChangAo Chen

Re: Historic snapshot doesn't track txns committed in BUILDING_SNAPSHOT state

From
Michael Paquier
Date:
On Sun, Jun 09, 2024 at 11:21:52PM +0800, cca5507 wrote:
> Hello hackers, I found that we currently don't track txns committed in
> BUILDING_SNAPSHOT state because of the code in xact_decode():
>     /*
>      * If the snapshot isn't yet fully built, we cannot decode anything, so
>      * bail out.
>      */
>     if (SnapBuildCurrentState(builder) < SNAPBUILD_FULL_SNAPSHOT)
>         return;
>
> The output of pub's log:
> ERROR:  could not map filenumber "base/5/16395" to relation OID
>
> Is this a bug? Should we also track the txns committed in BUILDING_SNAPSHOT state?

Clearly, this is not an error you should be able to see as a user.  So
yes, that's something that needs to be fixed.
--
Michael

Attachments
Thank you for the reply!
I am trying to fix it. This patch, which passes check-world, tracks txns
committed in BUILDING_SNAPSHOT state and fixes this bug.

--
Regards,
ChangAo Chen
Attachments