Re: POC: enable logical decoding when wal_level = 'replica' without a server restart
| From | Masahiko Sawada |
|---|---|
| Subject | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart |
| Date | |
| Msg-id | CAD21AoCtOoF8P3Uzy9i8yfdNztbX6w-5+bsYeLGU-Gk74oS4SA@mail.gmail.com |
| In response to | Re: POC: enable logical decoding when wal_level = 'replica' without a server restart (Amit Kapila <amit.kapila16@gmail.com>) |
| List | pgsql-hackers |
On Wed, Dec 10, 2025 at 10:19 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Wed, Dec 10, 2025 at 3:32 PM shveta malik <shveta.malik@gmail.com> wrote:
> >
> > +1. This can be reproduced as well. When the logical-decoding state is
> > cached, we may fail to log logical-info (unassigned XID case), causing
> > certain rows not to be replicated to subscribers. The steps below
> > demonstrate this.
> >
> > Backend1 of pub:
> > -------------
> > create table tab1(i int);
> > create publication pub1 for table tab1;
> >
> > BEGIN;
> > SELECT txid_current_if_assigned(); --xid not assigned yet.
> > SHOW wal_level; SHOW effective_wal_level; --replica
> >
> > --pause here and do 'Step1' mentioned below on backend2.
> > --logical decoding is now enabled except this backend.
> > --now continue with backend1:
> >
> > insert into tab1 values(20);
> > insert into tab1 values(30);
> >
> > --pause here and do 'Step2' mentioned below on backend2.
> > --now continue with backend1:
> >
> > SELECT txid_current_if_assigned(); --xid gets assigned before above insert.
> > SHOW wal_level; SHOW effective_wal_level; --it is still 'replica' in this txn.
> > COMMIT;
> >
> > Step1 (it will enable logical decoding):
> > ----------------------------
> > Backend2 of pub:
> > SELECT pg_create_logical_replication_slot('slot', 'pgoutput', false,
> > false, false);
> > show wal_level; show effective_wal_level; --logical now.
> >
> > Subscriber:
> > create table tab1(i int);
> > create subscription sub1 connection '...' publication pub1;
> >
> > Backend2 of pub: insert into tab1 values(10);
> > ----------------------------
> >
> >
> > Step2:
> > --------------------------------
> > Backend2 of pub: insert into tab1 values(40);
> > --------------------------------
> >
> > At the end after backend1 commits:
> > On pub, we have 4 rows in tab1:
> > {10}, {20}, {30}, {40}
> >
> > On sub, we have 2 rows in tab1:
> > {10}, {40}
> >
> > ~~
> >
> > If we stop caching the logical-decoding state within a transaction, we
> > may still encounter issues, because the backend could observe logical
> > decoding as disabled at one point and enabled at another.
> >
>
> I think such a problem won't happen at transaction-level if we ensure
> that transaction-level cache is initialized at the time of
> transaction-id assignment.
Right. It seems reasonable to me. Transactional operations can
consistently write either logical-level or replica-level WAL records,
whereas non-transactional operations (such as VACUUM) pick up a change
in effective_wal_level immediately.

The transaction-level cache is intended to prevent issues like the one
we had in ExecuteTruncate() and ExecuteTruncateGuts(). I guess it would
be quite confusing if XLogStandbyInfoActive() and XLogLogicalInfoActive()
behaved differently, so I think we need the transaction-level cache.
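
To make the intended behavior concrete, here is a minimal standalone
sketch of a transaction-level cache that is initialized at XID
assignment. It is a toy model, not code from the patch: every name in it
(shared_logical_enabled, absorb_barrier(), assign_xid(), and so on) is
hypothetical and only mimics the idea discussed above.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are PostgreSQL symbols. */
bool shared_logical_enabled = false;   /* cluster-wide state */
bool backend_cached_enabled = false;   /* refreshed when the barrier is absorbed */

typedef struct
{
    bool in_xact;
    bool xid_assigned;
    bool xact_logical_enabled;         /* meaningful only once xid_assigned */
} ToyXactState;

ToyXactState toy_xact = {false, false, false};

/* Models the backend processing the cache-update signal barrier. */
void absorb_barrier(void)
{
    backend_cached_enabled = shared_logical_enabled;
}

void begin_xact(void)
{
    toy_xact.in_xact = true;
    toy_xact.xid_assigned = false;
}

/* The transaction-level cache is frozen here, at XID assignment. */
void assign_xid(void)
{
    assert(toy_xact.in_xact);
    if (!toy_xact.xid_assigned)
    {
        toy_xact.xact_logical_enabled = backend_cached_enabled;
        toy_xact.xid_assigned = true;
    }
}

void end_xact(void)
{
    toy_xact.in_xact = false;
    toy_xact.xid_assigned = false;
}

/* With an XID, answer from the frozen value; otherwise follow the backend. */
bool logical_info_active(void)
{
    if (toy_xact.xid_assigned)
        return toy_xact.xact_logical_enabled;
    return backend_cached_enabled;
}
```

In this model a transaction that already has an XID keeps answering
with the value it saw at assignment, while code running without an XID
follows the backend-level state as soon as the barrier is absorbed.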
> However, if we want to wait for all
> backends that have any open transaction during first logical
> slot-creation then this should be addressed automatically.
Right.
> And, we
> don't need to worry about the theoretical scenario where half the WAL
> info is constructed before transaction_id assignment and the other
> half after assignment. I feel waiting for all open transactions idea
> sounds like we are going too far without the real need.
>
> Having said that, if we still want to go with waiting for all open
> transactions idea then let's document it along with logical slot
> creation documentation.
With the above transaction-level cache, logical decoding can still end
up processing WAL written at non-logical level in two cases:

(1) an operation decides not to include logical info in a WAL record
before getting an XID.
(2) an operation that doesn't require XID assignment started while
effective_wal_level was 'replica'.
For (1), I imagine the following scenario for example:

    xl_xxx xlrec;

    if (XLogLogicalInfoActive())
        xlrec.flags |= LOGICAL_INFO_1;

    GetTopTransactionId();

    if (XLogLogicalInfoActive())
        xlrec.flags |= LOGICAL_INFO_2;

    XLogBeginInsert();
    XLogRegisterData(&xlrec, sizeof(xlrec));
    XLogInsert(XXX, YYY);

If logical decoding starts before the XID assignment, it would decode
the WAL record, but which logical info ends up in the record depends on
when the process absorbs the cache-update signal barrier. If the barrier
is processed between the first check (the one that would set
LOGICAL_INFO_1) and the XID assignment, the WAL record would have only
LOGICAL_INFO_2. I guess such a coding practice is uncommon, and I don't
see it in the existing code.
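
The timing dependence in (1) can be reproduced with a small standalone
simulation. This is only a sketch under simplifying assumptions: the
ToyBackend struct, toy_logical_active() and build_record_flags() are
hypothetical names, the flag values are arbitrary, and the barrier is
modeled as a plain assignment rather than a real signal.

```c
#include <assert.h>
#include <stdbool.h>

/* Arbitrary illustrative values for the two flags used above. */
#define LOGICAL_INFO_1 0x01
#define LOGICAL_INFO_2 0x02

/* Hypothetical per-backend view; not a PostgreSQL structure. */
typedef struct
{
    bool backend_cached;   /* state seen after absorbing the barrier */
    bool xid_assigned;
    bool xact_cached;      /* frozen at XID assignment */
} ToyBackend;

bool toy_logical_active(const ToyBackend *b)
{
    return b->xid_assigned ? b->xact_cached : b->backend_cached;
}

/*
 * Build the flags of one record. If enable_between_checks is true,
 * logical decoding is enabled and the barrier is absorbed after the
 * first check but before the XID assignment.
 */
unsigned build_record_flags(bool enabled_at_start, bool enable_between_checks)
{
    ToyBackend b = {enabled_at_start, false, false};
    unsigned flags = 0;

    if (toy_logical_active(&b))
        flags |= LOGICAL_INFO_1;

    if (enable_between_checks)
        b.backend_cached = true;   /* barrier absorbed here */

    /* XID assignment freezes the transaction-level cache. */
    assert(!b.xid_assigned);
    b.xact_cached = b.backend_cached;
    b.xid_assigned = true;

    if (toy_logical_active(&b))
        flags |= LOGICAL_INFO_2;

    return flags;
}
```

Calling build_record_flags(false, true) models the problematic
interleaving and yields a record carrying only LOGICAL_INFO_2, whereas
the other interleavings produce either both flags or neither.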
For (2), operations within a transaction that don't require XID
assignment would write WAL records at 'replica' level, or at a mix of
'replica' and 'logical' levels, depending on when the backend processes
the cache-update signal. I searched[1] for which rmgr record types don't
require XID assignment (info is the hex value returned by
XLogRecGetInfo()):
BRIN info: 10, 20, 30, 40, 90, A0
BTREE info: 80, 90, B0, C0, E0
DBASE info: 00, 20
GIN info: 10, 20, 30, 60, 80, 90
GIST info: 00
HASH info: 70, 80, 90, B0
HEAP info: 70
HEAP2 info: 10, 20, 30, 40
LOGICALMSG info: 00
REPLORIGIN info: 10
SMGR info: 10
SPG info: 60, 80
STANDBY info: 00, 10, 20
XACT info: 00, 10, 20, 30, 40, 60
XLOG info: 00, 10, 30, 40, 50, 60, 70, 90, A0, B0, D0, E0, F0
As far as I have researched, there is no problem in terms of logical
decoding even if we process these WAL records, generated at non-logical
level, during logical decoding.
I think we can go without waiting. It would be great if we could have
checks or assertions to detect such scenarios.
I've updated the patch accordingly.
Regards,
[1] I put a debug log message in the recovery code for WAL records
whose xl_xid is invalid, ran 'make check-world', and gathered the
messages from all server logs. I might have missed some cases.
--
Masahiko Sawada
Amazon Web Services: https://aws.amazon.com