Re: long-standing data loss bug in initial sync of logical replication

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: long-standing data loss bug in initial sync of logical replication
Дата
Msg-id 20231118025445.crhaeeuvoe2g5dv6@awork3.anarazel.de
обсуждение исходный текст
Ответ на Re: long-standing data loss bug in initial sync of logical replication  (Andres Freund <andres@anarazel.de>)
Ответы Re: long-standing data loss bug in initial sync of logical replication  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Список pgsql-hackers
Hi,

On 2023-11-17 17:54:43 -0800, Andres Freund wrote:
> On 2023-11-17 15:36:25 +0100, Tomas Vondra wrote:
> > Overall, this looks, walks and quacks like a cache invalidation issue,
> > likely a missing invalidation somewhere in the ALTER PUBLICATION code.

I can confirm that something is broken with invalidation handling.

To test this I just used pg_recvlogical to stdout. It's just interesting
whether something arrives, that's easy to discern even with binary output.

CREATE PUBLICATION pb;
src/bin/pg_basebackup/pg_recvlogical --plugin=pgoutput --start --slot test -d postgres -o proto_version=4 -o
publication_names=pb-o messages=true -f -
 

S1: CREATE TABLE d(data text not null);
S1: INSERT INTO d VALUES('d1');
S2: BEGIN; INSERT INTO d VALUES('d2');
S1: ALTER PUBLICATION pb ADD TABLE d;
S2: COMMIT
S2: INSERT INTO d VALUES('d3');
S1: INSERT INTO d VALUES('d4');
RL: <nothing>

Without the 'd2' insert in an in-progress transaction, pgoutput *does* react
to the ALTER PUBLICATION.

I think the problem here is insufficient locking. The ALTER PUBLICATION pb ADD
TABLE d basically modifies the catalog state of 'd', without a lock preventing
other sessions from having a valid cache entry that they could continue to
use. Due to this, decoding S2's transactions that started before S2's commit,
will populate the cache entry with the state as of the time of S1's last
action, i.e. no need to output the change.

The reason this can happen is because OpenTableList() uses
ShareUpdateExclusiveLock. That allows the ALTER PUBLICATION to happen while
there's an ongoing INSERT.

I think this isn't just a logical decoding issue. S2's cache state just after
the ALTER PUBLICATION is going to be wrong - the table is already locked,
therefore further operations on the table don't trigger cache invalidation
processing - but the catalog state *has* changed.  It's a bigger problem for
logical decoding though, as it's a bit more lazy about invalidation processing
than normal transactions, allowing the problem to persist for longer.


I guess it's not really feasible to just increase the lock level here though
:(. The use of ShareUpdateExclusiveLock isn't new, and suddenly using AEL
would perhaps lead to new deadlocks and such? But it also seems quite wrong.


We could brute force this in the logical decoding infrastructure, by
distributing invalidations from catalog modifying transactions to all
concurrent in-progress transactions (like already done for historic catalog
snapshot, c.f. SnapBuildDistributeNewCatalogSnapshot()).  But I think that'd
be a fairly significant increase in overhead.



Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: John Naylor
Дата:
Сообщение: Re: [PoC] Reducing planning time when tables have many partitions
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [PoC] Reducing planning time when tables have many partitions