Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica

From: Andres Freund
Subject: Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica
Date:
Msg-id: 20220211001240.47idtavhfyzi4aa3@alap3.anarazel.de
In response to: Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica  (Andrey Borodin <x4mmm@yandex-team.ru>)
Responses: Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica  (Michael Paquier <michael@paquier.xyz>)
Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica  (Andrey Borodin <x4mmm@yandex-team.ru>)
List: pgsql-bugs

Hi,

On 2022-02-10 22:41:00 +0500, Andrey Borodin wrote:
> index_concurrently_swap(newidx->indexId, oldidx->indexId, oldName); // WAL-logged
> CacheInvalidateRelcacheByRelid(oldidx->tableId); // Not WAL-logged
> CommitTransactionCommand(); // WAL-logged
>
> But I do not know whether it is expected to work on standby, since relcache
> invalidation is not logged. Maybe Peter or someone more experienced than me
> can explain how this is expected to work.

What makes you say that CacheInvalidateRelcacheByRelid isn't WAL-logged? It
should be emitted at commit, like other invalidation messages?
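
To illustrate, here's a rough, paraphrased sketch of that flow (the real
code lives in src/backend/utils/cache/inval.c and
src/backend/access/transam/xact.c; the function names below are the actual
ones, the rest is simplified):

/* During the transaction: just queue the message, nothing hits WAL yet. */
CacheInvalidateRelcacheByRelid(oldidx->tableId);

/* At commit, in RecordTransactionCommit(): collect the queued messages ... */
nmsgs = xactGetCommittedInvalidationMessages(&invalMessages,
                                             &RelcacheInitFileInval);

/* ... and embed them in the commit record itself. */
XactLogCommitRecord(..., nmsgs, invalMessages, RelcacheInitFileInval, ...);

On replay, xact_redo_commit() hands those messages to
ProcessCommittedInvalidationMessages(), which is how the standby's caches
get invalidated. That matches the "inval msgs: ... relcache 16399 ..." bits
in the COMMIT records below.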

Here's the WAL for a REINDEX CONCURRENTLY, leaving out the actual catalog
changes (i.e. heap/heap2/btree rmgr):
rmgr: Storage     len (rec/tot):     42/    42, tx:          0, lsn: 0/0155F3F8, prev 0/0155F3B8, desc: CREATE base/5/16399
rmgr: Standby     len (rec/tot):     42/    42, tx:        731, lsn: 0/0155F428, prev 0/0155F3F8, desc: LOCK xid 731 db 5 rel 16399
...
rmgr: Transaction len (rec/tot):    194/   194, tx:        731, lsn: 0/0155F960, prev 0/0155F918, desc: COMMIT 2022-02-10 15:53:56.173778 PST; inval msgs: catcache 53 catcache 52 catcache 7 catcache 6 catcache 32 relcache 16399 relcache 16392 snapshot 2608 relcache 16392
...
rmgr: Transaction len (rec/tot):    146/   146, tx:        732, lsn: 0/01561BB0, prev 0/01561B70, desc: COMMIT 2022-02-10 15:53:56.192747 PST; inval msgs: catcache 53 catcache 52 catcache 32 relcache 16392 relcache 16399 relcache 16399


It might be worth writing a test that replays WAL records one-by-one, and
that checks at which record things start to break. If the problem is not
reproducible that way, we have a problem "within" the replay of a single WAL
record. If it is reproducible, it's likely a problem in the way the primary
generates WAL.

That would generally be very useful tooling for detailed testing of
sequences of WAL records.
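
Something along these lines, perhaps (a sketch only: the xlogreader calls
are the real PG 14-era API, but check_after_record() is a hypothetical test
hook, and actually running the redo routines would need the usual
backend/startup-process environment around it):

static void
replay_step_by_step(XLogRecPtr start_lsn)
{
	XLogReaderState *reader;
	char	   *errormsg;

	reader = XLogReaderAllocate(wal_segment_size, NULL,
								XL_ROUTINE(.page_read = &read_local_xlog_page,
										   .segment_open = &wal_segment_open,
										   .segment_close = &wal_segment_close),
								NULL);
	XLogBeginRead(reader, start_lsn);

	while (XLogReadRecord(reader, &errormsg) != NULL)
	{
		/* apply exactly one record ... */
		RmgrTable[XLogRecGetRmid(reader)].rm_redo(reader);

		/* ... then let the test check whether things just broke */
		check_after_record(reader->ReadRecPtr);
	}
}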


I'm pretty sure the problem is on the primary. Looking through
ReindexRelationConcurrently() I think I found at least two problems:

1) We don't WAL-log snapshot conflicts, afaict (i.e. the
WaitForOlderSnapshots() in phase 3), which I think means that the new index
can end up being used "too soon" by pre-existing snapshots on the standby.

I don't think this is the problem this thread is about, but it's definitely a
problem.

2) The WaitForLockersMultiple() calls in phases 5 and 6 aren't WAL-logged.
Without waiting for sessions to see the results of phases 4 and 5, we can
mark the index as dead (phase 5) and drop it (phase 6) while there are
ongoing accesses (see the outline below).

I think this is the likely cause of the reported bug.
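
For context, a paraphrased outline of the relevant steps in
ReindexRelationConcurrently() (src/backend/commands/indexcmds.c); the
function names are real, the arguments are simplified, and the comments
mark the waits that leave no trace in WAL:

/* Phase 3: wait out snapshots that predate the index becoming valid */
WaitForOlderSnapshots(limitXmin, true);		/* not WAL-logged -> problem 1 */

/* Phase 4: swap old and new index (Andrey's snippet above) */
index_concurrently_swap(newidx->indexId, oldidx->indexId, oldName);
CommitTransactionCommand();					/* WAL-logged, incl. inval msgs */

/* Phase 5: mark the old index as dead */
WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);	/* not WAL-logged -> problem 2 */
index_concurrently_set_dead(oldidx->tableId, oldidx->indexId);

/* Phase 6: drop the old index */
WaitForLockersMultiple(lockTags, AccessExclusiveLock, true);	/* not WAL-logged -> problem 2 */
performMultipleDeletions(objects, DROP_RESTRICT, flags);

A standby replays only the WAL-logged steps, so nothing forces it to
observe the waits in between.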


Both seem like significant issues for several of the CONCURRENTLY commands,
not just REINDEX.

Greetings,

Andres Freund


