Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica
Дата
Msg-id Yl+y7kvCDR6x4V+E@paquier.xyz
обсуждение исходный текст
Ответ на Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica  (Andrey Borodin <x4mmm@yandex-team.ru>)
Ответы Re: BUG #17401: REINDEX TABLE CONCURRENTLY creates a race condition on a streaming replica
Список pgsql-bugs
On Tue, Mar 08, 2022 at 06:44:31PM +0300, Andrey Borodin wrote:
> I tried to dig into this again. And once again I simply cannot
> understand how it is supposed to work... I'd appreciate if  someone
> explained it to me.
> When I do SELECT bt_index_check('idx',true) in fact I'm doing following:
> 1. regclassin() lookups 'idx', converts it into oid without taking any lock
> 2. bt_index_check() gets this oid and lookups index again.
> Do we rely on catalog snapshot here? Or how do we know that this oid
> is still of the index named 'idx' on standby? When backend will
> understand that this oid is no more of the interest and get an error
> somehow? I think it must happen during index_open().

Yeah, I can see that, and I think that it is a second issue, rather
independent of the cache lookup errors that we are seeing on a
standby.  As far as I have tested, it is possible to see this error as
well with a sequence of CREATE/DROP INDEX CONCURRENTLY run repeatedly
on a primary with the standby doing a scan of pg_index, like that for
example in a tight loop:
SELECT bt_index_check(indexrelid) FROM pg_index WHERE indisvalid AND
  indexrelid > 16000;

It is a matter of seconds to see bt_index_check() complain that one of
the so-said index is not valid, but the scan of pg_index was told to
not select such an index, so we are missing something with the
relation cache.

> BTW is anyone working on this bug?

It took me some time to come back to this thread, but I do now that we
are done with the commit fest business.

Speaking of the cache errors when working directly on the relations
rebuilt, I got a patch that seems to work properly, with a mix of
standby snapshots and AELs logged.  I have been running a set of
REINDEX INDEX/TABLE CONCURRENTLY on the primary through more than 100k
relfilenodes without the standby complaining, and I should be able to
get this one fixed before the next minor release.
--
Michael

Вложения

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Takahiro Kitayama
Дата:
Сообщение: Re: BUG #16866: pg_basebackup Windows Server 20160
Следующее
От: Julien Rouhaud
Дата:
Сообщение: Re: Help Me! ^^ "must be superuser to make network requests"