Re: BUG #17973: Reinit of pgstats entry for dropped DB can break autovacuum daemon

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: BUG #17973: Reinit of pgstats entry for dropped DB can break autovacuum daemon
Дата
Msg-id 20230614213724.gweo666oxqrugm44@awork3.anarazel.de
обсуждение исходный текст
Ответ на BUG #17973: Reinit of pgstats entry for dropped DB can break autovacuum daemon  (PG Bug reporting form <noreply@postgresql.org>)
Ответы Re: BUG #17973: Reinit of pgstats entry for dropped DB can break autovacuum daemon  (Andres Freund <andres@anarazel.de>)
Список pgsql-bugs
Hi,

Good catch!


On 2023-06-13 23:49:27 +0000, PG Bug reporting form wrote:
> We've reproduced this on both 15.2 and REL_15_STABLE (git commit
> bd590d1fea1ba9245c791d589eea94d2dbad5a2b). We've never seen a similar issue
> on 14 despite very similar conditions occurring frequently. We haven't yet
> tried on 16.

The relevant code is new for 15, so that makes sense.


> In Jacob's repro, the problem begins when AutoVacWorkerMain() is in
> InitPostgres() and some other backend drops the database. Specifically, the
> autovacuum worker's InitPostgres() is just about to obtain the lock at
>
https://github.com/postgres/postgres/blob/bd590d1fea1ba9245c791d589eea94d2dbad5a2b/src/backend/utils/init/postinit.c#L1012
> . The backend dropping the DB marks the DB's pgstats entry as dropped but
> can’t free it because its refcount is nonzero, so
> AtEOXact_PgStat_DroppedStats() calls pgstat_request_entry_refs_gc(). The
> autovacuum worker's InitPostgres() proceeds to call GetDatabaseTuple() and
> notices the database has been dropped, at some point calling
> pgstat_gc_entry_refs() to release its reference to the DB's pgstats entry,
> and the worker decides to exit with a fatal error. But while exiting, it
> calls pgstat_report_stat(), which calls pgstat_update_dbstats() for the
> dropped DB, which calls pgstat_prep_database_pending(), which calls
> pgstat_prep_pending_entry(), which calls pgstat_get_entry_ref() with create
> == true, which calls pgstat_reinit_entry() against the DB's pgstats entry.
> This sets dropped to false on that entry. Finally, the autovacuum worker
> exits.

Hm. It's pretty nasty that we end up in the proc_exit() hooks with
MyDatabaseId set, even though we aren't actually validly connected to that
database.

ISTM that at the very least we should make "Recheck pg_database to make sure
the target database hasn't gone away." in InitPostgres() unset
MyDatabaseId, MyProc->databaseId.

We unfortunately can't just defer setting up MyProc->databaseId, I
think. Otherwise a concurrent vacuum could decide to remove database contents
we think we still can see.

If we had something like that it should be fairly easy to skip relevant parts
of pgstat_report_stat(), based on MyDatabaseId being invalid.

Greetings,

Andres Freund



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: BUG #17975: Nested Loop Index Scan returning wrong result
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: BUG #17975: Nested Loop Index Scan returning wrong result