Re: Autovacuum daemon terminated by signal 11

From Tom Lane
Subject Re: Autovacuum daemon terminated by signal 11
Date
Msg-id 15221.1232149389@sss.pgh.pa.us
In reply to Re: Autovacuum daemon terminated by signal 11  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Autovacuum daemon terminated by signal 11
List pgsql-general
I wrote:
> ... and you've seemingly not managed to install the debug symbols where
> gdb can find them.

But never mind that --- it turns out to be trivial to reproduce the
crash.  Just create a database, set its datfrozenxid and datvacuumxid
far in the past (via a manual update of pg_database), enable autovacuum,
and wait a bit.

What is happening is that autovacuum_do_vac_analyze contains

    old_cxt = MemoryContextSwitchTo(AutovacMemCxt);
    ...
    vacuum(vacstmt, relids);
    ...
    MemoryContextSwitchTo(old_cxt);

and at the time it is called by process_whole_db, CurrentMemoryContext
points at TopTransactionContext.  Which gets destroyed because vacuum()
internally finishes that transaction and starts a new one.  When we
come out of vacuum(), CurrentMemoryContext again points at
TopTransactionContext, but *it's not the same one*.  The closing
MemoryContextSwitchTo is installing a stale pointer, which then remains
active into CommitTransaction.  It's a wonder this code ever works.
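
To make the failure mode concrete, here is a toy model of the save/restore
pattern described above.  It is only a sketch: ToyContext and every toy_*
function are hypothetical stand-ins, not PostgreSQL's real memory-context
API, and "destroying" a context just clears a flag so the stale pointer can
be inspected without invoking undefined behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a memory context; NOT PostgreSQL's MemoryContext. */
typedef struct ToyContext
{
    bool        alive;          /* cleared when the context is "destroyed" */
} ToyContext;

/* Static pool, so "destroyed" contexts can still be examined safely. */
static ToyContext txn_pool[4];
static int  txn_pool_used = 0;
static ToyContext autovac_cxt = {true};     /* models AutovacMemCxt */

static ToyContext *TopTransactionContext = NULL;
static ToyContext *CurrentMemoryContext = NULL;

/* Install a new current context and return the previous one. */
static ToyContext *
toy_switch_to(ToyContext *cxt)
{
    ToyContext *old = CurrentMemoryContext;

    CurrentMemoryContext = cxt;
    return old;
}

static void
toy_start_transaction(void)
{
    ToyContext *cxt;

    assert(txn_pool_used < 4);
    cxt = &txn_pool[txn_pool_used++];
    cxt->alive = true;
    TopTransactionContext = cxt;
    CurrentMemoryContext = cxt;
}

static void
toy_commit_transaction(void)
{
    TopTransactionContext->alive = false;   /* context is destroyed */
    TopTransactionContext = NULL;
    CurrentMemoryContext = NULL;
}

/* Models vacuum(): ends the caller's transaction and starts a new one. */
static void
toy_vacuum(void)
{
    toy_commit_transaction();
    toy_start_transaction();
}

/*
 * Models the save/restore pattern.  Returns true if the context that is
 * current on exit is still alive.
 */
static bool
toy_do_vac_analyze(void)
{
    ToyContext *old_cxt = toy_switch_to(&autovac_cxt);

    toy_vacuum();
    toy_switch_to(old_cxt);     /* may reinstall a destroyed context */
    return CurrentMemoryContext->alive;
}

/* Whole-database path: entered with the transaction context current. */
static bool
toy_whole_db_path(void)
{
    txn_pool_used = 0;
    toy_start_transaction();
    return toy_do_vac_analyze();
}

/* Per-table path: entered with the long-lived context current. */
static bool
toy_per_table_path(void)
{
    txn_pool_used = 0;
    toy_start_transaction();
    toy_switch_to(&autovac_cxt);
    return toy_do_vac_analyze();
}
```

In this model toy_whole_db_path() ends with a destroyed context installed as
current, while toy_per_table_path() ends with the long-lived one, which
mirrors the difference between the two call paths.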

The other path through do_autovacuum() escapes this fate because it
enters autovacuum_do_vac_analyze with CurrentMemoryContext pointing
at AutovacMemCxt, which isn't going to go away.

I argue that autovacuum_do_vac_analyze shouldn't attempt to restore the
caller's memory context at all.  One possible approach is to make it
re-select AutovacMemCxt at exit, but I wonder if we shouldn't define
its entry and exit conditions as the current context being
(the current instance of) TopTransactionContext.
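
The second option can be sketched with another toy model.  Again, ToyContext
and the toy_* functions are hypothetical illustrations rather than the real
PostgreSQL code, and destruction is modeled by clearing a flag.  The point is
that the exit path re-reads the global TopTransactionContext instead of
restoring a pointer saved before vacuum() ran.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a memory context; NOT PostgreSQL's MemoryContext. */
typedef struct ToyContext
{
    bool        alive;          /* cleared when the context is "destroyed" */
} ToyContext;

static ToyContext txn_pool[2];              /* old and new transaction */
static ToyContext autovac_cxt = {true};     /* models AutovacMemCxt */

static ToyContext *TopTransactionContext = NULL;
static ToyContext *CurrentMemoryContext = NULL;

/* Models vacuum(): the old transaction ends and a new one begins. */
static void
toy_vacuum(void)
{
    TopTransactionContext->alive = false;   /* old context destroyed */
    txn_pool[1].alive = true;               /* fresh context created */
    TopTransactionContext = &txn_pool[1];
    CurrentMemoryContext = TopTransactionContext;
}

/*
 * Entry and exit condition: the current context is the current instance
 * of TopTransactionContext.  Returns true if the context that is current
 * on exit is still alive.
 */
static bool
toy_do_vac_analyze_fixed(void)
{
    assert(CurrentMemoryContext == TopTransactionContext);

    CurrentMemoryContext = &autovac_cxt;    /* work in long-lived context */
    toy_vacuum();

    /* Re-read the global; do not restore a pointer saved at entry. */
    CurrentMemoryContext = TopTransactionContext;
    return CurrentMemoryContext->alive;
}

/* Sets up a live transaction and runs the fixed routine once. */
static bool
toy_run_fixed(void)
{
    txn_pool[0].alive = true;
    txn_pool[1].alive = false;
    TopTransactionContext = &txn_pool[0];
    CurrentMemoryContext = TopTransactionContext;
    return toy_do_vac_analyze_fixed();
}
```

The first option would differ only in the exit step, which would select
&autovac_cxt instead of the current TopTransactionContext.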

It looks like 8.3 and HEAD take the latter approach and are therefore
safe from this bug.  8.2 seems to escape it also because it doesn't have
process_whole_db anymore, but it's certainly not
autovacuum_do_vac_analyze's fault that it's not broken, because it's still
trying to restore a context that it has no right to assume still exists.

Alvaro, you want to take charge of fixing this?

            regards, tom lane
