Re: [HACKERS] Vacuum analyze bug CAUGHT

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [HACKERS] Vacuum analyze bug CAUGHT
Дата
Msg-id 13510.937007320@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: [HACKERS] Vacuum analyze bug CAUGHT  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
I wrote:
> This was in the process doing table creates/drops, and I surmise that
> the problem was a tuple move executed concurrently by the process doing
> VACUUM.  In other words, it looks like this problem of missing lock
> operations might be the cause, or one cause, of Michael's symptoms.

On looking closer, that's not so, because the particular path taken here
*does* have a lock --- DeleteAttributeTuples() acquires
AccessExclusiveLock on pg_attribute, which is the relation heap_delete
is failing to find a tuple in.  The tuple it's trying to delete was
located by means of SearchSysCacheTupleCopy().

What I now think is that we have a variant of the SI-too-late problem:
vacuum has moved the underlying tuple, but the backend trying to do
the deletion hasn't heard about it yet, because it hasn't executed
a transaction start or CommandCounterIncrement since VACUUM processed
the table.  This is bolstered by the postmaster log, which shows the
second backend dying just as VACUUM commits pg_attribute:

DEBUG:  Rel pg_type: Pages: 6 --> 2; Tuple(s) moved: 1. Elapsed 0/0 sec.
DEBUG:  Index pg_type_typname_index: Pages 5; Tuples 116: Deleted 1. Elapsed 0/0 sec.
DEBUG:  Index pg_type_oid_index: Pages 2; Tuples 116: Deleted 1. Elapsed 0/0 sec.
DEBUG:  --Relation pg_attribute--
DEBUG:  Pages 33: Changed 1, Reapped 28, Empty 0, New 0; Tup 438: Vac 1976, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 97,
MaxLen97; Re-using: Free/Avail. Space 214444/207148; EndEmpty/Avail. Pages 0/27. Elapsed 0/0 sec.
 
DEBUG:  Index pg_attribute_attrelid_index: Pages 15; Tuples 438: Deleted 1976. Elapsed 0/1 sec.
DEBUG:  Index pg_attribute_relid_attnum_index: Pages 15; Tuples 438: Deleted 1976. Elapsed 0/0 sec.
DEBUG:  Index pg_attribute_relid_attnam_index: Pages 48; Tuples 438: Deleted 1976. Elapsed 0/1 sec.
DEBUG:  Rel pg_attribute: Pages: 33 --> 6; Tuple(s) moved: 8. Elapsed 0/0 sec.
DEBUG:  Index pg_attribute_attrelid_index: Pages 15; Tuples 438: Deleted 8. Elapsed 0/0 sec.
DEBUG:  Index pg_attribute_relid_attnum_index: Pages 15; Tuples 438: Deleted 8. Elapsed 0/0 sec.
DEBUG:  Index pg_attribute_relid_attnam_index: Pages 48; Tuples 438: Deleted 8. Elapsed 0/0 sec.
TRAP: Failed Assertion("!(( lp)->lp_flags & 0x01):", File: "heapam.c", Line: 1121)

!(( lp)->lp_flags & 0x01) (0) [Not a typewriter]
DEBUG:  --Relation pg_proc--
DEBUG:  Pages 21: Changed 0, Reapped 0, Empty 0, New 0; Tup 1021: Vac 0, Keep/VTL 0/0, Crash 0, UnUsed 0, MinLen 145,
MaxLen197; Re-using: Free/Avail. Space 0/0; EndEmpty/Avail. Pages 0/0. Elapsed 0/0 sec.
 
(vacuum manages to get through a couple more tables before hearing the
"thou shalt exit" signal from the postmaster...)


It's looking to me like there may be no way to fix this in 6.5.*
short of adopting the recent 6.6 relcache/SI changes.  Specifically,
the one we need is reading SI messages after acquiring a lock, but
I doubt we can pull out just that one without the rest.

I'm not real eager to do this given the little amount of testing
those changes have had, but maybe we have no choice...
        regards, tom lane


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [HACKERS] Vacuum analyze bug CAUGHT
Следующее
От: Michael Simms
Дата:
Сообщение: serial type