Re: pg_upgrade and frozen xids

From bricklen
Subject Re: pg_upgrade and frozen xids
Date
Msg-id CAGrpgQ9apRxeCng82nd0qwD7bKtNPebT8XtTcC0NxddBgcUnNA@mail.gmail.com
In reply to Re: pg_upgrade and frozen xids  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-admin


On Wed, Mar 7, 2018 at 12:01 PM, Peter Geoghegan <pg@bowt.ie> wrote:
I happen to know that bricklen already ran amcheck. There were errors,
but they were not consistent with a collation issue. Rather, it looked
like something was up with the storage layer -- the sibling links of a
pair of pages were not in mutual agreement.

Even if that wasn't something that I knew already, I still would not
suspect opclass misbehavior of any variety. VACUUM doesn't care about
the ordering of items on the page in the case of nbtree. And, it
performs a physical order scan there (albeit with some extra trickery
to prevent races due to concurrent splits). Index tuples that could
end up being unreachable to index scans due to opclass misbehavior
should remain reachable to VACUUM.
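
To make the "sibling links not in mutual agreement" condition from the quoted text concrete, here is a minimal sketch of that invariant on a toy model. It is not amcheck's implementation and does not read real index pages: the `Page` tuple, the `find_sibling_mismatches` helper, and the sample block numbers are all hypothetical, chosen only for illustration.

```python
from typing import Dict, List, NamedTuple, Optional


class Page(NamedTuple):
    """Toy stand-in for one page on a B-tree level (hypothetical model)."""
    left: Optional[int]   # block number of the left sibling, None if leftmost
    right: Optional[int]  # block number of the right sibling, None if rightmost


def find_sibling_mismatches(pages: Dict[int, Page]) -> List[str]:
    """Report every page whose right sibling does not point back at it."""
    problems = []
    for blkno, page in sorted(pages.items()):
        if page.right is None:
            continue  # rightmost page on its level, nothing to cross-check
        sibling = pages.get(page.right)
        if sibling is None:
            problems.append(f"block {blkno}: right sibling {page.right} is missing")
        elif sibling.left != blkno:
            problems.append(
                f"block {blkno}: right sibling {page.right} "
                f"has left link {sibling.left}"
            )
    return problems


# Made-up sample data: three pages linked consistently, then the same
# level with the last page's left link pointing somewhere else.
healthy = {1: Page(None, 2), 2: Page(1, 3), 3: Page(2, None)}
broken = {1: Page(None, 2), 2: Page(1, 3), 3: Page(7, None)}

print(find_sibling_mismatches(healthy))
print(find_sibling_mismatches(broken))
```

The check only looks at whether each right link is answered by a matching left link. It never compares keys, which is why a failure of this kind points away from collation or opclass ordering.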

What little detail I've been able to collect so far is below. All of it comes from 10.1 clusters.

From the Postgres logs, for 6 different databases (across 3 geo regions, of which two were on the same hypervisor). Each one surfaced when autovacuum ran against that database:

ERROR:  could not find left sibling of block 4775 in index "<some index>"
ERROR:  right sibling 13983 of block 7196 is not next child 7246 of block 5208 in index "<some index>"
ERROR:  right sibling 60252 of block 60115 is not next child 60118 of block 60113 in index "<some index>"
ERROR:  right sibling 93058 of block 93057 is not next child 93061 of block 93008 in index "<some index>"
ERROR:  right sibling 10081 of block 10079 is not next child 10084 of block 10046 in index "<some index>"
ERROR:  left link changed unexpectedly in block 13868 of index "<some index>"
ERROR:  right sibling 145 of block 92 is not next child 93 of block 3 in index "<some index>"
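
For comparing these messages across clusters, a small script can pull the block numbers out of the "right sibling ... is not next child ..." lines. This is a rough sketch: the regular expression is written against the message text as it appears above, and the `SiblingError` type and `parse_sibling_errors` function are hypothetical names, not part of any PostgreSQL tooling.

```python
import re
from typing import List, NamedTuple


class SiblingError(NamedTuple):
    right_sibling: int  # block the page's right link points to
    block: int          # page whose right link was followed
    next_child: int     # block the parent lists as the next child
    parent: int         # parent page
    index: str          # index name as logged


# Matches the message format of the log lines quoted in this post.
PATTERN = re.compile(
    r'right sibling (\d+) of block (\d+) is not next child (\d+) '
    r'of block (\d+) in index "([^"]*)"'
)


def parse_sibling_errors(log_text: str) -> List[SiblingError]:
    """Return one record per matching line, ignoring all other lines."""
    records = []
    for line in log_text.splitlines():
        m = PATTERN.search(line)
        if m:
            rs, blk, child, parent, index = m.groups()
            records.append(
                SiblingError(int(rs), int(blk), int(child), int(parent), index)
            )
    return records


sample = '''ERROR:  could not find left sibling of block 4775 in index "<some index>"
ERROR:  right sibling 13983 of block 7196 is not next child 7246 of block 5208 in index "<some index>"
ERROR:  right sibling 145 of block 92 is not next child 93 of block 3 in index "<some index>"'''

for rec in parse_sibling_errors(sample):
    print(rec)
```

The "could not find left sibling" and "left link changed unexpectedly" messages have a different shape and are skipped by this pattern. They would need their own expressions.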



A strace from the hung autovac process (before we killed it):

futex(0x7f07b8f575f8, FUTEX_WAIT, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7f07b8f575f8, FUTEX_WAIT, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
futex(0x7f07b8f575f8, FUTEX_WAIT, 0, NULL) = -1 EAGAIN (Resource temporarily unavailable)
...

 
