Re: VACUUM unable to accomplish because of a non-existent MultiXactId

Поиск
Список
Период
Сортировка
От Alvaro Herrera
Тема Re: VACUUM unable to accomplish because of a non-existent MultiXactId
Дата
Msg-id 20151128152044.GI4320@alvherre.pgsql
обсуждение исходный текст
Ответ на Re: VACUUM unable to accomplish because of a non-existent MultiXactId  (Kouber Saparev <kouber@gmail.com>)
Список pgsql-admin
Kouber Saparev wrote:
>
> > The state of your data is probably caused by some weird corner case of
> > the upgrade.  Can you see in the log files that the toast table has been
> > failing vacuum since the upgrade, or is it more recent than that?  (In
> > other words, is there any working vacuum after the upgrade?)
>
> We upgraded to 9.4.5 on 19 October, and there was a successful automatic vacuum over pg_toast_376621 just 3 days
later- on 22 October: 
>
> Oct 22 08:16:49 db-master postgres[10589]: [3-1] []: LOG:  automatic vacuum of table “db.pg_toast.pg_toast_376621":
indexscans: 1 
> Oct 22 08:16:49 db-master postgres[10589]: [3-2]      pages: 0 removed, 784361 remain
> Oct 22 08:16:49 db-master postgres[10589]: [3-3]      tuples: 110 removed, 3768496 remain, 0 are dead but not yet
removable
> Oct 22 08:16:49 db-master postgres[10589]: [3-4]      buffer usage: 37193 hits, 44891 misses, 32311 dirtied
> Oct 22 08:16:49 db-master postgres[10589]: [3-5]      avg read rate: 0.954 MB/s, avg write rate: 0.686 MB/s
> Oct 22 08:16:49 db-master postgres[10589]: [3-6]      system usage: CPU 1.10s/1.67u sec elapsed 367.73 sec
>
> The next automatic vacuum came 8 days later - on 30 October and failed and it is failing ever since:
>
> Oct 30 14:22:01 db-master postgres[16160]: [3-1] []: ERROR:  MultiXactId 2915905228 does no longer exist -- apparent
wraparound
> Oct 30 14:22:01 db-master postgres[16160]: [3-2] []: CONTEXT:  automatic vacuum of table
“db.pg_toast.pg_toast_376621”
>
> So I guess something happened between 22 and 30 October and there is no relation to the pg_upgrade we did on 19
October.

OK, so it wasn't pg_upgrade.  Good to know.

> > It would be useful to debug this that you attached gdb to a backend, set
> > breakpoint on errfinish, then run vacuum on that table.  Then you can
> > extract the page number from the backtrace.  With the page number we can
> > try pageinspect and heap_page_items until we find the culprit and
> > perhaps identify how it got in that state.
>
> I will try to obtain the page number, and will then send you the results, thank you.
>
> Can we somehow do it on one of our replicas (after detaching it), i.e.
> is the corrupted record propagated through the replication channel,
> and in the meantime fix the table on the master?

I suppose the corrupted data should be in any replicas as well, but I
have no way to be sure.  Also, I don't know how to get the data to a
valid state, other than dropping the table and reloading it.  I suppose
you can take a complete pg_dump of the data in that table?

If you want to reset it to a known-good state, you could just save the
pg_toast file aside (and its corresponding main table file, just in
case).  Doing forensics in a copy is better practice anyway, you know.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


В списке pgsql-admin по дате отправления:

Предыдущее
От: Häfliger Guido
Дата:
Сообщение: Re: pg_service and ldap
Следующее
От: Richard Tallent
Дата:
Сообщение: Bug Report: Left arrow in Object Browser does not collapse