Re: BUG #13970: Vacuum hangs on particular table; cannot be terminated - requires `kill -QUIT pid`

Поиск
Список
Период
Сортировка
От Alvaro Herrera
Тема Re: BUG #13970: Vacuum hangs on particular table; cannot be terminated - requires `kill -QUIT pid`
Дата
Msg-id 20160219161548.GA90757@alvherre.pgsql
обсуждение исходный текст
Ответ на Re: BUG #13970: Vacuum hangs on particular table; cannot be terminated - requires `kill -QUIT pid`  (Brian Ghidinelli <brian@vfive.com>)
Список pgsql-bugs
Brian Ghidinelli wrote:
>
> > On Feb 18, 2016, at 12:59, Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> >
> > Okay, great.  What would be most helpful is figuring out the pg_upgrade
> > history of this server; if you have copies of the cluster just before
> > the upgrade, to extract the "nextMultiXactId" value, that would be
> > useful.
>
> Unfortunately we removed the 9.3 data dir for space reasons… I may
> have backups from then so maybe I could spin up a docker container and
> restore it but would that tell us the same thing?

What kind of backups?  If you mean pg_dump, then it's of no use.  If you
have physical backups, then yes, it could be useful, assuming they are
not much older than the upgrade (one from immediately before the upgrade
would be best.)

> > How large is this table?  We could try to scan it to look for the values
> > that are causing the problem, and set oldestMxact to one that would not
> > cause a problem.
>
> Database size is 34gb.  This particular table is only 105MB. If you
> account for all of the relations and indices it’s 239MB. There is one
> table in the system which is 17GB that stores email campaigns and
> deliveries.

Email campaigns!  I'm always happy to receive those.

> Everything else is all pretty small-ish at 2.5gb or under.
>
> How do you query for oldestMxact?

What I was thinking is scanning the table using pageinspect
(get_raw_page + heap_page_items), grab the xmax value of the tuples that
have the HEAP_XMAX_IS_MULTI bit set, looking for the informask
combination that indicate's prior-to-pg_upgrade.

The combination to find is that HEAP_XMAX_LOCK_ONLY (0x0080) must be
set, and both HEAP_XMAX_EXCL_LOCK (0x0040) and HEAP_XMAX_KEYSHR_LOCK
(0x0010) must be cleared.  (See for instance the comments inside
MultiXactIdGetUpdateXid in src/backend/access/heap/heapam.c, where this
particular bit pattern is tested.)

This might help:
https://www.commandprompt.com/blogs/alvaro_herrera/2011/11/decoding_infomasks/

Add one to the newest multixact value you find with that combination of
bits, then use that as the new cluster's "oldestMultiXact" to
pg_resetxlog (make sure to shut down cleanly!) as in the previous post
you found.

In retrospect, it sounds like we should have made GetMultiXactIdMembers
receive the infomask too and check for the pre-upgrade combination of
bits.  It would have saved some headaches.  But I'm not sure it's worth
doing now.

> > How large is the cluster?  For experimentation, it would be very useful
> > if you could take a copy of it, on a server where you could recompile
> > with debugging symbols.
>
> Is there a Docker container by chance that has symbols enabled? That
> would make standing up a test environment a lot easier. Our production
> infrastructure is not yet inside Docker but we run it in dev and it’s
> easy to spin up and throw away.

There may be one, but then I wouldn't know where to find it.  I bet you
can modify one --- it's just a "--enable-debug" option to Postgres'
configure.

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

В списке pgsql-bugs по дате отправления:

Предыдущее
От: andreas.papst@univie.ac.at
Дата:
Сообщение: BUG #13974: temp_file_limit effects vacuum
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: BUG #13974: temp_file_limit effects vacuum