Re: BUG #13970: Vacuum hangs on particular table; cannot be terminated - requires `kill -QUIT pid`
От | Alvaro Herrera |
---|---|
Тема | Re: BUG #13970: Vacuum hangs on particular table; cannot be terminated - requires `kill -QUIT pid` |
Дата | |
Msg-id | 20160219161548.GA90757@alvherre.pgsql обсуждение исходный текст |
Ответ на | Re: BUG #13970: Vacuum hangs on particular table; cannot be terminated - requires `kill -QUIT pid` (Brian Ghidinelli <brian@vfive.com>) |
Список | pgsql-bugs |
Brian Ghidinelli wrote: > > > On Feb 18, 2016, at 12:59, Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > > > Okay, great. What would be most helpful is figuring out the pg_upgrade > > history of this server; if you have copies of the cluster just before > > the upgrade, to extract the "nextMultiXactId" value, that would be > > useful. > > Unfortunately we removed the 9.3 data dir for space reasons⦠I may > have backups from then so maybe I could spin up a docker container and > restore it but would that tell us the same thing? What kind of backups? If you mean pg_dump, then it's of no use. If you have physical backups, then yes, it could be useful, assuming they are not much older than the upgrade (one from immediately before the upgrade would be best.) > > How large is this table? We could try to scan it to look for the values > > that are causing the problem, and set oldestMxact to one that would not > > cause a problem. > > Database size is 34gb. This particular table is only 105MB. If you > account for all of the relations and indices itâs 239MB. There is one > table in the system which is 17GB that stores email campaigns and > deliveries. Email campaigns! I'm always happy to receive those. > Everything else is all pretty small-ish at 2.5gb or under. > > How do you query for oldestMxact? What I was thinking is scanning the table using pageinspect (get_raw_page + heap_page_items), grab the xmax value of the tuples that have the HEAP_XMAX_IS_MULTI bit set, looking for the informask combination that indicate's prior-to-pg_upgrade. The combination to find is that HEAP_XMAX_LOCK_ONLY (0x0080) must be set, and both HEAP_XMAX_EXCL_LOCK (0x0040) and HEAP_XMAX_KEYSHR_LOCK (0x0010) must be cleared. (See for instance the comments inside MultiXactIdGetUpdateXid in src/backend/access/heap/heapam.c, where this particular bit pattern is tested.) This might help: https://www.commandprompt.com/blogs/alvaro_herrera/2011/11/decoding_infomasks/ Add one to the newest multixact value you find with that combination of bits, then use that as the new cluster's "oldestMultiXact" to pg_resetxlog (make sure to shut down cleanly!) as in the previous post you found. In retrospect, it sounds like we should have made GetMultiXactIdMembers receive the infomask too and check for the pre-upgrade combination of bits. It would have saved some headaches. But I'm not sure it's worth doing now. > > How large is the cluster? For experimentation, it would be very useful > > if you could take a copy of it, on a server where you could recompile > > with debugging symbols. > > Is there a Docker container by chance that has symbols enabled? That > would make standing up a test environment a lot easier. Our production > infrastructure is not yet inside Docker but we run it in dev and itâs > easy to spin up and throw away. There may be one, but then I wouldn't know where to find it. I bet you can modify one --- it's just a "--enable-debug" option to Postgres' configure. -- Ãlvaro Herrera http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
В списке pgsql-bugs по дате отправления:
Предыдущее
От: andreas.papst@univie.ac.atДата:
Сообщение: BUG #13974: temp_file_limit effects vacuum