Re: We probably need autovacuum_max_wraparound_workers
От | Josh Berkus |
---|---|
Тема | Re: We probably need autovacuum_max_wraparound_workers |
Дата | |
Msg-id | 4FECAA73.6010305@agliodbs.com обсуждение исходный текст |
Ответ на | Re: We probably need autovacuum_max_wraparound_workers (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: We probably need autovacuum_max_wraparound_workers
(Alvaro Herrera <alvherre@commandprompt.com>)
Re: We probably need autovacuum_max_wraparound_workers (Christopher Browne <cbbrowne@gmail.com>) Re: We probably need autovacuum_max_wraparound_workers (Tom Lane <tgl@sss.pgh.pa.us>) |
Список | pgsql-hackers |
Robert, Tom, Stephen, So, first, a description of the specific problem I've encountered at two sites. I'm working on another email suggesting workarounds and solutions, but that's going to take a bit longer. Observation ----------- This problem occured on two database systems which shared the following characteristics: 1) They were running with default autovacuum & vacuum settings, except that one database had 5 workers instead of 3. 2) They have large partitioned tables, in which the partitions are time-based and do not receive UPDATES after a certain date. Each partition was larger than RAM. 3) The databases are old enough, and busy enough, to have been through XID wraparound at least a couple of times. Users reported that the database system became unresponsive, which was surprising since both of these DBs had been placed on hardware which was engineered for at least 100% growth over the current database size. On investigation, we discovered the following things: a) Each database had autovacuum_max_workers (one DB 5, one DB 3) doing anti-wraparound vacuum on several partitions simultaneously. b) The I/O created by the anti-wraparound vacuum was tying up the system. c) terminating any individual autovacuum process didn't help, as it simply caused autovac to start on a different partition. So, first question was: why was autovacuum wanting to anti-wrapround vacuum dozens of tables at the same time? A quick check showed that all of these partitions had nearly identical XID ages (as in less than 100,000 transactions apart), which all had exceeded autovacuum_max_freeze_age. How did this happen? I'm still not sure. One thought is: this is an artifact of the *previous* wraparound vacuums on each database. On cold partitions with old dead rows which have been through wraparound vacuum several times, this tends to result in the cold partitions converging towards having the same relfrozenxid over time; I'm still working on the math to prove this. Alternately, it's possible that a schema change to the partitioned tables gave them all the same effective relfrozenxid at some point in the past; both databases are still in development. So there are two parts to this problem, each of which needs a different solution: 1. Databases can inadvertently get to the state where many tables need wraparound vacuuming at exactly the same time, especially if they have many "cold" data partition tables. 2. When we do hit wraparound thresholds for multiple tables, autovacuum has no hesitation about doing autovacuum_max_workers worth of wraparound vacuum simultaneously, even when that exceeds the I/O capactity of the system. -- Josh Berkus PostgreSQL Experts Inc. http://pgexperts.com
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Tom LaneДата:
Сообщение: Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)
Следующее
От: Peter GeogheganДата:
Сообщение: Re: Uh, I change my mind about commit_delay + commit_siblings (sort of)