Fixing statistics problem related to vacuum truncation termination
От | Kevin Grittner |
---|---|
Тема | Fixing statistics problem related to vacuum truncation termination |
Дата | |
Msg-id | 1366932579.39636.YahooMailNeo@web162902.mail.bf1.yahoo.com обсуждение исходный текст |
Ответы |
Re: Fixing statistics problem related to vacuum truncation termination
(Kevin Grittner <kgrittn@ymail.com>)
|
Список | pgsql-hackers |
After reviewing the threads and thinking about the various historical behaviors, I'm suggesting that we apply the attached, back-patched to 9.0, to fix the unintended changes from historical behavior related to lack of statistics generation in some cases where statistics were generated before b19e4250b45e91c9cbdd18d35ea6391ab5961c8d, and to generate them in some cases where they were historically suppressed. Prior behavior was different between a VACUUM command and autovacuum when there was sufficient empty space at the end of a table to trigger an attempt to truncate the table: * For a VACUUM command, statistics were always generated on a VACUUM ANALYZE, and truncation might or might not occur, depending on whether it was able to immediately get an AccessExclusiveLock on the table when it got to this phase of VACUUM processing. If it was able to get the lock, it would hold it for as long as the truncation took, blocking any other access to the table -- potentially for a very long time. If it was not immediately able to get the lock, it would not attempt any truncation, so truncation was not deterministic before the recent patch. * For autovacuum, if it was initially unable to acquire the AccessExclusiveLock on the table when it got to this phase no truncation was attempted and statistics were generated. If it acquired the lock and blocked any other process for the duration set by deadlock_timeout all work toward truncation was discarded and no statistics were generated. If it was able to complete the truncation, statistics were generated. So before the recent patch neither truncation nor statistics generation were deterministic. Current behavior for both the VACUUM command and autovacuum is to avoid any prolonged holding of AccessExclusiveLock on the table when there is contention for the lock, with limited retries to acquire or reacquire the lock to make incremental progress on truncation. Any progress on truncating is not lost if the lock is relinquished to allow other tasks to proceed. I'm proposing that we don't change that part of it. The problem is that current behavior is to skip statistics generation if the truncation attempt is terminated due to contention for the table lock; whereas historically that was never skipped for the VACUUM ANALYZE command, and only sometimes skipped when autovacuum was intending to analyze the table but was unable to complete the truncation. The attached will not skip the analyze step where it had historically run, and will actually allow autovacuum to run it when the truncation attempt was started but not able to complete. The old mechanism for terminating the truncation attempt (a cancel signal from the blocked process) did not allow this. The attached is along the lines of what Tom suggested was the minimal fix, and less drastic than what I was initially proposing -- which was to also restore historical behavior for the VACUUM command. After seeing how unpredictable that behavior was regarding truncation, it doesn't seem wise to complicate the code to try to go back to that. I also think that the new LOG level entry about giving up on the truncate attempt is too chatty, and we've gotten questions from users who were somewhat alarmed by it, so I toned it down. I'm still not sure that the logging is quite optimal yet, so any suggestions are welcome. The only change outside of local naming, white space, comments, messages, and moving a couple variables into a more local scope is this: - if (!vacrelstats->lock_waiter_detected) - pgstat_report_vacuum(RelationGetRelid(onerel), - onerel->rd_rel->relisshared, - new_rel_tuples); - else - vacstmt->options &= ~VACOPT_ANALYZE; + pgstat_report_vacuum(RelationGetRelid(onerel), + onerel->rd_rel->relisshared, + new_rel_tuples); ... which simply reverts this part to match older code. This is being presented for discussion; I have not finished testing it. Comments? -- Kevin Grittner EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
В списке pgsql-hackers по дате отправления: