Inaccuracy in VACUUM's tuple count estimates
От | Tom Lane |
---|---|
Тема | Inaccuracy in VACUUM's tuple count estimates |
Дата | |
Msg-id | 10787.1402083865@sss.pgh.pa.us обсуждение исходный текст |
Ответы |
Re: Inaccuracy in VACUUM's tuple count estimates
Re: Inaccuracy in VACUUM's tuple count estimates Re: Inaccuracy in VACUUM's tuple count estimates |
Список | pgsql-hackers |
I've been looking at the complaint Tim Wilson posted in pgsql-performance about badly inaccurate reltuples updates coming from VACUUM. There seem to be a number of problems leading to that. The key point is that when VACUUM has scanned only part of the relation, it assumes that the live-tuple density in that part of the relation is num_tuples (I'm speaking of the counter accumulated in lazy_scan_heap) divided by scanned_pages, and then it tries to extrapolate that information to the rest of the relation. Now, the validity of that extrapolation is a bit questionable given that VACUUM is considering a highly nonrandom subset of the table's pages, but the real problem is the values are wrong even for the pages we did look at. To wit: * scanned_pages is not reliably the number of pages we scanned, because somebody thought it would be cute to bump it even for pages we decided didn't need to be scanned because they contain no freezable tuples. So we have an increment in scanned_pages, but no corresponding increment in the tuple count, leading to a density underestimate. This seems to only happen in vacuum-for-wraparound cases, but it's still wrong. We need to separate the logic about whether we skipped any pages from the statistical counters. * num_tuples has very little to do with the number of live tuples, because it actually counts all nonremovable tuples, including RECENTLY_DEAD, INSERT_IN_PROGRESS, and DELETE_IN_PROGRESS tuples. In the case Tim is complaining about, the VACUUM happens concurrently with a long transaction that is bulk-updating most tuples in the relation, some of them several times, so that VACUUM sees multiple images of every tuple (one INSERT_IN_PROGRESS, the rest DELETE_IN_PROGRESS), and thus arrives at a horrid overestimate of the number of live tuples. I figured it'd be easy enough to get a better estimate by adding another counter to count just LIVE and INSERT_IN_PROGRESS tuples (thus effectively assuming that in-progress inserts and deletes will both commit). I did that, and found that it helped Tim's test case not at all :-(. A bit of sleuthing revealed that HeapTupleSatisfiesVacuum actually returns INSERT_IN_PROGRESS for any tuple whose xmin isn't committed, regardless of whether the transaction has since marked it for deletion: /* * It'd be possible to discern between INSERT/DELETE in progress * here by looking atxmax - but that doesn't seem beneficial for * the majority of callers and even detrimental for some. We'd * rather have callers look at/wait for xmin than xmax. It's * always correct to return INSERT_IN_PROGRESSbecause that's * what's happening from the view of other backends. */ returnHEAPTUPLE_INSERT_IN_PROGRESS; It did not use to blow this question off: back around 8.3 you got DELETE_IN_PROGRESS if the tuple had a delete pending. I think we need less laziness + fuzzy thinking here. Maybe we should have a separate HEAPTUPLE_INSERT_AND_DELETE_IN_PROGRESS result code? Is it *really* the case that callers other than VACUUM itself are okay with failing to make this distinction? I'm dubious: there are very few if any callers that treat the INSERT and DELETE cases exactly alike. regards, tom lane
В списке pgsql-hackers по дате отправления: