Re: Lowering the ever-growing heap->pd_lower
From | Peter Geoghegan
Subject | Re: Lowering the ever-growing heap->pd_lower
Msg-id | CAH2-Wzns7Rfo_fqfjwcZW-md6weyT+pSx1n0-O+fSZg+ks-hgQ@mail.gmail.com
In reply to | Re: Lowering the ever-growing heap->pd_lower (Andres Freund <andres@anarazel.de>)
List | pgsql-hackers
On Fri, Apr 8, 2022 at 2:06 PM Andres Freund <andres@anarazel.de> wrote:
> It's not hard to hit scenarios where pages are effectively unusable, because
> they have close to 291 dead items, without autovacuum triggering (or
> autovacuum just taking a while).

I think that this is mostly a problem with HOT updates, and with regular updates to a lesser degree. Deletes seem less troublesome.

I find that it's useful to think in terms of the high watermark number of versions required for a given logical row over time. It's probably quite rare for most individual logical rows to truly require more than 2 or 3 versions at the same time to serve queries, even in update-heavy tables, and without doing anything fancy with the definition of HeapTupleSatisfiesVacuum(). There are important exceptions, certainly, but overall I think that we're still not doing well enough with these easier cases.

The high watermark number of versions is probably going to be significantly greater than the typical number of versions for the same row. So maybe we give up on keeping a row on its original heap block today, all because of a once-off (or very rare) event where we needed slightly more space for only a fraction of a second.

The tell-tale sign of these kinds of problems can sometimes be seen with synthetic, rate-limited benchmarks. If it takes a very long time for the problem to grow, but nothing about the workload ever really changes, then that suggests a problem with this quality. The probability of any given logical row being moved to another heap block is very low. And yet it is inevitable that many (even all) will be moved, given enough time, given enough opportunities to get unlucky.

> This has become a bit more pronounced with vacuum skipping index cleanup when
> there's "just a few" dead items - if all your updates concentrate in a small
> region, 2% of the whole relation size isn't actually that small.
The 2% threshold was chosen based on the observation that it was below the effective threshold where autovacuum just won't ever launch anything on a moderately sized table (unless you set autovacuum_vacuum_scale_factor to something absurdly low). That is the real problem, IMV. That's why I think that we need to drive it based primarily on page-level characteristics, while effectively ignoring pages that are all-visible when deciding whether enough bloat is present to necessitate vacuuming.

> 1) It's kind of OK for heap-only tuples to get a high OffsetNumber - we can
>    reclaim them during pruning once they're dead. They don't leave behind a
>    dead item that's unreclaimable until the next vacuum with an index cleanup
>    pass.

I like the general direction here, but this particular idea doesn't seem like a winner.

> 2) Arguably the OffsetNumber of a redirect target can be changed. It might
>    break careless uses of WHERE ctid = ... though (which likely are already
>    broken, just harder to hit).

That makes perfect sense to me, though.

> a) heap_page_prune_prune() should take the number of used items into account
>    when deciding whether to prune. Right now we trigger hot pruning based on
>    the number of items only if PageGetMaxOffsetNumber(page) >=
>    MaxHeapTuplesPerPage. But because it requires a vacuum to reclaim an ItemId
>    used for a root tuple, we should trigger HOT pruning when it might lower
>    which OffsetNumber get used.

Unsure about this.

> b) heap_page_prune_prune() should be triggered in more paths. E.g. when
>    inserting / updating, we should prune if it allows us to avoid using a high
>    OffsetNumber.

Unsure about this too.

I prototyped a design that gives individual backends soft ownership of heap blocks that were recently allocated, and later prunes the heap page when it fills [1]. That's useful for aborted transactions, where it preserves locality -- leaving aborted tuples behind means that their space is ultimately reused for unrelated inserts, which is bad.
But eager pruning allows the inserter to leave behind more or less pristine heap pages, which don't need to be pruned later on.

> c) What if we left some percentage of ItemIds unused, when looking for the
>    OffsetNumber of a new HOT row version? That'd make it more likely for
>    non-HOT updates and inserts to fit onto the page, without permanently
>    increasing the size of the line pointer array.

That sounds promising.

[1] https://postgr.es/m/CAH2-Wzm-VhVeQYTH8hLyYho2wdG8Ecrm0uPQJWjap6BOVfe9Og@mail.gmail.com

--
Peter Geoghegan