Re: Deleting older versions in unique indexes to avoid page splits
| От | Peter Geoghegan | 
|---|---|
| Тема | Re: Deleting older versions in unique indexes to avoid page splits | 
| Дата | |
| Msg-id | CAH2-WzmUAf=MSWxh28xV_veWiOmn-fmUPLbfhqX=t+RG6X9kbg@mail.gmail.com обсуждение исходный текст | 
| Ответ на | Re: Deleting older versions in unique indexes to avoid page splits (Peter Geoghegan <pg@bowt.ie>) | 
| Список | pgsql-hackers | 
On Fri, Oct 16, 2020 at 1:58 PM Peter Geoghegan <pg@bowt.ie> wrote: > Attached is v3, which is rebased against the master branch as of > today. No real changes, though. And now here's v4. This version adds handling of posting list tuples, which I was skipping over before. Highly contended leaf pages with posting list tuples could still sometimes get "logically unnecessary" page splits in v3, which this seems to fix (barring the most extreme cases of version churn, where the patch cannot and should not prevent page splits). Actually, posting list tuple handling was something that we had in v1 but lost in v2, because v2 changed our general strategy to focus on what is convenient to the heap/tableam, which is the most important thing by far (the cost of failing must be a low, fixed, well understood and well targeted cost). The fix was to include TIDs from posting lists, while marking them "non-promising". Only plain tuples that are duplicates are considered promising. Only promising tuples influence where we look for tuples to kill most of the time. The exception is when there is an even number of promising tuples on two heap pages, where we tiebreak on the total number of TIDs that point to the heap page from the leaf page. I seem to be able to cap the costs paid when the optimization doesn't work out to the extent that we can get away with visiting only *one* heap page before giving up. And, we now never visit more than 3 total (2 is the common case when the optimization is really effective). This may sound conservative -- because it is -- but it seems to work in practice. I may change my mind about that and decide to be less conservative, but so far all of the evidence I've seen suggests that it doesn't matter -- the heuristics seem to really work. Might as well be completely conservative. I'm *sure* that we can afford one heap access here -- we currently *always* visit at least one heap tuple in roughly the same way during each and every unique index tuple insert (not just when the page fills). Posting list TIDs are not the only kind of TIDs that are marked non-promising. We now also include TIDs from non-duplicate tuples. So we're prepared to kill any of the TIDs on the page, though we only really expect it to happen with the "promising" tuples (duplicates). But why not be open to the possibility of killing some extra TIDs in passing? We don't have to visit any extra heap pages to do so, so it's practically free. Empirical evidence shows that this happens quite often. Here's why this posting list tuple strategy is a good one: we consider posting list tuple TIDs non-promising to represent that we think that there are probably actually multiple logical rows involved, or at least to represent that they didn't work -- simple trial and error suggests that they aren't very "promising", whatever the true reason happens to be. But why not "keep an open mind" about the TIDs not each being for distinct logical rows? If it just so happens that the posting list TIDs really were multiple versions of the same logical row all along, then there is a reasonable chance that there'll be even more versions on that same heap page later on. When this happens and when we end up back on the same B-Tree leaf page to think about dedup deletion once again, it's pretty likely that we'll also naturally end up looking into the later additional versions on this same heap page from earlier. At which point we won't miss the opportunity to check the posting lists TIDs in passing. So we get to delete the posting list after all! (If you're thinking "but we probably won't get lucky like that", then consider that it doesn't have to happen on the next occasion when delete deduplication happens on the same leaf page. Just at some point in the future. This is due to the way we visit the heap pages that look most promising first. It might take a couple of rounds of dedup deletion to get back to the same heap page, but that's okay. The simple heuristics proposed in the patch seem to have some really interesting and useful consequences in practice. It's hard to quantify how important these kinds of accidents of locality will be. I haven't targeted this or that effect -- my heuristics are dead simple, and based almost entirely on keeping costs down. You can think of it as "increase the number of TIDs to increase our chances of success" if you prefer.) The life cycle of logical rows/heap tuples seems to naturally lead to these kinds of opportunities. Obviously heapam is naturally very keen on storing related versions on the same heap block already, without any input from this patch. The patch is still evolving, and the overall structure and interface certainly still needs work -- I've focussed on the algorithm so far. I could really use some feedback on high level direction, though. It's a relatively short patch, even with all of my README commentary. But...it's not an easy concept to digest. Note: I'm not really sure if it's necessary to provide specialized qsort routines for the sorting we do to lists of TIDs, etc. I did do some experimentation with that, and it's an open question. So for now I rely on the patch that Thomas Munro posted to do that a little while back, which is why that's included here. The question of whether this is truly needed is unsettled. -- Peter Geoghegan
Вложения
В списке pgsql-hackers по дате отправления: