Re: Index Skip Scan
| From | Dmitry Dolgov |
|---|---|
| Subject | Re: Index Skip Scan |
| Date | |
| Msg-id | 20200122160441.ograupmzyytde2mi@localhost |
| In reply to | RE: Index Skip Scan (Floris Van Nee <florisvannee@Optiver.com>) |
| Replies | Re: Index Skip Scan |
| List | pgsql-hackers |
> On Wed, Jan 22, 2020 at 07:50:30AM +0000, Floris Van Nee wrote:
>
> Anyone please correct me if I'm wrong, but I think one case where the current patch relies on some data from the page it has locked before it in checking this hi/lo key. I think it's possible for the following sequence to happen. Suppose we have a very simple one leaf-page btree containing four elements: leaf page 1 = [2,4,6,8]
> We do a backwards index skip scan on this and have just returned our first tuple (8). The buffer is left pinned but unlocked. Now, someone else comes in and inserts a tuple (value 5) into this page, but suppose the page happens to be full. So a page split occurs. As far as I know, a page split could happen at any random element in the page. One of the situations we could be left with is:
> Leaf page 1 = [2,4]
> Leaf page 2 = [5,6,8]
> However, our scan is still pointing to leaf page 1.

If we have just returned a tuple, the next action is either to check the next page for another key or to search down the tree. Maybe I'm missing something in your scenario, but the latter will land us on the required page (we do not point to any leaf here), and before the former there is a check of the high/low key. Is there anything else missing?

> Now that I look at the patch again, I fear there currently may also be such a dependency in the "Advance forward but read backward"-case. It saves the offset number of a tuple in a variable, then does a _bt_search (releasing the lock and pin on the page). At this point, anything can happen to the tuples on this page - the page may be compacted by vacuum such that the offset number you have in your variable does not match the actual offset number of the tuple on the page anymore. Then, at the check for (nextOffset == startOffset) later, there's a possibility the offsets are different even though they relate to the same tuple.

Interesting point. The original idea here was to check that we have not returned to the same position after jumping, so maybe instead of comparing offsets we can compare the tuple we found.
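
To make the offset problem concrete, here is a toy model in Python, not PostgreSQL code. It assumes a leaf page can be modelled as a list of entries with 1-based offsets, that each entry carries a stable identifier (the heap TID), and that vacuum simply drops dead entries so the survivors shift left. The names `IndexTuple`, `offset_of` and `compact` are hypothetical and exist only for this illustration. Whether the patch should compare TIDs, keys or whole tuples is not settled in this thread.

```python
from typing import List, NamedTuple, Tuple


class IndexTuple(NamedTuple):
    key: int
    tid: Tuple[int, int]  # (heap block, heap offset): assumed stable identity
    dead: bool = False


def offset_of(page: List[IndexTuple], tid: Tuple[int, int]) -> int:
    """Return the 1-based offset of the entry with this TID, or 0 if absent."""
    for off, item in enumerate(page, start=1):
        if item.tid == tid:
            return off
    return 0


def compact(page: List[IndexTuple]) -> List[IndexTuple]:
    """Toy stand-in for vacuum: drop dead entries, survivors shift left."""
    return [item for item in page if not item.dead]


# Leaf page with one dead entry in front of the tuple the scan remembers.
page = [
    IndexTuple(2, (0, 1), dead=True),
    IndexTuple(4, (0, 2)),
    IndexTuple(6, (0, 3)),
    IndexTuple(8, (0, 4)),
]

start_tid = (0, 3)
start_offset = offset_of(page, start_tid)  # remembered before the lock is released

# The lock and pin are released here; the page is compacted in the meantime.
page = compact(page)

# After searching again, the scan lands on the same tuple at a new offset.
next_offset = offset_of(page, start_tid)
next_tuple = page[next_offset - 1]

same_by_offset = next_offset == start_offset  # misses that it is the same tuple
same_by_tuple = next_tuple.tid == start_tid   # recognises the same tuple
```

In this model the offset comparison reports a different position even though the scan is back on the same tuple, while the identity comparison does not depend on where the tuple currently sits on the page.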