RE: Index Skip Scan

Поиск
Список
Период
Сортировка
От Floris Van Nee
Тема RE: Index Skip Scan
Дата
Msg-id a713fef85da14e1fbca51d19c872d023@opammb0562.comp.optiver.com
обсуждение исходный текст
Ответ на Re: Index Skip Scan  (Dmitry Dolgov <9erthalion6@gmail.com>)
Ответы Re: Index Skip Scan
Re: Index Skip Scan
Список pgsql-hackers
>
> Could you please elaborate why does it sound like that? If I understand
> correctly, to stop a scan only "between" pages one need to use only
> _bt_readpage/_bt_steppage? Other than that there is no magic with scan
> position in the patch, so I'm not sure if I'm missing something here.

Anyone please correct me if I'm wrong, but I think one case where the current patch relies on some data from the page
ithas locked before it in checking this hi/lo key. I think it's possible for the following sequence to happen. Suppose
wehave a very simple one leaf-page btree containing four elements: leaf page 1 = [2,4,6,8] 
We do a backwards index skip scan on this and have just returned our first tuple (8). The buffer is left pinned but
unlocked.Now, someone else comes in and inserts a tuple (value 5) into this page, but suppose the page happens to be
full.So a page split occurs. As far as I know, a page split could happen at any random element in the page. One of the
situationswe could be left with is: 
Leaf page 1 = [2,4]
Leaf page 2 = [5,6,8]
However, our scan is still pointing to leaf page 1. For non-skip scans this is not a problem, as we already read all
matchingelements in our local buffer and we'll return those. But the skip scan currently: 
a) checks the lo-key of the page to see if the next prefix can be found on the leaf page 1
b) finds out that this is actually true
c) does a search on the page and returns value=4 (while it should have returned value=6)

Peter, is my understanding about the btree internals correct so far?

Now that I look at the patch again, I fear there currently may also be such a dependency in the "Advance forward but
readbackward"-case. It saves the offset number of a tuple in a variable, then does a _bt_search (releasing the lock and
pinon the page). At this point, anything can happen to the tuples on this page - the page may be compacted by vacuum
suchthat the offset number you have in your variable does not match the actual offset number of the tuple on the page
anymore.Then, at the check for (nextOffset == startOffset) later, there's a possibility the offsets are different even
thoughthey relate to the same tuple. 


-Floris




В списке pgsql-hackers по дате отправления:

Предыдущее
От: Mahendra Singh Thalor
Дата:
Сообщение: Re: Error message inconsistency
Следующее
От: Amit Langote
Дата:
Сообщение: Re: table partitioning and access privileges