On Sat, Feb 11, 2017 at 6:35 PM, Amit Kapila <amit.kapila16@gmail.com> wrote:
>>>> Why can't we rely on _bt_walk_left?
>>> The reason is mentioned in comments, but let me try to explain with
>>> some example. When you reach that point of code, it means that either
>>> the current page (assume page number is 10) doesn't contain any
>>> matching items or it is a half-dead page, both of which indicates that
>>> we have to move to the previous page. Now, before checking if the
>>> current page contains matching items, we signal parallel machinery
>>> (via _bt_parallel_release) to allow workers to read the previous page
>>> (assume previous page number is 9). So it is quite possible that
>>> after deciding that current page (page number 10) doesn't contain any
>>> matching tuples if we directly move to the previous page (in this case
>>> it will be 9) by using _bt_walk_left, some other worker would have
>>> read page 9. In short, if we directly use _bt_walk_left(), then we
>>> are prone to returning some of the values twice as multiple workers
>>> can read the same page.
>> But ... the entire point of the seize-and-release stuff is to avoid
>> this problem. You're suppose to seize the scan, read the current
>> page, walk left, store the page you find in the scan, and then release
>> the scan.
> Exactly and that is what is done in the patch. Basically, if we found
> that the current page is half-dead or it doesn't contain any matching
> items, then release the current buffer, seize the scan, read the
> current page, walk left and so on. I am slightly confused here
> because it seems both of us agree on what is the right thing to do and
> according to me that is how it is implemented. Are you just ensuring
> about whether I have implemented as discussed or do you see a problem
> with the way it is implemented?
Well, before, I thought you said that relying entirely on
_bt_walk_left couldn't work because then two people might end up
running it at the same time, and that would cause problems. But if
you can only run _bt_walk_left while you've got the scan seized, then
that can't happen. Evidently I'm missing something here.
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company