Re: Race condition within _bt_findinsertloc()? (new page split code)

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: Race condition within _bt_findinsertloc()? (new page split code)
Дата
Msg-id CAM3SWZT4bgJTVqb8XLvhxgap9zeEr2yjUTnSOb1MiZtL6uG8fg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Race condition within _bt_findinsertloc()? (new page split code)  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Ответы Re: Race condition within _bt_findinsertloc()? (new page split code)
Список pgsql-hackers
On Tue, May 27, 2014 at 4:54 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
>> The "left sibling" referred to here is "the first page this key could
>> be on", an important concept for unique index enforcement.
>
> No, it's not "the first page this key could be on".

Well, it may be initially. I could have been more cautious about the
terminology here.

> Also note that _bt_moveright() also finishes any incomplete splits it
> encounters (when called for an insertion). So _bt_findinsertloc() never gets
> called on a page with the incomplete-split flag set. It might encounter one
> when it moves right, but never the first page.

Fair enough, but I don't think that affects correctness either way (I
don't think that you meant to imply that this was a necessary
precaution that you'd taken - right?). It's a nice property, since it
makes the extra locking while completing a split within
_bt_findinsertloc() particularly infrequent. But, that might also be a
bad thing, when considered from a different perspective.

> If I understood correctly, the tree looks like this before the insertion:
>
> Parent page:
> +-------------+
> |             |
> | 9 -> A      |
> +-------------+
>
> Leaf A
> +-------------+
> | HI-key: 9   |
> |             |
> | 7 8 9       |
> +-------------+
>
> And after insertion and incomplete split:
>
> Parent page
> +-------------+
> |             |
> | 9 -> A      |
> +-------------+
>
> Leaf A              Leaf B
> +--------------+     +-------------+
> | HI-key: 8    |     | HI-key: 9   |
> | (INCOMPLETE_ |     |             |
> | SPLIT)       | <-> |             |
> |              |     |             |
> |  7   7*   8  |     |   9         |
> +--------------+     +-------------+

> After the split is finished, the tree looks like this:
>
> Parent page
> +-------------+
> | 8 -> A      |
> | 9 -> B      |
> +-------------+
>
> Leaf A              Leaf B
> +-------------+     +-------------+
> | HI-key: 8   |     | HI-key: 9   |
> |             | <-> |             |
> |  7   7*  8  |     |   9         |
> +-------------+     +-------------+

How did the parent page change between before and after the final
atomic operation (page split completion)? What happened to "9 -> A"?

>> Regardless of whether or not I have these details exactly right (that
>> is, regardless of whether or not this scenario is strictly possible) I
>> suggest as a code-hardening measure that _bt_findinsertloc() release
>> its "value lock", upon realizing it must complete splits, and then
>> complete the split or splits known to be required. It would finally
>> report that it "couldn't find" an insertion location to
>> _bt_doinsert(), which would then retry from the start, just like when
>> _bt_check_unique() finds an inconclusive conflict.
>
> Yeah, that would work too. It seems safe enough as it is, though, so I don't
> see the point.

Well, it would be nice to not have to finish the page split in what is
a particularly critical path, with that extra buffer lock. It's not
strictly necessary, but then it is theoretically safer, and certainly
much clearer. The fact that this code is so seldom executed is one
issue that made me revisit this. On the other hand, I can see why
you'd want to avoid cluttering up the relatively comprehensible
_bt_doinsert() function if it could be avoided. I defer to you.

> PS. Thanks for looking into this again! These B-tree changes really need
> thorough review.

You're welcome. Hopefully my questions will lead you in a useful
direction, even if my concerns turn out to be, in the main, unfounded.:-)

It previously wasn't in evidence that you'd considered these
interactions, and I feel better knowing that you have.
-- 
Peter Geoghega



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [PATCH] Replacement for OSSP-UUID for Linux and BSD
Следующее
От: Evan Jones
Дата:
Сообщение: PG Manual: Clarifying the repeatable read isolation example