Re: Race condition within _bt_findinsertloc()? (new page split code)

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: Race condition within _bt_findinsertloc()? (new page split code)
Дата
Msg-id 5384E53E.5030205@vmware.com
обсуждение исходный текст
Ответ на Re: Race condition within _bt_findinsertloc()? (new page split code)  (Peter Geoghegan <pg@heroku.com>)
Ответы Re: Race condition within _bt_findinsertloc()? (new page split code)
Re: Race condition within _bt_findinsertloc()? (new page split code)
Список pgsql-hackers
On 05/27/2014 09:47 PM, Peter Geoghegan wrote:
> On Tue, May 27, 2014 at 4:54 AM, Heikki Linnakangas
> <hlinnakangas@vmware.com> wrote:
>> Also note that _bt_moveright() also finishes any incomplete splits it
>> encounters (when called for an insertion). So _bt_findinsertloc() never gets
>> called on a page with the incomplete-split flag set. It might encounter one
>> when it moves right, but never the first page.
>
> Fair enough, but I don't think that affects correctness either way (I
> don't think that you meant to imply that this was a necessary
> precaution that you'd taken - right?).

Right.

>> If I understood correctly, the tree looks like this before the insertion:
>>
>> Parent page:
>> +-------------+
>> |             |
>> | 9 -> A      |
>> +-------------+
>>
>> Leaf A
>> +-------------+
>> | HI-key: 9   |
>> |             |
>> | 7 8 9       |
>> +-------------+
>>
>> And after insertion and incomplete split:
>>
>> Parent page
>> +-------------+
>> |             |
>> | 9 -> A      |
>> +-------------+
>>
>> Leaf A              Leaf B
>> +--------------+     +-------------+
>> | HI-key: 8    |     | HI-key: 9   |
>> | (INCOMPLETE_ |     |             |
>> | SPLIT)       | <-> |             |
>> |              |     |             |
>> |  7   7*   8  |     |   9         |
>> +--------------+     +-------------+
>
>> After the split is finished, the tree looks like this:
>>
>> Parent page
>> +-------------+
>> | 8 -> A      |
>> | 9 -> B      |
>> +-------------+
>>
>> Leaf A              Leaf B
>> +-------------+     +-------------+
>> | HI-key: 8   |     | HI-key: 9   |
>> |             | <-> |             |
>> |  7   7*  8  |     |   9         |
>> +-------------+     +-------------+
>
> How did the parent page change between before and after the final
> atomic operation (page split completion)? What happened to "9 -> A"?

Ah, sorry, I got that wrong. The downlinks store the *low* key of the 
child page, not the high key as I depicted. Let me try again:

On 05/27/2014 09:17 AM, Peter Geoghegan wrote:
> Anyway, the concern is that there may be problems when we call
> _bt_finish_split() with that left sibling locked thoughout (i.e.
> finish a split where the right sibling is BTP_INCOMPLETE_SPLIT, and
> itself has a right sibling from the incomplete split (which is usually
> the value lock page's right-right sibling)). I'm not concerned about
> performance, since as the comment says it ought to be an infrequent
> occurrence. I also believe that there are no deadlock hazards. But
> consider this scenario:
>
> * We insert the value 7 into an int4 unique index. We need to split
> the leaf page. We run out of memory or something, and ours is an
> incomplete page split. Our transaction is aborted. For the sake of
> argument, suppose that there are also already a bunch of dead tuples
> within the index with values of 7, 8 and 9.

So, initially the tree looks like this:

Parent page:
+-------------+
|             |
| 7 -> A      |
+-------------+

Leaf A
+-------------+
| HI-key: 9   |
|             |
| 7 8 9       |
+-------------+

And after insertion and incomplete split:

Parent page
+-------------+
|             |
| 7 -> A      |
+-------------+

Leaf A              Leaf B
+--------------+     +-------------+
| HI-key: 8    |     | HI-key: 9   |
| (INCOMPLETE_ |     |             |
| SPLIT)       | <-> |             |
|              |     |             |
|  7   7*  8   |     |    9        |
+--------------+     +-------------+

where 7* is the newly inserted key, with value 7.

(you didn't mention at what point the split happens, but in the next
paragraph you said the new downlink has value 8, which implies the above 
split)

> * Another inserter of the value 7 comes along. It follows exactly the
> same downlink as the first, now aborted inserter (suppose the
> downlink's value is 9). It also locks the same leaf page to establish
> a "value lock" in precisely the same manner. Finding no room on the
> first page, it looks further right, maintaining its original "value
> lock" throughout. It finishes the first inserter's split of the
> non-value-lock page - a new downlink is inserted into the parent page,
> with the value 8. It then releases all buffer locks except the first
> one/original "value lock". A physical insertion has yet to occur.

The downlink of the original page cannot contain 9. Because, as I now 
remember ;-), the downlinks contain low-keys, not high keys.

- Heikki



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Evan Jones
Дата:
Сообщение: PG Manual: Clarifying the repeatable read isolation example
Следующее
От: Tom Lane
Дата:
Сообщение: Why is pg_lsn marking itself a preferred type?