Re: Reproducible GIST index corruption under concurrent updates

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: Reproducible GIST index corruption under concurrent updates
Дата
Msg-id 907491b3-87c7-aadd-3fb1-be3c542b7fb7@iki.fi
обсуждение исходный текст
Ответ на Re: Reproducible GIST index corruption under concurrent updates  (Heikki Linnakangas <hlinnaka@iki.fi>)
Ответы Re: Reproducible GIST index corruption under concurrent updates  (Duncan Sands <duncan.sands@deepbluecap.com>)
Список pgsql-bugs
On 19/01/2021 23:53, Heikki Linnakangas wrote:
> On 19/01/2021 19:12, Andrey Borodin wrote:
>>> 19 янв. 2021 г., в 16:15, Duncan Sands <duncan.sands@deepbluecap.com> написал(а):
>>>
>>> Enjoy!
>>
>> Thanks!
>> It reproduces on my machine. Seems like all tuples are indexed, but some root downlinks are not adjusted properly.
Probably,after concurrent split.
 
> 
> This is reproducable down to v12. I bisected it to this commit:
> 
> commit e2e992c93145cfc0e3563fb84efd25b390a84bb9 (refs/bisect/bad)
> Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date:   Wed Jul 24 20:24:05 2019 +0300
> 
>       Refactor checks for deleted GiST pages.
> 
>       The explicit check in gistScanPage() isn't currently really
> necessary, as
>       a deleted page is always empty, so the loop would fall through without
>       doing anything, anyway. But it's a marginal optimization, and it
> gives a
>       nice place to attach a comment to explain how it works.
> 
>       Backpatch to v12, where GiST page deletion was introduced.
> 
>       Reviewed-by: Andrey Borodin
>       Discussion:
> https://www.postgresql.org/message-id/835A15A5-F1B4-4446-A711-BF48357EB602%40yandex-team.ru
> 
> I'll dig deeper tomorrow.

The culprit is this change:

> @@ -858,12 +856,13 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
>                                          * leaf/inner is enough to recognize split for root
>                                          */
>                                 }
> -                               else if (GistFollowRight(stack->page) ||
> -                                                stack->parent->lsn < GistPageGetNSN(stack->page))
> +                               else if ((GistFollowRight(stack->page) ||
> +                                                 stack->parent->lsn < GistPageGetNSN(stack->page)) &&
> +                                                GistPageIsDeleted(stack->page))
>                                 {
>                                         /*
> -                                        * The page was split while we momentarily unlocked the
> -                                        * page. Go back to parent.
> +                                        * The page was split or deleted while we momentarily
> +                                        * unlocked the page. Go back to parent.
>                                          */
>                                         UnlockReleaseBuffer(stack->buffer);
>                                         xlocked = false;

The comment change was correct, but the condition used &&. Should've 
been ||. There is another copy of basically the same condition earlier 
in the function that was changed correctly, but I blundered this one. Oops.

The attached patch fixes this. I also added an assertion to the 
gistplacetopage() function, to check that we never try to insert on a 
deleted page. This bug could've made that happen too, although in this 
case the problem was a concurrent split, not a deletion. I'll backpatch 
and push this tomorrow.

Many thanks for the easy reproducer script, Duncan!

- Heikki

Вложения

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Heikki Linnakangas
Дата:
Сообщение: Re: Reproducible GIST index corruption under concurrent updates
Следующее
От: Simon Riggs
Дата:
Сообщение: Bug in error reporting for multi-line JSON