Обсуждение: Reproducible GIST index corruption under concurrent updates

Поиск

Список

Период

Сортировка

Reproducible GIST index corruption under concurrent updates

От

Duncan Sands

Дата:

19 января 2021 г., 14:15:57

To reproduce, run the attached python3 script after adjusting CONNECTION_STRING 
appropriately.  The target database should have the btree_gist extension 
installed.  If it fails due to making too many connections to the database, try 
reducing NUM_THREADS (or allowing more connections).

Example bad output (may take a minute or two):

$ ./gist_issue.py

inconsistent: 9286


Each "inconsistent" line means that looking up by "id" in the "sections" table 
returns a row but looking up by "file" doesn't, even though "id" and "file" are 
always the same:

duncan=> SELECT * FROM sections WHERE id=9286;

   id  | file |     valid

------+------+---------------

  9286 | 9286 | (,1970-01-01)

(1 row)



duncan=> SELECT * FROM sections WHERE file=9286;

  id | file | valid

----+------+-------

(0 rows)


What is going on here is that querying by "file" uses the GIST index but this 
index is corrupt:

duncan=> EXPLAIN SELECT * FROM sections WHERE file=9286;

                                         QUERY PLAN 


------------------------------------------------------------------------------------------

  Index Scan using sections_file_valid_excl on sections  (cost=0.15..2.37 rows=1 
width=18)

    Index Cond: (file = 9286)

(2 rows)



If the "gist_issue.py" script runs forever then it failed to reproduce the problem.


Reproduces with postgres 13.1 on Linux (ubuntu groovy, package 
13.1-1.pgdg20.10+1, x86-64).  I reproduced it with default database settings and 
with a tuned database, and on multiple Linux machines.

Enjoy!

Best wishes, Duncan.

Вложения

gist_issue.py

Re: Reproducible GIST index corruption under concurrent updates

От

Andrey Borodin

Дата:

19 января 2021 г., 20:12:39


> 19 янв. 2021 г., в 16:15, Duncan Sands <duncan.sands@deepbluecap.com> написал(а):
>
> Enjoy!

Thanks!
It reproduces on my machine. Seems like all tuples are indexed, but some root downlinks are not adjusted properly.
Probably,after concurrent split. 

Best regards, Andrey Borodin.

Re: Reproducible GIST index corruption under concurrent updates

От

Heikki Linnakangas

Дата:

20 января 2021 г., 00:53:56

On 19/01/2021 19:12, Andrey Borodin wrote:
>> 19 янв. 2021 г., в 16:15, Duncan Sands <duncan.sands@deepbluecap.com> написал(а):
>>
>> Enjoy!
> 
> Thanks!
> It reproduces on my machine. Seems like all tuples are indexed, but some root downlinks are not adjusted properly.
Probably,after concurrent split.
 

This is reproducable down to v12. I bisected it to this commit:

commit e2e992c93145cfc0e3563fb84efd25b390a84bb9 (refs/bisect/bad)
Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
Date:   Wed Jul 24 20:24:05 2019 +0300

     Refactor checks for deleted GiST pages.

     The explicit check in gistScanPage() isn't currently really 
necessary, as
     a deleted page is always empty, so the loop would fall through without
     doing anything, anyway. But it's a marginal optimization, and it 
gives a
     nice place to attach a comment to explain how it works.

     Backpatch to v12, where GiST page deletion was introduced.

     Reviewed-by: Andrey Borodin
     Discussion: 
https://www.postgresql.org/message-id/835A15A5-F1B4-4446-A711-BF48357EB602%40yandex-team.ru

I'll dig deeper tomorrow.

- Heikki

Re: Reproducible GIST index corruption under concurrent updates

От

Heikki Linnakangas

Дата:

20 января 2021 г., 01:31:30

On 19/01/2021 23:53, Heikki Linnakangas wrote:
> On 19/01/2021 19:12, Andrey Borodin wrote:
>>> 19 янв. 2021 г., в 16:15, Duncan Sands <duncan.sands@deepbluecap.com> написал(а):
>>>
>>> Enjoy!
>>
>> Thanks!
>> It reproduces on my machine. Seems like all tuples are indexed, but some root downlinks are not adjusted properly.
Probably,after concurrent split.
 
> 
> This is reproducable down to v12. I bisected it to this commit:
> 
> commit e2e992c93145cfc0e3563fb84efd25b390a84bb9 (refs/bisect/bad)
> Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date:   Wed Jul 24 20:24:05 2019 +0300
> 
>       Refactor checks for deleted GiST pages.
> 
>       The explicit check in gistScanPage() isn't currently really
> necessary, as
>       a deleted page is always empty, so the loop would fall through without
>       doing anything, anyway. But it's a marginal optimization, and it
> gives a
>       nice place to attach a comment to explain how it works.
> 
>       Backpatch to v12, where GiST page deletion was introduced.
> 
>       Reviewed-by: Andrey Borodin
>       Discussion:
> https://www.postgresql.org/message-id/835A15A5-F1B4-4446-A711-BF48357EB602%40yandex-team.ru
> 
> I'll dig deeper tomorrow.

The culprit is this change:

> @@ -858,12 +856,13 @@ gistdoinsert(Relation r, IndexTuple itup, Size freespace,
>                                          * leaf/inner is enough to recognize split for root
>                                          */
>                                 }
> -                               else if (GistFollowRight(stack->page) ||
> -                                                stack->parent->lsn < GistPageGetNSN(stack->page))
> +                               else if ((GistFollowRight(stack->page) ||
> +                                                 stack->parent->lsn < GistPageGetNSN(stack->page)) &&
> +                                                GistPageIsDeleted(stack->page))
>                                 {
>                                         /*
> -                                        * The page was split while we momentarily unlocked the
> -                                        * page. Go back to parent.
> +                                        * The page was split or deleted while we momentarily
> +                                        * unlocked the page. Go back to parent.
>                                          */
>                                         UnlockReleaseBuffer(stack->buffer);
>                                         xlocked = false;

The comment change was correct, but the condition used &&. Should've 
been ||. There is another copy of basically the same condition earlier 
in the function that was changed correctly, but I blundered this one. Oops.

The attached patch fixes this. I also added an assertion to the 
gistplacetopage() function, to check that we never try to insert on a 
deleted page. This bug could've made that happen too, although in this 
case the problem was a concurrent split, not a deletion. I'll backpatch 
and push this tomorrow.

Many thanks for the easy reproducer script, Duncan!

- Heikki

Вложения

fix-gist-corruption-bug.patch

Re: Reproducible GIST index corruption under concurrent updates

От

Duncan Sands

Дата:

20 января 2021 г., 11:23:14

> The comment change was correct, but the condition used &&. Should've been ||. 
> There is another copy of basically the same condition earlier in the function 
> that was changed correctly, but I blundered this one. Oops.
> 
> The attached patch fixes this. I also added an assertion to the 
> gistplacetopage() function, to check that we never try to insert on a deleted 
> page. This bug could've made that happen too, although in this case the problem 
> was a concurrent split, not a deletion. I'll backpatch and push this tomorrow.
> 
> Many thanks for the easy reproducer script, Duncan!

No problem Heikki, thanks for the quick fix.

Best wishes, Duncan.

Re: Reproducible GIST index corruption under concurrent updates

От

Heikki Linnakangas

Дата:

20 января 2021 г., 13:04:23

On 20/01/2021 10:23, Duncan Sands wrote:
>> The comment change was correct, but the condition used &&. Should've been ||.
>> There is another copy of basically the same condition earlier in the function
>> that was changed correctly, but I blundered this one. Oops.
>>
>> The attached patch fixes this. I also added an assertion to the
>> gistplacetopage() function, to check that we never try to insert on a deleted
>> page. This bug could've made that happen too, although in this case the problem
>> was a concurrent split, not a deletion. I'll backpatch and push this tomorrow.
>>
>> Many thanks for the easy reproducer script, Duncan!
> 
> No problem Heikki, thanks for the quick fix.

Pushed as commit 6b4d3046f4, and backpatched down to version 12, where 
this bug was introduced. It will appear in the next minor release.

- Heikki

Re: Reproducible GIST index corruption under concurrent updates

От

Tom Lane

Дата:

20 января 2021 г., 18:25:10

Heikki Linnakangas <hlinnaka@iki.fi> writes:
> Pushed as commit 6b4d3046f4, and backpatched down to version 12, where 
> this bug was introduced. It will appear in the next minor release.

I see you noted that REINDEX is required to repair any corrupted indexes.
For the purposes of the release notes, can we characterize which indexes
might be affected, or do we just have to say "better reindex all GIST
indexes"?

            regards, tom lane

Re: Reproducible GIST index corruption under concurrent updates

От

Heikki Linnakangas

Дата:

21 января 2021 г., 01:26:12

On 20/01/2021 17:25, Tom Lane wrote:
> Heikki Linnakangas <hlinnaka@iki.fi> writes:
>> Pushed as commit 6b4d3046f4, and backpatched down to version 12, where
>> this bug was introduced. It will appear in the next minor release.
> 
> I see you noted that REINDEX is required to repair any corrupted indexes.
> For the purposes of the release notes, can we characterize which indexes
> might be affected, or do we just have to say "better reindex all GIST
> indexes"?

Any GiST index that's been concurrently inserted might be corrupt. 
Better reindex all GiST indexes.

- Heikki

P.S. The GiST amcheck extension that's been discussed at [1] would catch 
this corruption, so if that was already committed, we could suggest 
using it. But that doesn't help us now.

[1] 
https://www.postgresql.org/message-id/CAF3eApa07-BajjG8+RYx-Dr_cq28ZA0GsZmUQrGu5b2ayRhB5A@mail.gmail.com 
would catch this corruption

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Reproducible GIST index corruption under concurrent updates

Вложения

Вложения