Re: Failure in contrib test _int on loach

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: Failure in contrib test _int on loach
Дата
Msg-id fb0bd75c-d925-a9f5-75cc-6b387bb31ebc@iki.fi
обсуждение исходный текст
Ответ на Re: Failure in contrib test _int on loach  (Andrey Lepikhov <a.lepikhov@postgrespro.ru>)
Ответы Re: Failure in contrib test _int on loach  (Andrey Lepikhov <a.lepikhov@postgrespro.ru>)
Список pgsql-hackers
On 11/04/2019 09:10, Andrey Lepikhov wrote:
> On 10/04/2019 20:25, Heikki Linnakangas wrote:
>> On 09/04/2019 19:11, Anastasia Lubennikova wrote:
>>> After introducing GistBuildNSN this code path became unreachable.
>>> To fix it, I added new flag to detect such splits during indexbuild.
>>
>> Isn't it possible that the grandparent page is also split, so that we'd
>> need to climb further up?
>
> Based on Anastasia's idea i prepare alternative solution to fix the bug
> (see attachment).
> It utilizes the idea of linear increment of LSN/NSN. WAL write process
> is used for change NSN value to 1 for each block of index relation.
> I hope this can be a fairly clear and safe solution.

That's basically the same idea as always using the "fake LSN" during 
index build, like the original version of this patch did. It's got the 
problem that I mentioned at 
https://www.postgresql.org/message-id/090fb3cb-1ca4-e173-ecf7-47d41ebac620@iki.fi:

> * Using "fake" unlogged LSNs for GiST index build seemed fishy. I could 
> not convince myself that it was safe in all corner cases. In a recently 
> initdb'd cluster, it's theoretically possible that the fake LSN counter 
> overtakes the real LSN value, and that could lead to strange behavior. 
> For example, how would the buffer manager behave, if there was a dirty 
> page in the buffer cache with an LSN value that's greater than the 
> current WAL flush pointer? I think you'd get "ERROR: xlog flush request 
> %X/%X is not satisfied --- flushed only to %X/%X".

Perhaps the risk is theoretical; the real WAL begins at XLOG_SEG_SIZE, 
so with defaults WAL segment size, the index build would have to do 
about 16 million page splits. The index would have to be at least 150 GB 
for that. But it seems possible, and with non-default segment and page 
size settings more so.

Perhaps we could start at 1, but instead of using a global counter, 
whenever a page is split, we take the parent's LSN value and increment 
it by one. So different branches of the tree could use the same values, 
which would reduce the consumption of the counter values.

Yet another idea would be to start the counter at 1, but check that it 
doesn't overtake the WAL insert pointer. If it's about to overtake it, 
just generate some dummy WAL.

But it seems best to deal with this in gistdoinsert(). I think 
Anastasia's approach of adding a flag to GISTInsertStack can be made to 
work, if we set the flag somewhere in gistinserttuples() or 
gistplacetopage(), whenever a page is split. That way, if it needs to 
split multiple levels, the flag is set on all of the corresponding 
GISTInsertStack entries.

Yet another trivial fix would be just always start the tree descend from 
the root in gistdoinsert(), if a page is split. Not as efficient, but 
probably negligible in practice.

- Heikki



В списке pgsql-hackers по дате отправления:

Предыдущее
От: tushar
Дата:
Сообщение: Re: Minimal logical decoding on standbys
Следующее
От: Magnus Hagander
Дата:
Сообщение: Re: pg_rewind vs superuser