Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin

From Heikki Linnakangas
Subject Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
Date
Msg-id acbce1b9-e88e-4a33-9fa7-4728f67454f2@iki.fi
In reply to Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: Vacuum ERRORs out considering freezing dead tuples from before OldestXmin
List pgsql-hackers
On 21/06/2024 03:02, Peter Geoghegan wrote:
> On Thu, Jun 20, 2024 at 7:42 PM Melanie Plageman
> <melanieplageman@gmail.com> wrote:
>> If vacuum fails to remove a tuple with xmax older than
>> VacuumCutoffs->OldestXmin and younger than
>> GlobalVisState->maybe_needed, it will ERROR out when determining
>> whether or not to freeze the tuple with "cannot freeze committed
>> xmax".
>>
>> In back branches starting with 14, failing to remove tuples older than
>> OldestXmin during pruning caused vacuum to infinitely loop in
>> lazy_scan_prune(), as investigated on this [1] thread.
> 
> This is a great summary.

+1
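
To make the failing window from the summary concrete, here is a minimal toy sketch, not PostgreSQL code. It assumes XIDs are plain integers with no wraparound, and the function name and the numeric values are hypothetical. It only marks which xmax values fall between maybe_needed and OldestXmin, as described in the quoted text.

```python
# Toy model: XIDs are plain integers, so "older" simply means "smaller".
# Real PostgreSQL XIDs wrap around and are compared modulo 2^32.


def in_problem_window(xmax: int, maybe_needed: int, oldest_xmin: int) -> bool:
    """True when xmax is older than OldestXmin but younger than maybe_needed."""
    return maybe_needed < xmax < oldest_xmin


# Hypothetical values: OldestXmin was fixed at 1000 when vacuum started,
# and maybe_needed later moved backwards to 900.
oldest_xmin = 1000
maybe_needed = 900

for xmax in (850, 950, 1050):
    # Only the middle value lies inside the window described in the summary.
    print(xmax, in_problem_window(xmax, maybe_needed, oldest_xmin))
```

In this model only xmax = 950 is in the window: pruning that relies on maybe_needed alone would not remove the tuple, yet the freezing code would still consider its xmax to be older than OldestXmin.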

>> We can fix this by always removing tuples considered dead before
>> VacuumCutoffs->OldestXmin. This is okay even if a reconnected standby
>> has a transaction that sees that tuple as alive, because it will
>> simply wait to replay the removal until it would be correct to do so
>> or recovery conflict handling will cancel the transaction that sees
>> the tuple as alive and allow replay to continue.
> 
> I think that this is the right general approach.

+1
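
As a rough illustration of the proposed approach, the sketch below contrasts pruning that consults only maybe_needed with pruning that always removes tuples dead before OldestXmin. It is a simplified model under stated assumptions, not the actual patch: XIDs are plain integers, the deleting transaction is assumed to have committed, and the names prune_removes and vacuum_tuple are hypothetical.

```python
# Toy model of the pruning decision; XIDs are plain integers (no wraparound)
# and the transaction that set xmax is assumed to have committed.


def prune_removes(xmax: int, maybe_needed: int, oldest_xmin: int,
                  always_use_oldest_xmin: bool) -> bool:
    """Decide whether pruning removes a tuple deleted by xmax."""
    removable = xmax < maybe_needed
    if always_use_oldest_xmin:
        # Proposed behavior: anything dead before OldestXmin is removed too.
        removable = removable or xmax < oldest_xmin
    return removable


def vacuum_tuple(xmax: int, maybe_needed: int, oldest_xmin: int,
                 always_use_oldest_xmin: bool) -> str:
    """Prune first, then mimic the freeze check on any surviving tuple."""
    if prune_removes(xmax, maybe_needed, oldest_xmin, always_use_oldest_xmin):
        return "removed"
    if xmax < oldest_xmin:
        # The tuple survived pruning although its committed xmax is older
        # than OldestXmin, which is the reported failure.
        raise RuntimeError("cannot freeze committed xmax")
    return "kept"


# Hypothetical values: maybe_needed = 900, OldestXmin = 1000, xmax = 950.
print(vacuum_tuple(950, 900, 1000, always_use_oldest_xmin=True))
```

With always_use_oldest_xmin set to False the same call raises the error, which mirrors the failure described in the thread. The model says nothing about the standby side; that is handled by delayed replay or recovery conflict handling, as the quoted text explains.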

>> The repro forces a round of index vacuuming after the standby
>> reconnects and before pruning a dead tuple whose xmax is older than
>> OldestXmin.
>>
>> At the end of the round of index vacuuming, _bt_pendingfsm_finalize()
>> calls GetOldestNonRemovableTransactionId(), thereby updating the
>> backend's GlobalVisState and moving maybe_needed backwards.
> 
> Right. I saw details exactly consistent with this when I used GDB
> against a production instance.
> 
> I'm glad that you were able to come up with a repro that involves
> exactly the same basic elements, including index page deletion.

Would it be possible to make it robust so that we could always run it 
with "make check"? This seems like an important corner case to 
regression test.

-- 
Heikki Linnakangas
Neon (https://neon.tech)



