Re: [HACKERS] PANIC in pg_commit_ts slru after crashes

Поиск
Список
Период
Сортировка
От Michael Paquier
Тема Re: [HACKERS] PANIC in pg_commit_ts slru after crashes
Дата
Msg-id CAB7nPqQfU=ULOo7u=tnPyVFJsdDgGkdSP76TQYFcw=TYrhBgQQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] PANIC in pg_commit_ts slru after crashes  (Simon Riggs <simon@2ndquadrant.com>)
Ответы Re: [HACKERS] PANIC in pg_commit_ts slru after crashes
Список pgsql-hackers
On Sun, Apr 16, 2017 at 6:02 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> On 15 April 2017 at 21:30, Jeff Janes <jeff.janes@gmail.com> wrote:
>> On Fri, Apr 14, 2017 at 9:33 PM, Pavan Deolasee <pavan.deolasee@gmail.com>
>> wrote:
>>
>>>
>>> Since all those offsets fall on a page boundary, my guess is that we're
>>> somehow failing to handle a new page correctly.
>>>
>>> Looking at the patch itself, my feeling is that the following code in
>>> src/backend/access/transam/twophase.c might be causing the problem.
>>>
>>> 1841
>>> 1842     /* update nextXid if needed */
>>> 1843     if (TransactionIdFollowsOrEquals(maxsubxid,
>>> ShmemVariableCache->nextXid))
>>> 1844     {
>>> 1845         LWLockAcquire(XidGenLock, LW_EXCLUSIVE);
>>> 1846         ShmemVariableCache->nextXid = maxsubxid;
>>> 1847         TransactionIdAdvance(ShmemVariableCache->nextXid);
>>> 1848         LWLockRelease(XidGenLock);
>>> 1849     }
>>>
>>> The function PrescanPreparedTransactions() gets called at the start of the
>>> redo recovery and this specific block will get exercised irrespective of
>>> whether there are any prepared transactions or not. What I find particularly
>>> wrong here is that we are initialising maxsubxid to current value of
>>> ShmemVariableCache->nextXid when the function enters, but this block would
>>> then again increment ShmemVariableCache->nextXid, when there are no prepared
>>> transactions in the system.
>>>
>>> I wonder if we should do as in attached patch.
>>
>>
>> That solves it for me.
>
> Thanks for patching and testing. I'll review and probably commit tomorrow.

Actually I think that the fix proposed is not correct as we should
just never take the risk to call TransactionIdAdvance() if there are
no 2PC files to scan, and it adds more difficulty in understanding
something that the 2PC restore code has introduced and already made
harder to catch (2nd thoughts and regrets here). If you look closely
at the code, ProcessTwoPhaseBuffer() uses *result and *maxsubid only
for PrescanPreparedTransactions() so there is a link between both.
Attached is a patch to rework ProcessTwoPhaseBuffer() by removing
those two arguments and replace them by a boolean to decide if the
next cached transaction ID should be updated or not. The cached next
TXID gets updated only if caller of ProcessTwoPhaseBuffer() wants to
do so and if the subxid scanned follows the XID of the scanned 2PC
file. This looks safer to me in the long run.

Jeff, does this patch make the situation better? The fix is rather
simple as it just makes sure that the next XID never gets updated if
there are no 2PC files.
-- 
Michael

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andrew Dunstan
Дата:
Сообщение: Re: [HACKERS] Cutting initdb's runtime (Perl question embedded)
Следующее
От: Noah Misch
Дата:
Сообщение: Re: [HACKERS] tablesync patch broke the assumption that logical rep depends on?