Re: Completely broken replica after PANIC: WAL contains references to invalid pages

From Andres Freund
Subject Re: Completely broken replica after PANIC: WAL contains references to invalid pages
Msg-id 20130402101012.GB2415@alap2.anarazel.de
In response to Re: Completely broken replica after PANIC: WAL contains references to invalid pages  (Simon Riggs <simon@2ndQuadrant.com>)
Responses Re: Completely broken replica after PANIC: WAL contains references to invalid pages  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-bugs

On 2013-04-01 08:49:16 +0100, Simon Riggs wrote:
> On 30 March 2013 17:21, Andres Freund <andres@2ndquadrant.com> wrote:
>
> > So if the xid is later than latestObservedXid we extend subtrans one by
> > one. So far so good. But we initialize it in
> > ProcArrayApplyRecoveryInfo() when consistency is initially reached:
> >                              latestObservedXid = running->nextXid;
> >                              TransactionIdRetreat(latestObservedXid);
> > Before that subtrans has initially been started up with:
> >                         if (wasShutdown)
> >                                 oldestActiveXID = PrescanPreparedTransactions(&xids, &nxids);
> >                         else
> >                                 oldestActiveXID = checkPoint.oldestActiveXid;
> > ...
> >                         StartupSUBTRANS(oldestActiveXID);
> >
> > That means it's only initialized up to checkPoint.oldestActiveXid. As it
> > can take some time till we reach consistency it seems rather plausible
> > that there now will be a gap in initialized pages. From
> > checkPoint.oldestActiveXid to running->nextXid if there are pages
> > in between.
>
> That was an old bug.
>
> StartupSUBTRANS() now explicitly fills that gap. Are you saying it
> does that incorrectly? How?

Well, no. I think StartupSUBTRANS does this correctly, but there's a gap
between the call to Startup* and the first call to ExtendSUBTRANS. The
latter is only called *after* we have reached STANDBY_INITIALIZED via
ProcArrayApplyRecoveryInfo(). The problem is that we StartupSUBTRANS up to
checkPoint.oldestActiveXid, while we start to ExtendSUBTRANS from
running->nextXid - 1. There can very well be a gap in between.

The window isn't terribly big, but if you use subtransactions as heavily
as Sergey seems to, it doesn't seem unlikely that you'd hit it.
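
To make the gap concrete, here is a toy model of the sequence described
above. It is only a sketch, not PostgreSQL source: count_gap_pages(),
xid_to_page(), MODEL_PAGES and XACTS_PER_PAGE are made-up names, the
2048 xids per page figure assumes 8192-byte blocks with 4-byte xids,
xid wraparound is ignored, and the model assumes extension only zeroes
a page when it reaches the first xid on that page.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model only; all names here are hypothetical, not PostgreSQL code. */
#define XACTS_PER_PAGE 2048u   /* assumption: 8192-byte block / 4-byte xid */
#define MODEL_PAGES    64u     /* size of the modelled page range */

typedef uint32_t Xid;

static uint32_t
xid_to_page(Xid xid)
{
    return xid / XACTS_PER_PAGE;
}

/*
 * Count pages between oldest_active_xid and last_replayed_xid that were
 * never zeroed, given that:
 *  - startup zeroes only the page containing oldest_active_xid, and
 *  - one-by-one extension starts after running_next_xid - 1 and zeroes
 *    a page only when it reaches the first xid on that page.
 */
int
count_gap_pages(Xid oldest_active_xid, Xid running_next_xid,
                Xid last_replayed_xid)
{
    bool zeroed[MODEL_PAGES] = { false };
    int  gap = 0;

    assert(running_next_xid >= 1);
    assert(oldest_active_xid <= running_next_xid);
    assert(xid_to_page(last_replayed_xid) < MODEL_PAGES);

    /* What startup covers in this model. */
    zeroed[xid_to_page(oldest_active_xid)] = true;

    /* latestObservedXid = running->nextXid - 1; extend from the next xid. */
    for (Xid xid = running_next_xid; xid <= last_replayed_xid; xid++)
    {
        if (xid % XACTS_PER_PAGE == 0)
            zeroed[xid_to_page(xid)] = true;
    }

    /* Anything in range that nobody zeroed is part of the gap. */
    for (uint32_t page = xid_to_page(oldest_active_xid);
         page <= xid_to_page(last_replayed_xid); page++)
    {
        if (!zeroed[page])
            gap++;
    }
    return gap;
}
```

With oldest_active_xid = 100 and running_next_xid = 5000, replaying up
to xid 9000 leaves pages 1 and 2 untouched in this model, so
count_gap_pages(100, 5000, 9000) returns 2. If running_next_xid is
still on the startup page (say 150), the result is 0, which would fit
with the window being small.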

Let me come up with a test case and a patch.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
