Re: gcc 4.6 and hot standby

Поиск

Список

Период

Сортировка

От	Tom Lane
Тема	Re: gcc 4.6 and hot standby
Дата	10 июня 2011 г. 15:39:16
Msg-id	23049.1307731137@sss.pgh.pa.us обсуждение
Ответ на	Re: gcc 4.6 and hot standby (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>)
Ответы	Re: gcc 4.6 and hot standby
Список	pgsql-hackers

Дерево обсуждения

Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes:
> On 09/06/11 06:58, Alex Hunsaker wrote:
>> Yeah :-). However ill note it looks like its the default compiler for
>> fedora 15, ubuntu natty and debian sid.

> FWIW Ubuntu natty uses gcc 4.5.2, probably just as as well in the light 
> of your findings :-)

I've been able to reproduce this on released Fedora 15, and sure enough
it is a compiler bug.  The problem comes from these fragments of
ReadRecord():

ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt)
{   XLogRecPtr    tmpRecPtr = EndRecPtr;
   if (RecPtr == NULL)       RecPtr = &tmpRecPtr;
   targetRecOff = RecPtr->xrecoff % XLOG_BLCKSZ;
   if (targetRecOff == 0)       tmpRecPtr.xrecoff += pageHeaderSize;
   record = (XLogRecord *) ((char *) readBuf + RecPtr->xrecoff % XLOG_BLCKSZ);

gcc 4.6.0 is assuming that the value it first fetches for RecPtr->xrecoff
is still usable for computing the "record" pointer, despite the possible
intervening update of tmpRecPtr.  That of course leads to "record"
pointing to the wrong place, which results in an incorrect conclusion
that we're looking at an invalid record header, which leads to killing
and restarting the walreceiver.  I haven't traced what happens after
that, but apparently in standby mode we'll come back to ReadRecord with
a record pointer that's already advanced over the page header, else it'd
be an infinite loop.

Note that this means that crash recovery, not only standby mode, is
broken with this compiler.

I've filed a bug report for this:
https://bugzilla.redhat.com/show_bug.cgi?id=712480
but I wouldn't care to hold my breath while waiting for a fix to appear
upstream, let alone propagate to all the distros already using 4.6.0.
I think we need a workaround.

The obvious question to ask here is why are we updating
"tmpRecPtr.xrecoff" and not "RecPtr->xrecoff"?  The latter choice would
be a lot more readable, since the immediately surrounding code doesn't
refer to tmpRecPtr.  I think the idea is to guarantee that the caller's
referenced record pointer (if any) isn't modified, but if RecPtr is not
pointing at tmpRecPtr here, we have got big problems anyway.

So I'm tempted to propose this fix:
   if (targetRecOff == 0)   {       /*        * Can only get here in the continuing-from-prev-page case, because
*XRecOffIsValid eliminated the zero-page-offset case otherwise. Need        * to skip over the new page's header.
*/

-       tmpRecPtr.xrecoff += pageHeaderSize;
+       Assert(RecPtr == &tmpRecPtr);
+       RecPtr->xrecoff += pageHeaderSize;       targetRecOff = pageHeaderSize;   }

Another possibility, which might be less ugly, is to delete the above
code entirely and handle the page-header case in the RecPtr == NULL
branch a few lines above.

Comments?
        regards, tom lane

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: gcc 4.6 and hot standby