Re: gcc 4.6 and hot standby
От | Tom Lane |
---|---|
Тема | Re: gcc 4.6 and hot standby |
Дата | |
Msg-id | 23049.1307731137@sss.pgh.pa.us обсуждение исходный текст |
Ответ на | Re: gcc 4.6 and hot standby (Mark Kirkwood <mark.kirkwood@catalyst.net.nz>) |
Ответы |
Re: gcc 4.6 and hot standby
|
Список | pgsql-hackers |
Mark Kirkwood <mark.kirkwood@catalyst.net.nz> writes: > On 09/06/11 06:58, Alex Hunsaker wrote: >> Yeah :-). However ill note it looks like its the default compiler for >> fedora 15, ubuntu natty and debian sid. > FWIW Ubuntu natty uses gcc 4.5.2, probably just as as well in the light > of your findings :-) I've been able to reproduce this on released Fedora 15, and sure enough it is a compiler bug. The problem comes from these fragments of ReadRecord(): ReadRecord(XLogRecPtr *RecPtr, int emode, bool fetching_ckpt) { XLogRecPtr tmpRecPtr = EndRecPtr; if (RecPtr == NULL) RecPtr = &tmpRecPtr; targetRecOff = RecPtr->xrecoff % XLOG_BLCKSZ; if (targetRecOff == 0) tmpRecPtr.xrecoff += pageHeaderSize; record = (XLogRecord *) ((char *) readBuf + RecPtr->xrecoff % XLOG_BLCKSZ); gcc 4.6.0 is assuming that the value it first fetches for RecPtr->xrecoff is still usable for computing the "record" pointer, despite the possible intervening update of tmpRecPtr. That of course leads to "record" pointing to the wrong place, which results in an incorrect conclusion that we're looking at an invalid record header, which leads to killing and restarting the walreceiver. I haven't traced what happens after that, but apparently in standby mode we'll come back to ReadRecord with a record pointer that's already advanced over the page header, else it'd be an infinite loop. Note that this means that crash recovery, not only standby mode, is broken with this compiler. I've filed a bug report for this: https://bugzilla.redhat.com/show_bug.cgi?id=712480 but I wouldn't care to hold my breath while waiting for a fix to appear upstream, let alone propagate to all the distros already using 4.6.0. I think we need a workaround. The obvious question to ask here is why are we updating "tmpRecPtr.xrecoff" and not "RecPtr->xrecoff"? The latter choice would be a lot more readable, since the immediately surrounding code doesn't refer to tmpRecPtr. I think the idea is to guarantee that the caller's referenced record pointer (if any) isn't modified, but if RecPtr is not pointing at tmpRecPtr here, we have got big problems anyway. So I'm tempted to propose this fix: if (targetRecOff == 0) { /* * Can only get here in the continuing-from-prev-page case, because *XRecOffIsValid eliminated the zero-page-offset case otherwise. Need * to skip over the new page's header. */ - tmpRecPtr.xrecoff += pageHeaderSize; + Assert(RecPtr == &tmpRecPtr); + RecPtr->xrecoff += pageHeaderSize; targetRecOff = pageHeaderSize; } Another possibility, which might be less ugly, is to delete the above code entirely and handle the page-header case in the RecPtr == NULL branch a few lines above. Comments? regards, tom lane
В списке pgsql-hackers по дате отправления: