Discussion: 9.4 failure on skink in _bt_newroot/XLogCheckBuffer
The valgrind animal just reported a large object related failure on 9.4:

http://www.pgbuildfarm.org/cgi-bin/show_log.pl?nm=skink&dt=2016-05-19%2006%3A23%3A05

==9952== VALGRINDERROR-BEGIN
==9952== Conditional jump or move depends on uninitialised value(s)
==9952==    at 0x4DC6D3: XLogCheckBuffer (xlog.c:2077)
==9952==    by 0x4E5E52: XLogInsert (xlog.c:956)
==9952==    by 0x4ACB10: _bt_newroot (nbtinsert.c:2123)
==9952==    by 0x4ACEFF: _bt_insert_parent (nbtinsert.c:1727)
==9952==    by 0x4AD4B7: _bt_insertonpg (nbtinsert.c:776)
==9952==    by 0x4AE56F: _bt_doinsert (nbtinsert.c:191)
==9952==    by 0x4B3409: btinsert (nbtree.c:251)
==9952==    by 0x7A87E3: FunctionCall6Coll (fmgr.c:1437)
==9952==    by 0x4A8D36: index_insert (indexam.c:226)
==9952==    by 0x4FC62C: CatalogIndexInsert (indexing.c:136)
==9952==    by 0x6A7210: inv_write (inv_api.c:723)
==9952==    by 0x5E2985: lo_write (be-fsstubs.c:223)
==9952==  Uninitialised value was created by a stack allocation
==9952==    at 0x4AC481: _bt_newroot (nbtinsert.c:1989)
==9952==
==9952== VALGRINDERROR-END

I've not analyzed the problem beyond noticing that xlog.c:2077

    if (rdata->buffer_std)

which suggests an actual bug.

Regards,

Andres

Andres Freund <andres@anarazel.de> writes:
> The valgrind animal just reported a large object related failure on 9.4:

The proximate cause seems to be that _bt_newroot isn't bothering to
fill the buffer_std field here:

    /* Make a full-page image of the left child if needed */
    rdata[2].data = NULL;
    rdata[2].len = 0;
    rdata[2].buffer = lbuf;
    rdata[2].next = NULL;

which is indeed an actual bug, but the only consequence would be poor
compression of the full-page image (if the value chanced to be zero),
so it's not much of a problem.

What remains unclear is how come this only fails once in a blue moon.
Seems like any valgrind run of the regression tests should have caught it.

regards, tom lane

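To make the missing assignment concrete, here is a minimal sketch in C. The struct below is a simplified stand-in for the 9.4-era XLogRecData, not a copy of the real header, and fill_left_child_entry is a hypothetical helper name. Setting buffer_std to true is an assumption about what the correct value is for a standard-layout btree page; the thread itself only says the field was left unfilled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-in types; NOT the real PostgreSQL definitions. */
typedef int Buffer;

typedef struct XLogRecData
{
    char       *data;        /* start of rmgr data to include */
    uint32_t    len;         /* length of rmgr data */
    Buffer      buffer;      /* buffer associated with data, if any */
    bool        buffer_std;  /* buffer has standard page layout? */
    struct XLogRecData *next;
} XLogRecData;

/*
 * Hypothetical helper: fill the chain entry that references the left
 * child.  Every field is assigned, so no member of a stack-allocated
 * entry is left holding indeterminate bytes.
 */
static void
fill_left_child_entry(XLogRecData *entry, Buffer lbuf)
{
    assert(entry != NULL);

    entry->data = NULL;
    entry->len = 0;
    entry->buffer = lbuf;
    entry->buffer_std = true;   /* the assignment absent from the quoted code;
                                 * the value is an assumption */
    entry->next = NULL;
}
```

With a local array such as `XLogRecData rdata[3];`, calling `fill_left_child_entry(&rdata[2], lbuf)` leaves nothing for valgrind to flag when buffer_std is read later.
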
Hi Tom,

On 2016-05-21 17:18:14 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > The valgrind animal just reported a large object related failure on 9.4:
>
> The proximate cause seems to be that _bt_newroot isn't bothering to
> fill the buffer_std field here:
>
>     /* Make a full-page image of the left child if needed */
>     rdata[2].data = NULL;
>     rdata[2].len = 0;
>     rdata[2].buffer = lbuf;
>     rdata[2].next = NULL;
>
> which is indeed an actual bug, but the only consequence would be poor
> compression of the full-page image (if the value chanced to be zero),
> so it's not much of a problem.

Thanks for fixing that one!

> What remains unclear is how come this only fails once in a blue moon.
> Seems like any valgrind run of the regression tests should have caught it.

Looks like a timing issue. The relevant access to the uninitialized
buffer_std field only happens when

    if (*lsn <= RedoRecPtr)
    {

which presumably is not that likely to be hit. Even under valgrind the
individual tests are likely to finish in less than a checkpoint timeout.

Greetings,

Andres Freund

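The timing argument can be sketched as a small C predicate. This is an illustrative model only: backup_path_taken and example_cases are hypothetical names, XLogRecPtr is modelled as a plain 64-bit integer, and the LSN values are made up. It assumes, as the quoted condition suggests, that buffer_std is read only inside the branch guarded by `*lsn <= RedoRecPtr`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Modelled as a plain integer for this sketch. */
typedef uint64_t XLogRecPtr;

/*
 * Returns true when the page's LSN is at or before the redo pointer,
 * i.e. when the branch that goes on to read buffer_std is entered.
 */
static bool
backup_path_taken(XLogRecPtr page_lsn, XLogRecPtr redo_rec_ptr)
{
    return page_lsn <= redo_rec_ptr;
}

/* A few made-up cases showing when the branch is and is not entered. */
static void
example_cases(void)
{
    /* Page changed after the redo pointer: branch skipped, field never read. */
    assert(!backup_path_taken(500, 400));

    /* Redo pointer has moved past the page's LSN: branch entered. */
    assert(backup_path_taken(500, 600));

    /* Equal values also enter the branch, since the test is "<=". */
    assert(backup_path_taken(600, 600));
}
```

Under this model, valgrind can only complain on runs where the redo pointer has advanced past the page's LSN by the time the record is inserted, which fits the observation that the failure shows up only occasionally.
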
Andres Freund <andres@anarazel.de> writes:
> On 2016-05-21 17:18:14 -0400, Tom Lane wrote:
>> What remains unclear is how come this only fails once in a blue moon.
>> Seems like any valgrind run of the regression tests should have caught it.

> Looks like a timing issue.

Yeah, I came to the same conclusion after awhile.

regards, tom lane