Re: WIP: SP-GiST, Space-Partitioned GiST

From: Tom Lane
Subject: Re: WIP: SP-GiST, Space-Partitioned GiST
Date:
Msg-id: 14742.1323203111@sss.pgh.pa.us
In reply to: Re: WIP: SP-GiST, Space-Partitioned GiST (Oleg Bartunov <oleg@sai.msu.su>)
Replies: Re: WIP: SP-GiST, Space-Partitioned GiST (Teodor Sigaev <teodor@sigaev.ru>)
List: pgsql-hackers
Oleg Bartunov <oleg@sai.msu.su> writes:
> There is one annoying problem under Mac OS (Linux and FreeBSD have no problem) that we
> just can't figure out how to track down, since we are not familiar with Mac OS:
> it fails to restart after a 'kill -9' of a backend, but only if the sources were
> compiled with -O2 (no problem occurred with -O0). Since the failure does not
> happen every time, we use the following script to reproduce the problem. We ask
> a Mac OS guru to help us debug this problem.

I don't think it's Mac-specific at all; it looks to me like garden-variety
uninitialized data, specifically that there are paths through
doPickSplit that don't set xlrec.newPage.  The crash I'm seeing is

TRAP: FailedAssertion("!(offset <= (((PageHeader) (page))->pd_lower <= (__builtin_offsetof (PageHeaderData, pd_linp)) ? 0 : ((((PageHeader) (page))->pd_lower - (__builtin_offsetof (PageHeaderData, pd_linp))) / sizeof(ItemIdData))) + 1)", File: "spgxlog.c", Line: 81)

#0  0x00007fff883f982a in __kill ()
#1  0x00007fff85bdda9c in abort ()
#2  0x0000000103165a71 in ExceptionalCondition (conditionName=<value temporarily unavailable, due to optimizations>, errorType=<value temporarily unavailable, due to optimizations>, fileName=<value temporarily unavailable, due to optimizations>, lineNumber=<value temporarily unavailable, due to optimizations>) at assert.c:57
#3  0x0000000102eeec73 in addOrReplaceTuple (page=0x74cc <Address 0x74cc out of bounds>, tuple=0x7faa1182d64c " ", size=88, offset=70) at spgxlog.c:81
#4  0x0000000102eed4bc in spgRedoPickSplit [inlined] () at /Users/tgl/pgsql/src/backend/access/spgist/spgxlog.c:504
#5  0x0000000102eed4bc in spg_redo (record=0x7fff62a5ccf0) at spgxlog.c:803
#6  0x0000000102ec4f48 in StartupXLOG () at xlog.c:6534
#7  0x0000000103054378 in StartupProcessMain () at startup.c:220
#8  0x0000000102ef4449 in AuxiliaryProcessMain (argc=2, argv=0x7fff62a60030) at bootstrap.c:414

The xlog record it's working on is

(gdb) p *(spgxlogPickSplit*)(0x7fcb20826600 + 32)
$6 = {node = {spcNode = 1663, dbNode = 41578, relNode = 204800}, nTuples = 75,
  nNodes = 4, blknoSrc = 988, nDelete = 74, blknoInner = 929, offnumInner = 70,
  newPage = 1 '\001', blknoParent = 929, offnumParent = 13, nodeI = 2,
  stateSrc = {attType_attlen = 16, fakeTupleSize = 32, isBuild = 1}}

Since newPage is set, addOrReplaceTuple gets called on a freshly
initialized page, and not surprisingly complains that offset 70 is
way out of range.  Maybe there's something wrong with the replay
logic, but what I'm thinking is that newPage should not have been
true here, which means that doPickSplit failed to set it correctly,
which doesn't look at all improbable.  I added a memset at the
top of doPickSplit to force the whole struct to zeroes, and so far
haven't seen the crash again.
        regards, tom lane

