Re: New WAL code dumps core trivially on replay of bad data
From | Amit Kapila |
---|---|
Subject | Re: New WAL code dumps core trivially on replay of bad data |
Date | |
Msg-id | 6C0B27F7206C9E4CA54AE035729E9C38285258A8@szxeml509-mbx |
In response to | New WAL code dumps core trivially on replay of bad data (Tom Lane <tgl@sss.pgh.pa.us>) |
Responses | Re: New WAL code dumps core trivially on replay of bad data |
List | pgsql-hackers |
Tom Lane Sent: Saturday, August 18, 2012 7:16 AM

> The startup process's stack trace is
>
> #0  0x26fd1c in RecordIsValid (record=0x4008d7a0, recptr=80658424, emode=15)
>     at xlog.c:3713
> 3713            COMP_CRC32(crc, XLogRecGetData(record), len);
> (gdb) bt
> #0  0x26fd1c in RecordIsValid (record=0x4008d7a0, recptr=80658424, emode=15)
>     at xlog.c:3713
> #1  0x270690 in ReadRecord (RecPtr=0x7b03bad0, emode=15,
>     fetching_ckpt=0 '\000') at xlog.c:4006
>
> The current WAL address is 80658424 == 0x04cebff8, that is just 8 bytes
> short of a page boundary, and what RecordIsValid thinks it is dealing
> with is
>
> so it merrily tries to compute a checksum on a gigabyte worth of data,
> and soon falls off the end of memory.
>
> In reality, inspection of the WAL file suggests that this is the end of
> valid data and what should have happened is that replay just stopped.
> The xl_len and so forth shown above are just garbage from off the end of
> what was actually read from the file (everything beyond offset 0xcebff8
> in file 4 is in fact zeroes).
>
> I'm not sure whether this is just a matter of having failed to
> sanity-check that xl_tot_len is at least SizeOfXLogRecord, or whether
> there is a deeper problem with the new design of continuation records
> that makes it impossible to validate records safely.

Earlier there was a check related to total length in ReadRecord, before it calls RecordIsValid()

    if (record->xl_tot_len < SizeOfXLogRecord + record->xl_len ||
        record->xl_tot_len > SizeOfXLogRecord + record->xl_len +
        XLR_MAX_BKP_BLOCKS * (sizeof(BkpBlock) + BLCKSZ))

I think that missing check of total length has caused this problem. However now this check will be different.

With Regards,
Amit Kapila.
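
For readers following the thread, the length check quoted in the reply can be sketched as a standalone function. This is only an illustration under stated assumptions: the struct, the function name, and every constant below are hypothetical stand-ins (the real SizeOfXLogRecord, sizeof(BkpBlock), XLR_MAX_BKP_BLOCKS and BLCKSZ come from the PostgreSQL headers and are not reproduced here), and it does not claim to be the check that was eventually committed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical placeholder values, NOT the real PostgreSQL definitions. */
#define SKETCH_RECORD_HDR_SIZE  32u    /* stands in for SizeOfXLogRecord */
#define SKETCH_BKP_HDR_SIZE     16u    /* stands in for sizeof(BkpBlock) */
#define SKETCH_MAX_BKP_BLOCKS   4u     /* stands in for XLR_MAX_BKP_BLOCKS */
#define SKETCH_BLCKSZ           8192u  /* stands in for BLCKSZ */

/* Hypothetical cut-down record header holding only the two length fields. */
typedef struct SketchRecordHeader
{
    uint32_t xl_tot_len;   /* total record length, header included */
    uint32_t xl_len;       /* length of the main data portion */
} SketchRecordHeader;

/*
 * Return true when xl_tot_len lies in the range implied by xl_len:
 * at least header + data, at most header + data + the maximum number
 * of backup blocks. Arithmetic is done in 64 bits so that a garbage
 * xl_len cannot wrap around and slip past the bounds.
 */
static bool
sketch_record_length_is_sane(const SketchRecordHeader *hdr)
{
    uint64_t min_len = (uint64_t) SKETCH_RECORD_HDR_SIZE + hdr->xl_len;
    uint64_t max_len = min_len +
        (uint64_t) SKETCH_MAX_BKP_BLOCKS * (SKETCH_BKP_HDR_SIZE + SKETCH_BLCKSZ);

    return hdr->xl_tot_len >= min_len && hdr->xl_tot_len <= max_len;
}
```

With a test of this shape run before any CRC computation, an all-zero header fails the lower bound (0 is smaller than the header size), and a header whose xl_len is far larger than xl_tot_len is rejected as well, so the caller can treat the record as invalid instead of checksumming past the end of the buffer.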