Re: Recovery inconsistencies, standby much larger than primary

From	Greg Stark
Subject	Re: Recovery inconsistencies, standby much larger than primary
Date	31 January 2014, 20:28:37
Msg-id	CAM-w4HObtoH7vekEP6W5C-CCie26CDNyAXK8G3vPcVTWxZdGtw@mail.gmail.com
In response to	Re: Recovery inconsistencies, standby much larger than primary (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: Recovery inconsistencies, standby much larger than primary
List	pgsql-hackers

One thing I keep coming back to is a bad RAM chip setting a bit in the block number. But I just can't seem to get it to add up. The difference is not a power of two, it has happened on two different machines, and we don't see other weirdness on the machine. It seems like a strange coincidence that it would happen to the same variable twice and not to other variables.

Unless there's some unrelated code writing through a wild pointer, possibly to a stack-allocated object that just happens to often be that variable?

-- 
greg

On 31 Jan 2014 20:21, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:
> Greg Stark <stark@mit.edu> writes:
> > So just to summarize, this xlog record:
> > [cur:EA1/637140, xid:1418089147, rmid:11(Btree), len/tot_len:18/6194,
> > info:8, prev:EA1/635290] insert_leaf: s/d/r:1663/16385/1261982 tid
> > 3634978/282
> > [cur:EA1/637140, xid:1418089147, rmid:11(Btree), len/tot_len:18/6194,
> > info:8, prev:EA1/635290] bkpblock[1]: s/d/r:1663/16385/1261982
> > blk:3634978 hole_off/len:1240/2072
>
> > Appears to have been written to [ block 7141472 ]
>
> I've been staring at the code for a bit trying to guess how that could
> have happened.  Since the WAL record has a backup block, btree_xlog_insert
> would have passed control to RestoreBackupBlock, which would call
> XLogReadBufferExtended with mode RBM_ZERO, so there would be no complaint
> about writing past the end of the relation.  Now, you can imagine some
> very low-level error causing a write to go to the wrong page due to a seek
> problem or some such, but it's hard to credit that that would've resulted
> in creation of all the intervening segment files.  Some level of our code
> had to have thought it was being told to extend the relation.
>
> However, on closer inspection I was a bit surprised to realize that there
> are two possible candidates for doing that!  XLogReadBufferExtended will
> extend the relation, a block at a time, if told to write a page past
> the current nominal EOF.  And in md.c, _mdfd_getseg will *also* extend
> the relation if we're InRecovery, even though it normally would not do
> so when called from mdwrite().
>
> Given the behavior in XLogReadBufferExtended, I rather think that the
> InRecovery special case in _mdfd_getseg is dead code and should be
> removed.  But for the purpose at hand, it's more interesting to try to
> confirm which of these code levels did the extension.  I notice that
> _mdfd_getseg only bothers to write the last physical page of each segment,
> whereas XLogReadBufferExtended knows nothing of segments and will
> ploddingly write every page.  So on a filesystem that supports "holes"
> in files, I'd expect that the added segments would be fully allocated
> if XLogReadBufferExtended did the deed, but they'd be quite small if
> _mdfd_getseg did so.  The du results you started with suggest that the
> former is the case, but could you verify that the filesystem this is
> on supports holes and that du will report only the actually allocated
> space when there's a hole?
>
> Assuming that the extension was done in XLogReadBufferExtended, we are
> forced to the conclusion that XLogReadBufferExtended was passed a bad
> block number (viz 7141472); and it's pretty hard to see how that could
> happen.  RestoreBackupBlock is just passing the value it got out of the
> WAL record.  I thought about the idea that it was wrong about exactly
> where the BkpBlock struct was in the record, but that would presumably
> lead to garbage relnode and fork numbers not just a bad block number.
>
> So I'm still baffled ...
>
>                         regards, tom lane
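
The two checks discussed in this message can be sketched in a few lines of Python. This is only an illustration, not anything taken from the PostgreSQL source: it assumes the default build settings of 8192-byte pages and 131072 pages per 1 GB segment (the thread does not state either value), and the helper names `differing_bits`, `segment_of` and `sizes_after_sparse_write` are made up for this example. The first helper counts how many bits differ between the block number in the WAL record and the block that was apparently written, which is the test a single flipped bit would have to pass. The last one writes a single page at an offset past the start of a scratch file and compares the apparent size with the allocated size, roughly what `du` would be reporting on a filesystem that supports holes.

```python
import os
import tempfile

# Assumed PostgreSQL build defaults; the thread does not state them.
BLCKSZ = 8192          # bytes per page
RELSEG_SIZE = 131072   # pages per 1 GB segment file

EXPECTED_BLOCK = 3634978   # blk: value shown in the bkpblock[1] record
OBSERVED_BLOCK = 7141472   # block the page reportedly ended up in


def differing_bits(a, b):
    """Count the bit positions in which two block numbers differ."""
    return bin(a ^ b).count("1")


def segment_of(block):
    """Number of the segment file that would hold the given block."""
    return block // RELSEG_SIZE


def sizes_after_sparse_write(path, offset):
    """Write one page at `offset`; return (apparent, allocated) bytes.

    st_blocks is counted in 512-byte units and is only available on
    Unix-like systems.
    """
    with open(path, "wb") as f:
        f.seek(offset)              # skip over the region without writing it
        f.write(b"\0" * BLCKSZ)     # one page at the far end
    st = os.stat(path)
    return st.st_size, st.st_blocks * 512


if __name__ == "__main__":
    print("bits that differ:", differing_bits(EXPECTED_BLOCK, OBSERVED_BLOCK))
    print("segment of expected block:", segment_of(EXPECTED_BLOCK))
    print("segment of observed block:", segment_of(OBSERVED_BLOCK))

    # Scaled-down stand-in for "write only the last page of a segment".
    with tempfile.TemporaryDirectory() as tmp:
        apparent, allocated = sizes_after_sparse_write(
            os.path.join(tmp, "scratch"), 2047 * BLCKSZ)
        print("apparent bytes:", apparent, "allocated bytes:", allocated)
```

If the filesystem supports holes, the allocated figure should come out far smaller than the apparent one; if the two are about equal, `du` on that filesystem cannot distinguish between the two extension paths described above.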
