Re: mdnblocks() sabotages error checking in _mdfd_getseg()

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: mdnblocks() sabotages error checking in _mdfd_getseg()
Дата
Msg-id CA+Tgmob6odyOqOq0MixwBkSV2JLb_uAi3czStMGjbx3i9jVKBw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: mdnblocks() sabotages error checking in _mdfd_getseg()  (Andres Freund <andres@anarazel.de>)
Ответы Re: mdnblocks() sabotages error checking in _mdfd_getseg()  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
On Tue, Dec 15, 2015 at 9:51 AM, Andres Freund <andres@anarazel.de> wrote:
> On 2015-12-09 16:50:06 -0500, Robert Haas wrote:
>> Now, if subsequent to this an index scan happens to sweep through and
>> try to fetch a block in 123456.2, it will work!  This happens because
>> _mdfd_getseg() doesn't care about the length of the segments; it only
>> cares whether or not they exist.
>
> Unless in recovery in the startup process, or when EXTENSION_CREATE is
> passed to it. Depending on whether it or mdnblocks were called first,
> and depending on which segment is missing.  In that case it'd *possibly*
> pad the last block in a sgment with zeroes, Yes, only the last block:

Yes, that's clearly inadequate.  I think the fact that it only pads
the last block, possibly creating a sparse file, should also be fixed,
but as a separate patch.

>> If 123456.1 were actually missing,
>> then we'd never test whether 123456.2 exists and we'd get an error.
>> But because mdnblocks() created 123456.1, _mdfd_getseg() is now quite
>> happy to fetch blocks in 123456.2; it considers the empty 123456.1
>> file to be a sufficient reason to look for 123456.2, and seeing that
>> file, and finding the block it wants therein, it happily returns that
>> block, blithely ignoring the fact that it passed over a completely .1
>> segment before returning a block from .2.  This is maybe not the
>> smartest thing ever.
>
> I think it might be worthwhile to add a check against this. It'd not be
> very hard to ensure the segment is indeed large enough iff further
> segments exist. Possibly not an ERROR, but a WARNING might be useful. To
> avoid overhead and absurd log spamming we should make that check only if
> the the segment isn't already open.

We could:

1. Before extending a relation to RELSEG_SIZE, verify that the next
segment of the relation doesn't already exist on disk.  If we find out
that it does, then throw an error and refuse the extension.

2. Teach _mdfd_getseg() that a segment of a size less than RELSEG_SIZE
is, by definition, the last segment, and don't even check whether
further files exist on disk.  mdnblocks() already behaves this way, so
it would just be making things consistent.

If we did these things, then if you got a hole in your relation, the
entire system would agree that the data after the hole doesn't exist
any more, and if the segment prior to the hole got re-extended, it
would be prevented from "bridging" the hole.  We wouldn't be incurring
any additional overhead, either, because I bet #2 would save more
cycles than #1 costs.

However, I don't think this is exactly what you are proposing.  I'm
skeptical of the idea that _mdfd_getseg() should probe ahead to see
whether we're dealing with a malformed relation where the intermediate
segments still exist but have zero length.  If we fix mdnblocks() as
I've proposed, then that scenario will become much less probable.  Of
course, the operating system could still decide to truncate a segment
other than the last for some reason or other, but it could also decide
to remove the file completely, or to remove several files, or lots of
other things.

>> The comment in mdnblocks.c says this:
>>
>>                          * Because we pass O_CREAT, we will create the
>> next segment (with
>>                          * zero length) immediately, if the last
>> segment is of length
>>                          * RELSEG_SIZE.  While perhaps not strictly
>> necessary, this keeps
>>                          * the logic simple.
>>
>> I don't really see how this "keeps the logic simple".  What it does is
>> allow sequential scans and index scans to have two completely
>> different notions of how many accessible blocks there are in the
>> relation.  Granted, this kind of thing really shouldn't happen, but
>> sometimes bad things do happen.  Therefore, I propose the attached
>> patch.
>
> I'm in favor of changing this behaviour - it's indeed weird and somewhat
> error prone.
>
> Do you want to change this in master, or backpatch a change? To me the
> code in md.c is subtle and complicated enough that I'm rather inclined
> to only change this in master.

Yeah, that's my inclination as well.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: mdnblocks() sabotages error checking in _mdfd_getseg()
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Fixing warnings in back branches?