Re: new heapcheck contrib module

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: new heapcheck contrib module
Дата
Msg-id CAH2-WznaU6HcahLV4Hg-DnhEmW8DuSdYfn3vfWXoj3Me9jq=sQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: new heapcheck contrib module  (Robert Haas <robertmhaas@gmail.com>)
Ответы Re: new heapcheck contrib module  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On Wed, May 13, 2020 at 12:22 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I think this is a good summary of the problems in this area. On the
> one hand, I think it's hideous that we sanity check user input to
> death, but blindly trust the bytes on disk to the point of seg
> faulting if they're wrong. The idea that int4 + int4 has to have
> overflow checking because otherwise a user might be sad when they get
> a negative result from adding two negative numbers, while at the same
> time supposing that the same user will be unwilling to accept the
> performance hit to avoid crashing if they have a bad tuple, is quite
> suspect in my mind. The overflow checking is also expensive, but we do
> it because it's the right thing to do, and then we try to minimize the
> overhead. It is unclear to me why we shouldn't also take that approach
> with bytes that come from disk. In particular, using Assert() checks
> for such things instead of elog() is basically Assert(there is no such
> thing as a corrupted database).

I think that it depends. It's nice to be able to add an Assert()
without really having to worry about the overhead at all. I sometimes
call relatively expensive functions in assertions. For example, there
is an assert that calls _bt_compare() within _bt_check_unique() that I
added at one point -- it caught a real bug a few weeks later. You
could always be doing more.

In general we don't exactly trust the bytes blindly. I've found that
corrupting tuples in a creative way with pg_hexedit doesn't usually
result in a segfault. Sometimes we'll do things like display NULL
values when heap line pointers are corrupt, which isn't as good as an
error but is still okay. We ought to protect against Murphy, not
Machiavelli. ISTM that access method code naturally evolves towards
avoiding the most disruptive errors in the event of real world
corruption, in particular avoiding segfaulting. It's very hard to
prove that, though.

Do you recall seeing corruption resulting in segfaults in production?
I personally don't recall seeing that. If it happened, the segfaults
themselves probably wouldn't be the main concern.

> On the other hand, that problem is clearly way above this patch's pay
> grade. There's a lot of stuff all over the code base that would have
> to be changed to fix it. It can't be done as an incidental thing as
> part of this patch or any other. It's a massive effort unto itself. We
> need to somehow draw a clean line between what this patch does and
> what it does not do, such that the scope of this patch remains
> something achievable. Otherwise, we'll end up with nothing.

I can easily come up with an adversarial input that will segfault a
backend, even amcheck, but it'll be somewhat contrived. It's hard to
fool amcheck currently because it doesn't exactly trust line pointers.
But I'm sure I could get the backend to segfault amcheck if I tried.
I'd probably try to play around with varlena headers. It would require
a certain amount of craftiness.

It's not exactly clear where you draw the line here. And I don't think
that the line will be very clearly defined, in the end. It'll be
something that is subject to change over time, as new information
comes to light. I think that it's necessary to accept a certain amount
of ambiguity here.

-- 
Peter Geoghegan



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Our naming of wait events is a disaster.
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: new heapcheck contrib module