Re: new heapcheck contrib module

Поиск
Список
Период
Сортировка
От Mark Dilger
Тема Re: new heapcheck contrib module
Дата
Msg-id 4D704A12-2546-4789-90B0-B0445BFC717B@enterprisedb.com
обсуждение исходный текст
Ответ на Re: new heapcheck contrib module  (Peter Geoghegan <pg@bowt.ie>)
Ответы Re: new heapcheck contrib module  (Peter Geoghegan <pg@bowt.ie>)
Список pgsql-hackers

> On May 13, 2020, at 3:29 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Wed, May 13, 2020 at 3:10 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
>> Hmm.  I think we should (try to?) write code that avoids all crashes
>> with production builds, but not extend that to assertion failures.
>
> Assertions are only a problem at all because Mark would like to write
> tests that involve a selection of truly corrupt data. That's a new
> requirement, and one that I have my doubts about.
>
>>> I'll stick with your example. You're calling
>>> TransactionIdDidCommit() from check_tuphdr_xids(), which will
>>> interrogate the commit log and pg_subtrans. It's just not under your
>>> control.
>>
>> in a production build this would just fail with an error that the
>> pg_xact file cannot be found, which is fine -- if this happens in a
>> production system, you're not disturbing any other sessions.  Or maybe
>> the file is there and the byte can be read, in which case you would get
>> the correct response; but that's fine too.
>
> I think that this is fine, too, since I don't consider assertion
> failures with corrupt data all that important. I'd make some effort to
> avoid it, but not too much, and not at the expense of a useful general
> purpose assertion that could catch bugs in many different contexts.

I am not removing any assertions.  I do not propose to remove any assertions. When I talk about "hardening against
assertions",that is not in any way a proposal to remove assertions from the code.  What I'm talking about is writing
theamcheck contrib module code in such a way that it only calls a function that could assert on bad data after checking
thatthe data is not bad. 

I don't know that hardening against assertions in this manner is worth doing, but this is none the less what I'm
talkingabout.  You have made decent arguments that it probably isn't worth doing for the btree checking code.  And in
anyevent, it is probably something that could be addressed in a future patch after getting this patch committed. 

There is a separate but related question in the offing about whether the backend code, independently of any amcheck
contribstuff, should be more paranoid in how it processes tuples to check for corruption.  The heap deform tuple code
inquestion is on a pretty hot code path, and I don't know that folks would accept the performance hit of more checks
beingdone in that part of the system, but that's pretty far from relevant to this patch.  That should be hashed out, or
not,at some other time on some other thread. 

> I would be willing to make a larger effort to avoid crashing a
> backend, since that affects production. I might go to some effort to
> not crash with downright adversarial inputs, for example. But it seems
> inappropriate to take extreme measures just to avoid a crash with
> extremely contrived inputs that will probably never occur.

I think this is a misrepresentation of the tests that I've been running.  There are two kinds of tests that I have
done:

First, there is the regression tests, t/004_verify_heapam.pl, which is obviously contrived.  That was included in the
regressiontest suite because it needed to be something other developers could read, verify, "yeah, I can see why that
wouldbe corruption, and would give an error message of the sort the test expects", and then could be run to verify that
indeedthat expected error message was generated. 

The second kind of corruption test I have been running is nothing more than writing random nonsense into randomly
chosenlocations within heap files and then running verify_heapam against those heap relations.  It's much more Murphy
thanMachiavelli when it's just generated by calling random().  When I initially did this kind of testing, the heapam
checkingcode had lots of problems.  Now it doesn't.  There's very little contrived about that which I can see. It's the
kindof corruption you'd expect from any number of faulty storage systems.  The one "contrived" aspect of my testing in
thisregard is that the script I use to write random nonsense to random locations in heap files is smart enough not to
writerandom junk to the page headers.  This is because if I corrupt the page headers, the backend never even gets as
faras running the verify_heapam functions, as the page cache rejects loading the page. 


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






В списке pgsql-hackers по дате отправления:

Предыдущее
От: Peter Geoghegan
Дата:
Сообщение: Re: new heapcheck contrib module
Следующее
От: Peter Geoghegan
Дата:
Сообщение: Re: new heapcheck contrib module