On 2020-May-13, Peter Geoghegan wrote:
> On Wed, May 13, 2020 at 3:10 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> > Hmm. I think we should (try to?) write code that avoids all crashes
> > with production builds, but not extend that to assertion failures.
>
> Assertions are only a problem at all because Mark would like to write
> tests that involve a selection of truly corrupt data. That's a new
> requirement, and one that I have my doubts about.

I agree that this (a test tool that exercises our code against
arbitrarily corrupted data pages) is not going to work as a test that
all buildfarm members run -- it seems something for specialized
buildfarm members to run, or even something that's run outside of the
buildfarm, like sqlsmith. Obviously such a tool would not be able to
run against an assertion-enabled build, and we shouldn't even try.
> I would be willing to make a larger effort to avoid crashing a
> backend, since that affects production. I might go to some effort to
> not crash with downright adversarial inputs, for example. But it seems
> inappropriate to take extreme measures just to avoid a crash with
> extremely contrived inputs that will probably never occur. My sense is
> that this is subject to sharply diminishing returns. Completely
> nailing down hard crashes from corrupt data seems like the wrong
> priority, at the very least. Pursuing that objective over other
> objectives sounds like zero-risk bias.

I think my initial approach for this would be to use a fuzzing tool that
generates data blocks semi-randomly, feed them to Postgres as data pages
somehow, and see what happens -- examining any resulting crashes and
making individual judgement calls about the fix(es) necessary to prevent
each of them. I expect that many such pages would be rejected outright
as corrupt by the page header checks.
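As a rough illustration of that last point, here's a sketch (in Python,
not the real C code) of what such a fuzzer's outer loop might look like,
with a simplified re-implementation of the sanity conditions that
PageHeaderIsValid() applies in src/backend/storage/page/bufpage.c. The
constants assume the default BLCKSZ of 8192 and page layout version 4;
the helper names are mine, not Postgres's, and the real function is only
one part of PageIsVerified() (which also special-cases all-zero pages
and checksums):

```python
import os
import struct

BLCKSZ = 8192                 # default PostgreSQL block size
SIZE_OF_PAGE_HEADER = 24      # sizeof(PageHeaderData), maxaligned
PG_PAGE_LAYOUT_VERSION = 4
VALID_FLAG_BITS = 0x0007      # PD_HAS_FREE_LINES | PD_PAGE_FULL | PD_ALL_VISIBLE

def header_looks_valid(page: bytes) -> bool:
    """Approximation of PageHeaderIsValid()'s structural checks.

    PageHeaderData layout: pd_lsn (8), pd_checksum (2), pd_flags (2),
    pd_lower (2), pd_upper (2), pd_special (2), pd_pagesize_version (2),
    pd_prune_xid (4) -- so pd_flags is at offset 10, pd_lower at 12.
    """
    pd_flags, = struct.unpack_from("<H", page, 10)
    pd_lower, pd_upper, pd_special, pd_pagesize_version = \
        struct.unpack_from("<HHHH", page, 12)
    return (pd_flags & ~VALID_FLAG_BITS == 0
            and SIZE_OF_PAGE_HEADER <= pd_lower <= pd_upper
            and pd_upper <= pd_special <= BLCKSZ
            and pd_special % 8 == 0          # MAXALIGN on common platforms
            and pd_pagesize_version == (BLCKSZ | PG_PAGE_LAYOUT_VERSION))

def random_page() -> bytes:
    """One fully random candidate page for the fuzzer to try."""
    return os.urandom(BLCKSZ)

# A fully random page almost never survives even these cheap checks;
# a smarter fuzzer would keep the header plausible and corrupt the rest.
rejected = sum(1 for _ in range(10000)
               if not header_looks_valid(random_page()))
print(f"{rejected}/10000 random pages rejected by header checks")
```

This is also why purely random blocks make a poor corpus on their own:
to reach the interesting code paths, a fuzzer would have to generate
pages whose headers pass these checks and corrupt the line pointers and
tuple contents instead.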
--
Álvaro Herrera https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services