room for improvement in amcheck btree checking?

From: Mark Dilger

Peter,

While working on the pg_amcheck code for [1], I discovered an unexpected deficiency in the way btree indexes are checked.  So far as I can tell from the docs [2], the deficiency does not violate any promises that amcheck is making, but I found it rather surprising all the same.  To reproduce:

1) Create a (possibly empty) table and btree index over the table.
2) Flush buffers and back up a copy of the heap relation file.
3) Load (more) data into the table.
4) Flushing buffers as needed, revert the heap relation file to the backup previously taken.
5) Run bt_index_check and bt_index_parent_check with and without heapallindexed and/or rootdescend.  Note that the index passes all checks.
6) Run a SQL query that uses a sequential scan on the table and observe no errors.
7) Run a SQL query that uses an index scan on the table and see that it errors with something like:

   ERROR:  could not read block 0 in file "base/13097/16391": read only 0 of 8192 bytes

I found it surprising that even when precisely zero of the tids in the index exist in the table, the index checks all come back clean.  The heapallindexed check is technically running as advertised, checking that all of the zero tuples in the heap are present in the index.  That is a pretty useless check under this condition, though.  Is an "indexallheaped" option (by some less crazy name) needed?

Users might also run into this problem when a heap relation file gets erroneously shortened by some number of blocks but not fully truncated, or perhaps with torn page writes.

Have you already considered and rejected an "indexallheaped" type check?



Background
-------

I have up until recently been focused on corruption caused by twiddling the bits within heap and index relation pages, but real-world user error, file system error, and perhaps race conditions in the core PostgreSQL code seem at least as likely to result in missing or incorrect versions of blocks of relation files rather than individual bytes within those blocks being wrong.  Per our discussions in [3], not all corruptions that can be created under laboratory conditions are equally likely to occur in the wild, and it may be reasonable to only harden the amcheck code against corruptions that are more likely to happen in actual practice.

To make it easier for TAP tests to cover common corruption type scenarios, I have been extending PostgresNode.pm with functions to perform these kinds of file system corruptions.  I expect to post that work in another thread soon.  I am not embedding any knowledge of the internal structure of heap, index, or toast relations in PostgresNode, only creating functions to archive versions of files and perform full or partial reversions of them later.

The ultimate goal of this work is to have sufficient regression tests to demonstrate that pg_amcheck can be run with default options against a system corrupted in these common ways without crashing, and with reasonable likelihood of detecting these common corruptions.  Users might understand that hard-to-detect corruption will go unnoticed, but it would be harder to explain to them why, immediately after getting a clean bill of health on their system, a query bombed out with the sort of error shown above.


[1] https://commitfest.postgresql.org/31/2670/
[2] https://www.postgresql.org/docs/13/amcheck.html
[3] https://www.postgresql.org/message-id/flat/CAH2-WznaU6HcahLV4Hg-DnhEmW8DuSdYfn3vfWXoj3Me9jq%3DsQ%40mail.gmail.com#0691475da5e9163d21b13fc415095801

--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: room for improvement in amcheck btree checking?

From: Alvaro Herrera

On 2020-Dec-01, Mark Dilger wrote:

> 7) Run a SQL query that uses an index scan on the table and see that it errors with something like:
> 
>    ERROR:  could not read block 0 in file "base/13097/16391": read only 0 of 8192 bytes
> 
> I found it surprising that even when precisely zero of the tids in the
> index exist in the table the index checks all come back clean.

Yeah, I've seen this kind of symptom in production databases (indexes
pointing to non-existent heap pages).

I think one useful cross-check that amcheck could do is to verify that,
if a heap page is referenced from the index, the heap page actually exists.
Otherwise, it's a future index corruption of sorts: the old index
entries will point to the wrong new heap tuples as soon as the table
grows again.



Re: room for improvement in amcheck btree checking?

From: Peter Geoghegan

On Tue, Dec 1, 2020 at 12:39 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> I found it surprising that even when precisely zero of the tids in the
> index exist in the table, the index checks all come back clean.  The
> heapallindexed check is technically running as advertised, checking that
> all of the zero tuples in the heap are present in the index.  That is a
> pretty useless check under this condition, though.  Is an "indexallheaped"
> option (by some less crazy name) needed?
>
> Users might also run into this problem when a heap relation file gets
> erroneously shortened by some number of blocks but not fully truncated,
> or perhaps with torn page writes.

It would probably be possible to detect this exact condition (though
not other variants of the more general problem) relatively easily.
Perhaps by doing something with RelationOpenSmgr() with the heap
relation, while applying a little knowledge of what must be true about
the heap relation given what is known about the index relation. (I'm
not 100% sure that that's possible, but offhand it seems like it
probably is.)
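
As a rough, untested sketch of the idea (the helper name is invented, and
the call site would be the usual per-tuple loop in verify_nbtree.c), it
could amount to asking smgrnblocks() how long the heap is and comparing
that against each index tuple's TID:

    /*
     * Sketch only: verify that the heap block an index tuple points at
     * actually exists.  bt_heap_block_exists_check() is an invented name;
     * heaprel is the table the index is built on, itup is an index tuple
     * visited during the usual bt_index_check() page walk.
     */
    #include "postgres.h"

    #include "access/itup.h"
    #include "storage/itemptr.h"
    #include "storage/smgr.h"
    #include "utils/rel.h"

    static void
    bt_heap_block_exists_check(Relation heaprel, IndexTuple itup)
    {
        BlockNumber heap_nblocks;
        BlockNumber tid_block;

        /* Make sure the heap's smgr handle is open, then ask for its length */
        RelationOpenSmgr(heaprel);
        heap_nblocks = smgrnblocks(heaprel->rd_smgr, MAIN_FORKNUM);

        /* The heap block this index tuple claims to point into */
        tid_block = ItemPointerGetBlockNumber(&(itup->t_tid));

        if (tid_block >= heap_nblocks)
            ereport(ERROR,
                    (errcode(ERRCODE_INDEX_CORRUPTED),
                     errmsg("index tuple points to heap block %u, but \"%s\" has only %u blocks",
                            tid_block,
                            RelationGetRelationName(heaprel),
                            heap_nblocks)));
    }

For the empty heap file in your reproduction steps, smgrnblocks() would
return 0 and every TID in the index would trip the check.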

See also: commit 6754fe65a4c, which tightened things up in this area
for the index relation itself.

> Have you already considered and rejected an "indexallheaped" type check?

Actually, I pointed out that we should do something along these lines
very recently:

https://postgr.es/m/CAH2-Wz=dy--FG5iJ0kPcQumS0W5g+xQED3t-7HE+UqAK_hmLTw@mail.gmail.com

I'd be happy to see you pursue it.

> Background
> -------
>
> I have up until recently been focused on corruption caused by twiddling
> the bits within heap and index relation pages, but real-world user error,
> file system error, and perhaps race conditions in the core PostgreSQL code
> seem at least as likely to result in missing or incorrect versions of
> blocks of relation files rather than individual bytes within those blocks
> being wrong.  Per our discussions in [3], not all corruptions that can be
> created under laboratory conditions are equally likely to occur in the
> wild, and it may be reasonable to only harden the amcheck code against
> corruptions that are more likely to happen in actual practice.

Right. I also like to harden amcheck in ways that happen to be easy,
especially when it seems like it might kind of work as a broad
defense that doesn't consider any particular scenario. For example,
hardening to make sure an index tuple's lp_len matches
IndexTupleSize() for the tuple proper (this also kind of works as
storage format documentation). I am concerned about both costs and
benefits.
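
(To spell that example out: a simplified, standalone version of the lp_len
check might look roughly like the following.  The helper name is made up,
and page/offnum/blkno are assumed to come from the normal per-page loop.)

    /*
     * Illustration only: cross-check a line pointer's lp_len against the
     * length stored in the index tuple header it points to.  The helper
     * name check_lp_len_matches_itup() is invented for this sketch.
     */
    #include "postgres.h"

    #include "access/itup.h"
    #include "storage/bufpage.h"
    #include "storage/itemid.h"

    static void
    check_lp_len_matches_itup(Page page, OffsetNumber offnum, BlockNumber blkno)
    {
        ItemId      itemid = PageGetItemId(page, offnum);
        IndexTuple  itup = (IndexTuple) PageGetItem(page, itemid);
        Size        tupsize = IndexTupleSize(itup);

        /*
         * lp_len is redundant with the length in the index tuple header, and
         * both are MAXALIGN()'d in nbtree, so any mismatch indicates a corrupt
         * line pointer or tuple header rather than a rounding difference.
         */
        if (tupsize != ItemIdGetLength(itemid))
            ereport(ERROR,
                    (errcode(ERRCODE_INDEX_CORRUPTED),
                     errmsg("index tuple size does not equal lp_len in block %u, offset %u",
                            blkno, offnum)));
    }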

FWIW, this is a perfect example of the kind of hardening that makes
sense to me. Clearly this kind of failure is a reasonably plausible
scenario (probably even a known real world scenario with catalog
corruption), and clearly we could do something about it that's pretty
simple and obviously low risk. It easily meets the standard that I
might apply here.

--
Peter Geoghegan



Re: room for improvement in amcheck btree checking?

From: Mark Dilger

> On Dec 1, 2020, at 4:38 PM, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Tue, Dec 1, 2020 at 12:39 PM Mark Dilger
> <mark.dilger@enterprisedb.com> wrote:
>> I found it surprising that even when precisely zero of the tids in the
>> index exist in the table, the index checks all come back clean.  The
>> heapallindexed check is technically running as advertised, checking that
>> all of the zero tuples in the heap are present in the index.  That is a
>> pretty useless check under this condition, though.  Is an "indexallheaped"
>> option (by some less crazy name) needed?
>>
>> Users might also run into this problem when a heap relation file gets
>> erroneously shortened by some number of blocks but not fully truncated,
>> or perhaps with torn page writes.
>
> It would probably be possible to detect this exact condition (though
> not other variants of the more general problem) relatively easily.
> Perhaps by doing something with RelationOpenSmgr() with the heap
> relation, while applying a little knowledge of what must be true about
> the heap relation given what is known about the index relation. (I'm
> not 100% sure that that's possible, but offhand it seems like it
> probably is.)
>
> See also: commit 6754fe65a4c, which tightened things up in this area
> for the index relation itself.

Yes, some of the test code I've been playing with already hit the error messages introduced in that commit.

>> Have you already considered and rejected an "indexallheaped" type check?
>
> Actually, I pointed out that we should do something along these lines
> very recently:
>
> https://postgr.es/m/CAH2-Wz=dy--FG5iJ0kPcQumS0W5g+xQED3t-7HE+UqAK_hmLTw@mail.gmail.com

Right.

> I'd be happy to see you pursue it.

I'm not sure how much longer I'll be pursuing corruption checkers before switching to another task.  Certainly, I'm interested in pursuing this if time permits.

>> Background
>> -------
>>
>> I have up until recently been focused on corruption caused by twiddling
>> the bits within heap and index relation pages, but real-world user error,
>> file system error, and perhaps race conditions in the core PostgreSQL code
>> seem at least as likely to result in missing or incorrect versions of
>> blocks of relation files rather than individual bytes within those blocks
>> being wrong.  Per our discussions in [3], not all corruptions that can be
>> created under laboratory conditions are equally likely to occur in the
>> wild, and it may be reasonable to only harden the amcheck code against
>> corruptions that are more likely to happen in actual practice.
>
> Right. I also like to harden amcheck in ways that happen to be easy,
> especially when it seems like it might kind of work as a broad
> defense that doesn't consider any particular scenario. For example,
> hardening to make sure an index tuple's lp_len matches
> IndexTupleSize() for the tuple proper (this also kind of works as
> storage format documentation). I am concerned about both costs and
> benefits.
>
> FWIW, this is a perfect example of the kind of hardening that makes
> sense to me. Clearly this kind of failure is a reasonably plausible
> scenario (probably even a known real world scenario with catalog
> corruption), and clearly we could do something about it that's pretty
> simple and obviously low risk. It easily meets the standard that I
> might apply here.

Ok, it seems we're in general agreement about that.

--
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company