Обсуждение: Re: amcheck: support for GiST
> On 30 Jun 2025, at 16:34, Andrey Borodin <x4mmm@yandex-team.ru> wrote: > > Please find attached two new steps for amcheck: > 1. A function to verify GiST integrity. This patch is in decent shape, simply rebased from previous year. > 2. Support on pg_amcheck's side for this function. This patch did not receive such review attention before. And, perhaps,should be extended to support existing GIN functions. Here's a version that adds GIN functions to pg_amcheck. IDK, maybe we should split pg_amcheck part into another thread and add there BRIN too... Thanks! Best regards, Andrey Borodin.
Вложения
Hi, Andrey!
Thank you for working on this! There is a long history of the patch, I
hope it will be committed soon!)
On Fri, Jul 11, 2025 at 3:39 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
>
>
>
> > On 30 Jun 2025, at 16:34, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> >
> > Please find attached two new steps for amcheck:
> > 1. A function to verify GiST integrity. This patch is in decent shape, simply rebased from previous year.
> > 2. Support on pg_amcheck's side for this function. This patch did not receive such review attention before. And,
perhaps,should be extended to support existing GIN functions.
>
> Here's a version that adds GIN functions to pg_amcheck.
> IDK, maybe we should split pg_amcheck part into another thread and add there BRIN too...
>
Speaking of BRIN pg_amcheck, I probably wouldn't merge it with the
gist/gin pg_amceck patchset because that would create a dependency on
the amcheck BRIN support patch, which is not clear when it will be
ready.
There are some points about the patch:
1) There are several typos in verify_gist.c:
downlinks -> downlink (header comment)
discrepencies -> discrepancies
Correctess -> Correctness
hande -> handle
Initaliaze -> Initialize
numbmer -> number
replcaed -> replaced
aquire -> aqcuire
2) Copyright year is 2023 in the patch. Time flies:)
3) There is the same check in btree and while reviewing the patch I
realised it should be added to the BRIN amcheck as well. Probably it
will be needed for GIN someday. What do you think about moving it to
verify_common?
if (IsolationUsesXactSnapshot() && rel->rd_index->indcheckxmin &&
!TransactionIdPrecedes(HeapTupleHeaderGetXmin(rel->rd_indextuple->t_data),
result->snapshot->xmin))
4) Should we check blknum of the new entry before pushing to the
stack? Probably we can check if it's a valid blknum and it's not
outside of the index. This way we can give a more detailed error
message in case we meet the wrong blknum.
in the split detection code:
ptr->blkno = GistPageGetOpaque(page)->rightlink;
and when we add children of an inner page:
ptr->blkno = ItemPointerGetBlockNumber(&(idxtuple->t_tid));
5) There is a macros for 0xffff - 'TUPLE_IS_VALID'. Maybe we can use
it to make the code more readable? Also the error message contains one
extra 'has'.
if (off != 0xffff)
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
errmsg("index \"%s\" has on page %u offset %u has
item id not pointing to 0xffff, but %hu",
RelationGetRelationName(check_state->rel),
stack->blkno, i, off)));
6) Several points about 'gistFormNormalizedTuple'. I read the previous
thread [1] and it seems there is an unfinished discussion about
normalizing during heapallindexed check. I think normalization needs
some more work here.
6a) There is a TODO
/* pfree(DatumGetPointer(old)); // TODO: this fails. Why? */
6b) AFAICS 'compress' is an optional support function. If opclass
doesn't have a 'compress' function, then 'gistCompressValues' leaves
such attributes as it is. Here we get attdata from the heap scan, and
it could be toasted. That means that these checks can result in false
positives:
gistCompressValues(giststate, r, attdata, isnull, true, compatt);
...
for (int i = 0; i < r->rd_att->natts; i++)
{
if (VARATT_IS_EXTERNAL(DatumGetPointer(compatt[i])))
ereport(ERROR,
(errcode(ERRCODE_INDEX_CORRUPTED),
....
if (VARATT_IS_COMPRESSED(DatumGetPointer(compatt[i])))
{
if (i < IndexRelationGetNumberOfKeyAttributes(r))
ereport(ERROR,
Also 'VARATT_IS_EXTERNAL' check will always result in a false
positive for toasted include attributes here. Reproducer for it:
DROP TABLE IF EXISTS tbl;
CREATE TABLE tbl(a point, t text);
-- disable compression for 't', but let it to be external
ALTER TABLE tbl ALTER COLUMN t SET STORAGE external ;
INSERT INTO tbl values (point(random(), random()), repeat('a',3000 ));
CREATE INDEX tbl_idx ON tbl using gist (a) include (t);
SELECT gist_index_check('tbl_idx', true);
So I think we need to remove these checks completely.
6c) Current code doesn't apply normalization for existing index tuples
during adding to bloom filter, which can result in false positive,
reproducer:
Here we use plain storage during index build, then during check we
have extended storage, which results in different binary
representation of the same data and we have false positive here.
DROP TABLE IF EXISTS tbl;
CREATE TABLE tbl(a tsvector);
CREATE INDEX tbl_idx ON tbl using gist (a) ;
ALTER TABLE tbl ALTER COLUMN a SET STORAGE plain;
INSERT INTO tbl values ('a' ::tsvector);
ALTER TABLE tbl ALTER COLUMN a SET STORAGE extended ;
SELECT gist_index_check('tbl_idx', true);
6d) In the end of 'gistFormNormalizedTuple' we have
ItemPointerSetOffsetNumber(&(res->t_tid), 0xffff);
I guess it follows gistFormTuple function, but here we use
gistFormNormalizedTuple only for leaf tuples and we override
offsetnumber right after 'gistFormNormalizedTuple' function call, so
looks like we can drop it.
In general I think normalization here can follow the same logic as
for verify_nbtree. We can even reuse 'bt_normalize_tuple' as a
normalization function. It handles all corner cases like short varatt,
differences in compressions etc, that we can have in gist as well. It
contains just a few lines about btree and everything else valid for
gist, so we need to modify it a bit. I think we can move it to
verify_common. Then we need to normalize every existing leaf index
tuple before adding it to the bloom filter. During the probing phase I
think we just can use 'gistFormTuple' to build an index tuple and then
normalize it before probing. What do you think?
Thank you!
[1] https://www.postgresql.org/message-id/CAAhFRxiHCWe_6AmqGWZqYEkgN_uQG3Jgw0WgPw%2B0zO3_D-q4DA%40mail.gmail.com
Best regards,
Arseniy Mukhin
Hi! Thank you for your review.
Im posting new version of 0001 patch of series
On Tue, 22 Jul 2025 at 15:47, Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:
>
> Hi, Andrey!
>
> Thank you for working on this! There is a long history of the patch, I
> hope it will be committed soon!)
>
> On Fri, Jul 11, 2025 at 3:39 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> >
> >
> >
> > > On 30 Jun 2025, at 16:34, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> > >
> > > Please find attached two new steps for amcheck:
> > > 1. A function to verify GiST integrity. This patch is in decent shape, simply rebased from previous year.
> > > 2. Support on pg_amcheck's side for this function. This patch did not receive such review attention before. And,
perhaps,should be extended to support existing GIN functions.
> >
> > Here's a version that adds GIN functions to pg_amcheck.
> > IDK, maybe we should split pg_amcheck part into another thread and add there BRIN too...
> >
>
> Speaking of BRIN pg_amcheck, I probably wouldn't merge it with the
> gist/gin pg_amceck patchset because that would create a dependency on
> the amcheck BRIN support patch, which is not clear when it will be
> ready.
>
>
> There are some points about the patch:
>
> 1) There are several typos in verify_gist.c:
>
> downlinks -> downlink (header comment)
> discrepencies -> discrepancies
> Correctess -> Correctness
> hande -> handle
> Initaliaze -> Initialize
> numbmer -> number
> replcaed -> replaced
> aquire -> aqcuire
>
> 2) Copyright year is 2023 in the patch. Time flies:)
These two are (trivially) fixed.
> 3) There is the same check in btree and while reviewing the patch I
> realised it should be added to the BRIN amcheck as well. Probably it
> will be needed for GIN someday. What do you think about moving it to
> verify_common?
>
> if (IsolationUsesXactSnapshot() && rel->rd_index->indcheckxmin &&
> !TransactionIdPrecedes(HeapTupleHeaderGetXmin(rel->rd_indextuple->t_data),
> result->snapshot->xmin))
I think this is a good idea. I'm not sure if we should bother with
refactoring in this series though...
> 4) Should we check blknum of the new entry before pushing to the
> stack? Probably we can check if it's a valid blknum and it's not
> outside of the index. This way we can give a more detailed error
> message in case we meet the wrong blknum.
>
> in the split detection code:
>
> ptr->blkno = GistPageGetOpaque(page)->rightlink;
>
> and when we add children of an inner page:
>
> ptr->blkno = ItemPointerGetBlockNumber(&(idxtuple->t_tid));
We indeed need to recheck all bytes in possibly-corrupted indexes,
including downlinks.
But amcheck can be run concurrently with Index insert, which will
change the current index size, so checking is not trivial.
And given it is not checked in already-committed nbtree & GIN amcheck
modules, I suggest not bother with that in this thread.
This can be a separate patch to verify_nbtree.
> 5) There is a macros for 0xffff - 'TUPLE_IS_VALID'. Maybe we can use
> it to make the code more readable? Also the error message contains one
> extra 'has'.
>
> if (off != 0xffff)
> ereport(ERROR,
> (errcode(ERRCODE_INDEX_CORRUPTED),
> errmsg("index \"%s\" has on page %u offset %u has
> item id not pointing to 0xffff, but %hu",
> RelationGetRelationName(check_state->rel),
> stack->blkno, i, off)));
Sure, I replaced all usages with TUPLE_IS_VALID.
> 6) Several points about 'gistFormNormalizedTuple'. I read the previous
> thread [1] and it seems there is an unfinished discussion about
> normalizing during heapallindexed check. I think normalization needs
> some more work here.
>
> 6a) There is a TODO
> /* pfree(DatumGetPointer(old)); // TODO: this fails. Why? */
>
> 6b) AFAICS 'compress' is an optional support function. If opclass
> doesn't have a 'compress' function, then 'gistCompressValues' leaves
> such attributes as it is. Here we get attdata from the heap scan, and
> it could be toasted. That means that these checks can result in false
> positives:
>
> gistCompressValues(giststate, r, attdata, isnull, true, compatt);
> ...
>
> for (int i = 0; i < r->rd_att->natts; i++)
> {
> if (VARATT_IS_EXTERNAL(DatumGetPointer(compatt[i])))
> ereport(ERROR,
> (errcode(ERRCODE_INDEX_CORRUPTED),
> ....
> if (VARATT_IS_COMPRESSED(DatumGetPointer(compatt[i])))
> {
> if (i < IndexRelationGetNumberOfKeyAttributes(r))
> ereport(ERROR,
>
> Also 'VARATT_IS_EXTERNAL' check will always result in a false
> positive for toasted include attributes here. Reproducer for it:
>
> DROP TABLE IF EXISTS tbl;
> CREATE TABLE tbl(a point, t text);
> -- disable compression for 't', but let it to be external
> ALTER TABLE tbl ALTER COLUMN t SET STORAGE external ;
> INSERT INTO tbl values (point(random(), random()), repeat('a',3000 ));
> CREATE INDEX tbl_idx ON tbl using gist (a) include (t);
> SELECT gist_index_check('tbl_idx', true);
>
> So I think we need to remove these checks completely.
>
> 6c) Current code doesn't apply normalization for existing index tuples
> during adding to bloom filter, which can result in false positive,
> reproducer:
> Here we use plain storage during index build, then during check we
> have extended storage, which results in different binary
> representation of the same data and we have false positive here.
>
> DROP TABLE IF EXISTS tbl;
> CREATE TABLE tbl(a tsvector);
> CREATE INDEX tbl_idx ON tbl using gist (a) ;
> ALTER TABLE tbl ALTER COLUMN a SET STORAGE plain;
> INSERT INTO tbl values ('a' ::tsvector);
> ALTER TABLE tbl ALTER COLUMN a SET STORAGE extended ;
> SELECT gist_index_check('tbl_idx', true);
>
> 6d) In the end of 'gistFormNormalizedTuple' we have
>
> ItemPointerSetOffsetNumber(&(res->t_tid), 0xffff);
>
> I guess it follows gistFormTuple function, but here we use
> gistFormNormalizedTuple only for leaf tuples and we override
> offsetnumber right after 'gistFormNormalizedTuple' function call, so
> looks like we can drop it.
>
>
> In general I think normalization here can follow the same logic as
> for verify_nbtree. We can even reuse 'bt_normalize_tuple' as a
> normalization function. It handles all corner cases like short varatt,
> differences in compressions etc, that we can have in gist as well. It
> contains just a few lines about btree and everything else valid for
> gist, so we need to modify it a bit. I think we can move it to
> verify_common. Then we need to normalize every existing leaf index
> tuple before adding it to the bloom filter. During the probing phase I
> think we just can use 'gistFormTuple' to build an index tuple and then
> normalize it before probing. What do you think?
>
> Thank you!
I did a little refactor in patch one, so we can reuse
bt_normalize_tuple. With these changes, your reproducers do not
complain.
I guess `gistFormNormalizedTuple` is now unneeded and its comment no
longer true. index_form_tuple in gist_tuple_present_callback ensures
all attributes are detoasted, and then we do `amcheck_normalize_tuple`
call.
I will remove gistFormNormalizedTuple function in next iteration if
this approach is OK
>
> [1] https://www.postgresql.org/message-id/CAAhFRxiHCWe_6AmqGWZqYEkgN_uQG3Jgw0WgPw%2B0zO3_D-q4DA%40mail.gmail.com
>
>
> Best regards,
> Arseniy Mukhin
>
>
--
Best regards,
Kirill Reshke
Вложения
Hi,
On Wed, Oct 22, 2025 at 9:57 PM Kirill Reshke <reshkekirill@gmail.com> wrote:
>
> Hi! Thank you for your review.
Thank you for the new version!
> Im posting new version of 0001 patch of series
>
> On Tue, 22 Jul 2025 at 15:47, Arseniy Mukhin
> <arseniy.mukhin.dev@gmail.com> wrote:
> >
> > Hi, Andrey!
> >
> > Thank you for working on this! There is a long history of the patch, I
> > hope it will be committed soon!)
> >
> > On Fri, Jul 11, 2025 at 3:39 PM Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> > >
> > >
> > >
> > > > On 30 Jun 2025, at 16:34, Andrey Borodin <x4mmm@yandex-team.ru> wrote:
> > > >
> > > > Please find attached two new steps for amcheck:
> > > > 1. A function to verify GiST integrity. This patch is in decent shape, simply rebased from previous year.
> > > > 2. Support on pg_amcheck's side for this function. This patch did not receive such review attention before.
And,perhaps, should be extended to support existing GIN functions.
> > >
> > > Here's a version that adds GIN functions to pg_amcheck.
> > > IDK, maybe we should split pg_amcheck part into another thread and add there BRIN too...
> > >
> >
> > Speaking of BRIN pg_amcheck, I probably wouldn't merge it with the
> > gist/gin pg_amceck patchset because that would create a dependency on
> > the amcheck BRIN support patch, which is not clear when it will be
> > ready.
> >
> >
> > There are some points about the patch:
> >
> > 1) There are several typos in verify_gist.c:
> >
> > downlinks -> downlink (header comment)
> > discrepencies -> discrepancies
> > Correctess -> Correctness
> > hande -> handle
> > Initaliaze -> Initialize
> > numbmer -> number
> > replcaed -> replaced
> > aquire -> aqcuire
> >
> > 2) Copyright year is 2023 in the patch. Time flies:)
>
> These two are (trivially) fixed.
>
> > 3) There is the same check in btree and while reviewing the patch I
> > realised it should be added to the BRIN amcheck as well. Probably it
> > will be needed for GIN someday. What do you think about moving it to
> > verify_common?
> >
> > if (IsolationUsesXactSnapshot() && rel->rd_index->indcheckxmin &&
> > !TransactionIdPrecedes(HeapTupleHeaderGetXmin(rel->rd_indextuple->t_data),
> > result->snapshot->xmin))
>
> I think this is a good idea. I'm not sure if we should bother with
> refactoring in this series though...
Great, so maybe we can start a separate thread for this small
refactoring. Some work in this direction has been already done in brin
amcheck thread [0].
>
> > 4) Should we check blknum of the new entry before pushing to the
> > stack? Probably we can check if it's a valid blknum and it's not
> > outside of the index. This way we can give a more detailed error
> > message in case we meet the wrong blknum.
> >
> > in the split detection code:
> >
> > ptr->blkno = GistPageGetOpaque(page)->rightlink;
> >
> > and when we add children of an inner page:
> >
> > ptr->blkno = ItemPointerGetBlockNumber(&(idxtuple->t_tid));
>
> We indeed need to recheck all bytes in possibly-corrupted indexes,
> including downlinks.
> But amcheck can be run concurrently with Index insert, which will
> change the current index size, so checking is not trivial.
> And given it is not checked in already-committed nbtree & GIN amcheck
> modules, I suggest not bother with that in this thread.
> This can be a separate patch to verify_nbtree.
>
OK.
>
> > 6) Several points about 'gistFormNormalizedTuple'. I read the previous
> > thread [1] and it seems there is an unfinished discussion about
> > normalizing during heapallindexed check. I think normalization needs
> > some more work here.
> >
> > 6a) There is a TODO
> > /* pfree(DatumGetPointer(old)); // TODO: this fails. Why? */
> >
> > 6b) AFAICS 'compress' is an optional support function. If opclass
> > doesn't have a 'compress' function, then 'gistCompressValues' leaves
> > such attributes as it is. Here we get attdata from the heap scan, and
> > it could be toasted. That means that these checks can result in false
> > positives:
> >
> > gistCompressValues(giststate, r, attdata, isnull, true, compatt);
> > ...
> >
> > for (int i = 0; i < r->rd_att->natts; i++)
> > {
> > if (VARATT_IS_EXTERNAL(DatumGetPointer(compatt[i])))
> > ereport(ERROR,
> > (errcode(ERRCODE_INDEX_CORRUPTED),
> > ....
> > if (VARATT_IS_COMPRESSED(DatumGetPointer(compatt[i])))
> > {
> > if (i < IndexRelationGetNumberOfKeyAttributes(r))
> > ereport(ERROR,
> >
> > Also 'VARATT_IS_EXTERNAL' check will always result in a false
> > positive for toasted include attributes here. Reproducer for it:
> >
> > DROP TABLE IF EXISTS tbl;
> > CREATE TABLE tbl(a point, t text);
> > -- disable compression for 't', but let it to be external
> > ALTER TABLE tbl ALTER COLUMN t SET STORAGE external ;
> > INSERT INTO tbl values (point(random(), random()), repeat('a',3000 ));
> > CREATE INDEX tbl_idx ON tbl using gist (a) include (t);
> > SELECT gist_index_check('tbl_idx', true);
> >
> > So I think we need to remove these checks completely.
> >
> > 6c) Current code doesn't apply normalization for existing index tuples
> > during adding to bloom filter, which can result in false positive,
> > reproducer:
> > Here we use plain storage during index build, then during check we
> > have extended storage, which results in different binary
> > representation of the same data and we have false positive here.
> >
> > DROP TABLE IF EXISTS tbl;
> > CREATE TABLE tbl(a tsvector);
> > CREATE INDEX tbl_idx ON tbl using gist (a) ;
> > ALTER TABLE tbl ALTER COLUMN a SET STORAGE plain;
> > INSERT INTO tbl values ('a' ::tsvector);
> > ALTER TABLE tbl ALTER COLUMN a SET STORAGE extended ;
> > SELECT gist_index_check('tbl_idx', true);
> >
> > 6d) In the end of 'gistFormNormalizedTuple' we have
> >
> > ItemPointerSetOffsetNumber(&(res->t_tid), 0xffff);
> >
> > I guess it follows gistFormTuple function, but here we use
> > gistFormNormalizedTuple only for leaf tuples and we override
> > offsetnumber right after 'gistFormNormalizedTuple' function call, so
> > looks like we can drop it.
> >
> >
> > In general I think normalization here can follow the same logic as
> > for verify_nbtree. We can even reuse 'bt_normalize_tuple' as a
> > normalization function. It handles all corner cases like short varatt,
> > differences in compressions etc, that we can have in gist as well. It
> > contains just a few lines about btree and everything else valid for
> > gist, so we need to modify it a bit. I think we can move it to
> > verify_common. Then we need to normalize every existing leaf index
> > tuple before adding it to the bloom filter. During the probing phase I
> > think we just can use 'gistFormTuple' to build an index tuple and then
> > normalize it before probing. What do you think?
> >
> > Thank you!
>
> I did a little refactor in patch one, so we can reuse
> bt_normalize_tuple. With these changes, your reproducers do not
> complain.
> I guess `gistFormNormalizedTuple` is now unneeded and its comment no
> longer true. index_form_tuple in gist_tuple_present_callback ensures
> all attributes are detoasted, and then we do `amcheck_normalize_tuple`
> call.
> I will remove gistFormNormalizedTuple function in next iteration if
> this approach is OK
>
LGTM. Only one point here: I think maybe we need to move the
bt_normalize_tuple comment as well, as now it is more about
amcheck_normalize_tuple than bt_normalize_tuple.
[0] https://www.postgresql.org/message-id/CAE7r3MKUOGJ0v5-b5fYaF6sxKZvr0J-YXHTJf8u8GUr1tTcvNg%40mail.gmail.com
Best regards,
Arseniy Mukhin