Обсуждение: Re: amcheck: support for GiST

Поиск
Список
Период
Сортировка

Re: amcheck: support for GiST

От
Japin Li
Дата:
On Sat, 10 Jan 2026 at 19:56, Kirill Reshke <reshkekirill@gmail.com> wrote:
> On Thu, 1 Jan 2026 at 17:05, Kirill Reshke <reshkekirill@gmail.com> wrote:
>>
>> CF bot was unhappy about the last version due to obvious bug, PFA new
>> version with fixes.
>>
>> The problem was "DROP TABLE toast_bug;" missing in expected regression output.
>>
>>
>> [0] https://cirrus-ci.com/task/6378051304423424
>>
>>
>> --
>> Best regards,
>> Kirill Reshke
>
>
> Attached new version with commit message polishing, and address CF
> feedback, which was unhappy due to headercheck
>

After a quick preliminary review, here are some comments.

v2026-01-10-0001
================

1.
I'm pretty sure access/heaptoast.h is not needed by verify_nbtree.c.


v2026-01-10-0002
================

1.
+    if (GistPageGetOpaque(page)->gist_page_id != GIST_PAGE_ID)
+        ereport(ERROR,
+                (errcode(ERRCODE_INDEX_CORRUPTED),
+                 errmsg("index \"%s\" has corrupted page %d",
+                        RelationGetRelationName(rel), blockNo)));
+
+    if (GistPageIsDeleted(page))
+    {
+        if (!GistPageIsLeaf(page))
+            ereport(ERROR,
+                    (errcode(ERRCODE_INDEX_CORRUPTED),
+                     errmsg("index \"%s\" has deleted internal page %d",
+                            RelationGetRelationName(rel), blockNo)));
+        if (PageGetMaxOffsetNumber(page) > InvalidOffsetNumber)
+            ereport(ERROR,
+                    (errcode(ERRCODE_INDEX_CORRUPTED),
+                     errmsg("index \"%s\" has deleted page %d with tuples",
+                            RelationGetRelationName(rel), blockNo)));

blockNo is unsigned integer, so we should use %u in the format string.

2.
+        ereport(ERROR,
+                (errcode(ERRCODE_INDEX_CORRUPTED),
+                 errmsg("index \"%s\" internal page %d became leaf",
+                        RelationGetRelationName(rel), parentblkno)));

The same goes for parentblkno — it should also use %u.

> --
> Best regards,
> Kirill Reshke
>
> [2. text/x-diff; v2026-01-10-0001-Move-normalize-tuple-logic-from-nbtcheck.patch]...
>
> [3. text/x-diff; v2026-01-10-0002-Add-gist_index_check-function-to-verify-.patch]...

--
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.



Re: amcheck: support for GiST

От
Roman Khapov
Дата:
Hi!
Thanks for your review!

We discussed offline about the patch, and I decided to make a review of it and fixes
up to your comments.


>
> v2026-01-10-0001
> ================
>
> 1.
> I'm pretty sure access/heaptoast.h is not needed by verify_nbtree.c.
>

In fact this include is necessary because of line verify_common.c:236 which
uses TOAST_INDEX_TARGET


>
> v2026-01-10-0002
> ================
>
> blockNo is unsigned integer, so we should use %u in the format string.
>
> The same goes for parentblkno — it should also use %u.
>

Yes seems like there was misusage of format symbol.. I attached the v2026-01-12 with
fixed symbol. Thanks!

--
Best regards,
Roman Khapov




Вложения

Re: amcheck: support for GiST

От
Japin Li
Дата:
Hi, Roman Khapov

On Mon, 12 Jan 2026 at 01:36, Roman Khapov <rkhapov@yandex-team.ru> wrote:
> Hi!
> Thanks for your review!
>
> We discussed offline about the patch, and I decided to make a review of it and fixes
> up to your comments.
>

Thank you for updating the patches.

>> 
>> v2026-01-10-0001
>> ================
>> 
>> 1.
>> I'm pretty sure access/heaptoast.h is not needed by verify_nbtree.c.
>> 
>
> In fact this include is necessary because of line verify_common.c:236 which
> uses TOAST_INDEX_TARGET
>

Yeah, verify_common.c does require the header, but what I meant was that
verify_nbtree.c no longer needs it.

diff --git a/contrib/amcheck/verify_nbtree.c b/contrib/amcheck/verify_nbtree.c
index c1e24338361..426e23d2960 100644
--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -23,7 +23,6 @@
  */
 #include "postgres.h"
 
-#include "access/heaptoast.h"
 #include "access/htup_details.h"
 #include "access/nbtree.h"
 #include "access/table.h"


-- 
Regards,
Japin Li
ChengDu WenWu Information Technology Co., Ltd.



Re: amcheck: support for GiST

От
Roman Khapov
Дата:
> 
> Yeah, verify_common.c does require the header, but what I meant was that
> verify_nbtree.c no longer needs it.

Oh, thanks, now I see..

Updated the patches.

--
Best regards,
Roman Khapov


Вложения

Re: amcheck: support for GiST

От
Nikolay Samokhvalov
Дата:
On Sat, Jan 31, 2026 at 2:17 PM Roman Khapov <rkhapov@yandex-team.ru> wrote:
>
> Yeah, verify_common.c does require the header, but what I meant was that
> verify_nbtree.c no longer needs it.

Oh, thanks, now I see..

Updated the patches.

I additionally tested the latest version (v2026-01-12-v2) -- and all looked good to me.

testing performed:

1. Applied cleanly; built with --enable-cassert, make -C contrib/amcheck check — all tests pass

2. Tested for 9 core GiST opclasses, on larger indexes (10M records), with both heapallindexed=false and heapallindexed=true
(10M records, old macbook m1):

Opclass          heapallindexed=false  heapallindexed=true
--------------   --------------------  -------------------
point_ops        3.9s                  5.9s
box_ops          6.3s                  9.3s
circle_ops       5.5s                  10.6s
poly_ops         5.8s                  13.7s
inet_ops         0.6s                  5.6s
range_ops        3.2s                  6.9s
multirange_ops   2.4s                  10.8s
tsvector_ops     3.8s                  13.7s
tsquery_ops      0.3s                  4.9s

3. Similarly, just to double-check, for extensions that use GiST:
- contrib modules tested: btree_gist, cube, hstore, ltree, pg_trgm, seg (tested a single opclass for each; so 8 out of 30+ opclasses)
- skipped: intarray because it was annoyingly slow to build (looks like O(n²)?)

4. Verified corruption detection works. Reproduction steps:

Setup (with data checksums disabled)

create table t as select point(random()*1000, random()*1000) c from generate_series(1,50000);
create index t_idx on t using gist(c);
select pg_relation_filepath('t_idx');  -- e.g., base/5/16398

Stop server, corrupt index, start:

pg_ctl stop -D $PGDATA
python3 -c "
f = open('$PGDATA/base/5/16398', 'r+b')
f.seek(8192 + 24)  # page 1, line pointer area
f.write(b'\xDE\xAD\xBE\xEF' * 8)
f.close()
"
pg_ctl start -D $PGDATA

And then we see an error as expected:

select gist_index_check('t_idx', false);
-- ERROR: invalid line pointer storage in index "t_idx"
-- DETAIL: Index tid=(1,20) lp_off=0, lp_len=0 lp_flags=0.

*****

Side note (general amcheck observation, not specific to this patch): the existing amcheck TAP tests for heapam include corruption injection tests (see 001_verify_heapam.pl). Similar tests for index checkers would be valuable for amcheck -- but I think it's a broader effort beyond this particular patch's scope.

The patch looks solid.

Nik