Обсуждение: Bug in amcheck?

Поиск
Список
Период
Сортировка

Bug in amcheck?

От
Konstantin Knizhnik
Дата:
Hi hackers.

We see the following error reported by amcheck (I have added dump of 
opaque) when it interleaves with autovacuum and cancel pt:


ERROR:  mismatch between parent key and child high key in index 
"pg_attribute_relid_attnam_index"
DETAIL:  Target block=274, target opaque->flags=0, child block=427, 
child opaque=11, target page lsn=1/484A8FC8.
CONTEXT:  SQL statement "SELECT bt_index_parent_check(indexrelid, true, 
true) from pg_index"

So child has BTP_HALF_DEAD bit set.
Autovacuum is interrupted in this place in _bt_pagedel:

         /*
          * Check here, as calling loops will have locks held, preventing
          * interrupts from being processed.
          */
         CHECK_FOR_INTERRUPTS();

Reproducing it is not so easy.
First of all I added sleep here:

         /*
          * Check here, as calling loops will have locks held, preventing
          * interrupts from being processed.
          */
         pg_usleep(10000);
         CHECK_FOR_INTERRUPTS();

Then I create two procedures:

create or replace procedure create_tables(tables integer, partitions 
integer) as $$
declare
     i integer;
     j integer;
begin
     for i in 1..tables
     loop
         execute 'DROP TABLE IF EXISTS t_' || i;
         execute 'CREATE TABLE t_' || i || '(pk integer) partition by 
range (pk)';
         for j in 1..partitions
         loop
             execute 'create table p_'||i||'_'||j||' partition of 
t_'||i||' for values from ('||j||') to ('||(j + 1)||')';
         end loop;
         execute 'insert into t_'||i||' values 
(generate_series(1,'||partitions||'))';
     end loop;
end;
$$ language plpgsql;

and

create or replace procedure run_amcheck() as $$
begin
     loop
         if (select count(*) from pg_stat_activity where 
backend_type='autovacuum worker') > 0
         then
             raise notice 'Run amcheck!';
             perform bt_index_parent_check(indexrelid, true, true) from 
pg_index;
         end if;
         perform pg_sleep(1);
     end loop;
end;
$$ language plpgsql;

Then I run concurrently run_amcheck()
and the following script for pgbench:

call create_tables(2,1000);
select pg_sleep(2);

If the problem is not reproduced, then cancel run_amcheck()  and restart 
it once again.


Backtrace (pg16) is the following:

   * frame #0: 0x00000001017b6aac 
amcheck.dylib`bt_child_highkey_check(state=0x000000010c846318, 
target_downlinkoffnum=37, loaded_child="\U00000001", target_level=1) at 
verify_nbtree.c:2146:23
     frame #1: 0x00000001017b7fd8 
amcheck.dylib`bt_child_check(state=0x000000010c846318, 
targetkey=0x000000013c01c448, downlinkoffnum=37) at verify_nbtree.c:2262:2
     frame #2: 0x00000001017b5f4c 
amcheck.dylib`bt_target_page_check(state=0x000000010c846318) at 
verify_nbtree.c:1623:4
     frame #3: 0x00000001017b3908 
amcheck.dylib`bt_check_level_from_leftmost(state=0x000000010c846318, 
level=(level = 1, leftmost = 3, istruerootlevel = false)) at 
verify_nbtree.c:859:3
     frame #4: 0x00000001017b24e8 
amcheck.dylib`bt_check_every_level(rel=0x0000000140074f18, 
heaprel=0x0000000130070148, heapkeyspace=true, readonly=true, 
heapallindexed=true, rootdescend=true) at verify_nbtree.c:603:13
     frame #5: 0x00000001017b198c 
amcheck.dylib`bt_index_check_internal(indrelid=2674, parentcheck=true, 
heapallindexed=true, rootdescend=true) at verify_nbtree.c:362:3
     frame #6: 0x00000001017b1a78 
amcheck.dylib`bt_index_parent_check(fcinfo=0x000000010c83b040) at 
verify_nbtree.c:242:2


I wonder if we should add P_ISHALFDEAD(opaque) for child page?





Re: Bug in amcheck?

От
Mihail Nikalayeu
Дата:
Hello!

> I wonder if we should add P_ISHALFDEAD(opaque) for child page?

I am not a btree expert, but things I was able to find so far:

In commit d114cc538715e14d29d6de8b6ea1a1d5d3e0edb4 next check is added:

> bt_child_highkey_check(state, downlinkoffnum,
>                   child, topaque->btpo_level);

At the same time there is a comment below:

> * We go ahead with our checks if the child page is half-dead.  It's safe
> * to do so because we do not test the child's high key, so it does not
> * matter that the original high key will have been replaced by a dummy
> * truncated high key within _bt_mark_page_halfdead().  All other page
> * items are left intact on a half-dead page, so there is still something
> * to test.

So, yes, it looks like we need to skip the child's high key test for
half-dead pages.

BWT, have you tried to create an injection_point-based reproducer?

Best regards,
Mikhail.



Re: Bug in amcheck?

От
Konstantin Knizhnik
Дата:
On 02/11/2025 2:27 PM, Mihail Nikalayeu wrote:
> Hello!
>
>> I wonder if we should add P_ISHALFDEAD(opaque) for child page?
> I am not a btree expert, but things I was able to find so far:
>
> In commit d114cc538715e14d29d6de8b6ea1a1d5d3e0edb4 next check is added:
>
>> bt_child_highkey_check(state, downlinkoffnum,
>>                    child, topaque->btpo_level);
> At the same time there is a comment below:
>
>> * We go ahead with our checks if the child page is half-dead.  It's safe
>> * to do so because we do not test the child's high key, so it does not
>> * matter that the original high key will have been replaced by a dummy
>> * truncated high key within _bt_mark_page_halfdead().  All other page
>> * items are left intact on a half-dead page, so there is still something
>> * to test.
> So, yes, it looks like we need to skip the child's high key test for
> half-dead pages.
>
> BWT, have you tried to create an injection_point-based reproducer?
>
> Best regards,
> Mikhail.


Hello Mikhail,


Thank you very much for looking at this issue. And I am very sorry for 
delay with answer.
Unfortunately I was not able to reproduce the problem for the latest 
Postgres: neither with injection points, neither with my original 
approach with sleeps.

Originally I investigated the customer's problem with PG16. And have 
reproduced it for pg16,. I checked that relevant amcheck code was not 
changed since pg16, so I thought that the problem takes place for all 
Postgres versions. But looks like it is not true.





Re: Bug in amcheck?

От
Mihail Nikalayeu
Дата:
Hello!

> Originally I investigated the customer's problem with PG16. And have
> reproduced it for pg16,. I checked that relevant amcheck code was not
> changed since pg16, so I thought that the problem takes place for all
> Postgres versions. But looks like it is not true.

I think it is still broken, but with less probability.
Have you tried injection points on v16? Such a test case will make
things much more clear.

Also, I added Alexander to CC (he is author of bt_child_highkey_check)
- maybe the issue is easily understandable for him.

Best regards,
Mikhail.



Re: Bug in amcheck?

От
Heikki Linnakangas
Дата:
On 19/11/2025 00:19, Mihail Nikalayeu wrote:
> Hello!
> 
>> Originally I investigated the customer's problem with PG16. And have
>> reproduced it for pg16,. I checked that relevant amcheck code was not
>> changed since pg16, so I thought that the problem takes place for all
>> Postgres versions. But looks like it is not true.
> 
> I think it is still broken, but with less probability.
> Have you tried injection points on v16? Such a test case will make
> things much more clear.

Konstantin's original repro involved autovacuum and concurrent sessions. 
I was confused by that, because bt_index_parent_check() holds a 
ShareLock, which prevents it from running concurrently with vacuum. But 
this isn't a race condition as such, the issue arises whenever there is 
a half-dead page in the index. To end up with a half-dead page, you need 
to gracefully cancel/interrupt autovacuum while it's deleting a page. 
The repro's way of canceling autovacuum was very complicated. I didn't 
fully understand it, but I think the concurrent dropping/creating of 
tables would sometimes cause autovauum to be canceled.

Here's a much more straightforward repro. Apply this little patch:

diff --git a/src/backend/access/nbtree/nbtpage.c 
b/src/backend/access/nbtree/nbtpage.c
index 30b43a4dd18..c132fc90277 100644
--- a/src/backend/access/nbtree/nbtpage.c
+++ b/src/backend/access/nbtree/nbtpage.c
@@ -2353,6 +2353,12 @@ _bt_unlink_halfdead_page(Relation rel, Buffer 
leafbuf, BlockNumber scanblkno,
       * Check here, as calling loops will have locks held, preventing
       * interrupts from being processed.
       */
+    if (random()  < INT32_MAX / 2)
+    {
+        elog(ERROR, "aborting page deletion");
+    }
+    else
+        elog(NOTICE, "unlinking halfdead page %u %u", 
BufferGetBlockNumber(leafbuf), scanblkno);
      CHECK_FOR_INTERRUPTS();

      /* Unlink the current top parent of the subtree */

Then run this:

postgres=# CREATE TABLE amchecktest (id int4);
CREATE TABLE
postgres=# INSERT INTO amchecktest SELECT g from generate_series(1, 
1000000) g;
INSERT 0 1000000
postgres=# CREATE INDEX on amchecktest (id);
CREATE INDEX
postgres=# DELETE FROM amchecktest WHERE id > 100000 AND id < 120000;
DELETE 19999
postgres=# -- this will hit the error added by the patch
VACUUM amchecktest;
ERROR:  aborting page deletion
CONTEXT:  while vacuuming index "amchecktest_id_idx" of relation 
"public.amchecktest"
postgres=# select bt_index_parent_check('amchecktest_id_idx');
ERROR:  mismatch between parent key and child high key in index 
"amchecktest_id_idx"
DETAIL:  Target block=3 child block=276 target page lsn=0/6ED0DB68.


To fix this, I guess we need to teach bt_index_parent_check() about 
half-dead pages. Anyone volunteer to write that patch?

- Heikki




Re: Bug in amcheck?

От
Peter Geoghegan
Дата:
On Tue, Nov 25, 2025 at 3:03 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> To fix this, I guess we need to teach bt_index_parent_check() about
> half-dead pages. Anyone volunteer to write that patch?

It's not like bt_index_parent_check doesn't generally know about them.
For example, bt_downlink_missing_check goes to great lengths to
distinguish between legitimate "missing" downlinks caused by an
interrupted page deletion, and real missing downlinks caused by
corruption.

The problem we're seeing here seems likely limited to code added by
commit d114cc53, which enhanced bt_index_parent_check by adding the
new bt_child_highkey_check check. bt_child_highkey_check actually
reuses bt_downlink_missing_check (which deals with half-dead pages
correctly), but still isn't careful enough about half-dead pages. This
is kind of surprising, since it *does* account for incomplete splits,
which are similar.

In short, I think that we need to track something like
BtreeCheckState.previncompletesplit, but for half-dead pages. And then
actually use that within bt_child_highkey_check, to avoid spurious
false-positive reports of corruption.

--
Peter Geoghegan



Re: Bug in amcheck?

От
Heikki Linnakangas
Дата:
On 25/11/2025 22:51, Peter Geoghegan wrote:
> On Tue, Nov 25, 2025 at 3:03 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> To fix this, I guess we need to teach bt_index_parent_check() about
>> half-dead pages. Anyone volunteer to write that patch?
>
> It's not like bt_index_parent_check doesn't generally know about them.
> For example, bt_downlink_missing_check goes to great lengths to
> distinguish between legitimate "missing" downlinks caused by an
> interrupted page deletion, and real missing downlinks caused by
> corruption.
>
> The problem we're seeing here seems likely limited to code added by
> commit d114cc53, which enhanced bt_index_parent_check by adding the
> new bt_child_highkey_check check. bt_child_highkey_check actually
> reuses bt_downlink_missing_check (which deals with half-dead pages
> correctly), but still isn't careful enough about half-dead pages. This
> is kind of surprising, since it *does* account for incomplete splits,
> which are similar.

+1. It took me a moment to understand bt_downlink_missing_check(). The
comments there talk about interrupted page deletions and incomplete
splits, but it's actually never called for half-dead pages. The caller
checks !P_IGNORE(opaque), and P_IGNORE means BTP_DELETED | BTP_HALF_DEAD.

I don't think BTP_DELETED pages should be reachable here at all. So
perhaps we should specifically check BTP_DELETED and give an ERROR on
those. And for clarity, perhaps move the check for BTP_HALF_DEAD into
bt_downlink_missing_check(), alongside the incomplete-split check, so
that both cases would be checked at the same place. Just for clarity,
it's not wrong as it is.

> In short, I think that we need to track something like
> BtreeCheckState.previncompletesplit, but for half-dead pages. And then
> actually use that within bt_child_highkey_check, to avoid spurious
> false-positive reports of corruption.

I think it's even simpler than that, and we can just do this:

--- a/contrib/amcheck/verify_nbtree.c
+++ b/contrib/amcheck/verify_nbtree.c
@@ -2268,7 +2268,7 @@ bt_child_highkey_check(BtreeCheckState *state,
           * If we visit page with high key, check that it is equal to the
           * target key next to corresponding downlink.
           */
-        if (!rightsplit && !P_RIGHTMOST(opaque))
+        if (!rightsplit && !P_RIGHTMOST(opaque) && !P_ISHALFDEAD(opaque))
          {
              BTPageOpaque topaque;
              IndexTuple    highkey;


Both BTP_HALF_DEAD and BTP_INCOMPLETE_SPLIT indicate that a downlink is
missing, but they work slightly differently. If a page is marked as
BTP_INCOMPLETE_SPLIT, it means that its right sibling has no downlink,
but if a page is marked as BTP_HALF_DEAD, it means that the page itself
has no downlink.

- Heikki




Re: Bug in amcheck?

От
Heikki Linnakangas
Дата:
On 26/11/2025 10:20, Heikki Linnakangas wrote:
> On 25/11/2025 22:51, Peter Geoghegan wrote:
>> In short, I think that we need to track something like
>> BtreeCheckState.previncompletesplit, but for half-dead pages. And then
>> actually use that within bt_child_highkey_check, to avoid spurious
>> false-positive reports of corruption.
> 
> I think it's even simpler than that, and we can just do this:
> 
> --- a/contrib/amcheck/verify_nbtree.c
> +++ b/contrib/amcheck/verify_nbtree.c
> @@ -2268,7 +2268,7 @@ bt_child_highkey_check(BtreeCheckState *state,
>            * If we visit page with high key, check that it is equal to the
>            * target key next to corresponding downlink.
>            */
> -        if (!rightsplit && !P_RIGHTMOST(opaque))
> +        if (!rightsplit && !P_RIGHTMOST(opaque) && !P_ISHALFDEAD(opaque))
>           {
>               BTPageOpaque topaque;
>               IndexTuple    highkey;
> 
> 
> Both BTP_HALF_DEAD and BTP_INCOMPLETE_SPLIT indicate that a downlink is 
> missing, but they work slightly differently. If a page is marked as 
> BTP_INCOMPLETE_SPLIT, it means that its right sibling has no downlink, 
> but if a page is marked as BTP_HALF_DEAD, it means that the page itself 
> has no downlink.

Ok, here's a proper patch with tests. The patch itself is the above 
one-liner. It's in patch 0004.

While testing this, I bumped into another similar amcheck bug: if the 
root page split is interrupted, verify_btree() complains:

ERROR:  block 3 is not true root in index "nbtree_incomplete_splits_i_idx"

Attached patch 0002 contains a fix and a test for that. The fix for that 
is also one-liner.


Summary of the patches:

Patch 0001 adds an injection point test for incomplete splits. We 
already had such a test for GIN, which handles incomplete splits the 
same way as B-tree. I copy-pasted and adapted the GIN test for B-tree. 
This was an easy way to increase our test coverage.

Patch 0002 fixes the incomplete-root-split bug in amcheck. It modifies 
the test added in patch 0001 to cover the bug fix.

Patch 0003 adds a test for half-dead pages, similar to what 0001 did for 
incomplete splits.

Patch 0004 fixes the bogus half-deaf-page error in amcheck, i.e. the 
issue that started this thread. It modifies the test introduced in patch 
0003 to add amcheck calls, to cover the bug fix.

- Heikki

Вложения

Re: Bug in amcheck?

От
Peter Geoghegan
Дата:
On Mon, Dec 1, 2025 at 4:20 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> Ok, here's a proper patch with tests. The patch itself is the above
> one-liner. It's in patch 0004.
>
> While testing this, I bumped into another similar amcheck bug: if the
> root page split is interrupted, verify_btree() complains:
>
> ERROR:  block 3 is not true root in index "nbtree_incomplete_splits_i_idx"
>
> Attached patch 0002 contains a fix and a test for that. The fix for that
> is also one-liner.

Good catch.

> Summary of the patches:
>
> Patch 0001 adds an injection point test for incomplete splits. We
> already had such a test for GIN, which handles incomplete splits the
> same way as B-tree. I copy-pasted and adapted the GIN test for B-tree.
> This was an easy way to increase our test coverage.
>
> Patch 0002 fixes the incomplete-root-split bug in amcheck. It modifies
> the test added in patch 0001 to cover the bug fix.
>
> Patch 0003 adds a test for half-dead pages, similar to what 0001 did for
> incomplete splits.
>
> Patch 0004 fixes the bogus half-deaf-page error in amcheck, i.e. the
> issue that started this thread. It modifies the test introduced in patch
> 0003 to add amcheck calls, to cover the bug fix.

All seem reasonable.

These tests will increase nbtree code coverage quite a bit, which is a
nice bonus.

--
Peter Geoghegan



Re: Bug in amcheck?

От
Heikki Linnakangas
Дата:
On 02/12/2025 19:59, Peter Geoghegan wrote:
> On Mon, Dec 1, 2025 at 4:20 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>> Summary of the patches:
>>
>> Patch 0001 adds an injection point test for incomplete splits. We
>> already had such a test for GIN, which handles incomplete splits the
>> same way as B-tree. I copy-pasted and adapted the GIN test for B-tree.
>> This was an easy way to increase our test coverage.
>>
>> Patch 0002 fixes the incomplete-root-split bug in amcheck. It modifies
>> the test added in patch 0001 to cover the bug fix.
>>
>> Patch 0003 adds a test for half-dead pages, similar to what 0001 did for
>> incomplete splits.
>>
>> Patch 0004 fixes the bogus half-deaf-page error in amcheck, i.e. the
>> issue that started this thread. It modifies the test introduced in patch
>> 0003 to add amcheck calls, to cover the bug fix.
> 
> All seem reasonable.
> 
> These tests will increase nbtree code coverage quite a bit, which is a
> nice bonus.

Committed, thanks for the review!

- Heikki




Re: Bug in amcheck?

От
Tom Lane
Дата:
Heikki Linnakangas <hlinnaka@iki.fi> writes:
> On 02/12/2025 19:59, Peter Geoghegan wrote:
>> These tests will increase nbtree code coverage quite a bit, which is a
>> nice bonus.

> Committed, thanks for the review!

In the past couple of days, scorpion and skink have failed
the nbtree_half_dead_pages test with identical symptoms [1][2]:

@@ -41,8 +41,6 @@
 (1 row)

 vacuum nbtree_half_dead_pages;
-ERROR:  error triggered for injection point nbtree-leave-page-half-dead
-CONTEXT:  while vacuuming index "nbtree_half_dead_pages_id_idx" of relation "public.nbtree_half_dead_pages"
 SELECT injection_points_detach('nbtree-leave-page-half-dead');
  injection_points_detach
 -------------------------
@@ -67,7 +65,6 @@

 -- Finish the deletion and re-check
 vacuum nbtree_half_dead_pages;
-NOTICE:  notice triggered for injection point nbtree-finish-half-dead-page-vacuum
 select * from nbtree_half_dead_pages where id > 99998 and id < 120002;
    id
 --------

No idea why it took a month for this to show up.  Could some
recent change have affected it?

            regards, tom lane

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=scorpion&dt=2026-01-02%2004%3A54%3A38
[2] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=skink&dt=2025-12-31%2003%3A34%3A51