Обсуждение: BUG #19018: high memory usage and "stack depth limit exceeded", with GiST index on ltree

Поиск

Список

Период

Сортировка

BUG #19018: high memory usage and "stack depth limit exceeded", with GiST index on ltree

От

PG Bug reporting form

Дата:

12 августа 2025 г., 01:00:52

The following bug has been logged on the website:

Bug reference:      19018
Logged by:          Joseph Silva
Email address:      dull.bananas0@gmail.com
PostgreSQL version: 17.5
Operating system:   Fedora
Description:

If I run this, then the postgres process's memory usage approaches 6 GB, and
the insertion when
i=253 fails with "stack depth limit exceeded":

```
CREATE EXTENSION ltree;

CREATE TABLE comment (path ltree);

CREATE INDEX ON comment USING gist (path);

DO $$
    DECLARE
        i int := 1;
        p text := '0';
    BEGIN
        WHILE i < 1000 LOOP
            p := p || '.' || i::text;
            i := i + 1;
            INSERT INTO comment (path) VALUES (p::ltree);
            COMMIT;
        END LOOP;
    END
$$;
```

If index creation is delayed until after insertions, then the insertions
succeed but index creation
fails.

Re: BUG #19018: high memory usage and "stack depth limit exceeded", with GiST index on ltree

От

Dilip Kumar

Дата:

13 августа 2025 г., 07:45:39

On Tue, Aug 12, 2025 at 5:44 PM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      19018
> Logged by:          Joseph Silva
> Email address:      dull.bananas0@gmail.com
> PostgreSQL version: 17.5
> Operating system:   Fedora
> Description:
>
> If I run this, then the postgres process's memory usage approaches 6 GB, and
> the insertion when
> i=253 fails with "stack depth limit exceeded":
>
> ```
> CREATE EXTENSION ltree;
>
> CREATE TABLE comment (path ltree);
>
> CREATE INDEX ON comment USING gist (path);
>
> DO $$
>     DECLARE
>         i int := 1;
>         p text := '0';
>     BEGIN
>         WHILE i < 1000 LOOP
>             p := p || '.' || i::text;
>             i := i + 1;
>             INSERT INTO comment (path) VALUES (p::ltree);
>             COMMIT;
>         END LOOP;
>     END
> $$;
> ```
>
> If index creation is delayed until after insertions, then the insertions
> succeed but index creation
> fails.
>

Thanks for reporting, I didn't analyze it fully but here is what I
have analyzed so far.  While debugging I have noticed that it is
recursively trying to complete the previously incomplete split,
ideally there should not be any incomplete split because I am just
running this from a single sessions so there should not be any issue
in acquiring parent page lock and there is no crash so
GistFollowRight() must be cleared but it is not in some cases and its
keep recursively splitting until it hits the stack overflow [2].  So
this seems like somewhere we have missed to call
GistClearFollowRight() after splitting.

Then I tried to observed the relpages and it shows 270065 relpages for
comment_path_idx which was just 1 before executing this ANONYMOUS
block[2]


[1]
postgres[2882817]=# ANALYZE ;
ANALYZE
postgres[2882817]=# select relname, relpages from pg_class where
relname like '%comment%';
     relname      | relpages
------------------+----------
 comment          |        0
 comment_path_idx |        1
(2 rows)

postgres[2882817]=# DO $$
    DECLARE
        i int := 1;
        p text := '0';
    BEGIN
        WHILE i < 500 LOOP
            p := p || '.' || i::text;
            i := i + 1;
            INSERT INTO comment (path) VALUES (p::ltree);
            COMMIT;
        END LOOP;
    END
$$

ERROR:  54001: stack depth limit exceeded
HINT:  Increase the configuration parameter "max_stack_depth"
(currently 2048kB), after ensuring the platform's stack depth limit is
adequate.
CONTEXT:  SQL statement "INSERT INTO comment (path) VALUES (p::ltree)"
PL/pgSQL function inline_code_block line 9 at SQL statement
LOCATION:  check_stack_depth, stack_depth.c:99

postgres[2882817]=# select relname, relpages from pg_class where
relname like '%comment%';
     relname      | relpages
------------------+----------
 comment          |       35
 comment_path_idx |   270065
(2 rows)

[2]

#0  check_stack_depth () at stack_depth.c:99
#1  0x000055659812bad5 in gistSplit (r=0x7f0c1d9dd008,
page=0x7f0c1e44d000 "\001", itup=0x5567a00a1650, len=3,
giststate=0x5565d74ce9f8) at gist.c:1463
#2  0x0000556598129720 in gistplacetopage (rel=0x7f0c1d9dd008,
freespace=0, giststate=0x5565d74ce9f8, buffer=453,
itup=0x7ffd5f39b800, ntup=2, oldoffnum=2, newblkno=0x0,
    leftchildbuf=13010, splitinfo=0x7ffd5f39b790,
markfollowright=true, heapRel=0x7f0c1d9de928, is_build=false) at
gist.c:315
#3  0x000055659812b7eb in gistinserttuples (state=0x7ffd5f5993c0,
stack=0x5567a009a078, giststate=0x5565d74ce9f8, tuples=0x7ffd5f39b800,
ntup=2, oldoffnum=2

<--------clip------>
    leftchild=12971, rightchild=12974, unlockbuf=true,
unlockleftchild=true) at gist.c:1337
#18642 0x000055659812ba52 in gistfinishsplit (state=0x7ffd5f5993c0,
stack=0x5565d7726968, giststate=0x5565d74ce9f8,
splitinfo=0x5565d7768848, unlockbuf=true)
    at gist.c:1408
#18643 0x000055659812b855 in gistinserttuples (state=0x7ffd5f5993c0,
stack=0x5565d7726968, giststate=0x5565d74ce9f8, tuples=0x7ffd5f599300,
ntup=2, oldoffnum=1,
    leftchild=12972, rightchild=12973, unlockbuf=true,
unlockleftchild=false) at gist.c:1337
#18644 0x000055659812ba52 in gistfinishsplit (state=0x7ffd5f5993c0,
stack=0x5565d7758388, giststate=0x5565d74ce9f8,
splitinfo=0x5565d77593e8, unlockbuf=false)
    at gist.c:1408
#18645 0x000055659812b6e2 in gistfixsplit (state=0x7ffd5f5993c0,
giststate=0x5565d74ce9f8) at gist.c:1246
#18646 0x000055659812a61c in gistdoinsert (r=0x7f0c1d9dd008,
itup=0x5565d746c580, freespace=0, giststate=0x5565d74ce9f8,
heapRel=0x7f0c1d9de928, is_build=false)


--
Regards,
Dilip Kumar
Google

Re: BUG #19018: high memory usage and "stack depth limit exceeded", with GiST index on ltree

От

Arseniy Mukhin

Дата:

13 августа 2025 г., 18:53:39

On Wed, Aug 13, 2025 at 7:46 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Aug 12, 2025 at 5:44 PM PG Bug reporting form
> <noreply@postgresql.org> wrote:
> >
> > The following bug has been logged on the website:
> >
> > Bug reference:      19018
> > Logged by:          Joseph Silva
> > Email address:      dull.bananas0@gmail.com
> > PostgreSQL version: 17.5
> > Operating system:   Fedora
> > Description:
> >
> > If I run this, then the postgres process's memory usage approaches 6 GB, and
> > the insertion when
> > i=253 fails with "stack depth limit exceeded":
> >
> > ```
> > CREATE EXTENSION ltree;
> >
> > CREATE TABLE comment (path ltree);
> >
> > CREATE INDEX ON comment USING gist (path);
> >
> > DO $$
> >     DECLARE
> >         i int := 1;
> >         p text := '0';
> >     BEGIN
> >         WHILE i < 1000 LOOP
> >             p := p || '.' || i::text;
> >             i := i + 1;
> >             INSERT INTO comment (path) VALUES (p::ltree);
> >             COMMIT;
> >         END LOOP;
> >     END
> > $$;
> > ```
> >
> > If index creation is delayed until after insertions, then the insertions
> > succeed but index creation
> > fails.
> >
>
> Thanks for reporting, I didn't analyze it fully but here is what I
> have analyzed so far.  While debugging I have noticed that it is
> recursively trying to complete the previously incomplete split,
> ideally there should not be any incomplete split because I am just
> running this from a single sessions so there should not be any issue
> in acquiring parent page lock and there is no crash so
> GistFollowRight() must be cleared but it is not in some cases and its
> keep recursively splitting until it hits the stack overflow [2].  So
> this seems like somewhere we have missed to call
> GistClearFollowRight() after splitting.

Hi,

I managed to reproduce it too. I agree that the problem here is an
endless split. I encountered similar issues while I was trying to
insert big tuples into the gist index. And the main problem I think
here is that GIST insert code does not limit index tuple size (ofc it
could not be larger then page size). There is a macros
GISTMaxIndexTupleSize, but it's not used. Probably there are some
limits on the operator classes side, but it doesn't seem to work here.

I added several logs while investigating it. So how stack overflow is
happening in this case:

During the insert that leads to stack overflow, we have a split. To
finish the split we need to insert into the parent last two downlinks.
Here the sizes of these downlinks (all sizes here are in bytes):

2025-08-13 17:38:38.614 MSK [829680] DEBUG:  finish split: insert
child left size: 3776, right size: 4016

You can see that it's a huge index tuples.

At the moment of insert of the two new downlinks, the parent has 2 tuples in it:
2025-08-13 17:38:38.614 MSK [829680] DEBUG:  number of tups on the
page before split: 2

We want to add two more. But it does not fit into the page, so we have
a parent split. Split algorithm decided that we need 3 pages in this
split:
2025-08-13 17:38:38.614 MSK [829680] DEBUG:  split during insert,
children number: 3

And guess what, here are sizes of new downlinks that we have for these 3 pages:
2025-08-13 17:38:38.614 MSK [829680] DEBUG:  downlink size: 3776
2025-08-13 17:38:38.614 MSK [829680] DEBUG:  downlink size: 4016
2025-08-13 17:38:38.614 MSK [829680] DEBUG:  downlink size: 4160

Here we can see that sizes of the last two downlinks that we will try
to insert into the parent of the parent are 3776 and 4016. The same
that we were trying to insert into the initial parent.

In short, when we try to insert 3 huge tuples in the parent page, we
have a new split, which results in new 3 huge tuples that we need to
insert into the parent of the parent etc.

This way we have neverending split.

Here is a draft patch that checks index size before insert and during
the split, and now the reproducer fails on the check.

Best regards,
Arseniy Mukhin

Вложения

v1-0001-Adds-check-of-index-tuple-size-during-gist-insert.patch

Re: BUG #19018: high memory usage and "stack depth limit exceeded", with GiST index on ltree

От

Dilip Kumar

Дата:

14 августа 2025 г., 06:14:50

On Wed, Aug 13, 2025 at 9:23 PM Arseniy Mukhin
<arseniy.mukhin.dev@gmail.com> wrote:
>
> On Wed, Aug 13, 2025 at 7:46 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> >
> > On Tue, Aug 12, 2025 at 5:44 PM PG Bug reporting form
> > <noreply@postgresql.org> wrote:
> > >
> > > The following bug has been logged on the website:
> > >
> > > Bug reference:      19018
> > > Logged by:          Joseph Silva
> > > Email address:      dull.bananas0@gmail.com
> > > PostgreSQL version: 17.5
> > > Operating system:   Fedora
> > > Description:
> > >
> > > If I run this, then the postgres process's memory usage approaches 6 GB, and
> > > the insertion when
> > > i=253 fails with "stack depth limit exceeded":
> > >
> > > ```
> > > CREATE EXTENSION ltree;
> > >
> > > CREATE TABLE comment (path ltree);
> > >
> > > CREATE INDEX ON comment USING gist (path);
> > >
> > > DO $$
> > >     DECLARE
> > >         i int := 1;
> > >         p text := '0';
> > >     BEGIN
> > >         WHILE i < 1000 LOOP
> > >             p := p || '.' || i::text;
> > >             i := i + 1;
> > >             INSERT INTO comment (path) VALUES (p::ltree);
> > >             COMMIT;
> > >         END LOOP;
> > >     END
> > > $$;
> > > ```
> > >
> > > If index creation is delayed until after insertions, then the insertions
> > > succeed but index creation
> > > fails.
> > >
> >
> > Thanks for reporting, I didn't analyze it fully but here is what I
> > have analyzed so far.  While debugging I have noticed that it is
> > recursively trying to complete the previously incomplete split,
> > ideally there should not be any incomplete split because I am just
> > running this from a single sessions so there should not be any issue
> > in acquiring parent page lock and there is no crash so
> > GistFollowRight() must be cleared but it is not in some cases and its
> > keep recursively splitting until it hits the stack overflow [2].  So
> > this seems like somewhere we have missed to call
> > GistClearFollowRight() after splitting.
>
> Hi,
>
> I managed to reproduce it too. I agree that the problem here is an
> endless split. I encountered similar issues while I was trying to
> insert big tuples into the gist index. And the main problem I think
> here is that GIST insert code does not limit index tuple size (ofc it
> could not be larger then page size). There is a macros
> GISTMaxIndexTupleSize, but it's not used. Probably there are some
> limits on the operator classes side, but it doesn't seem to work here.
>
> I added several logs while investigating it. So how stack overflow is
> happening in this case:
>
> During the insert that leads to stack overflow, we have a split. To
> finish the split we need to insert into the parent last two downlinks.
> Here the sizes of these downlinks (all sizes here are in bytes):
>
> 2025-08-13 17:38:38.614 MSK [829680] DEBUG:  finish split: insert
> child left size: 3776, right size: 4016
>
> You can see that it's a huge index tuples.
>
> At the moment of insert of the two new downlinks, the parent has 2 tuples in it:
> 2025-08-13 17:38:38.614 MSK [829680] DEBUG:  number of tups on the
> page before split: 2
>
> We want to add two more. But it does not fit into the page, so we have
> a parent split. Split algorithm decided that we need 3 pages in this
> split:
> 2025-08-13 17:38:38.614 MSK [829680] DEBUG:  split during insert,
> children number: 3
>
> And guess what, here are sizes of new downlinks that we have for these 3 pages:
> 2025-08-13 17:38:38.614 MSK [829680] DEBUG:  downlink size: 3776
> 2025-08-13 17:38:38.614 MSK [829680] DEBUG:  downlink size: 4016
> 2025-08-13 17:38:38.614 MSK [829680] DEBUG:  downlink size: 4160
>
> Here we can see that sizes of the last two downlinks that we will try
> to insert into the parent of the parent are 3776 and 4016. The same
> that we were trying to insert into the initial parent.
>
> In short, when we try to insert 3 huge tuples in the parent page, we
> have a new split, which results in new 3 huge tuples that we need to
> insert into the parent of the parent etc.
>
> This way we have neverending split.
>
> Here is a draft patch that checks index size before insert and during
> the split, and now the reproducer fails on the check.


I think your root cause analysis looks valid to me, In my analysis I
said that we had the F_FOLLOW_RIGHT flag set for some of the pages and
I assumed we missed to clear that flag.  But your analysis shows that
we never completed the split and that's the reason this flag is not
cleared.  And I remember I debugged the second run, after I hit the
stack overflow in the first run. So during the second run it was
trying to finish the incomplete split because that was leftover from
the first run.  So yeah this analysis makes sense, and the fix too.

--
Regards,
Dilip Kumar
Google

Re: BUG #19018: high memory usage and "stack depth limit exceeded", with GiST index on ltree

От

Sergei Kornilov

Дата:

14 августа 2025 г., 12:22:36

Hello

This reminded me of some very old bug reports, like mine:
https://www.postgresql.org/message-id/15431-7a89470f7879bed4%40postgresql.org
Still reproduces giving different results depending on the order of operations. The proposed patch makes the behavior
thesame in both cases.
 

Since this code path may be reachable by the user, instead of elog there should be ereport with a more user-friendly
description.I would suggest something more like the text used for btree in a case like this. (ereport in
_bt_check_third_page)

    ereport(ERROR,
            (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
             errmsg("index row size %zu exceeds gist maximum %zu for index \"%s\""...
                         errdetail("Index row references tuple (%u,%u) in relation \"%s\".",...
                         errhint("Values larger than 1/2 of a buffer page cannot be indexed"...

regards, Sergei

Re: BUG #19018: high memory usage and "stack depth limit exceeded", with GiST index on ltree

От

Arseniy Mukhin

Дата:

16 августа 2025 г., 15:07:47

Hi,

Dilip, Sergei, thank you for your feedback. After researching it more
I think the fix is incorrect. It seems that we shouldn't limit index
tuple size of leaf pages on the gist core level. Some operator classes
can handle page size tuples on the leaf level and it doesn't break
split (like trgm, hstore) because representation of leaf and not leaf
index tuples are different. So you can have almost page sized index
tuples on the leaf level and all inner level tuples will still be
small enough to provide valid splitting.
It means if we want to introduce some limitation for leaf tuples, we
should do it in operator class implementation.

For inner level tuples it's probably a good idea to introduce some
limitation in the gist core, because no matter what opclass do you
use, it doesn't make sense to have page size tuples.


Speaking about the bug, I still think the main cause is a lack of
index tuple size limitation, but we need to fix it in the operator
class, not gist core. ltree inner page index tuple has next structure:

 *  Non-Leaf
 *            (len)(flag)(sign)(left_ltree)(right_ltree)
 *     ALLTRUE: (len)(flag)(left_ltree)(right_ltree)

Sign is a signature, it has a fixed size that you can set up while
index creation (param sigsize). left_ltree, right_ltree are keys that
we build index on.
So if we want to have internal tuples that less than
GISTMaxIndexTupleSize (~1/4 page size) than we should limit leaf index
tuples keys, and max size for such keys should be:

max_size  = (GISTMaxIndexKeySize - sigsize) / 2

For default signature size and default page size maximum key size is
1008. It works only for single column indexes, for multicolumn we need
a more strict limit, but it seems all operator classes are quite
carefree about it, so probably if we fix at least a single column case
it would be good enough.

There is a new patch version with implementation of the above idea.
Bug reproducer fails on the new check. Sergei's reproducer fails as
well (most of the inserts in it now fail on the new check). I found an
ereport about exceeding size limitation in gist code, so I use the
same error message here too, it's more user friendly than the original
message in v1.

Unfortunately, there is another point that should be addressed. Upper
limit for signature size is 2024, which results in max_size = 0 which
means we can't insert any tuples with it. There is a regression test
that fails because of it. I think the upper limit for signature should
be decreased. But I don't know what to do with existing indexes that
could have big signatures.

What do you think?


Best regards,
Arseniy Mukhin

Вложения

v2-0001-Adds-check-of-index-key-size-for-ltree-gist-opera.patch

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: BUG #19018: high memory usage and "stack depth limit exceeded", with GiST index on ltree

BUG #19018: high memory usage and "stack depth limit exceeded", with GiST index on ltree

Re: BUG #19018: high memory usage and "stack depth limit exceeded", with GiST index on ltree

Re: BUG #19018: high memory usage and "stack depth limit exceeded", with GiST index on ltree

Вложения

Re: BUG #19018: high memory usage and "stack depth limit exceeded", with GiST index on ltree

Re: BUG #19018: high memory usage and "stack depth limit exceeded", with GiST index on ltree

Re: BUG #19018: high memory usage and "stack depth limit exceeded", with GiST index on ltree

Вложения