Обсуждение: Odd code around ginScanToDelete

Поиск
Список
Период
Сортировка

Odd code around ginScanToDelete

От
Andres Freund
Дата:
Hi,

While looking at converting more places to UnlockReleaseBuffer(), in the
course of making UnlockReleaseBuffer() faster than the two separate
operations, I found this code:

static bool
ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
                DataPageDeleteStack *parent, OffsetNumber myoff)
...

    if (!meDelete)
    {
        if (BufferIsValid(me->leftBuffer))
            UnlockReleaseBuffer(me->leftBuffer);
        me->leftBuffer = buffer;
    }
    else
    {
        if (!isRoot)
            LockBuffer(buffer, GIN_UNLOCK);

        ReleaseBuffer(buffer);
    }

    if (isRoot)
        ReleaseBuffer(buffer);


Which sure looks like it'd release buffer twice if isRoot is set?  I guess
that's not reachable, because presumably the root page will always go down the
!meDelete path. But it sure made me wonder if there's a hard to reach bug.

This code was introduced in
  commit e14641197a5
  Author: Alexander Korotkov <akorotkov@postgresql.org>
  Date:   2019-11-19 23:07:36 +0300

      Fix deadlock between ginDeletePage() and ginStepRight()

I didn't trace it further to see if it existed before that in some fashion.



There's another oddity here: ginScanToDelete() requires that the root page has
been locked by the caller already, but will afaict re-read the root page? But
then have code to avoid locking it again, because that would not have worked?
Seems odd.


Greetings,

Andres Freund



Re: Odd code around ginScanToDelete

От
Alexander Korotkov
Дата:
On Tue, Feb 3, 2026 at 6:26 PM Andres Freund <andres@anarazel.de> wrote:
>
> While looking at converting more places to UnlockReleaseBuffer(), in the
> course of making UnlockReleaseBuffer() faster than the two separate
> operations, I found this code:
>
> static bool
> ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
>                                 DataPageDeleteStack *parent, OffsetNumber myoff)
> ...
>
>         if (!meDelete)
>         {
>                 if (BufferIsValid(me->leftBuffer))
>                         UnlockReleaseBuffer(me->leftBuffer);
>                 me->leftBuffer = buffer;
>         }
>         else
>         {
>                 if (!isRoot)
>                         LockBuffer(buffer, GIN_UNLOCK);
>
>                 ReleaseBuffer(buffer);
>         }
>
>         if (isRoot)
>                 ReleaseBuffer(buffer);
>
>
> Which sure looks like it'd release buffer twice if isRoot is set?  I guess
> that's not reachable, because presumably the root page will always go down the
> !meDelete path. But it sure made me wonder if there's a hard to reach bug.

Yes, it's not possible to have meDelete set for root, because
me->leftBuffer is always InvalidBuffer for the root.  So the branch
handling meDelete case should better do Assert(!isRoot).

> This code was introduced in
>   commit e14641197a5
>   Author: Alexander Korotkov <akorotkov@postgresql.org>
>   Date:   2019-11-19 23:07:36 +0300
>
>       Fix deadlock between ginDeletePage() and ginStepRight()
>
> I didn't trace it further to see if it existed before that in some fashion.

Yes.  I think generally this area needs to be reworked to become more
clear, and have vast more comments.  It was wrong from my side trying
to fix bugs there without reworking it into something more
appropriate.  I'm planning to put work on this during this week.

> There's another oddity here: ginScanToDelete() requires that the root page has
> been locked by the caller already, but will afaict re-read the root page? But
> then have code to avoid locking it again, because that would not have worked?
> Seems odd.


It seems a bit odd for me that caller already have locked buffer, but
passes BlockNumber making us re-read the buffer.  But I'm not sure
that's the same as your point.  Could you, please, elaborate more on
this?

------
Regards,
Alexander Korotkov
Supabase



Re: Odd code around ginScanToDelete

От
Alexander Korotkov
Дата:
On Wed, Feb 4, 2026 at 12:32 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Tue, Feb 3, 2026 at 6:26 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > While looking at converting more places to UnlockReleaseBuffer(), in the
> > course of making UnlockReleaseBuffer() faster than the two separate
> > operations, I found this code:
> >
> > static bool
> > ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
> >                                 DataPageDeleteStack *parent, OffsetNumber myoff)
> > ...
> >
> >         if (!meDelete)
> >         {
> >                 if (BufferIsValid(me->leftBuffer))
> >                         UnlockReleaseBuffer(me->leftBuffer);
> >                 me->leftBuffer = buffer;
> >         }
> >         else
> >         {
> >                 if (!isRoot)
> >                         LockBuffer(buffer, GIN_UNLOCK);
> >
> >                 ReleaseBuffer(buffer);
> >         }
> >
> >         if (isRoot)
> >                 ReleaseBuffer(buffer);
> >
> >
> > Which sure looks like it'd release buffer twice if isRoot is set?  I guess
> > that's not reachable, because presumably the root page will always go down the
> > !meDelete path. But it sure made me wonder if there's a hard to reach bug.
>
> Yes, it's not possible to have meDelete set for root, because
> me->leftBuffer is always InvalidBuffer for the root.  So the branch
> handling meDelete case should better do Assert(!isRoot).
>
> > This code was introduced in
> >   commit e14641197a5
> >   Author: Alexander Korotkov <akorotkov@postgresql.org>
> >   Date:   2019-11-19 23:07:36 +0300
> >
> >       Fix deadlock between ginDeletePage() and ginStepRight()
> >
> > I didn't trace it further to see if it existed before that in some fashion.
>
> Yes.  I think generally this area needs to be reworked to become more
> clear, and have vast more comments.  It was wrong from my side trying
> to fix bugs there without reworking it into something more
> appropriate.  I'm planning to put work on this during this week.
>
> > There's another oddity here: ginScanToDelete() requires that the root page has
> > been locked by the caller already, but will afaict re-read the root page? But
> > then have code to avoid locking it again, because that would not have worked?
> > Seems odd.
>
>
> It seems a bit odd for me that caller already have locked buffer, but
> passes BlockNumber making us re-read the buffer.  But I'm not sure
> that's the same as your point.  Could you, please, elaborate more on
> this?

Here is the refactoring patch.  Sorry for the delay.

------
Regards,
Alexander Korotkov
Supabase

Вложения

Re: Odd code around ginScanToDelete

От
Pavel Borisov
Дата:
Hi, Andres and Alexander!

On Sat, 21 Feb 2026 at 13:55, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Wed, Feb 4, 2026 at 12:32 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >
> > On Tue, Feb 3, 2026 at 6:26 PM Andres Freund <andres@anarazel.de> wrote:
> > >
> > > While looking at converting more places to UnlockReleaseBuffer(), in the
> > > course of making UnlockReleaseBuffer() faster than the two separate
> > > operations, I found this code:
> > >
> > > static bool
> > > ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
> > >                                 DataPageDeleteStack *parent, OffsetNumber myoff)
> > > ...
> > >
> > >         if (!meDelete)
> > >         {
> > >                 if (BufferIsValid(me->leftBuffer))
> > >                         UnlockReleaseBuffer(me->leftBuffer);
> > >                 me->leftBuffer = buffer;
> > >         }
> > >         else
> > >         {
> > >                 if (!isRoot)
> > >                         LockBuffer(buffer, GIN_UNLOCK);
> > >
> > >                 ReleaseBuffer(buffer);
> > >         }
> > >
> > >         if (isRoot)
> > >                 ReleaseBuffer(buffer);
> > >
> > >
> > > Which sure looks like it'd release buffer twice if isRoot is set?  I guess
> > > that's not reachable, because presumably the root page will always go down the
> > > !meDelete path. But it sure made me wonder if there's a hard to reach bug.
> >
> > Yes, it's not possible to have meDelete set for root, because
> > me->leftBuffer is always InvalidBuffer for the root.  So the branch
> > handling meDelete case should better do Assert(!isRoot).
> > > This code was introduced in
> > >   commit e14641197a5
> > >   Author: Alexander Korotkov <akorotkov@postgresql.org>
> > >   Date:   2019-11-19 23:07:36 +0300
> > >
> > >       Fix deadlock between ginDeletePage() and ginStepRight()
> > >
> > > I didn't trace it further to see if it existed before that in some fashion.
> >
> > Yes.  I think generally this area needs to be reworked to become more
> > clear, and have vast more comments.  It was wrong from my side trying
> > to fix bugs there without reworking it into something more
> > appropriate.  I'm planning to put work on this during this week.
> >
> > > There's another oddity here: ginScanToDelete() requires that the root page has
> > > been locked by the caller already, but will afaict re-read the root page? But
> > > then have code to avoid locking it again, because that would not have worked?
> > > Seems odd.
> >
> >
> > It seems a bit odd for me that caller already have locked buffer, but
> > passes BlockNumber making us re-read the buffer.  But I'm not sure
> > that's the same as your point.  Could you, please, elaborate more on
> > this?
>
> Here is the refactoring patch.  Sorry for the delay.

Hi, Andres and Alexander!

I've looked into the patch v1.
Overall, it looks good to me.

Some thoughts:

Is it worth/possible in recursive calls of ginScanToDelete() to free
allocated myStackItem->child after processing all children of the
current level, when they are not needed anymore?
Previously to this patch, palloc-ed "me" variable also was't freed at
recursion levels.

Could limiting the maximum recursion level be useful?

In the comment to myStackItem before ginScanToDelete(), it might be
worth adding that after processing all pages on the current level,
myStackItem is not needed anymore.

> > Yes, it's not possible to have meDelete set for root, because
> > me->leftBuffer is always InvalidBuffer for the root.  So the branch
> > handling meDelete case should better do Assert(!isRoot).
Looks like this additional Assert is not in patch v1.

In the root call of ginScanToDelete(gvs, &root); we can add Assert
checking that its return result is false:
-               ginScanToDelete(gvs, &root);
+              deleted = ginScanToDelete(gvs, &root);
+.             Assert(!deleted); /* Root page is never deleted */

Additionally, it could be good to rename all vacuum functions related
to posting tree pages only, to include "Posting" (e.g., ginDeletePage
-> ginDeletePostingPage). The same is for the functions only for the
entry tree. It is already named this way in many places (e.g.
ginVacuumPostingTreeLeaves). It could be good to extend this to all
relevant functions.

Several small proposals on wording:
"rightmost non-deleted page to its left" -> "closest non-deleted
sibling page to its left"
"each entry tracks the buffer of the page" -> "each entry tracks the
buffers of the page" (as two buffers are mentioned)
"must already be pinned" -> "must already have been pinned"

Best regards,
Pavel Borisov
Supabase



Re: Odd code around ginScanToDelete

От
Xuneng Zhou
Дата:
Hi Pavel, Alexander,

On Fri, Mar 6, 2026 at 10:45 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>
> Hi, Andres and Alexander!
>
> On Sat, 21 Feb 2026 at 13:55, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >
> > On Wed, Feb 4, 2026 at 12:32 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > >
> > > On Tue, Feb 3, 2026 at 6:26 PM Andres Freund <andres@anarazel.de> wrote:
> > > >
> > > > While looking at converting more places to UnlockReleaseBuffer(), in the
> > > > course of making UnlockReleaseBuffer() faster than the two separate
> > > > operations, I found this code:
> > > >
> > > > static bool
> > > > ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
> > > >                                 DataPageDeleteStack *parent, OffsetNumber myoff)
> > > > ...
> > > >
> > > >         if (!meDelete)
> > > >         {
> > > >                 if (BufferIsValid(me->leftBuffer))
> > > >                         UnlockReleaseBuffer(me->leftBuffer);
> > > >                 me->leftBuffer = buffer;
> > > >         }
> > > >         else
> > > >         {
> > > >                 if (!isRoot)
> > > >                         LockBuffer(buffer, GIN_UNLOCK);
> > > >
> > > >                 ReleaseBuffer(buffer);
> > > >         }
> > > >
> > > >         if (isRoot)
> > > >                 ReleaseBuffer(buffer);
> > > >
> > > >
> > > > Which sure looks like it'd release buffer twice if isRoot is set?  I guess
> > > > that's not reachable, because presumably the root page will always go down the
> > > > !meDelete path. But it sure made me wonder if there's a hard to reach bug.
> > >
> > > Yes, it's not possible to have meDelete set for root, because
> > > me->leftBuffer is always InvalidBuffer for the root.  So the branch
> > > handling meDelete case should better do Assert(!isRoot).
> > > > This code was introduced in
> > > >   commit e14641197a5
> > > >   Author: Alexander Korotkov <akorotkov@postgresql.org>
> > > >   Date:   2019-11-19 23:07:36 +0300
> > > >
> > > >       Fix deadlock between ginDeletePage() and ginStepRight()
> > > >
> > > > I didn't trace it further to see if it existed before that in some fashion.
> > >
> > > Yes.  I think generally this area needs to be reworked to become more
> > > clear, and have vast more comments.  It was wrong from my side trying
> > > to fix bugs there without reworking it into something more
> > > appropriate.  I'm planning to put work on this during this week.
> > >
> > > > There's another oddity here: ginScanToDelete() requires that the root page has
> > > > been locked by the caller already, but will afaict re-read the root page? But
> > > > then have code to avoid locking it again, because that would not have worked?
> > > > Seems odd.
> > >
> > >
> > > It seems a bit odd for me that caller already have locked buffer, but
> > > passes BlockNumber making us re-read the buffer.  But I'm not sure
> > > that's the same as your point.  Could you, please, elaborate more on
> > > this?
> >
> > Here is the refactoring patch.  Sorry for the delay.
>
> Hi, Andres and Alexander!
>
> I've looked into the patch v1.
> Overall, it looks good to me.

The refactor LGTM in general. The buffer-ownership rewrite looks
cleaner and safer overall.

> Some thoughts:
>
> Is it worth/possible in recursive calls of ginScanToDelete() to free
> allocated myStackItem->child after processing all children of the
> current level, when they are not needed anymore?
> Previously to this patch, palloc-ed "me" variable also was't freed at
> recursion levels.

I think freeing myStackItem->child inside recursive calls might not be
worthwhile here. That node is intentionally reused for subsequent
siblings at the same depth, and it carries state (leftBuffer) that can
still be needed until the level is fully processed.
Freeing/reallocating it per subtree would add churn and make the
lifetime rules harder to reason about without meaningful memory
savings (the number of nodes is bounded by tree depth, not number of
pages). We currently free the chain once after ginScanToDelete()
returns in ginVacuumPostingTree(), which matches the natural lifetime
boundary

> Could limiting the maximum recursion level be useful?

Posting-tree depth is naturally small; a hard cap seems to add failure
risk with little practical benefit.

> In the comment to myStackItem before ginScanToDelete(), it might be
> worth adding that after processing all pages on the current level,
> myStackItem is not needed anymore.
>
> > > Yes, it's not possible to have meDelete set for root, because
> > > me->leftBuffer is always InvalidBuffer for the root.  So the branch
> > > handling meDelete case should better do Assert(!isRoot).
> Looks like this additional Assert is not in patch v1.
>
> In the root call of ginScanToDelete(gvs, &root); we can add Assert
> checking that its return result is false:
> -               ginScanToDelete(gvs, &root);
> +              deleted = ginScanToDelete(gvs, &root);
> +.             Assert(!deleted); /* Root page is never deleted */

+ 1,  this is a good invariant check and improves readability

One minor nit for the comment:

The DataPageDeleteStack.buffer field comment says "valid only while
recursing into children"
this is true for internal pages, but for leaf pages the buffer is
valid until the pageWasDeleted / leftBuffer decision. The validity
window is actually "from when the caller sets it until the
post-recursion cleanup."


--
Best,
Xuneng



Re: Odd code around ginScanToDelete

От
Pavel Borisov
Дата:
Hi, Xuneng

> > Is it worth/possible in recursive calls of ginScanToDelete() to free
> > allocated myStackItem->child after processing all children of the
> > current level, when they are not needed anymore?
> > Previously to this patch, palloc-ed "me" variable also was't freed at
> > recursion levels.
>
> Freeing/reallocating it per subtree would add churn and make the
> lifetime rules harder to reason about without meaningful memory
> savings (the number of nodes is bounded by tree depth, not number of
> pages). We currently free the chain once after ginScanToDelete()
> returns in ginVacuumPostingTree(), which matches the natural lifetime
> boundary
I proposed not freeing child when child iteration is complete. They
indeed can be reused. I proposed cleaning children when "my" iteration
is complete. At that time all the children iterations are completed
and not needed when we return level up.

Regards,
Pavel Borisov
Supabase



Re: Odd code around ginScanToDelete

От
Alexander Korotkov
Дата:
On Tue, Mar 10, 2026 at 11:29 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> Hi, Xuneng
>
> > > Is it worth/possible in recursive calls of ginScanToDelete() to free
> > > allocated myStackItem->child after processing all children of the
> > > current level, when they are not needed anymore?
> > > Previously to this patch, palloc-ed "me" variable also was't freed at
> > > recursion levels.
> >
> > Freeing/reallocating it per subtree would add churn and make the
> > lifetime rules harder to reason about without meaningful memory
> > savings (the number of nodes is bounded by tree depth, not number of
> > pages). We currently free the chain once after ginScanToDelete()
> > returns in ginVacuumPostingTree(), which matches the natural lifetime
> > boundary
> I proposed not freeing child when child iteration is complete. They
> indeed can be reused. I proposed cleaning children when "my" iteration
> is complete. At that time all the children iterations are completed
> and not needed when we return level up.

This is not clear for me.  We need stack items to keep track of left
pages until we scan the whole posting tree.  After scanning the whole
posting tree we can free stack items as we do now.

------
Regards,
Alexander Korotkov
Supabase



Re: Odd code around ginScanToDelete

От
Pavel Borisov
Дата:
On Thu, 12 Mar 2026 at 02:22, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Tue, Mar 10, 2026 at 11:29 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> > Hi, Xuneng
> >
> > > > Is it worth/possible in recursive calls of ginScanToDelete() to free
> > > > allocated myStackItem->child after processing all children of the
> > > > current level, when they are not needed anymore?
> > > > Previously to this patch, palloc-ed "me" variable also was't freed at
> > > > recursion levels.
> > >
> > > Freeing/reallocating it per subtree would add churn and make the
> > > lifetime rules harder to reason about without meaningful memory
> > > savings (the number of nodes is bounded by tree depth, not number of
> > > pages). We currently free the chain once after ginScanToDelete()
> > > returns in ginVacuumPostingTree(), which matches the natural lifetime
> > > boundary
> > I proposed not freeing child when child iteration is complete. They
> > indeed can be reused. I proposed cleaning children when "my" iteration
> > is complete. At that time all the children iterations are completed
> > and not needed when we return level up.
> This is not clear for me.  We need stack items to keep track of left
> pages until we scan the whole posting tree.  After scanning the whole
> posting tree we can free stack items as we do now.

Hi, Alexander!
You are right, that we can free all posting tree stack items after the
whole tree, as we do now. But I think we can also do it earlier. It
looks like all "children" items are needed and could be reused only
until iteration on "my" level ends. When function returns up the
recursion "my" level becomes "child" for a caller, and previous
"child" is not used anymore.

It's optional, maybe we can do freeing at the end of posting tree iteration.

Regards,
Pavel Borisov



Re: Odd code around ginScanToDelete

От
Alexander Korotkov
Дата:
On Thu, Mar 12, 2026 at 12:35 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>
> On Thu, 12 Mar 2026 at 02:22, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >
> > On Tue, Mar 10, 2026 at 11:29 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> > > Hi, Xuneng
> > >
> > > > > Is it worth/possible in recursive calls of ginScanToDelete() to free
> > > > > allocated myStackItem->child after processing all children of the
> > > > > current level, when they are not needed anymore?
> > > > > Previously to this patch, palloc-ed "me" variable also was't freed at
> > > > > recursion levels.
> > > >
> > > > Freeing/reallocating it per subtree would add churn and make the
> > > > lifetime rules harder to reason about without meaningful memory
> > > > savings (the number of nodes is bounded by tree depth, not number of
> > > > pages). We currently free the chain once after ginScanToDelete()
> > > > returns in ginVacuumPostingTree(), which matches the natural lifetime
> > > > boundary
> > > I proposed not freeing child when child iteration is complete. They
> > > indeed can be reused. I proposed cleaning children when "my" iteration
> > > is complete. At that time all the children iterations are completed
> > > and not needed when we return level up.
> > This is not clear for me.  We need stack items to keep track of left
> > pages until we scan the whole posting tree.  After scanning the whole
> > posting tree we can free stack items as we do now.
>
> You are right, that we can free all posting tree stack items after the
> whole tree, as we do now. But I think we can also do it earlier. It
> looks like all "children" items are needed and could be reused only
> until iteration on "my" level ends. When function returns up the
> recursion "my" level becomes "child" for a caller, and previous
> "child" is not used anymore.

No matter how many levels we can go up, we can still descend and need
the leftBuffer stored at any stack level.

------
Regards,
Alexander Korotkov
Supabase



Re: Odd code around ginScanToDelete

От
Alexander Korotkov
Дата:
Hi, Pavel!

Thank you for your review!

On Fri, Mar 6, 2026 at 4:45 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> Some thoughts:
>
> Is it worth/possible in recursive calls of ginScanToDelete() to free
> allocated myStackItem->child after processing all children of the
> current level, when they are not needed anymore?
> Previously to this patch, palloc-ed "me" variable also was't freed at
> recursion levels.
>
> Could limiting the maximum recursion level be useful?
>
> In the comment to myStackItem before ginScanToDelete(), it might be
> worth adding that after processing all pages on the current level,
> myStackItem is not needed anymore.

Already answered in this thread.

> > > Yes, it's not possible to have meDelete set for root, because
> > > me->leftBuffer is always InvalidBuffer for the root.  So the branch
> > > handling meDelete case should better do Assert(!isRoot).
> Looks like this additional Assert is not in patch v1.
>
> In the root call of ginScanToDelete(gvs, &root); we can add Assert
> checking that its return result is false:
> -               ginScanToDelete(gvs, &root);
> +              deleted = ginScanToDelete(gvs, &root);
> +.             Assert(!deleted); /* Root page is never deleted */

Done.

> Additionally, it could be good to rename all vacuum functions related
> to posting tree pages only, to include "Posting" (e.g., ginDeletePage
> -> ginDeletePostingPage). The same is for the functions only for the
> entry tree. It is already named this way in many places (e.g.
> ginVacuumPostingTreeLeaves). It could be good to extend this to all
> relevant functions.

Renamed as you proposed.

> Several small proposals on wording:
> "rightmost non-deleted page to its left" -> "closest non-deleted
> sibling page to its left"

I renamed that to just "left sibling".  Deleted pages are not in the
tree already.  And "the rightmost page to its left" is just left
sibling.

> "each entry tracks the buffer of the page" -> "each entry tracks the
> buffers of the page" (as two buffers are mentioned)

I prefer to just use word "buffer" twice to make it more explicit.

> "must already be pinned" -> "must already have been pinned"

Done.

------
Regards,
Alexander Korotkov
Supabase

Вложения

Re: Odd code around ginScanToDelete

От
Alexander Korotkov
Дата:
On Tue, Mar 10, 2026 at 11:19 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> On Fri, Mar 6, 2026 at 10:45 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> >
> > Hi, Andres and Alexander!
> >
> > On Sat, 21 Feb 2026 at 13:55, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > >
> > > On Wed, Feb 4, 2026 at 12:32 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > >
> > > > On Tue, Feb 3, 2026 at 6:26 PM Andres Freund <andres@anarazel.de> wrote:
> > > > >
> > > > > While looking at converting more places to UnlockReleaseBuffer(), in the
> > > > > course of making UnlockReleaseBuffer() faster than the two separate
> > > > > operations, I found this code:
> > > > >
> > > > > static bool
> > > > > ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
> > > > >                                 DataPageDeleteStack *parent, OffsetNumber myoff)
> > > > > ...
> > > > >
> > > > >         if (!meDelete)
> > > > >         {
> > > > >                 if (BufferIsValid(me->leftBuffer))
> > > > >                         UnlockReleaseBuffer(me->leftBuffer);
> > > > >                 me->leftBuffer = buffer;
> > > > >         }
> > > > >         else
> > > > >         {
> > > > >                 if (!isRoot)
> > > > >                         LockBuffer(buffer, GIN_UNLOCK);
> > > > >
> > > > >                 ReleaseBuffer(buffer);
> > > > >         }
> > > > >
> > > > >         if (isRoot)
> > > > >                 ReleaseBuffer(buffer);
> > > > >
> > > > >
> > > > > Which sure looks like it'd release buffer twice if isRoot is set?  I guess
> > > > > that's not reachable, because presumably the root page will always go down the
> > > > > !meDelete path. But it sure made me wonder if there's a hard to reach bug.
> > > >
> > > > Yes, it's not possible to have meDelete set for root, because
> > > > me->leftBuffer is always InvalidBuffer for the root.  So the branch
> > > > handling meDelete case should better do Assert(!isRoot).
> > > > > This code was introduced in
> > > > >   commit e14641197a5
> > > > >   Author: Alexander Korotkov <akorotkov@postgresql.org>
> > > > >   Date:   2019-11-19 23:07:36 +0300
> > > > >
> > > > >       Fix deadlock between ginDeletePage() and ginStepRight()
> > > > >
> > > > > I didn't trace it further to see if it existed before that in some fashion.
> > > >
> > > > Yes.  I think generally this area needs to be reworked to become more
> > > > clear, and have vast more comments.  It was wrong from my side trying
> > > > to fix bugs there without reworking it into something more
> > > > appropriate.  I'm planning to put work on this during this week.
> > > >
> > > > > There's another oddity here: ginScanToDelete() requires that the root page has
> > > > > been locked by the caller already, but will afaict re-read the root page? But
> > > > > then have code to avoid locking it again, because that would not have worked?
> > > > > Seems odd.
> > > >
> > > >
> > > > It seems a bit odd for me that caller already have locked buffer, but
> > > > passes BlockNumber making us re-read the buffer.  But I'm not sure
> > > > that's the same as your point.  Could you, please, elaborate more on
> > > > this?
> > >
> > > Here is the refactoring patch.  Sorry for the delay.
> >
> > Hi, Andres and Alexander!
> >
> > I've looked into the patch v1.
> > Overall, it looks good to me.
>
> The refactor LGTM in general. The buffer-ownership rewrite looks
> cleaner and safer overall.
>
> > Some thoughts:
> >
> > Is it worth/possible in recursive calls of ginScanToDelete() to free
> > allocated myStackItem->child after processing all children of the
> > current level, when they are not needed anymore?
> > Previously to this patch, palloc-ed "me" variable also was't freed at
> > recursion levels.
>
> I think freeing myStackItem->child inside recursive calls might not be
> worthwhile here. That node is intentionally reused for subsequent
> siblings at the same depth, and it carries state (leftBuffer) that can
> still be needed until the level is fully processed.
> Freeing/reallocating it per subtree would add churn and make the
> lifetime rules harder to reason about without meaningful memory
> savings (the number of nodes is bounded by tree depth, not number of
> pages). We currently free the chain once after ginScanToDelete()
> returns in ginVacuumPostingTree(), which matches the natural lifetime
> boundary
>
> > Could limiting the maximum recursion level be useful?
>
> Posting-tree depth is naturally small; a hard cap seems to add failure
> risk with little practical benefit.
>
> > In the comment to myStackItem before ginScanToDelete(), it might be
> > worth adding that after processing all pages on the current level,
> > myStackItem is not needed anymore.
> >
> > > > Yes, it's not possible to have meDelete set for root, because
> > > > me->leftBuffer is always InvalidBuffer for the root.  So the branch
> > > > handling meDelete case should better do Assert(!isRoot).
> > Looks like this additional Assert is not in patch v1.
> >
> > In the root call of ginScanToDelete(gvs, &root); we can add Assert
> > checking that its return result is false:
> > -               ginScanToDelete(gvs, &root);
> > +              deleted = ginScanToDelete(gvs, &root);
> > +.             Assert(!deleted); /* Root page is never deleted */
>
> + 1,  this is a good invariant check and improves readability
>
> One minor nit for the comment:
>
> The DataPageDeleteStack.buffer field comment says "valid only while
> recursing into children"
> this is true for internal pages, but for leaf pages the buffer is
> valid until the pageWasDeleted / leftBuffer decision. The validity
> window is actually "from when the caller sets it until the
> post-recursion cleanup."

Thank you for catching this.  I decided to remove this statement from
v2.  It's hard to explain the life cycle of the buffer clearly in one
sentence.  On the other hand, it's explained in the comments of
ginScanPostingTreeToDelete().

------
Regards,
Alexander Korotkov
Supabase



Re: Odd code around ginScanToDelete

От
Pavel Borisov
Дата:
On Thu, 12 Mar 2026 at 02:52, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Tue, Mar 10, 2026 at 11:19 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > On Fri, Mar 6, 2026 at 10:45 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> > >
> > > Hi, Andres and Alexander!
> > >
> > > On Sat, 21 Feb 2026 at 13:55, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > >
> > > > On Wed, Feb 4, 2026 at 12:32 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > > >
> > > > > On Tue, Feb 3, 2026 at 6:26 PM Andres Freund <andres@anarazel.de> wrote:
> > > > > >
> > > > > > While looking at converting more places to UnlockReleaseBuffer(), in the
> > > > > > course of making UnlockReleaseBuffer() faster than the two separate
> > > > > > operations, I found this code:
> > > > > >
> > > > > > static bool
> > > > > > ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
> > > > > >                                 DataPageDeleteStack *parent, OffsetNumber myoff)
> > > > > > ...
> > > > > >
> > > > > >         if (!meDelete)
> > > > > >         {
> > > > > >                 if (BufferIsValid(me->leftBuffer))
> > > > > >                         UnlockReleaseBuffer(me->leftBuffer);
> > > > > >                 me->leftBuffer = buffer;
> > > > > >         }
> > > > > >         else
> > > > > >         {
> > > > > >                 if (!isRoot)
> > > > > >                         LockBuffer(buffer, GIN_UNLOCK);
> > > > > >
> > > > > >                 ReleaseBuffer(buffer);
> > > > > >         }
> > > > > >
> > > > > >         if (isRoot)
> > > > > >                 ReleaseBuffer(buffer);
> > > > > >
> > > > > >
> > > > > > Which sure looks like it'd release buffer twice if isRoot is set?  I guess
> > > > > > that's not reachable, because presumably the root page will always go down the
> > > > > > !meDelete path. But it sure made me wonder if there's a hard to reach bug.
> > > > >
> > > > > Yes, it's not possible to have meDelete set for root, because
> > > > > me->leftBuffer is always InvalidBuffer for the root.  So the branch
> > > > > handling meDelete case should better do Assert(!isRoot).
> > > > > > This code was introduced in
> > > > > >   commit e14641197a5
> > > > > >   Author: Alexander Korotkov <akorotkov@postgresql.org>
> > > > > >   Date:   2019-11-19 23:07:36 +0300
> > > > > >
> > > > > >       Fix deadlock between ginDeletePage() and ginStepRight()
> > > > > >
> > > > > > I didn't trace it further to see if it existed before that in some fashion.
> > > > >
> > > > > Yes.  I think generally this area needs to be reworked to become more
> > > > > clear, and have vast more comments.  It was wrong from my side trying
> > > > > to fix bugs there without reworking it into something more
> > > > > appropriate.  I'm planning to put work on this during this week.
> > > > >
> > > > > > There's another oddity here: ginScanToDelete() requires that the root page has
> > > > > > been locked by the caller already, but will afaict re-read the root page? But
> > > > > > then have code to avoid locking it again, because that would not have worked?
> > > > > > Seems odd.
> > > > >
> > > > >
> > > > > It seems a bit odd for me that caller already have locked buffer, but
> > > > > passes BlockNumber making us re-read the buffer.  But I'm not sure
> > > > > that's the same as your point.  Could you, please, elaborate more on
> > > > > this?
> > > >
> > > > Here is the refactoring patch.  Sorry for the delay.
> > >
> > > Hi, Andres and Alexander!
> > >
> > > I've looked into the patch v1.
> > > Overall, it looks good to me.
> >
> > The refactor LGTM in general. The buffer-ownership rewrite looks
> > cleaner and safer overall.
> >
> > > Some thoughts:
> > >
> > > Is it worth/possible in recursive calls of ginScanToDelete() to free
> > > allocated myStackItem->child after processing all children of the
> > > current level, when they are not needed anymore?
> > > Previously to this patch, palloc-ed "me" variable also was't freed at
> > > recursion levels.
> >
> > I think freeing myStackItem->child inside recursive calls might not be
> > worthwhile here. That node is intentionally reused for subsequent
> > siblings at the same depth, and it carries state (leftBuffer) that can
> > still be needed until the level is fully processed.
> > Freeing/reallocating it per subtree would add churn and make the
> > lifetime rules harder to reason about without meaningful memory
> > savings (the number of nodes is bounded by tree depth, not number of
> > pages). We currently free the chain once after ginScanToDelete()
> > returns in ginVacuumPostingTree(), which matches the natural lifetime
> > boundary
> >
> > > Could limiting the maximum recursion level be useful?
> >
> > Posting-tree depth is naturally small; a hard cap seems to add failure
> > risk with little practical benefit.
> >
> > > In the comment to myStackItem before ginScanToDelete(), it might be
> > > worth adding that after processing all pages on the current level,
> > > myStackItem is not needed anymore.
> > >
> > > > > Yes, it's not possible to have meDelete set for root, because
> > > > > me->leftBuffer is always InvalidBuffer for the root.  So the branch
> > > > > handling meDelete case should better do Assert(!isRoot).
> > > Looks like this additional Assert is not in patch v1.
> > >
> > > In the root call of ginScanToDelete(gvs, &root); we can add Assert
> > > checking that its return result is false:
> > > -               ginScanToDelete(gvs, &root);
> > > +              deleted = ginScanToDelete(gvs, &root);
> > > +.             Assert(!deleted); /* Root page is never deleted */
> >
> > + 1,  this is a good invariant check and improves readability
> >
> > One minor nit for the comment:
> >
> > The DataPageDeleteStack.buffer field comment says "valid only while
> > recursing into children"
> > this is true for internal pages, but for leaf pages the buffer is
> > valid until the pageWasDeleted / leftBuffer decision. The validity
> > window is actually "from when the caller sets it until the
> > post-recursion cleanup."
>
> Thank you for catching this.  I decided to remove this statement from
> v2.  It's hard to explain the life cycle of the buffer clearly in one
> sentence.  On the other hand, it's explained in the comments of
> ginScanPostingTreeToDelete().

Hi, Alexander!
Patch v2 looks good to me. I also agree with Xuneng that this
refactoring improved the logic to make it look clearer.
Thank you for the explanation of buffers lifetime!

Regards,
Pavel Borisov



Re: Odd code around ginScanToDelete

От
Alexander Korotkov
Дата:
Hi, Pavel!

On Thu, Mar 12, 2026 at 10:41 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>
> On Thu, 12 Mar 2026 at 02:52, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >
> > On Tue, Mar 10, 2026 at 11:19 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > On Fri, Mar 6, 2026 at 10:45 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> > > >
> > > > Hi, Andres and Alexander!
> > > >
> > > > On Sat, 21 Feb 2026 at 13:55, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > > >
> > > > > On Wed, Feb 4, 2026 at 12:32 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > > > >
> > > > > > On Tue, Feb 3, 2026 at 6:26 PM Andres Freund <andres@anarazel.de> wrote:
> > > > > > >
> > > > > > > While looking at converting more places to UnlockReleaseBuffer(), in the
> > > > > > > course of making UnlockReleaseBuffer() faster than the two separate
> > > > > > > operations, I found this code:
> > > > > > >
> > > > > > > static bool
> > > > > > > ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
> > > > > > >                                 DataPageDeleteStack *parent, OffsetNumber myoff)
> > > > > > > ...
> > > > > > >
> > > > > > >         if (!meDelete)
> > > > > > >         {
> > > > > > >                 if (BufferIsValid(me->leftBuffer))
> > > > > > >                         UnlockReleaseBuffer(me->leftBuffer);
> > > > > > >                 me->leftBuffer = buffer;
> > > > > > >         }
> > > > > > >         else
> > > > > > >         {
> > > > > > >                 if (!isRoot)
> > > > > > >                         LockBuffer(buffer, GIN_UNLOCK);
> > > > > > >
> > > > > > >                 ReleaseBuffer(buffer);
> > > > > > >         }
> > > > > > >
> > > > > > >         if (isRoot)
> > > > > > >                 ReleaseBuffer(buffer);
> > > > > > >
> > > > > > >
> > > > > > > Which sure looks like it'd release buffer twice if isRoot is set?  I guess
> > > > > > > that's not reachable, because presumably the root page will always go down the
> > > > > > > !meDelete path. But it sure made me wonder if there's a hard to reach bug.
> > > > > >
> > > > > > Yes, it's not possible to have meDelete set for root, because
> > > > > > me->leftBuffer is always InvalidBuffer for the root.  So the branch
> > > > > > handling meDelete case should better do Assert(!isRoot).
> > > > > > > This code was introduced in
> > > > > > >   commit e14641197a5
> > > > > > >   Author: Alexander Korotkov <akorotkov@postgresql.org>
> > > > > > >   Date:   2019-11-19 23:07:36 +0300
> > > > > > >
> > > > > > >       Fix deadlock between ginDeletePage() and ginStepRight()
> > > > > > >
> > > > > > > I didn't trace it further to see if it existed before that in some fashion.
> > > > > >
> > > > > > Yes.  I think generally this area needs to be reworked to become more
> > > > > > clear, and have vast more comments.  It was wrong from my side trying
> > > > > > to fix bugs there without reworking it into something more
> > > > > > appropriate.  I'm planning to put work on this during this week.
> > > > > >
> > > > > > > There's another oddity here: ginScanToDelete() requires that the root page has
> > > > > > > been locked by the caller already, but will afaict re-read the root page? But
> > > > > > > then have code to avoid locking it again, because that would not have worked?
> > > > > > > Seems odd.
> > > > > >
> > > > > >
> > > > > > It seems a bit odd for me that caller already have locked buffer, but
> > > > > > passes BlockNumber making us re-read the buffer.  But I'm not sure
> > > > > > that's the same as your point.  Could you, please, elaborate more on
> > > > > > this?
> > > > >
> > > > > Here is the refactoring patch.  Sorry for the delay.
> > > >
> > > > Hi, Andres and Alexander!
> > > >
> > > > I've looked into the patch v1.
> > > > Overall, it looks good to me.
> > >
> > > The refactor LGTM in general. The buffer-ownership rewrite looks
> > > cleaner and safer overall.
> > >
> > > > Some thoughts:
> > > >
> > > > Is it worth/possible in recursive calls of ginScanToDelete() to free
> > > > allocated myStackItem->child after processing all children of the
> > > > current level, when they are not needed anymore?
> > > > Previously to this patch, palloc-ed "me" variable also was't freed at
> > > > recursion levels.
> > >
> > > I think freeing myStackItem->child inside recursive calls might not be
> > > worthwhile here. That node is intentionally reused for subsequent
> > > siblings at the same depth, and it carries state (leftBuffer) that can
> > > still be needed until the level is fully processed.
> > > Freeing/reallocating it per subtree would add churn and make the
> > > lifetime rules harder to reason about without meaningful memory
> > > savings (the number of nodes is bounded by tree depth, not number of
> > > pages). We currently free the chain once after ginScanToDelete()
> > > returns in ginVacuumPostingTree(), which matches the natural lifetime
> > > boundary
> > >
> > > > Could limiting the maximum recursion level be useful?
> > >
> > > Posting-tree depth is naturally small; a hard cap seems to add failure
> > > risk with little practical benefit.
> > >
> > > > In the comment to myStackItem before ginScanToDelete(), it might be
> > > > worth adding that after processing all pages on the current level,
> > > > myStackItem is not needed anymore.
> > > >
> > > > > > Yes, it's not possible to have meDelete set for root, because
> > > > > > me->leftBuffer is always InvalidBuffer for the root.  So the branch
> > > > > > handling meDelete case should better do Assert(!isRoot).
> > > > Looks like this additional Assert is not in patch v1.
> > > >
> > > > In the root call of ginScanToDelete(gvs, &root); we can add Assert
> > > > checking that its return result is false:
> > > > -               ginScanToDelete(gvs, &root);
> > > > +              deleted = ginScanToDelete(gvs, &root);
> > > > +.             Assert(!deleted); /* Root page is never deleted */
> > >
> > > + 1,  this is a good invariant check and improves readability
> > >
> > > One minor nit for the comment:
> > >
> > > The DataPageDeleteStack.buffer field comment says "valid only while
> > > recursing into children"
> > > this is true for internal pages, but for leaf pages the buffer is
> > > valid until the pageWasDeleted / leftBuffer decision. The validity
> > > window is actually "from when the caller sets it until the
> > > post-recursion cleanup."
> >
> > Thank you for catching this.  I decided to remove this statement from
> > v2.  It's hard to explain the life cycle of the buffer clearly in one
> > sentence.  On the other hand, it's explained in the comments of
> > ginScanPostingTreeToDelete().
>
> Patch v2 looks good to me. I also agree with Xuneng that this
> refactoring improved the logic to make it look clearer.
> Thank you for the explanation of buffers lifetime!

Thank you for your feedback. I'll push the patch if no objections.

------
Regards,
Alexander Korotkov
Supabase



Re:Re: Odd code around ginScanToDelete

От
jinbinge
Дата:
At 2026-03-12 17:05:51, "Alexander Korotkov" <aekorotkov@gmail.com> wrote:
>Hi, Pavel!
>
>On Thu, Mar 12, 2026 at 10:41 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>>
>> On Thu, 12 Mar 2026 at 02:52, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>> >
>> > On Tue, Mar 10, 2026 at 11:19 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>> > > On Fri, Mar 6, 2026 at 10:45 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>> > > >
>> > > > Hi, Andres and Alexander!
>> > > >
>> > > > On Sat, 21 Feb 2026 at 13:55, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>> > > > >
>> > > > > On Wed, Feb 4, 2026 at 12:32 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>> > > > > >
>> > > > > > On Tue, Feb 3, 2026 at 6:26 PM Andres Freund <andres@anarazel.de> wrote:
>> > > > > > >
>> > > > > > > While looking at converting more places to UnlockReleaseBuffer(), in the
>> > > > > > > course of making UnlockReleaseBuffer() faster than the two separate
>> > > > > > > operations, I found this code:
>> > > > > > >
>> > > > > > > static bool
>> > > > > > > ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
>> > > > > > >                                 DataPageDeleteStack *parent, OffsetNumber myoff)
>> > > > > > > ...
>> > > > > > >
>> > > > > > >         if (!meDelete)
>> > > > > > >         {
>> > > > > > >                 if (BufferIsValid(me->leftBuffer))
>> > > > > > >                         UnlockReleaseBuffer(me->leftBuffer);
>> > > > > > >                 me->leftBuffer = buffer;
>> > > > > > >         }
>> > > > > > >         else
>> > > > > > >         {
>> > > > > > >                 if (!isRoot)
>> > > > > > >                         LockBuffer(buffer, GIN_UNLOCK);
>> > > > > > >
>> > > > > > >                 ReleaseBuffer(buffer);
>> > > > > > >         }
>> > > > > > >
>> > > > > > >         if (isRoot)
>> > > > > > >                 ReleaseBuffer(buffer);
>> > > > > > >
>> > > > > > >
>> > > > > > > Which sure looks like it'd release buffer twice if isRoot is set?  I guess
>> > > > > > > that's not reachable, because presumably the root page will always go down the
>> > > > > > > !meDelete path. But it sure made me wonder if there's a hard to reach bug.
>> > > > > >
>> > > > > > Yes, it's not possible to have meDelete set for root, because
>> > > > > > me->leftBuffer is always InvalidBuffer for the root.  So the branch
>> > > > > > handling meDelete case should better do Assert(!isRoot).
>> > > > > > > This code was introduced in
>> > > > > > >   commit e14641197a5
>> > > > > > >   Author: Alexander Korotkov <akorotkov@postgresql.org>
>> > > > > > >   Date:   2019-11-19 23:07:36 +0300
>> > > > > > >
>> > > > > > >       Fix deadlock between ginDeletePage() and ginStepRight()
>> > > > > > >
>> > > > > > > I didn't trace it further to see if it existed before that in some fashion.
>> > > > > >
>> > > > > > Yes.  I think generally this area needs to be reworked to become more
>> > > > > > clear, and have vast more comments.  It was wrong from my side trying
>> > > > > > to fix bugs there without reworking it into something more
>> > > > > > appropriate.  I'm planning to put work on this during this week.
>> > > > > >
>> > > > > > > There's another oddity here: ginScanToDelete() requires that the root page has
>> > > > > > > been locked by the caller already, but will afaict re-read the root page? But
>> > > > > > > then have code to avoid locking it again, because that would not have worked?
>> > > > > > > Seems odd.
>> > > > > >
>> > > > > >
>> > > > > > It seems a bit odd for me that caller already have locked buffer, but
>> > > > > > passes BlockNumber making us re-read the buffer.  But I'm not sure
>> > > > > > that's the same as your point.  Could you, please, elaborate more on
>> > > > > > this?
>> > > > >
>> > > > > Here is the refactoring patch.  Sorry for the delay.
>> > > >
>> > > > Hi, Andres and Alexander!
>> > > >
>> > > > I've looked into the patch v1.
>> > > > Overall, it looks good to me.
>> > >
>> > > The refactor LGTM in general. The buffer-ownership rewrite looks
>> > > cleaner and safer overall.
>> > >
>> > > > Some thoughts:
>> > > >
>> > > > Is it worth/possible in recursive calls of ginScanToDelete() to free
>> > > > allocated myStackItem->child after processing all children of the
>> > > > current level, when they are not needed anymore?
>> > > > Previously to this patch, palloc-ed "me" variable also was't freed at
>> > > > recursion levels.
>> > >
>> > > I think freeing myStackItem->child inside recursive calls might not be
>> > > worthwhile here. That node is intentionally reused for subsequent
>> > > siblings at the same depth, and it carries state (leftBuffer) that can
>> > > still be needed until the level is fully processed.
>> > > Freeing/reallocating it per subtree would add churn and make the
>> > > lifetime rules harder to reason about without meaningful memory
>> > > savings (the number of nodes is bounded by tree depth, not number of
>> > > pages). We currently free the chain once after ginScanToDelete()
>> > > returns in ginVacuumPostingTree(), which matches the natural lifetime
>> > > boundary
>> > >
>> > > > Could limiting the maximum recursion level be useful?
>> > >
>> > > Posting-tree depth is naturally small; a hard cap seems to add failure
>> > > risk with little practical benefit.
>> > >
>> > > > In the comment to myStackItem before ginScanToDelete(), it might be
>> > > > worth adding that after processing all pages on the current level,
>> > > > myStackItem is not needed anymore.
>> > > >
>> > > > > > Yes, it's not possible to have meDelete set for root, because
>> > > > > > me->leftBuffer is always InvalidBuffer for the root.  So the branch
>> > > > > > handling meDelete case should better do Assert(!isRoot).
>> > > > Looks like this additional Assert is not in patch v1.
>> > > >
>> > > > In the root call of ginScanToDelete(gvs, &root); we can add Assert
>> > > > checking that its return result is false:
>> > > > -               ginScanToDelete(gvs, &root);
>> > > > +              deleted = ginScanToDelete(gvs, &root);
>> > > > +.             Assert(!deleted); /* Root page is never deleted */
>> > >
>> > > + 1,  this is a good invariant check and improves readability
>> > >
>> > > One minor nit for the comment:
>> > >
>> > > The DataPageDeleteStack.buffer field comment says "valid only while
>> > > recursing into children"
>> > > this is true for internal pages, but for leaf pages the buffer is
>> > > valid until the pageWasDeleted / leftBuffer decision. The validity
>> > > window is actually "from when the caller sets it until the
>> > > post-recursion cleanup."
>> >
>> > Thank you for catching this.  I decided to remove this statement from
>> > v2.  It's hard to explain the life cycle of the buffer clearly in one
>> > sentence.  On the other hand, it's explained in the comments of
>> > ginScanPostingTreeToDelete().
>>
>> Patch v2 looks good to me. I also agree with Xuneng that this
>> refactoring improved the logic to make it look clearer.
>> Thank you for the explanation of buffers lifetime!
>
>Thank you for your feedback. I'll push the patch if no objections.
>
>------
>Regards,
>Alexander Korotkov
>Supabase
>


Hi,

Overall solid to me. I got a nitpick:
in the code comments, ginDeletePage should be ginDeletePostingPag(ginDeletePage -> ginDeletePostingPage).
There are 3 places that need to be revised.
------
Regards,
jinbinge

Re: Odd code around ginScanToDelete

От
Xuneng Zhou
Дата:
On Thu, Mar 12, 2026 at 6:37 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Thu, Mar 12, 2026 at 12:35 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> >
> > On Thu, 12 Mar 2026 at 02:22, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > >
> > > On Tue, Mar 10, 2026 at 11:29 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> > > > Hi, Xuneng
> > > >
> > > > > > Is it worth/possible in recursive calls of ginScanToDelete() to free
> > > > > > allocated myStackItem->child after processing all children of the
> > > > > > current level, when they are not needed anymore?
> > > > > > Previously to this patch, palloc-ed "me" variable also was't freed at
> > > > > > recursion levels.
> > > > >
> > > > > Freeing/reallocating it per subtree would add churn and make the
> > > > > lifetime rules harder to reason about without meaningful memory
> > > > > savings (the number of nodes is bounded by tree depth, not number of
> > > > > pages). We currently free the chain once after ginScanToDelete()
> > > > > returns in ginVacuumPostingTree(), which matches the natural lifetime
> > > > > boundary
> > > > I proposed not freeing child when child iteration is complete. They
> > > > indeed can be reused. I proposed cleaning children when "my" iteration
> > > > is complete. At that time all the children iterations are completed
> > > > and not needed when we return level up.
> > > This is not clear for me.  We need stack items to keep track of left
> > > pages until we scan the whole posting tree.  After scanning the whole
> > > posting tree we can free stack items as we do now.
> >
> > You are right, that we can free all posting tree stack items after the
> > whole tree, as we do now. But I think we can also do it earlier. It
> > looks like all "children" items are needed and could be reused only
> > until iteration on "my" level ends. When function returns up the
> > recursion "my" level becomes "child" for a caller, and previous
> > "child" is not used anymore.
>
> No matter how many levels we can go up, we can still descend and need
> the leftBuffer stored at any stack level.
>

Yeah, the important point is that a stack item here represents
per-depth scan state, not just one recursive invocation. Returning
from one subtree does not necessarily mean that depth is finished
globally: the caller may move to a sibling and descend again, and that
later descent can still need the child level's saved leftBuffer from
the subtree we just finished. The stronger condition is that no more
pages remain to be scanned to the right at that depth; the code
already uses GinPageRightMost(page) for that when releasing the child
level's leftBuffer.

--
Best,
Xuneng