Обсуждение: Odd code around ginScanToDelete
Hi,
While looking at converting more places to UnlockReleaseBuffer(), in the
course of making UnlockReleaseBuffer() faster than the two separate
operations, I found this code:
static bool
ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
DataPageDeleteStack *parent, OffsetNumber myoff)
...
if (!meDelete)
{
if (BufferIsValid(me->leftBuffer))
UnlockReleaseBuffer(me->leftBuffer);
me->leftBuffer = buffer;
}
else
{
if (!isRoot)
LockBuffer(buffer, GIN_UNLOCK);
ReleaseBuffer(buffer);
}
if (isRoot)
ReleaseBuffer(buffer);
Which sure looks like it'd release buffer twice if isRoot is set? I guess
that's not reachable, because presumably the root page will always go down the
!meDelete path. But it sure made me wonder if there's a hard to reach bug.
This code was introduced in
commit e14641197a5
Author: Alexander Korotkov <akorotkov@postgresql.org>
Date: 2019-11-19 23:07:36 +0300
Fix deadlock between ginDeletePage() and ginStepRight()
I didn't trace it further to see if it existed before that in some fashion.
There's another oddity here: ginScanToDelete() requires that the root page has
been locked by the caller already, but will afaict re-read the root page? But
then have code to avoid locking it again, because that would not have worked?
Seems odd.
Greetings,
Andres Freund
On Tue, Feb 3, 2026 at 6:26 PM Andres Freund <andres@anarazel.de> wrote:
>
> While looking at converting more places to UnlockReleaseBuffer(), in the
> course of making UnlockReleaseBuffer() faster than the two separate
> operations, I found this code:
>
> static bool
> ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
> DataPageDeleteStack *parent, OffsetNumber myoff)
> ...
>
> if (!meDelete)
> {
> if (BufferIsValid(me->leftBuffer))
> UnlockReleaseBuffer(me->leftBuffer);
> me->leftBuffer = buffer;
> }
> else
> {
> if (!isRoot)
> LockBuffer(buffer, GIN_UNLOCK);
>
> ReleaseBuffer(buffer);
> }
>
> if (isRoot)
> ReleaseBuffer(buffer);
>
>
> Which sure looks like it'd release buffer twice if isRoot is set? I guess
> that's not reachable, because presumably the root page will always go down the
> !meDelete path. But it sure made me wonder if there's a hard to reach bug.
Yes, it's not possible to have meDelete set for root, because
me->leftBuffer is always InvalidBuffer for the root. So the branch
handling meDelete case should better do Assert(!isRoot).
> This code was introduced in
> commit e14641197a5
> Author: Alexander Korotkov <akorotkov@postgresql.org>
> Date: 2019-11-19 23:07:36 +0300
>
> Fix deadlock between ginDeletePage() and ginStepRight()
>
> I didn't trace it further to see if it existed before that in some fashion.
Yes. I think generally this area needs to be reworked to become more
clear, and have vast more comments. It was wrong from my side trying
to fix bugs there without reworking it into something more
appropriate. I'm planning to put work on this during this week.
> There's another oddity here: ginScanToDelete() requires that the root page has
> been locked by the caller already, but will afaict re-read the root page? But
> then have code to avoid locking it again, because that would not have worked?
> Seems odd.
It seems a bit odd for me that caller already have locked buffer, but
passes BlockNumber making us re-read the buffer. But I'm not sure
that's the same as your point. Could you, please, elaborate more on
this?
------
Regards,
Alexander Korotkov
Supabase
On Wed, Feb 4, 2026 at 12:32 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Tue, Feb 3, 2026 at 6:26 PM Andres Freund <andres@anarazel.de> wrote:
> >
> > While looking at converting more places to UnlockReleaseBuffer(), in the
> > course of making UnlockReleaseBuffer() faster than the two separate
> > operations, I found this code:
> >
> > static bool
> > ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
> > DataPageDeleteStack *parent, OffsetNumber myoff)
> > ...
> >
> > if (!meDelete)
> > {
> > if (BufferIsValid(me->leftBuffer))
> > UnlockReleaseBuffer(me->leftBuffer);
> > me->leftBuffer = buffer;
> > }
> > else
> > {
> > if (!isRoot)
> > LockBuffer(buffer, GIN_UNLOCK);
> >
> > ReleaseBuffer(buffer);
> > }
> >
> > if (isRoot)
> > ReleaseBuffer(buffer);
> >
> >
> > Which sure looks like it'd release buffer twice if isRoot is set? I guess
> > that's not reachable, because presumably the root page will always go down the
> > !meDelete path. But it sure made me wonder if there's a hard to reach bug.
>
> Yes, it's not possible to have meDelete set for root, because
> me->leftBuffer is always InvalidBuffer for the root. So the branch
> handling meDelete case should better do Assert(!isRoot).
>
> > This code was introduced in
> > commit e14641197a5
> > Author: Alexander Korotkov <akorotkov@postgresql.org>
> > Date: 2019-11-19 23:07:36 +0300
> >
> > Fix deadlock between ginDeletePage() and ginStepRight()
> >
> > I didn't trace it further to see if it existed before that in some fashion.
>
> Yes. I think generally this area needs to be reworked to become more
> clear, and have vast more comments. It was wrong from my side trying
> to fix bugs there without reworking it into something more
> appropriate. I'm planning to put work on this during this week.
>
> > There's another oddity here: ginScanToDelete() requires that the root page has
> > been locked by the caller already, but will afaict re-read the root page? But
> > then have code to avoid locking it again, because that would not have worked?
> > Seems odd.
>
>
> It seems a bit odd for me that caller already have locked buffer, but
> passes BlockNumber making us re-read the buffer. But I'm not sure
> that's the same as your point. Could you, please, elaborate more on
> this?
Here is the refactoring patch. Sorry for the delay.
------
Regards,
Alexander Korotkov
Supabase
Вложения
Hi, Andres and Alexander!
On Sat, 21 Feb 2026 at 13:55, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Wed, Feb 4, 2026 at 12:32 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >
> > On Tue, Feb 3, 2026 at 6:26 PM Andres Freund <andres@anarazel.de> wrote:
> > >
> > > While looking at converting more places to UnlockReleaseBuffer(), in the
> > > course of making UnlockReleaseBuffer() faster than the two separate
> > > operations, I found this code:
> > >
> > > static bool
> > > ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
> > > DataPageDeleteStack *parent, OffsetNumber myoff)
> > > ...
> > >
> > > if (!meDelete)
> > > {
> > > if (BufferIsValid(me->leftBuffer))
> > > UnlockReleaseBuffer(me->leftBuffer);
> > > me->leftBuffer = buffer;
> > > }
> > > else
> > > {
> > > if (!isRoot)
> > > LockBuffer(buffer, GIN_UNLOCK);
> > >
> > > ReleaseBuffer(buffer);
> > > }
> > >
> > > if (isRoot)
> > > ReleaseBuffer(buffer);
> > >
> > >
> > > Which sure looks like it'd release buffer twice if isRoot is set? I guess
> > > that's not reachable, because presumably the root page will always go down the
> > > !meDelete path. But it sure made me wonder if there's a hard to reach bug.
> >
> > Yes, it's not possible to have meDelete set for root, because
> > me->leftBuffer is always InvalidBuffer for the root. So the branch
> > handling meDelete case should better do Assert(!isRoot).
> > > This code was introduced in
> > > commit e14641197a5
> > > Author: Alexander Korotkov <akorotkov@postgresql.org>
> > > Date: 2019-11-19 23:07:36 +0300
> > >
> > > Fix deadlock between ginDeletePage() and ginStepRight()
> > >
> > > I didn't trace it further to see if it existed before that in some fashion.
> >
> > Yes. I think generally this area needs to be reworked to become more
> > clear, and have vast more comments. It was wrong from my side trying
> > to fix bugs there without reworking it into something more
> > appropriate. I'm planning to put work on this during this week.
> >
> > > There's another oddity here: ginScanToDelete() requires that the root page has
> > > been locked by the caller already, but will afaict re-read the root page? But
> > > then have code to avoid locking it again, because that would not have worked?
> > > Seems odd.
> >
> >
> > It seems a bit odd for me that caller already have locked buffer, but
> > passes BlockNumber making us re-read the buffer. But I'm not sure
> > that's the same as your point. Could you, please, elaborate more on
> > this?
>
> Here is the refactoring patch. Sorry for the delay.
Hi, Andres and Alexander!
I've looked into the patch v1.
Overall, it looks good to me.
Some thoughts:
Is it worth/possible in recursive calls of ginScanToDelete() to free
allocated myStackItem->child after processing all children of the
current level, when they are not needed anymore?
Previously to this patch, palloc-ed "me" variable also was't freed at
recursion levels.
Could limiting the maximum recursion level be useful?
In the comment to myStackItem before ginScanToDelete(), it might be
worth adding that after processing all pages on the current level,
myStackItem is not needed anymore.
> > Yes, it's not possible to have meDelete set for root, because
> > me->leftBuffer is always InvalidBuffer for the root. So the branch
> > handling meDelete case should better do Assert(!isRoot).
Looks like this additional Assert is not in patch v1.
In the root call of ginScanToDelete(gvs, &root); we can add Assert
checking that its return result is false:
- ginScanToDelete(gvs, &root);
+ deleted = ginScanToDelete(gvs, &root);
+. Assert(!deleted); /* Root page is never deleted */
Additionally, it could be good to rename all vacuum functions related
to posting tree pages only, to include "Posting" (e.g., ginDeletePage
-> ginDeletePostingPage). The same is for the functions only for the
entry tree. It is already named this way in many places (e.g.
ginVacuumPostingTreeLeaves). It could be good to extend this to all
relevant functions.
Several small proposals on wording:
"rightmost non-deleted page to its left" -> "closest non-deleted
sibling page to its left"
"each entry tracks the buffer of the page" -> "each entry tracks the
buffers of the page" (as two buffers are mentioned)
"must already be pinned" -> "must already have been pinned"
Best regards,
Pavel Borisov
Supabase
Hi Pavel, Alexander,
On Fri, Mar 6, 2026 at 10:45 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>
> Hi, Andres and Alexander!
>
> On Sat, 21 Feb 2026 at 13:55, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >
> > On Wed, Feb 4, 2026 at 12:32 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > >
> > > On Tue, Feb 3, 2026 at 6:26 PM Andres Freund <andres@anarazel.de> wrote:
> > > >
> > > > While looking at converting more places to UnlockReleaseBuffer(), in the
> > > > course of making UnlockReleaseBuffer() faster than the two separate
> > > > operations, I found this code:
> > > >
> > > > static bool
> > > > ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
> > > > DataPageDeleteStack *parent, OffsetNumber myoff)
> > > > ...
> > > >
> > > > if (!meDelete)
> > > > {
> > > > if (BufferIsValid(me->leftBuffer))
> > > > UnlockReleaseBuffer(me->leftBuffer);
> > > > me->leftBuffer = buffer;
> > > > }
> > > > else
> > > > {
> > > > if (!isRoot)
> > > > LockBuffer(buffer, GIN_UNLOCK);
> > > >
> > > > ReleaseBuffer(buffer);
> > > > }
> > > >
> > > > if (isRoot)
> > > > ReleaseBuffer(buffer);
> > > >
> > > >
> > > > Which sure looks like it'd release buffer twice if isRoot is set? I guess
> > > > that's not reachable, because presumably the root page will always go down the
> > > > !meDelete path. But it sure made me wonder if there's a hard to reach bug.
> > >
> > > Yes, it's not possible to have meDelete set for root, because
> > > me->leftBuffer is always InvalidBuffer for the root. So the branch
> > > handling meDelete case should better do Assert(!isRoot).
> > > > This code was introduced in
> > > > commit e14641197a5
> > > > Author: Alexander Korotkov <akorotkov@postgresql.org>
> > > > Date: 2019-11-19 23:07:36 +0300
> > > >
> > > > Fix deadlock between ginDeletePage() and ginStepRight()
> > > >
> > > > I didn't trace it further to see if it existed before that in some fashion.
> > >
> > > Yes. I think generally this area needs to be reworked to become more
> > > clear, and have vast more comments. It was wrong from my side trying
> > > to fix bugs there without reworking it into something more
> > > appropriate. I'm planning to put work on this during this week.
> > >
> > > > There's another oddity here: ginScanToDelete() requires that the root page has
> > > > been locked by the caller already, but will afaict re-read the root page? But
> > > > then have code to avoid locking it again, because that would not have worked?
> > > > Seems odd.
> > >
> > >
> > > It seems a bit odd for me that caller already have locked buffer, but
> > > passes BlockNumber making us re-read the buffer. But I'm not sure
> > > that's the same as your point. Could you, please, elaborate more on
> > > this?
> >
> > Here is the refactoring patch. Sorry for the delay.
>
> Hi, Andres and Alexander!
>
> I've looked into the patch v1.
> Overall, it looks good to me.
The refactor LGTM in general. The buffer-ownership rewrite looks
cleaner and safer overall.
> Some thoughts:
>
> Is it worth/possible in recursive calls of ginScanToDelete() to free
> allocated myStackItem->child after processing all children of the
> current level, when they are not needed anymore?
> Previously to this patch, palloc-ed "me" variable also was't freed at
> recursion levels.
I think freeing myStackItem->child inside recursive calls might not be
worthwhile here. That node is intentionally reused for subsequent
siblings at the same depth, and it carries state (leftBuffer) that can
still be needed until the level is fully processed.
Freeing/reallocating it per subtree would add churn and make the
lifetime rules harder to reason about without meaningful memory
savings (the number of nodes is bounded by tree depth, not number of
pages). We currently free the chain once after ginScanToDelete()
returns in ginVacuumPostingTree(), which matches the natural lifetime
boundary
> Could limiting the maximum recursion level be useful?
Posting-tree depth is naturally small; a hard cap seems to add failure
risk with little practical benefit.
> In the comment to myStackItem before ginScanToDelete(), it might be
> worth adding that after processing all pages on the current level,
> myStackItem is not needed anymore.
>
> > > Yes, it's not possible to have meDelete set for root, because
> > > me->leftBuffer is always InvalidBuffer for the root. So the branch
> > > handling meDelete case should better do Assert(!isRoot).
> Looks like this additional Assert is not in patch v1.
>
> In the root call of ginScanToDelete(gvs, &root); we can add Assert
> checking that its return result is false:
> - ginScanToDelete(gvs, &root);
> + deleted = ginScanToDelete(gvs, &root);
> +. Assert(!deleted); /* Root page is never deleted */
+ 1, this is a good invariant check and improves readability
One minor nit for the comment:
The DataPageDeleteStack.buffer field comment says "valid only while
recursing into children"
this is true for internal pages, but for leaf pages the buffer is
valid until the pageWasDeleted / leftBuffer decision. The validity
window is actually "from when the caller sets it until the
post-recursion cleanup."
--
Best,
Xuneng
Hi, Xuneng > > Is it worth/possible in recursive calls of ginScanToDelete() to free > > allocated myStackItem->child after processing all children of the > > current level, when they are not needed anymore? > > Previously to this patch, palloc-ed "me" variable also was't freed at > > recursion levels. > > Freeing/reallocating it per subtree would add churn and make the > lifetime rules harder to reason about without meaningful memory > savings (the number of nodes is bounded by tree depth, not number of > pages). We currently free the chain once after ginScanToDelete() > returns in ginVacuumPostingTree(), which matches the natural lifetime > boundary I proposed not freeing child when child iteration is complete. They indeed can be reused. I proposed cleaning children when "my" iteration is complete. At that time all the children iterations are completed and not needed when we return level up. Regards, Pavel Borisov Supabase
On Tue, Mar 10, 2026 at 11:29 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > Hi, Xuneng > > > > Is it worth/possible in recursive calls of ginScanToDelete() to free > > > allocated myStackItem->child after processing all children of the > > > current level, when they are not needed anymore? > > > Previously to this patch, palloc-ed "me" variable also was't freed at > > > recursion levels. > > > > Freeing/reallocating it per subtree would add churn and make the > > lifetime rules harder to reason about without meaningful memory > > savings (the number of nodes is bounded by tree depth, not number of > > pages). We currently free the chain once after ginScanToDelete() > > returns in ginVacuumPostingTree(), which matches the natural lifetime > > boundary > I proposed not freeing child when child iteration is complete. They > indeed can be reused. I proposed cleaning children when "my" iteration > is complete. At that time all the children iterations are completed > and not needed when we return level up. This is not clear for me. We need stack items to keep track of left pages until we scan the whole posting tree. After scanning the whole posting tree we can free stack items as we do now. ------ Regards, Alexander Korotkov Supabase
On Thu, 12 Mar 2026 at 02:22, Alexander Korotkov <aekorotkov@gmail.com> wrote: > > On Tue, Mar 10, 2026 at 11:29 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > > Hi, Xuneng > > > > > > Is it worth/possible in recursive calls of ginScanToDelete() to free > > > > allocated myStackItem->child after processing all children of the > > > > current level, when they are not needed anymore? > > > > Previously to this patch, palloc-ed "me" variable also was't freed at > > > > recursion levels. > > > > > > Freeing/reallocating it per subtree would add churn and make the > > > lifetime rules harder to reason about without meaningful memory > > > savings (the number of nodes is bounded by tree depth, not number of > > > pages). We currently free the chain once after ginScanToDelete() > > > returns in ginVacuumPostingTree(), which matches the natural lifetime > > > boundary > > I proposed not freeing child when child iteration is complete. They > > indeed can be reused. I proposed cleaning children when "my" iteration > > is complete. At that time all the children iterations are completed > > and not needed when we return level up. > This is not clear for me. We need stack items to keep track of left > pages until we scan the whole posting tree. After scanning the whole > posting tree we can free stack items as we do now. Hi, Alexander! You are right, that we can free all posting tree stack items after the whole tree, as we do now. But I think we can also do it earlier. It looks like all "children" items are needed and could be reused only until iteration on "my" level ends. When function returns up the recursion "my" level becomes "child" for a caller, and previous "child" is not used anymore. It's optional, maybe we can do freeing at the end of posting tree iteration. Regards, Pavel Borisov
On Thu, Mar 12, 2026 at 12:35 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > > On Thu, 12 Mar 2026 at 02:22, Alexander Korotkov <aekorotkov@gmail.com> wrote: > > > > On Tue, Mar 10, 2026 at 11:29 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > > > Hi, Xuneng > > > > > > > > Is it worth/possible in recursive calls of ginScanToDelete() to free > > > > > allocated myStackItem->child after processing all children of the > > > > > current level, when they are not needed anymore? > > > > > Previously to this patch, palloc-ed "me" variable also was't freed at > > > > > recursion levels. > > > > > > > > Freeing/reallocating it per subtree would add churn and make the > > > > lifetime rules harder to reason about without meaningful memory > > > > savings (the number of nodes is bounded by tree depth, not number of > > > > pages). We currently free the chain once after ginScanToDelete() > > > > returns in ginVacuumPostingTree(), which matches the natural lifetime > > > > boundary > > > I proposed not freeing child when child iteration is complete. They > > > indeed can be reused. I proposed cleaning children when "my" iteration > > > is complete. At that time all the children iterations are completed > > > and not needed when we return level up. > > This is not clear for me. We need stack items to keep track of left > > pages until we scan the whole posting tree. After scanning the whole > > posting tree we can free stack items as we do now. > > You are right, that we can free all posting tree stack items after the > whole tree, as we do now. But I think we can also do it earlier. It > looks like all "children" items are needed and could be reused only > until iteration on "my" level ends. When function returns up the > recursion "my" level becomes "child" for a caller, and previous > "child" is not used anymore. No matter how many levels we can go up, we can still descend and need the leftBuffer stored at any stack level. ------ Regards, Alexander Korotkov Supabase
Hi, Pavel! Thank you for your review! On Fri, Mar 6, 2026 at 4:45 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > Some thoughts: > > Is it worth/possible in recursive calls of ginScanToDelete() to free > allocated myStackItem->child after processing all children of the > current level, when they are not needed anymore? > Previously to this patch, palloc-ed "me" variable also was't freed at > recursion levels. > > Could limiting the maximum recursion level be useful? > > In the comment to myStackItem before ginScanToDelete(), it might be > worth adding that after processing all pages on the current level, > myStackItem is not needed anymore. Already answered in this thread. > > > Yes, it's not possible to have meDelete set for root, because > > > me->leftBuffer is always InvalidBuffer for the root. So the branch > > > handling meDelete case should better do Assert(!isRoot). > Looks like this additional Assert is not in patch v1. > > In the root call of ginScanToDelete(gvs, &root); we can add Assert > checking that its return result is false: > - ginScanToDelete(gvs, &root); > + deleted = ginScanToDelete(gvs, &root); > +. Assert(!deleted); /* Root page is never deleted */ Done. > Additionally, it could be good to rename all vacuum functions related > to posting tree pages only, to include "Posting" (e.g., ginDeletePage > -> ginDeletePostingPage). The same is for the functions only for the > entry tree. It is already named this way in many places (e.g. > ginVacuumPostingTreeLeaves). It could be good to extend this to all > relevant functions. Renamed as you proposed. > Several small proposals on wording: > "rightmost non-deleted page to its left" -> "closest non-deleted > sibling page to its left" I renamed that to just "left sibling". Deleted pages are not in the tree already. And "the rightmost page to its left" is just left sibling. > "each entry tracks the buffer of the page" -> "each entry tracks the > buffers of the page" (as two buffers are mentioned) I prefer to just use word "buffer" twice to make it more explicit. > "must already be pinned" -> "must already have been pinned" Done. ------ Regards, Alexander Korotkov Supabase
Вложения
On Tue, Mar 10, 2026 at 11:19 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> On Fri, Mar 6, 2026 at 10:45 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> >
> > Hi, Andres and Alexander!
> >
> > On Sat, 21 Feb 2026 at 13:55, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > >
> > > On Wed, Feb 4, 2026 at 12:32 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > >
> > > > On Tue, Feb 3, 2026 at 6:26 PM Andres Freund <andres@anarazel.de> wrote:
> > > > >
> > > > > While looking at converting more places to UnlockReleaseBuffer(), in the
> > > > > course of making UnlockReleaseBuffer() faster than the two separate
> > > > > operations, I found this code:
> > > > >
> > > > > static bool
> > > > > ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
> > > > > DataPageDeleteStack *parent, OffsetNumber myoff)
> > > > > ...
> > > > >
> > > > > if (!meDelete)
> > > > > {
> > > > > if (BufferIsValid(me->leftBuffer))
> > > > > UnlockReleaseBuffer(me->leftBuffer);
> > > > > me->leftBuffer = buffer;
> > > > > }
> > > > > else
> > > > > {
> > > > > if (!isRoot)
> > > > > LockBuffer(buffer, GIN_UNLOCK);
> > > > >
> > > > > ReleaseBuffer(buffer);
> > > > > }
> > > > >
> > > > > if (isRoot)
> > > > > ReleaseBuffer(buffer);
> > > > >
> > > > >
> > > > > Which sure looks like it'd release buffer twice if isRoot is set? I guess
> > > > > that's not reachable, because presumably the root page will always go down the
> > > > > !meDelete path. But it sure made me wonder if there's a hard to reach bug.
> > > >
> > > > Yes, it's not possible to have meDelete set for root, because
> > > > me->leftBuffer is always InvalidBuffer for the root. So the branch
> > > > handling meDelete case should better do Assert(!isRoot).
> > > > > This code was introduced in
> > > > > commit e14641197a5
> > > > > Author: Alexander Korotkov <akorotkov@postgresql.org>
> > > > > Date: 2019-11-19 23:07:36 +0300
> > > > >
> > > > > Fix deadlock between ginDeletePage() and ginStepRight()
> > > > >
> > > > > I didn't trace it further to see if it existed before that in some fashion.
> > > >
> > > > Yes. I think generally this area needs to be reworked to become more
> > > > clear, and have vast more comments. It was wrong from my side trying
> > > > to fix bugs there without reworking it into something more
> > > > appropriate. I'm planning to put work on this during this week.
> > > >
> > > > > There's another oddity here: ginScanToDelete() requires that the root page has
> > > > > been locked by the caller already, but will afaict re-read the root page? But
> > > > > then have code to avoid locking it again, because that would not have worked?
> > > > > Seems odd.
> > > >
> > > >
> > > > It seems a bit odd for me that caller already have locked buffer, but
> > > > passes BlockNumber making us re-read the buffer. But I'm not sure
> > > > that's the same as your point. Could you, please, elaborate more on
> > > > this?
> > >
> > > Here is the refactoring patch. Sorry for the delay.
> >
> > Hi, Andres and Alexander!
> >
> > I've looked into the patch v1.
> > Overall, it looks good to me.
>
> The refactor LGTM in general. The buffer-ownership rewrite looks
> cleaner and safer overall.
>
> > Some thoughts:
> >
> > Is it worth/possible in recursive calls of ginScanToDelete() to free
> > allocated myStackItem->child after processing all children of the
> > current level, when they are not needed anymore?
> > Previously to this patch, palloc-ed "me" variable also was't freed at
> > recursion levels.
>
> I think freeing myStackItem->child inside recursive calls might not be
> worthwhile here. That node is intentionally reused for subsequent
> siblings at the same depth, and it carries state (leftBuffer) that can
> still be needed until the level is fully processed.
> Freeing/reallocating it per subtree would add churn and make the
> lifetime rules harder to reason about without meaningful memory
> savings (the number of nodes is bounded by tree depth, not number of
> pages). We currently free the chain once after ginScanToDelete()
> returns in ginVacuumPostingTree(), which matches the natural lifetime
> boundary
>
> > Could limiting the maximum recursion level be useful?
>
> Posting-tree depth is naturally small; a hard cap seems to add failure
> risk with little practical benefit.
>
> > In the comment to myStackItem before ginScanToDelete(), it might be
> > worth adding that after processing all pages on the current level,
> > myStackItem is not needed anymore.
> >
> > > > Yes, it's not possible to have meDelete set for root, because
> > > > me->leftBuffer is always InvalidBuffer for the root. So the branch
> > > > handling meDelete case should better do Assert(!isRoot).
> > Looks like this additional Assert is not in patch v1.
> >
> > In the root call of ginScanToDelete(gvs, &root); we can add Assert
> > checking that its return result is false:
> > - ginScanToDelete(gvs, &root);
> > + deleted = ginScanToDelete(gvs, &root);
> > +. Assert(!deleted); /* Root page is never deleted */
>
> + 1, this is a good invariant check and improves readability
>
> One minor nit for the comment:
>
> The DataPageDeleteStack.buffer field comment says "valid only while
> recursing into children"
> this is true for internal pages, but for leaf pages the buffer is
> valid until the pageWasDeleted / leftBuffer decision. The validity
> window is actually "from when the caller sets it until the
> post-recursion cleanup."
Thank you for catching this. I decided to remove this statement from
v2. It's hard to explain the life cycle of the buffer clearly in one
sentence. On the other hand, it's explained in the comments of
ginScanPostingTreeToDelete().
------
Regards,
Alexander Korotkov
Supabase
On Thu, 12 Mar 2026 at 02:52, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>
> On Tue, Mar 10, 2026 at 11:19 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > On Fri, Mar 6, 2026 at 10:45 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> > >
> > > Hi, Andres and Alexander!
> > >
> > > On Sat, 21 Feb 2026 at 13:55, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > >
> > > > On Wed, Feb 4, 2026 at 12:32 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > > >
> > > > > On Tue, Feb 3, 2026 at 6:26 PM Andres Freund <andres@anarazel.de> wrote:
> > > > > >
> > > > > > While looking at converting more places to UnlockReleaseBuffer(), in the
> > > > > > course of making UnlockReleaseBuffer() faster than the two separate
> > > > > > operations, I found this code:
> > > > > >
> > > > > > static bool
> > > > > > ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
> > > > > > DataPageDeleteStack *parent, OffsetNumber myoff)
> > > > > > ...
> > > > > >
> > > > > > if (!meDelete)
> > > > > > {
> > > > > > if (BufferIsValid(me->leftBuffer))
> > > > > > UnlockReleaseBuffer(me->leftBuffer);
> > > > > > me->leftBuffer = buffer;
> > > > > > }
> > > > > > else
> > > > > > {
> > > > > > if (!isRoot)
> > > > > > LockBuffer(buffer, GIN_UNLOCK);
> > > > > >
> > > > > > ReleaseBuffer(buffer);
> > > > > > }
> > > > > >
> > > > > > if (isRoot)
> > > > > > ReleaseBuffer(buffer);
> > > > > >
> > > > > >
> > > > > > Which sure looks like it'd release buffer twice if isRoot is set? I guess
> > > > > > that's not reachable, because presumably the root page will always go down the
> > > > > > !meDelete path. But it sure made me wonder if there's a hard to reach bug.
> > > > >
> > > > > Yes, it's not possible to have meDelete set for root, because
> > > > > me->leftBuffer is always InvalidBuffer for the root. So the branch
> > > > > handling meDelete case should better do Assert(!isRoot).
> > > > > > This code was introduced in
> > > > > > commit e14641197a5
> > > > > > Author: Alexander Korotkov <akorotkov@postgresql.org>
> > > > > > Date: 2019-11-19 23:07:36 +0300
> > > > > >
> > > > > > Fix deadlock between ginDeletePage() and ginStepRight()
> > > > > >
> > > > > > I didn't trace it further to see if it existed before that in some fashion.
> > > > >
> > > > > Yes. I think generally this area needs to be reworked to become more
> > > > > clear, and have vast more comments. It was wrong from my side trying
> > > > > to fix bugs there without reworking it into something more
> > > > > appropriate. I'm planning to put work on this during this week.
> > > > >
> > > > > > There's another oddity here: ginScanToDelete() requires that the root page has
> > > > > > been locked by the caller already, but will afaict re-read the root page? But
> > > > > > then have code to avoid locking it again, because that would not have worked?
> > > > > > Seems odd.
> > > > >
> > > > >
> > > > > It seems a bit odd for me that caller already have locked buffer, but
> > > > > passes BlockNumber making us re-read the buffer. But I'm not sure
> > > > > that's the same as your point. Could you, please, elaborate more on
> > > > > this?
> > > >
> > > > Here is the refactoring patch. Sorry for the delay.
> > >
> > > Hi, Andres and Alexander!
> > >
> > > I've looked into the patch v1.
> > > Overall, it looks good to me.
> >
> > The refactor LGTM in general. The buffer-ownership rewrite looks
> > cleaner and safer overall.
> >
> > > Some thoughts:
> > >
> > > Is it worth/possible in recursive calls of ginScanToDelete() to free
> > > allocated myStackItem->child after processing all children of the
> > > current level, when they are not needed anymore?
> > > Previously to this patch, palloc-ed "me" variable also was't freed at
> > > recursion levels.
> >
> > I think freeing myStackItem->child inside recursive calls might not be
> > worthwhile here. That node is intentionally reused for subsequent
> > siblings at the same depth, and it carries state (leftBuffer) that can
> > still be needed until the level is fully processed.
> > Freeing/reallocating it per subtree would add churn and make the
> > lifetime rules harder to reason about without meaningful memory
> > savings (the number of nodes is bounded by tree depth, not number of
> > pages). We currently free the chain once after ginScanToDelete()
> > returns in ginVacuumPostingTree(), which matches the natural lifetime
> > boundary
> >
> > > Could limiting the maximum recursion level be useful?
> >
> > Posting-tree depth is naturally small; a hard cap seems to add failure
> > risk with little practical benefit.
> >
> > > In the comment to myStackItem before ginScanToDelete(), it might be
> > > worth adding that after processing all pages on the current level,
> > > myStackItem is not needed anymore.
> > >
> > > > > Yes, it's not possible to have meDelete set for root, because
> > > > > me->leftBuffer is always InvalidBuffer for the root. So the branch
> > > > > handling meDelete case should better do Assert(!isRoot).
> > > Looks like this additional Assert is not in patch v1.
> > >
> > > In the root call of ginScanToDelete(gvs, &root); we can add Assert
> > > checking that its return result is false:
> > > - ginScanToDelete(gvs, &root);
> > > + deleted = ginScanToDelete(gvs, &root);
> > > +. Assert(!deleted); /* Root page is never deleted */
> >
> > + 1, this is a good invariant check and improves readability
> >
> > One minor nit for the comment:
> >
> > The DataPageDeleteStack.buffer field comment says "valid only while
> > recursing into children"
> > this is true for internal pages, but for leaf pages the buffer is
> > valid until the pageWasDeleted / leftBuffer decision. The validity
> > window is actually "from when the caller sets it until the
> > post-recursion cleanup."
>
> Thank you for catching this. I decided to remove this statement from
> v2. It's hard to explain the life cycle of the buffer clearly in one
> sentence. On the other hand, it's explained in the comments of
> ginScanPostingTreeToDelete().
Hi, Alexander!
Patch v2 looks good to me. I also agree with Xuneng that this
refactoring improved the logic to make it look clearer.
Thank you for the explanation of buffers lifetime!
Regards,
Pavel Borisov
Hi, Pavel!
On Thu, Mar 12, 2026 at 10:41 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>
> On Thu, 12 Mar 2026 at 02:52, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> >
> > On Tue, Mar 10, 2026 at 11:19 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
> > > On Fri, Mar 6, 2026 at 10:45 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
> > > >
> > > > Hi, Andres and Alexander!
> > > >
> > > > On Sat, 21 Feb 2026 at 13:55, Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > > >
> > > > > On Wed, Feb 4, 2026 at 12:32 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> > > > > >
> > > > > > On Tue, Feb 3, 2026 at 6:26 PM Andres Freund <andres@anarazel.de> wrote:
> > > > > > >
> > > > > > > While looking at converting more places to UnlockReleaseBuffer(), in the
> > > > > > > course of making UnlockReleaseBuffer() faster than the two separate
> > > > > > > operations, I found this code:
> > > > > > >
> > > > > > > static bool
> > > > > > > ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
> > > > > > > DataPageDeleteStack *parent, OffsetNumber myoff)
> > > > > > > ...
> > > > > > >
> > > > > > > if (!meDelete)
> > > > > > > {
> > > > > > > if (BufferIsValid(me->leftBuffer))
> > > > > > > UnlockReleaseBuffer(me->leftBuffer);
> > > > > > > me->leftBuffer = buffer;
> > > > > > > }
> > > > > > > else
> > > > > > > {
> > > > > > > if (!isRoot)
> > > > > > > LockBuffer(buffer, GIN_UNLOCK);
> > > > > > >
> > > > > > > ReleaseBuffer(buffer);
> > > > > > > }
> > > > > > >
> > > > > > > if (isRoot)
> > > > > > > ReleaseBuffer(buffer);
> > > > > > >
> > > > > > >
> > > > > > > Which sure looks like it'd release buffer twice if isRoot is set? I guess
> > > > > > > that's not reachable, because presumably the root page will always go down the
> > > > > > > !meDelete path. But it sure made me wonder if there's a hard to reach bug.
> > > > > >
> > > > > > Yes, it's not possible to have meDelete set for root, because
> > > > > > me->leftBuffer is always InvalidBuffer for the root. So the branch
> > > > > > handling meDelete case should better do Assert(!isRoot).
> > > > > > > This code was introduced in
> > > > > > > commit e14641197a5
> > > > > > > Author: Alexander Korotkov <akorotkov@postgresql.org>
> > > > > > > Date: 2019-11-19 23:07:36 +0300
> > > > > > >
> > > > > > > Fix deadlock between ginDeletePage() and ginStepRight()
> > > > > > >
> > > > > > > I didn't trace it further to see if it existed before that in some fashion.
> > > > > >
> > > > > > Yes. I think generally this area needs to be reworked to become more
> > > > > > clear, and have vast more comments. It was wrong from my side trying
> > > > > > to fix bugs there without reworking it into something more
> > > > > > appropriate. I'm planning to put work on this during this week.
> > > > > >
> > > > > > > There's another oddity here: ginScanToDelete() requires that the root page has
> > > > > > > been locked by the caller already, but will afaict re-read the root page? But
> > > > > > > then have code to avoid locking it again, because that would not have worked?
> > > > > > > Seems odd.
> > > > > >
> > > > > >
> > > > > > It seems a bit odd for me that caller already have locked buffer, but
> > > > > > passes BlockNumber making us re-read the buffer. But I'm not sure
> > > > > > that's the same as your point. Could you, please, elaborate more on
> > > > > > this?
> > > > >
> > > > > Here is the refactoring patch. Sorry for the delay.
> > > >
> > > > Hi, Andres and Alexander!
> > > >
> > > > I've looked into the patch v1.
> > > > Overall, it looks good to me.
> > >
> > > The refactor LGTM in general. The buffer-ownership rewrite looks
> > > cleaner and safer overall.
> > >
> > > > Some thoughts:
> > > >
> > > > Is it worth/possible in recursive calls of ginScanToDelete() to free
> > > > allocated myStackItem->child after processing all children of the
> > > > current level, when they are not needed anymore?
> > > > Previously to this patch, palloc-ed "me" variable also was't freed at
> > > > recursion levels.
> > >
> > > I think freeing myStackItem->child inside recursive calls might not be
> > > worthwhile here. That node is intentionally reused for subsequent
> > > siblings at the same depth, and it carries state (leftBuffer) that can
> > > still be needed until the level is fully processed.
> > > Freeing/reallocating it per subtree would add churn and make the
> > > lifetime rules harder to reason about without meaningful memory
> > > savings (the number of nodes is bounded by tree depth, not number of
> > > pages). We currently free the chain once after ginScanToDelete()
> > > returns in ginVacuumPostingTree(), which matches the natural lifetime
> > > boundary
> > >
> > > > Could limiting the maximum recursion level be useful?
> > >
> > > Posting-tree depth is naturally small; a hard cap seems to add failure
> > > risk with little practical benefit.
> > >
> > > > In the comment to myStackItem before ginScanToDelete(), it might be
> > > > worth adding that after processing all pages on the current level,
> > > > myStackItem is not needed anymore.
> > > >
> > > > > > Yes, it's not possible to have meDelete set for root, because
> > > > > > me->leftBuffer is always InvalidBuffer for the root. So the branch
> > > > > > handling meDelete case should better do Assert(!isRoot).
> > > > Looks like this additional Assert is not in patch v1.
> > > >
> > > > In the root call of ginScanToDelete(gvs, &root); we can add Assert
> > > > checking that its return result is false:
> > > > - ginScanToDelete(gvs, &root);
> > > > + deleted = ginScanToDelete(gvs, &root);
> > > > +. Assert(!deleted); /* Root page is never deleted */
> > >
> > > + 1, this is a good invariant check and improves readability
> > >
> > > One minor nit for the comment:
> > >
> > > The DataPageDeleteStack.buffer field comment says "valid only while
> > > recursing into children"
> > > this is true for internal pages, but for leaf pages the buffer is
> > > valid until the pageWasDeleted / leftBuffer decision. The validity
> > > window is actually "from when the caller sets it until the
> > > post-recursion cleanup."
> >
> > Thank you for catching this. I decided to remove this statement from
> > v2. It's hard to explain the life cycle of the buffer clearly in one
> > sentence. On the other hand, it's explained in the comments of
> > ginScanPostingTreeToDelete().
>
> Patch v2 looks good to me. I also agree with Xuneng that this
> refactoring improved the logic to make it look clearer.
> Thank you for the explanation of buffers lifetime!
Thank you for your feedback. I'll push the patch if no objections.
------
Regards,
Alexander Korotkov
Supabase
At 2026-03-12 17:05:51, "Alexander Korotkov" <aekorotkov@gmail.com> wrote:
>Hi, Pavel!
>
>On Thu, Mar 12, 2026 at 10:41 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>>
>> On Thu, 12 Mar 2026 at 02:52, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>> >
>> > On Tue, Mar 10, 2026 at 11:19 AM Xuneng Zhou <xunengzhou@gmail.com> wrote:
>> > > On Fri, Mar 6, 2026 at 10:45 PM Pavel Borisov <pashkin.elfe@gmail.com> wrote:
>> > > >
>> > > > Hi, Andres and Alexander!
>> > > >
>> > > > On Sat, 21 Feb 2026 at 13:55, Alexander Korotkov <aekorotkov@gmail.com> wrote:
>> > > > >
>> > > > > On Wed, Feb 4, 2026 at 12:32 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
>> > > > > >
>> > > > > > On Tue, Feb 3, 2026 at 6:26 PM Andres Freund <andres@anarazel.de> wrote:
>> > > > > > >
>> > > > > > > While looking at converting more places to UnlockReleaseBuffer(), in the
>> > > > > > > course of making UnlockReleaseBuffer() faster than the two separate
>> > > > > > > operations, I found this code:
>> > > > > > >
>> > > > > > > static bool
>> > > > > > > ginScanToDelete(GinVacuumState *gvs, BlockNumber blkno, bool isRoot,
>> > > > > > > DataPageDeleteStack *parent, OffsetNumber myoff)
>> > > > > > > ...
>> > > > > > >
>> > > > > > > if (!meDelete)
>> > > > > > > {
>> > > > > > > if (BufferIsValid(me->leftBuffer))
>> > > > > > > UnlockReleaseBuffer(me->leftBuffer);
>> > > > > > > me->leftBuffer = buffer;
>> > > > > > > }
>> > > > > > > else
>> > > > > > > {
>> > > > > > > if (!isRoot)
>> > > > > > > LockBuffer(buffer, GIN_UNLOCK);
>> > > > > > >
>> > > > > > > ReleaseBuffer(buffer);
>> > > > > > > }
>> > > > > > >
>> > > > > > > if (isRoot)
>> > > > > > > ReleaseBuffer(buffer);
>> > > > > > >
>> > > > > > >
>> > > > > > > Which sure looks like it'd release buffer twice if isRoot is set? I guess
>> > > > > > > that's not reachable, because presumably the root page will always go down the
>> > > > > > > !meDelete path. But it sure made me wonder if there's a hard to reach bug.
>> > > > > >
>> > > > > > Yes, it's not possible to have meDelete set for root, because
>> > > > > > me->leftBuffer is always InvalidBuffer for the root. So the branch
>> > > > > > handling meDelete case should better do Assert(!isRoot).
>> > > > > > > This code was introduced in
>> > > > > > > commit e14641197a5
>> > > > > > > Author: Alexander Korotkov <akorotkov@postgresql.org>
>> > > > > > > Date: 2019-11-19 23:07:36 +0300
>> > > > > > >
>> > > > > > > Fix deadlock between ginDeletePage() and ginStepRight()
>> > > > > > >
>> > > > > > > I didn't trace it further to see if it existed before that in some fashion.
>> > > > > >
>> > > > > > Yes. I think generally this area needs to be reworked to become more
>> > > > > > clear, and have vast more comments. It was wrong from my side trying
>> > > > > > to fix bugs there without reworking it into something more
>> > > > > > appropriate. I'm planning to put work on this during this week.
>> > > > > >
>> > > > > > > There's another oddity here: ginScanToDelete() requires that the root page has
>> > > > > > > been locked by the caller already, but will afaict re-read the root page? But
>> > > > > > > then have code to avoid locking it again, because that would not have worked?
>> > > > > > > Seems odd.
>> > > > > >
>> > > > > >
>> > > > > > It seems a bit odd for me that caller already have locked buffer, but
>> > > > > > passes BlockNumber making us re-read the buffer. But I'm not sure
>> > > > > > that's the same as your point. Could you, please, elaborate more on
>> > > > > > this?
>> > > > >
>> > > > > Here is the refactoring patch. Sorry for the delay.
>> > > >
>> > > > Hi, Andres and Alexander!
>> > > >
>> > > > I've looked into the patch v1.
>> > > > Overall, it looks good to me.
>> > >
>> > > The refactor LGTM in general. The buffer-ownership rewrite looks
>> > > cleaner and safer overall.
>> > >
>> > > > Some thoughts:
>> > > >
>> > > > Is it worth/possible in recursive calls of ginScanToDelete() to free
>> > > > allocated myStackItem->child after processing all children of the
>> > > > current level, when they are not needed anymore?
>> > > > Previously to this patch, palloc-ed "me" variable also was't freed at
>> > > > recursion levels.
>> > >
>> > > I think freeing myStackItem->child inside recursive calls might not be
>> > > worthwhile here. That node is intentionally reused for subsequent
>> > > siblings at the same depth, and it carries state (leftBuffer) that can
>> > > still be needed until the level is fully processed.
>> > > Freeing/reallocating it per subtree would add churn and make the
>> > > lifetime rules harder to reason about without meaningful memory
>> > > savings (the number of nodes is bounded by tree depth, not number of
>> > > pages). We currently free the chain once after ginScanToDelete()
>> > > returns in ginVacuumPostingTree(), which matches the natural lifetime
>> > > boundary
>> > >
>> > > > Could limiting the maximum recursion level be useful?
>> > >
>> > > Posting-tree depth is naturally small; a hard cap seems to add failure
>> > > risk with little practical benefit.
>> > >
>> > > > In the comment to myStackItem before ginScanToDelete(), it might be
>> > > > worth adding that after processing all pages on the current level,
>> > > > myStackItem is not needed anymore.
>> > > >
>> > > > > > Yes, it's not possible to have meDelete set for root, because
>> > > > > > me->leftBuffer is always InvalidBuffer for the root. So the branch
>> > > > > > handling meDelete case should better do Assert(!isRoot).
>> > > > Looks like this additional Assert is not in patch v1.
>> > > >
>> > > > In the root call of ginScanToDelete(gvs, &root); we can add Assert
>> > > > checking that its return result is false:
>> > > > - ginScanToDelete(gvs, &root);
>> > > > + deleted = ginScanToDelete(gvs, &root);
>> > > > +. Assert(!deleted); /* Root page is never deleted */
>> > >
>> > > + 1, this is a good invariant check and improves readability
>> > >
>> > > One minor nit for the comment:
>> > >
>> > > The DataPageDeleteStack.buffer field comment says "valid only while
>> > > recursing into children"
>> > > this is true for internal pages, but for leaf pages the buffer is
>> > > valid until the pageWasDeleted / leftBuffer decision. The validity
>> > > window is actually "from when the caller sets it until the
>> > > post-recursion cleanup."
>> >
>> > Thank you for catching this. I decided to remove this statement from
>> > v2. It's hard to explain the life cycle of the buffer clearly in one
>> > sentence. On the other hand, it's explained in the comments of
>> > ginScanPostingTreeToDelete().
>>
>> Patch v2 looks good to me. I also agree with Xuneng that this
>> refactoring improved the logic to make it look clearer.
>> Thank you for the explanation of buffers lifetime!
>
>Thank you for your feedback. I'll push the patch if no objections.
>
>------
>Regards,
>Alexander Korotkov
>Supabase
>
Hi,
Overall solid to me. I got a nitpick:in the code comments, ginDeletePage should be ginDeletePostingPag(ginDeletePage -> ginDeletePostingPage).There are 3 places that need to be revised.------
Regards,
jinbinge
On Thu, Mar 12, 2026 at 6:37 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > > On Thu, Mar 12, 2026 at 12:35 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > > > > On Thu, 12 Mar 2026 at 02:22, Alexander Korotkov <aekorotkov@gmail.com> wrote: > > > > > > On Tue, Mar 10, 2026 at 11:29 AM Pavel Borisov <pashkin.elfe@gmail.com> wrote: > > > > Hi, Xuneng > > > > > > > > > > Is it worth/possible in recursive calls of ginScanToDelete() to free > > > > > > allocated myStackItem->child after processing all children of the > > > > > > current level, when they are not needed anymore? > > > > > > Previously to this patch, palloc-ed "me" variable also was't freed at > > > > > > recursion levels. > > > > > > > > > > Freeing/reallocating it per subtree would add churn and make the > > > > > lifetime rules harder to reason about without meaningful memory > > > > > savings (the number of nodes is bounded by tree depth, not number of > > > > > pages). We currently free the chain once after ginScanToDelete() > > > > > returns in ginVacuumPostingTree(), which matches the natural lifetime > > > > > boundary > > > > I proposed not freeing child when child iteration is complete. They > > > > indeed can be reused. I proposed cleaning children when "my" iteration > > > > is complete. At that time all the children iterations are completed > > > > and not needed when we return level up. > > > This is not clear for me. We need stack items to keep track of left > > > pages until we scan the whole posting tree. After scanning the whole > > > posting tree we can free stack items as we do now. > > > > You are right, that we can free all posting tree stack items after the > > whole tree, as we do now. But I think we can also do it earlier. It > > looks like all "children" items are needed and could be reused only > > until iteration on "my" level ends. When function returns up the > > recursion "my" level becomes "child" for a caller, and previous > > "child" is not used anymore. > > No matter how many levels we can go up, we can still descend and need > the leftBuffer stored at any stack level. > Yeah, the important point is that a stack item here represents per-depth scan state, not just one recursive invocation. Returning from one subtree does not necessarily mean that depth is finished globally: the caller may move to a sibling and descend again, and that later descent can still need the child level's saved leftBuffer from the subtree we just finished. The stronger condition is that no more pages remain to be scanned to the right at that depth; the code already uses GinPageRightMost(page) for that when releasing the child level's leftBuffer. -- Best, Xuneng