Обсуждение: Re: pgsql: Drop unnamed portal immediately after execution to completion

Поиск
Список
Период
Сортировка

Re: pgsql: Drop unnamed portal immediately after execution to completion

От
Robert Haas
Дата:
On Wed, Nov 5, 2025 at 12:43 AM Michael Paquier <michael@paquier.xyz> wrote:
> Drop unnamed portal immediately after execution to completion

This patch doesn't look well-considered to me. One problem is that
it's a wire protocol change to fix a minor logging anomaly, which
seems disproportionate. Another problem is that the new portal-drop
behavior is conditional on whether XACT_FLAGS_NEEDIMMEDIATECOMMIT gets
set, which seems unprincipled. In addition to those points, I am not
entirely certain that the "here is no need for it beyond this point"
comment is correct. I mean, I think it will normally be true, but what
if the client wants to send a Describe message after-the-fact, or an
additional Execute message that will presumably return zero rows?

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pgsql: Drop unnamed portal immediately after execution to completion

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Nov 5, 2025 at 12:43 AM Michael Paquier <michael@paquier.xyz> wrote:
>> Drop unnamed portal immediately after execution to completion

> This patch doesn't look well-considered to me. One problem is that
> it's a wire protocol change to fix a minor logging anomaly, which
> seems disproportionate. Another problem is that the new portal-drop
> behavior is conditional on whether XACT_FLAGS_NEEDIMMEDIATECOMMIT gets
> set, which seems unprincipled. In addition to those points, I am not
> entirely certain that the "here is no need for it beyond this point"
> comment is correct. I mean, I think it will normally be true, but what
> if the client wants to send a Describe message after-the-fact, or an
> additional Execute message that will presumably return zero rows?

Yeah, the whole idea of changing the wire-level behavior to fix this
has been making me itch: I don't believe for a moment that it won't
cause compatibility problems.  I do not have a better proposal for
fixing the problem though.

            regards, tom lane



Re: pgsql: Drop unnamed portal immediately after execution to completion

От
Michael Paquier
Дата:
On Mon, Nov 10, 2025 at 04:28:02PM -0500, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> This patch doesn't look well-considered to me. One problem is that
>> it's a wire protocol change to fix a minor logging anomaly, which
>> seems disproportionate. Another problem is that the new portal-drop
>> behavior is conditional on whether XACT_FLAGS_NEEDIMMEDIATECOMMIT gets
>> set, which seems unprincipled. In addition to those points, I am not
>> entirely certain that the "here is no need for it beyond this point"
>> comment is correct. I mean, I think it will normally be true, but what
>> if the client wants to send a Describe message after-the-fact, or an
>> additional Execute message that will presumably return zero rows?
>
> Yeah, the whole idea of changing the wire-level behavior to fix this
> has been making me itch: I don't believe for a moment that it won't
> cause compatibility problems.
>
> I do not have a better proposal for fixing the problem though.

All the other solutions mentioned mean to work around the issue of the
unnamed portal still being present by maintaining the query string it
in a different context across multiple messages.  And I doubt that
anybody would be thrilled by that.

If you think that we should continue to live with the protocol for
unnamed portals as-is and continue to live with the existing behavior,
meaning a revert of 1fd981f05369, that would not be the end of the
world here: we'd still have some tests that track the
past-and-currently-released behavior.

Even if strange, it works with the timing where the unnamed protocol
is dropped.  Perhaps somebody would be able to come up with a approach
better than a more aggressive portal drop once completed, but I don't
quite see how much can be done as long as we manipulate the unnamed
portals this way (aka drop then with a follow-up bind, and expect the
logging to show the correct statements attached to a previous
message that we have no knowledge about).  The original statement
logging did not really consider all these fancier cases with the
extended query protocol.
--
Michael

Вложения

Re: pgsql: Drop unnamed portal immediately after execution to completion

От
Robert Haas
Дата:
On Mon, Nov 10, 2025 at 6:31 PM Michael Paquier <michael@paquier.xyz> wrote:
> All the other solutions mentioned mean to work around the issue of the
> unnamed portal still being present by maintaining the query string it
> in a different context across multiple messages.  And I doubt that
> anybody would be thrilled by that.

Why not?

I mean, I haven't studied this problem and I don't know how
complicated that would be, but it isn't obvious to me that it would be
wildly impractical. We would need to pass around pointers to the query
string, and make sure that the string itself isn't freed before all
the pointers are gone, but maybe there's a reasonable way to do that.
I would definitely rather do that than change the wire protocol. If
that isn't practical, another option might be to just suppress the
context in places where we know it can be misleading, rather than
displaying a misleading context. That would still require us to know
in which situations that the context might be misleading, which would
likely merit some careful thought, but it would at least avoid us
needing to know the query string upon which we wanted to place blame.
Of course, this approach would produce less useful output, but it
would still be better than showing wrong information. (I see that Tom
previously suggested exactly this approach.)

To me, this is a classic example of why programming with global
variables is often a bad idea. If we lived in a world where a backend
could only ever do one thing at a time, then consulting a global
variable to find out what it's currently doing would be fine, but we
have this whole system of portals and prepared statements precisely so
that a backend can do more than one thing at a time, so the right
thing to do is pass around pointers to the appropriate context object
-- a Portal, or whatever -- to all the code that needs that
information. If the existing context object, a Portal or whatever,
doesn't have a long enough lifetime to still be available at all
points where we need that information, then we should either make the
existing type of context object live longer or invent a separate kind
of context object that with a different lifespan that is suited to the
remaining problems. Given how old and crufty the Portal machinery
seems to be, the latter seems like the likely winner.

> If you think that we should continue to live with the protocol for
> unnamed portals as-is and continue to live with the existing behavior,
> meaning a revert of 1fd981f05369, that would not be the end of the
> world here: we'd still have some tests that track the
> past-and-currently-released behavior.

Yes, I'd argue for a revert. I think the risk of subtle breakage is
high, and there is a good chance that it will be too late to pull this
back by the time we realize what the problem is. I also feel that it's
solving the problem in the wrong way. To me, this solution feels like
saying "doing proper bookkeeping is hard, so let's redefine the
contract with the user instead." But doing proper bookkeeping is a big
part of what good software engineering is all about. We've put an
enormous amount of energy into our error reporting and a lot of that
energy has gone into making sure that the details that would be useful
to the user are available in the parts of the code that need those
details. There's lots and lots of code already getting query strings
to be available at places where we might want to complain about them,
and very often we also pass through a parse location as well, so that
we can complain about a specific part of a query string. The fact that
the query string that shows up in this case is wrong or misleading
doesn't make that the wrong approach; it just means we haven't totally
solved the problem yet.

> Even if strange, it works with the timing where the unnamed protocol
> is dropped.  Perhaps somebody would be able to come up with a approach
> better than a more aggressive portal drop once completed, but I don't
> quite see how much can be done as long as we manipulate the unnamed
> portals this way (aka drop then with a follow-up bind, and expect the
> logging to show the correct statements attached to a previous
> message that we have no knowledge about).  The original statement
> logging did not really consider all these fancier cases with the
> extended query protocol.

I have a really hard time with the idea that the problem here is with
when the unnamed portal is dropped. The reasoning behind the existing
wire protocol specification is a little unclear to me, but it seems
like a perfectly elegant concept: you keep the unnamed portal around
until something creates a new unnamed portal. I like that definition
because it's simply and devoid of ambiguity, so other software
interacting with PostgreSQL knows exactly what to expect. When you
start to make the rules more complicated, that makes it harder for
other software that speaks our wire protocol to be fully correct,
especially if it must deal with both an old rule and a new one
depending on the version of PostgreSQL to which it's connected. I
think you could make an argument here that the problem is with when
the temporary file cleanup is done (perhaps it should happen sooner,
when the previous query is still in context) or that the temporary
file cleanup should be passed some appropriate context so that it
knows who to blame for error or that the temporary file cleanup can't
really attribute any errors to a specific statement and thus shouldn't
mention one at all, but I don't think blaming the unnamed portal for
this is the right idea. Our portal management code seems to me to be
full of deficiencies, but the wire protocol specification itself seems
to be fine, so I think we should try to do a better job implementing
the protocol, rather than changing it.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: pgsql: Drop unnamed portal immediately after execution to completion

От
Mircea Cadariu
Дата:

Hi,

I understand the point that’s been raised. Would it be an option to indeed revert the portal drop in postgres.c, but then append the right query in the "temporary file: ..." log line instead? This would work at least for me.

Attached is a POC patch (contains a layering violation, hoping it could be avoided somehow).

Kind regards,

Mircea Cadariu

Вложения

Re: pgsql: Drop unnamed portal immediately after execution to completion

От
Michael Paquier
Дата:
On Wed, Nov 12, 2025 at 11:14:26AM +0000, Mircea Cadariu wrote:
> I understand the point that’s been raised. Would it be an option to indeed
> revert the portal drop in postgres.c, but then append the right query in the
> "temporary file: ..." log line instead? This would work at least for me.

This is new, attaching the information to a Vfd in fd.c.  Not sure
that adding this information to this structure is a good concept.
This layer of the code has no idea of query strings currently, so that
feels a bit like a layer violation.

Thinking a bit outside the box..  I was wondering about a plan D (plan
A is what's on HEAD, plan B is copying around the query string, plan C
this Vfd approach) where we shut down the executor when we have
finished the execution of an unnamed portal, with something like the
attached based on a more aggressive PortalCleanup().  However, that
would not fly far if the client decides to send an extra execute
message post-portal-completion where we'd still want the executor to
be around, even if there are no rows to process.  We presumably do not
need the temp files anymore at this stage, but I don't really like the
fact that we'd need to somehow take a shortcut if we want only to
clean up the temp files.

Perhaps the best answer for now is to revert and continue the
discussion for this cycle as there seem to be little love for the
current HEAD's solution with the protocol change.

If folks have more ideas or input, please feel free.
--
Michael

Вложения

Re: pgsql: Drop unnamed portal immediately after execution to completion

От
Mircea Cadariu
Дата:
On 13/11/2025 06:27, Michael Paquier wrote:

> This is new, attaching the information to a Vfd in fd.c.  Not sure
> that adding this information to this structure is a good concept.
> This layer of the code has no idea of query strings currently, so that
> feels a bit like a layer violation.

Thanks for having a look! FWIW I found a way for Plan C to work without 
including tcoprot into fd, see attached.

There's a new field in that structure indeed, but maybe not that far 
fetched, it's the query that triggered the creation of the file.

Kind regards,

Mircea Cadariu

Вложения

Re: pgsql: Drop unnamed portal immediately after execution to completion

От
Sami Imseih
Дата:
> Thinking a bit outside the box..  I was wondering about a plan D (plan
> A is what's on HEAD, plan B is copying around the query string, plan C
> this Vfd approach) where we shut down the executor when we have
> finished the execution of an unnamed portal, with something like the
> attached based on a more aggressive PortalCleanup().

I am not sure I like this idea as-is, because besides that fact
that it's still a wire level change, it's not safe at all to
re-enter exec_execute_message after you just cleaned the
portal but did not drop it.

if (!PortalIsValid(portal)) will tell you the portal is still valid, but its
resources, like queryDesc and others are no longer available.
You can actually see what happens there with this handy
 extended.py that communicates directly over the wire-protocol.
See the "back-to-back execute" print.

It results in
```
TRAP: failed Assert("queryDesc || portal->holdStore"), File:
"../src/backend/tcop/pquery.c", Line: 875, PID: 32358
0   postgres                            0x00000001028ce52c
ExceptionalCondition + 108
1   postgres                            0x00000001027a2470 PortalRunMulti + 0
2   postgres                            0x00000001027a1fc0 PortalRun + 492
3   postgres                            0x000000010279ff54 PostgresMain + 8896
4   postgres                            0x0000000102799bbc BackendInitialize + 0
5   postgres                            0x00000001026f1264
postmaster_child_launch + 364
6   postgres                            0x00000001026f588c ServerLoop + 5840
7   postgres                            0x00000001026f3800
InitProcessGlobals + 0
8   postgres                            0x0000000102641564 help + 0
9   dyld                                0x00000001930d6b98 start + 6076
```

This idea could perhaps work, but needs more thought.

> Perhaps the best answer for now is to revert and continue the
> discussion for this cycle as there seem to be little love for the
> current HEAD's solution with the protocol change.
>
> If folks have more ideas or input, please feel free.

That is probably the best idea now.

> This is new, attaching the information to a Vfd in fd.c.  Not sure
> that adding this information to this structure is a good concept.
> This layer of the code has no idea of query strings currently, so that
> feels a bit like a layer violation.

Thanks for having a look! FWIW I found a way for Plan C to work without
including tcoprot into fd, see attached.

There's a new field in that structure indeed, but maybe not that far
fetched, it's the query that triggered the creation of the file.

To Michael's point, this looks like a layering violation. Besides,
I think this will double log, although I did not test, because you log
the statement once when closing the temp and once when logging
the STATEMENT.

One thing I am still not so sure about is we currently say that things
like the query_string will out live the portal, so I am still not clear what
is the risk of copying the query_string to debug_query_string during
PortalDrop?

```
/*
* We don't have to copy anything into the portal, because everything
* we are passing here is in MessageContext or the
* per_parsetree_context, and so will outlive the portal anyway.
*/
PortalDefineQuery(portal,
NULL,
query_string,
commandTag,
plantree_list,
NULL);
```

--
Sami Imseih
Amazon Web Services (AWS)

Вложения

Re: pgsql: Drop unnamed portal immediately after execution to completion

От
Michael Paquier
Дата:
On Thu, Nov 13, 2025 at 04:02:24PM -0600, Sami Imseih wrote:
> I am not sure I like this idea as-is, because besides that fact
> that it's still a wire level change, it's not safe at all to
> re-enter exec_execute_message after you just cleaned the
> portal but did not drop it.

This will come at the cost of tracking a new state in the backend
where the portal would still live with the executor state gone, I
don't like much my own plan D and the road where this leads at the
end.

> That is probably the best idea now.

I have just done a revert of 1fd981f05369 now, let's continue the
discussions.
--
Michael

Вложения

Re: pgsql: Drop unnamed portal immediately after execution to completion

От
Mircea Cadariu
Дата:
Hi Michael,

I think I found a gap in the tests we added previously for documenting 
the current behaviour. See attached patch for your consideration.

What's interesting about the holdable cursors scenario is that as far as 
I can tell the temp files are cleaned up during PersistHoldablePortal 
instead of PortalDrop.


Kind regards,

Mircea Cadariu


Вложения

Re: pgsql: Drop unnamed portal immediately after execution to completion

От
Michael Paquier
Дата:
On Fri, Nov 14, 2025 at 03:10:32PM +0000, Mircea Cadariu wrote:
> What's interesting about the holdable cursors scenario is that as far as I
> can tell the temp files are cleaned up during PersistHoldablePortal instead
> of PortalDrop.

As part of the ExecutorEnd() done in this case.  Right, it looks like
a good thing to track this behavior as well in the long-run.  I'll
double-check that later.
--
Michael

Вложения