Обсуждение: Re: pgsql: Drop unnamed portal immediately after execution to completion
On Wed, Nov 5, 2025 at 12:43 AM Michael Paquier <michael@paquier.xyz> wrote: > Drop unnamed portal immediately after execution to completion This patch doesn't look well-considered to me. One problem is that it's a wire protocol change to fix a minor logging anomaly, which seems disproportionate. Another problem is that the new portal-drop behavior is conditional on whether XACT_FLAGS_NEEDIMMEDIATECOMMIT gets set, which seems unprincipled. In addition to those points, I am not entirely certain that the "here is no need for it beyond this point" comment is correct. I mean, I think it will normally be true, but what if the client wants to send a Describe message after-the-fact, or an additional Execute message that will presumably return zero rows? -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Nov 5, 2025 at 12:43 AM Michael Paquier <michael@paquier.xyz> wrote:
>> Drop unnamed portal immediately after execution to completion
> This patch doesn't look well-considered to me. One problem is that
> it's a wire protocol change to fix a minor logging anomaly, which
> seems disproportionate. Another problem is that the new portal-drop
> behavior is conditional on whether XACT_FLAGS_NEEDIMMEDIATECOMMIT gets
> set, which seems unprincipled. In addition to those points, I am not
> entirely certain that the "here is no need for it beyond this point"
> comment is correct. I mean, I think it will normally be true, but what
> if the client wants to send a Describe message after-the-fact, or an
> additional Execute message that will presumably return zero rows?
Yeah, the whole idea of changing the wire-level behavior to fix this
has been making me itch: I don't believe for a moment that it won't
cause compatibility problems. I do not have a better proposal for
fixing the problem though.
regards, tom lane
On Mon, Nov 10, 2025 at 04:28:02PM -0500, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> This patch doesn't look well-considered to me. One problem is that >> it's a wire protocol change to fix a minor logging anomaly, which >> seems disproportionate. Another problem is that the new portal-drop >> behavior is conditional on whether XACT_FLAGS_NEEDIMMEDIATECOMMIT gets >> set, which seems unprincipled. In addition to those points, I am not >> entirely certain that the "here is no need for it beyond this point" >> comment is correct. I mean, I think it will normally be true, but what >> if the client wants to send a Describe message after-the-fact, or an >> additional Execute message that will presumably return zero rows? > > Yeah, the whole idea of changing the wire-level behavior to fix this > has been making me itch: I don't believe for a moment that it won't > cause compatibility problems. > > I do not have a better proposal for fixing the problem though. All the other solutions mentioned mean to work around the issue of the unnamed portal still being present by maintaining the query string it in a different context across multiple messages. And I doubt that anybody would be thrilled by that. If you think that we should continue to live with the protocol for unnamed portals as-is and continue to live with the existing behavior, meaning a revert of 1fd981f05369, that would not be the end of the world here: we'd still have some tests that track the past-and-currently-released behavior. Even if strange, it works with the timing where the unnamed protocol is dropped. Perhaps somebody would be able to come up with a approach better than a more aggressive portal drop once completed, but I don't quite see how much can be done as long as we manipulate the unnamed portals this way (aka drop then with a follow-up bind, and expect the logging to show the correct statements attached to a previous message that we have no knowledge about). The original statement logging did not really consider all these fancier cases with the extended query protocol. -- Michael
Вложения
On Mon, Nov 10, 2025 at 6:31 PM Michael Paquier <michael@paquier.xyz> wrote: > All the other solutions mentioned mean to work around the issue of the > unnamed portal still being present by maintaining the query string it > in a different context across multiple messages. And I doubt that > anybody would be thrilled by that. Why not? I mean, I haven't studied this problem and I don't know how complicated that would be, but it isn't obvious to me that it would be wildly impractical. We would need to pass around pointers to the query string, and make sure that the string itself isn't freed before all the pointers are gone, but maybe there's a reasonable way to do that. I would definitely rather do that than change the wire protocol. If that isn't practical, another option might be to just suppress the context in places where we know it can be misleading, rather than displaying a misleading context. That would still require us to know in which situations that the context might be misleading, which would likely merit some careful thought, but it would at least avoid us needing to know the query string upon which we wanted to place blame. Of course, this approach would produce less useful output, but it would still be better than showing wrong information. (I see that Tom previously suggested exactly this approach.) To me, this is a classic example of why programming with global variables is often a bad idea. If we lived in a world where a backend could only ever do one thing at a time, then consulting a global variable to find out what it's currently doing would be fine, but we have this whole system of portals and prepared statements precisely so that a backend can do more than one thing at a time, so the right thing to do is pass around pointers to the appropriate context object -- a Portal, or whatever -- to all the code that needs that information. If the existing context object, a Portal or whatever, doesn't have a long enough lifetime to still be available at all points where we need that information, then we should either make the existing type of context object live longer or invent a separate kind of context object that with a different lifespan that is suited to the remaining problems. Given how old and crufty the Portal machinery seems to be, the latter seems like the likely winner. > If you think that we should continue to live with the protocol for > unnamed portals as-is and continue to live with the existing behavior, > meaning a revert of 1fd981f05369, that would not be the end of the > world here: we'd still have some tests that track the > past-and-currently-released behavior. Yes, I'd argue for a revert. I think the risk of subtle breakage is high, and there is a good chance that it will be too late to pull this back by the time we realize what the problem is. I also feel that it's solving the problem in the wrong way. To me, this solution feels like saying "doing proper bookkeeping is hard, so let's redefine the contract with the user instead." But doing proper bookkeeping is a big part of what good software engineering is all about. We've put an enormous amount of energy into our error reporting and a lot of that energy has gone into making sure that the details that would be useful to the user are available in the parts of the code that need those details. There's lots and lots of code already getting query strings to be available at places where we might want to complain about them, and very often we also pass through a parse location as well, so that we can complain about a specific part of a query string. The fact that the query string that shows up in this case is wrong or misleading doesn't make that the wrong approach; it just means we haven't totally solved the problem yet. > Even if strange, it works with the timing where the unnamed protocol > is dropped. Perhaps somebody would be able to come up with a approach > better than a more aggressive portal drop once completed, but I don't > quite see how much can be done as long as we manipulate the unnamed > portals this way (aka drop then with a follow-up bind, and expect the > logging to show the correct statements attached to a previous > message that we have no knowledge about). The original statement > logging did not really consider all these fancier cases with the > extended query protocol. I have a really hard time with the idea that the problem here is with when the unnamed portal is dropped. The reasoning behind the existing wire protocol specification is a little unclear to me, but it seems like a perfectly elegant concept: you keep the unnamed portal around until something creates a new unnamed portal. I like that definition because it's simply and devoid of ambiguity, so other software interacting with PostgreSQL knows exactly what to expect. When you start to make the rules more complicated, that makes it harder for other software that speaks our wire protocol to be fully correct, especially if it must deal with both an old rule and a new one depending on the version of PostgreSQL to which it's connected. I think you could make an argument here that the problem is with when the temporary file cleanup is done (perhaps it should happen sooner, when the previous query is still in context) or that the temporary file cleanup should be passed some appropriate context so that it knows who to blame for error or that the temporary file cleanup can't really attribute any errors to a specific statement and thus shouldn't mention one at all, but I don't think blaming the unnamed portal for this is the right idea. Our portal management code seems to me to be full of deficiencies, but the wire protocol specification itself seems to be fine, so I think we should try to do a better job implementing the protocol, rather than changing it. -- Robert Haas EDB: http://www.enterprisedb.com
Hi,
I understand the point that’s been raised. Would it be an option to indeed revert the portal drop in postgres.c, but then append the right query in the "temporary file: ..." log line instead? This would work at least for me.
Attached is a POC patch (contains a layering violation, hoping it could be avoided somehow).
Kind regards,
Mircea Cadariu
Вложения
On Wed, Nov 12, 2025 at 11:14:26AM +0000, Mircea Cadariu wrote: > I understand the point that’s been raised. Would it be an option to indeed > revert the portal drop in postgres.c, but then append the right query in the > "temporary file: ..." log line instead? This would work at least for me. This is new, attaching the information to a Vfd in fd.c. Not sure that adding this information to this structure is a good concept. This layer of the code has no idea of query strings currently, so that feels a bit like a layer violation. Thinking a bit outside the box.. I was wondering about a plan D (plan A is what's on HEAD, plan B is copying around the query string, plan C this Vfd approach) where we shut down the executor when we have finished the execution of an unnamed portal, with something like the attached based on a more aggressive PortalCleanup(). However, that would not fly far if the client decides to send an extra execute message post-portal-completion where we'd still want the executor to be around, even if there are no rows to process. We presumably do not need the temp files anymore at this stage, but I don't really like the fact that we'd need to somehow take a shortcut if we want only to clean up the temp files. Perhaps the best answer for now is to revert and continue the discussion for this cycle as there seem to be little love for the current HEAD's solution with the protocol change. If folks have more ideas or input, please feel free. -- Michael
Вложения
On 13/11/2025 06:27, Michael Paquier wrote: > This is new, attaching the information to a Vfd in fd.c. Not sure > that adding this information to this structure is a good concept. > This layer of the code has no idea of query strings currently, so that > feels a bit like a layer violation. Thanks for having a look! FWIW I found a way for Plan C to work without including tcoprot into fd, see attached. There's a new field in that structure indeed, but maybe not that far fetched, it's the query that triggered the creation of the file. Kind regards, Mircea Cadariu
Вложения
> Thinking a bit outside the box.. I was wondering about a plan D (plan
> A is what's on HEAD, plan B is copying around the query string, plan C
> this Vfd approach) where we shut down the executor when we have
> finished the execution of an unnamed portal, with something like the
> attached based on a more aggressive PortalCleanup().
I am not sure I like this idea as-is, because besides that fact
that it's still a wire level change, it's not safe at all to
re-enter exec_execute_message after you just cleaned the
portal but did not drop it.
if (!PortalIsValid(portal)) will tell you the portal is still valid, but its
resources, like queryDesc and others are no longer available.
You can actually see what happens there with this handy
extended.py that communicates directly over the wire-protocol.
See the "back-to-back execute" print.
It results in
```
TRAP: failed Assert("queryDesc || portal->holdStore"), File:
"../src/backend/tcop/pquery.c", Line: 875, PID: 32358
0 postgres 0x00000001028ce52c
ExceptionalCondition + 108
1 postgres 0x00000001027a2470 PortalRunMulti + 0
2 postgres 0x00000001027a1fc0 PortalRun + 492
3 postgres 0x000000010279ff54 PostgresMain + 8896
4 postgres 0x0000000102799bbc BackendInitialize + 0
5 postgres 0x00000001026f1264
postmaster_child_launch + 364
6 postgres 0x00000001026f588c ServerLoop + 5840
7 postgres 0x00000001026f3800
InitProcessGlobals + 0
8 postgres 0x0000000102641564 help + 0
9 dyld 0x00000001930d6b98 start + 6076
```
This idea could perhaps work, but needs more thought.
> Perhaps the best answer for now is to revert and continue the
> discussion for this cycle as there seem to be little love for the
> current HEAD's solution with the protocol change.
>
> If folks have more ideas or input, please feel free.
That is probably the best idea now.
> This is new, attaching the information to a Vfd in fd.c. Not sure
> that adding this information to this structure is a good concept.
> This layer of the code has no idea of query strings currently, so that
> feels a bit like a layer violation.
Thanks for having a look! FWIW I found a way for Plan C to work without
including tcoprot into fd, see attached.
There's a new field in that structure indeed, but maybe not that far
fetched, it's the query that triggered the creation of the file.
To Michael's point, this looks like a layering violation. Besides,
I think this will double log, although I did not test, because you log
the statement once when closing the temp and once when logging
the STATEMENT.
One thing I am still not so sure about is we currently say that things
like the query_string will out live the portal, so I am still not clear what
is the risk of copying the query_string to debug_query_string during
PortalDrop?
```
/*
* We don't have to copy anything into the portal, because everything
* we are passing here is in MessageContext or the
* per_parsetree_context, and so will outlive the portal anyway.
*/
PortalDefineQuery(portal,
NULL,
query_string,
commandTag,
plantree_list,
NULL);
```
--
Sami Imseih
Amazon Web Services (AWS)
Вложения
On Thu, Nov 13, 2025 at 04:02:24PM -0600, Sami Imseih wrote: > I am not sure I like this idea as-is, because besides that fact > that it's still a wire level change, it's not safe at all to > re-enter exec_execute_message after you just cleaned the > portal but did not drop it. This will come at the cost of tracking a new state in the backend where the portal would still live with the executor state gone, I don't like much my own plan D and the road where this leads at the end. > That is probably the best idea now. I have just done a revert of 1fd981f05369 now, let's continue the discussions. -- Michael
Вложения
Hi Michael, I think I found a gap in the tests we added previously for documenting the current behaviour. See attached patch for your consideration. What's interesting about the holdable cursors scenario is that as far as I can tell the temp files are cleaned up during PersistHoldablePortal instead of PortalDrop. Kind regards, Mircea Cadariu
Вложения
On Fri, Nov 14, 2025 at 03:10:32PM +0000, Mircea Cadariu wrote: > What's interesting about the holdable cursors scenario is that as far as I > can tell the temp files are cleaned up during PersistHoldablePortal instead > of PortalDrop. As part of the ExecutorEnd() done in this case. Right, it looks like a good thing to track this behavior as well in the long-run. I'll double-check that later. -- Michael