Обсуждение: After 10 -> 15 upgrade getting "cannot commit while a portal is pinned" on one python function

Поиск
Список
Период
Сортировка

Greetings!

I built a trigger fired process that copies an "order" from our production database to our dev database.  An order, in this case, is an initial row from a table and all of the rows in all of the tables in that database/schema that are needed to satisfy all of the foreign key constraints for the original insert.  Through a web page, one of our folks can select a schema and an order id to copy.  That information is then inserted into a table.  A trigger attached to that table takes care of copying the necessary rows using a function that uses both plython3u and psycopg2.  I can supply the source code if that will help.

On postgresql 10 using plpython2, this function worked great.

After migration to 15 (now 15.5) and a switch to plpython3 (no code change needed inside the function) logging inside the function tells me that everything completes except the exit.

I then get this error:

NOTICE:  update cargotel_common.copy_orders_to_dev set copy_completed = 't', copy_completed_timestamp = clock_timestamp() where id = 21
ERROR:  cannot commit while a portal is pinned

What the heck?

I did find this error inside the source code.  This is from 15.6 source:

jross@workstation:~/postgresql-15.6$ grep -R -C20 "cannot commit while a portal is pinned" *
src/backend/utils/mmgr/portalmem.c- */
src/backend/utils/mmgr/portalmem.c-bool
src/backend/utils/mmgr/portalmem.c-PreCommit_Portals(bool isPrepare)
src/backend/utils/mmgr/portalmem.c-{
src/backend/utils/mmgr/portalmem.c-    bool        result = false;
src/backend/utils/mmgr/portalmem.c-    HASH_SEQ_STATUS status;
src/backend/utils/mmgr/portalmem.c-    PortalHashEnt *hentry;
src/backend/utils/mmgr/portalmem.c-
src/backend/utils/mmgr/portalmem.c-    hash_seq_init(&status, PortalHashTable);
src/backend/utils/mmgr/portalmem.c-
src/backend/utils/mmgr/portalmem.c-    while ((hentry = (PortalHashEnt *) hash_seq_search(&status)) != NULL)
src/backend/utils/mmgr/portalmem.c-    {
src/backend/utils/mmgr/portalmem.c-        Portal        portal = hentry->portal;
src/backend/utils/mmgr/portalmem.c-
src/backend/utils/mmgr/portalmem.c-        /*
src/backend/utils/mmgr/portalmem.c-         * There should be no pinned portals anymore. Complain if someone
src/backend/utils/mmgr/portalmem.c-         * leaked one. Auto-held portals are allowed; we assume that whoever
src/backend/utils/mmgr/portalmem.c-         * pinned them is managing them.
src/backend/utils/mmgr/portalmem.c-         */
src/backend/utils/mmgr/portalmem.c-        if (portal->portalPinned && !portal->autoHeld)
src/backend/utils/mmgr/portalmem.c:            elog(ERROR, "cannot commit while a portal is pinned");
src/backend/utils/mmgr/portalmem.c-
src/backend/utils/mmgr/portalmem.c-        /*
src/backend/utils/mmgr/portalmem.c-         * Do not touch active portals --- this can only happen in the case of
src/backend/utils/mmgr/portalmem.c-         * a multi-transaction utility command, such as VACUUM, or a commit in
src/backend/utils/mmgr/portalmem.c-         * a procedure.
src/backend/utils/mmgr/portalmem.c-         *
src/backend/utils/mmgr/portalmem.c-         * Note however that any resource owner attached to such a portal is
src/backend/utils/mmgr/portalmem.c-         * still going to go away, so don't leave a dangling pointer.  Also
src/backend/utils/mmgr/portalmem.c-         * unregister any snapshots held by the portal, mainly to avoid
src/backend/utils/mmgr/portalmem.c-         * snapshot leak warnings from ResourceOwnerRelease().
src/backend/utils/mmgr/portalmem.c-         */
src/backend/utils/mmgr/portalmem.c-        if (portal->status == PORTAL_ACTIVE)
src/backend/utils/mmgr/portalmem.c-        {
src/backend/utils/mmgr/portalmem.c-            if (portal->holdSnapshot)
src/backend/utils/mmgr/portalmem.c-            {
src/backend/utils/mmgr/portalmem.c-                if (portal->resowner)
src/backend/utils/mmgr/portalmem.c-                    UnregisterSnapshotFromOwner(portal->holdSnapshot,
src/backend/utils/mmgr/portalmem.c-                                                portal->resowner);
src/backend/utils/mmgr/portalmem.c-                portal->holdSnapshot = NULL;
src/backend/utils/mmgr/portalmem.c-            }

Do I have any idea of how to fix this after reading this bit of code?  No.

This error has been reported here before on December 2, 2022 and in other places as well.

https://www.postgresql.org/message-id/1061909348.200334.1669970706749%40mail.yahoo.com

No responses though to this message though.

In the code this:

src/backend/utils/mmgr/portalmem.c-        /*
src/backend/utils/mmgr/portalmem.c-         * There should be no pinned portals anymore. Complain if someone
src/backend/utils/mmgr/portalmem.c-         * leaked one. Auto-held portals are allowed; we assume that whoever
src/backend/utils/mmgr/portalmem.c-         * pinned them is managing them.
src/backend/utils/mmgr/portalmem.c-         */

makes me wonder if this error is specific to plpython3?

I can think of a way to maybe workaround this but this seems to be something that needs to be handled in a better way than slapping on a hacky bandaid.

Thanks for reading and for any suggestions, including clue-by-fours :-)

Jeff




On 3/20/24 15:18, Jeff Ross wrote:
> Greetings!
> 
> I built a trigger fired process that copies an "order" from our 
> production database to our dev database.  An order, in this case, is an 
> initial row from a table and all of the rows in all of the tables in 
> that database/schema that are needed to satisfy all of the foreign key 
> constraints for the original insert.  Through a web page, one of our 
> folks can select a schema and an order id to copy.  That information is 
> then inserted into a table.  A trigger attached to that table takes care 
> of copying the necessary rows using a function that uses both plython3u 
> and psycopg2.  I can supply the source code if that will help.

I think that will help, especially the interaction between psycopg2 and 
plpython3u.


-- 
Adrian Klaver
adrian.klaver@aklaver.com




Jeff Ross <jross@openvistas.net> writes:
> I then get this error:

> NOTICE:  update cargotel_common.copy_orders_to_dev set copy_completed = 
> 't', copy_completed_timestamp = clock_timestamp() where id = 21
> ERROR:  cannot commit while a portal is pinned

Would you mind supplying a self-contained example that triggers this?

            regards, tom lane



On 3/20/24 16:25, Adrian Klaver wrote:

> On 3/20/24 15:18, Jeff Ross wrote:
>> Greetings!
>>
>> I built a trigger fired process that copies an "order" from our 
>> production database to our dev database.  An order, in this case, is 
>> an initial row from a table and all of the rows in all of the tables 
>> in that database/schema that are needed to satisfy all of the foreign 
>> key constraints for the original insert. Through a web page, one of 
>> our folks can select a schema and an order id to copy.  That 
>> information is then inserted into a table.  A trigger attached to 
>> that table takes care of copying the necessary rows using a function 
>> that uses both plython3u and psycopg2.  I can supply the source code 
>> if that will help.
>
> I think that will help, especially the interaction between psycopg2 
> and plpython3u.
>
As requested:

https://openvistas.net/copy_orders_to_dev.html

Jeff




On 3/20/24 16:50, Tom Lane wrote:
> Jeff Ross <jross@openvistas.net> writes:
>> I then get this error:
>> NOTICE:  update cargotel_common.copy_orders_to_dev set copy_completed =
>> 't', copy_completed_timestamp = clock_timestamp() where id = 21
>> ERROR:  cannot commit while a portal is pinned
> Would you mind supplying a self-contained example that triggers this?
>
>             regards, tom lane
>
>
I'll see if I can up with something.

Thanks for the reply!

Jeff




On 3/20/24 15:52, Jeff Ross wrote:
> On 3/20/24 16:25, Adrian Klaver wrote:
> 
>> On 3/20/24 15:18, Jeff Ross wrote:
>>> Greetings!
>>>
>>> I built a trigger fired process that copies an "order" from our 
>>> production database to our dev database.  An order, in this case, is 
>>> an initial row from a table and all of the rows in all of the tables 
>>> in that database/schema that are needed to satisfy all of the foreign 
>>> key constraints for the original insert. Through a web page, one of 
>>> our folks can select a schema and an order id to copy.  That 
>>> information is then inserted into a table.  A trigger attached to 
>>> that table takes care of copying the necessary rows using a function 
>>> that uses both plython3u and psycopg2.  I can supply the source code 
>>> if that will help.
>>
>> I think that will help, especially the interaction between psycopg2 
>> and plpython3u.
>>
> As requested:
> 
> https://openvistas.net/copy_orders_to_dev.html

Haven't had a chance to go through this yet. I'm going to say though 
that Tom Lane is looking for a shorter generic case that anyone could 
run on their system.

> 
> Jeff
> 
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




Adrian Klaver <adrian.klaver@aklaver.com> writes:
> Haven't had a chance to go through this yet. I'm going to say though 
> that Tom Lane is looking for a shorter generic case that anyone could 
> run on their system.

Yeah, it's a long way from that trigger function definition to a
working (i.e. failing) example.  Shortening the trigger might help by
eliminating some parts of the infrastructure that would need to be
shown --- but nobody's going to try to reverse-engineer all that.

            regards, tom lane



On 3/20/24 15:52, Jeff Ross wrote:
> On 3/20/24 16:25, Adrian Klaver wrote:
> 
>> On 3/20/24 15:18, Jeff Ross wrote:
>>> Greetings!
>>>
>>> I built a trigger fired process that copies an "order" from our 
>>> production database to our dev database.  An order, in this case, is 
>>> an initial row from a table and all of the rows in all of the tables 
>>> in that database/schema that are needed to satisfy all of the foreign 
>>> key constraints for the original insert. Through a web page, one of 
>>> our folks can select a schema and an order id to copy.  That 
>>> information is then inserted into a table.  A trigger attached to 
>>> that table takes care of copying the necessary rows using a function 
>>> that uses both plython3u and psycopg2.  I can supply the source code 
>>> if that will help.
>>
>> I think that will help, especially the interaction between psycopg2 
>> and plpython3u.
>>
> As requested:
> 
> https://openvistas.net/copy_orders_to_dev.html

1) I have not gone through this thoroughly enough to figure out what is 
going on.

2) Things I have noticed, may not be relevant.

a) from psycopg2 import sql
    Never used.

b) #prod_database_connection.set_session(autocommit=True)
    #dev_database_connection.set_session(autocommit=True)

    Why are they commented out?

c) prod_database_connection_string = "host='pgbouncer' dbname='%s' 
application_name = '%s'"

dev_database_connection_string = "host='pgbouncer' dbname='%s' 
application_name = '%s'"

What version of PgBouncer?

How is it setup?

d) Why predefine all those cursors()?

e) Why is database global?

> 
> Jeff
> 
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com




On 3/20/24 17:04, Tom Lane wrote:

Adrian Klaver <adrian.klaver@aklaver.com> writes:
Haven't had a chance to go through this yet. I'm going to say though 
that Tom Lane is looking for a shorter generic case that anyone could 
run on their system.
Yeah, it's a long way from that trigger function definition to a
working (i.e. failing) example.  Shortening the trigger might help by
eliminating some parts of the infrastructure that would need to be
shown --- but nobody's going to try to reverse-engineer all that.
			regards, tom lane

It took some digging but I've found a very simple fix to this.

Somewhere (sorry, can't find it again) I read that a postgresql cursor is sometimes referenced as "portal".  This was when I was still pretty sure that this was a psycopg2 issue.

Further testing ruled that out--I wasn't getting the error on the psycopg2 commit statements, I was getting the error when the plpython3u function itself exits and tries to commit.

I only use one plpython3u cursor in that function.  The plpython docs say:

"Cursors are automatically disposed of. But if you want to explicitly release all resources held by a cursor, use the close method. Once closed, a cursor cannot be fetched from anymore."

https://www.postgresql.org/docs/15/plpython-database.html#id-1.8.11.14.3

Perhaps "pinned" in the error message means "open"?

I added a cursor.close() as the last line called in that function and it works again.

I haven't been able to come up with a test case that throws the same error, though, so I consider this a solution to what is very likely an odd corner case.

Jeff

Jeff Ross <jross@openvistas.net> writes:
> I only use one plpython3u cursor in that function.  The plpython docs say:

> "Cursors are automatically disposed of. But if you want to explicitly 
> release all resources held by a cursor, use the |close| method. Once 
> closed, a cursor cannot be fetched from anymore."

> https://www.postgresql.org/docs/15/plpython-database.html#id-1.8.11.14.3

> Perhaps "pinned" in the error message means "open"?

No, it means "pinned" ... but I see that plpython pins the portal
underlying any PLyCursor object it creates.  Most of our PLs do
that too, to prevent a portal from disappearing under them (e.g.
if you were to try to close the portal directly from SQL rather
than via whatever mechanism the PL wants you to use).

> I added a cursor.close() as the last line called in that function and it 
> works again.

It looks to me like PLy_cursor_close does pretty much exactly the same
cleanup as PLy_cursor_dealloc, including unpinning and closing the
underlying portal.  I'm far from a Python expert, but I suspect that
the docs you quote intend to say "cursors are disposed of when Python
garbage-collects them", and that the reason your code is failing is
that there's still a reference to the PLyCursor somewhere after the
plpython function exits, perhaps in a Python global variable.

            regards, tom lane



On 3/27/24 15:44, Tom Lane wrote:

Perhaps "pinned" in the error message means "open"?
No, it means "pinned" ... but I see that plpython pins the portal
underlying any PLyCursor object it creates.  Most of our PLs do
that too, to prevent a portal from disappearing under them (e.g.
if you were to try to close the portal directly from SQL rather
than via whatever mechanism the PL wants you to use).

I added a cursor.close() as the last line called in that function and it 
works again.
It looks to me like PLy_cursor_close does pretty much exactly the same
cleanup as PLy_cursor_dealloc, including unpinning and closing the
underlying portal.  I'm far from a Python expert, but I suspect that
the docs you quote intend to say "cursors are disposed of when Python
garbage-collects them", and that the reason your code is failing is
that there's still a reference to the PLyCursor somewhere after the
plpython function exits, perhaps in a Python global variable.
			regards, tom lane


Thank you for your reply, as always, Tom!

Debugging at this level might well be over my paygrade ;-)

I just happy that the function works again, and that I was able to share a solution to this apparently rare error with the community.

Jeff



On 3/27/24 17:05, Jeff Ross wrote:

On 3/27/24 15:44, Tom Lane wrote:

Perhaps "pinned" in the error message means "open"?
No, it means "pinned" ... but I see that plpython pins the portal
underlying any PLyCursor object it creates.  Most of our PLs do
that too, to prevent a portal from disappearing under them (e.g.
if you were to try to close the portal directly from SQL rather
than via whatever mechanism the PL wants you to use).

I added a cursor.close() as the last line called in that function and it 
works again.
It looks to me like PLy_cursor_close does pretty much exactly the same
cleanup as PLy_cursor_dealloc, including unpinning and closing the
underlying portal.  I'm far from a Python expert, but I suspect that
the docs you quote intend to say "cursors are disposed of when Python
garbage-collects them", and that the reason your code is failing is
that there's still a reference to the PLyCursor somewhere after the
plpython function exits, perhaps in a Python global variable.
			regards, tom lane


Thank you for your reply, as always, Tom!

Debugging at this level might well be over my paygrade ;-)

I just happy that the function works again, and that I was able to share a solution to this apparently rare error with the community.

Jeff

My read of Tom's reply suggests you still have work to do to find the other "reference" holding on to your cursor.
On 3/27/24 16:35, Rob Sargent wrote:
> 
> 
> On 3/27/24 17:05, Jeff Ross wrote:
>>
>> On 3/27/24 15:44, Tom Lane wrote:
>>
>>> Perhaps "pinned" in the error message means "open"?
>>> No, it means "pinned" ... but I see that plpython pins the portal
>>> underlying any PLyCursor object it creates.  Most of our PLs do
>>> that too, to prevent a portal from disappearing under them (e.g.
>>> if you were to try to close the portal directly from SQL rather
>>> than via whatever mechanism the PL wants you to use).
>>>
>>>> I added a cursor.close() as the last line called in that function and it
>>>> works again.
>>> It looks to me like PLy_cursor_close does pretty much exactly the same
>>> cleanup as PLy_cursor_dealloc, including unpinning and closing the
>>> underlying portal.  I'm far from a Python expert, but I suspect that
>>> the docs you quote intend to say "cursors are disposed of when Python
>>> garbage-collects them", and that the reason your code is failing is
>>> that there's still a reference to the PLyCursor somewhere after the
>>> plpython function exits, perhaps in a Python global variable.
>>>
>>>             regards, tom lane
>>>
>>>
>> Thank you for your reply, as always, Tom!
>>
>> Debugging at this level might well be over my paygrade ;-)
>>
>> I just happy that the function works again, and that I was able to 
>> share a solution to this apparently rare error with the community.
>>
>> Jeff
>>
> My read of Tom's reply suggests you still have work to do to find the 
> other "reference" holding on to your cursor.

I would start with:

def logging(comment):
     global database
     <...>

-- 
Adrian Klaver
adrian.klaver@aklaver.com




On 3/27/24 17:41, Adrian Klaver wrote:
> On 3/27/24 16:35, Rob Sargent wrote:
>>
>>
>> On 3/27/24 17:05, Jeff Ross wrote:
>>>
>>> On 3/27/24 15:44, Tom Lane wrote:
>>>
>>>> Perhaps "pinned" in the error message means "open"?
>>>> No, it means "pinned" ... but I see that plpython pins the portal
>>>> underlying any PLyCursor object it creates.  Most of our PLs do
>>>> that too, to prevent a portal from disappearing under them (e.g.
>>>> if you were to try to close the portal directly from SQL rather
>>>> than via whatever mechanism the PL wants you to use).
>>>>
>>>>> I added a cursor.close() as the last line called in that function 
>>>>> and it
>>>>> works again.
>>>> It looks to me like PLy_cursor_close does pretty much exactly the same
>>>> cleanup as PLy_cursor_dealloc, including unpinning and closing the
>>>> underlying portal.  I'm far from a Python expert, but I suspect that
>>>> the docs you quote intend to say "cursors are disposed of when Python
>>>> garbage-collects them", and that the reason your code is failing is
>>>> that there's still a reference to the PLyCursor somewhere after the
>>>> plpython function exits, perhaps in a Python global variable.
>>>>
>>>>             regards, tom lane
>>>>
>>>>
>>> Thank you for your reply, as always, Tom!
>>>
>>> Debugging at this level might well be over my paygrade ;-)
>>>
>>> I just happy that the function works again, and that I was able to 
>>> share a solution to this apparently rare error with the community.
>>>
>>> Jeff
>>>
>> My read of Tom's reply suggests you still have work to do to find the 
>> other "reference" holding on to your cursor.
>
> I would start with:
>
> def logging(comment):
>     global database
>     <...>
>
Already removed that--thanks, though.




On 3/27/24 17:35, Rob Sargent wrote:


On 3/27/24 17:05, Jeff Ross wrote:

On 3/27/24 15:44, Tom Lane wrote:

Perhaps "pinned" in the error message means "open"?
No, it means "pinned" ... but I see that plpython pins the portal
underlying any PLyCursor object it creates.  Most of our PLs do
that too, to prevent a portal from disappearing under them (e.g.
if you were to try to close the portal directly from SQL rather
than via whatever mechanism the PL wants you to use).

I added a cursor.close() as the last line called in that function and it 
works again.
It looks to me like PLy_cursor_close does pretty much exactly the same
cleanup as PLy_cursor_dealloc, including unpinning and closing the
underlying portal.  I'm far from a Python expert, but I suspect that
the docs you quote intend to say "cursors are disposed of when Python
garbage-collects them", and that the reason your code is failing is
that there's still a reference to the PLyCursor somewhere after the
plpython function exits, perhaps in a Python global variable.
			regards, tom lane


Thank you for your reply, as always, Tom!

Debugging at this level might well be over my paygrade ;-)

I just happy that the function works again, and that I was able to share a solution to this apparently rare error with the community.

Jeff

My read of Tom's reply suggests you still have work to do to find the other "reference" holding on to your cursor.

Yes, my read was the same.

There are exactly 3 references to that cursor now that I added the close() at the end. 

Here are the first 2 (cursor renamed from the code I posted):

        plpy_cursor = plpy.cursor(schemas_query)
        while True:
            schema_rows = plpy_cursor.fetch(100)

The last is:

        plpy_cursor.close()

I don't know how to proceed further.

On 3/27/24 18:00, Jeff Ross wrote:
> 
> On 3/27/24 17:35, Rob Sargent wrote:
>>
>>
>> On 3/27/24 17:05, Jeff Ross wrote:
>>>
>>> On 3/27/24 15:44, Tom Lane wrote:
>>>
>>>> Perhaps "pinned" in the error message means "open"?
>>>> No, it means "pinned" ... but I see that plpython pins the portal
>>>> underlying any PLyCursor object it creates.  Most of our PLs do
>>>> that too, to prevent a portal from disappearing under them (e.g.
>>>> if you were to try to close the portal directly from SQL rather
>>>> than via whatever mechanism the PL wants you to use).
>>>>
>>>>> I added a cursor.close() as the last line called in that function and it
>>>>> works again.
>>>> It looks to me like PLy_cursor_close does pretty much exactly the same
>>>> cleanup as PLy_cursor_dealloc, including unpinning and closing the
>>>> underlying portal.  I'm far from a Python expert, but I suspect that
>>>> the docs you quote intend to say "cursors are disposed of when Python
>>>> garbage-collects them", and that the reason your code is failing is
>>>> that there's still a reference to the PLyCursor somewhere after the
>>>> plpython function exits, perhaps in a Python global variable.
>>>>
>>>>             regards, tom lane
>>>>
>>>>
>>> Thank you for your reply, as always, Tom!
>>>
>>> Debugging at this level might well be over my paygrade ;-)
>>>
>>> I just happy that the function works again, and that I was able to 
>>> share a solution to this apparently rare error with the community.
>>>
>>> Jeff
>>>
>> My read of Tom's reply suggests you still have work to do to find the 
>> other "reference" holding on to your cursor.
> 
> Yes, my read was the same.
> 
> There are exactly 3 references to that cursor now that I added the 
> close() at the end.
> 
> Here are the first 2 (cursor renamed from the code I posted):
> 
>          plpy_cursor = plpy.cursor(schemas_query)
>          while True:
>              schema_rows = plpy_cursor.fetch(100)

If the above is the complete while loop how you expect it to break out 
of the loop?

Or did you do per Postgres docs?:

https://www.postgresql.org/docs/current/plpython-database.html

cursor = plpy.cursor("select num from largetable")
while True:
     rows = cursor.fetch(batch_size)
     if not rows:
         break
     for row in rows:
         if row['num'] % 2:
             odd += 1


> 
> The last is:
> 
>          plpy_cursor.close()
> 
> I don't know how to proceed further.
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com