Обсуждение: PL/Python initialization cleanup
As I was working through steps to make PL/Python more thread-safe, I noticed that the initialization code of PL/Python is pretty messy. I think some of this has grown while both Python 2 and 3 were supported, because they required different initialization steps, and we had some defenses against accidentally running both at the same time. But that is over, and right now a lot of this doesn't make sense anymore. For example, the function PLy_init_interp() said "Initialize the Python interpreter ..." but it didn't actually do this, and PLy_init_plpy() said "initialize plpy module" but it didn't do that either (or at least they used the term "initialize" in non-standard ways). Here are some patches to clean this up. After this change, all the global initialization is called directly from _PG_init(), and the plpy module initialization is all called from its registered initialization function PyInit_plpy(). (For the thread-safety job, the plpy module initialization will need to be rewritten using a different API. That's why I'm keen to have it clearly separated.) I also tried to add more comments and make existing comments more precise. There was also some apparently obsolete or redundant code that could be deleted. Surely, all of this will need some more rounds of careful scrutiny, but I think the overall code arrangement is correct and an improvement.
Вложения
> On Dec 31, 2025, at 16:47, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> As I was working through steps to make PL/Python more thread-safe, I noticed that the initialization code of
PL/Pythonis pretty messy. I think some of this has grown while both Python 2 and 3 were supported, because they
requireddifferent initialization steps, and we had some defenses against accidentally running both at the same time.
Butthat is over, and right now a lot of this doesn't make sense anymore. For example, the function PLy_init_interp()
said"Initialize the Python interpreter ..." but it didn't actually do this, and PLy_init_plpy() said "initialize plpy
module"but it didn't do that either (or at least they used the term "initialize" in non-standard ways).
>
> Here are some patches to clean this up. After this change, all the global initialization is called directly from
_PG_init(),and the plpy module initialization is all called from its registered initialization function PyInit_plpy().
(Forthe thread-safety job, the plpy module initialization will need to be rewritten using a different API. That's why
I'mkeen to have it clearly separated.) I also tried to add more comments and make existing comments more precise.
Therewas also some apparently obsolete or redundant code that could be deleted.
>
> Surely, all of this will need some more rounds of careful scrutiny, but I think the overall code arrangement is
correctand an improvement.
>
<v1-0001-plpython-Remove-commented-out-code.patch><v1-0002-plpython-Clean-up-PyModule_AddObject-uses.patch><v1-0003-plpython-Remove-duplicate-PyModule_Create.patch><v1-0004-plpython-Streamline-initialization.patch>
I just did an eyeball review. Overall looks good to me. The cleanup, as explained in the patch email, makes sense to
me.Only a nit comment on 0002:
1 - 0002
```
+ if (PyModule_AddObject(mod, modname, exc) < 0)
+ {
+ Py_XDECREF(exc);
+ PLy_elog(ERROR, "could not add exceptions %s", name);
+ }
```
Plural “exceptions” is a little confusing. What about “could not add exception object”?
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On Wed Dec 31, 2025 at 5:47 AM -03, Peter Eisentraut wrote:
> As I was working through steps to make PL/Python more thread-safe, I
> noticed that the initialization code of PL/Python is pretty messy. I
> think some of this has grown while both Python 2 and 3 were supported,
> because they required different initialization steps, and we had some
> defenses against accidentally running both at the same time. But that
> is over, and right now a lot of this doesn't make sense anymore. For
> example, the function PLy_init_interp() said "Initialize the Python
> interpreter ..." but it didn't actually do this, and PLy_init_plpy()
> said "initialize plpy module" but it didn't do that either (or at least
> they used the term "initialize" in non-standard ways).
>
> Here are some patches to clean this up. After this change, all the
> global initialization is called directly from _PG_init(), and the plpy
> module initialization is all called from its registered initialization
> function PyInit_plpy(). (For the thread-safety job, the plpy module
> initialization will need to be rewritten using a different API. That's
> why I'm keen to have it clearly separated.) I also tried to add more
> comments and make existing comments more precise. There was also some
> apparently obsolete or redundant code that could be deleted.
>
> Surely, all of this will need some more rounds of careful scrutiny, but
> I think the overall code arrangement is correct and an improvement.
0001, 0003 and 0004 looks good to me, just a small comment on 0002:
- /*
- * PyModule_AddObject does not add a refcount to the object, for some odd
- * reason; we must do that.
- */
- Py_INCREF(exc);
- PyModule_AddObject(mod, modname, exc);
-
/*
* The caller will also store a pointer to the exception object in some
- * permanent variable, so add another ref to account for that. This is
- * probably excessively paranoid, but let's be sure.
+ * permanent variable, so add another ref to account for that.
*/
Py_INCREF(exc);
The later code comment say "so add another ref to account for that", but
you've removed the previous Py_INCREF() call. The returned object
PyErr_NewException() already have a refcount increased for usage? If
that's not the case I think that the "add another ref..." don't seems
correct because IIUC we are increasing the ref count for the first time,
so there is no "another" refcount being increased.
--
Matheus Alcantara
EDB: https://www.enterprisedb.com
On 05.01.26 16:08, Matheus Alcantara wrote: > On Wed Dec 31, 2025 at 5:47 AM -03, Peter Eisentraut wrote: >> As I was working through steps to make PL/Python more thread-safe, I >> noticed that the initialization code of PL/Python is pretty messy. I >> think some of this has grown while both Python 2 and 3 were supported, >> because they required different initialization steps, and we had some >> defenses against accidentally running both at the same time. But that >> is over, and right now a lot of this doesn't make sense anymore. For >> example, the function PLy_init_interp() said "Initialize the Python >> interpreter ..." but it didn't actually do this, and PLy_init_plpy() >> said "initialize plpy module" but it didn't do that either (or at least >> they used the term "initialize" in non-standard ways). >> >> Here are some patches to clean this up. After this change, all the >> global initialization is called directly from _PG_init(), and the plpy >> module initialization is all called from its registered initialization >> function PyInit_plpy(). (For the thread-safety job, the plpy module >> initialization will need to be rewritten using a different API. That's >> why I'm keen to have it clearly separated.) I also tried to add more >> comments and make existing comments more precise. There was also some >> apparently obsolete or redundant code that could be deleted. >> >> Surely, all of this will need some more rounds of careful scrutiny, but >> I think the overall code arrangement is correct and an improvement. > > 0001, 0003 and 0004 looks good to me, just a small comment on 0002: > > - /* > - * PyModule_AddObject does not add a refcount to the object, for some odd > - * reason; we must do that. > - */ > - Py_INCREF(exc); > - PyModule_AddObject(mod, modname, exc); > - > /* > * The caller will also store a pointer to the exception object in some > - * permanent variable, so add another ref to account for that. This is > - * probably excessively paranoid, but let's be sure. > + * permanent variable, so add another ref to account for that. > */ > Py_INCREF(exc); > > The later code comment say "so add another ref to account for that", but > you've removed the previous Py_INCREF() call. The returned object > PyErr_NewException() already have a refcount increased for usage? If > that's not the case I think that the "add another ref..." don't seems > correct because IIUC we are increasing the ref count for the first time, > so there is no "another" refcount being increased. The reference created by PyErr_NewException() is "stolen" by PyModule_AddObject(), so we need to create another one for returning the object from the function and storing it in the permanent variable. I have updated the comment in this new patch version. But I think the actual code is correct.
Вложения
On 01.01.26 00:34, Chao Li wrote:
>
>
>> On Dec 31, 2025, at 16:47, Peter Eisentraut <peter@eisentraut.org> wrote:
>>
>> As I was working through steps to make PL/Python more thread-safe, I noticed that the initialization code of
PL/Pythonis pretty messy. I think some of this has grown while both Python 2 and 3 were supported, because they
requireddifferent initialization steps, and we had some defenses against accidentally running both at the same time.
Butthat is over, and right now a lot of this doesn't make sense anymore. For example, the function PLy_init_interp()
said"Initialize the Python interpreter ..." but it didn't actually do this, and PLy_init_plpy() said "initialize plpy
module"but it didn't do that either (or at least they used the term "initialize" in non-standard ways).
>>
>> Here are some patches to clean this up. After this change, all the global initialization is called directly from
_PG_init(),and the plpy module initialization is all called from its registered initialization function PyInit_plpy().
(Forthe thread-safety job, the plpy module initialization will need to be rewritten using a different API. That's why
I'mkeen to have it clearly separated.) I also tried to add more comments and make existing comments more precise.
Therewas also some apparently obsolete or redundant code that could be deleted.
>>
>> Surely, all of this will need some more rounds of careful scrutiny, but I think the overall code arrangement is
correctand an improvement.
>>
<v1-0001-plpython-Remove-commented-out-code.patch><v1-0002-plpython-Clean-up-PyModule_AddObject-uses.patch><v1-0003-plpython-Remove-duplicate-PyModule_Create.patch><v1-0004-plpython-Streamline-initialization.patch>
>
> I just did an eyeball review. Overall looks good to me. The cleanup, as explained in the patch email, makes sense to
me.Only a nit comment on 0002:
>
> 1 - 0002
> ```
> + if (PyModule_AddObject(mod, modname, exc) < 0)
> + {
> + Py_XDECREF(exc);
> + PLy_elog(ERROR, "could not add exceptions %s", name);
> + }
> ```
>
> Plural “exceptions” is a little confusing. What about “could not add exception object”?
Thanks, I have fixed this in the v2 patch (sent in a separate message).
On 12/01/26 09:24, Peter Eisentraut wrote: >> 0001, 0003 and 0004 looks good to me, just a small comment on 0002: >> >> - /* >> - * PyModule_AddObject does not add a refcount to the object, >> for some odd >> - * reason; we must do that. >> - */ >> - Py_INCREF(exc); >> - PyModule_AddObject(mod, modname, exc); >> - >> /* >> * The caller will also store a pointer to the exception >> object in some >> - * permanent variable, so add another ref to account for that. >> This is >> - * probably excessively paranoid, but let's be sure. >> + * permanent variable, so add another ref to account for that. >> */ >> Py_INCREF(exc); >> >> The later code comment say "so add another ref to account for that", >> but >> you've removed the previous Py_INCREF() call. The returned object >> PyErr_NewException() already have a refcount increased for usage? If >> that's not the case I think that the "add another ref..." don't seems >> correct because IIUC we are increasing the ref count for the first >> time, >> so there is no "another" refcount being increased. > > The reference created by PyErr_NewException() is "stolen" by > PyModule_AddObject(), so we need to create another one for returning > the object from the function and storing it in the permanent > variable. I have updated the comment in this new patch version. But > I think the actual code is correct. Thanks, it looks more clear IMHO now. Agree that the code is correct. -- Matheus Alcantara EDB: https://www.enterprisedb.com
> On 12/01/26 09:24, Peter Eisentraut wrote: > >> 0001, 0003 and 0004 looks good to me, just a small comment on 0002: > >> > >> - /* > >> - * PyModule_AddObject does not add a refcount to the object, for > >> some odd > >> - * reason; we must do that. > >> - */ > >> - Py_INCREF(exc); > >> - PyModule_AddObject(mod, modname, exc); > >> - > >> /* > >> * The caller will also store a pointer to the exception object > >> in some > >> - * permanent variable, so add another ref to account for that. > >> This is > >> - * probably excessively paranoid, but let's be sure. > >> + * permanent variable, so add another ref to account for that. > >> */ > >> Py_INCREF(exc); > >> > >> The later code comment say "so add another ref to account for that", > >> but you've removed the previous Py_INCREF() call. The returned object > >> PyErr_NewException() already have a refcount increased for usage? If > >> that's not the case I think that the "add another ref..." don't seems > >> correct because IIUC we are increasing the ref count for the first > >> time, so there is no "another" refcount being increased. > > > > The reference created by PyErr_NewException() is "stolen" by > > PyModule_AddObject(), so we need to create another one for returning > > the object from the function and storing it in the permanent variable. > > I have updated the comment in this new patch version. But I think the > > actual code is correct. > > Thanks, it looks more clear IMHO now. Agree that the code is correct. > > > -- > Matheus Alcantara > EDB: https://www.enterprisedb.com Hi Peter, I have applied and reviewed the v2 patch set. The cleanup of the initialization flow is a very good improvement and makesthe logic much easier to follow. In particular, the updated comments in PLy_create_exception regarding the reference counting logic are very helpful and clarifythe previous ambiguity. I have one minor nitpick in plpy_main.c regarding consistency. In _PG_init(), the import and dictionary insertion of the"plpy" module is currently split into two checks with the same error message. To make it more consistent with how the__main__ module is handled earlier in the same function, we could potentially streamline it like this: /* * Import plpy. */ plpy_mod = PyImport_ImportModule("plpy"); if (plpy_mod == NULL || PyDict_SetItemString(main_dict, "plpy", plpy_mod) < 0) PLy_elog(ERROR, "could not import \"plpy\" module"); This is just a small suggestion for style consistency; the existing logic in v2 is perfectly correct. Regards, Yuan Li(carol)
On Mon, 12 Jan 2026 at 17:25, Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 01.01.26 00:34, Chao Li wrote:
> >
> >
> >> On Dec 31, 2025, at 16:47, Peter Eisentraut <peter@eisentraut.org> wrote:
> >>
> >> As I was working through steps to make PL/Python more thread-safe, I noticed that the initialization code of
PL/Pythonis pretty messy. I think some of this has grown while both Python 2 and 3 were supported, because they
requireddifferent initialization steps, and we had some defenses against accidentally running both at the same time.
Butthat is over, and right now a lot of this doesn't make sense anymore. For example, the function PLy_init_interp()
said"Initialize the Python interpreter ..." but it didn't actually do this, and PLy_init_plpy() said "initialize plpy
module"but it didn't do that either (or at least they used the term "initialize" in non-standard ways).
> >>
> >> Here are some patches to clean this up. After this change, all the global initialization is called directly from
_PG_init(),and the plpy module initialization is all called from its registered initialization function PyInit_plpy().
(Forthe thread-safety job, the plpy module initialization will need to be rewritten using a different API. That's why
I'mkeen to have it clearly separated.) I also tried to add more comments and make existing comments more precise.
Therewas also some apparently obsolete or redundant code that could be deleted.
> >>
> >> Surely, all of this will need some more rounds of careful scrutiny, but I think the overall code arrangement is
correctand an improvement.
> >>
<v1-0001-plpython-Remove-commented-out-code.patch><v1-0002-plpython-Clean-up-PyModule_AddObject-uses.patch><v1-0003-plpython-Remove-duplicate-PyModule_Create.patch><v1-0004-plpython-Streamline-initialization.patch>
> >
> > I just did an eyeball review. Overall looks good to me. The cleanup, as explained in the patch email, makes sense
tome. Only a nit comment on 0002:
> >
> > 1 - 0002
> > ```
> > + if (PyModule_AddObject(mod, modname, exc) < 0)
> > + {
> > + Py_XDECREF(exc);
> > + PLy_elog(ERROR, "could not add exceptions %s", name);
> > + }
> > ```
> >
> > Plural “exceptions” is a little confusing. What about “could not add exception object”?
>
> Thanks, I have fixed this in the v2 patch (sent in a separate message).
>
hi!
0001, 0002, 0003, are ready, LGTM.
For 0004, do we need main_dict at all? it is only used inside _PG_init
and then its value assigned to PLy_interp_globals...
--
Best regards,
Kirill Reshke
On 13.01.26 07:15, li carol wrote:
> I have applied and reviewed the v2 patch set. The cleanup of the initialization flow is a very good improvement and
makesthe logic much easier to follow.
> In particular, the updated comments in PLy_create_exception regarding the reference counting logic are very helpful
andclarify the previous ambiguity.
> I have one minor nitpick in plpy_main.c regarding consistency. In _PG_init(), the import and dictionary insertion of
the"plpy" module is currently split into two checks with the same error message. To make it more consistent with how
the__main__ module is handled earlier in the same function, we could potentially streamline it like this:
> /*
> * Import plpy.
> */
> plpy_mod = PyImport_ImportModule("plpy");
> if (plpy_mod == NULL || PyDict_SetItemString(main_dict, "plpy", plpy_mod) < 0)
> PLy_elog(ERROR, "could not import \"plpy\" module");
>
> This is just a small suggestion for style consistency; the existing logic in v2 is perfectly correct.
Thank you for this suggestion. I agree that having the same error
message for two different situations was not good. I changed the error
checking of PyDict_SetItemString() like you suggested here, but also
gave it a separate PLy_elog() call (without a message, since it's pretty
much "can't happen").
I committed the patches with this change.
On 14.01.26 22:03, Kirill Reshke wrote: > For 0004, do we need main_dict at all? it is only used inside _PG_init > and then its value assigned to PLy_interp_globals... This was merely a stylistic, defensive choice. I prefer to keep main_dict in a local variable while it is being assembled, and only put it into a global variable once it's done. This reduces the chance of accidentally having a partially constructed object globally visible. Another point is that main_dict is a borrowed reference, but when we put it into the global variable we need to add another reference. Having two separate variables makes this step more explicit.