Thread: PL/Python fails on new NetBSD/PPC 8.0 install

PL/Python fails on new NetBSD/PPC 8.0 install

From: Tom Lane
Date:
I noticed that the old NetBSD 5.1.5 installation I had on my G4 Mac
was no longer passing our regression tests, because it has a strtof()
that is sloppy about underflow.  Rather than fight with that I decided
to update it to something shinier (well, as shiny as you can get on
hardware that's old enough to apply for a driver's license).  I stuck in
NetBSD/macppc 8.0, and things seem to work, except that PL/Python
crashes on launch.  I see something like this in the postmaster log:

Traceback (most recent call last):
  File "<frozen importlib._bootstrap>", line 1162, in _install_external_importers
  File "<frozen importlib._bootstrap>", line 980, in _find_and_load
  File "<frozen importlib._bootstrap>", line 149, in __enter__
  File "<frozen importlib._bootstrap>", line 84, in acquire
RuntimeError: no current thread ident
Fatal Python error: initexternalimport: external importer setup failed

Current thread 0xffffffff (most recent call first):
2019-06-18 17:40:59.629 EDT [20764] LOG:  server process (PID 23714) was terminated by signal 6: Abort trap
2019-06-18 17:40:59.629 EDT [20764] DETAIL:  Failed process was running: CREATE FUNCTION stupid() RETURNS text AS 'return "zarkon"' LANGUAGE plpython3u;

and a stack trace like

#0  0xfddd383c in _lwp_kill () from /usr/lib/libc.so.12
#1  0xfddd3800 in raise () from /usr/lib/libc.so.12
#2  0xfddd2e38 in abort () from /usr/lib/libc.so.12
#3  0xf4c371dc in fatal_error () from /usr/pkg/lib/libpython3.7.so.1.0
#4  0xf4c38370 in _Py_FatalInitError () from /usr/pkg/lib/libpython3.7.so.1.0
#5  0xf4c38f7c in Py_InitializeEx () from /usr/pkg/lib/libpython3.7.so.1.0
#6  0xf4c38fc0 in Py_Initialize () from /usr/pkg/lib/libpython3.7.so.1.0
#7  0xfdc8d548 in PLy_initialize () at plpy_main.c:135
#8  0xfdc8da0c in plpython3_validator (fcinfo=<optimized out>)
    at plpy_main.c:192
#9  0x01d4a904 in FunctionCall1Coll (flinfo=0xffffd608,
    collation=<optimized out>, arg1=<optimized out>) at fmgr.c:1140
#10 0x01d4b03c in OidFunctionCall1Coll (functionId=functionId@entry=16464,
    collation=collation@entry=0, arg1=arg1@entry=32774) at fmgr.c:1418
#11 0x0196a9d0 in ProcedureCreate (
    procedureName=procedureName@entry=0xfdb0aac0 "transaction_test1",
    procNamespace=procNamespace@entry=2200, replace=replace@entry=false,
    returnsSet=returnsSet@entry=false, returnType=returnType@entry=2278,
    proowner=10, languageObjectId=languageObjectId@entry=16465,
    languageValidator=languageValidator@entry=16464,
    prosrc=prosrc@entry=0xfdb0abf8 "\nfor i in range(0, 10):\n    plpy.execute(\"INSERT INTO test1 (a) VALUES (%d)\" % i)\n    if i % 2 == 0:\n        plpy.commit()\n    else:\n        plpy.rollback()\n", probin=probin@entry=0x0,
...

The "no current thread ident" error rings some vague bells, but I could
not find any previous discussion matching that in our archives.

This is with today's HEAD of our code and the python37-3.7.1 package from
NetBSD 8.0.

Any ideas?  I'm not so wedded to PL/Python that I'll spend a lot of time
making it go on this old box ... but seeing that 3.7 is still pretty
bleeding-edge Python, I wonder if other people will start getting this
too.

            regards, tom lane



Re: PL/Python fails on new NetBSD/PPC 8.0 install

From: Tom Lane
Date:
Awhile back I wrote:
> I noticed that the old NetBSD 5.1.5 installation I had on my G4 Mac
> was no longer passing our regression tests, because it has a strtof()
> that is sloppy about underflow.  Rather than fight with that I decided
> to update it to something shinier (well, as shiny as you can get on
> hardware that's old enough to apply for a driver's license).  I stuck in
> NetBSD/macppc 8.0, and things seem to work, except that PL/Python
> crashes on launch.  I see something like this in the postmaster log:

> Traceback (most recent call last):
>   File "<frozen importlib._bootstrap>", line 1162, in _install_external_importers
>   File "<frozen importlib._bootstrap>", line 980, in _find_and_load
>   File "<frozen importlib._bootstrap>", line 149, in __enter__
>   File "<frozen importlib._bootstrap>", line 84, in acquire
> RuntimeError: no current thread ident
> Fatal Python error: initexternalimport: external importer setup failed
>
> Current thread 0xffffffff (most recent call first):
> 2019-06-18 17:40:59.629 EDT [20764] LOG:  server process (PID 23714) was terminated by signal 6: Abort trap
> 2019-06-18 17:40:59.629 EDT [20764] DETAIL:  Failed process was running: CREATE FUNCTION stupid() RETURNS text AS 'return "zarkon"' LANGUAGE plpython3u;

So ... I just got this identical failure on NetBSD 8.1 on a shiny
new Intel NUC box.  So that removes the excuse of old unsupported
hardware, and leaves us with the conclusion that PL/Python is
flat out broken on recent NetBSD.

This is with today's HEAD of our code and the python37-3.7.4/amd64
package from NetBSD 8.1.

BTW, the only somewhat-modern NetBSD machine in our buildfarm is
coypu, which is running NetBSD/macppc 8.0 ... but what it is testing
PL/Python against is python 2.7.15, so the fact that it doesn't
fail can probably be explained as a python 2 vs python 3 thing.

            regards, tom lane



Re: PL/Python fails on new NetBSD/PPC 8.0 install

From: Benjamin Scherrey
Date:
None of the output provides any clue to me, but I do know that Python 3.7 has some issues with a lot of versions of OpenSSL, stemming from a disagreement between the developers of the two projects. This was a problem for me when trying to build Python 3.7 on my Kubuntu 14.04 system. I've seen this issue reported across all targets for Python, including FreeBSD, so I expect it's likely to also happen for NetBSD.

Could this be related to the problem?

On Mon, Oct 28, 2019, 8:12 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Awhile back I wrote:
> > I noticed that the old NetBSD 5.1.5 installation I had on my G4 Mac
> > was no longer passing our regression tests, because it has a strtof()
> > that is sloppy about underflow.  Rather than fight with that I decided
> > to update it to something shinier (well, as shiny as you can get on
> > hardware that's old enough to apply for a driver's license).  I stuck in
> > NetBSD/macppc 8.0, and things seem to work, except that PL/Python
> > crashes on launch.  I see something like this in the postmaster log:
>
> > Traceback (most recent call last):
> >   File "<frozen importlib._bootstrap>", line 1162, in _install_external_importers
> >   File "<frozen importlib._bootstrap>", line 980, in _find_and_load
> >   File "<frozen importlib._bootstrap>", line 149, in __enter__
> >   File "<frozen importlib._bootstrap>", line 84, in acquire
> > RuntimeError: no current thread ident
> > Fatal Python error: initexternalimport: external importer setup failed
> >
> > Current thread 0xffffffff (most recent call first):
> > 2019-06-18 17:40:59.629 EDT [20764] LOG:  server process (PID 23714) was terminated by signal 6: Abort trap
> > 2019-06-18 17:40:59.629 EDT [20764] DETAIL:  Failed process was running: CREATE FUNCTION stupid() RETURNS text AS 'return "zarkon"' LANGUAGE plpython3u;
>
> So ... I just got this identical failure on NetBSD 8.1 on a shiny
> new Intel NUC box.  So that removes the excuse of old unsupported
> hardware, and leaves us with the conclusion that PL/Python is
> flat out broken on recent NetBSD.
>
> This is with today's HEAD of our code and the python37-3.7.4/amd64
> package from NetBSD 8.1.
>
> BTW, the only somewhat-modern NetBSD machine in our buildfarm is
> coypu, which is running NetBSD/macppc 8.0 ... but what it is testing
> PL/Python against is python 2.7.15, so the fact that it doesn't
> fail can probably be explained as a python 2 vs python 3 thing.
>
>             regards, tom lane


Re: PL/Python fails on new NetBSD/PPC 8.0 install

From: Tom Lane
Date:
Benjamin Scherrey <scherrey@proteus-tech.com> writes:
> None of the output provides any clue to me, but I do know that Python 3.7
> has some issues with a lot of versions of OpenSSL, stemming from a
> disagreement between the developers of the two projects. This was a problem
> for me when trying to build Python 3.7 on my Kubuntu 14.04 system. I've seen
> this issue reported across all targets for Python, including FreeBSD, so I
> expect it's likely to also happen for NetBSD.

Thanks for looking!  It doesn't seem to be related to this issue though.
I've now tracked this problem down, and what I'm finding is that:

1. The proximate cause of the crash is that pthread_self() is
returning ((pthread_t) -1), which Python interprets as a hard
failure.  Now on the one hand, I wonder why Python is even
checking for a failure, given that POSIX is totally clear that
there are no failures:

    The pthread_self() function shall always be successful and no
    return value is reserved to indicate an error.

"Shall" does not allow wiggle room.  But on the other hand,
pthread_t is a pointer on this platform, so that's a pretty
strange value to be returning if it's valid.

And on the third hand, NetBSD's own man page for pthread_self()
doesn't admit the possibility of failure either, though it does
suggest that you should link with -lpthread [1].

2. Testing pthread_self() standalone on this platform provides
illuminating results:

$ cat test.c
#include <stdio.h>
#include <pthread.h>

int main(void)
{
  pthread_t id = pthread_self();

  printf("self = %p\n", (void *) id);
  return 0;
}
$ gcc test.c
$ ./a.out
self = 0xffffffffffffffff
$ gcc test.c -lpthread
$ ./a.out
self = 0x754ae5a2b800

3. libpython.so on this platform has a dependency on libpthread,
but we don't link the postgres executable to libpthread.  I surmise
that pthread_self() actually exists in core libc, but what it returns
is only valid if libpthread was linked into the main executable so
that it could initialize some static state at execution start.

4. If I add -lpthread to the LIBS for the main postgres executable,
PL/Python starts passing its regression tests.  I haven't finished
a complete check-world run, but at least the core regression tests
show no ill effects from doing this.
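The test program in step 2 prints the handle with %p, which works only because pthread_t happens to be a pointer on this platform. A somewhat more portable probe is sketched below, assuming that inspecting the raw bytes of the handle is good enough for a diagnostic. It classifies whatever pthread_self() returns as all zero bits, all one bits (how ((pthread_t) -1) is stored on the usual two's-complement, flat-address ABIs, and what the first run in step 2 printed), or anything else. The names classify_handle() and describe_self() are hypothetical, and padding bytes in a struct-typed pthread_t could confuse the check.

```c
#include <pthread.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical diagnostic: classify a pthread_t by its object
 * representation only, so nothing is assumed about whether pthread_t
 * is an integer, a pointer, or a struct. */
enum handle_shape { HANDLE_ALL_ZERO, HANDLE_ALL_ONES, HANDLE_OTHER };

/* Returns 1 if every byte in p[0..n) equals v. */
static int bytes_all(const unsigned char *p, size_t n, unsigned char v)
{
    for (size_t i = 0; i < n; i++)
        if (p[i] != v)
            return 0;
    return 1;
}

enum handle_shape classify_handle(pthread_t id)
{
    unsigned char bytes[sizeof id];

    /* Copy the handle's bytes instead of casting it to an integer. */
    memcpy(bytes, &id, sizeof id);
    if (bytes_all(bytes, sizeof bytes, 0x00))
        return HANDLE_ALL_ZERO;
    if (bytes_all(bytes, sizeof bytes, 0xff))
        return HANDLE_ALL_ONES;
    return HANDLE_OTHER;
}

/* Describe the calling thread's handle in words. */
const char *describe_self(void)
{
    switch (classify_handle(pthread_self()))
    {
        case HANDLE_ALL_ZERO:
            return "all-zero handle";
        case HANDLE_ALL_ONES:
            return "all-ones handle, i.e. ((pthread_t) -1) on typical ABIs";
        default:
            return "ordinary-looking handle";
    }
}
```

On NetBSD, calling describe_self() from a program built without and then with -lpthread would be expected to show the same split as the two runs in step 2; on other platforms the results may differ.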


So one possible answer for us is "if we're on NetBSD and plpython3
is to be built, add -lpthread to the core LIBS list".  I do not
much like this answer though; it's putting the responsibility in
the wrong place.

What I'm inclined to do is go file a bug report saying that this
behavior contradicts both POSIX and NetBSD's own man page, and
see what they say about that.

            regards, tom lane

[1] https://netbsd.gw.com/cgi-bin/man-cgi?pthread_self+3+NetBSD-current



Re: PL/Python fails on new NetBSD/PPC 8.0 install

From: Thomas Munro
Date:
On Wed, Oct 30, 2019 at 9:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> What I'm inclined to do is go file a bug report saying that this
> behavior contradicts both POSIX and NetBSD's own man page, and
> see what they say about that.

From a quick look at the relevant trees, isn't the problem here that
cpython thinks it can reserve pthread_t value -1 (or rather, that
number cast to unsigned long, which is the type it uses for its own
thread IDs):

https://github.com/python/cpython/blob/master/Include/pythread.h#L21

... and then use that to detect lack of initialisation:

https://github.com/python/cpython/blob/master/Modules/_threadmodule.c#L1149

... and that NetBSD also chose the same arbitrary value for their
threading stub library:

https://github.com/NetBSD/src/blob/trunk/lib/libc/thread-stub/thread-stub.c#L392

... as they are entirely within their rights to do?  That assumes the stub
library can do whatever it has to do with that value, such as answering
pthread_equal() queries, which it clearly can.  I think libc is
allowed to implement pthread_t as an integer type and reserve -1, but
application code is not allowed to assume that pthread_t is even
castable to an integer type, let alone that it can reserve magic
values.
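To make the collision concrete, here is a toy model of the scheme described above. It assumes, as on NetBSD, that the thread handle is pointer-like and can be carried in a uintptr_t; IDENT_INVALID, ident_from_handle() and ident_is_missing() are hypothetical names, not CPython's actual code. Once the stub's all-ones handle is squeezed into an unsigned long, it is indistinguishable from the reserved "no ident" value.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical sentinel: one unsigned long value reserved to mean
 * "no thread ident", in the style described in the message above. */
#define IDENT_INVALID ((unsigned long) -1)

/* Squeeze a pointer-like handle into an unsigned long ident.
 * This is exactly the assumption application code is not entitled to. */
unsigned long ident_from_handle(uintptr_t handle)
{
    return (unsigned long) handle;
}

/* The check that misfires: a legitimate all-ones handle looks "missing". */
bool ident_is_missing(unsigned long ident)
{
    return ident == IDENT_INVALID;
}
```

For example, ident_is_missing(ident_from_handle(UINTPTR_MAX)) is true even though UINTPTR_MAX here models a handle that the stub library is perfectly able to work with.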

Further evidence that this is Python's fault is the admission in the
source code itself that it is "inherently hosed":

https://github.com/python/cpython/blob/master/Python/thread_pthread.h#L299
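A scheme that avoids the problem altogether keeps "is there an ID at all?" in a separate flag and compares IDs only with pthread_equal(), so no pthread_t value ever has to be reserved. The sketch below is single-threaded for brevity (real code would protect the slot with a lock or atomics), and struct owner_slot and its helpers are hypothetical.

```c
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical owner record: presence is tracked by its own flag,
 * so any pthread_t value, including ((pthread_t) -1), is acceptable. */
struct owner_slot
{
    bool      has_owner;
    pthread_t owner;
};

void owner_clear(struct owner_slot *slot)
{
    slot->has_owner = false;
}

/* Record the calling thread as owner; pthread_self() cannot fail. */
void owner_claim(struct owner_slot *slot)
{
    slot->owner = pthread_self();
    slot->has_owner = true;
}

/* Identity is tested only with pthread_equal(), never by casting. */
bool owner_is_current(const struct owner_slot *slot)
{
    return slot->has_owner && pthread_equal(slot->owner, pthread_self());
}
```

Even under a stub that returns the same handle on every call, owner_is_current() still answers correctly for the only thread that can exist, which is the point made above about pthread_equal().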



Re: PL/Python fails on new NetBSD/PPC 8.0 install

From: Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> On Wed, Oct 30, 2019 at 9:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What I'm inclined to do is go file a bug report saying that this
>> behavior contradicts both POSIX and NetBSD's own man page, and
>> see what they say about that.

> From a quick look at the relevant trees, isn't the problem here that
> cpython thinks it can reserve pthread_t value -1 (or rather, that
> number cast to unsigned long, which is the type it uses for its own
> thread IDs):
> https://github.com/python/cpython/blob/master/Include/pythread.h#L21

Possibly.  A value of -1 would be quite likely to crash any other
libpthread code it might be passed to, though, since it's evidently
supposed to be a pointer on this implementation.  Note that the
point here is that libpython should get a *valid* thread ID that it
can use for other purposes, independently of what the host executable
did, and that we can expect that libpython's calls are not being
routed to the stub implementations.

I've been experimenting with that test program on other platforms,
and I find that FreeBSD 11.0, OpenBSD 6.4, and Fedora 30 all return
plausible-looking pointers with or without -lpthread.

Interestingly, RHEL6 (glibc 2.12) behaves much like NetBSD: you get
NULL without -lpthread and a valid pointer with it.  Given the lack of
other problem reports about pl/python, I surmise that the glibc
implementation does manage to produce a valid pointer as soon as
libpthread is loaded.  Or maybe they fixed glibc far enough back that
nobody has tried recent python with a glibc that worked the old way.

> Further evidence that this is Python's fault is the admission in the
> source code itself that it is "inherently hosed":
> https://github.com/python/cpython/blob/master/Python/thread_pthread.h#L299

I'm not here to defend Python's choices in this area.  I'm just
observing that libpthread should produce valid results in a
correctly-linked dynamically loaded library, whether or not the
host executable linked libpthread.  NetBSD's code is failing that
test, and nobody else's is.

            regards, tom lane



Re: PL/Python fails on new NetBSD/PPC 8.0 install

From: Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> ... and that NetBSD also chose the same arbitrary value for their
> threading stub library:
> https://github.com/NetBSD/src/blob/trunk/lib/libc/thread-stub/thread-stub.c#L392
> ... as they are entirely within their rights to do?

I poked around in that repo, and found the non-stub version of
pthread_self:

https://github.com/NetBSD/src/blob/trunk/lib/libpthread/pthread.c#L863

Relevant to this discussion is that it actually redirects to the
stub version if __uselibcstub is still set, and that variable
appears to be cleared by pthread__init,

https://github.com/NetBSD/src/blob/trunk/lib/libpthread/pthread.c#L187

whose header comment is pretty telling:

/*
 * This needs to be started by the library loading code, before main()
 * gets to run, for various things that use the state of the initial thread
 * to work properly (thread-specific data is an application-visible example;
 * spinlock counts for mutexes is an internal example).
 */

I've not found the mechanism by which pthread__init gets called, but
this sure smells like they think it only has to happen before main().

Interestingly, some of the other files in that directory have recent
CVS log entries specifically mentioning bug fixes for cases where
libpthread is dlopen'd.  So it's not like they don't want to support
the case.  I wonder if they just need to fix pthread_self to forcibly
init the library if __uselibcstub is still set.

            regards, tom lane



Re: PL/Python fails on new NetBSD/PPC 8.0 install

From: Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> On Wed, Oct 30, 2019 at 9:25 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> What I'm inclined to do is go file a bug report saying that this
>> behavior contradicts both POSIX and NetBSD's own man page, and
>> see what they say about that.

So I went and filed that bug,

http://gnats.netbsd.org/cgi-bin/query-pr-single.pl?number=54661

and the answer seems to be that netbsd's libpthread is operating as
designed.  They don't support creating new threads if libpthread
wasn't present at main program start, so redirecting all the
entry points to the libc stub functions in that case is actually
pretty sane, self-consistent behavior.

This behavior is actually kinda useful from our standpoint: it means
that a perlu/pythonu/tclu function *can't* cause a backend to become
multithreaded, even if it tries.  So I definitely don't want to
"fix" this by linking libpthread to the core backend; that would
open us up to problems we needn't have, on this platform anyway.

> From a quick look at the relevant trees, isn't the problem here that
> cpython thinks it can reserve pthread_t value -1 (or rather, that
> number cast to unsigned long, which is the type it uses for its own
> thread IDs):

Yeah, this.  I shall now go rant at the Python people about that.

            regards, tom lane



Re: PL/Python fails on new NetBSD/PPC 8.0 install

From: Tom Lane
Date:
I wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
>> From a quick look at the relevant trees, isn't the problem here that
>> cpython thinks it can reserve pthread_t value -1 (or rather, that
>> number cast to unsigned long, which is the type it uses for its own
>> thread IDs):

> Yeah, this.  I shall now go rant at the Python people about that.

Done at

https://bugs.python.org/issue38646

            regards, tom lane