Thread: pltcl crash on recent macOS

pltcl crash on recent macOS

From
Peter Eisentraut
Date:
A little while ago, the pltcl tests started crashing for me on macOS.
I don't know what had changed, but I suspect it was either an operating
system update or something like an Xcode update.

Here is a backtrace:

   * frame #0: 0x00007ff7b0e61853
     frame #1: 0x00007ff803a28751 libsystem_c.dylib`hash_search + 215
     frame #2: 0x0000000110357700 pltcl.so`compile_pltcl_function(fn_oid=16418, tgreloid=0, is_event_trigger=false, pltrusted=true) at pltcl.c:1418:13
     frame #3: 0x0000000110355d50 pltcl.so`pltcl_func_handler(fcinfo=0x00007fb6f1817028, call_state=0x00007ff7b0e61b80, pltrusted=true) at pltcl.c:814:12
...

Note that the hash_search call goes into some system library, not postgres.

The command to link pltcl is:

gcc ... -ltcl8.6 -lz -lpthread -framework CoreFoundation -lc -bundle_loader ../../../src/backend/postgres

Notice the -lc in there.  If I remove that, it works again.

The -lc is explicitly added in src/pl/tcl/Makefile, so it's our own
doing.  I tracked this back, and it's been moved and rearranged in that
makefile a number of times.  The original addition was

commit e3909672f12e0ddf3e202b824fda068ad2195ef2
Author: Tom Lane <tgl@sss.pgh.pa.us>
Date:   Mon Dec 14 00:46:49 1998

     Build pltcl.so correctly on platforms that want dependent
     shared libraries to be listed in the link command.

Has anyone else seen this?

Note, I'm using the tcl-tk package from Homebrew.  The tcl installation 
provided by macOS itself no longer appears to work for linking against.



Re: pltcl crash on recent macOS

From
Thomas Munro
Date:
On Mon, Jun 13, 2022 at 6:53 PM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>      frame #1: 0x00007ff803a28751 libsystem_c.dylib`hash_search + 215
>      frame #2: 0x0000000110357700
> pltcl.so`compile_pltcl_function(fn_oid=16418, tgreloid=0,

Hmm, I can’t reproduce that….  although that symbol is present in my
libSystem.B.dylib according to dlsym() and callable from a simple
program not linked to anything else, pltcl.so is apparently reaching
postgres’s hash_search for me, based on the fact that make -C
src/pl/tcl check succeeds and nm -m on pltcl.so shows it as "from
executable".  It would be interesting to see what nm -m shows for you.
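
For readers who want to repeat the "is the symbol present" check without writing a C program, here is a minimal sketch in Python. It assumes a POSIX system where ctypes.CDLL(None) wraps dlopen(NULL) and attribute lookup goes through dlsym(); the helper name symbol_visible is made up for this illustration. It only tells you whether a symbol is reachable from the current process, not which definition pltcl.so will bind to.

```python
import ctypes


def symbol_visible(name):
    """Return True if `name` resolves in the current process's global namespace."""
    # CDLL(None) wraps dlopen(NULL) on POSIX systems; looking up an
    # attribute on the handle performs a dlsym() for that name.
    process = ctypes.CDLL(None)
    try:
        getattr(process, name)
    except AttributeError:
        return False
    return True


# Illustrative check: on macOS this probes whatever libSystem exports
# to the Python process; the result depends on the platform.
print("hash_search visible:", symbol_visible("hash_search"))
```

On a platform whose libc does not export hash_search, the same call should simply report False.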

Archeological note: That hash_search stuff, header <strhash.h>, seems
to have been copied from ancient FreeBSD before it was dropped
upstream for the crime of polluting the global symbol namespace with
junk[1].  It's been languishing in Apple's libc for at least 19
years[2], though, so I'm not sure why it's showing up suddenly as a
problem for you now.

> Note, I'm using the tcl-tk package from Homebrew.  The tcl installation
> provided by macOS itself no longer appears to work for linking against.

I’m using tcl 8.6.12 installed by MacPorts on macOS 12.4, though, hmm,
SDK 12.3.  I see the explicit -lc when building pltcl.so, and I see
that libSystem.B.dylib is explicitly mentioned here, whether or not I
have -lc:

% otool -L ./tmp_install/Users/tmunro/install/lib/postgresql/pltcl.so
./tmp_install/Users/tmunro/install/lib/postgresql/pltcl.so:
/opt/local/lib/libtcl8.6.dylib (compatibility version 8.6.0, current
version 8.6.12)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current
version 1311.100.3)

Here’s the complete link line:

ccache cc -Wall -Wmissing-prototypes -Wpointer-arith
-Wdeclaration-after-statement -Werror=vla
-Werror=unguarded-availability-new -Wendif-labels
-Wmissing-format-attribute -Wcast-function-type -Wformat-security
-fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument
-Wno-compound-token-split-by-macro -g -O0  -bundle -multiply_defined
suppress -o pltcl.so  pltcl.o -L../../../src/port
-L../../../src/common  -isysroot
/Library/Developer/CommandLineTools/SDKs/MacOSX12.3.sdk
-Wl,-dead_strip_dylibs   -L/opt/local/lib -ltcl8.6 -lz -lpthread
-framework CoreFoundation  -lc -bundle_loader
../../../src/backend/postgres

[1] https://github.com/freebsd/freebsd-src/commit/dc196afb2e58dd05cd66e2da44872bb3d619910f
[2] https://github.com/apple-open-source-mirror/Libc/blame/master/stdlib/FreeBSD/strhash.c



Re: pltcl crash on recent macOS

From
Tom Lane
Date:
Thomas Munro <thomas.munro@gmail.com> writes:
> On Mon, Jun 13, 2022 at 6:53 PM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
>> frame #1: 0x00007ff803a28751 libsystem_c.dylib`hash_search + 215
>> frame #2: 0x0000000110357700
>> pltcl.so`compile_pltcl_function(fn_oid=16418, tgreloid=0,

> Hmm, I can’t reproduce that….

I can't either, although I'm using the macOS-provided Tcl code,
which still works fine for me.  (I grant that Apple might desupport
that someday, but they haven't yet.)  sifaka and longfin aren't
unhappy either; although sifaka is close to identical to my laptop.

Having said that, I wonder whether the position of the -bundle_loader
switch in the command line is relevant to which way the hash_search
reference is resolved.  Seems like we could put it in front of the
various -l options if that'd help.

            regards, tom lane



Re: pltcl crash on recent macOS

From
Peter Eisentraut
Date:
On 13.06.22 13:27, Thomas Munro wrote:
> On Mon, Jun 13, 2022 at 6:53 PM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
>>       frame #1: 0x00007ff803a28751 libsystem_c.dylib`hash_search + 215
>>       frame #2: 0x0000000110357700
>> pltcl.so`compile_pltcl_function(fn_oid=16418, tgreloid=0,
> 
> Hmm, I can’t reproduce that….  although that symbol is present in my
> libSystem.B.dylib according to dlsym() and callable from a simple
> program not linked to anything else, pltcl.so is apparently reaching
> postgres’s hash_search for me, based on the fact that make -C
> src/pl/tcl check succeeds and nm -m on pltcl.so shows it as "from
> executable".  It would be interesting to see what nm -m shows for you.

...
          (undefined) external _get_call_result_type (from executable)
          (undefined) external _getmissingattr (from executable)
          (undefined) external _hash_create (from libSystem)
          (undefined) external _hash_search (from libSystem)
...
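
A quick way to spot this kind of misbinding in a longer nm -m listing is to group the undefined symbols by the image they are bound to. The following is only a sketch: it assumes the line shape shown in the excerpt above, the function name symbols_by_origin is hypothetical, and the sample input is the four lines from this message.

```python
import re

# Matches lines such as:
#   (undefined) external _hash_search (from libSystem)
NM_LINE = re.compile(r"\(undefined\)\s+external\s+(\S+)\s+\(from\s+([^)]+)\)")


def symbols_by_origin(nm_output):
    """Group undefined external symbols by the image nm -m reports for them."""
    origins = {}
    for line in nm_output.splitlines():
        match = NM_LINE.search(line)
        if match:
            symbol, origin = match.groups()
            origins.setdefault(origin, []).append(symbol)
    return origins


# Sample input copied from the nm -m excerpt in this message.
sample = """\
          (undefined) external _get_call_result_type (from executable)
          (undefined) external _getmissingattr (from executable)
          (undefined) external _hash_create (from libSystem)
          (undefined) external _hash_search (from libSystem)
"""

origins = symbols_by_origin(sample)
# dynahash symbols that the backend should supply but that bind to libSystem.
suspects = [s for s in origins.get("libSystem", []) if s.startswith("_hash_")]
print(suspects)
```

In a good build, the two _hash_ symbols would be listed under "executable" instead, as reported earlier in the thread.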

> I’m using tcl 8.6.12 installed by MacPorts on macOS 12.4, though, hmm,
> SDK 12.3.  I see the explicit -lc when building pltcl.so, and I see
> that libSystem.B.dylib is explicitly mentioned here, whether or not I
> have -lc:
> 
> % otool -L ./tmp_install/Users/tmunro/install/lib/postgresql/pltcl.so
> ./tmp_install/Users/tmunro/install/lib/postgresql/pltcl.so:
> /opt/local/lib/libtcl8.6.dylib (compatibility version 8.6.0, current
> version 8.6.12)
> /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current
> version 1311.100.3)

Looks the same here:

pltcl.so:
    /usr/local/opt/tcl-tk/lib/libtcl8.6.dylib (compatibility version 8.6.0, 
current version 8.6.12)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current 
version 1311.100.3)

> Here’s the complete link line:
> 
> ccache cc -Wall -Wmissing-prototypes -Wpointer-arith
> -Wdeclaration-after-statement -Werror=vla
> -Werror=unguarded-availability-new -Wendif-labels
> -Wmissing-format-attribute -Wcast-function-type -Wformat-security
> -fno-strict-aliasing -fwrapv -Wno-unused-command-line-argument
> -Wno-compound-token-split-by-macro -g -O0  -bundle -multiply_defined
> suppress -o pltcl.so  pltcl.o -L../../../src/port
> -L../../../src/common  -isysroot
> /Library/Developer/CommandLineTools/SDKs/MacOSX12.3.sdk
> -Wl,-dead_strip_dylibs   -L/opt/local/lib -ltcl8.6 -lz -lpthread
> -framework CoreFoundation  -lc -bundle_loader
> ../../../src/backend/postgres

The difference is that I use CC=gcc-11.  When I change to CC=cc, it
works (nm output shows "from executable").  So it's gcc that gets thrown
off by the -lc.



Re: pltcl crash on recent macOS

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> The difference is that I use CC=gcc-11.  When I change to CC=cc, it
> works (nm output shows "from executable").  So it's gcc that gets thrown
> off by the -lc.

Hah, that makes sense.  So does changing the option order help?

            regards, tom lane



Re: pltcl crash on recent macOS

From
Peter Eisentraut
Date:
On 13.06.22 18:01, Tom Lane wrote:
> Having said that, I wonder whether the position of the -bundle_loader
> switch in the command line is relevant to which way the hash_search
> reference is resolved.  Seems like we could put it in front of the
> various -l options if that'd help.

Switching the order of -bundle_loader and -lc did not help.



Re: pltcl crash on recent macOS

From
Thomas Munro
Date:
On Tue, Jun 14, 2022 at 8:21 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
> The difference is that I use CC=gcc-11.  When I change to CC=cc, it
> works (nm output shows "from executable").  So it's gcc that gets thrown
> off by the -lc.

Hrmph, I changed my CC to "ccache gcc-mp-11" (what MacPorts calls GCC
11), and I still can't reproduce the problem.  I still get "(from
executable)".  In your original quote you showed "gcc", not "gcc-11",
which (assuming it is found as /usr/bin/gcc) is just a little binary
that redirects to clang... trying that, this time without ccache in
the mix... and still no cigar.  So something is different about GCC 11
from homebrew, or the linker invocation it produces under the covers,
or the linker it's using?



Re: pltcl crash on recent macOS

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@enterprisedb.com> writes:
> Switching the order of -bundle_loader and -lc did not help.

Meh.  Well, it was worth a try.

I'd be okay with just dropping the -lc from pl/tcl/Makefile and seeing
what the buildfarm says.  The fact that we needed it in 1998 doesn't
mean that we still need it on supported versions of Tcl; nor was it
ever anything but a hack for us to be overriding what TCL_LIBS says.

As a quick check, I tried it on prairiedog's host (which has the oldest
Tcl installation I still have in captivity), and it seemed fine.

            regards, tom lane



Re: pltcl crash on recent macOS

From
Peter Eisentraut
Date:
On 13.06.22 23:32, Thomas Munro wrote:
> Hrmph, I changed my CC to "ccache gcc-mp-11" (what MacPorts calls GCC
> 11), and I still can't reproduce the problem.  I still get "(from
> executable)".  In your original quote you showed "gcc", not "gcc-11",
> which (assuming it is found as /usr/bin/gcc) is just a little binary
> that redirects to clang... trying that, this time without ccache in
> the mix... and still no cigar.  So something is different about GCC 11
> from homebrew, or the linker invocation it produces under the covers,
> or the linker it's using?

The original quote said "gcc" but that was just me attempting to simplify.
I have now also figured out that it works with gcc-10 but not with
gcc-11 and gcc-12.  For example, below are the underlying linker
invocations from gcc-10 and gcc-11.  Note that some of the options are 
ordered quite differently.  I don't know what all of that means yet, but 
it surely points to something in gcc or its packaging being the cause.

However, I think ultimately the use of -lc is an error and we should get 
rid of it.  This episode shows that it's very fragile in any case.


 
"/usr/local/Cellar/gcc@10/10.3.0/libexec/gcc/x86_64-apple-darwin20/10.3.0/collect2" 
-dynamic -arch x86_64 -bundle -bundle_loader 
../../../src/backend/postgres -macosx_version_min 11.4.0 
-multiply_defined suppress -syslibroot 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk 
-weak_reference_mismatches non-weak -o pltcl.so -L../../../src/port 
-L../../../src/common -L/usr/local/lib -L/usr/local/opt/openldap/lib 
"-L/usr/local/opt/openssl@1.1/lib" -L/usr/local/opt/readline/lib 
-L/usr/local/opt/krb5/lib -L/usr/local/opt/icu4c/lib 
-L/usr/local/opt/tcl-tk/lib -L/usr/local/Cellar/libxml2/2.9.14/lib 
-L/usr/local/Cellar/lz4/1.9.3/lib -L/usr/local/Cellar/zstd/1.5.2/lib 
-L/usr/local/Cellar/tcl-tk/8.6.12_1/lib 
"-L/usr/local/Cellar/gcc@10/10.3.0/lib/gcc/10/gcc/x86_64-apple-darwin20/10.3.0" 
"-L/usr/local/Cellar/gcc@10/10.3.0/lib/gcc/10/gcc/x86_64-apple-darwin20/10.3.0/../../.." 
pltcl.o -dead_strip_dylibs -ltcl8.6 -lz -framework CoreFoundation -lc 
-lSystem -lgcc_ext.10.5 -lgcc -lSystem -no_compact_unwind -idsym

 
/usr/local/Cellar/gcc/11.3.0_1/bin/../libexec/gcc/x86_64-apple-darwin21/11/collect2 
-dynamic -arch x86_64 -syslibroot 
/Applications/Xcode.app/Contents/Developer/Platforms/MacOSX.platform/Developer/SDKs/MacOSX12.3.sdk 
-macosx_version_min 12.4.0 -o pltcl.so -L../../../src/port 
-L../../../src/common -L/usr/local/lib -L/usr/local/opt/openldap/lib 
"-L/usr/local/opt/openssl@1.1/lib" -L/usr/local/opt/readline/lib 
-L/usr/local/opt/krb5/lib -L/usr/local/opt/icu4c/lib 
-L/usr/local/opt/tcl-tk/lib -L/usr/local/Cellar/libxml2/2.9.14/lib 
-L/usr/local/Cellar/lz4/1.9.3/lib -L/usr/local/Cellar/zstd/1.5.2/lib 
-L/usr/local/Cellar/tcl-tk/8.6.12_1/lib 
-L/usr/local/Cellar/gcc/11.3.0_1/bin/../lib/gcc/11/gcc/x86_64-apple-darwin21/11 
-L/usr/local/Cellar/gcc/11.3.0_1/bin/../lib/gcc/11/gcc 
-L/usr/local/Cellar/gcc/11.3.0_1/bin/../lib/gcc/11/gcc/x86_64-apple-darwin21/11/../../.. 
pltcl.o -dead_strip_dylibs -ltcl8.6 -lz -lc -bundle_loader 
../../../src/backend/postgres -bundle -framework CoreFoundation 
-multiply_defined suppress -lemutls_w -lgcc -lSystem -no_compact_unwind 
-idsym
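
To make the ordering difference easier to see, the two invocations can be compared mechanically. This is a rough sketch only: the command strings are abridged from the two collect2 lines above (the -L options and SDK paths are dropped), and comes_before is a hypothetical helper. It shows where -lc sits relative to -bundle_loader in each case. It does not establish that this ordering is the cause, especially since swapping the two options on the compiler command line did not help.

```python
import shlex


def comes_before(command, first, second):
    """Return True if token `first` appears earlier than `second` in `command`."""
    tokens = shlex.split(command)
    return tokens.index(first) < tokens.index(second)


# Abridged from the gcc-10 collect2 invocation above.
gcc10 = (
    "collect2 -dynamic -arch x86_64 -bundle "
    "-bundle_loader ../../../src/backend/postgres "
    "-multiply_defined suppress -o pltcl.so pltcl.o -dead_strip_dylibs "
    "-ltcl8.6 -lz -framework CoreFoundation -lc -lSystem -lgcc -lSystem"
)

# Abridged from the gcc-11 collect2 invocation above.
gcc11 = (
    "collect2 -dynamic -arch x86_64 -o pltcl.so pltcl.o -dead_strip_dylibs "
    "-ltcl8.6 -lz -lc -bundle_loader ../../../src/backend/postgres "
    "-bundle -framework CoreFoundation -multiply_defined suppress "
    "-lemutls_w -lgcc -lSystem"
)

for label, command in (("gcc-10", gcc10), ("gcc-11", gcc11)):
    order = comes_before(command, "-bundle_loader", "-lc")
    print(label, "-bundle_loader before -lc:", order)
```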



Re: pltcl crash on recent macOS

From
Peter Eisentraut
Date:
On 14.06.22 05:05, Tom Lane wrote:
> I'd be okay with just dropping the -lc from pl/tcl/Makefile and seeing
> what the buildfarm says.  The fact that we needed it in 1998 doesn't
> mean that we still need it on supported versions of Tcl; nor was it
> ever anything but a hack for us to be overriding what TCL_LIBS says.

Ok, I propose to proceed with the attached patch (with a bit more 
explanation added) for the master branch (for now) and see how it goes.
Attachments

Re: pltcl crash on recent macOS

From
Peter Eisentraut
Date:
On 20.06.22 12:36, Peter Eisentraut wrote:
> On 14.06.22 05:05, Tom Lane wrote:
>> I'd be okay with just dropping the -lc from pl/tcl/Makefile and seeing
>> what the buildfarm says.  The fact that we needed it in 1998 doesn't
>> mean that we still need it on supported versions of Tcl; nor was it
>> ever anything but a hack for us to be overriding what TCL_LIBS says.
> 
> Ok, I propose to proceed with the attached patch (with a bit more 
> explanation added) for the master branch (for now) and see how it goes.

done