Discussion: LDAP authenticated session terminated by signal 11: Segmentation fault, PostgreSQL server terminates other active server processes

Hi all, I have encountered a problem with LDAP-authenticated sessions that use the Postgres foreign data wrapper (postgres_fdw).

The server crashed with the following errors, and other active server processes were terminated as well:
2019-02-20 14:53:30.496 SGT [PID=1353 application="" user_name= database= host(port)=] LOG:  server process (PID 26306) was terminated by signal 11: Segmentation fault

2019-02-20 14:53:30.496 SGT [PID=1353 application="" user_name= database= host(port)=] LOG:  terminating any other active server processes

I can reproduce it on a test server with many other sessions connected:

1. log in as a non-LDAP-authenticated user, query local & foreign tables - OK
2. log in as an LDAP-authenticated user, query a local table - OK
3. log in as an LDAP-authenticated user, query a foreign table - ERROR, server crashes with signal 11: Segmentation fault when I quit the psql session

It seems the problem occurs only when the LDAP-authenticated session (which queried a foreign table) is terminated. In the dmesg log, I can see the following:

[16385512.182231] traps: postmaster[26306] general protection ip:7f1e758b638c sp:7ffef7ed8858 error:0 in libc-2.17.so[7f1e75836000+1b6000]
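
The dmesg line gives both the faulting instruction pointer and the address at which libc was mapped, so the offset of the crash inside libc-2.17.so follows by subtraction. A minimal sketch, assuming a POSIX shell with 64-bit arithmetic; the helper name fault_offset is made up for illustration:

```shell
# Print the offset of a faulting address inside its shared library.
# $1 = instruction pointer, $2 = library base address (both hex, no 0x prefix),
# as printed in the kernel's "traps:" line.
fault_offset() {
    printf '0x%x\n' "$(( 0x$1 - 0x$2 ))"
}

fault_offset 7f1e758b638c 7f1e75836000   # prints 0x8038c
```

With glibc debuginfo installed, an offset like this could be handed to a tool such as addr2line to name the function involved.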

Has anyone encountered a similar issue?

######################
PostgreSQL version: 10.6
Platform: CentOS Linux
######################

Thank you.

Regards,
Mike Yeap
Mike Yeap wrote:
> I have encountered a problem related to LDAP authenticated session with Postgres foreign data wrapper (postgres_fdw).
>
> The server crashed with following errors and other active server processes are terminated as well:
> 2019-02-20 14:53:30.496 SGT [PID=1353 application="" user_name= database= host(port)=] LOG:  server process (PID 26306) was terminated by signal 11: Segmentation fault
>
> 2019-02-20 14:53:30.496 SGT [PID=1353 application="" user_name= database= host(port)=] LOG:  terminating any other active server processes
>
> I can reproduce it in a test server with many other sessions connected:
>
> 1. login using non-LDAP-authenticated user, query local & foreign tables - OK
> 2. login using LDAP-authenticated user, query local table - OK
> 3. login using LDAP-authenticated user, query foreign table - ERROR, server crashes with signal 11: Segmentation fault error when I quit the psql session

Are the "postgres" executable and libpq linked with the same version of OpenLDAP?

Any other extensions installed?

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com



Laurenz Albe <laurenz.albe@cybertec.at> writes:
> Mike Yeap wrote:
>> I have encountered a problem related to LDAP authenticated session with Postgres foreign data wrapper (postgres_fdw).

> Are the "postgres" executable and libpq linked with the same version of OpenLDAP?

And which version is that?  (And which version of Postgres?)

Digging around in our git history, I came across this:

Author: Noah Misch <noah@leadboat.com>
Branch: master Release: REL9_5_BR [d7cdf6ee3] 2014-07-22 11:01:03 -0400

    Diagnose incompatible OpenLDAP versions during build and test.

    With OpenLDAP versions 2.4.24 through 2.4.31, inclusive, PostgreSQL
    backends can crash at exit.  Raise a warning during "configure" based on
    the compile-time OpenLDAP version number, and test the crash scenario in
    the dblink test suite.  Back-patch to 9.0 (all supported versions).

which sounds a fair bit like what you are describing.

            regards, tom lane
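
The range named in that commit message can be checked against an installed version mechanically. A minimal sketch, assuming a version string of the form major.minor.patch with an optional package release suffix; the function name is made up for illustration, and the range is only the one the configure warning covers, which may not be the full set of affected versions:

```shell
# Return 0 if an OpenLDAP version lies in 2.4.24 .. 2.4.31 inclusive,
# the range named in the commit message quoted above.
openldap_in_warned_range() {
    version=${1%%-*}        # drop a package release suffix such as -21.el7_6
    major=${version%%.*}
    rest=${version#*.}
    minor=${rest%%.*}
    patch=${rest#*.}
    [ "$major" -eq 2 ] && [ "$minor" -eq 4 ] &&
        [ "$patch" -ge 24 ] && [ "$patch" -le 31 ]
}

if openldap_in_warned_range 2.4.30; then
    echo "2.4.30 is inside the warned range"
fi
```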


> Are the "postgres" executable and libpq linked with the same version of OpenLDAP?
How should I check whether they are linked?

My Postgres version is 10.6 and I have this output for "yum list | grep ldap | sort":
$ yum list | grep ldap | sort

apr-util-ldap.x86_64                        1.5.2-6.el7                base
bind-dyndb-ldap.x86_64                      11.1-4.el7                 base
compat-openldap.i686                        1:2.3.43-5.el7             base
compat-openldap.x86_64                      1:2.3.43-5.el7             base
cyrus-sasl-ldap.i686                        2.1.26-23.el7              base
cyrus-sasl-ldap.x86_64                      2.1.26-23.el7              base
freeradius-ldap.x86_64                      3.0.13-9.el7_5             base
ipsilon-authldap.noarch                     1.0.0-13.el7_3             base
krb5-server-ldap.x86_64                     1.15.1-37.el7_6            updates
ldapjdk-javadoc.noarch                      4.19-5.el7                 base
ldapjdk.noarch                              4.19-5.el7                 base
mod_ldap.x86_64                             2.4.6-88.el7.centos        base
nss-pam-ldapd.i686                          0.8.13-16.el7              base
nss-pam-ldapd.x86_64                        0.8.13-16.el7              base
openldap-clients.x86_64                     2.4.44-21.el7_6            @updates
openldap-devel.i686                         2.4.44-21.el7_6            updates
openldap-devel.x86_64                       2.4.44-21.el7_6            updates
openldap.i686                               2.4.44-21.el7_6            updates
openldap-servers-sql.x86_64                 2.4.44-21.el7_6            updates
openldap-servers.x86_64                     2.4.44-21.el7_6            updates
openldap.x86_64                             2.4.44-21.el7_6            @updates
openssh-ldap.x86_64                         7.4p1-16.el7               base
php-ldap.x86_64                             5.4.16-46.el7              base
python-ldap2pg-doc.x86_64                   4.11-1.rhel7               pgdg10
python-ldap2pg.x86_64                       4.11-1.rhel7               pgdg10
python-ldap.x86_64                          2.4.15-2.el7               base
sssd-ldap.x86_64                            1.16.2-13.el7_6.5          updates

And in the database where I encountered this issue I have these extensions installed:

repdb=# \dx
                                      List of installed extensions
        Name        | Version |   Schema   |                        Description
--------------------+---------+------------+------------------------------------------------------------
 hstore             | 1.4     | public     | data type for storing sets of (key, value) pairs
 pg_stat_statements | 1.6     | repdb      | track execution statistics of all SQL statements executed
 plpgsql            | 1.0     | pg_catalog | PL/pgSQL procedural language
 postgres_fdw       | 1.0     | repdb      | foreign-data wrapper for remote PostgreSQL servers
 tablefunc          | 1.0     | repdb      | functions that manipulate whole tables, including crosstab
(5 rows)

Thank you.

Regards,
Mike Yeap

Mike Yeap <wkk1020@gmail.com> writes:
>> Are the "postgres" executable and libpq linked with the same version of
>> OpenLDAP?

> How should I check whether they are linked?

"ldd" should show the dependencies of whatever executable or library
you point it at.

            regards, tom lane
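
A minimal sketch of that check, filtering ldd output down to the LDAP libraries. The function names are made up for illustration, and the paths in the usage note are assumptions based on a PGDG-style layout under /usr/pgsql-10; adjust them for your installation.

```shell
# Keep only the LDAP-related lines from ldd output read on stdin.
ldap_lines() {
    grep -E 'lib(ldap|lber)' || true   # no match is not an error here
}

# Show which LDAP libraries a binary or shared library is linked against.
ldap_deps() {
    ldd "$1" | ldap_lines
}
```

Usage would be something like `ldap_deps /usr/pgsql-10/bin/postgres` followed by `ldap_deps /usr/pgsql-10/lib/libpq.so.5`, then comparing the library names and paths the two print.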


Hi Tom, when I run "ldd /usr/pgsql-10/bin/postmaster" I got this output:

# ldd /usr/pgsql-10/bin/postmaster
linux-vdso.so.1 =>  (0x00007ffd4ec65000)
libpthread.so.0 => /lib64/libpthread.so.0 (0x00007eff8b5d3000)
libxml2.so.2 => /lib64/libxml2.so.2 (0x00007eff8b268000)
libpam.so.0 => /lib64/libpam.so.0 (0x00007eff8b059000)
libssl.so.10 => /lib64/libssl.so.10 (0x00007eff8ade7000)
libcrypto.so.10 => /lib64/libcrypto.so.10 (0x00007eff8a985000)
libgssapi_krb5.so.2 => /lib64/libgssapi_krb5.so.2 (0x00007eff8a738000)
librt.so.1 => /lib64/librt.so.1 (0x00007eff8a530000)
libdl.so.2 => /lib64/libdl.so.2 (0x00007eff8a32b000)
libm.so.6 => /lib64/libm.so.6 (0x00007eff8a029000)
libldap-2.4.so.2 => /lib64/libldap-2.4.so.2 (0x00007eff89dd4000)
libicui18n.so.50 => /lib64/libicui18n.so.50 (0x00007eff899d4000)
libicuuc.so.50 => /lib64/libicuuc.so.50 (0x00007eff8965b000)
libsystemd.so.0 => /lib64/libsystemd.so.0 (0x00007eff89633000)
libc.so.6 => /lib64/libc.so.6 (0x00007eff89271000)
/lib64/ld-linux-x86-64.so.2 (0x00007eff8b7f9000)
libz.so.1 => /lib64/libz.so.1 (0x00007eff8905b000)
liblzma.so.5 => /lib64/liblzma.so.5 (0x00007eff88e35000)
libaudit.so.1 => /lib64/libaudit.so.1 (0x00007eff88c0c000)
libkrb5.so.3 => /lib64/libkrb5.so.3 (0x00007eff88924000)
libcom_err.so.2 => /lib64/libcom_err.so.2 (0x00007eff88720000)
libk5crypto.so.3 => /lib64/libk5crypto.so.3 (0x00007eff884ec000)
libkrb5support.so.0 => /lib64/libkrb5support.so.0 (0x00007eff882de000)
libkeyutils.so.1 => /lib64/libkeyutils.so.1 (0x00007eff880da000)
libresolv.so.2 => /lib64/libresolv.so.2 (0x00007eff87ebf000)
liblber-2.4.so.2 => /lib64/liblber-2.4.so.2 (0x00007eff87cb0000)
libsasl2.so.3 => /lib64/libsasl2.so.3 (0x00007eff87a93000)
libssl3.so => /lib64/libssl3.so (0x00007eff8784f000)
libsmime3.so => /lib64/libsmime3.so (0x00007eff87628000)
libnss3.so => /lib64/libnss3.so (0x00007eff87302000)
libnssutil3.so => /lib64/libnssutil3.so (0x00007eff870d5000)
libplds4.so => /lib64/libplds4.so (0x00007eff86ed1000)
libplc4.so => /lib64/libplc4.so (0x00007eff86ccc000)
libnspr4.so => /lib64/libnspr4.so (0x00007eff86a8d000)
libstdc++.so.6 => /lib64/libstdc++.so.6 (0x00007eff86785000)
libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00007eff8656f000)
libicudata.so.50 => /lib64/libicudata.so.50 (0x00007eff84f9a000)
libcap.so.2 => /lib64/libcap.so.2 (0x00007eff84d95000)
libselinux.so.1 => /lib64/libselinux.so.1 (0x00007eff84b6e000)
libgcrypt.so.11 => /lib64/libgcrypt.so.11 (0x00007eff848ec000)
libgpg-error.so.0 => /lib64/libgpg-error.so.0 (0x00007eff846e7000)
libdw.so.1 => /lib64/libdw.so.1 (0x00007eff844a0000)
libcap-ng.so.0 => /lib64/libcap-ng.so.0 (0x00007eff84299000)
libcrypt.so.1 => /lib64/libcrypt.so.1 (0x00007eff84062000)
libattr.so.1 => /lib64/libattr.so.1 (0x00007eff83e5c000)
libpcre.so.1 => /lib64/libpcre.so.1 (0x00007eff83bfa000)
libelf.so.1 => /lib64/libelf.so.1 (0x00007eff839e2000)
libbz2.so.1 => /lib64/libbz2.so.1 (0x00007eff837d1000)
libfreebl3.so => /lib64/libfreebl3.so (0x00007eff835ce000)

On the line that has ldap in it:

libldap-2.4.so.2 => /lib64/libldap-2.4.so.2 (0x00007eff89dd4000)

Sorry but in this case what is my libpq?

Regards,
Mike Yeap

On Thu, Feb 21, 2019 at 2:42 PM Mike Yeap <wkk1020@gmail.com> wrote:
> openldap-clients.x86_64                     2.4.44-21.el7_6            @updates
> openldap-devel.i686                         2.4.44-21.el7_6            updates
> openldap-devel.x86_64                       2.4.44-21.el7_6            updates
> openldap.i686                               2.4.44-21.el7_6            updates
> openldap-servers-sql.x86_64                 2.4.44-21.el7_6            updates
> openldap-servers.x86_64                     2.4.44-21.el7_6            updates
> openldap.x86_64                             2.4.44-21.el7_6            @updates

> On Wed, Feb 20, 2019 at 10:17 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>     With OpenLDAP versions 2.4.24 through 2.4.31, inclusive, PostgreSQL
>>     backends can crash at exit.  Raise a warning during "configure" based on
>>     the compile-time OpenLDAP version number, and test the crash scenario in
>>     the dblink test suite.  Back-patch to 9.0 (all supported versions).

Clearly 2.4.44 is not in the range 2.4.24 through 2.4.31.  Perhaps the
dangerous range is out of date?  Hmm, so Noah's analysis[1] says this
is a clash between libldap_r.so (used by libpq) and libldap.so (used
by the server), specifically in destructor/exit code.  Curiously, in a
thread about Curl's struggles with this problem, I found a claim[2]
that Debian decided to abandon the non-"_r" variant and just use _r
always.  Sure enough, on my Debian buster VM I see a symlink
libldap-2.4.so.2 -> libldap_r-2.4.so.2.  So essentially Debian and
friends have already forced Noah's first option on users:

> 1. Link the backend with libldap_r, so we never face the mismatch. On some
> platforms, this means also linking in threading libraries.

FreeBSD and CentOS systems near me have separate libraries still.

[1] https://www.postgresql.org/message-id/flat/20140612210219.GA705509%40tornado.leadboat.com
[2] https://www.openldap.org/lists/openldap-technical/201608/msg00094.html

-- 
Thomas Munro
https://enterprisedb.com


Hi Thomas, does that mean the bug is still there?

Regards,
Mike Yeap

On Tue, Feb 26, 2019 at 8:17 PM Mike Yeap <wkk1020@gmail.com> wrote:
> Hi Thomas, does that mean the bug is still there?

Hi Mike,

I haven't tried to repro this myself, but it certainly sounds like it.
It also sounds like it would probably go away if you switched to a
Debian-derived distro, instead of a Red Hat-derived distro, but I
doubt that's the kind of advice you were looking for.  We need to
figure out a proper solution here, though I'm not sure what.  Question
for the list: other stuff in the server needs libpthread (SSL, LLVM,
...), so why are we insisting on using non-MT LDAP?

-- 
Thomas Munro
https://enterprisedb.com


Hi Thomas, I see..... guess I can't use LDAP authentication for now, :-(

Hopefully this problem is solved in a future version, thank you!

Regards,
Mike Yeap

On Tue, Feb 26, 2019 at 9:11 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Tue, Feb 26, 2019 at 8:17 PM Mike Yeap <wkk1020@gmail.com> wrote:
> > Hi Thomas, does that mean the bug is still there?

> I haven't tried to repro this myself, but it certainly sounds like it.
> It also sounds like it would probably go away if you switched to a
> Debian-derived distro, instead of a Red Hat-derived distro, but I
> doubt that's the kind of advice you were looking for.  We need to
> figure out a proper solution here, though I'm not sure what.  Question
> for the list: other stuff in the server needs libpthread (SSL, LLVM,
> ...), so why are we insisting on using non-MT LDAP?

Concretely, why don't we just kill the LDAP_LIBS_FE/LDAP_LIBS_BE
distinction and use a single LDAP_LIBS?  Then it'll always match.  It
can still be the non-MT variant if you build with
--disable-thread-safety (who does that?), but then it'll be the same
in the server too so that postgres_fdw + ldap works that way too.
Sketch patch attached.
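
To see what an existing build links on each side, the two variables can be read out of the generated src/Makefile.global. A minimal sketch; the helper name is made up for illustration, and it assumes the variables appear there as plain "NAME = value" assignments:

```shell
# Print the value assigned to LDAP_LIBS_FE or LDAP_LIBS_BE.
# $1 = FE or BE; the Makefile.global text is read on stdin.
ldap_link_flags() {
    sed -n "s/^LDAP_LIBS_$1[[:space:]]*=[[:space:]]*//p"
}
```

Running `ldap_link_flags FE < src/Makefile.global` and the same with `BE` in a configured source tree shows whether libpq and the server were set up to link different LDAP libraries.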


--
Thomas Munro
https://enterprisedb.com

Greetings Mike,

* Mike Yeap (wkk1020@gmail.com) wrote:
> Hi Thomas, I see..... guess I can't use LDAP authentication for now, :-(

If you're in an Active Directory environment, you should really be using
Kerberos for authentication and NOT LDAP anyway.  LDAP-based
authentication involves sending the user's password (cleartext) to the
PG server, which is really bad security.  Hopefully you're at least
connecting to PG with SSL, and from PG to LDAP with SSL, but you still
run into the issue that a compromised server would expose the password of
everyone connecting to that server, and when you're using a centralized
authentication system like LDAP, that one password gets you access to
everything that account has access to.

Thanks!

Stephen

Thomas Munro <thomas.munro@gmail.com> writes:
> Question
> for the list: other stuff in the server needs libpthread (SSL, LLVM,
> ...), so why are we insisting on using non-MT LDAP?

The traditional reason for avoiding that is the risk of a server
process becoming multi-threaded.  There are live bugs of that ilk
on Darwin, and we actually have cross-checks for the case in our
code (see HAVE_PTHREAD_IS_THREADED_NP stanzas).

If pthread_is_threaded_np(), or something equivalent, is widely available
then it might be all right to try solving this going forward by switching
to libldap_r and seeing if anyone hits those cross-checks.  I'd be afraid
to risk it in the back branches though ...

            regards, tom lane


On Wed, Feb 27, 2019 at 3:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Question
> > for the list: other stuff in the server needs libpthread (SSL, LLVM,
> > ...), so why are we insisting on using non-MT LDAP?
>
> The traditional reason for avoiding that is the risk of a server
> process becoming multi-threaded.  There are live bugs of that ilk
> on Darwin, and we actually have cross-checks for the case in our
> code (see HAVE_PTHREAD_IS_THREADED_NP stanzas).
>
> If pthread_is_threaded_np(), or something equivalent, is widely available
> then it might be all right to try solving this going forward by switching
> to libldap_r and seeing if anyone hits those cross-checks.  I'd be afraid
> to risk it in the back branches though ...

Hmm.  Well here is a new data point: it looks like the Red Hat family
of distributions is in the process of making the same decision as
Debian (namely: to expunge the non-MT variant, because it bites
various projects in the same way that it bites us), but they haven't
quite pulled the trigger yet:

https://fedoraproject.org/wiki/Changes/OpenLDAPwithoutNonthreadedLibraries

So if we do nothing at all, it seems likely that this problem will
eventually go away by itself on practically all Linux systems, leaving
this unfixed LDAP vs postgres_fdw bug to trip up the other Unix
systems.  Bleugh.

I don't see pthread_is_threaded_np() on any non-Apple systems in my
lab.  Clearly libldap_r is *capable* of creating threads: it contains a
function ldap_pvt_thread_create(), and we can see that slapd and other
OpenLDAP things use that, but AFAICT that's a private facility not
intended for end users to call, so there's no danger if you just use
the documented LDAP client API.  Since pthread_is_threaded_np() is a
Mac thing, note also that Macs aren't directly exposed to this
particular choice anyway because (at least if you use system-provided
libraries rather than MacPorts et al) libldap.dylib and
libldap_r.dylib are already symlinks to the same Apple voodoo
"/System/Library/Frameworks/LDAP.framework/Versions/A/LDAP".

-- 
Thomas Munro
https://enterprisedb.com


Thomas Munro <thomas.munro@gmail.com> writes:
> On Wed, Feb 27, 2019 at 3:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> If pthread_is_threaded_np(), or something equivalent, is widely available
>> then it might be all right to try solving this going forward by switching
>> to libldap_r and seeing if anyone hits those cross-checks.  I'd be afraid
>> to risk it in the back branches though ...

> Hmm.  Well here is a new data point: it looks like the Red Hat family
> of distributions is in the process of making the same decision as
> Debian (namely: to expunge the non-MT variant, because it bites
> various projects in the same way that it bites us), but they haven't
> quite pulled the trigger yet:
> https://fedoraproject.org/wiki/Changes/OpenLDAPwithoutNonthreadedLibraries

Interesting, but that's going to be a very slow change.  That says they'll
pull the trigger in Fedora 30, which I think is due to be released this
spring --- but it won't show up in RHEL till the next major release (8
or maybe even 9 at this point), and the existing major releases have got
10-year support lifespans.

> I don't see pthread_is_threaded_np() on any non-Apple systems in my
> lab.

Yeah, I thought that might be a Mac thing.  I wonder if POSIX has any
usable equivalent.

> Clearly libldap_r is *capable* of creating threads: it contains a
> function ldap_pvt_thread_create(), and we can see that slapd and other
> OpenLDAP things use that, but AFAICT that's a private facility not
> intended for end users to call, so there's no danger if you just use
> the documented LDAP client API.

That seems promising, but I'd sure be happier if we could cross-check
that there's still just one thread at the completion of authentication.

            regards, tom lane
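
From outside the server, Linux exposes a process's thread count in /proc/<pid>/status, which allows a rough manual version of that cross-check. This is only an external observation on Linux, not the in-server check discussed in this thread, and the helper name is made up for illustration:

```shell
# Read /proc/<pid>/status text on stdin and print the value of its
# "Threads:" field.
thread_count() {
    awk '/^Threads:/ { print $2 }'
}
```

For a backend that has finished LDAP authentication, `thread_count < /proc/<backend pid>/status` printing anything other than 1 would suggest that a library had started a thread.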


Adding Noah to thread.

On Wed, Feb 27, 2019 at 11:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > I don't see pthread_is_threaded_np() on any non-Apple systems in my
> > lab.
>
> Yeah, I thought that might be a Mac thing.  I wonder if POSIX has any
> usable equivalent.

I don't see anything like that (the concept doesn't seem very
portable).  I couldn't find a way on Glibc (but I'm not saying there
isn't one hiding somewhere).  FreeBSD has a thing much like macOS's
(and I think some more BSDs do too); it's set to true by libthr when
the first thread is created, to make libc start locking various stuff.

The macOS one probably isn't a good canary to protect us from OpenLDAP
creating threads since on typical macOS builds we're using Apple's
LDAP thing (which cybersquats libldap.dylib and libldap_r.dylib via
symlinks).  So adding a FreeBSD check seems like a good idea, because
at least one FreeBSD system in our buildfarm runs the ldap checks on
real OpenLDAP (elver).

> > Clearly libldap_r is *capable* of creating threads: it contains a
> > function ldap_pvt_thread_create(), and we can see that slapd and other
> > OpenLDAP things use that, but AFAICT that's a private facility not
> > intended for end users to call, so there's no danger if you just use
> > the documented LDAP client API.
>
> That seems promising, but I'd sure be happier if we could cross-check
> that there's still just one thread at the completion of authentication.

Ok, here's that patch again with a commit message and with the
configure version warning removed, and a make-sure-we're-not-threaded
patch for FreeBSD.

I'm not sure what to do about the LDAP test in
contrib/dblink/sql/dblink.sql.  Do we still want this?

I propose this for master only, for now.  I also think it'd be nice to
consider back-patching it after a while, especially since this
reportedly broke on CentOS/RHEL7, a pretty popular OS that'll be around
for a good while.  Hmm, I wonder if it's OK to subtly change library
dependencies in a minor release; I don't see any problem with it since
I expect both variants to be provided by the same package in every
distro but we'd certainly want to highlight this to the package
maintainers if we did it.

--
Thomas Munro
https://enterprisedb.com

On Thu, Mar 07, 2019 at 10:45:56AM +1300, Thomas Munro wrote:
> On Wed, Feb 27, 2019 at 11:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Thomas Munro <thomas.munro@gmail.com> writes:
> > > I don't see pthread_is_threaded_np() on any non-Apple systems in my
> > > lab.
> >
> > Yeah, I thought that might be a Mac thing.  I wonder if POSIX has any
> > usable equivalent.
> 
> I don't see anything like that (the concept doesn't seem very
> portable).

I'm not aware of one.

> > > Clearly libldap_r is *capable* of creating threads: it contains a
> > > function ldap_pvt_thread_create(), and we can see that slapd and other
> > > OpenLDAP things use that, but AFAICT that's a private facility not
> > > intended for end users to call, so there's no danger if you just use
> > > the documented LDAP client API.
> >
> > That seems promising, but I'd sure be happier if we could cross-check
> > that there's still just one thread at the completion of authentication.
> 
> Ok, here's that patch again with a commit message and with the
> configure version warning removed, and a make-sure-we're-not-threaded
> patch for FreeBSD.
> 
> I'm not sure what to do about the LDAP test in
> contrib/dblink/sql/dblink.sql.  Do we still want this?

Mike, does the dblink test suite not fail on your system?  It's designed to
catch this exact problem.

Has anyone else reproduced this?

> I propose this for master only, for now.  I also think it'd be nice to
> consider back-patching it after a while, especially since this
> reportedly broke on CentOS/RHEL7, a pretty popular OS that'll be around
> for a good while.  Hmm, I wonder if it's OK to subtly change library
> dependencies in a minor release; I don't see any problem with it since
> I expect both variants to be provided by the same package in every
> distro but we'd certainly want to highlight this to the package
> maintainers if we did it.

It's not great to change library dependencies in a minor release.  If every
RHEL 7 installation can crash this way, changing the dependencies is probably
the least bad thing.


On Thu, Mar 7, 2019 at 4:19 PM Noah Misch <noah@leadboat.com> wrote:
> Has anyone else reproduced this?

I tried, but could not reproduce this problem on "CentOS Linux release
7.6.1810 (Core)" using OpenLDAP "2.4.44-21.el7_6" (same as Mike
reported, what yum install is currently serving up).  I tried "make
check" in contrib/dblink, and the only strange thing I noticed was
this FATAL error at the top of contrib/dblink/log/postmaster.log:

2019-03-14 03:51:33.058 UTC [20131] LOG:  database system is ready to accept connections
2019-03-14 03:51:33.059 UTC [20135] [unknown] FATAL:  the database system is starting up

I don't see that on other systems and don't understand it.

I also tried a test of my own which I thought corresponded directly to
what Mike described, on both master and REL_10_STABLE.  I'll record my
steps here so perhaps someone can see what's missing.

1.  Run the regression test under src/test/ldap so that you get some
canned slapd configuration files.
2.  cd into src/test/ldap/tmp_check and run
"slapd -f slapd.conf -h ldap://localhost:5555".  It should daemonify
itself, and run until you kill it with SIGINT.
3.  Put this into pg_hba.conf:
host postgres test1 127.0.0.1/32 ldap ldapserver=localhost ldapport=5555 ldapbasedn="dc=example,dc=net"
4.  Create database objects as superuser:
create user test1;
create table t (i int);
grant all on t to test1;
create extension postgres_fdw;
create server foreign_server foreign data wrapper postgres_fdw options (dbname 'postgres', host '127.0.0.1');
create foreign table ft (i int) server foreign_server options (table_name 't');
create user mapping for test1 server foreign_server options (user 'test1', password 'secret1');
grant all on ft to test1;
5.  Now you should be able to log in with "psql -h 127.0.0.1 postgres test1"
and password "secret1", and run queries like: select * from ft;

When exiting the session, I was expecting the backend to crash,
because it had executed libldap.so code during authentication, and
then it had linked in libldap_r.so via libpq.so while connecting via
postgres_fdw.  But it doesn't crash.  I wonder what is different for
Mike; am I missing something, or is there non-determinism here?

> > I propose this for master only, for now.  I also think it'd be nice to
> > consider back-patching it after a while, especially since this
> > reportedly broke on CentOS/RHEL7, a pretty popular OS that'll be around
> > for a good while.  Hmm, I wonder if it's OK to subtly change library
> > dependencies in a minor release; I don't see any problem with it since
> > I expect both variants to be provided by the same package in every
> > distro but we'd certainly want to highlight this to the package
> > maintainers if we did it.
>
> It's not great to change library dependencies in a minor release.  If every
> RHEL 7 installation can crash this way, changing the dependencies is probably
> the least bad thing.

+1, once we get a repro and/or better understanding.

-- 
Thomas Munro
https://enterprisedb.com


On Thu, Mar 14, 2019 at 05:18:49PM +1300, Thomas Munro wrote:
> On Thu, Mar 7, 2019 at 4:19 PM Noah Misch <noah@leadboat.com> wrote:
> > Has anyone else reproduced this?
> 
> I tried, but could not reproduce this problem on "CentOS Linux release
> 7.6.1810 (Core)" using OpenLDAP "2.4.44-21.el7_6" (same as Mike
> reported, what yum install is currently serving up).

> When exiting the session, I was expecting the backend to crash,
> because it had executed libldap.so code during authentication, and
> then it had linked in libldap_r.so via libpq.so while connecting via
> postgres_fdw.  But it doesn't crash.  I wonder what is different for
> Mike; am I missing something, or is there non-determinism here?

The test is deterministic.  I'm guessing Mike's system is finding ldap
libraries other than the usual system ones.  Mike, would you check as follows?

$ echo "select pg_backend_pid(); load 'dblink'; select pg_sleep(100)" | psql -X &
[1] 2530123
  pg_backend_pid
----------------
        2530124
(1 row)

LOAD

$ gdb --batch --pid 2530124 -ex 'info sharedlibrary ldap'
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007ffff6303463 in __epoll_wait_nocancel () from /lib64/libc.so.6
From                To                  Syms Read   Shared Object Library
0x00007ffff65e1ee0  0x00007ffff6613304  Yes (*)     /lib64/libldap-2.4.so.2
0x00007fffe998f6d0  0x00007fffe99c3ae4  Yes (*)     /lib64/libldap_r-2.4.so.2
(*): Shared library is missing debugging information.


Hi Noah, below is the output from one of the servers having this issue:

$ echo "select pg_backend_pid(); load 'dblink'; select pg_sleep(100)" | psql -X &
[1] 9731

$ select pg_backend_pid(); load 'dblink'; select pg_sleep(100)
 pg_backend_pid
----------------
           9732
(1 row)

LOAD

$ gdb --batch --pid 9732 -ex 'info sharedlibrary ldap'

warning: .dynamic section for "/lib64/libldap-2.4.so.2" is not at the expected address (wrong library or version mismatch?)

warning: .dynamic section for "/lib64/liblber-2.4.so.2" is not at the expected address (wrong library or version mismatch?)
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
0x00007f1e7592dcf3 in __epoll_wait_nocancel () from /lib64/libc.so.6
From                To                  Syms Read   Shared Object Library
0x00007f1e7637d0f8  0x00007f1e763ae51c  Yes (*)     /lib64/libldap-2.4.so.2
0x00007f1d9f2c16d0  0x00007f1d9f2f5ae4  Yes (*)     /lib64/libldap_r-2.4.so.2
(*): Shared library is missing debugging information.


Regards,
Mike Yeap

On Fri, Mar 15, 2019 at 12:10:59AM +0800, Mike Yeap wrote:
> Hi Noah, below is the output from one of the servers having this issue:
> 
> $ echo "select pg_backend_pid(); load 'dblink'; select pg_sleep(100)" | psql -X &
> [1] 9731
> 
> $ select pg_backend_pid(); load 'dblink'; select pg_sleep(100)
>  pg_backend_pid
> ----------------
>            9732
> (1 row)
> 
> LOAD
> 
> $ gdb --batch --pid 9732 -ex 'info sharedlibrary ldap'
> 
> warning: .dynamic section for "/lib64/libldap-2.4.so.2" is not at the expected address (wrong library or version mismatch?)
> 
> warning: .dynamic section for "/lib64/liblber-2.4.so.2" is not at the expected address (wrong library or version mismatch?)
> [Thread debugging using libthread_db enabled]
> Using host libthread_db library "/lib64/libthread_db.so.1".
> 0x00007f1e7592dcf3 in __epoll_wait_nocancel () from /lib64/libc.so.6
> From                To                  Syms Read   Shared Object Library
> 0x00007f1e7637d0f8  0x00007f1e763ae51c  Yes (*)     /lib64/libldap-2.4.so.2
> 0x00007f1d9f2c16d0  0x00007f1d9f2f5ae4  Yes (*)     /lib64/libldap_r-2.4.so.2
> (*): Shared library is missing debugging information.

Thanks.  That rules out my guess.  I don't have another guess at this time.


On Fri, Mar 15, 2019 at 4:46 PM Noah Misch <noah@leadboat.com> wrote:
> Thanks.  That rules out my guess.  I don't have another guess at this time.

Even though I can't reproduce the problem myself, I'm quite keen to go
ahead and push the patch I proposed for v12 anyway, and close this
case.  Otherwise this problem could just keep coming back until
libldap.so is eventually entirely phased out by all distros.  In 2023
I want to be working on quantum parallelism or something, not LDAP bug
reports.  Any objections?

-- 
Thomas Munro
https://enterprisedb.com


Thomas Munro <thomas.munro@gmail.com> writes:
> Even though I can't reproduce the problem myself, I'm quite keen to go
> ahead and push the patch I proposed for v12 anyway, and close this
> case.  Otherwise this problem could just keep coming back until
> libldap.so is eventually entirely phased out by all distros.  In 2023
> I want to be working on quantum parallelism or something, not LDAP bug
> reports.  Any objections?

Do we have any clear reason to believe this'd actually fix Mike's problem?
AFAIK the analogy to the old destructor-conflict issue is just a guess,
and we don't really know exactly what is going wrong.

It's reasonable to assume that the proposed patch won't cause real issues
on any modern platform, but I'm not sure we can assume that for old ones,
so the whole thing is making me a bit nervous.  Still, it's a nice
simplification to not have different frontend and backend LDAP libs.

As far as the specifics of the patch go, I don't like that you didn't
adjust any of the comments near pthread_is_threaded_np() usages.

            regards, tom lane


On Wed, Mar 20, 2019 at 10:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Even though I can't reproduce the problem myself, I'm quite keen to go
> > ahead and push the patch I proposed for v12 anyway, and close this
> > case.  Otherwise this problem could just keep coming back until
> > libldap.so is eventually entirely phased out by all distros.  In 2023
> > I want to be working on quantum parallelism or something, not LDAP bug
> > reports.  Any objections?
>
> Do we have any clear reason to believe this'd actually fix Mike's problem?
> AFAIK the analogy to the old destructor-conflict issue is just a guess,
> and we don't really know exactly what is going wrong.

Right, we don't know.  To learn more about the reported crash I think
we'll need Mike to install debug symbols, attach with gdb and make it
crash, then show us the output of "bt".  More info here:
https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD

It'd be nice to be able to rule it out in any future bug reports with
these symptoms though, and it's roughly in line with what we see the
rest of the open source ecosystem doing about this problem.

> It's reasonable to assume that the proposed patch won't cause real issues
> on any modern platform, but I'm not sure we can assume that for old ones,
> so the whole thing is making me a bit nervous.  Still, it's a nice
> simplification to not have different frontend and backend LDAP libs.

Sure, it's possible that some BF animal will fail to link the backend
for some reason that requires a bit of investigation and a follow-up
patch.  Are you thinking of systems not covered by the BF?

Unless the server is being built with an extremely small set of
configure options enabled, it's almost certainly already linking
something that pulls in the platform's threading library (SSL, GSSAPI,
XML2, ...).  If someone out there is not enabling any of that stuff
because their system doesn't like threads, they can use
--disable-thread-safety to avoid the effects of this change.

> As far as the specifics of the patch go, I don't like that you didn't
> adjust any of the comments near pthread_is_threaded_np() usages.

Hmm.  The comments seemed OK to me without adjustment, is there
something specific that bothered you?  The errhint about LC_ALL is
wrong though, it's macOS-specific.  So I think I should change the
hint to "On macOS, ...", or I guess make it conditional.

-- 
Thomas Munro
https://enterprisedb.com


Thomas Munro <thomas.munro@gmail.com> writes:
> On Wed, Mar 20, 2019 at 10:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> It's reasonable to assume that the proposed patch won't cause real issues
>> on any modern platform, but I'm not sure we can assume that for old ones,
>> so the whole thing is making me a bit nervous.

> Sure, it's possible that some BF animal will fail to link the backend
> for some reason that requires a bit of investigation and a follow-up
> patch.  Are you thinking of systems not covered by the BF?

No, I'm thinking that a "followup patch" might be impossible.

> Unless the server is being built with an extremely small set of
> configure options enabled, it's almost certainly already linking
> something that pulls in the platform's threading library (SSL, GSSAPI,
> XML2, ...).

Yeah, but if somebody is relying on LDAP and not any of those other
things, they won't be happy.

> If someone out there is not enabling any of that stuff
> because their system doesn't like threads, they can use
> --disable-thread-safety to avoid the effects of this change.

No, that's nonsense; --disable-thread-safety only affects what happens
on the frontend side.

>> As far as the specifics of the patch go, I don't like that you didn't
>> adjust any of the comments near pthread_is_threaded_np() usages.

> Hmm.  The comments seemed OK to me without adjustment, is there
> something specific that bothered you?

The comment at postmaster.c:1339 is very specific about how there's
a problem with macOS's libintl.  On the basis of that, nobody would
expect that there's a need to do anything on any other platform.
I think we should at least add something about how we're worried
about libldap_r maybe causing the backend to become multithreaded.

> The errhint about LC_ALL is
> wrong though, it's macOS-specific.

Yeah, but that's part and parcel with the comment.

            regards, tom lane


On Thu, Mar 21, 2019 at 5:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > If someone out there is not enabling any of that stuff
> > because their system doesn't like threads, they can use
> > --disable-thread-safety to avoid the effects of this change.
>
> No, that's nonsense; --disable-thread-safety only affects what happens
> on the frontend side.

That's exactly what I'm talking about changing.  With the patch, BE's
LDAP library variant would also be controlled by that configure
switch, so it would always match the FE.  Almost all users would
continue to choose libldap_r.so for the FE, so they'd start getting
that in the BE too (if they didn't already due to distro-supplied
symlinks).  People using --disable-thread-safety would continue to get
libldap.so for FE and BE, as they do today.

-- 
Thomas Munro
https://enterprisedb.com
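
The selection rule described in the message above can be written out as a small table. This is only an illustrative sketch of the behaviour as stated in the thread; the function name and the "patched" flag are hypothetical, and the real change would live in the configure script and makefiles, which are not shown here.

```python
def ldap_link_targets(thread_safety_enabled, patched):
    """Model which LDAP library variant the frontend (FE) and backend (BE)
    link against, per the description in this thread.

    thread_safety_enabled: False models building with --disable-thread-safety.
    patched: True models the proposed change, where the BE follows the FE.
    """
    # The FE already picks the reentrant variant unless thread safety is off.
    frontend = "libldap_r" if thread_safety_enabled else "libldap"
    # Before the patch the BE always uses libldap; afterwards it matches the FE.
    backend = frontend if patched else "libldap"
    return {"frontend": frontend, "backend": backend}


for thread_safe in (True, False):
    for patched in (False, True):
        print(thread_safe, patched, ldap_link_targets(thread_safe, patched))
```

The only combination where FE and BE differ is the default thread-safe build without the patch, which is the mixed libldap/libldap_r situation suspected in the crash report.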


Thomas Munro <thomas.munro@gmail.com> writes:
> On Thu, Mar 21, 2019 at 5:07 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Thomas Munro <thomas.munro@gmail.com> writes:
>>> If someone out there is not enabling any of that stuff
>>> because their system doesn't like threads, they can use
>>> --disable-thread-safety to avoid the effects of this change.

>> No, that's nonsense; --disable-thread-safety only affects what happens
>> on the frontend side.

> That's exactly what I'm talking about changing.  With the patch, BE's
> LDAP library variant would also be controlled by that configure
> switch, so it would always match the FE.  Almost all users would
> continue to choose libldap_r.so for the FE, so they'd start getting
> that in the BE too (if they didn't already due to distro-supplied
> symlinks).  People using --disable-thread-safety would continue to get
> libldap.so for FE and BE, as they do today.

Ah, I see.  Seems reasonable.

I still wish we could confirm this fixes the reported problem before
we pull the trigger.

            regards, tom lane