Discussion: BUG #16041: Error shows up both in pgAdmin and in Ruby (pg gem) - Segmentation fault

BUG #16041: Error shows up both in pgAdmin and in Ruby (pg gem) - Segmentation fault

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      16041
Logged by:          Mark Siemers
Email address:      mark.siemers@gmail.com
PostgreSQL version: 12.0
Operating system:   Mac OS X Mojave 10.14.6
Description:

For further details (including crash report) see bugs filed with
third-parties:
Ruby - https://bugs.ruby-lang.org/issues/16239
pgAdmin 4 - https://redmine.postgresql.org/issues/4813

A Ruby maintainer speculates that there is an issue with GSS
authentication on OS X.

Snippet of stack trace below:
7   ???                             0x0000000200000000 0 + 8589934592
8   com.apple.security              0x00007fff3f57c059 invocation function for block in Security::KeychainCore::StorageManager::tickleKeychain(Security::KeychainCore::KeychainImpl*) + 287
9   libdispatch.dylib               0x00007fff5fd6d63d _dispatch_client_callout + 8
10  libdispatch.dylib               0x00007fff5fd79129 _dispatch_lane_barrier_sync_invoke_and_complete + 60
11  com.apple.security              0x00007fff3f57be47 Security::KeychainCore::StorageManager::tickleKeychain(Security::KeychainCore::KeychainImpl*) + 441
12  com.apple.security              0x00007fff3f37cae2 Security::KeychainCore::KCCursorImpl::next(Security::KeychainCore::Item&) + 230
13  com.apple.security              0x00007fff3f523c98 Security::KeychainCore::IdentityCursor::next(Security::SecPointer<Security::KeychainCore::Identity>&) + 192
14  com.apple.security              0x00007fff3f545f2f SecIdentitySearchCopyNext + 145
15  com.apple.security              0x00007fff3f550956 SecItemCopyMatching_osx(__CFDictionary const*, void const**) + 238
16  com.apple.security              0x00007fff3f553fc5 SecItemCopyMatching + 316
17  com.apple.Heimdal               0x00007fff4feae830 0x7fff4fe5c000 + 337968
18  com.apple.Heimdal               0x00007fff4fead35e hx509_certs_find + 67
19  com.apple.Heimdal               0x00007fff4fe88a6c _krb5_pk_find_cert + 246
20  com.apple.GSS                   0x00007fff364dbd8e _gsspku2u_acquire_cred + 386
21  com.apple.GSS                   0x00007fff364cb0d8 gss_acquire_cred + 523
22  libpq.5.dylib                   0x0000000112b4b77d pg_GSS_have_cred_cache + 54
23  libpq.5.dylib                   0x0000000112b39edf PQconnectPoll + 6377
24  libpq.5.dylib                   0x0000000112b36f8b connectDBComplete + 232
25  libpq.5.dylib                   0x0000000112b37112 PQconnectdb + 36
26  pg_ext.bundle                   0x000000011157ab01 gvl_PQconnectdb_skeleton + 17
27  ruby                            0x000000010f1dfff9 call_without_gvl + 185
28  pg_ext.bundle                   0x000000011157aadd gvl_PQconnectdb + 45
29  pg_ext.bundle                   0x000000011157fcb9 pgconn_init + 121
30  ruby                            0x000000010f221b1c vm_call0_body + 604


Re: BUG #16041: Error shows up both in pgAdmin and in Ruby (pg gem) - Segmentation fault

From
Fahar Abbas
Date:
Hi,

The issue is not reproducible on macOS 10.12 with the same PostgreSQL 12 server.

--
Fahar Abbas
QMG
EnterpriseDB Corporation
Phone Office: +92-51-835-8874
Phone Direct: +92-51-8466803
Mobile: +92-333-5409707
Skype ID: live:fahar.abbas
Website: www.enterprisedb.com
Attachments

Re: BUG #16041: Error shows up both in pgAdmin and in Ruby (pg gem) - Segmentation fault

From
Chris Bandy
Date:
Hello,

I am able to reproduce this on macOS 10.14 (Mojave) in multiple versions 
of Ruby and in a minimal C program.

Steps to reproduce:

1. Install libpq for PostgreSQL 12:
    brew install postgresql@12

2. Install the pg gem:
    gem install pg

3. Start a PostgreSQL server:
    docker run --rm -d -p 127.0.0.1:5432:5432 postgres:12

4. Execute some GSS path before and after fork:
    ruby -r pg -e '
      PG.connect(host: "localhost")
      Process.fork { PG.connect(host: "localhost") }
      Process.wait
    '

Note that host must be a TCP address (not a Unix socket) and gssencmode 
must be "prefer" (which is the default). The version of the server doesn't 
appear to matter; I tested 10, 11, and 12.

This can also happen in `rails console` if an application initializer 
interacts with ActiveRecord or a descendant (i.e. opens a database 
connection.) Any further interaction with ActiveRecord on the console 
segfaults.

This has been reported in a variety of Ruby projects and often dismissed 
as "a PostgreSQL issue."


I found a similar trace in a Python package that interacts with the 
macOS keychain.[1] There they narrowed it to a single call, raised the 
issue upstream, and were told, in short, "you can't use keychain after fork."

Based on that report, I crafted a minimal C program to make the same GSS 
call as libpq. I compiled (with deprecation warnings) and tested with 
the following:

    gcc macos-gss-crash.c -o macos-gss-crash -lgssapi_krb5
    ./macos-gss-crash

It prints:

    before gss_acquire_cred in main
    after gss_acquire_cred in main
    gss complete: true
    before gss_acquire_cred in child
    child signalled: 11

I've attached the C program and crash reports for it and the above Ruby 
snippet.
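
The attached C program is not reproduced in this archive. As a rough illustration of the fork-and-report pattern that its output suggests, here is a minimal Python sketch. The names `run_in_child` and `crash` are hypothetical, and `crash` only simulates the failure by sending SIGSEGV to itself; it makes no GSS or keychain call.

```python
import os
import signal


def run_in_child(fn):
    """Fork, run fn() in the child, and report how the child ended."""
    pid = os.fork()
    if pid == 0:
        # Child: run the callable, then exit without parent-side cleanup.
        try:
            fn()
        except BaseException:
            os._exit(1)
        os._exit(0)
    # Parent: wait for the child and decode its wait status.
    _, status = os.waitpid(pid, 0)
    if os.WIFSIGNALED(status):
        return "child signalled: %d" % os.WTERMSIG(status)
    return "child exit code: %d" % os.WEXITSTATUS(status)


def crash():
    # Stand-in for the crashing library call (assumption, not the real call).
    os.kill(os.getpid(), signal.SIGSEGV)


if __name__ == "__main__":
    print(run_in_child(lambda: None))
    print(run_in_child(crash))
```

A child that dies from a signal never reaches its own "after" message, which is why only the parent can report it.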

Thanks!

Chris

[1]: https://github.com/jaraco/keyring/issues/281



Attachments

Re: BUG #16041: Error shows up both in pgAdmin and in Ruby (pg gem) - Segmentation fault

From
Chris Bandy
Date:
On 12/3/19 3:33 PM, Chris Bandy wrote:
> Hello,
> 
> I am able to reproduce this on macOS 10.14 (Mojave) in multiple versions 
> of Ruby and in a minimal C program.
> 
I was also able to reproduce this with the attached Python program and 
psycopg2 package.

Steps to reproduce:

1. Install libpq for PostgreSQL 12:
    brew install postgresql@12

2. Install the psycopg2 package:
    pip install psycopg2

3. Start a PostgreSQL server:
    docker run --rm -d -p 127.0.0.1:5432:5432 postgres:12

4. Execute some GSS path before and after fork:
    python macos-gss-crash.py

It generates a crash report and prints:

    main ok
    -11
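
The attached script is not shown here, so the following is only a guess at how to read the `-11`. Python's `subprocess` and `multiprocessing` modules report a child killed by signal N as the value -N, and signal 11 is SIGSEGV on macOS and Linux. The helper name `describe_returncode` is hypothetical.

```python
import signal
import subprocess
import sys


def describe_returncode(code):
    """Turn a subprocess-style return code into a readable message."""
    if code < 0:
        # Negative values mean the child was killed by signal number -code.
        return "killed by %s" % signal.Signals(-code).name
    return "exited with status %d" % code


if __name__ == "__main__":
    # Simulate a crashing child; no database or GSS call is involved.
    child = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"
    result = subprocess.run([sys.executable, "-c", child])
    print(result.returncode, describe_returncode(result.returncode))
```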

In this and the previous tests I can avoid/workaround the segfault by 
specifying gssencmode=disable.
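
One way to apply that workaround from Python is to put `gssencmode=disable` into the connection string. The sketch below uses a hypothetical helper, `conninfo`, which builds a libpq keyword/value string and defaults `gssencmode` to `disable`. It is an illustration under those assumptions, not code taken from the attached scripts.

```python
def conninfo(**params):
    """Build a libpq keyword=value string with GSS encryption disabled."""
    # Default to gssencmode=disable unless the caller overrides it.
    merged = {"gssencmode": "disable", **params}
    parts = []
    for key, value in merged.items():
        # libpq quoting: wrap in single quotes, escape backslash and quote.
        text = str(value).replace("\\", "\\\\").replace("'", "\\'")
        parts.append("%s='%s'" % (key, text))
    return " ".join(parts)


if __name__ == "__main__":
    print(conninfo(host="localhost"))
```

The resulting string can be passed to `psycopg2.connect()` as the DSN. Setting the `PGGSSENCMODE=disable` environment variable before connecting should have the same effect without changing application code.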


Thanks!

Chris

Attachments

Re: BUG #16041: Error shows up both in pgAdmin and in Ruby (pg gem) - Segmentation fault

From
Stephen Frost
Date:
Greetings,

* Chris Bandy (chris.bandy@crunchydata.com) wrote:
> Notice that host must be a TCP address (not Unix) and gssencmode must be
> "prefer" (default is "prefer".) The version of the server doesn't appear to
> matter; I tested 10, 11, and 12.

So, gssencmode didn't exist in 10 or 11- but are you actually testing
those different versions of *libpq*?  That's really what is relevant
here, I believe, if libpq is actually even relevant at all...

> This has been reported in a variety of Ruby projects and often dismissed as
> "a PostgreSQL issue."

I'm really inclined to say that this isn't a PG issue...

> Based on that report, I crafted a minimal C program to make the same GSS
> call as libpq. I compiled (with deprecation warnings) and tested with the
> following:
>
>    gcc macos-gss-crash.c -o macos-gss-crash -lgssapi_krb5
>    ./macos-gss-crash

Particularly since that isn't linking against libpq and it's still
crashing.

I took the liberty to update the C code version to run on a Linux
system, and sure enough, it works just fine:

before gss_acquire_cred in main
after gss_acquire_cred in main
gss complete: true
before gss_acquire_cred in child
after gss_acquire_cred in child
gss complete: true
child exit code: 0

(also tested w/o having GSS creds and it still worked without a crash)

The only difference I needed to get it to compile on my Ubuntu box was
to add:

#include <sys/types.h>
#include <sys/wait.h>

and then compile as:

gcc macos-gss-crash.c -o macos-gss-crash -I /usr/include/mit-krb5 -L /usr/lib/x86_64-linux-gnu/mit-krb5 -lgssapi_krb5

> It prints:
>
>    before gss_acquire_cred in main
>    after gss_acquire_cred in main
>    gss complete: true
>    before gss_acquire_cred in child
>    child signalled: 11
>
> I've attached the C program and crash reports for it and the above Ruby
> snippet.

Unfortunately, MacOS is pretty well known to be terrible about less
commonly used libraries and maintaining them.  I'd suggest building a
current version of the Kerberos libraries, making sure you're linking
against just those and not whatever is provided by MacOS, and see if you
still have an issue.

The other possibility is that this is a current bug in Heimdal, which
seems to be the Kerberos library being used on MacOS, in which case
you'd need to bring up the issue with them.

There seems to be some independent confirmation of this being an issue
with the Heimdal provided by MacOS:

https://github.com/zenchild/gssapi/issues/12

The docs for gss_acquire_cred() don't seem to say much about what
happens when there's a fork():

https://docs.oracle.com/cd/E19683-01/816-1331/overview-141/index.html

If there's something we should be doing differently with
gss_acquire_cred() to "fix" this then I'm certainly open to it but I'm
really not sure what we'd do here; it seems pretty clearly to be some
issue where the Kerberos/Heimdal library being used is maintaining its
own state and getting confused after a fork happens.

Thanks,

Stephen

Attachments

Re: BUG #16041: Error shows up both in pgAdmin and in Ruby (pg gem) - Segmentation fault

From
Chris Bandy
Date:
On 12/3/19 5:31 PM, Stephen Frost wrote:
> Greetings,
> 
> * Chris Bandy (chris.bandy@crunchydata.com) wrote:
>> Notice that host must be a TCP address (not Unix) and gssencmode must be
>> "prefer" (default is "prefer".) The version of the server doesn't appear to
>> matter; I tested 10, 11, and 12.
> 
> So, gssencmode didn't exist in 10 or 11- but are you actually testing
> those different versions of *libpq*?

No, the libpq version in my tests is always 12. I was trying to say that 
it doesn't appear to be an issue with the protocol/negotiation of GSS 
encryption.

That does make me wonder, though, if/how the _server_ built by `brew 
install postgresql` might be impacted by the macOS GSSAPI? All my tests 
targeted a linux server.

>> This has been reported in a variety of Ruby projects and often dismissed as
>> "a PostgreSQL issue."
> 
> I'm really inclined to say that this isn't a PG issue...

I agree, but at the same time the perception seems to be that 
using/connecting to PostgreSQL crashes one's application. I think the 
very reasonable default of gssencmode=prefer is partly responsible. 
Users don't realize that by upgrading libpq they are opting in to new 
security code paths (and library compatibility issues.)

> Unfortunately, MacOS is pretty well known to be terrible about less
> commonly used libraries and maintaining them.  I'd suggest building a
> current version of the Kerberos libraries, making sure you're linking
> against just those and not whatever is provided by MacOS, and see if you
> still have an issue.

Investigating this has been the deepest exposure I've had to this... 
yes, "unfortunate" reality.

Homebrew provides a recent version of krb5 (1.17 at this time) so I set 
out to use it. A small diff to the formula proved successful. I'll 
submit a patch to Homebrew linking back to this thread.

Is there anything that can/should be done on PostgreSQL's end now that 
we know about this situation? The most I can imagine is to issue a 
warning when macOS's GSSAPI is detected during build/configure. I don't 
know how to do the latter and won't be surprised if the answer to the 
former is "no."

> The other possibility is that this is a current bug in Heimdal, which
> seems to be the Kerberos library being used on MacOS, in which case
> you'd need to bring up the issue with them.

I'm out of my depth on this front. My impression from the traces is that 
the incompatibility is in macOS keychain, and I'm willing to leave it at 
that. While researching this topic, I found multiple cases where fork() 
and the "dispatch queue" are incompatible.[1]

> There seems to be some independent confirmation of this being an issue
> with the Heimdal provided by MacOS:
> 
> https://github.com/zenchild/gssapi/issues/12

I don't see any C-level backtrace information in that thread, so I can't 
tell if it's the same issue.

Thank you for your help!

Chris

[1]: https://www.evanjones.ca/fork-is-dangerous.html



Re: BUG #16041: Error shows up both in pgAdmin and in Ruby (pg gem) - Segmentation fault

From
Stephen Frost
Date:
Greetings,

* Chris Bandy (chris.bandy@crunchydata.com) wrote:
> On 12/3/19 5:31 PM, Stephen Frost wrote:
> >* Chris Bandy (chris.bandy@crunchydata.com) wrote:
> >>Notice that host must be a TCP address (not Unix) and gssencmode must be
> >>"prefer" (default is "prefer".) The version of the server doesn't appear to
> >>matter; I tested 10, 11, and 12.
> >
> >So, gssencmode didn't exist in 10 or 11- but are you actually testing
> >those different versions of *libpq*?
>
> No, the libpq version in my tests is always 12. I was trying to say that it
> doesn't appear to be an issue with the protocol/negotiation of GSS
> encryption.

No, I don't think it's got anything to do with that ... or largely to do
with PG, except that libpq with v12 now uses more of the GSSAPI library
than it used to.

> That does make me wonder, though, if/how the _server_ built by `brew install
> postgresql` might be impacted by the macOS GSSAPI? All my tests targeted a
> linux server.

I wouldn't be at all surprised if there's other bugs lurking in the old
version of Heimdal that Apple hacked up and distributes with their base
OS.

> >>This has been reported in a variety of Ruby projects and often dismissed as
> >>"a PostgreSQL issue."
> >
> >I'm really inclined to say that this isn't a PG issue...
>
> I agree, but at the same time the perception seems to be that
> using/connecting to PostgreSQL crashes one's application. I think the very
> reasonable default of gssencmode=prefer is partly responsible. Users don't
> realize that by upgrading libpq they are opting in to new security code
> paths (and library compatibility issues.)

Perception isn't reality, though, and upgrading to a new major version of
libpq is going to pretty regularly involve new library calls or calls
being made in ways they weren't before.  If that exposes a bug in
that library (particularly one that's been fixed in more recent versions
of the library), that's not on us to hack around or attempt to solve,
imv.  Perhaps someone else has a differing opinion and wants to try and
figure out a way to solve this that doesn't materially make things worse
for users that are running with a modern library, which would be great,
but I can't get too worked up about it.

> >Unfortunately, MacOS is pretty well known to be terrible about less
> >commonly used libraries and maintaining them.  I'd suggest building a
> >current version of the Kerberos libraries, making sure you're linking
> >against just those and not whatever is provided by MacOS, and see if you
> >still have an issue.
>
> Investigating this has been the deepest exposure I've had to this... yes,
> "unfortunate" reality.
>
> Homebrew provides a recent version of krb5 (1.17 at this time) so I set out
> to use it. A small diff to the formula proved successful. I'll submit a
> patch to Homebrew linking back to this thread.

Great, that sounds like it's probably the right approach to addressing
this.

> Is there anything that can/should be done on PostgreSQL's end now that we
> know about this situation? The most I can imagine is to issue a warning when
> macOS's GSSAPI is detected during build/configure. I don't know how to do
> the latter and won't be surprised if the answer to the former is "no."

I wouldn't be against doing something here but I don't have a Mac myself
and I don't plan to spend time trying to hack around their broken
library.  I'm also not entirely convinced that we should just throw an
error if we come across this busted library- psql doesn't fork and
hasn't got any problems, so it seems a bit overkill to just refuse to
work with the MacOS library.

> >The other possibility is that this is a current bug in Heimdal, which
> >seems to be the Kerberos library being used on MacOS, in which case
> >you'd need to bring up the issue with them.
>
> I'm out of my depth on this front. My impression from the traces is that the
> incompatibility is in macOS keychain, and I'm willing to leave it at that.
> While researching this topic, I found multiple cases where fork() and the
> "dispatch queue" are incompatible.[1]

I'm.. not terribly impressed by that blog's arguments around fork(),
particularly since it seems to be claiming things that are actually not
true about fork but which are true about threads.  In fact, what it
seems to really be getting at is that running with threads and fork'ing
at the same time is awful complicated to get right, and that's pretty
accurate, but that doesn't make just using fork() an issue.

That blog post aside, it looks like what it's getting at is that you
can't link to MacOS libraries and also fork() and expect things to be
sane, and while that's unfortunate, that isn't really our issue to go
figure out how to fix or address.

Thanks,

Stephen

Attachments