Re: narwhal and PGDLLIMPORT

From Andres Freund
Subject Re: narwhal and PGDLLIMPORT
Date
Msg-id 20141020202447.GH7176@awork2.anarazel.de
In response to Re: narwhal and PGDLLIMPORT  (Noah Misch <noah@leadboat.com>)
Responses Re: narwhal and PGDLLIMPORT
List pgsql-hackers
On 2014-10-20 01:03:31 -0400, Noah Misch wrote:
> On Wed, Oct 15, 2014 at 12:53:03AM -0400, Noah Misch wrote:
> > On Tue, Oct 14, 2014 at 07:07:17PM -0400, Tom Lane wrote:
> > > Dave Page <dpage@pgadmin.org> writes:
> > > > On Tue, Oct 14, 2014 at 11:38 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > > >> I think we're hoping that somebody will step up and investigate how
> > > >> narwhal's problem might be fixed.
> > 
> > I have planned to look at reproducing narwhal's problem once the dust settles
> > on orangutan, but I wouldn't mind if narwhal went away instead.
> 
> > > No argument here.  I would kind of like to have more than zero
> > > understanding of *why* it's failing, just in case there's more to it
> > > than "oh, probably a bug in this old toolchain".  But finding that out
> > > might well take significant time, and in the end not tell us anything
> > > very useful.
> > 
> > Agreed on all those points.
> 
> I reproduced narwhal's problem using its toolchain on another 32-bit Windows
> Server 2003 system.  The crash happens at the SHGetFolderPath() call in
> pqGetHomeDirectory().  A program can acquire that function via shfolder.dll or
> via shell32.dll; we've used the former method since commit 889f038, for better
> compatibility[1] with Windows NT 4.0.  On this system, shfolder.dll's version
> loads and unloads shell32.dll.  In PostgreSQL built using this older compiler,
> shfolder.dll:SHGetFolderPath() unloads libpq in addition to unloading shell32!
> That started with commit 846e91e.  I don't expect to understand the mechanism
> behind it, but I recommend we switch back to linking libpq with shell32.dll.
> The MSVC build already does that in all supported branches, and it feels right
> for the MinGW build to follow suit in 9.4+.  Windows versions that lack the
> symbol in shell32.dll are now ancient history.

Ick. Nice detective work on an ugly situation.

> I happened to try the same contrib/dblink test suite on PostgreSQL built with
> modern MinGW-w64 (i686-4.9.1-release-win32-dwarf-rt_v3-rev1).  That, too, gave
> a crash-like symptom starting with commit 846e91e.  Specifically, a backend
> that LOADed any module linked to libpq (libpqwalreceiver, dblink,
> postgres_fdw) would suffer this after calling exit(0):
> 
> ===
> 3056 2014-10-20 00:40:15.163 GMT LOG:  disconnection: session time: 0:00:00.515 user=cyg_server database=template1 host=127.0.0.1 port=3936
> 
> This application has requested the Runtime to terminate it in an unusual way.
> Please contact the application's support team for more information.
> 
> This application has requested the Runtime to terminate it in an unusual way.
> Please contact the application's support team for more information.
> 9300 2014-10-20 00:40:15.163 GMT LOG:  server process (PID 3056) exited with exit code 3
> ===
> 
> The mechanism turned out to be disjoint from the mechanism behind the
> ancient-compiler crash.  Based on the functions called from exit(), my best
> guess is that exit() encountered recursion and used something like an abort()
> to escape.

Hm.

>  (I can send the gdb transcript if anyone is curious to see the
> gory details.)

That would be interesting.

> The proximate cause was commit 846e91e allowing modules to use
> shared libgcc.  A 32-bit libpq acquires 64-bit integer division from libgcc.
> Passing -static-libgcc to the link restores the libgcc situation as it stood
> before commit 846e91e.  The main beneficiary of shared libgcc is C++/Java
> exception handling, so PostgreSQL doesn't care.  No doubt there's some deeper
> bug in libgcc or in PostgreSQL; loading a module that links with shared libgcc
> should not disrupt exit().  I'm content with this workaround.

I'm unconvinced by this reasoning. Popular postgres extensions like
postgis do use C++. It's imo not hard to imagine situations where
switching to a statically linked libgcc could cause problems.
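
For context on the quoted point that a 32-bit libpq acquires 64-bit
integer division from libgcc, here is a minimal sketch of the kind of
code that pulls in that dependency. The function names are hypothetical
and not taken from libpq; the assumption is a 32-bit x86 GCC target,
where a non-constant 64-bit division is typically compiled to a call
into a libgcc helper rather than to inline instructions.

```c
#include <stdint.h>

/*
 * Hypothetical illustration, not PostgreSQL code.
 *
 * On a 32-bit x86 target GCC usually cannot divide two 64-bit values
 * with a single instruction, so it emits a call to a libgcc routine:
 * __divdi3 for signed operands, __udivdi3 for unsigned ones.  Whether
 * that routine comes from the static or the shared libgcc is decided
 * at link time, which is what -static-libgcc influences.
 */
int64_t
div64_signed(int64_t num, int64_t den)
{
	return num / den;
}

uint64_t
div64_unsigned(uint64_t num, uint64_t den)
{
	return num / den;
}
```

One way to check this on a given toolchain would be to compile the file
to assembly for a 32-bit target and look for references to __divdi3 or
__udivdi3; the result may differ between compiler versions and
optimization levels.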


Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


