Обсуждение: ERRORDATA_STACK_SIZE panic crashes on Windows

Поиск
Список
Период
Сортировка

ERRORDATA_STACK_SIZE panic crashes on Windows

От
Tom Lane
Дата:
Have any Windows-using hackers tried to look into the reports of
$SUBJECT on 8.3?  We have two fresh reports:
http://archives.postgresql.org/pgsql-bugs/2008-05/msg00106.php
http://archives.postgresql.org/pgsql-bugs/2008-05/msg00109.php
and this isn't the first time we've heard of it.

I spent some time chasing the theory that the conversion from
UTF8 to the client encoding was failing; but if that's the case
it should be reproducible on non-Windows machines, and I didn't
have any luck with that.

What I'm currently thinking is that maybe gettext() isn't on the
same page as us concerning what encoding it's supposed to produce
its output in.  This is reinforced by the mention of changing
lc_messages in the first report above.  We had had some discussions
of trying to adjust the LC_XXX values to ensure that they all
represent the same encoding choice, but I don't believe that got done.
It might also be significant that both complainants are using UTF8
database encoding; IIRC that has some weird status in the Windows
locale world.

I am also toying with the idea of disabling gettext usage once we
get past two or so levels of error nesting, in order to prevent the
recursion panic in this type of scenario --- but it would be real
nice to know for certain what is happening before we try to fix it.
        regards, tom lane


Re: ERRORDATA_STACK_SIZE panic crashes on Windows

От
Tom Lane
Дата:
I wrote:
> Have any Windows-using hackers tried to look into the reports of
> $SUBJECT on 8.3?  We have two fresh reports:
> http://archives.postgresql.org/pgsql-bugs/2008-05/msg00106.php
> http://archives.postgresql.org/pgsql-bugs/2008-05/msg00109.php
> and this isn't the first time we've heard of it.

And now we have another report:
http://archives.postgresql.org/pgsql-bugs/2008-05/msg00159.php
for which the complainant was kind enough to supply the requested
details, and they seem to confirm my previous theory:

> What I'm currently thinking is that maybe gettext() isn't on the
> same page as us concerning what encoding it's supposed to produce
> its output in.

After digging around in the source code a bit, the reason why (and why
it's only seen on Windows) became blindingly obvious :-(.  On Windows
we specifically allow UTF-8 database encoding to be selected regardless
of the system locale settings.  This is (AFAIK) safe for the basic
locale-dependent stuff we do, because that all goes through special
UTF-16-using code paths.  But it leaves gettext utterly misinformed
about what it is supposed to do.

Fortunately there is a way to tell gettext what to do, and accordingly
I propose the attached patch.  I am not in a position to test it
however.  Would somebody replicate the failure and confirm this
fixes it?

            regards, tom lane

Index: mbutils.c
===================================================================
RCS file: /cvsroot/pgsql/src/backend/utils/mb/mbutils.c,v
retrieving revision 1.70
diff -c -r1.70 mbutils.c
*** mbutils.c    12 Apr 2008 23:21:04 -0000    1.70
--- mbutils.c    26 May 2008 17:15:49 -0000
***************
*** 697,702 ****
--- 697,721 ----

      DatabaseEncoding = &pg_enc2name_tbl[encoding];
      Assert(DatabaseEncoding->encoding == encoding);
+
+     /*
+      * On Windows, we allow UTF-8 database encoding to be used with any
+      * locale setting, because UTF-8 requires special handling anyway.
+      * But this means that gettext() might be misled about what output
+      * encoding it should use, so we have to tell it explicitly.
+      *
+      * In future we might want to call bind_textdomain_codeset
+      * unconditionally, but that requires knowing how to spell the codeset
+      * name properly for all encodings on all platforms, which might be
+      * problematic.
+      *
+      * This is presently unnecessary, but harmless, on non-Windows platforms.
+      */
+ #ifdef ENABLE_NLS
+     if (encoding == PG_UTF8)
+         if (bind_textdomain_codeset("postgres", "UTF-8") == NULL)
+             elog(LOG, "bind_textdomain_codeset failed");
+ #endif
  }

  void

Re: ERRORDATA_STACK_SIZE panic crashes on Windows

От
Magnus Hagander
Дата:
Tom Lane wrote:
> I wrote:
> > Have any Windows-using hackers tried to look into the reports of
> > $SUBJECT on 8.3?  We have two fresh reports:
> > http://archives.postgresql.org/pgsql-bugs/2008-05/msg00106.php
> > http://archives.postgresql.org/pgsql-bugs/2008-05/msg00109.php
> > and this isn't the first time we've heard of it.
> 
> And now we have another report:
> http://archives.postgresql.org/pgsql-bugs/2008-05/msg00159.php
> for which the complainant was kind enough to supply the requested
> details, and they seem to confirm my previous theory:
> 
> > What I'm currently thinking is that maybe gettext() isn't on the
> > same page as us concerning what encoding it's supposed to produce
> > its output in.
> 
> After digging around in the source code a bit, the reason why (and why
> it's only seen on Windows) became blindingly obvious :-(.  On Windows
> we specifically allow UTF-8 database encoding to be selected
> regardless of the system locale settings.  This is (AFAIK) safe for
> the basic locale-dependent stuff we do, because that all goes through
> special UTF-16-using code paths.  But it leaves gettext utterly
> misinformed about what it is supposed to do.
> 
> Fortunately there is a way to tell gettext what to do, and accordingly
> I propose the attached patch.  I am not in a position to test it
> however.  Would somebody replicate the failure and confirm this
> fixes it?

After some work, I've managed to reproduce it in my test environment for
Swedish, and I can confirm that the patch fixes the issue.

Just for kicks, I've applied this patch so you, so you get to be on the
receiving side of that ;-)

//Magnus


//Magnus


Re: ERRORDATA_STACK_SIZE panic crashes on Windows

От
Tom Lane
Дата:
Magnus Hagander <magnus@hagander.net> writes:
> Tom Lane wrote:
>> Fortunately there is a way to tell gettext what to do, and accordingly
>> I propose the attached patch.  I am not in a position to test it
>> however.  Would somebody replicate the failure and confirm this
>> fixes it?

> After some work, I've managed to reproduce it in my test environment for
> Swedish, and I can confirm that the patch fixes the issue.

Thanks.

> Just for kicks, I've applied this patch so you, so you get to be on the
> receiving side of that ;-)

No objection here.

I noticed that you applied the patch to 8.2 as well.  It should be
harmless enough, but we weren't having the problem in 8.2 were we?
Or am I just confused?
        regards, tom lane


Re: ERRORDATA_STACK_SIZE panic crashes on Windows

От
Bruce Momjian
Дата:
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > Tom Lane wrote:
> >> Fortunately there is a way to tell gettext what to do, and accordingly
> >> I propose the attached patch.  I am not in a position to test it
> >> however.  Would somebody replicate the failure and confirm this
> >> fixes it?
> 
> > After some work, I've managed to reproduce it in my test environment for
> > Swedish, and I can confirm that the patch fixes the issue.
> 
> Thanks.
> 
> > Just for kicks, I've applied this patch so you, so you get to be on the
> > receiving side of that ;-)
> 
> No objection here.
> 
> I noticed that you applied the patch to 8.2 as well.  It should be
> harmless enough, but we weren't having the problem in 8.2 were we?
> Or am I just confused?

I am seeing a compile falure after this patch on BSD/OS 4.3.1.  The
failure is during link of the backend binary:
-lssl -lcrypto -lgetopt -ldl -lutil -lm -o postgresutils/mb/mbutils.o: In function
`SetDatabaseEncoding':utils/mb/mbutils.o(.text+0xbbc):undefined reference to `bind_textdomain_codeset'gmake: ***
[postgres]Error 1
 

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: ERRORDATA_STACK_SIZE panic crashes on Windows

От
Tom Lane
Дата:
Bruce Momjian <bruce@momjian.us> writes:
> I am seeing a compile falure after this patch on BSD/OS 4.3.1.  The
> failure is during link of the backend binary:

>     -lssl -lcrypto -lgetopt -ldl -lutil -lm -o postgres
>     utils/mb/mbutils.o: In function `SetDatabaseEncoding':
>     utils/mb/mbutils.o(.text+0xbbc): undefined reference to `bind_textdomain_codeset'
>     gmake: *** [postgres] Error 1

Hm, I assume you used --enable-nls ... why isn't libintl mentioned
in the link?
        regards, tom lane


Re: ERRORDATA_STACK_SIZE panic crashes on Windows

От
Bruce Momjian
Дата:
Tom Lane wrote:
> Bruce Momjian <bruce@momjian.us> writes:
> > I am seeing a compile falure after this patch on BSD/OS 4.3.1.  The
> > failure is during link of the backend binary:
> 
> >     -lssl -lcrypto -lgetopt -ldl -lutil -lm -o postgres
> >     utils/mb/mbutils.o: In function `SetDatabaseEncoding':
> >     utils/mb/mbutils.o(.text+0xbbc): undefined reference to `bind_textdomain_codeset'
> >     gmake: *** [postgres] Error 1
> 
> Hm, I assume you used --enable-nls ... why isn't libintl mentioned
> in the link?

It was cut off --- the libraries are:
../../src/port/libpgport_srv.a -lintl -lssl -lcrypto -lgetopt -ldl-lutil -lm -o postgres

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: ERRORDATA_STACK_SIZE panic crashes on Windows

От
Tom Lane
Дата:
Bruce Momjian <bruce@momjian.us> writes:
> Tom Lane wrote:
>> Hm, I assume you used --enable-nls ... why isn't libintl mentioned
>> in the link?

> It was cut off --- the libraries are:

>     ../../src/port/libpgport_srv.a -lintl -lssl -lcrypto -lgetopt -ldl
>     -lutil -lm -o postgres

OK, so your version of libintl doesn't have bind_textdomain_codeset?
Is that functionality missing altogether, or just named differently?
        regards, tom lane


Re: ERRORDATA_STACK_SIZE panic crashes on Windows

От
Tom Lane
Дата:
I wrote:
> OK, so your version of libintl doesn't have bind_textdomain_codeset?

Some digging in the gettext changelog suggests that
bind_textdomain_codeset was added in gettext-0.10.36, released
2001-03-29.

We can either add a configure test or say that we don't support
such old versions of gettext ...
        regards, tom lane


Re: ERRORDATA_STACK_SIZE panic crashes on Windows

От
"Joshua D. Drake"
Дата:
> We can either add a configure test or say that we don't support
> such old versions of gettext ...

We don't support seems like a very reasonable response considering the
age.

Sincerely,

Joshua D. Drake

> 
>             regards, tom lane
> 



Re: ERRORDATA_STACK_SIZE panic crashes on Windows

От
Magnus Hagander
Дата:
Tom Lane wrote:
> I wrote:
> > OK, so your version of libintl doesn't have bind_textdomain_codeset?
> 
> Some digging in the gettext changelog suggests that
> bind_textdomain_codeset was added in gettext-0.10.36, released
> 2001-03-29.
> 
> We can either add a configure test or say that we don't support
> such old versions of gettext ...

Or we could just #ifdef the whole thing to win32, since it's not
really needed on other platforms, pushing that decision to later...
(when that version of gettext will be even more obsolete)

//Magnus


Re: ERRORDATA_STACK_SIZE panic crashes on Windows

От
Magnus Hagander
Дата:
Tom Lane wrote:
> > Just for kicks, I've applied this patch so you, so you get to be on
> > the receiving side of that ;-)
> 
> No objection here.
> 
> I noticed that you applied the patch to 8.2 as well.  It should be
> harmless enough, but we weren't having the problem in 8.2 were we?
> Or am I just confused?

I think the underlying problem is still there, it's just that it
doesn't manifest itself in a crash here. I figured it couldn't hurt to
fix it in that case.

//Magnus


Re: ERRORDATA_STACK_SIZE panic crashes on Windows

От
Tom Lane
Дата:
Magnus Hagander <magnus@hagander.net> writes:
> Tom Lane wrote:
>> We can either add a configure test or say that we don't support
>> such old versions of gettext ...

> Or we could just #ifdef the whole thing to win32, since it's not
> really needed on other platforms, pushing that decision to later...
> (when that version of gettext will be even more obsolete)

That would work for the moment, but we're almost certainly going to
have to insist on bind_textdomain_codeset being available eventually;
AFAICS there's no hope of multi-locale/multi-encoding support without it.

I was considering either:

1. Add a probe for bind_textdomain_codeset to configure, and
conditionalize the new patch on HAVE_BIND_TEXTDOMAIN_CODESET.

2. Adjust the AC_SEARCH_LIBS call to probe for bind_textdomain_codeset()
instead of gettext() as it does now.  This would have the effect of
rejecting pre-0.10.36 versions of the library.

Magnus' suggestion gives a third possibility.

I notice that the PGAC_CHECK_GETTEXT macro already contains the commentdnl FIXME: We should probably check for version
>=0.10.36.
So depending on what that's about, there might be some other good
reasons to go with choice #2.  Peter, it appears you put that comment in
when you first added the macro, on 2001-06-02.  Do you remember why?
        regards, tom lane


Re: ERRORDATA_STACK_SIZE panic crashes on Windows

От
Magnus Hagander
Дата:
Tom Lane wrote:
> Magnus Hagander <magnus@hagander.net> writes:
> > Tom Lane wrote:
> >> We can either add a configure test or say that we don't support
> >> such old versions of gettext ...
> 
> > Or we could just #ifdef the whole thing to win32, since it's not
> > really needed on other platforms, pushing that decision to later...
> > (when that version of gettext will be even more obsolete)
> 
> That would work for the moment, but we're almost certainly going to
> have to insist on bind_textdomain_codeset being available eventually;
> AFAICS there's no hope of multi-locale/multi-encoding support without
> it.

Yes, that's why I said it would only push the decision to the future.

Perhaps doing this #ifdef would be a good idea for back-branches, and
then we look at one of the other solutions for 8.4?


> I was considering either:
> 
> 1. Add a probe for bind_textdomain_codeset to configure, and
> conditionalize the new patch on HAVE_BIND_TEXTDOMAIN_CODESET.
> 
> 2. Adjust the AC_SEARCH_LIBS call to probe for
> bind_textdomain_codeset() instead of gettext() as it does now.  This
> would have the effect of rejecting pre-0.10.36 versions of the
> library.

Depending on the buildfarm issue it may be that the software is antique
enough that almost only Bruce runs such an old version. If so, I think
#2 is just fine (except in back branches, of course)


> Magnus' suggestion gives a third possibility.
> 
> I notice that the PGAC_CHECK_GETTEXT macro already contains the
> comment dnl FIXME: We should probably check for version >=0.10.36.
> So depending on what that's about, there might be some other good
> reasons to go with choice #2.  Peter, it appears you put that comment
> in when you first added the macro, on 2001-06-02.  Do you remember
> why?

Could it possibly be for this very reason?

//Magnus


Re: ERRORDATA_STACK_SIZE panic crashes on Windows

От
Bruce Momjian
Дата:
Magnus Hagander wrote:
> > 2. Adjust the AC_SEARCH_LIBS call to probe for
> > bind_textdomain_codeset() instead of gettext() as it does now.  This
> > would have the effect of rejecting pre-0.10.36 versions of the
> > library.
> 
> Depending on the buildfarm issue it may be that the software is antique
> enough that almost only Bruce runs such an old version. If so, I think
> #2 is just fine (except in back branches, of course)

You don't have to fix it just for me --- I can remove --enable-nls;  the
big question is who else is going to hit this.  I think the buildfarm
has safe-enough checking for 8.4, but I am concerned about the back
branches where testing isn't as complete.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: ERRORDATA_STACK_SIZE panic crashes on Windows

От
Magnus Hagander
Дата:
Bruce Momjian wrote:
> Magnus Hagander wrote:
> > > 2. Adjust the AC_SEARCH_LIBS call to probe for
> > > bind_textdomain_codeset() instead of gettext() as it does now.
> > > This would have the effect of rejecting pre-0.10.36 versions of
> > > the library.
> > 
> > Depending on the buildfarm issue it may be that the software is
> > antique enough that almost only Bruce runs such an old version. If
> > so, I think #2 is just fine (except in back branches, of course)
> 
> You don't have to fix it just for me --- I can remove --enable-nls;
> the big question is who else is going to hit this.  I think the
> buildfarm has safe-enough checking for 8.4, but I am concerned about
> the back branches where testing isn't as complete.

This is why I'm suggesting the #ifdef WIN32 for back branches.

//Magnus


Re: ERRORDATA_STACK_SIZE panic crashes on Windows

От
"Dave Page"
Дата:
On Tue, May 27, 2008 at 8:05 PM, Bruce Momjian <bruce@momjian.us> wrote:
> You don't have to fix it just for me --- I can remove --enable-nls;  the
> big question is who else is going to hit this.  I think the buildfarm
> has safe-enough checking for 8.4, but I am concerned about the back
> branches where testing isn't as complete.

We'll find out in a few hours.

My guess is that anyone happy to be running such an old version of
gettext is probably running old versions of everything, including PG.
Your case is obviously a little different.


-- 
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com


Re: ERRORDATA_STACK_SIZE panic crashes on Windows

От
Peter Eisentraut
Дата:
Tom Lane wrote:
> I notice that the PGAC_CHECK_GETTEXT macro already contains the comment
>         dnl FIXME: We should probably check for version >=0.10.36.
> So depending on what that's about, there might be some other good
> reasons to go with choice #2.  Peter, it appears you put that comment in
> when you first added the macro, on 2001-06-02.  Do you remember why?

I think that was the first version that worked sanely in general.


Re: ERRORDATA_STACK_SIZE panic crashes on Windows

От
Tom Lane
Дата:
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane wrote:
>> I notice that the PGAC_CHECK_GETTEXT macro already contains the comment
>>    dnl FIXME: We should probably check for version >=0.10.36.
>> So depending on what that's about, there might be some other good
>> reasons to go with choice #2. Peter, it appears you put that comment in
>> when you first added the macro, on 2001-06-02. Do you remember why?

> I think that was the first version that worked sanely in general.

Hmm.  Bruce, what gettext version are you running exactly, and how much
have you ever tested the localization behavior with it?
        regards, tom lane


Re: ERRORDATA_STACK_SIZE panic crashes on Windows

От
Bruce Momjian
Дата:
Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > Tom Lane wrote:
> >> I notice that the PGAC_CHECK_GETTEXT macro already contains the comment
> >>    dnl FIXME: We should probably check for version >=0.10.36.
> >> So depending on what that's about, there might be some other good
> >> reasons to go with choice #2. Peter, it appears you put that comment in
> >> when you first added the macro, on 2001-06-02. Do you remember why?
> 
> > I think that was the first version that worked sanely in general.
> 
> Hmm.  Bruce, what gettext version are you running exactly, and how much
> have you ever tested the localization behavior with it?

Uh, I can't seem to find the libintl version --- my bet is that it was
installed as part of another package but I am not sure which one.

I have never used it to test localization support so turning it off is
fine --- I just enabled it to catch compile failures.

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +


Re: ERRORDATA_STACK_SIZE panic crashes on Windows

От
Tom Lane
Дата:
Bruce Momjian <bruce@momjian.us> writes:
> Tom Lane wrote:
>> Peter Eisentraut <peter_e@gmx.net> writes:
>>> I think that was the first version that worked sanely in general.
>> 
>> Hmm.  Bruce, what gettext version are you running exactly, and how much
>> have you ever tested the localization behavior with it?

> Uh, I can't seem to find the libintl version --- my bet is that it was
> installed as part of another package but I am not sure which one.
> I have never used it to test localization support so turning it off is
> fine --- I just enabled it to catch compile failures.

I dug through the gettext changelogs a bit more.  There were *three
years* and a lot of fixes between 0.10.35 and 0.10.36; the one that
is most directly relevant to us is

* Locales which differ only in the character encoding, for example ja_JP and ja_JP.UTF-8, can now share the same
messagecatalogs. gettext converts the messages to the appropriate character encoding on the fly.
 

Since we supply only one message catalog per language, it's probably
fair to say that gettext versions without this ability are broken for
our purposes.

So I'm thinking that the correct fix is to require
bind_textdomain_codeset to be present when --enable-nls is requested.
I will go make that happen in the versions that have the new patch
(ie, 8.2 and up).
        regards, tom lane