Thread: 7.2.1 backend crash (convert_string_datum, locale)

7.2.1 backend crash (convert_string_datum, locale)

From: Mats Lofkvist

Hi,

When testing postgres 7.2.1 on a sparc/solaris8 box with
--enable-locale --enable-multibyte I get a crash in
convert_string_datum.

The backend just dies when running a SELECT. With cassert
and debug enabled at configure time, I got the following in the log:


NOTICE:  AllocSetFree: detected write past chunk end in TransactionCommandContext 4b7c18
NOTICE:  AllocSetFree: detected write past chunk end in TransactionCommandContext 4b7c18
NOTICE:  AllocSetFree: detected write past chunk end in TransactionCommandContext 4b7c18
NOTICE:  AllocSetFree: detected write past chunk end in TransactionCommandContext 4b7c18
NOTICE:  AllocSetFree: detected write past chunk end in TransactionCommandContext 4b7c18
NOTICE:  AllocSetFree: detected write past chunk end in TransactionCommandContext 4b7818
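
For readers unfamiliar with this kind of NOTICE, here is a minimal sketch of how an allocator can notice a write past the end of a chunk: it stores a marker byte right after the requested bytes and checks that byte when the chunk is freed. This is a toy illustration only, not PostgreSQL's allocator code. The names `ToyChunkHeader`, `toy_alloc`, `toy_chunk_ok` and `toy_free`, and the marker value, are assumptions made for the example.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define SENTINEL 0x7E   /* marker value chosen for this sketch (assumption) */

/* Hypothetical header stored in front of every chunk handed out. */
typedef struct
{
    size_t size;            /* bytes actually reserved for the caller */
    size_t requested_size;  /* bytes the caller asked for */
} ToyChunkHeader;

/* Allocate `requested` bytes plus one marker byte directly after them. */
void *toy_alloc(size_t requested)
{
    size_t size = requested + 1;
    ToyChunkHeader *h = malloc(sizeof *h + size);
    unsigned char *data;

    if (h == NULL)
        return NULL;
    h->size = size;
    h->requested_size = requested;
    data = (unsigned char *) (h + 1);
    data[requested] = SENTINEL;     /* first byte the caller must not touch */
    return data;
}

/* Return 1 if the marker byte is intact, 0 if something overwrote it. */
int toy_chunk_ok(const void *ptr)
{
    const ToyChunkHeader *h = (const ToyChunkHeader *) ptr - 1;
    const unsigned char *data = ptr;

    assert(ptr != NULL);
    return data[h->requested_size] == SENTINEL;
}

/* Free a chunk and report whether it was still intact at that point. */
int toy_free(void *ptr)
{
    int ok = toy_chunk_ok(ptr);

    free((ToyChunkHeader *) ptr - 1);
    return ok;
}
```

With a 17-byte request, writing an 18th byte makes `toy_chunk_ok` return 0. That is the kind of situation the NOTICE lines report, although the real check may differ in detail.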


Gdb on the crashing backend says:


Program received signal SIGSEGV, Segmentation fault.
0x269bd0 in pfree (pointer=0x4b7878) at mcxt.c:446
446             AssertArg(MemoryContextIsValid(header->context));
(gdb) where
#0  0x269bd0 in pfree (pointer=0x4b7878) at mcxt.c:446
#1  0x21844c in convert_string_datum (value=5251848, typid=1043)
    at selfuncs.c:2059
#2  0x217978 in convert_to_scalar (value=4947304, valuetypid=1043,
    scaledvalue=0xffbee0b8, lobound=5251848, hibound=4946632,
    boundstypid=1043, scaledlobound=0xffbee0a8, scaledhibound=0xffbee0b0)
    at selfuncs.c:1763
#3  0x214f8c in scalarineqsel (root=0x4aebe8, operator=1066, isgt=0 '\000',
    var=0x4b6218, other=0x4b76d8) at selfuncs.c:584
#4  0x21541c in scalarltsel (fcinfo=0xffbee258) at selfuncs.c:733
#5  0x25aa90 in DirectFunctionCall4 (func=0x215304 <scalarltsel>,
    arg1=4910056, arg2=1066, arg3=4947368, arg4=0) at fmgr.c:725
#6  0x2199f0 in prefix_selectivity (root=0x4aebe8, var=0x4b6218,
    prefix=0x4b7ce8 "SY") at selfuncs.c:2667
#7  0x215854 in patternsel (fcinfo=0xffbee518, ptype=Pattern_Type_Like)
    at selfuncs.c:872
#8  0x215a18 in likesel (fcinfo=0xffbee518) at selfuncs.c:913
#9  0x25c5e4 in OidFunctionCall4 (functionId=1819, arg1=4910056, arg2=1213,
    arg3=4941064, arg4=1) at fmgr.c:1218
#10 0x185128 in restriction_selectivity (root=0x4aebe8, operator=1213,
    args=0x4b6508, varRelid=1) at plancat.c:232
#11 0x167530 in clauselist_selectivity (root=0x4aebe8, clauses=0x4b7678,
    varRelid=1) at clausesel.c:156
#12 0x167394 in restrictlist_selectivity (root=0x4aebe8,
    restrictinfo_list=0x4b6958, varRelid=1) at clausesel.c:74
#13 0x16a044 in set_baserel_size_estimates (root=0x4aebe8, rel=0x4b6af8)
    at costsize.c:1146
#14 0x166ae0 in set_plain_rel_pathlist (root=0x4aebe8, rel=0x4b6af8,
    rte=0x4aec78) at allpaths.c:132
#15 0x166aa4 in set_base_rel_pathlists (root=0x4aebe8) at allpaths.c:115
#16 0x1667ec in make_one_rel (root=0x4aebe8) at allpaths.c:62
#17 0x177708 in subplanner (root=0x4aebe8, flat_tlist=0x4b6a18,
    tuple_fraction=0) at planmain.c:238
#18 0x177544 in query_planner (root=0x4aebe8, tlist=0x4b5ed8, tuple_fraction=0)
    at planmain.c:126
#19 0x17939c in grouping_planner (parse=0x4aebe8, tuple_fraction=0)
    at planner.c:1094
#20 0x177d70 in subquery_planner (parse=0x4aebe8, tuple_fraction=-1)
    at planner.c:228
#21 0x177a2c in planner (parse=0x4aebe8) at planner.c:94
#22 0x1c821c in pg_plan_query (querytree=0x4aebe8) at postgres.c:513
#23 0x1c871c in pg_exec_query_string (
    query_string=0x4ae278 "SELECT find0.userId AS userId, find0.longValue AS findLongValue0 FROM userData find0 WHERE find0.groupName='user'AND find0.attributeName LIKE 'login%' AND find0.value LIKE 'SY%'", dest=Remote,
    parse_context=0x464598) at postgres.c:784
#24 0x1ca63c in PostgresMain (argc=4, argv=0xffbef018,
    username=0x4607e1 "mats") at postgres.c:1926
#25 0x18bab0 in DoBackend (port=0x4606b0) at postmaster.c:2243
#26 0x18af48 in BackendStartup (port=0x4606b0) at postmaster.c:1874
#27 0x189548 in ServerLoop () at postmaster.c:995
#28 0x188d18 in PostmasterMain (argc=1, argv=0x447db0) at postmaster.c:771
#29 0x143ebc in main (argc=1, argv=0xffbefacc) at main.c:206
(gdb) up
#1  0x21844c in convert_string_datum (value=5251848, typid=1043)
    at selfuncs.c:2059
2059            pfree(val);
(gdb) print val
$1 = 0x4b7878 "D1BFD67F71192ECE"
(gdb) print xfrmstr
$2 = 0x4b78d8
"\001R\0014\001P\001T\001R\0019\001:\001T\001:\0014\0014\001<\0015\001S\001Q\001S\001\001\001S\001Q\001S\0015\001<\0014\0014\001:\001T\001:\0019\001R\001T\001P\0014\001R\001\001\001R\0014\001P\001T\001R\0019\001:\001T\001:\0014\0014\001<\0015\001S\001Q\001S\001\001"
(gdb) print xfrmsize
$3 = 48
(gdb) print xfrmlen
$4 = 102
(gdb) print *(varattrib *)(value)
$5 = {va_header = 20, va_content = {va_compressed = {va_rawsize = 1144078918,
      va_data = "D"}, va_external = {va_rawsize = 1144078918,
      va_extsize = 1144403782, va_valueid = 925970745,
      va_toastrelid = 843400005}, va_data = "D"}}
(gdb) print (char *)((varattrib *)(value))->va_content.va_data
$6 = 0x50230c "D1BFD67F71192ECE~", '\177' <repeats 183 times>...
(gdb) list
2054                    /* Oops, didn't make it */
2055                    pfree(xfrmstr);
2056                    xfrmstr = (char *) palloc(xfrmlen + 1);
2057                    xfrmlen = strxfrm(xfrmstr, val, xfrmlen + 1);
2058            }
2059            pfree(val);
2060            val = xfrmstr;
2061    #endif
2062
2063            return (unsigned char *) val;
(gdb) down
#0  0x269bd0 in pfree (pointer=0x4b7878) at mcxt.c:446
446             AssertArg(MemoryContextIsValid(header->context));
(gdb) print header
$7 = (StandardChunkHeader *) 0x4b7868
(gdb) print *header
$8 = {context = 0x15246b8, size = 32, requested_size = 17}
(gdb)


Please let me know if there is more info I can get out of
gdb to track this down.
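
The listing around selfuncs.c:2054 shows a retry. The first strxfrm call returned 102 ($4), while xfrmsize, presumably the size of the initial buffer, was 48 ($3), so a larger buffer is allocated and strxfrm is called again. A hedged sketch of a different ordering, which asks strxfrm for the required length before allocating anything, could look like the following. It uses plain malloc rather than palloc, and `make_sort_key` is a hypothetical name, not a function in the PostgreSQL tree.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Return a newly malloc'ed strxfrm() key for `val`, or NULL on failure.
 * The caller is responsible for free()ing the result.
 */
char *make_sort_key(const char *val)
{
    size_t needed;
    size_t written;
    char *key;

    assert(val != NULL);

    /* With n == 0 nothing is written; only the required length is returned. */
    needed = strxfrm(NULL, val, 0);

    key = malloc(needed + 1);
    if (key == NULL)
        return NULL;

    /* Second call fills a buffer that is known to be large enough. */
    written = strxfrm(key, val, needed + 1);
    if (written > needed)
    {
        /* The library reported a different length the second time; give up. */
        free(key);
        return NULL;
    }
    return key;
}
```

Comparing two such keys with strcmp should order them the same way strcoll orders the original strings under the current locale. The trade-off is that strxfrm always runs twice, in exchange for never passing it a buffer that might be too small.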

      _
Mats Lofkvist
mal@algonet.se

Re: 7.2.1 backend crash (convert_string_datum, locale)

From: Tom Lane

Mats Lofkvist <mal@algonet.se> writes:
> When testing postgres 7.2.1 on a sparc/solaris8 box with
> --enable-locale --enable-multibyte I get a crash in
> convert_string_datum.

This smells like a problem that we chased down awhile back, that
snprintf on Solaris is broken (it will write past the end of the
specified buffer length, thus corrupting adjacent data).
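
One way to check a given C library for the behaviour described here is to format into a deliberately short limit inside a larger scratch buffer, then verify that nothing beyond the limit changed. The sketch below is only an assumption about how such a check could be written. `snprintf_respects_limit` and the fill pattern are hypothetical names chosen for the example, and it has not been run on Solaris.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define GUARD 0x5A   /* arbitrary fill pattern for this sketch */

/*
 * Format `text` with snprintf() using only the first `limit` bytes of a
 * larger scratch buffer.  Return 1 if every byte at or beyond `limit` is
 * untouched and the output is NUL-terminated within the limit, else 0.
 */
int snprintf_respects_limit(const char *text, size_t limit)
{
    unsigned char buf[256];
    size_t i;

    assert(text != NULL);
    assert(limit > 0 && limit < sizeof buf);

    /* Pre-fill so that any stray write beyond the limit is visible. */
    memset(buf, GUARD, sizeof buf);

    snprintf((char *) buf, limit, "%s", text);

    for (i = limit; i < sizeof buf; i++)
    {
        if (buf[i] != GUARD)
            return 0;           /* wrote past the specified length */
    }

    /* A conforming snprintf terminates the string inside the limit. */
    return memchr(buf, '\0', limit) != NULL;
}
```

Because the scratch buffer is much larger than the limit passed to snprintf, an overrun of the kind described would land in the guard area and be detected, rather than corrupting adjacent data.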

Andrew, I think that was your test case we found it on.  Do you
recall if a fix is available from Sun?

            regards, tom lane

Re: 7.2.1 backend crash (convert_string_datum, locale)

From: Bruce Momjian

Tom Lane wrote:
> Mats Lofkvist <mal@algonet.se> writes:
> > When testing postgres 7.2.1 on a sparc/solaris8 box with
> > --enable-locale --enable-multibyte I get a crash in
> > convert_string_datum.
>
> This smells like a problem that we chased down awhile back, that
> snprintf on Solaris is broken (it will write past the end of the
> specified buffer length, thus corrupting adjacent data).
>
> Andrew, I think that was your test case we found it on.  Do you
> recall if a fix is available from Sun?

Yes, I remember this too.  It was specifically multibyte-related.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026

Re: 7.2.1 backend crash (convert_string_datum, locale)

From: Andrew Sullivan

On Thu, Jul 11, 2002 at 11:15:42PM -0400, Tom Lane wrote:
> Mats Lofkvist <mal@algonet.se> writes:
> > When testing postgres 7.2.1 on a sparc/solaris8 box with
> > --enable-locale --enable-multibyte I get a crash in
> > convert_string_datum.
>
> This smells like a problem that we chased down awhile back, that
> snprintf on Solaris is broken (it will write past the end of the
> specified buffer length, thus corrupting adjacent data).

It does indeed.  This was only the 64-bit library, though, or at
least as far as we were able to tell.  And I wasn't able to turn up
any evidence that it happened on Solaris 8.  But it might.  We don't
use 8, at least not yet.

> Andrew, I think that was your test case we found it on.  Do you
> recall if a fix is available from Sun?

Not as far as I know, at least for 7.  Come to think of it, I now
_do_ recall seeing something in my various Google wanderings which
suggested that there is a fix in one of the patch packages for
Solaris 8 (which suggests the buggy library is in the basic Solaris 8
install).  I dimly recall some mention of incompatibility between it
and some other patchlevel, as well, so it might require some digging.
(Given that it's really a bounds mistake in a system library, you'd
think that it'd be easier to find more information about it; I
actually learned almost everything I know about the problem from,
IIRC, the autoconf web pages, so I'd not expect a cursory search of
Sun's site to turn anything up.)

In the FAQ_Solaris, there is a suggestion to use the substitute
function included in the Postgres tree (which is what you suggested,
Tom, and what I did), as well as instructions on how to do it.  It
definitely works for me on Solaris 7.  Might be worth trying on 8 as
well.  If so, the FAQ should be updated so as not to limit the
discussion to Solaris 7 and earlier.

Sorry I can't be more help than this.

A

--
----
Andrew Sullivan                               87 Mowat Avenue
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M6K 3E3
                                         +1 416 646 3304 x110

Re: 7.2.1 backend crash (convert_string_datum, locale)

From: Mats Lofkvist

andrew@libertyrms.info (Andrew Sullivan) writes:
> On Thu, Jul 11, 2002 at 11:15:42PM -0400, Tom Lane wrote:
> > Mats Lofkvist <mal@algonet.se> writes:
> > > When testing postgres 7.2.1 on a sparc/solaris8 box with
> > > --enable-locale --enable-multibyte I get a crash in
> > > convert_string_datum.
> >
> > This smells like a problem that we chased down awhile back, that
> > snprintf on Solaris is broken (it will write past the end of the
> > specified buffer length, thus corrupting adjacent data).
>
> It does indeed.  This was only the 64-bit library, though, or at
> least as far as we were able to tell.  And I wasn't able to turn up
> any evidence that it happened on Solaris 8.  But it might.  We don't
> use 8, at least not yet.
>
> > Andrew, I think that was your test case we found it on.  Do you
> > recall if a fix is available from Sun?
>
> Not as far as I know, at least for 7.  Come to think of it, I now
> _do_ recall seeing something in my various Google wanderings which
> suggested that there is a fix in one of the patch packages for
> Solaris 8 (which suggests the buggy library is in the basic Solaris 8
> install).  I dimly recall some mention of incompatibility between it
> and some other patchlevel, as well, so it might require some digging.
> (Given that it's really a bounds mistake in a system library, you'd
> think that it'd be easier to find more information about it; I
> actually learned almost everything I know about the problem from,
> IIRC, the autoconf web pages, so I'd not expect a cursory search of
> Sun's site to turn anything up.)
>
> In the FAQ_Solaris, there is a suggestion to use the substitute
> function included in the Postgres tree (which is what you suggested,
> Tom, and what I did), as well as instructions on how to do it.  It
> definitely works for me on Solaris 7.  Might be worth trying on 8 as
> well.  If so, the FAQ should be updated so as not to limit the
> discussion to Solaris 7 and earlier.

I didn't get it to work with the stuff in FAQ_Solaris (can't
guarantee I really got snprintf substituted though, just
followed the instructions and recompiled).

Removing --enable-multibyte didn't help either.

Without either --enable-locale or --enable-multibyte it
seems to work, but since I had to create a new database when
removing locale, any problems local to the first database
are no longer visible.

Is postgres 8-bit clean without locale support enabled?
(I don't care about sort orders and such, only need to
read/write 8-bit chars via jdbc).

      _
Mats Lofkvist
mal@algonet.se

Re: 7.2.1 backend crash (convert_string_datum, locale)

From: Tom Lane

Mats Lofkvist <mal@algonet.se> writes:
> Without either --enable-locale or --enable-multibyte it
> seems to work, but as I had to create a new database when
> removing locale any problems local to the first database
> are not seen anymore.

Hm.  If the database is already corrupt then simply recompiling
a corrected binary isn't going to magically make things perfect.
Maybe you should retry the snprintf patch and/or --enable-multibyte
using fresh databases.

> Is postgres 8-bit clean without locale support enabled?
> (I don't care about sort orders and such, only need to
> read/write 8-bit chars via jdbc).

In that case you don't really need locale, no.  Not sure about
whether you need multibyte; does JDBC expect Unicode support?

            regards, tom lane