Thread: Infinite loop in transformExpr()

Infinite loop in transformExpr()

From
Fernando Schapachnik
Date:
I've stumbled upon what seems to be a core-dumping infinite recursion
in transformExpr(), on 8.1.6.

Backtrace:

Core was generated by `postgres'.
Program terminated with signal 10, Bus error.
Reading symbols from /usr/lib/libssl.so.3...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib/libssl.so.3
Reading symbols from /lib/libcrypto.so.3...(no debugging symbols
found)...done.
Loaded symbols for /lib/libcrypto.so.3
Reading symbols from /lib/libz.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib/libz.so.2
Reading symbols from /lib/libreadline.so.5...(no debugging symbols
found)...done.
Loaded symbols for /lib/libreadline.so.5
Reading symbols from /lib/libcrypt.so.2...(no debugging symbols
found)...done.
Loaded symbols for /lib/libcrypt.so.2
Reading symbols from /lib/libm.so.3...(no debugging symbols
found)...done.
Loaded symbols for /lib/libm.so.3
Reading symbols from /lib/libutil.so.4...(no debugging symbols
found)...done.
Loaded symbols for /lib/libutil.so.4
Reading symbols from /lib/libc.so.5...(no debugging symbols
found)...done.
Loaded symbols for /lib/libc.so.5
Reading symbols from /lib/libncurses.so.5...(no debugging symbols
found)...done.
Loaded symbols for /lib/libncurses.so.5
Reading symbols from /usr/local/lib/postgresql/dblink.so...(no
debugging symbols found)...done.
Loaded symbols for /usr/local/lib/postgresql/dblink.so
Reading symbols from /usr/local/lib/libpq.so.4...(no debugging symbols
found)...done.
Loaded symbols for /usr/local/lib/libpq.so.4
Reading symbols from /usr/lib/libpthread.so.1...(no debugging symbols
found)...done.
Loaded symbols for /usr/lib/libpthread.so.1
Reading symbols from /libexec/ld-elf.so.1...(no debugging symbols
found)...done.
Loaded symbols for /libexec/ld-elf.so.1

#0  0x080d5979 in transformExpr ()
#1  0x080d6700 in transformExpr ()
#2  0x080d5bbb in transformExpr ()
[...]
#21669 0x080d6700 in transformExpr ()
#21670 0x080d5bbb in transformExpr ()
#21671 0x080d669e in transformExpr ()
#21672 0x080d5ba5 in transformExpr ()
#21673 0x080d4f10 in transformWhereClause ()
#21674 0x080c13dd in parse_sub_analyze ()
#21675 0x080bf36f in parse_sub_analyze ()
#21676 0x080bf110 in parse_sub_analyze ()
#21677 0x080bf021 in parse_analyze ()
#21678 0x0818d949 in pg_analyze_and_rewrite ()
#21679 0x0818dd76 in pg_plan_queries ()
#21680 0x081908d5 in PostgresMain ()
#21681 0x0816e084 in ClosePostmasterPorts ()
#21682 0x0816d887 in ClosePostmasterPorts ()
#21683 0x0816bbcf in PostmasterMain ()
#21684 0x0816b5ed in PostmasterMain ()
#21685 0x0813376b in main ()

This is postgres 8.1.6 compiled from ports (with
--enable-thread-safety) on FreeBSD/i386 5.3 (gcc version 3.4.2
[FreeBSD] 20040728).

Should I file a bug report?

Thanks!

Fernando.

Re: Infinite loop in transformExpr()

From
Tom Lane
Date:
Fernando Schapachnik <fschapachnik@mecon.gov.ar> writes:
> I've stumbled upon what seems to be a core-dumping infinite recursion
> in transformExpr(), on 8.1.6.

A test case would help.

            regards, tom lane

Re: Infinite loop in transformExpr()

From
Fernando Schapachnik
Date:
In a previous message, Tom Lane wrote:
> Fernando Schapachnik <fschapachnik@mecon.gov.ar> writes:
> > I've stumbled upon what seems to be a core-dumping infinite recursion
> > in transformExpr(), on 8.1.6.
>
> A test case would help.

The culprit query looks like:
SELECT ...
FROM
    (SELECT ... FROM 3 tables
    WHERE join condition AND
        int_key IN (enumeration of approx. 16000 values here)
    GROUP BY ...)
LEFT OUTER JOIN join condition GROUP BY ...;

(I can provide a more specific version if needed, but look below.)

A couple of strange things.

The query is executed in production via pgperl in a (FreeBSD 5.x)
server where:

# limit stacksize
stacksize    65536 kbytes
# psql -U pgsql template1 -c 'SHOW max_stack_depth'
 max_stack_depth
-----------------
 2048

Running the query in this scenario (reasonably) gives:
ERROR:  stack depth limit exceeded
HINT:  Increase the configuration parameter "max_stack_depth".

So I'm unsure why it explodes in production.

In a testing environment, however, with max_stack_depth set to 16000,
it does indeed dump core. The strange thing is that, while trying to
trim down the query, I am now stuck with:

Fatal error 'Cannot allocate red zone for initial thread' at line 343
in file /usr/src/lib/libpthread/thread/thr_init.c (errno = 12)

(i.e., the server works; I just can't get the original error again, not
even after a restart or a full/freeze vacuum).

The backtrace now gives:

(gdb) bt
#0  0x284eb37b in kill () from /lib/libc.so.5
#1  0x284e0422 in raise () from /lib/libc.so.5
#2  0x28552c1b in abort () from /lib/libc.so.5
#3  0x290b6a7c in pthread_testcancel () from /usr/lib/libpthread.so.1
#4  0x290b3067 in pthread_setconcurrency () from /usr/lib/libpthread.so.1
#5  0x290b2e87 in pthread_setconcurrency () from /usr/lib/libpthread.so.1
#6  0x290b627a in pthread_testcancel () from /usr/lib/libpthread.so.1
#7  0x290b740a in __error () from /usr/lib/libpthread.so.1
#8  0x2909e7ae in ?? () from /usr/lib/libpthread.so.1
#9  0x282a5845 in find_symdef () from /libexec/ld-elf.so.1
#10 0x282a61aa in dlopen () from /libexec/ld-elf.so.1
#11 0x08164d38 in BSD44_derived_dlopen ()
#12 0x081f9550 in load_external_function ()
#13 0x081fa06c in fmgr_info_cxt ()
#14 0x081f9f46 in fmgr_info_cxt ()
#15 0x081f9d62 in fmgr_info_cxt ()
#16 0x0811a0dc in init_fcache ()
#17 0x0811a742 in ExecMakeTableFunctionResult ()
#18 0x08125e67 in ExecReScanNestLoop ()
#19 0x0811db75 in ExecScan ()
#20 0x08125ee3 in ExecFunctionScan ()
#21 0x08119061 in ExecProcNode ()
#22 0x08126b8f in ExecSort ()
#23 0x081190c0 in ExecProcNode ()
#24 0x081251f3 in ExecMergeJoin ()
#25 0x08119087 in ExecProcNode ()
#26 0x08126b8f in ExecSort ()
#27 0x081190c0 in ExecProcNode ()
#28 0x08120f5c in ExecAgg ()
#29 0x08120ed5 in ExecAgg ()
#30 0x081190e6 in ExecProcNode ()
#31 0x08117b10 in ExecEndPlan ()
#32 0x08116fb0 in ExecutorRun ()
#33 0x08191f9d in PortalRun ()
#34 0x08191ccc in PortalRun ()
#35 0x0818e259 in pg_plan_queries ()
#36 0x08190dad in PostgresMain ()
#37 0x0816e41c in ClosePostmasterPorts ()
#38 0x0816dc13 in ClosePostmasterPorts ()
#39 0x0816bf07 in PostmasterMain ()
#40 0x0816b875 in PostmasterMain ()
#41 0x08133a0f in main ()


Thanks.

Fernando.

Re: Infinite loop in transformExpr()

From
Tom Lane
Date:
Fernando Schapachnik <fernando@mecon.gov.ar> writes:
> In a previous message, Tom Lane wrote:
>> Fernando Schapachnik <fschapachnik@mecon.gov.ar> writes:
>>> I've stumbled upon what seems to be a core-dumping infinite recursion
>>> in transformExpr(), on 8.1.6.
>>
>> A test case would help.

> The culprit query looks like:
>     WHERE join condition AND
>         int_key IN (enumeration of approx. 16000 values here)

PG versions before 8.2 don't handle very long IN lists particularly
well.  This query will take a fair amount of stack space to parse, not
to mention an unreasonably long time to plan.  (You should consider
putting the 16000 values in a temp table and doing a join, instead.)
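
A minimal sketch of the temp-table approach follows. It is an illustration, not code from this thread: the table names `items` and `wanted_keys`, the `payload` column, and the helper names are hypothetical. Python's built-in `sqlite3` module is used only so the sketch runs without a server. Against PostgreSQL the same shape (CREATE TEMP TABLE, bulk load, JOIN) would apply, with the driver and loading method of your choice.

```python
import sqlite3


def rows_matching_keys(conn, keys):
    """Return (int_key, payload) rows of the hypothetical `items` table
    whose int_key is in `keys`, using a temp table plus a join instead
    of one very long IN (...) list."""
    cur = conn.cursor()
    # Load the wanted keys into a temporary table (duplicates removed).
    cur.execute("CREATE TEMP TABLE wanted_keys (int_key INTEGER PRIMARY KEY)")
    cur.executemany(
        "INSERT INTO wanted_keys (int_key) VALUES (?)",
        ((k,) for k in set(keys)),
    )
    # The query text stays short no matter how many keys there are.
    cur.execute(
        "SELECT i.int_key, i.payload "
        "FROM items AS i JOIN wanted_keys AS w ON w.int_key = i.int_key "
        "ORDER BY i.int_key"
    )
    rows = cur.fetchall()
    cur.execute("DROP TABLE wanted_keys")
    return rows


def demo():
    """Build a small in-memory database and look up 16000 keys."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE items (int_key INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO items VALUES (?, ?)",
        ((k, "row %d" % k) for k in range(20000)),
    )
    keys = range(0, 32000, 2)  # 16000 keys; only those below 20000 exist
    return rows_matching_keys(conn, keys)
```

The point of the design is that the parser only ever sees a fixed-size statement; the 16000 values travel as data rather than as an expression to be analyzed.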

> Running the query in this scenario (reasonably) gives:
> ERROR:  stack depth limit exceeded
> HINT:  Increase the configuration parameter "max_stack_depth".
> So I'm unsure why it explodes in production.

Most likely, the production machine has a kernel-enforced stack limit
setting that is less than what "max_stack_depth" claims.  Up till recently
(8.2 I think), we didn't make any effort to verify that "max_stack_depth"
was set to a sane value.  If it's too high you will get crashes rather
than "stack depth limit exceeded", because overrunning the kernel limit
is typically treated as a SIGSEGV.
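
As a rough sketch of the sanity check described above, the following compares a "max_stack_depth" value (in kB) with the kernel stack limit. The function names and the one-megabyte safety margin are assumptions made for illustration; this is not the check the server itself performs. Note also that `resource.getrlimit` reports the limit of the process running the script, which may differ from the limit the postmaster was started under.

```python
import resource

SAFETY_MARGIN_KB = 1024  # assumed headroom of about one megabyte


def kernel_stack_limit_kb():
    """Soft RLIMIT_STACK of the current process in kB, or None if unlimited."""
    soft, _hard = resource.getrlimit(resource.RLIMIT_STACK)
    if soft == resource.RLIM_INFINITY:
        return None
    return soft // 1024


def stack_depth_is_safe(max_stack_depth_kb, limit_kb, margin_kb=SAFETY_MARGIN_KB):
    """True if max_stack_depth plus the margin fits under the kernel limit.

    limit_kb is None when the kernel imposes no limit.
    """
    if limit_kb is None:
        return True
    return max_stack_depth_kb + margin_kb <= limit_kb
```

With the figures quoted in this thread, `stack_depth_is_safe(2048, 65536)` holds, whereas a setting of 16000 under a hypothetical 8192 kB limit would not.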

> (gdb) bt
> #0  0x284eb37b in kill () from /lib/libc.so.5
> #1  0x284e0422 in raise () from /lib/libc.so.5
> #2  0x28552c1b in abort () from /lib/libc.so.5
> #3  0x290b6a7c in pthread_testcancel () from /usr/lib/libpthread.so.1
> #4  0x290b3067 in pthread_setconcurrency () from /usr/lib/libpthread.so.1
> #5  0x290b2e87 in pthread_setconcurrency () from /usr/lib/libpthread.so.1
> #6  0x290b627a in pthread_testcancel () from /usr/lib/libpthread.so.1
> #7  0x290b740a in __error () from /usr/lib/libpthread.so.1
> #8  0x2909e7ae in ?? () from /usr/lib/libpthread.so.1
> #9  0x282a5845 in find_symdef () from /libexec/ld-elf.so.1
> #10 0x282a61aa in dlopen () from /libexec/ld-elf.so.1
> #11 0x08164d38 in BSD44_derived_dlopen ()
> #12 0x081f9550 in load_external_function ()
> #13 0x081fa06c in fmgr_info_cxt ()

Hm.  It would appear that you are loading some custom code that sucks
pthread support into the backend.  This is generally a bad idea in any
case, as the backend code is not designed for threaded operation.  But
the reason it seems relevant is that thread support often causes a
decrease in the effective stack limit (because it's slicing up the stack
area for use by multiple threads).  I'd suggest trying to fix the link
dependencies of your code to avoid sucking in libpthread.

            regards, tom lane

Re: Infinite loop in transformExpr()

From
Fernando Schapachnik
Date:
In a previous message, Tom Lane wrote:
> PG versions before 8.2 don't handle very long IN lists particularly
> well.  This query will take a fair amount of stack space to parse, not
> to mention an unreasonably long time to plan.  (You should consider
> putting the 16000 values in a temp table and doing a join, instead.)

Thanks for the tip!

[...]

> Most likely, the production machine has a kernel-enforced stack limit
> setting that is less than what "max_stack_depth" claims.  Up till recently
> (8.2 I think), we didn't make any effort to verify that "max_stack_depth"
> was set to a sane value.  If it's too high you will get crashes rather
> than "stack depth limit exceeded", because overrunning the kernel limit
> is typically treated as a SIGSEGV.

[...]

> Hm.  It would appear that you are loading some custom code that sucks
> pthread support into the backend.  This is generally a bad idea in any

Not really. Only PLSQL and dblink. Anyway, my understanding is that
this should already be fixed in 8.2 and is not worth looking into
deeply, right?

Thanks for your help.


Fernando.