Postgres dies in the rules regression test (64-bit problem)

From: Pedro J. Lobo
Subject: Postgres dies in the rules regression test (64-bit problem)
Msg-id: Pine.OSF.4.05.9906100920390.19752-100000@haddock.euitt.upm.es
Responses: Re: [HACKERS] Postgres dies in the rules regression test (64-bit problem)  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

Hi, all.

I have found what seems to be a 64/32-bit problem while testing the latest
cvsup'ed source on an Alpha running Digital Unix 4.0d. After fixing a bug
in the 'money' type that prevented the rules regression test from working
(I sent a patch to pgsql-patches, but Bruce told me it will have to wait
until after the release), the test ran up to a point where the backend
died, dumping core. Here is what I've seen using gdb:

pgbeta:nodes> gdb /usr/local/pgsql.beta/bin/postgres /usr/local/pgsql.beta/data/base/regression/core
GDB is free software and you are welcome to distribute copies of it
 under certain conditions; type "show copying" to see the conditions.
There is absolutely no warranty for GDB; type "show warranty" for details.
GDB 4.16 (alpha-dec-osf3.2), Copyright 1996 Free Software Foundation, Inc...
Core was generated by `postgres'.
Program terminated with signal 11, Segmentation fault.
Reading symbols from /usr/shlib/libm.so...done.
Reading symbols from /usr/shlib/libcurses.so...done.
Reading symbols from /usr/shlib/libc.so...done.
Reading symbols from /usr/lib/nls/loc//es_ES.ISO8859-1...done.
#0  replace_opid (oper=0x4015aad0) at nodeFuncs.c:95
95              oper->opid = get_opcode(oper->opno);
(gdb) where
#0  replace_opid (oper=0x4015aad0) at nodeFuncs.c:95
#1  0x1201208b0 in fix_opid (clause=0x14015aaa0) at clauses.c:554
#2  0x12011e214 in preprocess_targetlist (tlist=0x14015a4c0, command_type=2,
    result_relation=5, range_table=0x14015b1b0) at preptlist.c:84
#3  0x120118808 in union_planner (parse=0x14015a100) at planner.c:162
#4  0x1201185d4 in planner (parse=0x14015a100) at planner.c:83
#5  0x120159d90 in pg_parse_and_plan (
    query_string=0x11fffa918 "update rtest_v1 set a = rtest_t3.a + 20 where b = rtest_t3.b;",
    typev=0x0, nargs=0, queryListP=0x11fffa868, dest=Remote,
    aclOverride=0 '\000') at postgres.c:590
#6  0x12015a034 in pg_exec_query_dest (
    query_string=0x11fffa918 "update rtest_v1 set a = rtest_t3.a + 20 where b = rtest_t3.b;",
    dest=Remote, aclOverride=0 '\000') at postgres.c:678
#7  0x120159f80 in pg_exec_query (
    query_string=0x11fffa918 "update rtest_v1 set a = rtest_t3.a + 20 where b = rtest_t3.b;")
    at postgres.c:656
#8  0x12015baa0 in PostgresMain (argc=10, argv=0x11fffee90, real_argc=9,
    real_argv=0x11ffffc28) at postgres.c:1658
#9  0x12012d02c in DoBackend (port=0x1400d9a00) at postmaster.c:1628
#10 0x12012c778 in BackendStartup (port=0x1400d9a00) at postmaster.c:1373
#11 0x12012b5d8 in ServerLoop () at postmaster.c:823
#12 0x12012ae00 in PostmasterMain (argc=9, argv=0x11ffffc28) at postmaster.c:616
#13 0x1200e0b30 in main (argc=9, argv=0x11ffffc28) at main.c:93
(gdb) 

Although my knowledge of postgres internals is nil, I have done a bit of
investigation. 'replace_opid' is called from 'fix_opid', at line 554 of
backend/optimizer/utils/clauses.c:

replace_opid((Oper *) ((Expr *) clause)->oper);

'clause' is a pointer to Node. The node it points to is tagged as T_Expr,
so the cast to Expr seems to be right. If you look at the contents of
'clause':

(gdb) p *((Expr *) clause)
$3 = {type = T_Expr, typeOid = 23, opType = OP_EXPR, oper = 0x4015aad0,  args = 0x14015ab30}

Here is the problem. ((Expr*) clause)->oper is a pointer to Node, which
(from the name of the field) I think should be tagged as T_Oper. But, if
you look carefully, its value is almost the same as 'clause' (0x14015aaa0;
0x14015aad0 would be just 0x30 bytes past it) but *truncated to 32 bits*:
0x4015aad0. This is a problem that I've seen many times when you store a
pointer in an int (which is still 32 bits long on an Alpha) and later use
it again as a pointer.
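
To make the arithmetic concrete, here is a minimal sketch of that kind of
truncation. It is not PostgreSQL code: the function names are hypothetical,
and it uses uint64_t/uint32_t values instead of real pointers so that it
behaves the same on any platform. The addresses are the ones from the gdb
output above; 0x14015aad0 is only my guess at what the untruncated pointer
would have been.

```c
#include <assert.h>
#include <stdint.h>

/* Simulate storing a 64-bit address in a 32-bit integer and reading it
 * back: only the low 32 bits survive the round trip. */
static uint64_t truncate_to_32_bits(uint64_t addr)
{
    uint32_t narrow = (uint32_t) addr;  /* high 32 bits are discarded here */
    return (uint64_t) narrow;           /* widened again, high bits are zero */
}

/* Returns 1 if 'seen' is what 'full' would look like after such a round
 * trip (and the round trip actually lost something), 0 otherwise. */
static int looks_truncated(uint64_t full, uint64_t seen)
{
    return full != seen && truncate_to_32_bits(full) == seen;
}

/* Hypothetical self-check using the values from the gdb session. */
static void demo_from_trace(void)
{
    const uint64_t clause    = 0x14015aaa0ULL; /* fix_opid (clause=...) */
    const uint64_t oper_seen = 0x4015aad0ULL;  /* oper field, as printed */
    const uint64_t oper_full = clause + 0x30;  /* assumed original value */

    assert(oper_full == 0x14015aad0ULL);
    assert(truncate_to_32_bits(oper_full) == oper_seen);
    assert(looks_truncated(oper_full, oper_seen));
}
```

Calling demo_from_trace() from a small main() passes silently, which is
consistent with the idea that the 'oper' pointer lost its upper 32 bits
somewhere on the way, although it does not say where.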

I don't know whether ((Expr*) clause)->oper should point into the same
area as 'clause', as it seems to do, but its value has certainly been
passed through an int variable and truncated.

If someone points me to the right place to look, I can play a bit more
with gdb and try to find the cause. You can find the query that crashes
the backend in the stack trace above.

Cheers,
Pedro.

-- 
-------------------------------------------------------------------
Pedro José Lobo Perea                   Tel:    +34 91 336 78 19
Centro de Cálculo                       Fax:    +34 91 331 92 29
E.U.I.T. Telecomunicación               e-mail: pjlobo@euitt.upm.es
Universidad Politécnica de Madrid
Ctra. de Valencia, Km. 7                E-28031 Madrid - España / Spain


