Обсуждение: master plpython check fails on Solaris 10

Поиск
Список
Период
Сортировка

master plpython check fails on Solaris 10

От
Marina Polyakova
Дата:
Hello, hackers! I got a permanent failure of master (commit 
ebdb42a0d6a61b93a5bb9f4204408edf5959332c) plpython check on Solaris 10. 
Regression output and diffs are attached.

I used the following commands:
./configure CC="ccache gcc" CFLAGS="-m64 -I/opt/csw/include" 
PKG_CONFIG_PATH="/opt/csw/lib/pkgconfig:/usr/local/lib/pkgconfig" 
LDFLAGS="-L/opt/csw/lib/sparcv9 -L/usr/local/lib/64" --enable-cassert 
--enable-debug --enable-nls --enable-tap-tests --with-perl --with-tcl 
--with-python --with-gssapi --with-openssl --with-ldap --with-libxml 
--with-libxslt --with-icu
gmake > make_results.txt
gmake -C src/pl/plpython check

Binary search has shown that this failure begins with commit 
8561e4840c81f7e345be2df170839846814fa004 (Transaction control in PL 
procedures.). On the previous commit 
(b9ff79b8f17697f3df492017d454caa9920a7183) there's no 
plpython_transaction test and plpython check passes.

About the system: SunOS, Release 5.10, KernelID Generic_141444-09.

P.S. It seems that there's a similar failure on Windows, and I'm trying 
to reproduce it..

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
Вложения

Re: master plpython check fails on Solaris 10

От
Peter Eisentraut
Дата:
On 2/13/18 05:40, Marina Polyakova wrote:
> Binary search has shown that this failure begins with commit 
> 8561e4840c81f7e345be2df170839846814fa004 (Transaction control in PL 
> procedures.). On the previous commit 
> (b9ff79b8f17697f3df492017d454caa9920a7183) there's no 
> plpython_transaction test and plpython check passes.

OK, can you get some kind of stack trace or other debugging information?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: master plpython check fails on Solaris 10

От
Marina Polyakova
Дата:
On 14-02-2018 3:43, Peter Eisentraut wrote:
> On 2/13/18 05:40, Marina Polyakova wrote:
>> Binary search has shown that this failure begins with commit
>> 8561e4840c81f7e345be2df170839846814fa004 (Transaction control in PL
>> procedures.). On the previous commit
>> (b9ff79b8f17697f3df492017d454caa9920a7183) there's no
>> plpython_transaction test and plpython check passes.
> 
> OK, can you get some kind of stack trace or other debugging 
> information?

I got this backtrace from gdb:

Program received signal SIGSEGV, Segmentation fault.
0xffffffff7d13bb50 in strlen () from /lib/64/libc.so.1
(gdb) bt
#0  0xffffffff7d13bb50 in strlen () from /lib/64/libc.so.1
#1  0xffffffff7d1aabf8 in _ndoprnt () from /lib/64/libc.so.1
#2  0xffffffff7d1ad3d4 in vsnprintf () from /lib/64/libc.so.1
#3  0x00000001009ab174 in pvsnprintf (
     buf=0x100d680e3 "PL/Python function \"", '\177' <repeats 180 
times>..., len=933,
     fmt=0x100d684a0 "PL/Python function \"%s\"", 
args=0xffffffff7fff77f8) at psprintf.c:121
#4  0x0000000100492bd0 in appendStringInfoVA (str=0xffffffff7fff7738,
     fmt=0x100d684a0 "PL/Python function \"%s\"", 
args=0xffffffff7fff77f8) at stringinfo.c:130
#5  0x0000000100911038 in errcontext_msg (fmt=0xffffffff7aa1eb98 
"PL/Python function \"%s\"")
     at elog.c:1021
#6  0xffffffff7aa11ea8 in plpython_error_callback (arg=0x100e7d7e8) at 
plpy_main.c:402
#7  0x000000010090e9f8 in errfinish (dummy=0) at elog.c:438
#8  0x0000000100482e78 in SPI_commit () at spi.c:211
#9  0xffffffff7aa13e90 in PLy_commit (self=0x0, args=0x0) at 
plpy_plpymodule.c:602
#10 0xffffffff71a1ca40 in PyEval_EvalFrameEx (f=0xffffffff70619548, 
throwflag=2147449896)
     at Python/ceval.c:4004
#11 0xffffffff71a1d8a0 in PyEval_EvalFrameEx (f=0xffffffff706193a0, 
throwflag=2147450456)
     at Python/ceval.c:4106
#12 0xffffffff71a1ed20 in PyEval_EvalCodeEx (co=0xffffffff7060e6b0, 
globals=0xffffffff705236b4,
     locals=0xffffffff71bf9940 <_PyThreadState_Current>, args=0x0, 
argcount=0, kws=0x0, kwcount=0,
     defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3252
#13 0xffffffff71a1ef64 in PyEval_EvalCode (co=0xffffffff7060e6b0, 
globals=0xffffffff70525c58,
     locals=0xffffffff70525c58) at Python/ceval.c:667
#14 0xffffffff7aa10938 in PLy_procedure_call (proc=0xffffffff7fff8580,
     kargs=0xffffffff7aa1e128 "args", vargs=0xffffffff7060d290) at 
plpy_exec.c:1031
#15 0xffffffff7aa0cf08 in PLy_exec_function (fcinfo=0xffffffff7fff86f0, 
proc=0xffffffff7fff8580)
     at plpy_exec.c:107
#16 0xffffffff7aa11c64 in plpython_inline_handler 
(fcinfo=0xffffffff7fff8be0) at plpy_main.c:353
#17 0x000000010091e44c in OidFunctionCall1Coll (functionId=16385, 
collation=0, arg1=4311287992)
     at fmgr.c:1327
#18 0x00000001003746ec in ExecuteDoStmt (stmt=0x100f8fa70, atomic=1 
'\001') at functioncmds.c:2183
#19 0x00000001006f420c in standard_ProcessUtility (pstmt=0x100f90768,
     queryString=0xffffffff70528054 "DO LANGUAGE plpythonu $x$ 
plpy.commit() $x$",
     context=PROCESS_UTILITY_QUERY, params=0x0, queryEnv=0x0, 
dest=0x100cd6708 <spi_printtupDR>,
     completionTag=0xffffffff7fff93f8 "") at utility.c:533
#20 0x00000001006f3b04 in ProcessUtility (pstmt=0x100f90768,
     queryString=0xffffffff70528054 "DO LANGUAGE plpythonu $x$ 
plpy.commit() $x$",
     context=PROCESS_UTILITY_QUERY, params=0x0, queryEnv=0x0, 
dest=0x100cd6708 <spi_printtupDR>,
     completionTag=0xffffffff7fff93f8 "") at utility.c:358
#21 0x00000001004886e0 in _SPI_execute_plan (plan=0xffffffff7fff95b0, 
paramLI=0x0, snapshot=0x0,
     crosscheck_snapshot=0x0, read_only=0 '\000', fire_triggers=1 '\001', 
tcount=0) at spi.c:2207
#22 0x0000000100483980 in SPI_execute (
     src=0xffffffff70528054 "DO LANGUAGE plpythonu $x$ plpy.commit() 
$x$", read_only=0 '\000',
     tcount=0) at spi.c:416
#23 0xffffffff7aa17698 in PLy_spi_execute_query (
     query=0xffffffff70528054 "DO LANGUAGE plpythonu $x$ plpy.commit() 
$x$", limit=0)
     at plpy_spi.c:331
#24 0xffffffff7aa16c3c in PLy_spi_execute (self=0x0, 
args=0xffffffff7051fb50) at plpy_spi.c:168
#25 0xffffffff719af824 in PyCFunction_Call (func=0xffffffff7062a098, 
arg=0xffffffff7051fb50, kw=0x0)
     at Objects/methodobject.c:116
#26 0xffffffff71a1cfb0 in PyEval_EvalFrameEx (f=0xffffffff7af29c20, 
throwflag=2147457704)
     at Python/ceval.c:4020
#27 0xffffffff71a1d8a0 in PyEval_EvalFrameEx (f=0xffffffff706191f8, 
throwflag=2147458264)
     at Python/ceval.c:4106
#28 0xffffffff71a1ed20 in PyEval_EvalCodeEx (co=0xffffffff7072cf30, 
globals=0xffffffff70523624,
     locals=0xffffffff71bf9940 <_PyThreadState_Current>, args=0x0, 
argcount=0, kws=0x0, kwcount=0,
     defs=0x0, defcount=0, closure=0x0) at Python/ceval.c:3252
#29 0xffffffff71a1ef64 in PyEval_EvalCode (co=0xffffffff7072cf30, 
globals=0xffffffff70525910,
     locals=0xffffffff70525910) at Python/ceval.c:667
---Type <return> to continue, or q <return> to quit---
#30 0xffffffff7aa10938 in PLy_procedure_call (proc=0x100f91278, 
kargs=0xffffffff7aa1e128 "args",
     vargs=0xffffffff7060d0e0) at plpy_exec.c:1031
#31 0xffffffff7aa0cf08 in PLy_exec_function (fcinfo=0x100e7d8d0, 
proc=0x100f91278)
     at plpy_exec.c:107
#32 0xffffffff7aa11710 in plpython_call_handler (fcinfo=0x100e7d8d0) at 
plpy_main.c:271
#33 0x0000000100414ad4 in ExecInterpExpr (state=0x100e7d7e8, 
econtext=0x100e7d498,
     isnull=0xffffffff7fffa987 "\350") at execExprInterp.c:652
#34 0x0000000100416d78 in ExecInterpExprStillValid (state=0x100e7d7e8, 
econtext=0x100e7d498,
     isNull=0xffffffff7fffa987 "\350") at execExprInterp.c:1789
#35 0x000000010046c244 in ExecEvalExprSwitchContext (state=0x100e7d7e8, 
econtext=0x100e7d498,
     isNull=0xffffffff7fffa987 "\350") at 
../../../src/include/executor/executor.h:307
#36 0x000000010046c2b4 in ExecProject (projInfo=0x100e7d7e0)
     at ../../../src/include/executor/executor.h:341
#37 0x000000010046c580 in ExecResult (pstate=0x100e7d380) at 
nodeResult.c:136
#38 0x000000010042efa8 in ExecProcNodeFirst (node=0x100e7d380) at 
execProcnode.c:446
#39 0x0000000100421e74 in ExecProcNode (node=0x100e7d380)
     at ../../../src/include/executor/executor.h:246
#40 0x0000000100425e0c in ExecutePlan (estate=0x100e7d168, 
planstate=0x100e7d380,
     use_parallel_mode=0 '\000', operation=CMD_SELECT, sendTuples=1 
'\001', numberTuples=0,
     direction=ForwardScanDirection, dest=0x100d905b0, execute_once=1 
'\001') at execMain.c:1721
#41 0x0000000100422800 in standard_ExecutorRun (queryDesc=0x100d8ee48,
     direction=ForwardScanDirection, count=0, execute_once=1 '\001') at 
execMain.c:361
#42 0x00000001004224e8 in ExecutorRun (queryDesc=0x100d8ee48, 
direction=ForwardScanDirection,
     count=0, execute_once=1 '\001') at execMain.c:304
#43 0x00000001006f14d0 in PortalRunSelect (portal=0x100dcf218, forward=1 
'\001', count=0,
     dest=0x100d905b0) at pquery.c:932
#44 0x00000001006f0f54 in PortalRun (portal=0x100dcf218, 
count=9223372036854775807,
     isTopLevel=1 '\001', run_once=1 '\001', dest=0x100d905b0, 
altdest=0x100d905b0,
     completionTag=0xffffffff7fffb198 "") at pquery.c:773
#45 0x00000001006e5f78 in exec_simple_query (query_string=0x100d6a628 
"SELECT transaction_test4();")
     at postgres.c:1120
#46 0x00000001006ee044 in PostgresMain (argc=1, argv=0x100d95750, 
dbname=0x100d95728 "postgres",
     username=0x100d95700 "buildfarm") at postgres.c:4144
#47 0x00000001005e4398 in BackendRun (port=0x100d8bbe0) at 
postmaster.c:4412
#48 0x00000001005e3504 in BackendStartup (port=0x100d8bbe0) at 
postmaster.c:4084
#49 0x00000001005dbbfc in ServerLoop () at postmaster.c:1757
#50 0x00000001005da99c in PostmasterMain (argc=5, argv=0x100d63ea0) at 
postmaster.c:1365
#51 0x00000001004b9c4c in main (argc=5, argv=0x100d63ea0) at main.c:228
(gdb)

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: master plpython check fails on Solaris 10

От
Tom Lane
Дата:
Marina Polyakova <m.polyakova@postgrespro.ru> writes:
> On 14-02-2018 3:43, Peter Eisentraut wrote:
>> OK, can you get some kind of stack trace or other debugging 
>> information?

> I got this backtrace from gdb:

Hmm, so the only question in my mind is how did this ever work for anyone?

The basic problem here is that, instead of using the
ErrorContextCallback.arg field provided for the purpose,
plpython_error_callback is using PLy_current_execution_context()
to try to identify the context it's supposed to report on.
In the example, that points to the context associated with the
inner DO block, not with the outer procedure.  That context
looks like it should reliably have curr_proc->proname == NULL,
so how come this test case doesn't crash for everybody?

In any case the expected output for the transaction_test4
case is obviously wrong.  Rather than reporting the transaction_test4
function and then the inner DO block, it's reporting 2 levels
of transaction_test4.  That seems to trace directly to both levels
of error context callback looking at the same execution context.

I think we need to fix the error callback code so that it
uses the "arg" field to find the relevant procedure, and that
that's a back-patchable fix, because nested plpython functions
would show this up as wrong in any case.  That would also let us
undo the not-terribly-safe practice of having code between the
PLy_push_execution_context call and the PG_TRY that is supposed
to ensure the context gets popped again.

While I'm bitching ... I wonder how long it's been since the
comment for PLy_procedure_name() had anything to do with its
actual behavior.

            regards, tom lane


Re: master plpython check fails on Solaris 10

От
Tom Lane
Дата:
I wrote:
> so how come this test case doesn't crash for everybody?

I traced through this, and what I see is that the error information
constructed at the time of the inner ereport includes

    0x1f98528 "invalid transaction termination", detail = 0x0, detail_log =
    0x0, hint = 0x0, context =
    0x1f98598 "PL/Python anonymous code block\nSQL statement \"DO LANGUAGE plpythonu $x$ plpy.commit() $x$\"\nPL/Python
function\"(null)\"", message_id =  
    0xa0fc50 "invalid transaction termination", schema_name = 0x0,

So, in fact, we *are* passing a null pointer to sprintf here.
Apparently, none of the machines in the buildfarm crash on that,
or at least none of the ones building plpython do, which is
pretty troubling from a test coverage standpoint.  I suggest that
it might be a good idea to insert an "Assert(strvalue != NULL)"
in src/port/snprintf.c, approx. line 748, so that at least the
machines using that fallback implementation are guaranteed to
whine about this type of mistake.

The reason we don't see this obviously-bogus output in the test
results is that when control exits the outer Python function,
the inner error result is thrown away during
PLy_abort_open_subtransactions, and then we generate an all-new error
report with "PLy_elog(ERROR, NULL)".  That contains only Python's
own traceback plus the output from a (duplicate) call of
plpython_error_callback, which now gets the right answer because
PLy_execution_contexts is now pointing at the outer function's
context.

So I'm still thinking I can construct a test case in which plpython
outputs a clearly-wrong context stack, but it will have to involve
elog(NOTICE) rather than ERROR.  Will work on that.

            regards, tom lane


Re: master plpython check fails on Solaris 10

От
Marina Polyakova
Дата:
On 14-02-2018 17:54, Tom Lane wrote:
> Marina Polyakova <m.polyakova@postgrespro.ru> writes:
>> On 14-02-2018 3:43, Peter Eisentraut wrote:
>>> OK, can you get some kind of stack trace or other debugging
>>> information?
> 
>> I got this backtrace from gdb:
> 
> Hmm, so the only question in my mind is how did this ever work for 
> anyone?
> 
> The basic problem here is that, instead of using the
> ErrorContextCallback.arg field provided for the purpose,
> plpython_error_callback is using PLy_current_execution_context()
> to try to identify the context it's supposed to report on.
> In the example, that points to the context associated with the
> inner DO block, not with the outer procedure.  That context
> looks like it should reliably have curr_proc->proname == NULL,
> so how come this test case doesn't crash for everybody?
> 
> In any case the expected output for the transaction_test4
> case is obviously wrong.  Rather than reporting the transaction_test4
> function and then the inner DO block, it's reporting 2 levels
> of transaction_test4.  That seems to trace directly to both levels
> of error context callback looking at the same execution context.
> 
> I think we need to fix the error callback code so that it
> uses the "arg" field to find the relevant procedure, and that
> that's a back-patchable fix, because nested plpython functions
> would show this up as wrong in any case.  That would also let us
> undo the not-terribly-safe practice of having code between the
> PLy_push_execution_context call and the PG_TRY that is supposed
> to ensure the context gets popped again.

Thank you very much! I'll try to implement this.

> While I'm bitching ... I wonder how long it's been since the
> comment for PLy_procedure_name() had anything to do with its
> actual behavior.

AFAIU this has always been like that, since the commit 1ca717f3...

-- 
Marina Polyakova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Re: master plpython check fails on Solaris 10

От
Tom Lane
Дата:
Marina Polyakova <m.polyakova@postgrespro.ru> writes:
> On 14-02-2018 17:54, Tom Lane wrote:
>> I think we need to fix the error callback code so that it
>> uses the "arg" field to find the relevant procedure, and that
>> that's a back-patchable fix, because nested plpython functions
>> would show this up as wrong in any case.  That would also let us
>> undo the not-terribly-safe practice of having code between the
>> PLy_push_execution_context call and the PG_TRY that is supposed
>> to ensure the context gets popped again.

> Thank you very much! I'll try to implement this.

I hsve a patch mostly done already.

            regards, tom lane