Обсуждение: server resetting

Поиск
Список
Период
Сортировка

server resetting

От
Geoffrey
Дата:
Postgresql 7.4.7 (yes, I've been telling them we need to upgrade to the
latest 7.4)
Red Hat Enterprise Linux ES release 3

We are having problems with the postgresql server resetting and dropping
all user connections.  There is a core file generated and I've attached
a backtrace.  I'm about to dig into the source to see what I can find,
but if anyone can put their finger on the problem, I would appreciate
it.  I do realize that there is a call to exec_stmt() which appears to
have a null value being passed, which I suspect is the issue.  Why a
null is being passed is what I plan to look into.

Thanks for any info, here's the backtrace:

Using host libthread_db library "/lib/tls/libthread_db.so.1".
Core was generated by `postgres: bwoods exp [local] INSERT           '.
Program terminated with signal 11, Segmentation fault.
#0  exec_stmt (estate=0xfeff8a90, stmt=0x0) at pl_exec.c:928
         in pl_exec.c
#0  exec_stmt (estate=0xfeff8a90, stmt=0x0) at pl_exec.c:928
#1  0x0083f005 in exec_stmts (estate=0xfeff8a90, stmts=0x90fa9e0)
     at pl_exec.c:903
#2  0x0083f4f2 in exec_stmt_if (estate=0xfeff8a90, stmt=0x90fab78)
     at pl_exec.c:1139
#3  0x0083f0ca in exec_stmt (estate=0xfeff8a90, stmt=0x90fab78)
     at pl_exec.c:947
#4  0x0083f005 in exec_stmts (estate=0xfeff8a90, stmts=0x90fab90)
     at pl_exec.c:903
#5  0x0083f4f2 in exec_stmt_if (estate=0xfeff8a90, stmt=0x90fad20)
     at pl_exec.c:1139
#6  0x0083f0ca in exec_stmt (estate=0xfeff8a90, stmt=0x90fad20)
     at pl_exec.c:947
#7  0x0083f005 in exec_stmts (estate=0xfeff8a90, stmts=0x9133e60)
     at pl_exec.c:903
#8  0x0083f4f2 in exec_stmt_if (estate=0xfeff8a90, stmt=0x90d97b8)
     at pl_exec.c:1139
#9  0x0083f0ca in exec_stmt (estate=0xfeff8a90, stmt=0x90d97b8)
     at pl_exec.c:947
#10 0x0083f005 in exec_stmts (estate=0xfeff8a90, stmts=0x9118408)
     at pl_exec.c:903
#11 0x0083ee15 in exec_stmt_block (estate=0xfeff8a90, block=0x90d97e8)
     at pl_exec.c:859
#12 0x0083e77a in plpgsql_exec_trigger (func=0x9149ae0, trigdata=0xfeff8ca0)
     at pl_exec.c:645
#13 0x0083b053 in plpgsql_call_handler (fcinfo=0xfeff8b50) at
pl_handler.c:121
#14 0x080f1c8e in ExecCallTriggerFunc (trigdata=0xfeff8ca0, finfo=0x935e260,
     per_tuple_context=0x0) at trigger.c:1150
#15 0x080f2be7 in DeferredTriggerExecute (event=0x92af050, itemno=0,
rel=0x8,
     trigdesc=0x935daf0, finfo=0xfeff8a90, per_tuple_context=0x0)
     at trigger.c:1859
#16 0x080f2fee in deferredTriggerInvokeEvents (immediate_only=1 '\001')
     at trigger.c:2000
#17 0x080f314f in DeferredTriggerEndQuery () at trigger.c:2135
#18 0x08178ae8 in finish_xact_command () at postgres.c:1749
#19 0x08177816 in exec_simple_query (
     query_string=0x8fe2438 "INSERT INTO logs

(seq,level,event_code,event_date,event_time,city,province,user_id,est_dsp_date,est_dsp_time,country,edilate,carr_code,notes,trac_notes,order_num)

VALUES ('2','6','TAS','09/14/06','19:"...)
     at postgres.c:905
#20 0x08179f09 in PostgresMain (argc=4, argv=0x8f94b48,
     username=0x8f94ab8 "bwoods") at postgres.c:2871
#21 0x08153c90 in BackendFork (port=0x8fa6af0) at postmaster.c:2564
#22 0x08153683 in BackendStartup (port=0x8fa6af0) at postmaster.c:2207
#23 0x08151be8 in ServerLoop () at postmaster.c:1119
#24 0x081512ae in PostmasterMain (argc=5, argv=0x8f92688) at
postmaster.c:897
#25 0x08121163 in main (argc=5, argv=0xfeff9e44) at main.c:214



--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
  - Benjamin Franklin

Re: server resetting

От
"Brandon Aiken"
Дата:
Maybe I'm restating the obvious, but it looks to me like the procedural
trigger from the SQL query "INSERT INTO logs
(seq,level,event_code,event_date,event_time,city,province,user_id,est_ds
p_date,est_dsp_time,country,edilate,carr_code,notes,trac_notes,order_num
)
VALUES ('2','6','TAS','09/14/06','19:"... is the culprit, probably 3-4
IF (or other conditional) statements in.  Check this trigger to see if
it handles NULLs correctly.


Looking at the change logs
(http://www.postgresql.org/docs/7.4/interactive/release.html) it looks
like there were significant fixes in 7.4.8.  It's possible that this is
a known bug that has already been fixed.

--
Brandon Aiken
CS/IT Systems Engineer
-----Original Message-----
From: pgsql-general-owner@postgresql.org
[mailto:pgsql-general-owner@postgresql.org] On Behalf Of Geoffrey
Sent: Monday, September 18, 2006 10:06 AM
To: PostgreSQL List
Subject: [GENERAL] server resetting

Postgresql 7.4.7 (yes, I've been telling them we need to upgrade to the
latest 7.4)
Red Hat Enterprise Linux ES release 3

We are having problems with the postgresql server resetting and dropping

all user connections.  There is a core file generated and I've attached
a backtrace.  I'm about to dig into the source to see what I can find,
but if anyone can put their finger on the problem, I would appreciate
it.  I do realize that there is a call to exec_stmt() which appears to
have a null value being passed, which I suspect is the issue.  Why a
null is being passed is what I plan to look into.

Thanks for any info, here's the backtrace:

Using host libthread_db library "/lib/tls/libthread_db.so.1".
Core was generated by `postgres: bwoods exp [local] INSERT           '.
Program terminated with signal 11, Segmentation fault.
#0  exec_stmt (estate=0xfeff8a90, stmt=0x0) at pl_exec.c:928
         in pl_exec.c
#0  exec_stmt (estate=0xfeff8a90, stmt=0x0) at pl_exec.c:928
#1  0x0083f005 in exec_stmts (estate=0xfeff8a90, stmts=0x90fa9e0)
     at pl_exec.c:903
#2  0x0083f4f2 in exec_stmt_if (estate=0xfeff8a90, stmt=0x90fab78)
     at pl_exec.c:1139
#3  0x0083f0ca in exec_stmt (estate=0xfeff8a90, stmt=0x90fab78)
     at pl_exec.c:947
#4  0x0083f005 in exec_stmts (estate=0xfeff8a90, stmts=0x90fab90)
     at pl_exec.c:903
#5  0x0083f4f2 in exec_stmt_if (estate=0xfeff8a90, stmt=0x90fad20)
     at pl_exec.c:1139
#6  0x0083f0ca in exec_stmt (estate=0xfeff8a90, stmt=0x90fad20)
     at pl_exec.c:947
#7  0x0083f005 in exec_stmts (estate=0xfeff8a90, stmts=0x9133e60)
     at pl_exec.c:903
#8  0x0083f4f2 in exec_stmt_if (estate=0xfeff8a90, stmt=0x90d97b8)
     at pl_exec.c:1139
#9  0x0083f0ca in exec_stmt (estate=0xfeff8a90, stmt=0x90d97b8)
     at pl_exec.c:947
#10 0x0083f005 in exec_stmts (estate=0xfeff8a90, stmts=0x9118408)
     at pl_exec.c:903
#11 0x0083ee15 in exec_stmt_block (estate=0xfeff8a90, block=0x90d97e8)
     at pl_exec.c:859
#12 0x0083e77a in plpgsql_exec_trigger (func=0x9149ae0,
trigdata=0xfeff8ca0)
     at pl_exec.c:645
#13 0x0083b053 in plpgsql_call_handler (fcinfo=0xfeff8b50) at
pl_handler.c:121
#14 0x080f1c8e in ExecCallTriggerFunc (trigdata=0xfeff8ca0,
finfo=0x935e260,
     per_tuple_context=0x0) at trigger.c:1150
#15 0x080f2be7 in DeferredTriggerExecute (event=0x92af050, itemno=0,
rel=0x8,
     trigdesc=0x935daf0, finfo=0xfeff8a90, per_tuple_context=0x0)
     at trigger.c:1859
#16 0x080f2fee in deferredTriggerInvokeEvents (immediate_only=1 '\001')
     at trigger.c:2000
#17 0x080f314f in DeferredTriggerEndQuery () at trigger.c:2135
#18 0x08178ae8 in finish_xact_command () at postgres.c:1749
#19 0x08177816 in exec_simple_query (
     query_string=0x8fe2438 "INSERT INTO logs
(seq,level,event_code,event_date,event_time,city,province,user_id,est_ds
p_date,est_dsp_time,country,edilate,carr_code,notes,trac_notes,order_num
)
VALUES ('2','6','TAS','09/14/06','19:"...)
     at postgres.c:905
#20 0x08179f09 in PostgresMain (argc=4, argv=0x8f94b48,
     username=0x8f94ab8 "bwoods") at postgres.c:2871
#21 0x08153c90 in BackendFork (port=0x8fa6af0) at postmaster.c:2564
#22 0x08153683 in BackendStartup (port=0x8fa6af0) at postmaster.c:2207
#23 0x08151be8 in ServerLoop () at postmaster.c:1119
#24 0x081512ae in PostmasterMain (argc=5, argv=0x8f92688) at
postmaster.c:897
#25 0x08121163 in main (argc=5, argv=0xfeff9e44) at main.c:214



--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
  - Benjamin Franklin

---------------------------(end of broadcast)---------------------------
TIP 3: Have you checked our extensive FAQ?

               http://www.postgresql.org/docs/faq

Re: server resetting

От
Tom Lane
Дата:
Geoffrey <esoteric@3times25.net> writes:
> Program terminated with signal 11, Segmentation fault.
> #0  exec_stmt (estate=0xfeff8a90, stmt=0x0) at pl_exec.c:928
>          in pl_exec.c
> #0  exec_stmt (estate=0xfeff8a90, stmt=0x0) at pl_exec.c:928
> #1  0x0083f005 in exec_stmts (estate=0xfeff8a90, stmts=0x90fa9e0)
>      at pl_exec.c:903
> #2  0x0083f4f2 in exec_stmt_if (estate=0xfeff8a90, stmt=0x90fab78)
>      at pl_exec.c:1139

It seems you've got a corrupt "compiled statements" data structure for
a plpgsql trigger function.  Offhand this does not look like any of the
known post-7.4.7 bug fixes.  Can you show us the source code for that
trigger?

            regards, tom lane

Re: server resetting

От
Geoffrey
Дата:
Tom Lane wrote:

> It seems you've got a corrupt "compiled statements" data structure for
> a plpgsql trigger function.  Offhand this does not look like any of the
> known post-7.4.7 bug fixes.  Can you show us the source code for that
> trigger?

Problem is, we seem to be having a problem with this reset issue and I
don't see a correlation in the backtraces.  Most of them are in fact
related to inserts, but there are at least three different tables
involved.  There are also some where an INSERT is not involved.  I've
attached three more backtraces from different core files to provide
further data and hopefully pinpoint this issue.

We're assuming a common problem here, maybe that's our first mistake.

--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
  - Benjamin Franklin
Using host libthread_db library "/lib/tls/libthread_db.so.1".
Core was generated by `postgres: bwoods exp [local] idle             '.
Program terminated with signal 11, Segmentation fault.
#0  0x002c35b8 in malloc_consolidate () from /lib/tls/libc.so.6
#0  0x002c35b8 in malloc_consolidate () from /lib/tls/libc.so.6
#1  0x002c3438 in _int_free () from /lib/tls/libc.so.6
#2  0x002c2278 in free () from /lib/tls/libc.so.6
#3  0x002131ad in GridSystem::~GridSystem () from /usr/local/lib/libgrid.so
#4  0x00213226 in GridSystem::Release () from /usr/local/lib/libgrid.so
#5  0x0021415f in GridManager::~GridManager () from /usr/local/lib/libgrid.so
#6  0x00212c9e in GridGlobals::Terminate () from /usr/local/lib/libgrid.so
#7  0x00212b36 in GridGlobals::~GridGlobals () from /usr/local/lib/libgrid.so
#8  0x00212d84 in Grid_GetNet () from /usr/local/lib/libgrid.so
#9  0x0027b8f3 in exit () from /lib/tls/libc.so.6
#10 0x0816ceda in proc_exit (code=0) at ipc.c:114
#11 0x0817a235 in PostgresMain (argc=4, argv=0x8f93e00,
    username=0x8f93d70 "bwoods") at postgres.c:3063
#12 0x08153c90 in BackendFork (port=0x8fa6aa0) at postmaster.c:2564
#13 0x08153683 in BackendStartup (port=0x8fa6aa0) at postmaster.c:2207
#14 0x08151be8 in ServerLoop () at postmaster.c:1119
#15 0x081512ae in PostmasterMain (argc=5, argv=0x8f92688) at postmaster.c:897
#16 0x08121163 in main (argc=5, argv=0xfeff9e44) at main.c:214
Using host libthread_db library "/lib/tls/libthread_db.so.1".
Core was generated by `postgres: merck exp [local] idle              '.
Program terminated with signal 11, Segmentation fault.
#0  0x00360042 in _dl_close () from /lib/tls/libc.so.6
#0  0x00360042 in _dl_close () from /lib/tls/libc.so.6
#1  0x0095b06c in dlclose_doit () from /lib/libdl.so.2
#2  0x004bf8c6 in _dl_catch_error_internal () from /lib/ld-linux.so.2
#3  0x0095b4b6 in _dlerror_run () from /lib/libdl.so.2
#4  0x0095b032 in dlclose () from /lib/libdl.so.2
#5  0x00181893 in DynDll::Unload () from /usr/local/lib/libpcmsrv.so
#6  0x00181682 in DynDll::~DynDll$base () from /usr/local/lib/libpcmsrv.so
#7  0x00183718 in PCMSGetDefaultRegion () from /usr/local/lib/libpcmsrv.so
#8  0x0027b8f3 in exit () from /lib/tls/libc.so.6
#9  0x0816ceda in proc_exit (code=0) at ipc.c:114
#10 0x0817a235 in PostgresMain (argc=4, argv=0x8f93b58,
    username=0x8f93ac8 "merck") at postgres.c:3063
#11 0x08153c90 in BackendFork (port=0x8fa6aa0) at postmaster.c:2564
#12 0x08153683 in BackendStartup (port=0x8fa6aa0) at postmaster.c:2207
#13 0x08151be8 in ServerLoop () at postmaster.c:1119
#14 0x081512ae in PostmasterMain (argc=5, argv=0x8f92688) at postmaster.c:897
#15 0x08121163 in main (argc=5, argv=0xfeff9e44) at main.c:214
Using host libthread_db library "/lib/tls/libthread_db.so.1".
Core was generated by `postgres: brandon exp [local] INSERT          '.
Program terminated with signal 11, Segmentation fault.
#0  _SPI_execute_plan (plan=0x0, Values=0x97aba78, Nulls=0x97aba90 " ",
    useCurrentSnapshot=0 '\0', tcount=1) at spi.c:1190
    in spi.c
#0  _SPI_execute_plan (plan=0x0, Values=0x97aba78, Nulls=0x97aba90 " ",
    useCurrentSnapshot=0 '\0', tcount=1) at spi.c:1190
#1  0x08115ed2 in SPI_execp (plan=0x919bfd8, Values=0x97aba78, Nulls=0x0,
    tcount=1) at spi.c:241
#2  0x00166b33 in exec_run_select (estate=0xfeff89e0, expr=0x9254c58,
    maxtuples=0, portalP=0x0) at pl_exec.c:3217
#3  0x00163c8e in exec_stmt_select (estate=0xfeff89e0, stmt=0x9254cb0)
    at pl_exec.c:1519
#4  0x0016311f in exec_stmt (estate=0xfeff89e0, stmt=0x9254cb0)
    at pl_exec.c:967
#5  0x00163005 in exec_stmts (estate=0xfeff89e0, stmts=0x9254cc8)
    at pl_exec.c:903
#6  0x001634f2 in exec_stmt_if (estate=0xfeff89e0, stmt=0x924ed08)
    at pl_exec.c:1139
#7  0x001630ca in exec_stmt (estate=0xfeff89e0, stmt=0x924ed08)
    at pl_exec.c:947
#8  0x00163005 in exec_stmts (estate=0xfeff89e0, stmts=0x9256fb0)
    at pl_exec.c:903
#9  0x00162e15 in exec_stmt_block (estate=0xfeff89e0, block=0x924ed38)
    at pl_exec.c:859
#10 0x0016277a in plpgsql_exec_trigger (func=0x94762f8, trigdata=0xfeff8b90)
    at pl_exec.c:645
#11 0x0015f053 in plpgsql_call_handler (fcinfo=0xfeff8aa0) at pl_handler.c:121
#12 0x080f1c8e in ExecCallTriggerFunc (trigdata=0xfeff8b90, finfo=0x98b62b8,
    per_tuple_context=0x0) at trigger.c:1150
#13 0x080f1f8f in ExecBRInsertTriggers (estate=0x98b4b90, relinfo=0x98b4c40,
    trigtuple=0x98b6198) at trigger.c:1258
#14 0x08105098 in ExecInsert (slot=0x98b50e0, tupleid=0x0, estate=0x98b4b90)
    at execMain.c:1387
#15 0x08104c7f in ExecutePlan (estate=0x98b4b90, planstate=0x98b5128,
    operation=CMD_INSERT, numberTuples=0, direction=NoMovementScanDirection,
    dest=0x9789250) at execMain.c:1250
#16 0x08103df8 in ExecutorRun (queryDesc=0x98beee0,
    direction=ForwardScanDirection, count=0) at execMain.c:249
#17 0x0817a952 in ProcessQuery (parsetree=0x0, plan=0x98beee0, params=0x0,
    dest=0x0, completionTag=0xfeff8dc0 "") at pquery.c:139
#18 0x0817b4c4 in PortalRunMulti (portal=0x8fe6618, dest=0x9789250,
    altdest=0x9789250, completionTag=0xfeff8dc0 "") at pquery.c:860
#19 0x0817ae33 in PortalRun (portal=0x8fe6618, count=2147483647,
    dest=0x9789250, altdest=0x0, completionTag=0xfeff8dc0 "") at pquery.c:494
#20 0x081777e5 in exec_simple_query (
    query_string=0x8fe2200 "INSERT INTO fax
(seq,fax_tstamp,fax_from,send_time,spooled,who_to,signed,action,code,fax_number,order_num)VALUES ('1','09/14/2006
11:57:44','BRANDON','1200','f','Carrier','T','fax','WILLNCUS001','252"...)
    at postgres.c:873
#21 0x08179f09 in PostgresMain (argc=4, argv=0x8f948a0,
    username=0x8f94810 "brandon") at postgres.c:2871
#22 0x08153c90 in BackendFork (port=0x8fa6ac8) at postmaster.c:2564
#23 0x08153683 in BackendStartup (port=0x8fa6ac8) at postmaster.c:2207
#24 0x08151be8 in ServerLoop () at postmaster.c:1119
#25 0x081512ae in PostmasterMain (argc=5, argv=0x8f92688) at postmaster.c:897
#26 0x08121163 in main (argc=5, argv=0xfeff9e44) at main.c:214

Re: server resetting

От
Tom Lane
Дата:
Geoffrey <esoteric@3times25.net> writes:
> Problem is, we seem to be having a problem with this reset issue and I
> don't see a correlation in the backtraces.  Most of them are in fact
> related to inserts, but there are at least three different tables
> involved.  There are also some where an INSERT is not involved.  I've
> attached three more backtraces from different core files to provide
> further data and hopefully pinpoint this issue.

Well, these make it clear that you've got some pretty big chunks of
nonstandard code in the backend, so my first thought is that there's a
memory-clobber bug somewhere in that.  It might be worth trying to run
the code with a debugging malloc library (ElectricFence or some such)
to try to locate the culprit.

            regards, tom lane

Re: server resetting

От
Geoffrey
Дата:
Tom Lane wrote:
> Geoffrey <esoteric@3times25.net> writes:
>> Problem is, we seem to be having a problem with this reset issue and I
>> don't see a correlation in the backtraces.  Most of them are in fact
>> related to inserts, but there are at least three different tables
>> involved.  There are also some where an INSERT is not involved.  I've
>> attached three more backtraces from different core files to provide
>> further data and hopefully pinpoint this issue.
>
> Well, these make it clear that you've got some pretty big chunks of
> nonstandard code in the backend, so my first thought is that there's a
> memory-clobber bug somewhere in that.  It might be worth trying to run
> the code with a debugging malloc library (ElectricFence or some such)
> to try to locate the culprit.

I'm not sure what you mean by 'nonstandard code,' could you expand on
that?  All the trigger code is written in plpgsql.  Are you suggesting
we're stomping on our own memory within the trigger code we've written?

--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
  - Benjamin Franklin

Re: server resetting

От
Tom Lane
Дата:
Geoffrey <esoteric@3times25.net> writes:
> Tom Lane wrote:
>> Well, these make it clear that you've got some pretty big chunks of
>> nonstandard code in the backend, so my first thought is that there's a
>> memory-clobber bug somewhere in that.

> I'm not sure what you mean by 'nonstandard code,' could you expand on
> that?

The traces include code from /usr/local/lib/libgrid.so and
/usr/local/lib/libpcmsrv.so ... I don't know what those are,
but I'm quite sure they are not invoked by a standard Postgres build.
I also find it suggestive that they appear to have been written in C++
... we've seen problems before from trying to link C++ code into the
backend, because it tends to bring along its own incompatible ideas
about how to do error recovery and memory management.

            regards, tom lane

Re: server resetting

От
Geoffrey
Дата:
Tom Lane wrote:
> Geoffrey <esoteric@3times25.net> writes:
>> Tom Lane wrote:
>>> Well, these make it clear that you've got some pretty big chunks of
>>> nonstandard code in the backend, so my first thought is that there's a
>>> memory-clobber bug somewhere in that.
>
>> I'm not sure what you mean by 'nonstandard code,' could you expand on
>> that?
>
> The traces include code from /usr/local/lib/libgrid.so and
> /usr/local/lib/libpcmsrv.so ... I don't know what those are,
> but I'm quite sure they are not invoked by a standard Postgres build.
> I also find it suggestive that they appear to have been written in C++
> ... we've seen problems before from trying to link C++ code into the
> backend, because it tends to bring along its own incompatible ideas
> about how to do error recovery and memory management.

The libpcmsrv is a library for looking up miles, vendor provided.  I'll
have to check on the other one, it may be related to the same package.

Thanks for the heads up on C++ code.

It seems we may have located a memory problem in a library that is used
throughout our code, thus, we are looking into this at this time.

Thanks again.

--
Until later, Geoffrey

Those who would give up essential Liberty, to purchase a little
temporary Safety, deserve neither Liberty nor Safety.
  - Benjamin Franklin