Обсуждение: Bug #514: Backend crashes periodically

Поиск
Список
Период
Сортировка

Bug #514: Backend crashes periodically

От
pgsql-bugs@postgresql.org
Дата:
Warren Volz (wrv@po.cwru.edu) reports a bug with a severity of 2
The lower the number the more severe it is.

Short Description
Backend crashes periodically

Long Description
I am using Postgres version 7.1.3 for a webmail application. At what seems like random intervals the backend dies with
aerror status of 11. Whenever the backend dies the mod_perl webserver must be restarted to re-establish the DB
connection.The only way I have found to reproduce this problem is to run several consecutive VACUUM commands. 

Here are the relevant details:
DB Version:  PostgreSQL v7.1.3
OS:  Linux 2.4.5 #10 Fri Jun 22 02:20:21 PDT 2001 i686
Startup command: pg_ctl start -D /home/pgsql/data -l /var/local/pgsql/errlog -o "-F"
Changes to config file:  max_connections = 64, shared_buffers = 128, wal_files = 16

I managed to capture the core file and recompiled with -g to get the following trace:

Core was generated by `postgres: sis sis [local] UPDATE                      '.
Program terminated with signal 11, Segmentation fault.
(Omitted symbol loading)
#0  ExecEvalVar (variable=0x8239888, econtext=0x8239ed8, isNull=0x823b7c4 "") at execQual.c:323
323             heapTuple = slot->val;

The output in our pglog file is:

Nov 10 20:32:31 wedge postgres[1839]: [1-1] NOTICE:  Message from PostgreSQL backend:
Nov 10 20:32:31 wedge postgres[1839]: [1-2] ^IThe Postmaster has informed me that some other backend^Idied abnormally
andpossibly corrupted shared memory. 
Nov 10 20:32:31 wedge postgres[1839]: [1-3] ^II have rolled back the current transaction and am^Igoing to terminate
yourdatabase system connection and exit. 

Server process (pid 13230) exited with status 11 at Sat Nov 10 13:21:31 2001
Terminating any active server processes...
Server processes were terminated at Sat Nov 10 13:21:31 2001

Sample Code


No file was uploaded with this report

Re: Bug #514: Backend crashes periodically

От
Tom Lane
Дата:
pgsql-bugs@postgresql.org writes:
> Program terminated with signal 11, Segmentation fault.
> (Omitted symbol loading)
> #0  ExecEvalVar (variable=0x8239888, econtext=0x8239ed8, isNull=0x823b7c4 "") at execQual.c:323
> 323             heapTuple = slot->val;

This isn't enough to say anything useful, but please send the results of
    bt
and
    p debug_query_string
in this same corefile.

            regards, tom lane

Re: Bug #514: Backend crashes periodically

От
Warren Volz
Дата:
Tom,

Here ya go...

bt output:

#0  ExecEvalVar (variable=0x8239888, econtext=0x8239ed8,
isNull=0x823b7c4 "") at execQual.c:323
#1  0x80c1a2b in ExecEvalExpr (expression=0x8239888,
econtext=0x8239ed8, isNull=0x823b7c4 "", isDone=0xbfffea0c) at
execQual.c:1187
#2  0x80c1431 in ExecEvalFuncArgs (fcache=0x823b760,
argList=0x82398b0, econtext=0x8239ed8) at execQual.c:614
#3  0x80c14b0 in ExecMakeFunctionResult (fcache=0x823b760,
arguments=0x82398b0, econtext=0x8239ed8, isNull=0xbfffeadf "",
isDone=0x0) at execQual.c:667
#4  0x80c165e in ExecEvalOper (opClause=0x8239838,
econtext=0x8239ed8, isNull=0xbfffeadf "", isDone=0x0) at
execQual.c:859
#5  0x80c1aa0 in ExecEvalExpr (expression=0x8239838,
econtext=0x8239ed8, isNull=0xbfffeadf "", isDone=0x0) at
execQual.c:1222
#6  0x80c1bc6 in ExecQual (qual=0x8239908, econtext=0x8239ed8,
resultForNull=0 '\000') at execQual.c:1374
#7  0x80c56db in IndexNext (node=0x8238c98) at nodeIndexscan.c:133
#8  0x80c2063 in ExecScan (node=0x8238c98, accessMtd=0x80c55e0
<IndexNext>) at execScan.c:95
#9  0x80c58af in ExecIndexScan (node=0x8238c98) at nodeIndexscan.c:283
#10 0x80c0729 in ExecProcNode (node=0x8238c98, parent=0x8237d60) at
execProcnode.c:282
#11 0x80c724c in ExecNestLoop (node=0x8237d60) at nodeNestloop.c:165
#12 0x80c0759 in ExecProcNode (node=0x8237d60, parent=0x8237d60) at
execProcnode.c:297
#13 0x80c0392 in EvalPlanQualNext (estate=0x822ea18) at execMain.c:1884
#14 0x80c0369 in EvalPlanQual (estate=0x822ea18, rti=4,
tid=0xbfffed28) at execMain.c:1870
#15 0x80bfcbb in ExecReplace (slot=0x822f970, tupleid=0xbfffed98,
estate=0x822ea18) at execMain.c:1465
#16 0x80bf951 in ExecutePlan (estate=0x822ea18, plan=0x822df90,
operation=CMD_UPDATE, numberTuples=0, direction=ForwardScanDirection,
destfunc=0x82374e8)
     at execMain.c:1125
#17 0x80bedb7 in ExecutorRun (queryDesc=0x822f640, estate=0x822ea18,
feature=3, count=0) at execMain.c:233
#18 0x8105ab3 in ProcessQuery (parsetree=0x820cf98, plan=0x822df90,
dest=Remote) at pquery.c:295
#19 0x81044d1 in pg_exec_query_string (query_string=0x820caa0 "DELETE
FROM session WHERE session_key =
'bRUjmnqcb5prUNRE.Bb0sq+fZB69dkdVlCAKNBsb'", dest=Remote,
     parse_context=0x81e6d58) at postgres.c:810
#20 0x8105564 in PostgresMain (argc=4, argv=0xbffff050, real_argc=4,
real_argv=0xbffff9c4, username=0x81dcc09 "sis") at postgres.c:1908
#21 0x80f0194 in DoBackend (port=0x81dc9a0) at postmaster.c:2114
#22 0x80efd4c in BackendStartup (port=0x81dc9a0) at postmaster.c:1897
#23 0x80eef79 in ServerLoop () at postmaster.c:995
#24 0x80ee963 in PostmasterMain (argc=4, argv=0xbffff9c4) at postmaster.c:685
#25 0x80cefad in main (argc=4, argv=0xbffff9c4) at main.c:171
#26 0x400e92eb in __libc_start_main (main=0x80cee90 <main>, argc=4,
ubp_av=0xbffff9c4, init=0x80653ec <_init>, fini=0x8148c8c <_fini>,
     rtld_fini=0x4000c130 <_dl_fini>, stack_end=0xbffff9bc) at
../sysdeps/generic/libc-start.c:129


p debug_query_string output:

$1 = 0x820caa0 "DELETE FROM session WHERE session_key =
'bRUjmnqcb5prUNRE.Bb0sq+fZB69dkdVlCAKNBsb'"

-Warren

>pgsql-bugs@postgresql.org writes:
>>  Program terminated with signal 11, Segmentation fault.
>>  (Omitted symbol loading)
>>  #0  ExecEvalVar (variable=0x8239888, econtext=0x8239ed8,
>>isNull=0x823b7c4 "") at execQual.c:323
>>  323             heapTuple = slot->val;
>
>This isn't enough to say anything useful, but please send the results of
>    bt
>and
>    p debug_query_string
>in this same corefile.

Re: Bug #514: Backend crashes periodically

От
Tom Lane
Дата:
Warren Volz <wrv@po.cwru.edu> writes:
> #14 0x80c0369 in EvalPlanQual (estate=0x822ea18, rti=4,
> tid=0xbfffed28) at execMain.c:1870

The fact that the crash is down inside EvalPlanQual makes me think that
you are seeing the same problem recently reported by Barry Lind (see
pgsql-hackers archives for 5-Nov).  The only way to get into
EvalPlanQual is to be trying to update/delete a tuple that was already
updated or deleted by a concurrent transaction that hasn't yet
committed.  So the problem is not reproducible on a single-backend
basis --- but I think it might be fairly easy to reproduce if you
manually put two backends through the paces of your application.
Something like

    Session 1            Session 2

    begin;
    update foo where ...
                    begin;
                    update foo where ...
                    [ blocks waiting for session 1 ]
    end;
                    [ crash? ]

EvalPlanQual does work in simple test cases, so there's some extra
ingredient needed to cause the problem --- but what?  If you can get
a reproducible test sequence then I'm sure we can find and fix the
error.  Barry hasn't come back with a test case yet, so please work
on it...

            regards, tom lane

Re: Bug #514: Backend crashes periodically

От
Tom Lane
Дата:
Warren Volz <wrv@po.cwru.edu> writes:
> p debug_query_string output:

> $1 = 0x820caa0 "DELETE FROM session WHERE session_key =
> 'bRUjmnqcb5prUNRE.Bb0sq+fZB69dkdVlCAKNBsb'"

BTW, do you have rules or triggers on that table?  The backtrace shows
the system executing a nestloop join, which implies at least a two-table
query, which says that there's more going on than meets the eye here.
It would be useful to see the full schema for this table and any related
tables, as well as an EXPLAIN VERBOSE for the above query.

            regards, tom lane

Re: Bug #514: Backend crashes periodically

От
Warren Volz
Дата:
Tom,

Yeah, there is a trigger.  Here are the relevant bits of the schema,
followed by the output of EXPLAIN VERBOSE on a similar query.

Thanks,
Warren

--
-- Table structure for table 'session'
--
CREATE TABLE session (
         session_key             CHAR(40) PRIMARY KEY,
         sis_user_id             INTEGER NOT NULL REFERENCES sis_user
(sis_user_id),
         last_access_time        DATETIME NOT NULL DEFAULT TEXT 'now'
);

-- A rule for keeping the last_visit updated.
CREATE RULE session_del AS ON DELETE TO session DO
UPDATE sis_user
SET last_visit = OLD.last_access_time
WHERE OLD.sis_user_id = sis_user.sis_user_id;


CREATE TABLE sis_user (
         sis_user_id             INTEGER PRIMARY KEY REFERENCES member
(member_id),
         last_name               CHAR(30),
         cwrunet_id              CHAR(10) UNIQUE,
         email_address           CHAR(80) NOT NULL UNIQUE,
         description             VARCHAR(128),
         field_of_study          VARCHAR(80),
         favorite_quote          VARCHAR(128),
         last_visit              DATETIME NOT NULL DEFAULT TEXT 'now',
         second_to_last_visit    DATETIME NOT NULL DEFAULT TEXT 'now',
         email_bounced           SMALLINT NOT NULL DEFAULT 0,
         on_vacation_until       DATETIME,
         password                CHAR(13),
         privacy                 INTEGER NOT NULL,
         last_ip_address         CHAR(15) DEFAULT '0.0.0.0',
         checked_mail            BOOLEAN NOT NULL DEFAULT FALSE,
         num_messages            INTEGER DEFAULT 0
);



sis=> explain verbose delete from session where
session_key='ys!aoUC3kAF2UJ46vz6pSIz1UpsfuLps71YwaT+P';
NOTICE:  QUERY DUMP:

{ NESTLOOP :startup_cost 0.00 :total_cost 35.23 :rows 35 :width 145
:qptargetlist ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 23
:restypmod -1 :resname sis_user_id :reskey 0 :reskeyop 0
:ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000
:varattno 1 :vartype 23 :vartypmod -1  :varlevelsup 0 :varnoold 4
:varoattno 1}} { TARGETENTRY :resdom { RESDOM :resno 2 :restype 1042
:restypmod 34 :resname last_name :reskey 0 :reskeyop 0
:ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000
:varattno 2 :vartype 1042 :vartypmod 34  :varlevelsup 0 :varnoold 4
:varoattno 2}} { TARGETENTRY :resdom { RESDOM :resno 3 :restype 1042
:restypmod 14 :resname cwrunet_id :reskey 0 :reskeyop 0
:ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000
:varattno 3 :vartype 1042 :vartypmod 14  :varlevelsup 0 :varnoold 4
:varoattno 3}} { TARGETENTRY :resdom { RESDOM :resno 4 :restype 1042
:restypmod 84 :resname email_address :reskey 0 :reskeyop 0
:ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000
:varattno 4 :vartype 1042 :vartypmod 84  :varlevelsup 0 :varnoold 4
:varoattno 4}} { TARGETENTRY :resdom { RESDOM :resno 5 :restype 1043
:restypmod 132 :resname description :reskey 0 :reskeyop 0
:ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000
:varattno 5 :vartype 1043 :vartypmod 132  :varlevelsup 0 :varnoold 4
:varoattno 5}} { TARGETENTRY :resdom { RESDOM :resno 6 :restype 1043
:restypmod 84 :resname field_of_study :reskey 0 :reskeyop 0
:ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000
:varattno 6 :vartype 1043 :vartypmod 84  :varlevelsup 0 :varnoold 4
:varoattno 6}} { TARGETENTRY :resdom { RESDOM :resno 7 :restype 1043
:restypmod 132 :resname favorite_quote :reskey 0 :reskeyop 0
:ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000
:varattno 7 :vartype 1043 :vartypmod 132  :varlevelsup 0 :varnoold 4
:varoattno 7}} { TARGETENTRY :resdom { RESDOM :resno 8 :restype 1184
:restypmod -1 :resname last_visit :reskey 0 :reskeyop 0
:ressortgroupref 0 :resjunk false } :expr { VAR :varno 65001
:varattno 1 :vartype 1184 :vartypmod -1  :varlevelsup 0 :varnoold 1
:varoattno 3}} { TARGETENTRY :resdom { RESDOM :resno 9 :restype 1184
:restypmod -1 :resname second_to_last_visit :reskey 0 :reskeyop 0
:ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000
:varattno 8 :vartype 1184 :vartypmod -1  :varlevelsup 0 :varnoold 4
:varoattno 9}} { TARGETENTRY :resdom { RESDOM :resno 10 :restype 21
:restypmod -1 :resname email_bounced :reskey 0 :reskeyop 0
:ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000
:varattno 9 :vartype 21 :vartypmod -1  :varlevelsup 0 :varnoold 4
:varoattno 10}} { TARGETENTRY :resdom { RESDOM :resno 11 :restype
1184 :restypmod -1 :resname on_vacation_until :reskey 0 :reskeyop 0
:ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000
:varattno 10 :vartype 1184 :vartypmod -1  :varlevelsup 0 :varnoold 4
:varoattno 11}} { TARGETENTRY :resdom { RESDOM :resno 12 :restype
1042 :restypmod 17 :resname password :reskey 0 :reskeyop 0
:ressortgroupref 0 :resjunk false } :expr { VAR :varno 65000
:varattno 11 :vartype 1042 :vartypmod 17  :varlevelsup 0 :varnoold 4
:varoattno 12}} { TARGETENTRY :resdom { RESDOM :resno 13 :restype 23
:restypmod -1 :resname privacy :reskey 0 :reskeyop 0 :ressortgroupref
0 :resjunk false } :expr { VAR :varno 65000 :varattno 12 :vartype 23
:vartypmod -1  :varlevelsup 0 :varnoold 4 :varoattno 13}} {
TARGETENTRY :resdom { RESDOM :resno 14 :restype 1042 :restypmod 19
:resname last_ip_address :reskey 0 :reskeyop 0 :ressortgroupref 0
:resjunk false } :expr { VAR :varno 65000 :varattno 13 :vartype 1042
:vartypmod 19  :varlevelsup 0 :varnoold 4 :varoattno 14}} {
TARGETENTRY :resdom { RESDOM :resno 15 :restype 16 :restypmod -1
:resname checked_mail :reskey 0 :reskeyop 0 :ressortgroupref 0
:resjunk false } :expr { VAR :varno 65000 :varattno 14 :vartype 16
:vartypmod -1  :varlevelsup 0 :varnoold 4 :varoattno 15}} {
TARGETENTRY :resdom { RESDOM :resno 16 :restype 23 :restypmod -1
:resname num_messages :reskey 0 :reskeyop 0 :ressortgroupref 0
:resjunk false } :expr { VAR :varno 65000 :varattno 15 :vartype 23
:vartypmod -1  :varlevelsup 0 :varnoold 4 :varoattno 16}} {
TARGETENTRY :resdom { RESDOM :resno 17 :restype 27 :restypmod -1
:resname ctid :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk true
} :expr { VAR :varno 65000 :varattno 16 :vartype 27 :vartypmod -1
:varlevelsup 0 :varnoold 4 :varoattno -1}}) :qpqual <> :lefttree {
SEQSCAN :startup_cost 0.00 :total_cost 1.43 :rows 1 :width 12
:qptargetlist ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 1184
:restypmod -1 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0
:resjunk false } :expr { VAR :varno 1 :varattno 3 :vartype 1184
:vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 3}} {
TARGETENTRY :resdom { RESDOM :resno 2 :restype 23 :restypmod -1
:resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false }
:expr { VAR :varno 1 :varattno 2 :vartype 23 :vartypmod -1
:varlevelsup 0 :varnoold 1 :varoattno 2}}) :qpqual ({ EXPR :typeOid
16  :opType op :oper { OPER :opno 1054 :opid 1048 :opresulttype 16 }
:args ({ VAR :varno 1 :varattno 1 :vartype 1042 :vartypmod 44
:varlevelsup 0 :varnoold 1 :varoattno 1} { CONST :consttype 1042
:constlen -1 :constbyval false :constisnull false :constvalue  44 [
44 0 0 0 121 115 33 97 111 85 67 51 107 65 70 50 85 74 52 54 118 122
54 112 83 73 122 49 85 112 115 102 117 76 112 115 55 49 89 119 97 84
43 80 ] })}) :lefttree <> :righttree <> :extprm () :locprm ()
:initplan <> :nprm 0  :scanrelid 1 } :righttree { INDEXSCAN
:startup_cost 0.00 :total_cost 33.36 :rows 35 :width 133
:qptargetlist ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 23
:restypmod -1 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0
:resjunk false } :expr { VAR :varno 4 :varattno 1 :vartype 23
:vartypmod -1  :varlevelsup 0 :varnoold 4 :varoattno 1}} {
TARGETENTRY :resdom { RESDOM :resno 2 :restype 1042 :restypmod 34
:resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false }
:expr { VAR :varno 4 :varattno 2 :vartype 1042 :vartypmod 34
:varlevelsup 0 :varnoold 4 :varoattno 2}} { TARGETENTRY :resdom {
RESDOM :resno 3 :restype 1042 :restypmod 14 :resname <> :reskey 0
:reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 4
:varattno 3 :vartype 1042 :vartypmod 14  :varlevelsup 0 :varnoold 4
:varoattno 3}} { TARGETENTRY :resdom { RESDOM :resno 4 :restype 1042
:restypmod 84 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0
:resjunk false } :expr { VAR :varno 4 :varattno 4 :vartype 1042
:vartypmod 84  :varlevelsup 0 :varnoold 4 :varoattno 4}} {
TARGETENTRY :resdom { RESDOM :resno 5 :restype 1043 :restypmod 132
:resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false }
:expr { VAR :varno 4 :varattno 5 :vartype 1043 :vartypmod 132
:varlevelsup 0 :varnoold 4 :varoattno 5}} { TARGETENTRY :resdom {
RESDOM :resno 6 :restype 1043 :restypmod 84 :resname <> :reskey 0
:reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 4
:varattno 6 :vartype 1043 :vartypmod 84  :varlevelsup 0 :varnoold 4
:varoattno 6}} { TARGETENTRY :resdom { RESDOM :resno 7 :restype 1043
:restypmod 132 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0
:resjunk false } :expr { VAR :varno 4 :varattno 7 :vartype 1043
:vartypmod 132  :varlevelsup 0 :varnoold 4 :varoattno 7}} {
TARGETENTRY :resdom { RESDOM :resno 8 :restype 1184 :restypmod -1
:resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false }
:expr { VAR :varno 4 :varattno 9 :vartype 1184 :vartypmod -1
:varlevelsup 0 :varnoold 4 :varoattno 9}} { TARGETENTRY :resdom {
RESDOM :resno 9 :restype 21 :restypmod -1 :resname <> :reskey 0
:reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 4
:varattno 10 :vartype 21 :vartypmod -1  :varlevelsup 0 :varnoold 4
:varoattno 10}} { TARGETENTRY :resdom { RESDOM :resno 10 :restype
1184 :restypmod -1 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref
0 :resjunk false } :expr { VAR :varno 4 :varattno 11 :vartype 1184
:vartypmod -1  :varlevelsup 0 :varnoold 4 :varoattno 11}} {
TARGETENTRY :resdom { RESDOM :resno 11 :restype 1042 :restypmod 17
:resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false }
:expr { VAR :varno 4 :varattno 12 :vartype 1042 :vartypmod 17
:varlevelsup 0 :varnoold 4 :varoattno 12}} { TARGETENTRY :resdom {
RESDOM :resno 12 :restype 23 :restypmod -1 :resname <> :reskey 0
:reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 4
:varattno 13 :vartype 23 :vartypmod -1  :varlevelsup 0 :varnoold 4
:varoattno 13}} { TARGETENTRY :resdom { RESDOM :resno 13 :restype
1042 :restypmod 19 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref
0 :resjunk false } :expr { VAR :varno 4 :varattno 14 :vartype 1042
:vartypmod 19  :varlevelsup 0 :varnoold 4 :varoattno 14}} {
TARGETENTRY :resdom { RESDOM :resno 14 :restype 16 :restypmod -1
:resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false }
:expr { VAR :varno 4 :varattno 15 :vartype 16 :vartypmod -1
:varlevelsup 0 :varnoold 4 :varoattno 15}} { TARGETENTRY :resdom {
RESDOM :resno 15 :restype 23 :restypmod -1 :resname <> :reskey 0
:reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { VAR :varno 4
:varattno 16 :vartype 23 :vartypmod -1  :varlevelsup 0 :varnoold 4
:varoattno 16}} { TARGETENTRY :resdom { RESDOM :resno 16 :restype 27
:restypmod -1 :resname <> :reskey 0 :reskeyop 0 :ressortgroupref 0
:resjunk false } :expr { VAR :varno 4 :varattno -1 :vartype 27
:vartypmod -1  :varlevelsup 0 :varnoold 4 :varoattno -1}}) :qpqual <>
:lefttree <> :righttree <> :extprm () :locprm () :initplan <> :nprm 0
:scanrelid 4 :indxid ( 183804) :indxqual (({ EXPR :typeOid 16
:opType op :oper { OPER :opno 96 :opid 65 :opresulttype 16 } :args ({
VAR :varno 4 :varattno 1 :vartype 23 :vartypmod -1  :varlevelsup 0
:varnoold 4 :varoattno 1} { VAR :varno 65001 :varattno 2 :vartype 23
:vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 2})}))
:indxqualorig (({ EXPR :typeOid 16  :opType op :oper { OPER :opno 96
:opid 65 :opresulttype 16 } :args ({ VAR :varno 65001 :varattno 2
:vartype 23 :vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno 2} {
VAR :varno 4 :varattno 1 :vartype 23 :vartypmod -1  :varlevelsup 0
:varnoold 4 :varoattno 1})})) :indxorderdir 0 } :extprm () :locprm ()
:initplan <> :nprm 0  :jointype 0 :joinqual <>}
NOTICE:  QUERY PLAN:

Nested Loop  (cost=0.00..35.23 rows=35 width=145)
   ->  Seq Scan on session  (cost=0.00..1.43 rows=1 width=12)
   ->  Index Scan using sis_user_pkey on sis_user  (cost=0.00..33.36
rows=35 width=133)

NOTICE:  QUERY DUMP:

{ SEQSCAN :startup_cost 0.00 :total_cost 1.43 :rows 1 :width 6
:qptargetlist ({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 27
:restypmod -1 :resname ctid :reskey 0 :reskeyop 0 :ressortgroupref 0
:resjunk true } :expr { VAR :varno 1 :varattno -1 :vartype 27
:vartypmod -1  :varlevelsup 0 :varnoold 1 :varoattno -1}}) :qpqual ({
EXPR :typeOid 16  :opType op :oper { OPER :opno 1054 :opid 1048
:opresulttype 16 } :args ({ VAR :varno 1 :varattno 1 :vartype 1042
:vartypmod 44  :varlevelsup 0 :varnoold 1 :varoattno 1} { CONST
:consttype 1042 :constlen -1 :constbyval false :constisnull false
:constvalue  44 [ 44 0 0 0 121 115 33 97 111 85 67 51 107 65 70 50 85
74 52 54 118 122 54 112 83 73 122 49 85 112 115 102 117 76 112 115 55
49 89 119 97 84 43 80 ] })}) :lefttree <> :righttree <> :extprm ()
:locprm () :initplan <> :nprm 0  :scanrelid 1 }
NOTICE:  QUERY PLAN:

Seq Scan on session  (cost=0.00..1.43 rows=1 width=6)

EXPLAIN



At 10:53 PM -0500 11/11/01, Tom Lane wrote:
>Warren Volz <wrv@po.cwru.edu> writes:
>>  p debug_query_string output:
>
>>  $1 = 0x820caa0 "DELETE FROM session WHERE session_key =
>>  'bRUjmnqcb5prUNRE.Bb0sq+fZB69dkdVlCAKNBsb'"
>
>BTW, do you have rules or triggers on that table?  The backtrace shows
>the system executing a nestloop join, which implies at least a two-table
>query, which says that there's more going on than meets the eye here.
>It would be useful to see the full schema for this table and any related
>tables, as well as an EXPLAIN VERBOSE for the above query.
>
>            regards, tom lane

Re: Bug #514: Backend crashes periodically

От
Tom Lane
Дата:
Okay, I've extracted a reproducible test case from Warren's info:

Setup:

CREATE TABLE sis_user (
sis_user_id             INTEGER PRIMARY KEY,
last_visit              DATETIME NOT NULL DEFAULT TEXT 'now');

CREATE TABLE session (
session_key             CHAR(40) PRIMARY KEY,
sis_user_id             INTEGER NOT NULL REFERENCES sis_user(sis_user_id),
last_access_time        DATETIME NOT NULL DEFAULT TEXT 'now');

CREATE RULE session_del AS ON DELETE TO session DO
UPDATE sis_user SET last_visit = OLD.last_access_time
WHERE OLD.sis_user_id = sis_user.sis_user_id;

insert into sis_user values(1);

In session 1, do:

insert into session values('zzz', 1);
begin;
delete from session where  session_key  = 'zzz';

In session 2, do:

delete from session where  session_key  = 'zzz';

Back to session 1:

end;

Session 2 crashes.  Crash happens in current sources as well as 7.1.

The crash does NOT happen if the foreign key reference is removed from
session.sis_user_id.  I'm not sure at this point whether the rule is
a crucial element or not.  The problem appears to be generation of
an invalid plan tree: the Var being decoded has varno = OUTER, which
offhand I believe it should not have in an indexscan's qual tree.

I'm tired and am going to bed soon, but perhaps someone else would
like to poke at this before I get back to it...

            regards, tom lane

Re: Bug #514: Backend crashes periodically

От
Tom Lane
Дата:
I believe I have found and fixed the crash-in-EvalPlanQual problem you
two reported.  The patch for current sources is attached --- it should
work in 7.1.* too, although line numbers might be off a bit.

            regards, tom lane


*** src/backend/executor/nodeIndexscan.c.orig    Mon Oct 29 15:30:53 2001
--- src/backend/executor/nodeIndexscan.c    Mon Nov 12 12:18:06 2001
***************
*** 339,348 ****
      {
          /*
           * If we are being passed an outer tuple, save it for runtime key
!          * calc
           */
          if (exprCtxt != NULL)
              econtext->ecxt_outertuple = exprCtxt->ecxt_outertuple;

          /*
           * Reset the runtime-key context so we don't leak memory as each
--- 339,355 ----
      {
          /*
           * If we are being passed an outer tuple, save it for runtime key
!          * calc.  We also need to link it into the "regular" per-tuple
!          * econtext, so it can be used during indexqualorig evaluations.
           */
          if (exprCtxt != NULL)
+         {
+             ExprContext *stdecontext;
+
              econtext->ecxt_outertuple = exprCtxt->ecxt_outertuple;
+             stdecontext = node->scan.scanstate->cstate.cs_ExprContext;
+             stdecontext->ecxt_outertuple = exprCtxt->ecxt_outertuple;
+         }

          /*
           * Reset the runtime-key context so we don't leak memory as each

Re: Bug #514: Backend crashes periodically

От
Tom Lane
Дата:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> BTW I have a question about this example.
> Session 2 deletes no row because *session 1* already
> deleted the rows but the session_del rule seems to
> be invoked. Is it a right behavior ?

Certainly the rule should be invoked --- whether it finds any rows
cannot be determined until it's invoked, anyhow.  Remember a rule
is not a trigger, it is a transform applied to the query before
we begin execution.

            regards, tom lane

Re: Bug #514: Backend crashes periodically

От
Hiroshi Inoue
Дата:
Tom Lane wrote:
>
> Okay, I've extracted a reproducible test case from Warren's info:
>
> Setup:
>
> CREATE TABLE sis_user (
> sis_user_id             INTEGER PRIMARY KEY,
> last_visit              DATETIME NOT NULL DEFAULT TEXT 'now');
>
> CREATE TABLE session (
> session_key             CHAR(40) PRIMARY KEY,
> sis_user_id             INTEGER NOT NULL REFERENCES sis_user(sis_user_id),
> last_access_time        DATETIME NOT NULL DEFAULT TEXT 'now');
>
> CREATE RULE session_del AS ON DELETE TO session DO
> UPDATE sis_user SET last_visit = OLD.last_access_time
> WHERE OLD.sis_user_id = sis_user.sis_user_id;
>
> insert into sis_user values(1);
>
> In session 1, do:
>
> insert into session values('zzz', 1);
> begin;
> delete from session where  session_key  = 'zzz';
>
> In session 2, do:
>
> delete from session where  session_key  = 'zzz';
>
> Back to session 1:
>
> end;

Thanks for your quick fix as usual.
BTW I have a question about this example.
Session 2 deletes no row because *session 1* already
deleted the rows but the session_del rule seems to
be invoked. Is it a right behavior ?

regards,
Hiroshi Inoue

Re: Bug #514: Backend crashes periodically

От
Warren Volz
Дата:
Tom,

Thanks for figuring this out. I applied the patch and the test case
no longer causes the backend to crash. I will let you know if I have
more problems with this bug.

-Warren

At 12:23 PM -0500 11/12/01, Tom Lane wrote:
>I believe I have found and fixed the crash-in-EvalPlanQual problem you
>two reported.  The patch for current sources is attached --- it should
>work in 7.1.* too, although line numbers might be off a bit.

Re: Bug #514: Backend crashes periodically

От
Hiroshi Inoue
Дата:
Tom Lane wrote:
>
> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > BTW I have a question about this example.
> > Session 2 deletes no row because *session 1* already
> > deleted the rows but the session_del rule seems to
> > be invoked. Is it a right behavior ?
>
> Certainly the rule should be invoked --- whether it finds any rows
> cannot be determined until it's invoked, anyhow.  Remember a rule
> is not a trigger, it is a transform applied to the query before
> we begin execution.

If it's right, my way is to not use our rule system.

regards,
Hiroshi Inoue

Re: Bug #514: Backend crashes periodically

От
Hiroshi Inoue
Дата:
Tom Lane wrote:
>
> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > BTW I have a question about this example.
> > Session 2 deletes no row because *session 1* already
> > deleted the rows but the session_del rule seems to
> > be invoked. Is it a right behavior ?
>
> Certainly the rule should be invoked --- whether it finds any rows
> cannot be determined until it's invoked, anyhow.  Remember a rule
> is not a trigger, it is a transform applied to the query before
> we begin execution.

If it's
>
>                         regards, tom lane

Re: Bug #514: Backend crashes periodically

От
Hiroshi Inoue
Дата:
Tom Lane wrote:
>
> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > BTW I have a question about this example.
> > Session 2 deletes no row because *session 1* already
> > deleted the rows but the session_del rule seems to
> > be invoked. Is it a right behavior ?
>
> Certainly the rule should be invoked --- whether it finds any rows
> cannot be determined until it's invoked, anyhow.  Remember a rule
> is not a trigger, it is a transform applied to the query before
> we begin execution.

Hmm I may have misunderstood your explanation.
What I meant is
  Session 2 invokes the session_del rule and really
  updates a sis_user row by the rule though it deletes
  no session row.

regards,
Hiroshi Inoue

Re: Bug #514: Backend crashes periodically

От
Tom Lane
Дата:
Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> What I meant is
>   Session 2 invokes the session_del rule and really
>   updates a sis_user row by the rule though it deletes
>   no session row.

Hmmm ... that's an ugly thought, isn't it?  And I'm not sure there's
anything we can do to defend against it.  If both sessions are executing
the UPDATE at the same time, then neither can possibly know that the
other is about to do a DELETE.  So the UPDATE will happen twice, which
is harmless in the given scenario but would be decidedly not so if the
UPDATE were changing some sort of total or balance.

Perhaps a safer way to write such a rule would be

... ON DELETE DO

    SELECT * FROM target_table WHERE key = OLD.key FOR UPDATE;

    UPDATE other_table SET balance = balance + OLD.quantity
    WHERE something-or-other;

I haven't experimented with this but it seems that the FOR UPDATE
ought to provide the necessary interlock to ensure that only one
transaction does the UPDATE for a particular target row.

If this does work, is there a way to make the FOR UPDATE lock happen
implicitly in the rule mechanism?  Should we even try?  It's a lot
of overhead that may not always be needed.

            regards, tom lane

Re: Bug #514: Backend crashes periodically

От
"Hiroshi Inoue"
Дата:
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
>
> Hiroshi Inoue <Inoue@tpf.co.jp> writes:
> > What I meant is
> >   Session 2 invokes the session_del rule and really
> >   updates a sis_user row by the rule though it deletes
> >   no session row.
>
> Hmmm ... that's an ugly thought, isn't it?  And I'm not sure there's
> anything we can do to defend against it.  If both sessions are executing
> the UPDATE at the same time, then neither can possibly know that the
> other is about to do a DELETE.  So the UPDATE will happen twice, which
> is harmless in the given scenario but would be decidedly not so if the
> UPDATE were changing some sort of total or balance.

Yes it seems a pretty serious problem.
The problem is that session 2 sees a not yet deleted( by session 1)
session row and an already updated( by session 1) sis_user row at
the same time. There's no such snapshot that could see both rows.

How do you think about the following query ?

  update sis_user set last_visit = session.last_access_time
  where sis_user_id = session.sis_user_id
  and session.session_key = 'zzz';

UPDATE acquires row level locks on the target sis_user rows
but doesn't acuiqre any row level lock on the related session
rows. Could it guarantee the consistency of the query ?

regards,
Hiroshi Inoue

Re: Bug #514: Backend crashes periodically

От
Tom Lane
Дата:
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> The problem is that session 2 sees a not yet deleted( by session 1)
> session row and an already updated( by session 1) sis_user row at
> the same time. There's no such snapshot that could see both rows.

Running in SERIALIZABLE mode would guarantee consistency, no?
(Session 2 would be forced to roll back and try again, and when
it tried again it would see the session row already gone.)

I doubt that there is any lesser answer that guarantees consistency
across multiple queries.

            regards, tom lane

Re: Bug #514: Backend crashes periodically

От
Hiroshi Inoue
Дата:
Tom Lane wrote:
>
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > The problem is that session 2 sees a not yet deleted( by session 1)
> > session row and an already updated( by session 1) sis_user row at
> > the same time. There's no such snapshot that could see both rows.
>
> Running in SERIALIZABLE mode would guarantee consistency, no?
> (Session 2 would be forced to roll back and try again, and when
> it tried again it would see the session row already gone.)

Yes but there would be no one who uses SERIALIZABLE mode
because of such reason. In READ COMMITTED mode, PG system
couldn't abort easily and it makes the implementation of
READ COMMITTED mode much harder than that of SERIALIZABLE.
It's impossible to implement READ COMMITTED mode without
changing snapshots appropriately and acquire appropriate
locks carefully.

regards,
Hiroshi Inoue