Обсуждение: BUG #15677: Crash while deleting from partitioned table

Поиск
Список
Период
Сортировка

BUG #15677: Crash while deleting from partitioned table

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      15677
Logged by:          Norbert Benkocs
Email address:      infernorb@gmail.com
PostgreSQL version: 11.2
Operating system:   CentOS Linux release 7.4.1708 (Core)
Description:

Version: PostgreSQL 11.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
20150623 (Red Hat 4.8.5-36), 64-bit
OS: CentOS Linux release 7.4.1708 (Core) 

Hello,

We have an insert/update/delete query on a partitioned table (multiple
CTE-s) that causes our PostgreSQL server to crash once every few days. We
haven't been able to reproduce this crash so far, and re-running the same
query with the same parameters didn't result in a crash either. The table in
question is updated thousands of times each hour, and most of these work
fine.
Previously this table was not partitioned, we started seeing the crash after
partitioning the table.

DDL:

create table rates.room_daily_rates (
    "ratePlanId" integer not null,
    "roomTypeId" integer not null,
    date date not null,
    duration smallint not null,
    guests smallint not null,
    rate bigint not null,
    "fixedRate" bigint,
    constraint room_daily_rates_pkey primary key ("roomTypeId",
"ratePlanId", date, guests, duration),
    constraint rate_sanity check (rate > 0 and "fixedRate" >= 0),
    constraint guests_sanity check (guests > 0),
    constraint duration_sanity check (duration >= 0),
    constraint room_daily_rates_rate_fkey foreign key ("roomTypeId",
"ratePlanId") 
        references rates.room_rate_plans on update restrict on delete cascade
) partition by range (date);


create table rates.room_daily_rates_2014
    partition of rates.room_daily_rates
        for values from ('2014-01-01') to ('2015-01-01');

create table rates.room_daily_rates_2015
    partition of rates.room_daily_rates
        for values from ('2015-01-01') to ('2016-01-01');

[ ... repeated for each year up to 2023 ... ]

The query that caused the crash:

with in_rows (requested,
"roomTypeId","ratePlanId","duration","guests","date","rate","fixedRate") as
(
    values
(true,$1::integer,$2::integer,$3::integer,$4::integer,$5::"date",$6::"int8",$7::"int8")
),
candidates as (
    select t.*,
           case
               when found and not requested then 'Delete'
               when not found and requested then 'Add'
               when "rateNew" is distinct from "rateOriginal" or "fixedRateNew" is
distinct from "fixedRateOriginal" then 'Update'
               else null
           end as kind
    from (
        select  coalesce(in_rows."roomTypeId", db."roomTypeId") as
"roomTypeId",coalesce(in_rows."ratePlanId", db."ratePlanId") as
"ratePlanId",coalesce(in_rows."date", db."date") as
"date",coalesce(in_rows."duration", db."duration") as
"duration",coalesce(in_rows."guests", db."guests") as
"guests",in_rows.requested is not null as requested,db."roomTypeId" is not
null as found,db."rate" as "rateOriginal",in_rows."rate" as
"rateNew",db."fixedRate" as "fixedRateOriginal",in_rows."fixedRate" as
"fixedRateNew"
        from   in_rows
        full outer join (select * from rates.room_daily_rates db where
(db."roomTypeId", db."ratePlanId", db."date", db."duration", db."guests") in
(select * from (values ($1::integer, $2::integer, $5::"date", $3::integer,
$4::integer)) t)) db on (in_rows."roomTypeId" = db."roomTypeId" and
in_rows."ratePlanId" = db."ratePlanId" and in_rows."date" = db."date" and
in_rows."duration" = db."duration" and in_rows."guests" = db."guests")
    ) t
),
inserts as (
    insert into rates.room_daily_rates
        ("roomTypeId","ratePlanId","date","duration","guests","rate","fixedRate")
    select

in_rows."roomTypeId",in_rows."ratePlanId",in_rows."date",in_rows."duration",in_rows."guests",in_rows."rateNew",in_rows."fixedRateNew"
    from   candidates in_rows
    where  in_rows.kind = 'Add'
    returning  rates.room_daily_rates.*
),
updates as (
    update rates.room_daily_rates db
    set    "rate" = in_rows."rateNew","fixedRate" = in_rows."fixedRateNew"
    from   candidates in_rows
    where  in_rows.kind = 'Update' and
           in_rows."roomTypeId" = db."roomTypeId" and in_rows."ratePlanId" =
db."ratePlanId" and in_rows."date" = db."date" and in_rows."duration" =
db."duration" and in_rows."guests" = db."guests" and (db."roomTypeId",
db."ratePlanId", db."date", db."duration", db."guests") in (select * from
(values ($1::integer, $2::integer, $5::"date", $3::integer, $4::integer))
t)
    returning  db.*
),
deletes as (
    delete from rates.room_daily_rates db
    using  candidates in_rows
    where  in_rows.kind = 'Delete' and
           in_rows."roomTypeId" = db."roomTypeId" and in_rows."ratePlanId" =
db."ratePlanId" and in_rows."date" = db."date" and in_rows."duration" =
db."duration" and in_rows."guests" = db."guests" and (db."roomTypeId",
db."ratePlanId", db."date", db."duration", db."guests") in (select * from
(values ($1::integer, $2::integer, $5::"date", $3::integer, $4::integer))
t)
    returning  db.*
),
deletion_logger as (
    insert into rates.room_daily_rates_clog ("changeId", action, "roomTypeId",
"ratePlanId", "duration", "guests", "date", "rate", "fixedRate")
    select $8, 'Delete', changes."roomTypeId", changes."ratePlanId",
changes."duration", changes."guests", changes."date", changes."rate",
changes."fixedRate"
    from   deletes as changes
),
insert_logger as (
    insert into rates.room_daily_rates_clog ("changeId", action, "roomTypeId",
"ratePlanId", "date", "duration", "guests" )
    select $8, 'Insert', changes."roomTypeId", changes."ratePlanId",
changes."date", changes."duration", changes."guests" 
    from   inserts as changes
),
update_logger as (
    insert into rates.room_daily_rates_clog ("changeId", action, "roomTypeId",
"ratePlanId", "date", "duration", "guests", "rate", "fixedRate")
    select $8, 'Update', changes."roomTypeId", changes."ratePlanId",
changes."date", changes."duration", changes."guests", case when
changes."rateOriginal" is distinct from changes."rateNew" then
changes."rateOriginal" else null end, case when changes."fixedRateOriginal"
is distinct from changes."fixedRateNew" then changes."fixedRateOriginal"
else null end
    from   candidates as changes
    where  changes.kind = 'Update'
)
select (select count(*) from inserts) + (select count(*) from deletes) +
(select count(*) from updates)


Server log:

2019-03-07 16:16:08.831 CET [22673] LOG:  server process (PID 49730) was
terminated by signal 11: Segmentation fault
2019-03-07 16:16:08.831 CET [22673] DETAIL:  Failed process was running:
        with in_rows (requested,
"roomTypeId","ratePlanId","duration","guests","date","rate","fixedRate") as
(
                        values
(true,$1::integer,$2::integer,$3::integer,$4::integer,$5::"date",$6::"int8",$7::"int8")
                    ),
                    candidates as (
                        select t.*,
                               case
                                   when found and not requested then
'Delete'
                                   when not found and requested then 'Add'
                                   when "rateNew" is distinct from
"rateOriginal" or "fixedRateNew" is distinct from "fixedRateOriginal" then
'Update'
                                   else null
                               end as kind
                        from (
                            select  coalesce(in_rows."roomTypeId",
db."roomTypeId") as "roomTypeId",coalesce(in_rows."ratePlanId",
db."ratePlanId") as "ratePlanId",coalesce(in_rows."date", db."date") as
"date",coalesce(in_rows."duration", db."duration") as
"duration",coalesce(in_rows."guests", db."guests") as "guests
2019-03-07 16:16:08.831 CET [22673] LOG:  terminating any other active
server processes
2019-03-07 16:16:08.831 CET [56666] WARNING:  terminating connection because
of crash of another server process
2019-03-07 16:16:08.831 CET [56666] DETAIL:  The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2019-03-07 16:16:08.831 CET [56666] HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
[ ... The above 3 lines repeated many times ... ]
2019-03-07 16:16:09.044 CET [56807] FATAL:  the database system is in
recovery mode
[ ... repeated many times ... ]
2019-03-07 16:16:10.817 CET [22673] LOG:  all server processes terminated;
reinitializing
2019-03-07 16:16:12.504 CET [57223] LOG:  database system was interrupted;
last known up at 2019-03-07 16:15:27 CET
2019-03-07 16:16:13.043 CET [57223] LOG:  database system was not properly
shut down; automatic recovery in progress
2019-03-07 16:16:13.167 CET [57223] LOG:  redo starts at 399B/ECCF9450
2019-03-07 16:16:16.680 CET [57223] LOG:  invalid record length at
399B/FCF3F2B8: wanted 24, got 0
2019-03-07 16:16:16.680 CET [57223] LOG:  redo done at 399B/FCF3F290
2019-03-07 16:16:16.680 CET [57223] LOG:  last completed transaction was at
log time 2019-03-07 16:16:08.826429+01
2019-03-07 16:16:16.685 CET [57223] LOG:  checkpoint starting:
end-of-recovery immediate
2019-03-07 16:16:18.437 CET [57223] LOG:  checkpoint complete: wrote 39408
buffers (5.0%); 0 WAL file(s) added, 2 removed, 0 recycled; write=1.728 s,
sync=0.014 s, total=1.754 s; sync files=836, longest=0.010 s, average=0.000
s; distance=264471 kB, estimate=264471 kB
2019-03-07 16:16:18.491 CET [22673] LOG:  database system is ready to accept
connections



Backtrace from the latest crash:

(gdb) bt
#0  ExecInitModifyTable (node=node@entry=0x2568180,
estate=estate@entry=0x35f1440, eflags=eflags@entry=0) at
nodeModifyTable.c:2327
#1  0x000000000060af88 in ExecInitNode (node=0x2568180,
estate=estate@entry=0x35f1440, eflags=eflags@entry=0) at
execProcnode.c:174
#2  0x0000000000606fdd in EvalPlanQualStart (epqstate=0x3773848,
epqstate=0x3773848, planTree=0x36c3f08, parentestate=0xa6) at
execMain.c:3257
#3  EvalPlanQualBegin (epqstate=epqstate@entry=0x3773848,
parentestate=parentestate@entry=0x3698450) at execMain.c:3066
#4  0x00000000006070c1 in EvalPlanQual (estate=estate@entry=0x3698450,
epqstate=epqstate@entry=0x3773848, relation=relation@entry=0x7efa6640b040,
rti=38, lockmode=<optimized out>, tid=tid@entry=0x7ffcede66e80,
priorXmax=4181341875) at execMain.c:2536
#5  0x0000000000626b7e in ExecUpdate (mtstate=mtstate@entry=0x3773788,
tupleid=0x7ffcede66f60, oldtuple=0x0, slot=0x230b768,
planSlot=planSlot@entry=0x289ffa8, epqstate=epqstate@entry=0x3773848,
estate=estate@entry=0x3698450, canSetTag=false) at nodeModifyTable.c:1267
#6  0x000000000062715e in ExecModifyTable (pstate=0x3773788) at
nodeModifyTable.c:2166
#7  0x000000000062b21d in ExecProcNode (node=0x3773788) at
../../../src/include/executor/executor.h:247
#8  CteScanNext (node=node@entry=0x337aa78) at nodeCtescan.c:103
#9  0x000000000060c2fa in ExecScanFetch (recheckMtd=0x62b120
<CteScanRecheck>, accessMtd=0x62b140 <CteScanNext>, node=0x337aa78) at
execScan.c:95
#10 ExecScan (node=0x337aa78, accessMtd=0x62b140 <CteScanNext>,
recheckMtd=0x62b120 <CteScanRecheck>) at execScan.c:145
#11 0x00000000006127b9 in ExecProcNode (node=0x337aa78) at
../../../src/include/executor/executor.h:247
#12 fetch_input_tuple (aggstate=aggstate@entry=0x337a640) at nodeAgg.c:406
#13 0x0000000000614230 in agg_retrieve_direct (aggstate=0x337a640) at
nodeAgg.c:1740
#14 ExecAgg (pstate=0x337a640) at nodeAgg.c:1555
#15 0x000000000062d1d1 in ExecProcNode (node=0x337a640) at
../../../src/include/executor/executor.h:247
#16 ExecSetParamPlan (node=<optimized out>,
econtext=econtext@entry=0x337d528) at nodeSubplan.c:1095
#17 0x00000000005fe246 in ExecEvalParamExec (state=state@entry=0x337d7a0,
op=op@entry=0x337e1a8, econtext=econtext@entry=0x337d528) at
execExprInterp.c:2247
#18 0x0000000000600f4e in ExecInterpExpr (state=0x337d7a0,
econtext=0x337d528, isnull=<optimized out>) at execExprInterp.c:991
#19 0x0000000000628fbb in ExecEvalExprSwitchContext (isNull=0x7ffcede67267,
econtext=0x337d528, state=0x337d7a0) at
../../../src/include/executor/executor.h:313
#20 ExecProject (projInfo=0x337d798) at
../../../src/include/executor/executor.h:347
#21 ExecResult (pstate=<optimized out>) at nodeResult.c:136
#22 0x000000000060401a in ExecProcNode (node=0x337d418) at
../../../src/include/executor/executor.h:247
#23 ExecutePlan (execute_once=<optimized out>, dest=0x2708fb8,
direction=<optimized out>, numberTuples=0, sendTuples=true,
operation=CMD_SELECT, use_parallel_mode=<optimized out>,
planstate=0x337d418, estate=0x3698450) at execMain.c:1723
#24 standard_ExecutorRun (queryDesc=0x20ccfb8, direction=<optimized out>,
count=0, execute_once=<optimized out>) at execMain.c:364
#25 0x00007efbf28ea08d in pgss_ExecutorRun (queryDesc=0x20ccfb8,
direction=ForwardScanDirection, count=0, execute_once=<optimized out>) at
pg_stat_statements.c:892
#26 0x000000000074669a in ProcessQuery (plan=<optimized out>,
sourceText=0x2857d18 ' ' <repeats 12 times>, "with in_rows (requested,
\"roomTypeId\",\"ratePlanId\",\"duration\",\"guests\",\"date\",\"rate\",\"f"...,
params=0x2708d80, queryEnv=0x0, dest=0x2708fb8, completionTag=0x7ffcede674b0
"") at pquery.c:161
#27 0x00000000007468d2 in PortalRunMulti (portal=portal@entry=0x1aed290,
isTopLevel=isTopLevel@entry=true,
setHoldSnapshot=setHoldSnapshot@entry=true, dest=dest@entry=0x2708fb8,
altdest=0xcef3a0 <donothingDR>,
completionTag=completionTag@entry=0x7ffcede674b0 "") at pquery.c:1286
#28 0x0000000000746b91 in FillPortalStore (portal=portal@entry=0x1aed290,
isTopLevel=isTopLevel@entry=true) at pquery.c:1033
#29 0x000000000074759f in PortalRun (portal=<optimized out>,
count=9223372036854775807, isTopLevel=<optimized out>, run_once=<optimized
out>, dest=0x1a4eaf0, altdest=0x1a4eaf0, completionTag=0x7ffcede678d0 "") at
pquery.c:768
#30 0x0000000000744e6d in PostgresMain (argc=<optimized out>,
argv=<optimized out>, dbname=<optimized out>, username=<optimized out>) at
postgres.c:2036
#31 0x00000000004803e9 in BackendRun (port=0x1a96eb0) at postmaster.c:4361
#32 BackendStartup (port=0x1a96eb0) at postmaster.c:4033
#33 ServerLoop () at postmaster.c:1706
#34 0x00000000006d7b99 in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x1a49280) at postmaster.c:1379
#35 0x000000000048124f in main (argc=3, argv=0x1a49280) at main.c:228

Dump of vars in the ExecInitModifyTable stack frame:

(gdb) info local
mtstate = 0x39c7e78
operation = CMD_DELETE
nplans = 15
saved_resultRelInfo = 0x0
resultRelInfo = <optimized out>
subplan = <optimized out>
l = <optimized out>
i = <optimized out>
rel = <optimized out>
update_tuple_routing_needed = false

(gdb) p *mtstate
$4 = {ps = {type = T_ModifyTableState, plan = 0x2568180, state = 0x35f1440,
ExecProcNode = 0x626e30 <ExecModifyTable>, ExecProcNodeReal = 0x0,
instrument = 0x0, worker_instrument = 0x0, worker_jit_instrument = 0x0, qual
= 0x0, lefttree = 0x0, righttree = 0x0, initPlan = 0x0, subPlan = 0x0,
chgParam = 0x0, ps_ResultTupleSlot = 0x0, ps_ExprContext = 0x0, 
    ps_ProjInfo = 0x0, scandesc = 0x0}, operation = CMD_DELETE, canSetTag =
false, mt_done = false, mt_plans = 0x39c8088, mt_nplans = 15, mt_whichplan =
0, resultRelInfo = 0x35f3f78, rootResultRelInfo = 0xc0, mt_arowmarks =
0x39c8118, mt_epqstate = {estate = 0x0, planstate = 0x0, origslot = 0x0,
plan = 0x0, arowMarks = 0x0, epqParam = 86}, 
  fireBSTriggers = true, mt_existing = 0x0, mt_excludedtlist = 0x0,
mt_conflproj = 0x0, mt_partition_tuple_routing = 0x0, mt_transition_capture
= 0x0, mt_oc_transition_capture = 0x0, mt_per_subplan_tupconv_maps = 0x0}

(gdb) p *mtstate->resultRelInfo
$2 = {type = T_ResultRelInfo, ri_RangeTableIndex = 96, ri_RelationDesc =
0x7efa664049c0, ri_NumIndices = 0, ri_IndexRelationDescs = 0x0,
ri_IndexRelationInfo = 0x0, ri_TrigDesc = 0x27a4328, ri_TrigFunctions =
0x27a4528, ri_TrigWhenExprs = 0x27a45b8, ri_TrigInstrument = 0x0,
ri_FdwRoutine = 0x0, ri_FdwState = 0x0, ri_usesFdwDirectModify = false, 
  ri_WithCheckOptions = 0x0, ri_WithCheckOptionExprs = 0x0,
ri_ConstraintExprs = 0x0, ri_junkFilter = 0x336fae0, ri_returningList =
0x25af130, ri_projectReturning = 0x336aa38, ri_onConflictArbiterIndexes =
0x0, ri_onConflict = 0x0, ri_PartitionCheck = 0x27a45d8,
ri_PartitionCheckExpr = 0x0, ri_PartitionRoot = 0x0,
ri_PartitionReadyForRouting = false}

(gdb) p *mtstate->resultRelInfo->ri_RelationDesc
$3 = {rd_node = {spcNode = 1663, dbNode = 20254, relNode = 458573263},
rd_smgr = 0x2098ee8, rd_refcnt = 8, rd_backend = -1, rd_islocaltemp = false,
rd_isnailed = false, rd_isvalid = true, rd_indexvalid = 1 '\001',
rd_statvalid = true, rd_createSubid = 0, rd_newRelfilenodeSubid = 0, rd_rel
= 0x7efa66404bd0, rd_att = 0x7efa66404ce0, rd_id = 458573263, 
  rd_lockInfo = {lockRelId = {relId = 458573263, dbId = 20254}}, rd_rules =
0x0, rd_rulescxt = 0x0, trigdesc = 0x7efa66405c40, rd_rsdesc = 0x0,
rd_fkeylist = 0x7efa66422940, rd_fkeyvalid = true, rd_partkeycxt = 0x0,
rd_partkey = 0x0, rd_pdcxt = 0x0, rd_partdesc = 0x0, rd_partcheck =
0x7efa6646f590, rd_indexlist = 0x7efa66417d60, rd_oidindex = 0, 
  rd_pkindex = 458573269, rd_replidindex = 458573269, rd_statlist = 0x0,
rd_indexattr = 0x0, rd_projindexattr = 0x0, rd_keyattr = 0x0, rd_pkattr =
0x0, rd_idattr = 0x0, rd_projidx = 0x0, rd_pubactions = 0x0, rd_options =
0x0, rd_index = 0x0, rd_indextuple = 0x0, rd_amhandler = 0, rd_indexcxt =
0x0, rd_amroutine = 0x0, rd_opfamily = 0x0, 
  rd_opcintype = 0x0, rd_support = 0x0, rd_supportinfo = 0x0, rd_indoption =
0x0, rd_indexprs = 0x0, rd_indpred = 0x0, rd_exclops = 0x0, rd_exclprocs =
0x0, rd_exclstrats = 0x0, rd_amcache = 0x0, rd_indcollation = 0x0,
rd_fdwroutine = 0x0, rd_toastoid = 0, pgstat_info = 0x1b37f18}

(gdb) p *mtstate->resultRelInfo->ri_RelationDesc->rd_rel
$1 = {relname = {data = "room_daily_rates_2014", '\000' <repeats 42 times>},
relnamespace = 20273, reltype = 458573265, reloftype = 0, relowner = 17696,
relam = 0, relfilenode = 458573263, reltablespace = 0, relpages = 29797,
reltuples = 4678117, relallvisible = 0, reltoastrelid = 0, relhasindex =
true, relisshared = false, relpersistence = 112 'p', 
  relkind = 114 'r', relnatts = 7, relchecks = 3, relhasoids = false,
relhasrules = false, relhastriggers = true, relhassubclass = false,
relrowsecurity = false, relforcerowsecurity = false, relispopulated = true,
relreplident = 100 'd', relispartition = true, relrewrite = 0, relfrozenxid
= 4167686063, relminmxid = 9744227}

(gdb) p *node
$6 = {plan = {type = T_ModifyTable, startup_cost = 0.52249999999999996,
total_cost = 6788616.1286984282, plan_rows = 15, plan_width = 94,
parallel_aware = false, parallel_safe = false, plan_node_id = 149,
targetlist = 0x25af130, qual = 0x0, lefttree = 0x0, righttree = 0x0,
initPlan = 0x0, extParam = 0x0, allParam = 0x0}, operation = CMD_DELETE, 
  canSetTag = false, nominalRelation = 92, partitioned_rels = 0x25688e8,
partColsUpdated = false, resultRelations = 0x2568938, resultRelIndex = 16,
rootResultRelIndex = 1, plans = 0x2568b28, withCheckOptionLists = 0x0,
returningLists = 0x25af0e0, fdwPrivLists = 0x25b5038, fdwDirectModifyPlans =
0x0, rowMarks = 0x25b5248, epqParam = 86, 
  onConflictAction = ONCONFLICT_NONE, arbiterIndexes = 0x0, onConflictSet =
0x0, onConflictWhere = 0x0, exclRelRTI = 91, exclRelTlist = 0x0}

(gdb) p *estate
$7 = {type = T_EState, es_direction = ForwardScanDirection, es_snapshot =
0x208ba70, es_crosscheck_snapshot = 0x0, es_range_table = 0x282af48,
es_plannedstmt = 0x2829e98, es_sourceText = 0x0, es_junkFilter = 0x0,
es_output_cid = 0, es_result_relations = 0x35f3378, es_num_result_relations
= 34, es_result_relation_info = 0x0, 
  es_root_result_relations = 0x0, es_num_root_result_relations = 0,
es_tuple_routing_result_relations = 0x0, es_trig_target_relations = 0x0,
es_trig_tuple_slot = 0x3814810, es_trig_oldtup_slot = 0x0,
es_trig_newtup_slot = 0x0, es_param_list_info = 0x2708d80,
es_param_exec_vals = 0x35f7388, es_queryEnv = 0x0, es_query_cxt = 0x35f1330,

  es_tupleTable = 0x35f96e8, es_rowMarks = 0x27f81b8, es_processed = 0,
es_lastoid = 0, es_top_eflags = 0, es_instrument = 0, es_finished = false,
es_exprcontexts = 0x35f3310, es_subplanstates = 0x35f97c8,
es_auxmodifytables = 0x38148e0, es_per_tuple_exprcontext = 0x0, es_epqTuple
= 0x35f6398, es_epqTupleSet = 0x35f9398, es_epqScanDone = 0x35f71c8, 
  es_use_parallel_mode = false, es_query_dsa = 0x0, es_jit_flags = 0, es_jit
= 0x0, es_jit_worker_instr = 0x0}


Re: BUG #15677: Crash while deleting from partitioned table

От
Amit Langote
Дата:
Hi,

On 2019/03/08 16:29, PG Bug reporting form wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      15677
> Logged by:          Norbert Benkocs
> Email address:      infernorb@gmail.com
> PostgreSQL version: 11.2
> Operating system:   CentOS Linux release 7.4.1708 (Core)
> Description:        
> 
> Version: PostgreSQL 11.2 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 4.8.5
> 20150623 (Red Hat 4.8.5-36), 64-bit
> OS: CentOS Linux release 7.4.1708 (Core) 
> 
> Hello,
> 
> We have an insert/update/delete query on a partitioned table (multiple
> CTE-s) that causes our PostgreSQL server to crash once every few days. We
> haven't been able to reproduce this crash so far, and re-running the same
> query with the same parameters didn't result in a crash either. The table in
> question is updated thousands of times each hour, and most of these work
> fine.
> Previously this table was not partitioned, we started seeing the crash after
> partitioning the table.

Thanks for the report and for providing detailed information which was
useful for diagnosing the bug.

I looked at this:

> (gdb) bt
> #0  ExecInitModifyTable (node=node@entry=0x2568180,
> estate=estate@entry=0x35f1440, eflags=eflags@entry=0) at
> nodeModifyTable.c:2327
> #1  0x000000000060af88 in ExecInitNode (node=0x2568180,
> estate=estate@entry=0x35f1440, eflags=eflags@entry=0) at
> execProcnode.c:174
> #2  0x0000000000606fdd in EvalPlanQualStart (epqstate=0x3773848,
> epqstate=0x3773848, planTree=0x36c3f08, parentestate=0xa6) at
> execMain.c:3257

note: ExecInitModifyTable() being called from EvalPlanQualStart().

and:

> (gdb) p *mtstate
> $4 = {ps = {type = T_ModifyTableState, plan = 0x2568180, state = 0x35f1440,
> ExecProcNode = 0x626e30 <ExecModifyTable>, ExecProcNodeReal = 0x0,
> instrument = 0x0, worker_instrument = 0x0, worker_jit_instrument = 0x0, qual
> = 0x0, lefttree = 0x0, righttree = 0x0, initPlan = 0x0, subPlan = 0x0,
> chgParam = 0x0, ps_ResultTupleSlot = 0x0, ps_ExprContext = 0x0, 
>     ps_ProjInfo = 0x0, scandesc = 0x0}, operation = CMD_DELETE, canSetTag =
> false, mt_done = false, mt_plans = 0x39c8088, mt_nplans = 15, mt_whichplan =
> 0, resultRelInfo = 0x35f3f78, rootResultRelInfo = 0xc0, mt_arowmarks =

note: rootResultRelInfo = 0xc0

and:

> (gdb) p *estate
> $7 = {type = T_EState, es_direction = ForwardScanDirection, es_snapshot =
> 0x208ba70, es_crosscheck_snapshot = 0x0, es_range_table = 0x282af48,
> es_plannedstmt = 0x2829e98, es_sourceText = 0x0, es_junkFilter = 0x0,
> es_output_cid = 0, es_result_relations = 0x35f3378, es_num_result_relations
> = 34, es_result_relation_info = 0x0, 
>   es_root_result_relations = 0x0, es_num_root_result_relations = 0,

note: es_root_result_relations = 0x0

From the above, I could conclude that EvalPlanQualStart() is not copying
the value of es_root_result_relations from the parent EState.  That means
ExecInitModifyTable called in the context of EvalPlanQual() checking has
the wrong value of es_root_result_relations to begin with, so the value it
computes for rootResultRelInfo for the ModifyTableState it's initializing
is wrong (0xc0 as seen above).

To reproduce, use these steps (needs 2 sessions to invoke EvalPlanQual at
all):

Setup:

create table p (a int) partition by list (a);
create table p1 partition of p for values in (1);
insert into p values (1);

Session 1:

begin;
update p set a = a;

Session 2:

with u as (update p set a = a returning p.*) update p set a = u.a from u;
<blocks>

Session 1:
commit;

Session 2:
<invokes-EvalPlanQual-and-crashes>
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.


This can be fixed by the attached patch, which modifies EvalPlanQualStart
to copy the value of es_root_result_relations from its parent EState.

Thanks,
Amit

Вложения

Re: BUG #15677: Crash while deleting from partitioned table

От
Amit Langote
Дата:
On 2019/03/11 12:02, Amit Langote wrote:
> From the above, I could conclude that EvalPlanQualStart() is not copying
> the value of es_root_result_relations from the parent EState.  That means
> ExecInitModifyTable called in the context of EvalPlanQual() checking has
> the wrong value of es_root_result_relations to begin with, so the value it
> computes for rootResultRelInfo for the ModifyTableState it's initializing
> is wrong (0xc0 as seen above).
> 
> To reproduce, use these steps (needs 2 sessions to invoke EvalPlanQual at
> all):
> 
> Setup:
> 
> create table p (a int) partition by list (a);
> create table p1 partition of p for values in (1);
> insert into p values (1);
> 
> Session 1:
> 
> begin;
> update p set a = a;
> 
> Session 2:
> 
> with u as (update p set a = a returning p.*) update p set a = u.a from u;
> <blocks>
> 
> Session 1:
> commit;
> 
> Session 2:
> <invokes-EvalPlanQual-and-crashes>
> server closed the connection unexpectedly
>     This probably means the server terminated abnormally
>     before or while processing the request.
> The connection to the server was lost. Attempting reset: Failed.
> 
> 
> This can be fixed by the attached patch, which modifies EvalPlanQualStart
> to copy the value of es_root_result_relations from its parent EState.

Forgot to mention that this bug exists in PG 10, 11, and HEAD.  The same
patch applies to all three branches.

Thanks,
Amit



Re: BUG #15677: Crash while deleting from partitioned table

От
Michael Paquier
Дата:
On Mon, Mar 11, 2019 at 12:55:05PM +0900, Amit Langote wrote:
> Forgot to mention that this bug exists in PG 10, 11, and HEAD.  The same
> patch applies to all three branches.

Could you add that to the future CF so as we don't forget?
--
Michael

Вложения

Re: BUG #15677: Crash while deleting from partitioned table

От
Amit Langote
Дата:
On 2019/03/11 13:55, Michael Paquier wrote:
> On Mon, Mar 11, 2019 at 12:55:05PM +0900, Amit Langote wrote:
>> Forgot to mention that this bug exists in PG 10, 11, and HEAD.  The same
>> patch applies to all three branches.
> 
> Could you add that to the future CF so as we don't forget?

Done:

* update/delete on partitioned tables may crash under concurrency *
https://commitfest.postgresql.org/23/2057/

Thanks,
Amit



Re: BUG #15677: Crash while deleting from partitioned table

От
Amit Langote
Дата:
On 2019/03/11 14:04, Amit Langote wrote:
> On 2019/03/11 13:55, Michael Paquier wrote:
>> On Mon, Mar 11, 2019 at 12:55:05PM +0900, Amit Langote wrote:
>>> Forgot to mention that this bug exists in PG 10, 11, and HEAD.  The same
>>> patch applies to all three branches.
>>
>> Could you add that to the future CF so as we don't forget?
> 
> Done:
> 
> * update/delete on partitioned tables may crash under concurrency *
> https://commitfest.postgresql.org/23/2057/

Thanks to cfbot [1], I came to know about a silly typo I had made in the
patch.

Attached updated patch.

Thanks,
Amit

[1] https://travis-ci.org/postgresql-cfbot/postgresql/builds/505157218

Вложения

Re: BUG #15677: Crash while deleting from partitioned table

От
Alvaro Herrera
Дата:
On 2019-Mar-11, Amit Langote wrote:

> To reproduce, use these steps (needs 2 sessions to invoke EvalPlanQual at
> all):
> 
> Setup:
> 
> create table p (a int) partition by list (a);
> create table p1 partition of p for values in (1);
> insert into p values (1);
> 
> Session 1:
> 
> begin;
> update p set a = a;
> 
> Session 2:
> 
> with u as (update p set a = a returning p.*) update p set a = u.a from u;
> <blocks>

Hmmm ... I rewrote to this isolationtester spec and doesn't reproduce a
problem in master.  I am probably missing something ...

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

Re: BUG #15677: Crash while deleting from partitioned table

От
Amit Langote
Дата:
On Sat, Jun 29, 2019 at 6:56 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> On 2019-Mar-11, Amit Langote wrote:
>
> > To reproduce, use these steps (needs 2 sessions to invoke EvalPlanQual at
> > all):
> >
> > Setup:
> >
> > create table p (a int) partition by list (a);
> > create table p1 partition of p for values in (1);
> > insert into p values (1);
> >
> > Session 1:
> >
> > begin;
> > update p set a = a;
> >
> > Session 2:
> >
> > with u as (update p set a = a returning p.*) update p set a = u.a from u;
> > <blocks>
>
> Hmmm ... I rewrote to this isolationtester spec and doesn't reproduce a
> problem in master.  I am probably missing something ...

This has been taken care of in
a8cb8f124679e0c373fdd07108b136e1cf1ee14a.  Actually, the discussion
leading to that commit occurred in a different thread; see here:

https://www.postgresql.org/message-id/af5e2f12-ccb0-04b4-2bea-6255dcce3c79%40lab.ntt.co.jp

The CF entry has been marked committed too:

https://commitfest.postgresql.org/23/2057/

Thanks,
Amit