FailedAssertion("pd_idx == pinfo->nparts", File: "execPartition.c", Line: 1689)

From: Justin Pryzby
Subject: FailedAssertion("pd_idx == pinfo->nparts", File: "execPartition.c", Line: 1689)
Msg-id: 20200802181131.GA27754@telsasoft.com
Replies: Re: FailedAssertion("pd_idx == pinfo->nparts", File: "execPartition.c", Line: 1689) (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
Core was generated by `postgres: telsasoft ts [local] BIND                                           '.

(gdb) bt
#0  0x00007f0951303387 in raise () from /lib64/libc.so.6
#1  0x00007f0951304a78 in abort () from /lib64/libc.so.6
#2  0x0000000000921005 in ExceptionalCondition (conditionName=conditionName@entry=0xa5db3d "pd_idx == pinfo->nparts", errorType=errorType@entry=0x977389 "FailedAssertion", fileName=fileName@entry=0xa5da88 "execPartition.c", lineNumber=lineNumber@entry=1689) at assert.c:67
#3  0x0000000000672806 in ExecCreatePartitionPruneState (planstate=planstate@entry=0x908f6d8, partitionpruneinfo=<optimized out>) at execPartition.c:1689
#4  0x000000000068444a in ExecInitAppend (node=node@entry=0x7036b90, estate=estate@entry=0x11563f0, eflags=eflags@entry=16) at nodeAppend.c:132
#5  0x00000000006731fd in ExecInitNode (node=0x7036b90, estate=estate@entry=0x11563f0, eflags=eflags@entry=16) at execProcnode.c:179
#6  0x000000000069d03a in ExecInitResult (node=node@entry=0x70363d8, estate=estate@entry=0x11563f0, eflags=eflags@entry=16) at nodeResult.c:210
#7  0x000000000067323c in ExecInitNode (node=0x70363d8, estate=estate@entry=0x11563f0, eflags=eflags@entry=16) at execProcnode.c:164
#8  0x000000000069e834 in ExecInitSort (node=node@entry=0x7035ca8, estate=estate@entry=0x11563f0, eflags=eflags@entry=16) at nodeSort.c:210
#9  0x0000000000672ff0 in ExecInitNode (node=0x7035ca8, estate=estate@entry=0x11563f0, eflags=eflags@entry=16) at execProcnode.c:313
#10 0x00000000006812e8 in ExecInitAgg (node=node@entry=0x68311d0, estate=estate@entry=0x11563f0, eflags=eflags@entry=16) at nodeAgg.c:3292
#11 0x0000000000672fb1 in ExecInitNode (node=0x68311d0, estate=estate@entry=0x11563f0, eflags=eflags@entry=16) at execProcnode.c:328
#12 0x000000000068925a in ExecInitGatherMerge (node=node@entry=0x6830998, estate=estate@entry=0x11563f0, eflags=eflags@entry=16) at nodeGatherMerge.c:110
#13 0x0000000000672f33 in ExecInitNode (node=0x6830998, estate=estate@entry=0x11563f0, eflags=eflags@entry=16) at execProcnode.c:348
#14 0x00000000006812e8 in ExecInitAgg (node=node@entry=0x682eda8, estate=estate@entry=0x11563f0, eflags=eflags@entry=16) at nodeAgg.c:3292
#15 0x0000000000672fb1 in ExecInitNode (node=node@entry=0x682eda8, estate=estate@entry=0x11563f0, eflags=eflags@entry=16) at execProcnode.c:328
#16 0x000000000066c8e6 in InitPlan (eflags=16, queryDesc=<optimized out>) at execMain.c:1020
#17 standard_ExecutorStart (queryDesc=<optimized out>, eflags=16) at execMain.c:266
#18 0x00007f0944ca83b5 in pgss_ExecutorStart (queryDesc=0x1239b08, eflags=<optimized out>) at pg_stat_statements.c:1007
#19 0x00007f09117e4891 in explain_ExecutorStart (queryDesc=0x1239b08, eflags=<optimized out>) at auto_explain.c:301
#20 0x00000000007f9983 in PortalStart (portal=0xeff810, params=0xfacc98, eflags=0, snapshot=0x0) at pquery.c:505
#21 0x00000000007f7370 in PostgresMain (argc=<optimized out>, argv=argv@entry=0xeb8500, dbname=0xeb84e0 "ts", username=<optimized out>) at postgres.c:1987
#22 0x000000000048916e in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4523
#23 BackendStartup (port=0xeb1000) at postmaster.c:4215
#24 ServerLoop () at postmaster.c:1727
#25 0x000000000076ec85 in PostmasterMain (argc=argc@entry=13, argv=argv@entry=0xe859b0) at postmaster.c:1400
#26 0x000000000048a82d in main (argc=13, argv=0xe859b0) at main.c:210

#3  0x0000000000672806 in ExecCreatePartitionPruneState (planstate=planstate@entry=0x908f6d8, partitionpruneinfo=<optimized out>) at execPartition.c:1689
        pd_idx = <optimized out>
        pp_idx = <optimized out>
        pprune = 0x908f910
        partdesc = 0x91937f8
        pinfo = 0x7d6ee78
        partrel = <optimized out>
        partkey = 0xfbba28
        lc2__state = {l = 0x7d6ee20, i = 0}
        partrelpruneinfos = 0x7d6ee20
        lc2 = <optimized out>
        npartrelpruneinfos = <optimized out>
        prunedata = 0x908f908
        j = 0
        lc__state = {l = 0x7d6edc8, i = 0}
        estate = 0x11563f0
        prunestate = 0x908f8b0
        n_part_hierarchies = <optimized out>
        lc = <optimized out>
        i = 0

(gdb) p *pinfo
$2 = {type = T_PartitionedRelPruneInfo, rtindex = 7, present_parts = 0x7d6ef10, nparts = 414, subplan_map = 0x7d6ef68, subpart_map = 0x7d6f780, relid_map = 0x7d6ff98, initial_pruning_steps = 0x7d707b0, exec_pruning_steps = 0x0, execparamids = 0x0}

(gdb) p pd_idx
$3 = <optimized out>

< 2020-08-02 02:04:17.358 SST  >LOG:  server process (PID 20954) was terminated by signal 6: Aborted
< 2020-08-02 02:04:17.358 SST  >DETAIL:  Failed process was running: 
                    INSERT INTO child.cdrs_data_users_per_cell_20200801 (...list of columns elided...)
                    (
                    SELECT ..., $3::timestamp, $2,
                MODE() WITHIN GROUP (ORDER BY ...) AS ..., STRING_AGG(DISTINCT ..., ',') AS ..., ...

This crashed at 2am, which at first made me think it might be due to today's
partition being created at the same time.

Aug  2 02:04:08 telsasoftsky abrt-hook-ccpp: Process 19264 (postgres) of user 26 killed by SIGABRT - dumping core
Aug  2 02:04:17 telsasoftsky abrt-hook-ccpp: Process 20954 (postgres) of user 26 killed by SIGABRT - ignoring (repeated crash)

Running:
postgresql13-server-13-beta2_1PGDG.rhel7.x86_64

Maybe this is a problem tickled by something new in v13.  However, this is a
new VM, and at the time of the crash I was running a shell loop around
pg_restore, in reverse-chronological order.  I have full logs, and they show
that a table the crashing process would have tried to SELECT FROM had just been
CREATEd:

| 2020-08-02 02:04:01.48-11  | duration: 106.275 ms  statement: CREATE TABLE child.cdrs_huawei_sgwrecord_2019_06_14 (

That table *currently* has:
|Number of partitions: 416 (Use \d+ to list them.)
And the oldest table is still child.cdrs_huawei_sgwrecord_2019_06_14 (the shell
loop probably spun quickly through hundreds of pg_restores, each failing to
connect because the database was "in recovery").  Today's partition had already
been created, at 2020-08-02 01:30:35, so I don't think creating it is what
triggered the crash.
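The plan-time prune info in the dump has nparts = 414, while the parent now reports 416 partitions. A simplified model may help show why a count mismatch alone need not trip the assertion, but a mismatch in the *contents* would. This sketch assumes, based only on the field names in `p *pinfo` (`nparts`, `relid_map`, `subplan_map`), that the executor walks the plan-time partition OIDs in step with the OIDs it sees at execution time. Under that assumption, it treats unmatched current partitions as newly attached and expects to consume all plan-time entries. The type and function names (`map_partitions`, `all_plan_entries_matched`) are hypothetical. This is not the execPartition.c code.

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-in for PostgreSQL's Oid type (hypothetical model, not server code). */
typedef unsigned int Oid;

/*
 * Hypothetical model of remapping plan-time partitions onto the partitions
 * seen at execution time.  Both OID arrays are assumed to be in the same
 * (bound) order.  Each current partition whose OID equals the next unmatched
 * plan-time OID is mapped to that plan-time index; any other current
 * partition is treated as newly attached and mapped to -1.
 *
 * Returns how many plan-time entries were consumed (the pd_idx analogue).
 */
static int
map_partitions(const Oid *plan_oids, int plan_n,
               const Oid *cur_oids, int cur_n, int *map_out)
{
    int plan_idx = 0;

    for (int cur_idx = 0; cur_idx < cur_n; cur_idx++)
    {
        if (plan_idx < plan_n && plan_oids[plan_idx] == cur_oids[cur_idx])
            map_out[cur_idx] = plan_idx++;  /* known at plan time */
        else
            map_out[cur_idx] = -1;          /* assumed to be newly attached */
    }
    return plan_idx;
}

/* The invariant the assertion checks: every plan-time entry was matched. */
static bool
all_plan_entries_matched(const Oid *plan_oids, int plan_n,
                         const Oid *cur_oids, int cur_n, int *map_out)
{
    return map_partitions(plan_oids, plan_n, cur_oids, cur_n, map_out) == plan_n;
}
```

Consider plan-time OIDs {100, 200, 300} and current OIDs {100, 150, 200, 300}. The walk maps 150 to -1 and consumes all three plan-time entries, so an added partition alone is handled. Now suppose a plan-time OID is absent from the current list (say {100, 999, 300}). The walk never advances past it, every later partition maps to -1, and only one entry is consumed. That is the kind of mismatch an assertion like `pd_idx == pinfo->nparts` would catch.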

Based on commit logs, I suspect this may be an "older bug", possibly related
to:

|commit 898e5e3290a72d288923260143930fb32036c00c
|Author: Robert Haas <rhaas@postgresql.org>
|Date:   Thu Mar 7 11:13:12 2019 -0500
|
|    Allow ATTACH PARTITION with only ShareUpdateExclusiveLock.
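This commit may be relevant because of PostgreSQL's documented table-level lock conflict rules. A query that reads a table holds AccessShareLock, which conflicts only with AccessExclusiveLock. An ATTACH PARTITION that holds only ShareUpdateExclusiveLock on the parent therefore neither waits for nor blocks a concurrent reader. That would seem to allow a partitioned table to gain partitions between planning and execution. The sketch below encodes the conflict table as bitmasks. The enum, macro, and function names are made up for illustration and are not PostgreSQL's internal names.

```c
#include <assert.h>
#include <stdbool.h>

/* Table-level lock modes, weakest to strongest (illustrative names only). */
typedef enum
{
    ACCESS_SHARE = 1,
    ROW_SHARE,
    ROW_EXCLUSIVE,
    SHARE_UPDATE_EXCLUSIVE,
    SHARE,
    SHARE_ROW_EXCLUSIVE,
    EXCLUSIVE,
    ACCESS_EXCLUSIVE
} LockModeModel;

#define BIT(m) (1u << (m))

/* conflicts[m] is the set of modes that conflict with mode m. */
static const unsigned conflicts[] = {
    [ACCESS_SHARE] = BIT(ACCESS_EXCLUSIVE),
    [ROW_SHARE] = BIT(EXCLUSIVE) | BIT(ACCESS_EXCLUSIVE),
    [ROW_EXCLUSIVE] = BIT(SHARE) | BIT(SHARE_ROW_EXCLUSIVE) | BIT(EXCLUSIVE)
        | BIT(ACCESS_EXCLUSIVE),
    [SHARE_UPDATE_EXCLUSIVE] = BIT(SHARE_UPDATE_EXCLUSIVE) | BIT(SHARE)
        | BIT(SHARE_ROW_EXCLUSIVE) | BIT(EXCLUSIVE) | BIT(ACCESS_EXCLUSIVE),
    [SHARE] = BIT(ROW_EXCLUSIVE) | BIT(SHARE_UPDATE_EXCLUSIVE)
        | BIT(SHARE_ROW_EXCLUSIVE) | BIT(EXCLUSIVE) | BIT(ACCESS_EXCLUSIVE),
    [SHARE_ROW_EXCLUSIVE] = BIT(ROW_EXCLUSIVE) | BIT(SHARE_UPDATE_EXCLUSIVE)
        | BIT(SHARE) | BIT(SHARE_ROW_EXCLUSIVE) | BIT(EXCLUSIVE)
        | BIT(ACCESS_EXCLUSIVE),
    [EXCLUSIVE] = BIT(ROW_SHARE) | BIT(ROW_EXCLUSIVE)
        | BIT(SHARE_UPDATE_EXCLUSIVE) | BIT(SHARE) | BIT(SHARE_ROW_EXCLUSIVE)
        | BIT(EXCLUSIVE) | BIT(ACCESS_EXCLUSIVE),
    [ACCESS_EXCLUSIVE] = BIT(ACCESS_SHARE) | BIT(ROW_SHARE) | BIT(ROW_EXCLUSIVE)
        | BIT(SHARE_UPDATE_EXCLUSIVE) | BIT(SHARE) | BIT(SHARE_ROW_EXCLUSIVE)
        | BIT(EXCLUSIVE) | BIT(ACCESS_EXCLUSIVE),
};

/* True if a lock held in mode `held` blocks a request for mode `requested`. */
static bool
lock_modes_conflict(LockModeModel held, LockModeModel requested)
{
    return (conflicts[held] & BIT(requested)) != 0;
}
```

Under this table, `lock_modes_conflict(ACCESS_SHARE, ACCESS_EXCLUSIVE)` is true and `lock_modes_conflict(ACCESS_SHARE, SHARE_UPDATE_EXCLUSIVE)` is false. That difference matches what the commit title describes for ATTACH PARTITION.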

The process surrounding the table being INSERTed INTO is more than a little
unusual, involving renames, detaches, creation, and re-attaching within a
transaction.  I don't think that matters, though; I think the issue surrounds
the table being SELECTed *from*, which is actually behind a view.


