On Mon, Jun 13, 2016 at 4:52 PM, Peter Tripp <peter@chartio.com> wrote:
> Thank you for your assistance in tracking this down Peter G. My apologies
> for the delayed reply, I was out sick in the days following my original post.

So, I've managed to create a test case that triggers this bug
reliably, without involving VACUUM or foreign keys.

The test case uses pgbench to run conflicting jsonb-based UPSERT
statements with large, toastable jsonb datums. I hand-tuned the data
distributions and the number of clients a little to get the problem
to reproduce. I was attempting to roughly simulate a customer's
problem report (approximating the data distributions involved, and so
on), having failed with simpler cases that involved upserting with
TOAST.

Here is a backtrace, taken after promoting the "attempted to delete
invisible tuple" elog's elevel to PANIC:

(gdb) bt
#0  0x00007fbdbe2be418 in __GI_raise (sig=sig@entry=6)
    at ../sysdeps/unix/sysv/linux/raise.c:54
#1  0x00007fbdbe2c001a in __GI_abort () at abort.c:89
#2  0x00000000007dd674 in errfinish (dummy=<optimized out>) at elog.c:557
#3  0x00000000004a4e84 in heap_delete
    (relation=relation@entry=0x7fbdbf9aab38, tid=tid@entry=0x156e9bc,
    cid=0, crosscheck=crosscheck@entry=0x0, wait=wait@entry=1 '\001',
    hufd=hufd@entry=0x7ffdccd28630) at heapam.c:3015
#4  0x00000000004a4f03 in simple_heap_delete
    (relation=relation@entry=0x7fbdbf9aab38, tid=0x156e9bc)
    at heapam.c:3378
#5  0x00000000004ae224 in toast_delete_datum (value=<optimized out>,
    rel=0x7fbdbf9a5250) at tuptoaster.c:1706
#6  0x00000000004aee17 in toast_delete (rel=rel@entry=0x7fbdbf9a5250,
    oldtup=oldtup@entry=0x7ffdccd2bf90) at tuptoaster.c:509
#7  0x00000000004a7cbf in heap_abort_speculative
    (relation=0x7fbdbf9a5250, tuple=0x15b87b0) at heapam.c:6003
#8  0x00000000005e5624 in ExecInsert (canSetTag=1 '\001',
    estate=0x156d0d0, onconflict=ONCONFLICT_UPDATE,
    arbiterIndexes=0x15aa460, planSlot=0x156d7d8, slot=0x156d7d8,
    mtstate=0x156d320) at nodeModifyTable.c:443
#9  ExecModifyTable (node=0x156d320) at nodeModifyTable.c:1496

Sure enough, the error code path goes through simple_heap_delete(),
and this is related to TOAST. It usually takes no more than 5 seconds
for the test case to show the problem. I'll go work on a proper
diagnosis now.
Thanks for your help, Peter.
--
Peter Geoghegan