Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication

Поиск
Список
Период
Сортировка
От Alvaro Herrera
Тема Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication
Дата
Msg-id 20191126171054.GB4161@alvherre.pgsql
обсуждение исходный текст
Ответ на Re: BUG #16125: Crash of PostgreSQL's wal sender during logicalreplication  (Andres Freund <andres@anarazel.de>)
Список pgsql-bugs
On 2019-Nov-19, Andres Freund wrote:

> > > 2518:rmgr: Heap        len (rec/tot):   2063/  2063, tx: 1667601530, lsn: 25EE/D66F11E8, prev 25EE/D66F0CB0,
desc:INSERT off 4, blkref #0: rel 1663/13018/88964795 blk 3125
 
> > > 2520:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F1A38, prev 25EE/D66F19F8,
desc:INSERT_LEAF off 256, blkref #0: rel 1663/13018/88964797 blk 15840
 
> > > 2522:rmgr: Heap        len (rec/tot):   1341/  9385, tx: 1667601530, lsn: 25EE/D66F2118, prev 25EE/D66F1A78,
desc:INSERT off 7, blkref #0: rel 1663/13018/88964795 blk 3139 FPW
 
> > > 2524:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F54C8, prev 25EE/D66F45E0,
desc:INSERT_LEAF off 257, blkref #0: rel 1663/13018/88964797 blk 15840
 
> > > 2525:rmgr: Heap        len (rec/tot):   1671/  1671, tx: 1667601530, lsn: 25EE/D66F5508, prev 25EE/D66F54C8,
desc:INSERT off 2, blkref #0: rel 1663/13018/88964792 blk 18475
 
> > > 2527:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601530, lsn: 25EE/D66F6738, prev 25EE/D66F5B90,
desc:INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
 
> > > 2530:rmgr: Btree       len (rec/tot):     96/    96, tx: 1667601530, lsn: 25EE/D66F7E40, prev 25EE/D66F7E10,
desc:INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
 
> > > 2531:rmgr: Btree       len (rec/tot):     80/    80, tx: 1667601530, lsn: 25EE/D66F7EA0, prev 25EE/D66F7E40,
desc:INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
 
> > > 2532:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7EF0, prev 25EE/D66F7EA0,
desc:DELETE off 2 , blkref #0: rel 1663/13018/88964792 blk 18475
 
> > > 2533:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7F28, prev 25EE/D66F7EF0,
desc:DELETE off 4 , blkref #0: rel 1663/13018/88964795 blk 3125
 
> > > 2534:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601530, lsn: 25EE/D66F7F60, prev 25EE/D66F7F28,
desc:DELETE off 7 , blkref #0: rel 1663/13018/88964795 blk 3139
 
> > > 56346:rmgr: Heap        len (rec/tot):    378/  8198, tx: 1667601537, lsn: 25EE/D6DE33B0, prev 25EE/D6DE1628,
desc:DELETE off 6 KEYS_UPDATED , blkref #0: rel 1663/13018/88964815 blk 1938695 FPW
 
> 
> > > 56347:rmgr: Heap        len (rec/tot):     54/    54, tx: 1667601538, lsn: 25EE/D6DE53D0, prev 25EE/D6DE33B0,
desc:LOCK off 5: xid 1667601538: flags 0 LOCK_ONLY EXCL_LOCK , blkref #0: rel 1663/13018/88964792 blk 18460
 
> > > 56348:rmgr: Heap        len (rec/tot):   6563/  6563, tx: 1667601538, lsn: 25EE/D6DE5408, prev 25EE/D6DE53D0,
desc:UPDATE off 5 xmax 1667601538 ; new off 3 xmax 0, blkref #0: rel 1663/13018/88964792 blk 18499, blkref #1: rel
1663/13018/88964792blk 18460
 
> > > 56349:rmgr: Btree       len (rec/tot):     64/    64, tx: 1667601538, lsn: 25EE/D6DE6DC8, prev 25EE/D6DE5408,
desc:INSERT_LEAF off 140, blkref #0: rel 1663/13018/88964798 blk 2058
 
> > > 56350:rmgr: Btree       len (rec/tot):     96/    96, tx: 1667601538, lsn: 25EE/D6DE6E08, prev 25EE/D6DE6DC8,
desc:INSERT_LEAF off 28, blkref #0: rel 1663/13018/88964799 blk 5076
 
> > > 56351:rmgr: Btree       len (rec/tot):     80/    80, tx: 1667601538, lsn: 25EE/D6DE6E68, prev 25EE/D6DE6E08,
desc:INSERT_LEAF off 88, blkref #0: rel 1663/13018/88964800 blk 3412
 
> > > 56352:rmgr: Transaction len (rec/tot):     66/    66, tx: 1667601527, lsn: 25EE/D6DE6EB8, prev 25EE/D6DE6E68,
desc:COMMIT 2019-11-16 03:01:39.326201 UTC; subxacts: 1667601528 1667601530 1667601537 1667601538
 
> 
> > So, this DELETE change triggers the issue. If I understand it correctly,
> > that means the transaction locked the tuple and then deleted it. But I
> > don't see any record about the lock.
> 
> Hm. I don't think it necessarily means that. compute_new_xmax_infomask(), when
> called from heap_delete(), will e.g. set HEAP_KEYS_UPDATED when
> 
>     if (old_infomask & HEAP_XMAX_INVALID)
> 
> is true. Which is the most common case.

I don't understand what you guys are saying.  There's a DELETE at
25EE/D6DE33B0 (line 56346) which seems to be the one causing trouble.
Right?  There's also a LOCK+UPDATE sequence just below that, which is
unrelated (different rel).  Are you saying that one of them is toast and
the other is its main heap?  It would be useful to know more about the
rels.

> I actually find it *more*
> curious to *not* see that for a delete. I've not traced fully through
> it, yet.
> Alvaro, it's not clear to me whether the the multixact codepaths in
> compute_new_xmax_infomask() actually work as expected for deletes.

Are you saying that the three *prior* deletes (lines 2532-2534) are
strange because they lack the KEYS_UPDATED bit?  (Note these are all for
different rels).

I'm not sure what to do about this report at this point.
Notwithstanding my general failure to focus on anything too complex
given my surrounding social context.

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: BUG #16138: Error while installing PostgreSQL
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: Failed assertion clauses != NIL