BUG #17257: (auto)vacuum hangs within lazy_scan_prune()

From: PG Bug reporting form
Subject: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()
Msg-id: 17257-1e46de26bec11433@postgresql.org
List: pgsql-bugs
The following bug has been logged on the website:

Bug reference:      17257
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 14.0
Operating system:   Ubuntu 20.04
Description:

When running concurrent installchecks (x100), I've observed an autovacuum
process hanging:

tester    328713  328706  0 12:19 ?        00:02:03 postgres: logical
replication launcher 
tester   3301788  328706  0 16:33 ?        00:00:02 postgres: postgres
regress030 127.0.0.1(44534) CREATE TABLE
tester   3302076  328706 99 16:33 ?        05:17:10 postgres: autovacuum
worker regress030
tester   3311424  328706  0 16:34 ?        00:03:09 postgres: postgres
regress090 127.0.0.1(48744) ALTER TABLE
tester   3311453  328706  1 16:34 ?        00:03:12 postgres: postgres
regress048 127.0.0.1(48768) ALTER TABLE
tester   3311461  328706  1 16:34 ?        00:03:11 postgres: postgres
regress017 127.0.0.1(48774) DROP TABLE
tester   3311490  328706  0 16:34 ?        00:03:09 postgres: postgres
regress006 127.0.0.1(48804) ALTER TABLE
tester   3311493  328706  0 16:34 ?        00:03:08 postgres: postgres
regress008 127.0.0.1(48806) ALTER TABLE

gdb showed the following stack trace:
0x0000564d6c93cd1f in GetPrivateRefCountEntry (buffer=<optimized out>,
do_move=<optimized out>) at bufmgr.c:312
312             Assert(BufferIsValid(buffer));
(gdb) bt
#0  0x0000564d6c93cd1f in GetPrivateRefCountEntry (buffer=<optimized out>,
do_move=<optimized out>) at bufmgr.c:312
#1  0x0000564d6c93fb79 in GetPrivateRefCount (buffer=111422) at
bufmgr.c:398
#2  BufferGetBlockNumber (buffer=buffer@entry=111422) at bufmgr.c:2771
#3  0x0000564d6b83519d in heap_prune_chain (prstate=0x7fff07f03e10,
rootoffnum=7, buffer=111422) at pruneheap.c:625
#4  heap_page_prune (relation=relation@entry=0x7fa60a9cff80,
buffer=buffer@entry=111422, 
    vistest=vistest@entry=0x564d710816a0 <GlobalVisCatalogRels>,
old_snap_xmin=old_snap_xmin@entry=0, 
    old_snap_ts=old_snap_ts@entry=0, report_stats=report_stats@entry=false,
off_loc=<optimized out>) at pruneheap.c:278
#5  0x0000564d6b84a7d4 in lazy_scan_prune (vacrel=0x625000044190,
buf=<optimized out>, blkno=<optimized out>, 
    page=0x7fa642bdd580 <incomplete sequence \334>, vistest=<optimized out>,
prunestate=<optimized out>)
    at vacuumlazy.c:1741
#6  0x0000564d6b8597cd in lazy_scan_heap (aggressive=<optimized out>,
params=<optimized out>, vacrel=<optimized out>)
    at vacuumlazy.c:1384
#7  heap_vacuum_rel (rel=<optimized out>, params=<optimized out>,
bstrategy=<optimized out>) at vacuumlazy.c:638
#8  0x0000564d6bf9b6b6 in table_relation_vacuum (bstrategy=<optimized out>,
params=0x6250000272c4, rel=0x7fa60a9cff80)
    at ../../../src/include/access/tableam.h:1678
#9  vacuum_rel (relid=<optimized out>, relation=<optimized out>,
params=0x6250000272c4) at vacuum.c:2034
#10 0x0000564d6bfa11f0 in vacuum (relations=0x625000041278,
relations@entry=0x6250000552b0, 
    params=params@entry=0x6250000272c4, bstrategy=<optimized out>,
bstrategy@entry=0x625000053048, 
    isTopLevel=isTopLevel@entry=true) at vacuum.c:475
#11 0x0000564d6c6e3a37 in autovacuum_do_vac_analyze
(bstrategy=0x625000053048, tab=0x6250000272c0) at autovacuum.c:3242
#12 do_autovacuum () at autovacuum.c:2490
#13 0x0000564d6c6e6681 in AutoVacWorkerMain (argv=0x0, argc=0) at
autovacuum.c:1714
#14 0x0000564d6c6e6a95 in StartAutoVacWorker () at autovacuum.c:1499
#15 0x0000564d6c7272d4 in StartAutovacuumWorker () at postmaster.c:5561
#16 sigusr1_handler (postgres_signal_arg=<optimized out>) at
postmaster.c:5266
#17 <signal handler called>
#18 __GI___sigprocmask (how=2, set=<optimized out>, oset=0x0) at
../sysdeps/unix/sysv/linux/sigprocmask.c:39
#19 0x0000564d6b52efc4 in __interceptor_sigprocmask.part.0 ()
#20 0x0000564d6c727d9e in ServerLoop () at postmaster.c:1707
#21 0x0000564d6c72c309 in PostmasterMain (argc=argc@entry=3,
argv=argv@entry=0x603000000250) at postmaster.c:1417
#22 0x0000564d6b4f6995 in main (argc=3, argv=0x603000000250) at main.c:209

(gdb) b 1820
Breakpoint 1 at 0x564d6b84b378: file vacuumlazy.c, line 1820.
(gdb) cont
Continuing.

Breakpoint 1, lazy_scan_prune (vacrel=0x625000044190, buf=<optimized out>,
blkno=<optimized out>, 
    page=0x7fa642bdd580 <incomplete sequence \334>, vistest=<optimized out>,
prunestate=<optimized out>)
    at vacuumlazy.c:1820
1820                            goto retry;

(gdb) print *vacrel
$15 = {rel = 0x7fa60a9cff80, indrels = 0x625000043b70, nindexes = 5,
failsafe_active = false, 
  consider_bypass_optimization = true, do_index_vacuuming = true,
do_index_cleanup = true, do_rel_truncate = true, 
  bstrategy = 0x625000053048, lps = 0x0, old_rel_pages = 6, old_live_tuples
= 133, relfrozenxid = 726, relminmxid = 1, 
  OldestXmin = 48155004, FreezeLimit = 4293122300, MultiXactCutoff =
4289973311, 
  relnamespace = 0x625000044140 "pg_catalog", relname = 0x625000044168
"pg_constraint", indname = 0x0, blkno = 11, 
  offnum = 32, phase = VACUUM_ERRCB_PHASE_SCAN_HEAP, dead_tuples =
0x62a0000a8240, rel_pages = 13, scanned_pages = 12, 
  pinskipped_pages = 0, frozenskipped_pages = 0, tupcount_pages = 12,
pages_removed = 0, lpdead_item_pages = 9, 
  nonempty_pages = 11, new_rel_tuples = 0, new_live_tuples = 0, indstats =
0x6250000444d8, num_index_scans = 0, 
  tuples_deleted = 9, lpdead_items = 226, new_dead_tuples = 0, num_tuples =
237, live_tuples = 237}

I've also caught similar hangs when (auto)vacuuming pg_attrdef,
pg_shdepend, and pg_class.

It seems that this endless retry is caused by a tuple that remains in the
HEAPTUPLE_RECENTLY_DEAD state while being processed in the lazy_scan_prune
loop.
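
To illustrate the suspected mechanism, below is a minimal standalone model,
not PostgreSQL code: prune_verdict(), recheck_verdict() and the retry cap
are made up for the illustration. It assumes, as the backtrace suggests,
that heap_page_prune() decides removability with the GlobalVisState test
while lazy_scan_prune() re-checks each surviving tuple against OldestXmin
and jumps back to retry (vacuumlazy.c:1820) when that re-check reports
DEAD; if the two judgements never converge for a RECENTLY_DEAD tuple, the
retry loop never terminates.

/*
 * Standalone model of the retry loop in lazy_scan_prune(), showing how it
 * can fail to terminate when two visibility checks persistently disagree
 * about one tuple.  Hypothetical sketch only: prune_verdict(),
 * recheck_verdict() and the retry cap are inventions for the demo.
 */
#include <stdio.h>

typedef enum
{
	TUPLE_DEAD,				/* stands in for HEAPTUPLE_DEAD */
	TUPLE_RECENTLY_DEAD,	/* stands in for HEAPTUPLE_RECENTLY_DEAD */
	TUPLE_LIVE				/* stands in for HEAPTUPLE_LIVE */
} TupleStatus;

/*
 * Stand-in for the visibility test heap_page_prune() relies on
 * (GlobalVisState).  Here it keeps answering RECENTLY_DEAD, so the tuple
 * is never pruned away.
 */
static TupleStatus
prune_verdict(void)
{
	return TUPLE_RECENTLY_DEAD;
}

/*
 * Stand-in for the re-check lazy_scan_prune() performs afterwards
 * (HeapTupleSatisfiesVacuum() against OldestXmin).  Here it answers DEAD,
 * contradicting the verdict above.
 */
static TupleStatus
recheck_verdict(void)
{
	return TUPLE_DEAD;
}

int
main(void)
{
	int			retries = 0;

retry:
	/* "Prune" the page: the RECENTLY_DEAD tuple is left in place. */
	if (prune_verdict() == TUPLE_DEAD)
	{
		/* tuple would have been removed; nothing left to re-check */
	}
	else if (recheck_verdict() == TUPLE_DEAD)
	{
		/*
		 * The tuple survived pruning, yet the re-check says it is DEAD.
		 * lazy_scan_prune() handles this by starting over, which loops
		 * forever if the two checks never converge.  The cap exists only
		 * so this demo terminates; the real loop has none.
		 */
		if (++retries < 5)
			goto retry;
		printf("gave up after %d retries: the verdicts never converged\n",
			   retries);
		return 1;
	}

	printf("page processed after %d retries\n", retries);
	return 0;
}

Compiled with a plain C compiler, the model gives up after the capped
number of retries, whereas the real loop in lazy_scan_prune() would keep
spinning, matching the 99% CPU autovacuum worker shown above.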

