Re: Inval reliability, especially for inplace updates

From: Paul A Jungwirth
Subject: Re: Inval reliability, especially for inplace updates
Date:
Msg-id: CA+renyWSo3WigOO68ydGCQmDdZGLvXFGP3mLGo03XN09AKtDig@mail.gmail.com
In reply to: Re: Inval reliability, especially for inplace updates  (Paul A Jungwirth <pj@illuminatedcomputing.com>)
List: pgsql-hackers
On Thu, Jul 31, 2025 at 9:53 AM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
> On Thu, Oct 31, 2024 at 09:20:52PM -0700, Noah Misch wrote:
> > Here, one of the autovacuum workers had the guilty stack trace, appearing at
> > the end of this message.  heap_inplace_update_and_unlock() calls
> > CacheInvalidateHeapTupleInplace() while holding BUFFER_LOCK_EXCLUSIVE on a
> > buffer of pg_class.  CacheInvalidateHeapTupleInplace() may call
> > CatalogCacheInitializeCache(), which opens the cache's rel.  If there's not a
> > valid relcache entry for the catcache's rel, we scan pg_class to make a valid
> > relcache entry.  The ensuing hang makes sense.
>
> Personally I never expected that catcache could depend on relcache,
> since it seems lower-level. But it makes sense that you need a
> relcache of pg_class at least, so their relationship is more
> complicated than just layers.
>
> I'm struggling to understand how your explanation incorporates
> *concurrency*, since a self-deadlock only involves locks from one
> backend. But the point is that a concurrent invalidation causes the
> relcache entry to vanish, so that we need to rebuild it. (We can't get
> this far without having built the relcache for pg_class once already.)
>
> Specifically, we drop the table while autovacuum is updating its
> statistics. But how is that possible? Don't both those things
> exclusive-lock the row in pg_class? I must be misunderstanding.

I was curious, so I decided to look into this some more. We have a
self-deadlock, but it's rare. Concurrency is necessary to trigger it.
The patch fixes the double-locking, but what was the sequence that
caused the problem?

The only thing reproi.sh does is add & drop tables over and over.

I looked at a few reproductions, and the situation was always the
same: one of the autovacuum workers was stuck analyzing pg_attribute,
with a stack trace like this (no different than Noah's):

#0  __futex_abstimed_wait_common
(futex_word=futex_word@entry=0x7fa76af77238,
expected=expected@entry=0, clockid=clockid@entry=0,
    abstime=abstime@entry=0x0, private=<optimized out>,
cancel=cancel@entry=true) at ./nptl/futex-internal.c:103
#1  0x00007fa7742d5f7b in __GI___futex_abstimed_wait_cancelable64
(futex_word=futex_word@entry=0x7fa76af77238,
expected=expected@entry=0,
    clockid=clockid@entry=0, abstime=abstime@entry=0x0,
private=<optimized out>) at ./nptl/futex-internal.c:139
#2  0x00007fa7742e0cff in do_futex_wait (sem=sem@entry=0x7fa76af77238,
abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:111
#3  0x00007fa7742e0d90 in __new_sem_wait_slow64
(sem=sem@entry=0x7fa76af77238, abstime=0x0, clockid=0) at
./nptl/sem_waitcommon.c:183
#4  0x00007fa7742e0df9 in __new_sem_wait
(sem=sem@entry=0x7fa76af77238) at ./nptl/sem_wait.c:42
#5  0x00005583c641a6a8 in PGSemaphoreLock (sema=0x7fa76af77238) at
pg_sema.c:327
#6  0x00005583c652e1da in LWLockAcquire (lock=0x7fa76b4fb4e4,
mode=mode@entry=LW_SHARED) at lwlock.c:1318
#7  0x00005583c64df801 in LockBuffer (buffer=36, mode=mode@entry=1) at
bufmgr.c:4182
#8  0x00005583c5eda4b3 in heapam_index_fetch_tuple
(scan=0x7fa76adc1b30, tid=0x7fa76adc3fb8, snapshot=0x5583e8ede3d8,
slot=0x7fa76adc1030,
    call_again=0x7fa76adc3fbe, all_dead=0x7ffebf643e6f) at
heapam_handler.c:146
#9  0x00005583c5efb548 in table_index_fetch_tuple (scan=<optimized
out>, tid=tid@entry=0x7fa76adc3fb8, snapshot=<optimized out>,
    slot=slot@entry=0x7fa76adc1030,
call_again=call_again@entry=0x7fa76adc3fbe,
all_dead=all_dead@entry=0x7ffebf643e6f)
    at ../../../../src/include/access/tableam.h:1226
#10 0x00005583c5efd1a7 in index_fetch_heap
(scan=scan@entry=0x7fa76adc3f58, slot=slot@entry=0x7fa76adc1030) at
indexam.c:622
#11 0x00005583c5efd362 in index_getnext_slot (scan=0x7fa76adc3f58,
direction=direction@entry=ForwardScanDirection, slot=0x7fa76adc1030)
    at indexam.c:682
#12 0x00005583c5efa32a in systable_getnext
(sysscan=sysscan@entry=0x7fa76adc3dd0) at genam.c:512
#13 0x00005583c677fd20 in ScanPgRelation (targetRelId=<optimized out>,
indexOK=indexOK@entry=true,
    force_non_historic=force_non_historic@entry=false) at
relcache.c:385
#14 0x00005583c6780cf8 in RelationReloadNailed
(relation=relation@entry=0x7fa7740fd698) at relcache.c:2381
#15 0x00005583c678b67e in RelationClearRelation
(relation=relation@entry=0x7fa7740fd698, rebuild=<optimized out>) at
relcache.c:2527
#16 0x00005583c678ce85 in RelationFlushRelation
(relation=0x7fa7740fd698) at relcache.c:2839
#17 0x00005583c678cf00 in RelationCacheInvalidateEntry
(relationId=<optimized out>) at relcache.c:2900
#18 0x00005583c67714fe in LocalExecuteInvalidationMessage
(msg=0x7ffebf644220) at inval.c:666
#19 0x00005583c651585b in ReceiveSharedInvalidMessages (
    invalFunction=invalFunction@entry=0x5583c677130d
<LocalExecuteInvalidationMessage>,
    resetFunction=resetFunction@entry=0x5583c67700cf
<InvalidateSystemCaches>) at sinval.c:121
#20 0x00005583c677012c in AcceptInvalidationMessages () at inval.c:766
#21 0x00005583c651f8d2 in LockRelationOid (relid=<optimized out>,
lockmode=1) at lmgr.c:137
#22 0x00005583c5e503db in relation_open
(relationId=relationId@entry=1259, lockmode=lockmode@entry=1) at
relation.c:56
#23 0x00005583c5f595e5 in table_open
(relationId=relationId@entry=1259, lockmode=lockmode@entry=1) at
table.c:43
#24 0x00005583c677fc8c in ScanPgRelation (targetRelId=<optimized out>,
indexOK=<optimized out>,
    force_non_historic=force_non_historic@entry=false) at
relcache.c:368
#25 0x00005583c67807c1 in RelationReloadIndexInfo
(relation=relation@entry=0x7fa774141028) at relcache.c:2257
#26 0x00005583c6780dff in RelationReloadNailed
(relation=relation@entry=0x7fa774141028) at relcache.c:2359
#27 0x00005583c678b67e in RelationClearRelation
(relation=relation@entry=0x7fa774141028, rebuild=<optimized out>) at
relcache.c:2527
#28 0x00005583c678ce85 in RelationFlushRelation
(relation=0x7fa774141028) at relcache.c:2839
#29 0x00005583c678cf00 in RelationCacheInvalidateEntry
(relationId=<optimized out>) at relcache.c:2900
#30 0x00005583c67714fe in LocalExecuteInvalidationMessage
(msg=0x7ffebf644640) at inval.c:666
#31 0x00005583c651585b in ReceiveSharedInvalidMessages (
    invalFunction=invalFunction@entry=0x5583c677130d
<LocalExecuteInvalidationMessage>,
    resetFunction=resetFunction@entry=0x5583c67700cf
<InvalidateSystemCaches>) at sinval.c:121
#32 0x00005583c677012c in AcceptInvalidationMessages () at inval.c:766
#33 0x00005583c651f8d2 in LockRelationOid (relid=<optimized out>,
lockmode=1) at lmgr.c:137
#34 0x00005583c5e503db in relation_open
(relationId=relationId@entry=1259, lockmode=lockmode@entry=1) at
relation.c:56
#35 0x00005583c5f595e5 in table_open
(relationId=relationId@entry=1259, lockmode=lockmode@entry=1) at
table.c:43
#36 0x00005583c677fc8c in ScanPgRelation (targetRelId=<optimized out>,
indexOK=indexOK@entry=true,
    force_non_historic=force_non_historic@entry=false) at relcache.c:368
#37 0x00005583c6780cf8 in RelationReloadNailed
(relation=relation@entry=0x7fa7740fd698) at relcache.c:2381
#38 0x00005583c678b67e in RelationClearRelation
(relation=relation@entry=0x7fa7740fd698, rebuild=<optimized out>) at
relcache.c:2527
#39 0x00005583c678ce85 in RelationFlushRelation
(relation=0x7fa7740fd698) at relcache.c:2839
#40 0x00005583c678cf00 in RelationCacheInvalidateEntry
(relationId=<optimized out>) at relcache.c:2900
#41 0x00005583c67714fe in LocalExecuteInvalidationMessage
(msg=0x7ffebf644a40) at inval.c:666
#42 0x00005583c651585b in ReceiveSharedInvalidMessages (
    invalFunction=invalFunction@entry=0x5583c677130d
<LocalExecuteInvalidationMessage>,
    resetFunction=resetFunction@entry=0x5583c67700cf
<InvalidateSystemCaches>) at sinval.c:121
#43 0x00005583c677012c in AcceptInvalidationMessages () at inval.c:766
#44 0x00005583c651f8d2 in LockRelationOid (relid=<optimized out>,
lockmode=1) at lmgr.c:137
#45 0x00005583c5e503db in relation_open (relationId=1259,
lockmode=lockmode@entry=1) at relation.c:56
#46 0x00005583c5f595e5 in table_open (relationId=<optimized out>,
lockmode=lockmode@entry=1) at table.c:43
#47 0x00005583c676b11a in CatalogCacheInitializeCache
(cache=cache@entry=0x5583e8e79f80) at catcache.c:929
#48 0x00005583c676e3c2 in PrepareToInvalidateCacheTuple
(relation=relation@entry=0x7fa7740fd698,
tuple=tuple@entry=0x7fa76adc3a88,
    newtuple=newtuple@entry=0x0,
function=function@entry=0x5583c676fac8 <RegisterCatcacheInvalidation>,
    context=context@entry=0x7fa76adc38b8) at catcache.c:2164
#49 0x00005583c676fdc3 in CacheInvalidateHeapTupleCommon
(relation=relation@entry=0x7fa7740fd698,
tuple=tuple@entry=0x7fa76adc3a88,
    newtuple=newtuple@entry=0x0,
prepare_callback=prepare_callback@entry=0x5583c676fe93
<PrepareInplaceInvalidationState>) at inval.c:1350
#50 0x00005583c6770ab9 in CacheInvalidateHeapTupleInplace
(relation=relation@entry=0x7fa7740fd698,
tuple=tuple@entry=0x7fa76adc3a88,
    newtuple=newtuple@entry=0x0) at inval.c:1461
#51 0x00005583c5ec8dc9 in heap_inplace_update_and_unlock
(relation=0x7fa7740fd698, oldtup=<optimized out>,
tuple=0x7fa76adc3a88, buffer=36)
    at heapam.c:6311
#52 0x00005583c5efb33a in systable_inplace_update_finish
(state=0x7fa76adc3748, tuple=<optimized out>) at genam.c:884
#53 0x00005583c61fbc92 in vac_update_relstats
(relation=relation@entry=0x7fa774102b20, num_pages=num_pages@entry=89,
num_tuples=4460,
    num_all_visible_pages=<optimized out>,
hasindex=hasindex@entry=true, frozenxid=frozenxid@entry=0,
minmulti=minmulti@entry=0,
    frozenxid_updated=0x0, minmulti_updated=0x0, in_outer_xact=false)
at vacuum.c:1455
#54 0x00005583c60ec3f2 in do_analyze_rel
(onerel=onerel@entry=0x7fa774102b20,
params=params@entry=0x5583e8e9bda4,
    va_cols=va_cols@entry=0x0, acquirefunc=<optimized out>,
relpages=89, inh=inh@entry=false, in_outer_xact=false, elevel=13)
    at analyze.c:645
#55 0x00005583c60ed155 in analyze_rel (relid=<optimized out>,
relation=0x5583e8ef9fa0, params=params@entry=0x5583e8e9bda4,
va_cols=0x0,
    in_outer_xact=<optimized out>, bstrategy=<optimized out>) at analyze.c:262
#56 0x00005583c61fd2d4 in vacuum (relations=0x5583e8efc008,
params=params@entry=0x5583e8e9bda4, bstrategy=<optimized out>,
    bstrategy@entry=0x5583e8ef2e00, isTopLevel=isTopLevel@entry=true)
at vacuum.c:493
#57 0x00005583c641e58b in autovacuum_do_vac_analyze
(tab=tab@entry=0x5583e8e9bda0,
bstrategy=bstrategy@entry=0x5583e8ef2e00)
    at autovacuum.c:3180
#58 0x00005583c6420f84 in do_autovacuum () at autovacuum.c:2503
#59 0x00005583c64219ae in AutoVacWorkerMain (argc=argc@entry=0,
argv=argv@entry=0x0) at autovacuum.c:1716
#60 0x00005583c6421bb3 in StartAutoVacWorker () at autovacuum.c:1494
#61 0x00005583c6433c22 in StartAutovacuumWorker () at postmaster.c:5536
#62 0x00005583c64344ca in sigusr1_handler
(postgres_signal_arg=<optimized out>) at postmaster.c:5241
#63 <signal handler called>
#64 0x00007fa77434e904 in __GI___select (nfds=nfds@entry=8,
readfds=readfds@entry=0x7ffebf6461c0, writefds=writefds@entry=0x0,
    exceptfds=exceptfds@entry=0x0,
timeout=timeout@entry=0x7ffebf6461b0) at
../sysdeps/unix/sysv/linux/select.c:69
#65 0x00005583c6435442 in ServerLoop () at postmaster.c:1773
#66 0x00005583c6437c5e in PostmasterMain (argc=argc@entry=1,
argv=argv@entry=0x5583e8e0fe00) at postmaster.c:1481
#67 0x00005583c62d57f5 in main (argc=1, argv=0x5583e8e0fe00) at main.c:202

In frame 55 we are analyzing relation 1249 (pg_attribute):

(gdb) f 55
#55 0x00005583c60ed155 in analyze_rel (relid=<optimized out>,
relation=0x5583e8ef9fa0, params=params@entry=0x5583e8e9bda4,
va_cols=0x0,
    in_outer_xact=<optimized out>, bstrategy=<optimized out>) at analyze.c:262
262                     do_analyze_rel(onerel, params, va_cols, acquirefunc,
(gdb) p *relation
$1 = {type = T_RangeVar, catalogname = 0x0, schemaname =
0x5583e8ef3308 "pg_catalog", relname = 0x5583e8ef32e0 "pg_attribute",
inh = true,
  relpersistence = 112 'p', alias = 0x0, location = -1}
(gdb) p *onerel
$2 = {rd_node = {spcNode = 1663, dbNode = 5, relNode = 1249}, rd_smgr
= 0x5583e8eb98c0, rd_refcnt = 2, rd_backend = -1,
  rd_islocaltemp = false, rd_isnailed = true, rd_isvalid = true,
rd_indexvalid = true, rd_statvalid = false, rd_createSubid = 0,
  rd_newRelfilenodeSubid = 0, rd_firstRelfilenodeSubid = 0,
rd_droppedSubid = 0, rd_rel = 0x7fa774102d38, rd_att = 0x7fa774102e50,
  rd_id = 1249, rd_lockInfo = {lockRelId = {relId = 1249, dbId = 5}},
rd_rules = 0x0, rd_rulescxt = 0x0, trigdesc = 0x0, rd_rsdesc = 0x0,
  rd_fkeylist = 0x0, rd_fkeyvalid = false, rd_partkey = 0x0,
rd_partkeycxt = 0x0, rd_partdesc = 0x0, rd_pdcxt = 0x0,
  rd_partdesc_nodetached = 0x0, rd_pddcxt = 0x0,
rd_partdesc_nodetached_xmin = 0, rd_partcheck = 0x0, rd_partcheckvalid
= false,
  rd_partcheckcxt = 0x0, rd_indexlist = 0x7fa774120298, rd_pkindex =
2659, rd_replidindex = 0, rd_statlist = 0x0, rd_indexattr = 0x0,
  rd_keyattr = 0x0, rd_pkattr = 0x0, rd_idattr = 0x0, rd_pubdesc =
0x0, rd_options = 0x0, rd_amhandler = 3,
  rd_tableam = 0x5583c6b0bf00 <heapam_methods>, rd_index = 0x0,
rd_indextuple = 0x0, rd_indexcxt = 0x0, rd_indam = 0x0, rd_opfamily =
0x0,
  rd_opcintype = 0x0, rd_support = 0x0, rd_supportinfo = 0x0,
rd_indoption = 0x0, rd_indexprs = 0x0, rd_indpred = 0x0, rd_exclops =
0x0,
  rd_exclprocs = 0x0, rd_exclstrats = 0x0, rd_indcollation = 0x0,
rd_opcoptions = 0x0, rd_amcache = 0x0, rd_fdwroutine = 0x0,
  rd_toastoid = 0, pgstat_enabled = true, pgstat_info = 0x5583e8eb0090}

It makes sense that pg_attribute would experience a lot of churn,
since we just add & drop tables.

We call vac_update_relstats, which does an in-place update to pg_class
(frame 53).

Within vac_update_relstats, we call systable_inplace_update_begin, which
takes an exclusive LWLock on the pg_class buffer we want to update.
That's the first half of the self-deadlock, but it's no problem yet. We
proceed in vac_update_relstats to systable_inplace_update_finish...
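
To make the ordering concrete, here is a rough sketch of the sequence
(my paraphrase, not the actual heapam.c/genam.c source; the comments
are just my reading of it):

    /*
     * The pg_class buffer is still exclusive-locked when we start
     * building the inplace invalidation, and building it can end up
     * re-opening pg_class.
     */
    LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);   /* systable_inplace_update_begin */
    /* ... overwrite the pg_class tuple in place ... */
    CacheInvalidateHeapTupleInplace(relation, tuple, NULL);   /* frame 50 */
    LockBuffer(buffer, BUFFER_LOCK_UNLOCK);      /* only released here */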

systable_inplace_update_finish sends an inplace invalidation message for
the pg_class tuple (frame 50). So we need to invalidate that tuple in
the pg_class catcache. But before we do that, we make sure the catcache
is initialized (frame 47):

    /* Just in case cache hasn't finished initialization yet... */
    if (ccp->cc_tupdesc == NULL)
        CatalogCacheInitializeCache(ccp);

That locks pg_class with AccessShareLock, which looks for pending
invalidation messages (frame 43).
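
The chain from frame 47 down to frame 43 is roughly this (again my
paraphrase of the relevant code, with the uninteresting parts elided,
so don't trust the details):

    /* catcache.c: initializing a catcache means opening its catalog */
    static void
    CatalogCacheInitializeCache(CatCache *cache)
    {
        Relation    relation;

        relation = table_open(cache->cc_reloid, AccessShareLock);   /* frame 47 */
        /* ... fill in cc_tupdesc, hash functions, etc. ... */
    }

    /* lmgr.c: taking the heavyweight lock is where we drain sinval traffic */
    void
    LockRelationOid(Oid relid, LOCKMODE lockmode)
    {
        /* ... acquire the lock ... */
        AcceptInvalidationMessages();   /* frame 43 */
    }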

That AcceptInvalidationMessages call finds a message invalidating the
relcache for pg_class:

(gdb) f 41
#41 0x00005583c67714fe in LocalExecuteInvalidationMessage
(msg=0x7ffebf644a40) at inval.c:666
666
RelationCacheInvalidateEntry(msg->rc.relId);
(gdb) p *msg
$9 = {id = -2 '\376', cc = {id = -2 '\376', dbId = 5, hashValue =
1259}, cat = {id = -2 '\376', dbId = 5, catId = 1259}, rc = {
    id = -2 '\376', dbId = 5, relId = 1259}, sm = {id = -2 '\376',
backend_hi = -47 '\321', backend_lo = 59642, rnode = {spcNode = 5,
      dbNode = 1259, relNode = 32679}}, rm = {id = -2 '\376', dbId =
5}, sn = {id = -2 '\376', dbId = 5, relId = 1259}}

(Btw where did this message come from?)

So we have to rebuild the relcache for pg_class. Obviously we need to
look up its entry in (heh) pg_class (frame 37). So we open that table
with an AccessShareLock, which looks for pending invalidation messages
(frame 32). We find one telling us to invalidate 2662
(pg_class_oid_index):

(gdb) f 30
#30 0x00005583c67714fe in LocalExecuteInvalidationMessage
(msg=0x7ffebf644640) at inval.c:666
666
RelationCacheInvalidateEntry(msg->rc.relId);
(gdb) p *msg
$12 = {id = -2 '\376', cc = {id = -2 '\376', dbId = 5, hashValue =
2662}, cat = {id = -2 '\376', dbId = 5, catId = 2662}, rc = {
    id = -2 '\376', dbId = 5, relId = 2662}, sm = {id = -2 '\376',
backend_hi = -47 '\321', backend_lo = 59642, rnode = {spcNode = 5,
      dbNode = 2662, relNode = 32679}}, rm = {id = -2 '\376', dbId =
5}, sn = {id = -2 '\376', dbId = 5, relId = 2662}}

I guess that is coming from the add & drop table commands.

So we reload the relcache entry for 2662 (frame 26). That opens
pg_class with AccessShareLock, which looks for pending invalidation
messages (frame 20). We find one telling us to invalidate pg_class:

(gdb) f 18
#18 0x00005583c67714fe in LocalExecuteInvalidationMessage
(msg=0x7ffebf644220) at inval.c:666
666
RelationCacheInvalidateEntry(msg->rc.relId);
(gdb) p *msg
$14 = {id = -2 '\376', cc = {id = -2 '\376', dbId = 5, hashValue =
1259}, cat = {id = -2 '\376', dbId = 5, catId = 1259}, rc = {
    id = -2 '\376', dbId = 5, relId = 1259}, sm = {id = -2 '\376',
backend_hi = 75 'K', backend_lo = 48996, rnode = {spcNode = 5,
      dbNode = 1259, relNode = 32679}}, rm = {id = -2 '\376', dbId =
5}, sn = {id = -2 '\376', dbId = 5, relId = 1259}}

So we rebuild the relcache for pg_class, which means scanning pg_class
(frame 14). We try to take a share lock on one of its buffers (frame 7),
but that is buffer 36, the same buffer this backend already
exclusive-locked for the inplace update (compare buffer=36 in frame 51),
so we self-deadlock.

So the problem is triggered by an invalidation message for the
pg_class relcache entry. But what sends that message? (Actually it
seems we got two of them.) Why does anything we're doing invalidate
pg_class? Normally an inplace update to pg_class sends an invalidation
message for the relation described by that tuple, not all of pg_class
itself (see CacheInvalidateHeapTupleCommon).

I couldn't figure that out until I put a breakpoint at the top of
RegisterRelcacheInvalidation and ran `vacuum analyze`. When we analyze
pg_class (and btw the repro script gives it a lot of churn too), we call
vac_update_relstats on it, so we are doing an inplace update of pg_class
on the very tuple that describes pg_class. That's why the invalidation
message targets pg_class's own relcache entry.
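
That matches the pg_class branch in CacheInvalidateHeapTupleCommon,
which (roughly, paraphrasing inval.c rather than quoting it) targets
the relcache inval at the relation described by the tuple:

    if (tupleRelId == RelationRelationId)
    {
        Form_pg_class classtup = (Form_pg_class) GETSTRUCT(tuple);

        /* invalidate the relcache entry for the relation this row describes */
        relationId = classtup->oid;   /* 1259 when the row is pg_class's own */
        databaseId = classtup->relisshared ? InvalidOid : MyDatabaseId;
    }
    /* ... */
    RegisterRelcacheInvalidation(databaseId, relationId);

So analyzing pg_class produces exactly the rc.relId = 1259 messages we
saw above.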

The concurrency, then, is analyzing pg_attribute in one worker and
pg_class in another. No locks would prevent that. It doesn't have to
be pg_attribute; it could be some user table. But since the repro
script adds & drops tables, pg_attribute is getting churned.

Yours,

--
Paul              ~{:-)
pj@illuminatedcomputing.com


