Re: Inval reliability, especially for inplace updates

Поиск
Список
Период
Сортировка
От Paul A Jungwirth
Тема Re: Inval reliability, especially for inplace updates
Дата
Msg-id CA+renyWCW+_2QvXERBQ+mna6ANwAVXXmHKCA-WzL04bZRsjoBA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Inval reliability, especially for inplace updates  (Noah Misch <noah@leadboat.com>)
Список pgsql-hackers
Surya Poondla (cc'd) and I took another look at this as part of the
November Patch Review Workshop.

We think it looks good. But I couldn't get the latest patch to apply
on top of REL_17_STABLE until I did this:

```
git am inplace160-inval-durability-inplace-v7.patch_v17
git revert bc6bad8857 # revert the revert of "WAL-log inplace update"
git am inplace280-comment-fix-v1.patch.nocfbot # attached
git am inplace290-comments202508-v1.patch
```

The inplace280 step adds a small comment change that seems to be in
your git history, but I couldn't find it in the email chain. Also the
290 patch has context from reverting the WAL-log revert.

The patch avoids deadlocks by reordering invalidation prep before
buffer locking. While no explicit assertion exists to detect future
violations, would it be helpful to add a helper or macro that enforces
this lock ordering rule more visibly? Probably not for a backpatch,
but in master?

On Sun, Aug 24, 2025 at 4:39 PM Noah Misch <noah@leadboat.com> wrote:
> > Some of the comments felt a bit compressed. They make sense in the
> > context of this fix, but reading them cold seems like it will be
> > challenging. For example this took a lot of thinking to follow:
> >
> >      * Construct shared cache inval if necessary.  Because we pass a tuple
> >      * version without our own inplace changes or inplace changes other
> >      * sessions complete while we wait for locks, inplace update mustn't
> >      * change catcache lookup keys.  But we aren't bothering with index
> >      * updates either, so that's true a fortiori.
>
> What do you think of the attached rewrite?  I also removed this part:
>
> -        * If we're mutating a tuple visible only to this transaction, there's an
> -        * equivalent transactional inval from the action that created the tuple,
> -        * and this inval is superfluous.
>
> That would have needed s/this transaction/this command/ to be correct, and I
> didn't feel it was saying something important enough to keep.  There are
> plenty of ways for invals to be redundant.

Thanks for expanding on this! Here are some thoughts about the new comment:

+    * Register shared cache invals if necessary.  Our input to inval can be
+    * weaker than heap_update() input to inval in these ways:

Perhaps "than the heap_update() input" or "than heap_update()'s input"?

+    * - This passes only the old version of the tuple.  Inval reacts only to
+    * catcache lookup key columns and pg_class.oid values stored in
+    * relcache-relevant catalog columns.  All of those columns are indexed.
+    * Inplace update mustn't be used for any operations that could change
+    * those.  Hence, the new tuple would provide no additional inval-relevant
+    * information.  Those facts also make it fine to skip updating indexes.

This is confusing to me. "Inval only reacts": who is inval? Are you
talking about the other backends when they receive the message? After
spending a lot of time, I think you mean
CacheInvalidateHeapTupleCommon, how it decides whether to invalidate
first the catcache and then the relcache. Also I wondered, if an
inplace update never changes index keys, and [something] only cares
about inval messages that change index keys, why are we sending an
inval message at all? After a few months away from this patch, it was
hard for me to remember. I think the comment is a bit misleading. We
*do* send catcache invalidations if non-key columns change. What about
this?:

+    * - This passes only the old version of the tuple. Catcache
invalidation doesn't need newtuple because inplace updates never
change key columns, so it only needs to invalidate one hash value, not
two. [For the same reason, we don't need to update indexes.] Relcache
invalidation (in CacheInvalidateHeapTupleCommon) ignores newtuple
altogether, even for regular updates, because an update can never move
a tuple from one relcache entry to another.

I bracketed the line about indexes, because I don't understand why
we're talking about updating indexes here. I don't see anything about
that in CacheInvalidateHeapTupleCommon or PrepareInvalidationState
(which doesn't have access to newtuple anyway).

Also it feels like this comment (or something similar) really belongs
on CacheInvalidateHeapTupleInplace. And that function doesn't need its
newtuple parameter.

+    * - The xwait found below may COMMIT between now and this function
+    * returning, making the tuple dead.  That can change inval decisions, so

Is this third bullet point still explaining why an inplace update can
be looser about invalidating caches than heap_update? Or is it making
a separate point? It seems like it should be a paragraph, not a bullet
point.

Incidentally, this comment on heap_inplace_lock looks suspicious:

 * One could modify this to return true for tuples with delete in progress,
 * All inplace updaters take a lock that conflicts with DROP.  If explicit
 * "DELETE FROM pg_class" is in progress, we'll wait for it like we would an
 * update.

I think it should be "While one could modify this . . . , all inplace
updaters . . . ." Something to consider for a non-backpatch commit
anyway.

> > Or this:
> >
> >      * WAL contains likely-unnecessary commit-time invals from the
> >      * CacheInvalidateHeapTuple() call in heap_inplace_update().
> >
> > Why likely-unnecessary? I know you explain it at that callsite, but
> > some hint might help here.
>
> On further study, I think one could construct a logical decoding output plugin
> for which it's necessary.  I've detailed that in the attached edits.  This was
> in the heap_decode() changes, which is the part of the patch I understand the
> least.  I likely should have made it a separate patch.  Here's the surrounding
> change I made in 2024, in context diff format:
>
> *** 508,530 **** heap_decode(LogicalDecodingContext *ctx, XLogRecordBuffer *buf)
>
>                         /*
>                          * Inplace updates are only ever performed on catalog tuples and
> !                        * can, per definition, not change tuple visibility.  Since we
> !                        * don't decode catalog tuples, we're not interested in the
> !                        * record's contents.
>                          *
> !                        * In-place updates can be used either by XID-bearing transactions
> !                        * (e.g.  in CREATE INDEX CONCURRENTLY) or by XID-less
> !                        * transactions (e.g.  VACUUM).  In the former case, the commit
> !                        * record will include cache invalidations, so we mark the
> !                        * transaction as catalog modifying here. Currently that's
> !                        * redundant because the commit will do that as well, but once we
> !                        * support decoding in-progress relations, this will be important.
>                          */
> -                       if (!TransactionIdIsValid(xid))
> -                               break;
> -
> -                       (void) SnapBuildProcessChange(builder, xid, buf->origptr);
> -                       ReorderBufferXidSetCatalogChanges(ctx->reorder, xid, buf->origptr);
>                         break;
>
>                 case XLOG_HEAP_CONFIRM:
> --- 508,526 ----
>
>                         /*
>                          * Inplace updates are only ever performed on catalog tuples and
> !                        * can, per definition, not change tuple visibility.  Inplace
> !                        * updates don't affect storage or interpretation of table rows,
> !                        * so they don't affect logicalrep_write_tuple() outcomes.  Hence,
> !                        * we don't process invalidations from the original operation.  If
> !                        * inplace updates did affect those things, invalidations wouldn't
> !                        * make it work, since there are no snapshot-specific versions of
> !                        * inplace-updated values.  Since we also don't decode catalog
> !                        * tuples, we're not interested in the record's contents.
>                          *
> !                        * WAL contains likely-unnecessary commit-time invals from the
> !                        * CacheInvalidateHeapTuple() call in heap_inplace_update().
> !                        * Excess invalidation is safe.
>                          */
>                         break;
>
>                 case XLOG_HEAP_CONFIRM:
>
>
> This code had been unchanged since commit b89e1510 (v9.4) introduced logical
> decoding.  I'm making the following assumptions about the old code:
>
> - I guessed "decoding in-progress relations" was a typo for "decoding
>   in-progress transactions", something we've supported since at least commit
>   4648243 "Add support for streaming to built-in logical replication"
>   (2020-09, v14).  If it's not a typo, I don't know what "in-progress
>   relations" denoted here.

I agree it's probably a typo.

> - It said "redundant because the commit will do that as well", but I didn't
>   find such code.  I bet that it referenced the DecodeCommit() lines removed
>   in commit c55040c "WAL Log invalidations at command end with
>   wal_level=logical" (2020-07, v14).  The same commit likely made the
>   ReorderBufferXidSetCatalogChanges() call obsolete.

Makes sense.

> - I had no idea why we call SnapBuildProcessChange().  Every other caller uses
>   its return value.  I guess there was a desire for its snapshot side effects,
>   but I didn't follow that.  Nothing snapshot-relevant happens at
>   XLOG_HEAP_INPLACE.  Removing this call doesn't break any test on v13 or on
>   v9.4.  Similarly, no test fails after removing both this and the
>   ReorderBufferXidSetCatalogChanges() call.

See below.

> - In v9.4, this area of code (per-heapam-record-type decoding) had inval
>   responsibilities.  That went away in v14, so there's no need for the comment
>   here to keep discussing invals.

Okay.

> I now think it would be prudent to omit from back-patch the non-comment
> heap_decode() changes.  While the above assumptions argue against needing the
> removed ReorderBufferXidSetCatalogChanges(), that's the sort of speculation I
> should keep out back-branch changes.

Okay. We didn't investigate whether/why this code is still needed. I
did run check-world without removing those lines, and it still passes.
So if they are unnecessary, at least they're not harmful.

> > It's a bit surprising that wrongly leaving relhasindex=t is safe (for
> > example after BEGIN; CREATE INDEX; ROLLBACK;). I guess this column is
> > just to save us a lookup for tables with no index, and no harm is done
> > if we do the lookup needlessly but find no indexes.
>
> > And vacuum can
> > repair it later. Still it's a little unnerving.
>
> DROP INDEX doesn't clear it, either.  That's longstanding, and it doesn't
> involve narrow race conditions.  Hence, I'm not worrying about it.  If it were
> broken, we'd have heard by now.

Yes, I've come to terms with it. :-)

> >  PrepareToInvalidateCacheTuple(Relation relation,
> >                                HeapTuple tuple,
> >                                HeapTuple newtuple,
> > -                              void (*function) (int, uint32, Oid))
> > +                              void (*function) (int, uint32, Oid, void *),
> > +                              void *context)
> >
> > It's a little odd that PrepareToInvalidateCacheTuple takes a callback
> > function when it only has one caller, so it always calls
> > RegisterCatcacheInvalidation. Is it just to avoid adding dependencies
> > to inval.c? But it already #includes catcache.h and contains lots of
> > knowledge about catcache specifics. Maybe originally
> > PrepareToInvalidateCacheTuple was built to take
> > RegisterRelcacheInvalidation as well? Is it worth still passing the
> > callback?
>
> Looking at the history, it did have two callbacks in Postgres95 1.01.  It was
> down to one callback when it got the name PrepareToInvalidateCacheTuple() in
> commit 81d08fc of 2001-01.  I think the main alternative to today's callback
> would be to make RegisterCatcacheInvalidation() an extern for catcache.c to
> call.  All the other Register* functions would remain static.  I left things
> unchanged since that would be cleaner in one way and dirtier in another.

Okay. It still feels more convoluted than the alternative, but a
backpatch isn't the place to change it.

> > @@ -6511,6 +6544,7 @@ heap_inplace_unlock(Relation relation,
> >  {
> >      LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
> >      UnlockTuple(relation, &oldtup->t_self, InplaceUpdateTupleLock);
> > +    ForgetInplace_Inval();
> >  }
> >
> > Is this the right place to add this? We think on balance yes, but the
> > question crossed my mind: Clearing the invals seems like a separate
> > responsibility from unlocking the buffer & tuple. After this patch,
> > our only remaining caller of heap_inplace_unlock is
> > systable_inplace_update_cancel, so perhaps it should call
> > ForgetInplace_Inval itself? OTOH we like that putting it here
> > guarantees it gets called, as a complement to building the invals in
> > heap_inplace_lock.
>
> Style principles behind the placement:
>
> - Inplace update puts some responsibilities in genam.c and others in heapam.c.
>   A given task, e.g. draining the inval queue, should consistently appear on
>   the same side of that boundary.
>
> - The heapam.c side of inplace update should cover roughly as much as the
>   heapam.c side of heap_update().  Since heap_update() handles invals and the
>   critical section, so should the heapam.c side of inplace update.
>
> It wouldn't take a strong reason to override those principles.
>
> postgr.es/m/20240831011043.2b.nmisch@google.com has an inventory of the ways
> to assign inval responsibility to heapam.c vs. genam.c.
> postgr.es/m/20241013004511.9c.nmisch@google.com has a related discussion,
> about the need for AtInplace_Inval() to be in the critical section.

Thanks for the explanation!

Yours,

--
Paul              ~{:-)
pj@illuminatedcomputing.com

Вложения

В списке pgsql-hackers по дате отправления: