Re: Inval reliability, especially for inplace updates

From: Noah Misch
Subject: Re: Inval reliability, especially for inplace updates
Msg-id: 20240618152349.7f.nmisch@google.com
In reply to: Re: Inval reliability, especially for inplace updates  (Andres Freund <andres@anarazel.de>)
Replies: Re: Inval reliability, especially for inplace updates
List: pgsql-hackers
On Mon, Jun 17, 2024 at 06:57:30PM -0700, Andres Freund wrote:
> On 2024-06-17 16:58:54 -0700, Noah Misch wrote:
> > On Sat, Jun 15, 2024 at 03:37:18PM -0700, Noah Misch wrote:
> > > On Wed, May 22, 2024 at 05:05:48PM -0700, Noah Misch wrote:
> > > > https://postgr.es/m/20240512232923.aa.nmisch@google.com wrote:
> > > > > Separable, nontrivial things not fixed in the attached patch stack:
> > > > >
> > > > > - Inplace update uses transactional CacheInvalidateHeapTuple().  ROLLBACK of
> > > > >   CREATE INDEX wrongly discards the inval, leading to the relhasindex=t loss
> > > > >   still seen in inplace-inval.spec.  CacheInvalidateRelmap() does this right.
> > > >
> > > > I plan to fix that like CacheInvalidateRelmap(): send the inval immediately,
> > > > inside the critical section.  Send it in heap_xlog_inplace(), too.
> 
> I'm worried this might cause its own set of bugs, e.g. if there are any places
> that, possibly accidentally, rely on the invalidation from the inplace update
> to also cover separate changes.

Good point.  I do have index_update_stats() still doing an ideally-superfluous
relcache update for that reason.  Taking that further, it would be cheap
insurance to have the inplace update do a transactional inval in addition to
its immediate inval.  Future master-only work could remove the transactional
one.  How about that?
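
Roughly, I'm picturing heap_inplace_update() doing both, with an
immediate-send helper (hypothetical name here) modeled on
CacheInvalidateRelmap()'s use of SendSharedInvalidMessages():

	/* transactional inval, kept as cheap insurance for code that relies on
	 * it also covering its own separate changes */
	CacheInvalidateHeapTuple(relation, tuple, NULL);

	/* building the sinval messages likely has to happen out here, since it
	 * can allocate memory */

	START_CRIT_SECTION();

	/* ... overwrite the tuple in the shared buffer, MarkBufferDirty(),
	 * XLogInsert() ... */

	/* immediate inval, durable regardless of later ROLLBACK; hypothetical
	 * helper handing the prebuilt messages to SendSharedInvalidMessages() */
	CacheInvalidateHeapTupleImmediate(relation, tuple);

	END_CRIT_SECTION();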

> Have you considered instead submitting these invalidations during abort as
> well?

I had not.  Hmmm.  If the lock protocol in README.tuplock (after patch
inplace120) told SearchSysCacheLocked1() to do systable scans instead of
syscache reads, that could work.  Would need to ensure a PANIC if transaction
abort doesn't reach the inval submission.  Overall, it would be harder to
reason about the state of caches, but I suspect the patch would be smaller.
How should we choose between those strategies?
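
For reference, the systable-scan flavor of such a lookup would be roughly the
following (pg_class by OID as the example; "relid" is just a placeholder); it
reads the catalog directly, so a stale catcache entry can't satisfy it:

	Relation	rel = table_open(RelationRelationId, AccessShareLock);
	ScanKeyData key;
	SysScanDesc scan;
	HeapTuple	tup;

	ScanKeyInit(&key, Anum_pg_class_oid,
				BTEqualStrategyNumber, F_OIDEQ, ObjectIdGetDatum(relid));
	scan = systable_beginscan(rel, ClassOidIndexId, true, NULL, 1, &key);
	tup = systable_getnext(scan);	/* no syscache involved */
	/* ... use tup ... */
	systable_endscan(scan);
	table_close(rel, AccessShareLock);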

> > > > a. Within logical decoding, cease processing invalidations for inplace
> > >
> > > I'm attaching the implementation.  This applies atop the v3 patch stack from
> > > https://postgr.es/m/20240614003549.c2.nmisch@google.com, but the threads are
> > > mostly orthogonal and intended for independent review.  Translating a tuple
> > > into inval messages uses more infrastructure than relmapper, which needs just
> > > a database ID.  Hence, this ended up more like a miniature of inval.c's
> > > participation in the transaction commit sequence.
> > >
> > > I waffled on whether to back-patch inplace150-inval-durability-atcommit
> >
> > That inplace150 patch turned out to be unnecessary.  Contrary to the
> > "noncritical resource releasing" comment some lines above
> > AtEOXact_Inval(true), the actual behavior is already to promote ERROR to
> > PANIC.  An ERROR just before or after sending invals becomes PANIC, "cannot
> > abort transaction %u, it was already committed".
> 
> Relying on that, instead of explicit critical sections, seems fragile to me.
> IIRC some of the behaviour around errors around transaction commit/abort has
> changed a bunch of times. Tying correctness into something that could be
> changed for unrelated reasons doesn't seem great.

Fair enough.  It could still be a good idea for master, but given I missed a
bug in inplace150-inval-durability-atcommit-v1.patch far worse than the ones
$SUBJECT fixes, let's not risk it in back branches.

> I'm not sure it holds true even today - what if the transaction didn't have an
> xid? Then RecordTransactionAbort() wouldn't trigger
>      "cannot abort transaction %u, it was already committed"
> I think?

I think that's right.  As the inplace160-inval-durability-inplace-v2.patch
edits to xact.c say, the concept of invals in XID-less transactions is buggy
at its core.  Fortunately, after that patch, we use them only for two things
that could themselves stop with something roughly as simple as the attached.
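
For context, the check in question is this bit of RecordTransactionAbort(),
roughly:

	TransactionId xid = GetCurrentTransactionIdIfAny();

	if (!TransactionIdIsValid(xid))
	{
		/* no XID assigned: no abort record, hence no check below */
		return InvalidTransactionId;
	}

	/*
	 * Check that we haven't aborted halfway through RecordTransactionCommit.
	 */
	if (TransactionIdDidCommit(xid))
		elog(PANIC, "cannot abort transaction %u, it was already committed",
			 xid);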

> > > - Same change, no WAL version bump.  Standby must update before primary.  This
> > >   is best long-term, but the transition is more disruptive.  I'm leaning
> > >   toward this one, but the second option isn't bad:
> 
> Hm. The inplace record doesn't use the length of the "main data" record
> segment for anything, from what I can tell. If records by an updated primary
> were replayed by an old standby, it'd just ignore the additional data, afaict?

Agreed, but ...

> I think with the code as-is, the situation with an updated standby replaying
> an old primary's record would actually be worse - it'd afaict just assume the
> now-longer record contained valid fields, despite those just pointing into
> uninitialized memory.  I think the replay routine would have to check the
> length of the main data and execute the invalidation conditionally.

I anticipated back branches supporting a new XLOG_HEAP_INPLACE_WITH_INVAL
alongside the old XLOG_HEAP_INPLACE.  Updated standbys would run both fine,
and old binaries consuming new WAL would PANIC, "heap_redo: unknown op code".
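
In heap_redo() terms, that's roughly the following dispatch (the _WITH_INVAL
case and its redo routine being the hypothetical additions); an old binary
falls through to the existing default case:

	uint8	info = XLogRecGetInfo(record) & ~XLR_INFO_MASK;

	switch (info & XLOG_HEAP_OPMASK)
	{
		...
		case XLOG_HEAP_INPLACE:
			heap_xlog_inplace(record);
			break;
		case XLOG_HEAP_INPLACE_WITH_INVAL:
			heap_xlog_inplace_with_inval(record);
			break;
		default:
			elog(PANIC, "heap_redo: unknown op code %u", info);
	}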

> > > - heap_xlog_inplace() could set the shared-inval-queue overflow signal on
> > >   every backend.  This is more wasteful, but inplace updates might be rare
> > >   enough (~once per VACUUM) to make it tolerable.
> 
> We already set that surprisingly frequently, as
> a) The size of the sinval queue is small
> b) If a backend is busy, it does not process catchup interrupts
>    (i.e. executing queries, waiting for a lock prevents processing)
> c) There's no deduplication of invals, we often end up sending the same inval
>    over and over.
> 
> So I suspect this might not be too bad, compared to the current badness.

That is good.  We might be able to do the overflow signal once at end of
recovery, like RelationCacheInitFileRemove() does for the init file.  That's
mildly harder to reason about, but it would be cheaper.  Hmmm.

> At least for core code. I guess there could be extension code triggering
> inplace updates more frequently? But I'd hope they'd do it not on catalog
> tables... Except that we wouldn't know that that's the case during replay,
> it's not contained in the record.

For what it's worth, from a grep of PGXN, only citus does inplace updates.

> > > - Use LogStandbyInvalidations() just after XLOG_HEAP_INPLACE.  This isn't
> > >   correct if one ends recovery between the two records, but you'd need to be
> > >   unlucky to notice.  Noticing would need a procedure like the following.  A
> > >   hot standby backend populates a relcache entry, then does DDL on the rel
> > >   after recovery ends.
> 
> Hm. The problematic cases presumably involve an access exclusive lock? If so,
> could we do LogStandbyInvalidations() *before* logging the WAL record for the
> inplace update? The invalidations can't be processed by other backends until
> the exclusive lock has been released, which should avoid the race?

A lock forces a backend to drain the inval queue before using the locked
object, but it doesn't stop the backend from draining the queue and
repopulating cache entries earlier.  For example, pg_describe_object() can
query many syscaches without locking underlying objects.  Hence, the inval
system relies on the buffer change getting fully visible to catcache queries
before the sinval message enters the shared queue.
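
Concretely, the interleaving that ordering would allow on a hot standby is
roughly:

	startup process: replays the LogStandbyInvalidations() record
	backend:         happens to drain the sinval queue (say, inside an
	                 unrelated pg_describe_object() call) and rebuilds the
	                 catcache entry from the still-unchanged tuple
	startup process: replays the XLOG_HEAP_INPLACE record
	backend:         keeps the stale entry; no further inval arrives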

Thanks,
nm


