Re: race condition in pg_class

Поиск
Список
Период
Сортировка
От Noah Misch
Тема Re: race condition in pg_class
Дата
Msg-id 20231027214946.79.nmisch@google.com
обсуждение исходный текст
Ответ на Re: race condition in pg_class  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: race condition in pg_class  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Fri, Oct 27, 2023 at 03:32:26PM -0400, Tom Lane wrote:
> Noah Misch <noah@leadboat.com> writes:
> > On Thu, Oct 26, 2023 at 09:44:04PM -0700, Noah Misch wrote:
> >> We'll likely need to change how we maintain relhasindex or perhaps take a lock
> >> in GRANT.
> 
> > The main choice is accepting more DDL blocking vs. accepting inefficient
> > relcache builds.  Options I'm seeing:
> 
> It looks to me like you're only thinking about relhasindex, but it
> seems to me that any call of heap_inplace_update brings some
> risk of this kind.  Excluding the bootstrap-mode-only usage in
> create_toast_table, I see four callers:
> 
> * index_update_stats updating a pg_class tuple's
>   relhasindex, relpages, reltuples, relallvisible
> 
> * vac_update_relstats updating a pg_class tuple's
>   relpages, reltuples, relallvisible, relhasindex, relhasrules,
>   relhastriggers, relfrozenxid, relminmxid
> 
> * vac_update_datfrozenxid updating a pg_database tuple's
>   datfrozenxid, datminmxid
> 
> * dropdb updating a pg_database tuple's datconnlimit
> 
> So we have just as much of a problem with GRANTs on databases
> as GRANTs on relations.  Also, it looks like we can lose
> knowledge of the presence of rules and triggers, which seems
> nearly as bad as forgetting about indexes.  The rest of these
> updates might not be correctness-critical, although I wonder
> how bollixed things could get if we forget an advancement of
> relfrozenxid or datfrozenxid (especially if the calling
> transaction goes on to make other changes that assume that
> the update happened).

Thanks for researching that.  Let's treat frozenxid stuff as critical; I
wouldn't want to advance XID limits based on a datfrozenxid that later gets
rolled back.  I agree relhasrules and relhastriggers are also critical.  The
"inefficient relcache builds" option family can't solve cases like
relfrozenxid and datconnlimit, so that leaves us with the "more DDL blocking"
option family.

> BTW, vac_update_datfrozenxid believes (correctly I think) that
> it cannot use the syscache copy of a tuple as the basis for in-place
> update, because syscache will have detoasted any toastable fields.
> These other callers are ignoring that, which seems like it should
> result in heap_inplace_update failing with "wrong tuple length".
> I wonder how come we're not seeing reports of that from the field.

Good question.  Perhaps we'll need some test cases that exercise each inplace
update against a row having a toast pointer.  It's too easy to go a long time
without encountering those in the field.

> I'm inclined to propose that heap_inplace_update should check to
> make sure that it's operating on the latest version of the tuple
> (including, I guess, waiting for an uncommitted update?) and throw
> error if not.  I think this is what your B3 option is, but maybe
> I misinterpreted.  It might be better to throw error immediately
> instead of waiting to see if the other updater commits.

That's perhaps closer to B2.  To be pedantic, B3 was about not failing or
waiting for GRANT to commit but instead inplace-updating every member of the
update chain.  For B2, I was thinking we don't need to error.  There are two
problematic orders of events.  The easy one is heap_inplace_update() mutating
a tuple that already has an xmax.  That's the one in the test case upthread,
and detecting it is trivial.  The harder one is heap_inplace_update() mutating
a tuple after GRANT fetches the old tuple, before GRANT enters heap_update().
I anticipate a new locktag per catalog that can receive inplace updates,
i.e. LOCKTAG_RELATION_DEFINITION and LOCKTAG_DATABASE_DEFINITION.  Here's a
walk-through for the pg_database case.  GRANT will use the following sequence
of events:

- acquire LOCKTAG_DATABASE_DEFINITION in exclusive mode
- fetch latest pg_database tuple
- heap_update()
- COMMIT, releasing LOCKTAG_DATABASE_DEFINITION

vac_update_datfrozenxid() sequence of events:

- acquire LOCKTAG_DATABASE_DEFINITION in exclusive mode
- (now, all GRANTs on the given database have committed or aborted)
- fetch latest pg_database tuple
- heap_inplace_update()
- release LOCKTAG_DATABASE_DEFINITION, even if xact not ending
- continue with other steps, e.g. vac_truncate_clog()

How does that compare to what you envisioned?  vac_update_datfrozenxid() could
further use xmax as a best-efforts thing to catch conflict with manual UPDATE
statements, but it wouldn't solve the case where the UPDATE had fetched the
tuple but not yet heap_update()'d it.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: Fwd: Annoying corruption in PostgreSQL.
Следующее
От: Steven Rostedt
Дата:
Сообщение: Re: [POC][RFC][PATCH v2] sched: Extended Scheduler Time Slice