Re: Sequence's value can be rollback after a crashed recovery.

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: Sequence's value can be rollback after a crashed recovery.
Дата
Msg-id 54833.1637701937@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: Sequence's value can be rollback after a crashed recovery.  (Tomas Vondra <tomas.vondra@enterprisedb.com>)
Ответы Re: Sequence's value can be rollback after a crashed recovery.  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Sequence's value can be rollback after a crashed recovery.  (Andy Fan <zhihui.fan1213@gmail.com>)
Список pgsql-hackers
Tomas Vondra <tomas.vondra@enterprisedb.com> writes:
> I see Tom speculated we may not flush WAL if a transaction only does
> nextval() in that other thread, but I don't think that's true. AFAICS if
> the nextval() call writes stuff to WAL, the RecordTransactionCommit will
> have wrote_xlog=true and valid XID. And so it'll do the usual usual
> XLogFlush() etc.

Yeah.  I didn't look at the code during last year's discussion, but now
I have, and I see that if nextval_internal() decides it needs to make
a WAL entry, it is careful to ensure the xact has an XID:

    /*
     * If something needs to be WAL logged, acquire an xid, so this
     * transaction's commit will trigger a WAL flush and wait for syncrep.
     * It's sufficient to ensure the toplevel transaction has an xid, no need
     * to assign xids subxacts, that'll already trigger an appropriate wait.
     * (Have to do that here, so we're outside the critical section)
     */
    if (logit && RelationNeedsWAL(seqrel))
        GetTopTransactionId();

So that eliminates my worry that RecordTransactionCommit would decide
it doesn't need to do anything.  If there was a WAL record, we will
flush it at commit (and not before).

As you say, this is exactly as durable as any other DML operation.
So I don't feel a need to change the code behavior.

The problematic situation seems to be where an application gets
a nextval() result and uses it for some persistent outside-the-DB
state, without having made sure that the nextval() was committed.
You could say that that's the same rookie error as relying on the
persistence of any other uncommitted DML ... except that at [1]
we say

    To avoid blocking concurrent transactions that obtain numbers from the
    same sequence, a nextval operation is never rolled back; that is, once
    a value has been fetched it is considered used and will not be
    returned again. This is true even if the surrounding transaction later
    aborts, or if the calling query ends up not using the value.

It's not so unreasonable to read that as promising persistence over
crashes as well as xact aborts.  So I think we need to improve the docs
here.  A minimal fix would be to leave the existing text alone and add a
separate para to the <caution> block, along the lines of

    However, the above statements do not apply if the database cluster
    crashes before committing the transaction containing the nextval
    operation.  In that case the sequence advance might not have made its
    way to persistent storage, so that it is uncertain whether the same
    value can be returned again after the cluster restarts.  If you wish
    to use a nextval result for persistent outside-the-database purposes,
    make sure that the nextval has been committed before doing so.

I wonder though if we shouldn't try to improve the existing text.
The phrasing "never rolled back" seems like it's too easily
misinterpreted.  Maybe rewrite the <caution> block like

    To avoid blocking concurrent transactions that obtain numbers from the
    same sequence, the value obtained by nextval is not reclaimed for
    re-use if the calling transaction later aborts.  This means that
    transaction aborts or database crashes can result in gaps in the
    sequence of assigned values.  That can happen without a transaction
    abort, too.
    -- this text is unchanged: --
    For example an INSERT with an ON CONFLICT clause will compute the
    to-be-inserted tuple, including doing any required nextval calls,
    before detecting any conflict that would cause it to follow the ON
    CONFLICT rule instead. Such cases will leave unused “holes” in the
    sequence of assigned values. Thus, PostgreSQL sequence objects cannot
    be used to obtain “gapless” sequences.

    Likewise, any sequence state changes made by setval are not undone if
    the transaction rolls back.
    -- end unchanged text --

    If the database cluster crashes before committing the transaction
    containing a nextval operation, the sequence advance might not yet
    have made its way to persistent storage, so that it is uncertain
    whether the same value can be returned again after the cluster
    restarts.  This is harmless for usage of the nextval value within
    that transaction, since its other effects will not be visible either.
    However, if you wish to use a nextval result for persistent
    outside-the-database purposes, make sure that the nextval operation
    has been committed before doing so.

Thoughts?

            regards, tom lane

[1] https://www.postgresql.org/docs/current/functions-sequence.html



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Jeremy Schneider
Дата:
Сообщение: Re: Sequence's value can be rollback after a crashed recovery.
Следующее
От: Jacob Champion
Дата:
Сообщение: Re: pg_upgrade parallelism