Re: ERROR: subtransaction logged without previous top-level txn record
From | Arseny Sher |
---|---|
Subject | Re: ERROR: subtransaction logged without previous top-level txn record |
Date | |
Msg-id | 8736bs81sx.fsf@ars-thinkpad |
In response to | Re: ERROR: subtransaction logged without previous top-level txn record (Amit Kapila <amit.kapila16@gmail.com>) |
Responses | Re: ERROR: subtransaction logged without previous top-level txn record |
List | pgsql-bugs |
Amit Kapila <amit.kapila16@gmail.com> writes:

>> I don't see a bug here. At least in reproduced scenario I see false
>> alert, as explained above: transaction with skipped xl_xact_assignment
>> won't be streamed as it finishes before confirmed_flush_lsn.
>>
>
> Does this guarantee come from the fact that we need to wait for such a
> transaction before reaching a consistent snapshot state? If not, can
> you explain a bit more what makes you say so?

Right, see the FULL_SNAPSHOT -> SNAPBUILD_CONSISTENT transition -- it
exists exactly for this purpose: once we have a good snapshot, we need to
wait for all running xacts to finish, so that we see in full every xact
we are promising to stream. This ensures the
<restart_lsn, confirmed_flush_lsn> pair is good initially (reading WAL
from the former is enough to see in full all xacts committing after the
latter), and the slot advancement arrangements ensure it stays good
forever (see LogicalIncreaseRestartDecodingForSlot).

Well, almost. This is true as long as the initial snapshot construction
goes the long way of building the snapshot by itself. If it happens to
pick up from disk a ready snapshot serialized there by another decoding
session, it fast-paths to SNAPBUILD_CONSISTENT, which is technically a
bug, as described in
https://www.postgresql.org/message-id/87ftjifoql.fsf%40ars-thinkpad

In theory, this bug could indeed lead to the 'subtransaction logged
without previous top-level txn record' error. In practice, I think its
probability is vanishingly small -- another decoder would have to
serialize its snapshot within a very short window during slot creation
(see the exact sequence of steps in the mail above). What is much more
probable (it doesn't involve new slot creation and is relatively easy to
reproduce without sleeps) is a false alert triggered by an unlucky
position of restart_lsn.

Surely we still must fix it. I just mean:

 - People definitely encountered the false alert, not this bug (if only
   because nobody said the error appeared immediately after slot
   creation).
 - I have no bright ideas on how to relax the check to make it correct
   without additional complications, and I'm pretty sure this is
   impossible (again, see above for details), so I'd remove it.

--
Arseny Sher
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
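
For readers following the thread, the difference between the "long way" and the fast path described above can be sketched as a toy state machine. This is only an illustration under stated assumptions, not PostgreSQL's snapshot builder: the types and functions (ToyBuilder, ToyRunningXacts, toy_enter_full_snapshot, toy_process_running_xacts, toy_restore_serialized) are hypothetical, xids are plain integers, and xid wraparound is ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model only. State names echo the mail; everything else is hypothetical. */
typedef enum {
    TOY_BUILDING_SNAPSHOT,
    TOY_FULL_SNAPSHOT,
    TOY_CONSISTENT
} ToyState;

/* Simplified stand-in for a "running xacts" record read from WAL. */
typedef struct {
    uint32_t oldest_running_xid; /* oldest xid still in progress */
    uint32_t next_xid;           /* first xid not yet assigned */
} ToyRunningXacts;

typedef struct {
    ToyState state;
    uint32_t next_phase_at;      /* every xid below this must finish first */
} ToyBuilder;

/* Long way, step 1: the snapshot is complete, but xacts that were running
 * at this point may have been decoded only partially. */
void toy_enter_full_snapshot(ToyBuilder *b, const ToyRunningXacts *r)
{
    assert(r->oldest_running_xid <= r->next_xid);
    b->state = TOY_FULL_SNAPSHOT;
    b->next_phase_at = r->next_xid;
}

/* Long way, step 2: become consistent only once every xact that was running
 * when the full snapshot was reached has finished. */
void toy_process_running_xacts(ToyBuilder *b, const ToyRunningXacts *r)
{
    assert(r->oldest_running_xid <= r->next_xid);
    if (b->state == TOY_FULL_SNAPSHOT &&
        r->oldest_running_xid >= b->next_phase_at)
        b->state = TOY_CONSISTENT;
}

/* Fast path: a snapshot serialized by another session is picked up and the
 * wait above is skipped entirely, which is the gap the mail describes. */
void toy_restore_serialized(ToyBuilder *b)
{
    b->state = TOY_CONSISTENT;
}
```

In this model, a builder that enters the full-snapshot state while xids 100..104 are in progress stays there until a later record reports an oldest running xid of at least 105, whereas toy_restore_serialized jumps straight to the consistent state regardless of what is still running.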