Hi,
On 2017-04-15 05:18:49 +0200, Petr Jelinek wrote:
> + /*
> + * c) we already seen the xl_running_xacts and tried to do the above.
> + * However because of race condition in LogStandbySnapshot() there might
> + * have been transaction reported as running but in reality has already
> + * written commit record before the xl_running_xacts so decoding has
> + * missed it. We now see xl_running_xacts that suggests all transactions
> + * from the original one were closed but the consistent state wasn't
> + * reached which means the race condition has indeed happened.
> + *
> + * Start tracking again as if this was the first xl_running_xacts we've
> + * seen, with the advantage that because decoding was already running,
> + * any transactions committed before the xl_running_xacts record will be
> + * known to us so we won't hit with the same issue again.
> + */
Unfortunately I don't think that's true, as coded. You're using
information about committed transactions:
> + else if (TransactionIdFollows(running->oldestRunningXid,
> + builder->running.xmax))
> + {
> + int off;
> +
> + SnapBuildStartXactTracking(builder, running);
> +
> /*
> + * Nark any transactions that are known to have committed before the
> + * xl_running_xacts as finished to avoid the race condition in
> + * LogStandbySnapshot().
> *
> + * We can use SnapBuildEndTxn directly as it only does the
> + * transaction running check and handling without any additional
> + * side effects.
> */
> + for (off = 0; off < builder->committed.xcnt; off++)
> + SnapBuildEndTxn(builder, lsn, builder->committed.xip[off]);
but a transaction might just have *aborted* before the new snapshot, no?
Since we don't keep track of those, I don't think this guarantees anything?
ISTM, we need a xip_status array in SnapBuild->running. Then, whenever
a xl_running_xacts is encountered while builder->running.xcnt > 0, loop
for i in 0 .. xcnt_space, and end SnapBuildEndTxn() if xip_status[i] but
the xid is not xl_running_xacts? Does that make sense?
- Andres