On Thu, Sep 9, 2021 at 8:33 PM Antonin Houska <ah@cybertec.at> wrote:
>
> The cfbot complained that the patch series no longer applies, so I've rebased
> it and also tried to make sure that the other flags become green.
>
> One particular problem was that pg_upgrade complained that "live undo data"
> remains in the old cluster. I found out that the temporary undo log causes the
> problem, so I've adjusted the query in check_for_undo_data() accordingly until
> the problem gets fixed properly.
>
> The problem with the temporary undo log is that it's loaded into local buffers
> and the backend can exit w/o flushing local buffers to disk, and thus we are
> not guaranteed to find enough information when trying to discard the undo log
> the backend wrote. I'm thinking about the following solutions:
>
> 1. Let the backend manage temporary undo log on its own (even the slot
> metadata would stay outside the shared memory, and in particular the
> insertion pointer could start from 1 for each session) and remove the
> segment files at the same moment the temporary relations are removed.
>
> However, by moving the temporary undo slots away from the shared memory,
> computation of oldestFullXidHavingUndo (see the PROC_HDR structure) would
> be affected. It might seem that a transaction which only writes undo log
> for temporary relations does not need to affect oldestFullXidHavingUndo,
> but it needs to be analyzed thoroughly. Since oldestFullXidHavingUndo
> prevents transactions from being truncated from the CLOG too early, I wonder if
> the following is possible (This scenario is only applicable to the zheap
> storage engine [1], which is not included in this patch, but should already
> be considered.):
>
> A transaction creates a temporary table, does some (many) changes and then
> gets rolled back. The undo records are being applied and it takes some
> time. Since XID of the transaction did not affect oldestFullXidHavingUndo,
> the XID can disappear from the CLOG due to truncation.
>
By the above, do you mean to say that in the zheap code, we don't consider
XIDs that operate on temp tables/undo for oldestFullXidHavingUndo?
> However zundo.c in
> [1] indicates that the transaction status *is* checked during undo
> execution, so we might have a problem.
>
It would be easier to follow if you could point to the exact code you are
referring to here.
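
To make sure we are talking about the same hazard, the scenario quoted above
can be modelled roughly as below. This is only a toy sketch in Python, not
PostgreSQL or zheap code: ToyClog, truncation_cutoff() and simulate() are
hypothetical names, and the cutoff is merely a stand-in for how
oldestFullXidHavingUndo might bound CLOG truncation.

```python
class ToyClog:
    """Toy commit log: maps XID -> status and can be truncated below a cutoff."""

    def __init__(self):
        self.status = {}
        self.oldest_kept = 0

    def set_status(self, xid, status):
        self.status[xid] = status

    def truncate_before(self, cutoff):
        # Forget every XID older than the cutoff.
        for xid in [x for x in self.status if x < cutoff]:
            del self.status[xid]
        self.oldest_kept = max(self.oldest_kept, cutoff)

    def get_status(self, xid):
        if xid < self.oldest_kept:
            raise LookupError(f"status of XID {xid} was truncated")
        return self.status[xid]


def truncation_cutoff(next_xid, xids_having_undo):
    # Hypothetical stand-in for oldestFullXidHavingUndo: the oldest XID that
    # still has unapplied undo, or next_xid if no such XID is tracked.
    return min(xids_having_undo, default=next_xid)


def simulate(count_temp_undo):
    """Return the status seen while applying undo for a temp-table transaction."""
    clog = ToyClog()
    temp_xid, next_xid = 100, 200
    # The transaction was rolled back, but its undo is still being applied.
    clog.set_status(temp_xid, "aborted")
    # Assumption under discussion: is temp undo visible to the cutoff or not?
    tracked = {temp_xid} if count_temp_undo else set()
    clog.truncate_before(truncation_cutoff(next_xid, tracked))
    try:
        return clog.get_status(temp_xid)  # status check during undo execution
    except LookupError:
        return "lost"
```

In this model, simulate(True) still finds the transaction status, while
simulate(False) loses it because the cutoff moves past the XID before undo
execution finishes. Whether the real code can get into the second state is
exactly what I am asking about.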
--
With Regards,
Amit Kapila.