Discussion: All-visible pages with valid prune xid are confusing
Hi,
We currently do not set pd_prune_xid (the oldest prunable XID) when
replaying XLOG_HEAP2_PRUNE* records. We've never done this, AFAICT.
Since 8.3, this comment has been in the pruning redo function:
* Note: we don't worry about updating the page's prunability hints.
* At worst this will cause an extra prune cycle to occur soon.
During normal operation, when a page has no prunable tuples, we set
pd_prune_xid to InvalidTransactionId. But during recovery, the old
value is left behind.
When we then set the page all-visible in the VM, the page is marked
all-visible but the prune hint claims there are prunable tuples. On
the standby, this triggers an unnecessary prune cycle of almost all
all-visible pages the next time they are accessed. However, I think
the page being in this confusing state is the bigger problem. It's not
incorrect, but it seems like it could mask actual page corruption
(e.g. when there are dead tuples and we mistakenly set the page
all-visible).
Fixing this would require adding the prune xid to the WAL record.
UPDATE/DELETE WAL records don't have to include the new prune xid
because they set the page prune hint to the xlog record's transaction
ID.
If we don't think the overhead of the extra transaction ID in the WAL
record is worth it, we could set the prune hint to
InvalidTransactionId during recovery if the page is all-visible. This
would at least avoid that confusing page state.
- Melanie
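
A minimal sketch of the fallback described above (clear the prune hint during recovery when the page is all-visible), assuming a reduced stand-in for the page header. `AvPage` and `av_redo_fixup_prune_hint` are hypothetical names for illustration only, not PostgreSQL code, and the all-visible state is modeled as a plain boolean.

```c
#include <stdbool.h>
#include <stdint.h>

/* Stand-ins for PostgreSQL's types; not the real definitions. */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical, reduced model of a heap page header. */
typedef struct AvPage
{
    TransactionId prune_xid;    /* models pd_prune_xid */
    bool          all_visible;  /* models the page being marked all-visible */
} AvPage;

/*
 * Hypothetical redo-side fixup: an all-visible page has nothing left to
 * prune, so a leftover prune hint is cleared. Pages that are not
 * all-visible keep whatever hint they already carry.
 */
void
av_redo_fixup_prune_hint(AvPage *page)
{
    if (page->all_visible)
        page->prune_xid = InvalidTransactionId;
}
```

This needs no change to the WAL record, but it only removes the contradictory state; it does not make the standby's hint match the primary's on pages that are not all-visible.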
On 02/12/2025 18:20, Melanie Plageman wrote:
> Hi,
>
> We currently do not set pd_prune_xid (the oldest prunable XID) when
> replaying XLOG_HEAP2_PRUNE* records. We've never done this, AFAICT.
>
> Since 8.3, this comment has been in the pruning redo function:
>
> * Note: we don't worry about updating the page's prunability hints.
> * At worst this will cause an extra prune cycle to occur soon.
>
> During normal operation, when a page has no prunable tuples, we set
> pd_prune_xid to InvalidTransactionId. But during recovery, the old
> value is left behind.
>
> When we then set the page all-visible in the VM, the page is marked
> all-visible but the prune hint claims there are prunable tuples. On
> the standby, this triggers an unnecessary prune cycle of almost all
> all-visible pages the next time they are accessed. However, I think
> the page being in this confusing state is the bigger problem. It's not
> incorrect, but it seems like it could mask actual page corruption
> (e.g. when there are dead tuples and we mistakenly set the page
> all-visible).
>
> Fixing this would require adding the prune xid to the WAL record.
> UPDATE/DELETE WAL records don't have to include the new prune xid
> because they set the page prune hint to the xlog record's transaction
> ID.
>
> If we don't think the overhead of the extra transaction ID in the WAL
> record is worth it, we could set the prune hint to
> InvalidTransactionId during recovery if the page is all-visible. This
> would at least avoid that confusing page state.

Hmm. If the page has no prunable tuples left, it makes sense to set
pd_prune_xid to InvalidTransactionId to avoid the useless round of
pruning. In other cases, it would make sense to set it to some XID so
that it gets pruned later. But a standby will only start pruning if it's
later promoted to become a primary. At that point, all currently running
transactions will be finished (except for prepared transactions).

Therefore it doesn't seem important what we set the prune XID to. Any
valid XID that's not in the future should do the trick. So how about
adding a boolean flag to the WAL record, to indicate whether there's
anything prunable left on the page or not?

- Heikki
On Tue, Dec 2, 2025 at 12:49 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> Hmm. If the page has no prunable tuples left, it makes sense to set
> pd_prune_xid to InvalidTransactionId to avoid the useless round of
> pruning. In other cases, it would make sense to set it to some XID so
> that it gets pruned later. But a standby will only start pruning if it's
> later promoted to become a primary. At that point, all currently running
> transactions will be finished (except for prepared transactions).

What about on-access pruning during SELECT queries on a hot standby?

- Melanie
Hi,

On December 2, 2025 1:23:57 PM EST, Melanie Plageman <melanieplageman@gmail.com> wrote:
>On Tue, Dec 2, 2025 at 12:49 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>>
>> Hmm. If the page has no prunable tuples left, it makes sense to set
>> pd_prune_xid to InvalidTransactionId to avoid the useless round of
>> pruning. In other cases, it would make sense to set it to some XID so
>> that it gets pruned later. But a standby will only start pruning if it's
>> later promoted to become a primary. At that point, all currently running
>> transactions will be finished (except for prepared transactions).
>
>What about on-access pruning during SELECT queries on a hot standby?

There's no on-access-pruning on the hot standby itself, it'd lead to
divergence between primary and standby (and you couldn't WAL log it).

Greetings,

Andres
On Tue, Dec 2, 2025 at 1:41 PM Andres Freund <andres@anarazel.de> wrote:
>
> On December 2, 2025 1:23:57 PM EST, Melanie Plageman <melanieplageman@gmail.com> wrote:
>
> >What about on-access pruning during SELECT queries on a hot standby?
>
> There's no on-access-pruning on the hot standby itself, it'd lead to
> divergence between primary and standby (and you couldn't WAL log it).

Ah, right.

> Therefore it doesn't seem important what we set the prune XID to. Any
> valid XID that's not in the future should do the trick. So how about
> adding a boolean flag to the WAL record, to indicate whether there's
> anything prunable left on the page or not?

We could add a flag to xl_heap_prune flags (which is now a uint16 and
thus has room) to indicate that there are prunable tuples.

In terms of finding some XID to set it to, could we do what updates
and deletes do and use the XLogRecord->xl_xid?

- Melanie
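
To make the flag idea concrete, here is a rough sketch under stated assumptions: `FLAG_HAS_PRUNABLE`, `FlagPruneRecord`, `FlagPage` and `flag_redo_set_prune_hint` are hypothetical names, and the structs are reduced stand-ins rather than the real xl_heap_prune or page header. The only detail taken from the thread is that the flags field is a uint16 with spare bits.

```c
#include <stdint.h>

/* Stand-ins for PostgreSQL's types; not the real definitions. */
typedef uint32_t TransactionId;
#define InvalidTransactionId ((TransactionId) 0)

/* Hypothetical flag bit: "prunable tuples remain on the page". */
#define FLAG_HAS_PRUNABLE (1 << 0)

/* Hypothetical, reduced model of the prune WAL record. */
typedef struct FlagPruneRecord
{
    uint16_t flags;             /* the thread notes the real field is a uint16 */
} FlagPruneRecord;

/* Hypothetical, reduced model of a heap page header. */
typedef struct FlagPage
{
    TransactionId prune_xid;    /* models pd_prune_xid */
} FlagPage;

/*
 * Hypothetical redo step: if the primary reported prunable tuples, store
 * the caller-supplied XID as the hint; otherwise clear the hint.
 */
void
flag_redo_set_prune_hint(FlagPage *page, const FlagPruneRecord *rec,
                         TransactionId hint_xid)
{
    if (rec->flags & FLAG_HAS_PRUNABLE)
        page->prune_xid = hint_xid;
    else
        page->prune_xid = InvalidTransactionId;
}
```

The sketch takes `hint_xid` as a parameter on purpose: which XID to store when the flag is set is the open question, and the caller would have to supply a valid XID that is not in the future.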
On 02/12/2025 21:13, Melanie Plageman wrote:
> In terms of finding some XID to set it to, could we do what updates
> and deletes do and use the XLogRecord->xl_xid?

A prune record can have XLogRecord->xl_xid == InvalidTransactionId, if the
transaction hasn't been assigned a transaction ID yet. I think
ReadNextTransactionId() - 1 would work. (Using TransactionIdRetreat rather
than plain - 1, of course)

- Heikki
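
A small illustration of why a retreat helper rather than plain - 1 matters here. This is a self-contained sketch modeled on my understanding of TransactionIdRetreat (step back, then keep stepping while the value falls in the reserved range below FirstNormalTransactionId). `xid_retreat` is a hypothetical stand-in function, not the PostgreSQL macro, and the constants are redefined locally.

```c
#include <stdint.h>

/* Stand-ins for PostgreSQL's types and constants; redefined locally. */
typedef uint32_t TransactionId;
#define FirstNormalTransactionId ((TransactionId) 3)

/*
 * Hypothetical helper: return the XID preceding 'xid', skipping the
 * special XIDs below FirstNormalTransactionId. Unsigned arithmetic wraps,
 * so stepping back from the first normal XID lands on the largest
 * 32-bit value.
 */
TransactionId
xid_retreat(TransactionId xid)
{
    do
    {
        xid--;
    } while (xid < FirstNormalTransactionId);

    return xid;
}
```

With a next XID of 100 this returns 99, the same as plain subtraction. With a next XID of 3, plain subtraction would give 2, which is a reserved value and not a normal XID, while the helper wraps around to 4294967295.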
Hi,

On 2025-12-02 21:39:42 +0200, Heikki Linnakangas wrote:
> On 02/12/2025 21:13, Melanie Plageman wrote:
> > In terms of finding some XID to set it to, could we do what updates
> > and deletes do and use the XLogRecord->xl_xid?
>
> A prune record can have XLogRecord->xl_xid == InvalidTransactionId, if the
> transaction hasn't been assigned a transaction ID yet. I think
> ReadNextTransactionId() - 1 would work. (Using TransactionIdRetreat rather
> than plain - 1, of course)

I think it'd be preferable if we just made the records large enough to
maintain the same pd_prune_xid on the standby as on the primary. The closer
the pages are on primary & standby the better, any allowed divergence makes
it harder to find bugs imo.

In comparison to the space the relevant records use, it doesn't seem like
including an accurate prune xid would take a lot of space?

Greetings,

Andres