Hi,
On 2020-02-13 16:16:14 -0500, Tom Lane wrote:
> In principle perhaps we could improve the granularity of dead-row
> detection, so that if a row version is both created and deleted by
> the current transaction, and we have no live snapshots that could
> see it, we could go ahead and mark the row dead. But it's not clear
> that that'd be worth the extra cost to do. Certainly no existing PG
> release tries to do it.
I've repeatedly wondered about improving our logic around this. There's
a lot of cases where we deal with a lot of bloat solely because our
simplistic liveliness analysis.
It's not just within a single transaction, but also makes the impact of
longrunning transactions significantly worse. It's common to have
"areas" of some tables that change quickly, without normally causing a
lot of problems - but once there is a single longrunning transaction the
amount of bloat created is huge. It's not that bad to have the "hot
areas" increased in size by 2-3x, but right now it'll often be several
orders of magnitude.
But perhaps it doesn't make sense to conflate your suggestion above with
what I brought up: There'd might not be a lot of common
code/infrastructure between deleting row versions that are invisible due
to no backend having a snapshot to see them (presumably inferred via
xmin/xmax), and newly created row versions within a transaction that are
invisible because there's no snapshot with that cid.
Greetings,
Andres Freund