On 2015-04-15 08:42:33 -0400, Simon Riggs wrote:
> > Because it makes subsequent accesses to the page cheaper.
>
> Cheaper for whom?
Everyone, including future readers. Following HOT chains in read-mostly
workloads can be really expensive. If you have a workload with a 'hot'
value range that's frequently updated, but that range moves, you can
easily end up with heavily chained tuples that won't be touched by a
writer again any time soon.
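To make the cost concrete, here's a toy model (not server code; the chain lengths are made up for illustration): a reader has to visit every member of a HOT chain to reach the live version, so every reader pays for every dead version until the page is pruned.

```python
def visits_to_find_live(chain_length):
    """Tuple visits a reader needs to reach the live version at the
    end of a HOT chain of the given length (one visit per member)."""
    return chain_length

# A 'hot' value updated 10 times before the workload's hot range moves on:
before_prune = visits_to_find_live(10)  # every subsequent reader pays 10 visits
after_prune = visits_to_find_live(1)    # pruning leaves just the live tuple
print(before_prune, after_prune)        # -> 10 1
```

Without a writer coming back to that page, readers keep paying the long-chain cost indefinitely.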
And writers will often not yet be able to prune the page because there
are still live readers of the older versions (e.g. other updaters).
> > Of
> > course, that applies in all cases, but when the page is already dirty,
> > the cost of pruning it is probably quite small - we're going to have
> > to write the page anyway, and pruning it before it gets evicted
> > (perhaps even by our scan) will be cheaper than writing it now and
> > writing it again after it's pruned. When the page is clean, the cost
> > of pruning is significantly higher.
>
> "We" aren't going to have to write the page, but someone will.
If it's already dirty, that doesn't change at all. *Not* pruning at that
moment will actually often *increase* the total number of writes to the
OS, because the pruning will then happen on the next write access or
vacuum - when the page might already have been written back clean.
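A toy write-count accounting (my assumption about the scenario, not server code) shows the amplification: pruning while the page is still dirty piggybacks on the pending write, while deferring re-dirties a page that was already written back clean.

```python
def total_page_writes(prune_while_dirty):
    """Count writes to the OS for one page that is currently dirty."""
    if prune_while_dirty:
        # Prune now: the already-pending write covers both changes.
        return 1
    # Otherwise the dirty page is written out as-is ...
    writes = 1
    # ... and the next write access or vacuum prunes it, dirtying the
    # now-clean page again and forcing a second write.
    writes += 1
    return writes

print(total_page_writes(True), total_page_writes(False))  # -> 1 2
```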
I don't really see the downside to this suggestion.
> The actions you suggest are reasonable and should ideally be the role
> of a background process. But that doesn't mean in the absence of that
> we should pay the cost in the foreground.
I'm not sure that's true. A background process will either cause
additional read IO to find worthwhile pages, or it will fail to find
worthwhile pages because they've already been paged out.
Greetings,
Andres Freund