Обсуждение: Setting vacuum_freeze_min_age really low

Поиск
Список
Период
Сортировка

Setting vacuum_freeze_min_age really low

От
Josh Berkus
Дата:
Folks,

In the past, setting vacuum_freeze_min_age (vfma) really low (say to
10000 or 50000) would have caused lots of extra writing work due to
dirtying extra pages for freezing.  This has been our stated reason to
keep vfma high, despite the obvious advantage of freezing tuples while
they're still in the cache.

With the visibility map, though, vfma should only be dirtying pages
which vacuum is already visiting because there's dirty tuples on the
page.  That is, pages which vacuum will probably dirty anyway, freezing
or not.  (This is assuming one has applied the 9.2.3 update.)

Given that, it seems like the cost of lowering vfma *should* be
marginal.  The only extra work done by a lower vfma should be:

1. extra cpu time to put in the froxenXIDs on vacuumed pages, and
2. dirtying the minority of pages which vacuum decided to scan, but not
write to.

The second point is the one where I'm not sure how to evaluate.  How
likely, as of 9.2, is vacuum to visit a page and not dirty it?  And are
there other costs I'm not thinking of?

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Setting vacuum_freeze_min_age really low

От
Robert Haas
Дата:
On Mon, Mar 25, 2013 at 4:31 PM, Josh Berkus <josh@agliodbs.com> wrote:
> In the past, setting vacuum_freeze_min_age (vfma) really low (say to
> 10000 or 50000) would have caused lots of extra writing work due to
> dirtying extra pages for freezing.  This has been our stated reason to
> keep vfma high, despite the obvious advantage of freezing tuples while
> they're still in the cache.

That, and Tom's concern about forensics, which I understand to be the
larger sticking point.

> With the visibility map, though, vfma should only be dirtying pages
> which vacuum is already visiting because there's dirty tuples on the
> page.  That is, pages which vacuum will probably dirty anyway, freezing
> or not.  (This is assuming one has applied the 9.2.3 update.)

I think this is probably not accurate, although I'll defer to someone
with more real-world experience.  I'd guess that it's uncommon for
actively updated data and very-rarely-updated data to be mixed
together on the same pages with any real regularity.  IOW, the dirty
pages probably don't have anything on them that can be frozen anyway.

So, if the table's age is less than vacuum_freeze_table_age, we'll
only scan pages not already marked all-visible.  Regardless of vfma,
we probably won't freeze much.

On the other hand, if the table's age is at least
vacuum_freeze_table_age, we'll scan the whole table and freeze a lotta
stuff all at once.  Again, whether vfma is high or low won't matter
much: it's definitely less than vacuum_freeze_table_age.

Basically, I would guess that both the costs and the benefits of
changing this are pretty small.  It would be nice to hear from someone
who has tried it, though.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Setting vacuum_freeze_min_age really low

От
Andres Freund
Дата:
Hi,

On 2013-03-25 13:31:17 -0700, Josh Berkus wrote:
> In the past, setting vacuum_freeze_min_age (vfma) really low (say to
> 10000 or 50000) would have caused lots of extra writing work due to
> dirtying extra pages for freezing.  This has been our stated reason to
> keep vfma high, despite the obvious advantage of freezing tuples while
> they're still in the cache.
>
> With the visibility map, though, vfma should only be dirtying pages
> which vacuum is already visiting because there's dirty tuples on the
> page.  That is, pages which vacuum will probably dirty anyway, freezing
> or not.  (This is assuming one has applied the 9.2.3 update.)
>
> Given that, it seems like the cost of lowering vfma *should* be
> marginal.  The only extra work done by a lower vfma should be:
>
> 1. extra cpu time to put in the froxenXIDs on vacuumed pages, and
> 2. dirtying the minority of pages which vacuum decided to scan, but not
> write to.

It will also often enough lead to a page being frozen repeatedly which
causes unneccessary IO and WAL traffic. If a page contains pages from
several transactions its not unlikely that some tuples are older and
some are newer than vfma. That scenario isn't unlikely because of two
scenarios:
- INSERT/UPDATE reusing space on older pages where tuples have been
  deleted.
- When a backend extends a relation that page is *not* known to have
  free space to other relations. Until vacuum comes along for the first
  time only this backend will use its space. Given that busy clusters
  frequently burn loads of xids per second it is not uncommon to have a
  wide range of xids on such a page.

> And are there other costs I'm not thinking of?

I think (but am not 100% sure right now) it would have another rather
big cost:
When a page contains freezable items, as determined by freeze_min_age,
and we are doing a full table scan we won't skip buffers that we can't
lock for cleanup. Instead we will wait and then lock them for
cleanup. So I think this would be rather noticeably impact the speed of
vacuum (since it waits more often) and concurrency (since we lock more
buffers than before, even if they are actively used).

Makes sense?

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Setting vacuum_freeze_min_age really low

От
Andres Freund
Дата:
On 2013-05-09 12:09:04 -0400, Robert Haas wrote:
> On Mon, Mar 25, 2013 at 4:31 PM, Josh Berkus <josh@agliodbs.com> wrote:
> > In the past, setting vacuum_freeze_min_age (vfma) really low (say to
> > 10000 or 50000) would have caused lots of extra writing work due to
> > dirtying extra pages for freezing.  This has been our stated reason to
> > keep vfma high, despite the obvious advantage of freezing tuples while
> > they're still in the cache.
>
> That, and Tom's concern about forensics, which I understand to be the
> larger sticking point.

FWIW I found having sensible xmin/xmax repeatedly really useful for
debugging problems. Most problems don't get noticed within minutes so
loosing evidence that fast is bad.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Setting vacuum_freeze_min_age really low

От
Josh Berkus
Дата:
Robert, Andres,

> That, and Tom's concern about forensics, which I understand to be the
> larger sticking point.

I don't buy the idea that we should cause regular recurring performance
issues for all of our users in order to aid diagnosing the kind of
issues which happen 1% of the time to 2% of our users.

> So, if the table's age is less than vacuum_freeze_table_age, we'll
> only scan pages not already marked all-visible.  Regardless of vfma,
> we probably won't freeze much.

Right, but the pages which were dirtied *anyway* will get frozen.

> On the other hand, if the table's age is at least
> vacuum_freeze_table_age, we'll scan the whole table and freeze a lotta
> stuff all at once.  Again, whether vfma is high or low won't matter
> much: it's definitely less than vacuum_freeze_table_age.

Right.

> Basically, I would guess that both the costs and the benefits of
> changing this are pretty small.  It would be nice to hear from someone
> who has tried it, though.

Well, I have, but I don't exactly have empirical testing results from
it.  That's really the sticking point here: can we measurably
demonstrate that lowering vfma makes autovacuum freeze happen less
often, and do less work when it does?  Realistically, I think that's
waiting on me having time to do some lengthy performance testing.

> It will also often enough lead to a page being frozen repeatedly which
> causes unneccessary IO and WAL traffic. If a page contains pages from
> several transactions its not unlikely that some tuples are older and
> some are newer than vfma. That scenario isn't unlikely because of two
> scenarios:

Nobody has yet explained to me where this extra WAL and IO traffic would
come from.  vfma only takes effect if the page is being vacuumed
*anyway*.  And if the page is being vacuumed anyway, the page is being
rewritten anyway, and it doesn't matter how many changes we make on that
page, except as far as CPU time is concerned.  As far as IO is
concerned, an 8K page is an 8K page.  No?

The only time I can imagine this resulting in extra IO is if vacuum is
regularly visiting pages which don't have any other work to do, but do
have tuples which could be frozen if vfma was lowered.  I would tend to
think that this would be a tiny minority of pages, but testing may be
the only way to answer that.

> When a page contains freezable items, as determined by freeze_min_age,
> and we are doing a full table scan we won't skip buffers that we can't
> lock for cleanup. Instead we will wait and then lock them for
> cleanup. So I think this would be rather noticeably impact the speed of
> vacuum (since it waits more often) and concurrency (since we lock more
> buffers than before, even if they are actively used).

Well, that behavior sounds like something we should maybe fix,
regardless of whether we're lowering the default vfma or not.

--Josh Berkus






--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: Setting vacuum_freeze_min_age really low

От
Andres Freund
Дата:
Hi Josh,

On 2013-05-11 16:28:32 -0700, Josh Berkus wrote:
> > That, and Tom's concern about forensics, which I understand to be the
> > larger sticking point.
>
> I don't buy the idea that we should cause regular recurring performance
> issues for all of our users in order to aid diagnosing the kind of
> issues which happen 1% of the time to 2% of our users.

Well. For one you haven't proven that the changed setting actually
improves performance. So the comparison isn't really valid. We will
still need full table vacuums to be able to change relfrozenxids. Also,
he small percentages are the cases where the shit really hit the
fan. Making sure you have at least some chance of a) diagnosing the
issue b) recovering data is a pretty good thing.

> > So, if the table's age is less than vacuum_freeze_table_age, we'll
> > only scan pages not already marked all-visible.  Regardless of vfma,
> > we probably won't freeze much.

> Right, but the pages which were dirtied *anyway* will get frozen.

I think you're missing the fact that we don't neccessarily dirty pages,
just because vacuum visits them. In a mostly insert workload its not
uncommon that vacuum doesn't change anything. In many scenarios the
first time vacuum visits a page it cannot yet me marked "all-visible"
yet so we will visit again soon after anyway. And after that there will
be regular full table vacuums.

> > It will also often enough lead to a page being frozen repeatedly which
> > causes unneccessary IO and WAL traffic. If a page contains pages from
> > several transactions its not unlikely that some tuples are older and
> > some are newer than vfma. That scenario isn't unlikely because of two
> > scenarios:
>
> Nobody has yet explained to me where this extra WAL and IO traffic would
> come from.  vfma only takes effect if the page is being vacuumed
> *anyway*.

There's multiple points here:
a) we don't necessarily write/dirty anything if vacuum doesn't find
   anything to do
b) freezing tuples requires a xlog_heap_freeze wal record to be
   emitted. If we don't freeze, we don't need to emit it.

>  And if the page is being vacuumed anyway, the page is being
> rewritten anyway, and it doesn't matter how many changes we make on that
> page, except as far as CPU time is concerned.  As far as IO is
> concerned, an 8K page is an 8K page.  No?

Sure, *if* we writeout the page, it doesn't matter at all whether we
changed one byte or all of them. Unless it also requires extra xlog
records to be emitted.

> The only time I can imagine this resulting in extra IO is if vacuum is
> regularly visiting pages which don't have any other work to do, but do
> have tuples which could be frozen if vfma was lowered.  I would tend to
> think that this would be a tiny minority of pages, but testing may be
> the only way to answer that.

INSERT only produces workloads like that.

> > When a page contains freezable items, as determined by freeze_min_age,
> > and we are doing a full table scan we won't skip buffers that we can't
> > lock for cleanup. Instead we will wait and then lock them for
> > cleanup. So I think this would be rather noticeably impact the speed of
> > vacuum (since it waits more often) and concurrency (since we lock more
> > buffers than before, even if they are actively used).
>
> Well, that behavior sounds like something we should maybe fix,
> regardless of whether we're lowering the default vfma or not.

Well, that's easier said than done ;)

I wonder if we couldn't do the actual freezeing - not the dead tuple
deletion - without a cleanup but just with an exclusive lock?

I think I have said that before, but anyway: I think as long as we need
to regularly walk the whole relation for correctness there isn't much
hope to get this into an acceptable state. If we would track the oldest
xid in a page in a 'freeze map' we could make much of this more
efficient and way more scalable to bigger data volumes.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Setting vacuum_freeze_min_age really low

От
Robert Haas
Дата:
On Sun, May 12, 2013 at 8:50 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> [ a response that I entirely agree with ]

+1 to all that.

It's maybe worth noting that it's probably fairly uncommon for vacuum
to read a page and not dirty it, because if the page is all-visible,
we won't read it.  And if it's not all-visible, and there's nothing
else interesting to do with it, we'll probably make it all-visible,
which will dirty it.  It can happen, if for example we vacuum a page
with no dead tuples while the inserting transaction is still running,
or committed but not yet all-visible.  Of course, in those cases we
won't be able to freeze, either.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


Re: Setting vacuum_freeze_min_age really low

От
Andres Freund
Дата:
On 2013-05-13 13:21:54 -0400, Robert Haas wrote:
> On Sun, May 12, 2013 at 8:50 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > [ a response that I entirely agree with ]
>
> +1 to all that.

> It's maybe worth noting that it's probably fairly uncommon for vacuum
> to read a page and not dirty it, because if the page is all-visible,
> we won't read it.

But only if 50(?)+ pages are marked all-visible in one go, otherwise we
afair won't skip afair. And we don't skip them at all during full table
vacuums.

>  And if it's not all-visible, and there's nothing
> else interesting to do with it, we'll probably make it all-visible,
> which will dirty it.  It can happen, if for example we vacuum a page
> with no dead tuples while the inserting transaction is still running,
> or committed but not yet all-visible.  Of course, in those cases we
> won't be able to freeze, either.

IIRC the actual values below which we freeze are always computed
relative to GetOldestXmin() (and have to, otherwise rows will suddently
appear visible). In many, many environment thats lagging behind quite a
bit. Longrunning user transactions, pg_dump, hot_standby_feedback,
vacuum_defer_cleanup_age...

Also, even if the *whole* page isn't all visible because e.g. there just
was another row inserted we still freeze individual rows.

Greetings,

Andres Freund

--
 Andres Freund                       http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


Re: Setting vacuum_freeze_min_age really low

От
Josh Berkus
Дата:
HAndres,

> Well. For one you haven't proven that the changed setting actually
> improves performance. So the comparison isn't really valid. We will

I agree that I haven't proven this yet, but that doesn't make it
invalid.  Just unproven.

I agree that performance testing is necessary ... and the kind of
performance testing which generated freeze activity, which makes it harder.

> I think you're missing the fact that we don't neccessarily dirty pages,
> just because vacuum visits them. In a mostly insert workload its not
> uncommon that vacuum doesn't change anything. In many scenarios the

Hmmm.  But does vacuum visit the pages anyway, in that case?

> b) freezing tuples requires a xlog_heap_freeze wal record to be
>    emitted. If we don't freeze, we don't need to emit it.

Oh, that's annoying.

> I think I have said that before, but anyway: I think as long as we need
> to regularly walk the whole relation for correctness there isn't much
> hope to get this into an acceptable state. If we would track the oldest
> xid in a page in a 'freeze map' we could make much of this more
> efficient and way more scalable to bigger data volumes.

Yeah, or come up with some way to eliminate freezing entirely.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com