On Sat, Feb 2, 2013 at 5:25 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-02-01 15:09:34 -0800, Jeff Janes wrote:
>> On Fri, Feb 1, 2013 at 2:34 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> > On 2013-02-01 14:05:46 -0800, Jeff Janes wrote:
>>
>> >> As far as I can tell this bug kicks in when your cluster gets to be
>> >> older than freeze_min_age, and then lasts forever after. After that
>> >> point pretty much every auto-vacuum inspired by update/deletion
>> >> activity will get promoted to a full table scan. (Which makes me
>> >> wonder how much field-testing the vm-only vacuum has received, if it
>> >> was rarely happening in practice due to this bug.)
>> >
>> > I think you're misreading the code. freezeTableLimit is calculated by
>>
>> >> > limit = ReadNewTransactionId() - freezetable;
>>
>> > which is always relative to the current xid. The bug was that
>> > freezetable had the wrong value in autovac due to freeze_min_age being
>> > used instead of freeze_table_age.
>>
>> Right. Since freeze_min_age was mistakenly being used, the limit
>> would be 50 million in the past (rather than 150 million) under
>> defaults. But since the last full-table vacuum, whenever that was,
>> used freeze_min_age for its intended purpose, that means the 50
>> million in the past *at the time of that last vacuum* is the highest
>> that relfrozenxid can be. And that is going to be further back than
>> 50 million from right now, so the vacuum will always be promoted to a
>> full scan.
>
> Oh, wow. Youre right. I shouldn't answer emails after sport with cramped
> fingers on a friday night... And I should have thought about this
> scenario, because I essentially already explained it upthread, just with
> a different set of variables.
>
> This is rather scary. How come nobody noticed that this major
> performance improvement was effectively disabled for that long?
I'm not sure whom to address this to, but the just-committed release
notes for this issue reflect the original understanding that it only
applied when vacuum_freeze_min_age was lowered from its default.
Rather than the current understanding that it effects all old-enough
systems.
If the release notes are not already baked in, I would suggest this wording:
+ The main consequence of this mistake is that it
+ caused full-table vacuuming scans to occur much more frequently
+ than intended.
Cheers,
Jeff