Обсуждение: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

Поиск
Список
Период
Сортировка

Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

От
Alexey Bashtanov
Дата:
Hello!

Autovacuum daemon performs vacuum when the number of rows 
updated/deleted (n_dead_tuples) reaches some threshold.
Similarly it performs analyze when the number of rows changed in any way 
(incl. inserted).
When a table is mostly insert-only, its visibility map is not updated as 
vacuum threshold is almost never reached, but analyze does not update 
visibility map.

Why could it be a bad idea to run vacuum after some number of any 
changes including inserts, like analyze?
Or at least make it tunable by user (add a second bunch of paramters to 
control second vacuum threshold, disabled by default)?

Best regards,  Alexey Bashtanov



Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

От
Bruce Momjian
Дата:
On Thu, Oct  9, 2014 at 02:34:17PM +0400, Alexey Bashtanov wrote:
> Hello!
> 
> Autovacuum daemon performs vacuum when the number of rows
> updated/deleted (n_dead_tuples) reaches some threshold.
> Similarly it performs analyze when the number of rows changed in any
> way (incl. inserted).
> When a table is mostly insert-only, its visibility map is not
> updated as vacuum threshold is almost never reached, but analyze
> does not update visibility map.
> 
> Why could it be a bad idea to run vacuum after some number of any
> changes including inserts, like analyze?
> Or at least make it tunable by user (add a second bunch of paramters
> to control second vacuum threshold, disabled by default)?

I agree this is a serious problem.  We have discussed various options,
but have not decided on anything.  The TODO list has:
https://wiki.postgresql.org/wiki/Todo
Improve setting of visibility map bits for read-only and insert-onlyworkloads
http://www.postgresql.org/message-id/20130906001437.GA29264@momjian.us


--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + Everyone has their own god. +



Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

От
Alvaro Herrera
Дата:
Bruce Momjian wrote:

> I agree this is a serious problem.  We have discussed various options,
> but have not decided on anything.  The TODO list has:
> 
>     https://wiki.postgresql.org/wiki/Todo
> 
>     Improve setting of visibility map bits for read-only and insert-only
>     workloads
>     
>       http://www.postgresql.org/message-id/20130906001437.GA29264@momjian.us

I hate to repeat myself, but I think autovacuum could be modified to run
actions other than vacuum and analyze.  In this specific case we could
be running a table scan that checks only pages that don't have the
all-visible bit set, and see if it can be set.  (Of course, this idea
needs refinement to avoid running over and over when the bit cannot be
set on some pages for whatever reason.)

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

От
Andres Freund
Дата:
On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote:
> Bruce Momjian wrote:
> 
> > I agree this is a serious problem.  We have discussed various options,
> > but have not decided on anything.  The TODO list has:
> > 
> >     https://wiki.postgresql.org/wiki/Todo
> > 
> >     Improve setting of visibility map bits for read-only and insert-only
> >     workloads
> >     
> >       http://www.postgresql.org/message-id/20130906001437.GA29264@momjian.us
> 
> I hate to repeat myself, but I think autovacuum could be modified to run
> actions other than vacuum and analyze.  In this specific case we could
> be running a table scan that checks only pages that don't have the
> all-visible bit set, and see if it can be set.

Isn't that *precisely* what a plain vacuum run does?

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

От
Kevin Grittner
Дата:
Alvaro Herrera <alvherre@2ndquadrant.com> wrote:
> Bruce Momjian wrote:

>> I agree this is a serious problem.  We have discussed various options,
>> but have not decided on anything.  The TODO list has:
>>
>>     https://wiki.postgresql.org/wiki/Todo
>>
>>     Improve setting of visibility map bits for read-only and insert-only
>>     workloads
>>
>>       http://www.postgresql.org/message-id/20130906001437.GA29264@momjian.us
>
> I hate to repeat myself, but I think autovacuum could be modified to run
> actions other than vacuum and analyze.  In this specific case we could
> be running a table scan that checks only pages that don't have the
> all-visible bit set, and see if it can be set.  (Of course, this idea
> needs refinement to avoid running over and over when the bit cannot be
> set on some pages for whatever reason.)

Wouldn't we get substantially the same thing just by counting tuple
inserts toward the autovacuum vacuum threshold?  I mean, it unless
the table is due for wraparound prevention autovacuum, it will only
visit pages that don't have the all-visible bit set, right?  And
how much work would that do beyond what you're describing if none
of the tuples are dead?

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

От
Alvaro Herrera
Дата:
Andres Freund wrote:
> On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote:
> > Bruce Momjian wrote:
> > 
> > > I agree this is a serious problem.  We have discussed various options,
> > > but have not decided on anything.  The TODO list has:
> > > 
> > >     https://wiki.postgresql.org/wiki/Todo
> > > 
> > >     Improve setting of visibility map bits for read-only and insert-only
> > >     workloads
> > >     
> > >       http://www.postgresql.org/message-id/20130906001437.GA29264@momjian.us
> > 
> > I hate to repeat myself, but I think autovacuum could be modified to run
> > actions other than vacuum and analyze.  In this specific case we could
> > be running a table scan that checks only pages that don't have the
> > all-visible bit set, and see if it can be set.
> 
> Isn't that *precisely* what a plain vacuum run does?

Well, it also scans for dead tuples, removes them, and needs to go
through indexes to remove their references.  I'm thinking in something
very lightweight.  Otherwise, why don't we just reduce the
vacuum_scale_factor default to something very small, so that vacuum is
triggered more often?

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

От
Alvaro Herrera
Дата:
Kevin Grittner wrote:

> Wouldn't we get substantially the same thing just by counting tuple
> inserts toward the autovacuum vacuum threshold?  I mean, it unless
> the table is due for wraparound prevention autovacuum, it will only
> visit pages that don't have the all-visible bit set, right?  And
> how much work would that do beyond what you're describing if none
> of the tuples are dead?

The problem is precisely what happens if there are some dead tuples, but
not enough to reach the 20% threshold: this vacuum now has to scan the
table twice and has to clean up indexes also.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

От
Andres Freund
Дата:
On 2014-10-09 18:16:46 -0300, Alvaro Herrera wrote:
> Andres Freund wrote:
> > On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote:
> > > Bruce Momjian wrote:
> > > 
> > > > I agree this is a serious problem.  We have discussed various options,
> > > > but have not decided on anything.  The TODO list has:
> > > > 
> > > >     https://wiki.postgresql.org/wiki/Todo
> > > > 
> > > >     Improve setting of visibility map bits for read-only and insert-only
> > > >     workloads
> > > >     
> > > >       http://www.postgresql.org/message-id/20130906001437.GA29264@momjian.us
> > > 
> > > I hate to repeat myself, but I think autovacuum could be modified to run
> > > actions other than vacuum and analyze.  In this specific case we could
> > > be running a table scan that checks only pages that don't have the
> > > all-visible bit set, and see if it can be set.
> > 
> > Isn't that *precisely* what a plain vacuum run does?
> 
> Well, it also scans for dead tuples, removes them, and needs to go
> through indexes to remove their references.

IIRC it doesn't do most of that if that there's no need. And if it's a
insert only table without rollbacks. I *do* think there's some
optimizations we could make in general.

> I'm thinking in something
> very lightweight.  Otherwise, why don't we just reduce the
> vacuum_scale_factor default to something very small, so that vacuum is
> triggered more often?

The problem here is that that doesn't trigger for inserts. Just for
updates/deletes or rollbacks.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

От
Jim Nasby
Дата:
On 10/9/14, 4:03 PM, Alvaro Herrera wrote:
> Bruce Momjian wrote:
>
>> I agree this is a serious problem.  We have discussed various options,
>> but have not decided on anything.  The TODO list has:
>>
>>     https://wiki.postgresql.org/wiki/Todo
>>
>>     Improve setting of visibility map bits for read-only and insert-only
>>     workloads
>>     
>>       http://www.postgresql.org/message-id/20130906001437.GA29264@momjian.us
>
> I hate to repeat myself, but I think autovacuum could be modified to run
> actions other than vacuum and analyze.  In this specific case we could
> be running a table scan that checks only pages that don't have the
> all-visible bit set, and see if it can be set.  (Of course, this idea
> needs refinement to avoid running over and over when the bit cannot be
> set on some pages for whatever reason.)

If we go down that road we should also think about having it proactively set hint bits...
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

От
Jim Nasby
Дата:
On 10/9/14, 4:19 PM, Andres Freund wrote:
> On 2014-10-09 18:16:46 -0300, Alvaro Herrera wrote:
>> >Andres Freund wrote:
>>> > >On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote:
>>>> > > >Bruce Momjian wrote:
>>>> > > >
>>>>> > > > >I agree this is a serious problem.  We have discussed various options,
>>>>> > > > >but have not decided on anything.  The TODO list has:
>>>>> > > > >
>>>>> > > > >    https://wiki.postgresql.org/wiki/Todo
>>>>> > > > >
>>>>> > > > >    Improve setting of visibility map bits for read-only and insert-only
>>>>> > > > >    workloads
>>>>> > > > >    
>>>>> > > > >    http://www.postgresql.org/message-id/20130906001437.GA29264@momjian.us
>>>> > > >
>>>> > > >I hate to repeat myself, but I think autovacuum could be modified to run
>>>> > > >actions other than vacuum and analyze.  In this specific case we could
>>>> > > >be running a table scan that checks only pages that don't have the
>>>> > > >all-visible bit set, and see if it can be set.
>>> > >
>>> > >Isn't that*precisely*  what a plain vacuum run does?
>> >
>> >Well, it also scans for dead tuples, removes them, and needs to go
>> >through indexes to remove their references.
> IIRC it doesn't do most of that if that there's no need. And if it's a
> insert only table without rollbacks. I*do*  think there's some
> optimizations we could make in general.

No, it always attempts dead tuple removal. The "weird" part is that if it's not doing a freeze it will just punt on a
pageif it can't get the cleanup lock. I have to believe that could seriously screw up autovacuum scheduling.
 

Now that we have forks, I'm wondering if it would be best to come up with a per-page system that could be used to
determinewhen a table needs background work to be done. The visibility map could serve a lot of this purpose, but I'm
notsure if it would work for getting hint bits set in the background. I think it would also be a win if we had a way to
advancerelfrozenxid and relminmxid. Perhaps something that simply remembered the last XID that touched each page...
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

От
Andres Freund
Дата:
On 2014-10-18 21:36:48 -0500, Jim Nasby wrote:
> On 10/9/14, 4:19 PM, Andres Freund wrote:
> >On 2014-10-09 18:16:46 -0300, Alvaro Herrera wrote:
> >>>Andres Freund wrote:
> >>>> >On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote:
> >>>>> > >Bruce Momjian wrote:
> >>>>> > >
> >>>>>> > > >I agree this is a serious problem.  We have discussed various options,
> >>>>>> > > >but have not decided on anything.  The TODO list has:
> >>>>>> > > >
> >>>>>> > > >    https://wiki.postgresql.org/wiki/Todo
> >>>>>> > > >
> >>>>>> > > >    Improve setting of visibility map bits for read-only and insert-only
> >>>>>> > > >    workloads
> >>>>>> > > >    
> >>>>>> > > >    http://www.postgresql.org/message-id/20130906001437.GA29264@momjian.us
> >>>>> > >
> >>>>> > >I hate to repeat myself, but I think autovacuum could be modified to run
> >>>>> > >actions other than vacuum and analyze.  In this specific case we could
> >>>>> > >be running a table scan that checks only pages that don't have the
> >>>>> > >all-visible bit set, and see if it can be set.
> >>>> >
> >>>> >Isn't that*precisely*  what a plain vacuum run does?
> >>>
> >>>Well, it also scans for dead tuples, removes them, and needs to go
> >>>through indexes to remove their references.

> >IIRC it doesn't do most of that if that there's no need. And if it's a
> >insert only table without rollbacks. I*do*  think there's some
> >optimizations we could make in general.
> 
> No, it always attempts dead tuple removal.

I said some steps, not all steps. Check it out:
   /* If any tuples need to be deleted, perform final vacuum cycle */   /* XXX put a threshold on min number of tuples
here?*/   if (vacrelstats->num_dead_tuples > 0)   {       /* Log cleanup info before we touch indexes */
vacuum_log_cleanup_info(onerel,vacrelstats);
 
       /* Remove index entries */       for (i = 0; i < nindexes; i++)           lazy_vacuum_index(Irel[i],
               &indstats[i],                             vacrelstats);       /* Remove tuples from heap */
lazy_vacuum_heap(onerel,vacrelstats);       vacrelstats->num_index_scans++;   }
 

There's rub here though. We unconditionally do:/* Do post-vacuum cleanup and statistics update for each index */for (i
=0; i < nindexes; i++)    lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
 

and that's not particularly cheap. Maybe we should make that conditional
when there's been no lazy_vacuum_index/heap calls at all?

> The "weird" part is that if it's not doing a freeze it will just punt
> on a page if it can't get the cleanup lock.

I don't think that's particularly wierd. Otherwise vacuum can get stuck
behind a single very hot page - leading to much, much more bloat.

> I have to believe that could seriously screw up autovacuum scheduling.

Why?

> Now that we have forks, I'm wondering if it would be best to come up
> with a per-page system that could be used to determine when a table
> needs background work to be done. The visibility map could serve a lot
> of this purpose, but I'm not sure if it would work for getting hint
> bits set in the background.

It would. Per definition, all tuples that are 'all visible' need to be
fully hint bitted.

> I think it would also be a win if we had a way to advance relfrozenxid
> and relminmxid. Perhaps something that simply remembered the last XID
> that touched each page...

Not sure what you're getting at here?

I think the big missing piece lest something like Heikki's xid lsn
ranges thing gets finished is a freeze map.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

От
Tom Lane
Дата:
Andres Freund <andres@2ndquadrant.com> writes:
> There's rub here though. We unconditionally do:
>     /* Do post-vacuum cleanup and statistics update for each index */
>     for (i = 0; i < nindexes; i++)
>         lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);

> and that's not particularly cheap. Maybe we should make that conditional
> when there's been no lazy_vacuum_index/heap calls at all?

Absolutely not.  If the cleanup step is skippable, it would be the
province of the index AM to make that decision.
        regards, tom lane



Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

От
Andres Freund
Дата:
On 2014-10-19 12:50:30 -0400, Tom Lane wrote:
> Andres Freund <andres@2ndquadrant.com> writes:
> > There's rub here though. We unconditionally do:
> >     /* Do post-vacuum cleanup and statistics update for each index */
> >     for (i = 0; i < nindexes; i++)
> >         lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
> 
> > and that's not particularly cheap. Maybe we should make that conditional
> > when there's been no lazy_vacuum_index/heap calls at all?
> 
> Absolutely not.  If the cleanup step is skippable, it would be the
> province of the index AM to make that decision.

Fair point. At the moment we're doing a full of nbtree indexes everytime
we do a vacuum. Even when the heap vacuum only scanned a couple hundred
pages of a huge table. That makes partial vacuum noticeably less
useful. So I do think we need to do something to improve upon the
situation.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

От
Jim Nasby
Дата:

On 10/19/14, 11:41 AM, Andres Freund wrote:
> On 2014-10-18 21:36:48 -0500, Jim Nasby wrote:
>> On 10/9/14, 4:19 PM, Andres Freund wrote:
>>> On 2014-10-09 18:16:46 -0300, Alvaro Herrera wrote:
>>>>> Andres Freund wrote:
>>>>>>> On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote:
>>>>>>>>> Bruce Momjian wrote:
>>>>>>>>>
>>>>>>>>>>> I agree this is a serious problem.  We have discussed various options,
>>>>>>>>>>> but have not decided on anything.  The TODO list has:
>>>>>>>>>>>
>>>>>>>>>>>     https://wiki.postgresql.org/wiki/Todo
>>>>>>>>>>>
>>>>>>>>>>>     Improve setting of visibility map bits for read-only and insert-only
>>>>>>>>>>>     workloads
>>>>>>>>>>>     
>>>>>>>>>>>     http://www.postgresql.org/message-id/20130906001437.GA29264@momjian.us
>>>>>>>>>
>>>>>>>>> I hate to repeat myself, but I think autovacuum could be modified to run
>>>>>>>>> actions other than vacuum and analyze.  In this specific case we could
>>>>>>>>> be running a table scan that checks only pages that don't have the
>>>>>>>>> all-visible bit set, and see if it can be set.
>>>>>>>
>>>>>>> Isn't that*precisely*  what a plain vacuum run does?
>>>>>
>>>>> Well, it also scans for dead tuples, removes them, and needs to go
>>>>> through indexes to remove their references.
>
>>> IIRC it doesn't do most of that if that there's no need. And if it's a
>>> insert only table without rollbacks. I*do*  think there's some
>>> optimizations we could make in general.
>>
>> No, it always attempts dead tuple removal.
>
> I said some steps, not all steps. Check it out:
>
>      /* If any tuples need to be deleted, perform final vacuum cycle */
>      /* XXX put a threshold on min number of tuples here? */
>      if (vacrelstats->num_dead_tuples > 0)
>      {
>          /* Log cleanup info before we touch indexes */
>          vacuum_log_cleanup_info(onerel, vacrelstats);
>
>          /* Remove index entries */
>          for (i = 0; i < nindexes; i++)
>              lazy_vacuum_index(Irel[i],
>                                &indstats[i],
>                                vacrelstats);
>          /* Remove tuples from heap */
>          lazy_vacuum_heap(onerel, vacrelstats);
>          vacrelstats->num_index_scans++;
>      }
>
> There's rub here though. We unconditionally do:
>     /* Do post-vacuum cleanup and statistics update for each index */
>     for (i = 0; i < nindexes; i++)
>         lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
>
> and that's not particularly cheap. Maybe we should make that conditional
> when there's been no lazy_vacuum_index/heap calls at all?

We could possibly pass in to lazy_cleanup_index whether we actually removed any tuples.

>> The "weird" part is that if it's not doing a freeze it will just punt
>> on a page if it can't get the cleanup lock.
>
> I don't think that's particularly wierd. Otherwise vacuum can get stuck
> behind a single very hot page - leading to much, much more bloat.
>
>> I have to believe that could seriously screw up autovacuum scheduling.
>
> Why?

I'm worried there could be some pathological cases where we'd skip a large number of pages, perhaps if a vacuum scan
anda seqscan ended up running alongside each other.
 

Perhaps this is just paranoia, but we have no idea how bad things might be, because we don't have any logging for how
manypages we skipped because we couldn't lock them.
 

Also, if this really is that big a deal for heap pages, how come we don't get screwed by it on Btree index pages, where
wemandate that we acquire a cleanup lock?
 

>> Now that we have forks, I'm wondering if it would be best to come up
>> with a per-page system that could be used to determine when a table
>> needs background work to be done. The visibility map could serve a lot
>> of this purpose, but I'm not sure if it would work for getting hint
>> bits set in the background.
>
> It would. Per definition, all tuples that are 'all visible' need to be
> fully hint bitted.
>
>> I think it would also be a win if we had a way to advance relfrozenxid
>> and relminmxid. Perhaps something that simply remembered the last XID
>> that touched each page...
>
> Not sure what you're getting at here?

That ultimately, our current method for determining when and what to vacuum is rather crude, and likely results in
wastedeffort during scans as well as not firing autovac often enough. Keep in mind that autovac started as a user-space
utilityand the best it could possibly do was to keep a table of stats counters.
 

The visibility map obviously helps cut down on extra work during a scan, but it only goes so far in that regard.

Instead of relying on the crude methods, if we reliably tracked certain txids on a per-block basis in a fork, we could
cheaplyscan the fork and make an extremely informed decision on how much a vacuum would gain us, and exactly what
blocksit should hit.
 

Let me use freezing as an example. If we had a reliable list of the lowest txid for each block of a relation that would
allowus to do a freeze scan by hitting only blocks with minimum txid within our freeze range. The same could be done
formultixacts.
 

That's just one example. We could do something similar for background hinting (oldest xmin/xmax of all non-hinted
tuples),and whether it's worth trying to vacuum (I think that'd be a combination of oldest non-locking xmax and seeing
howmuch room the FSM has listed for the page).
 

If we stored 3 txids for each block in a fork, we could fit information for ~680 heap blocks in each fork block. So in
adatabase with 680G of heap data, we could fully determine every *block* (not table) we needed to vacuum by scanning
just1GB of data. That would allow for far better autovacuum scheduling than what we do today.
 

> I think the big missing piece lest something like Heikki's xid lsn
> ranges thing gets finished is a freeze map.

The problem with a simple freeze map is when do you actually set the bit? If you do it while the transaction that
createdall the tuples is still running then any attempt to use the map prior to those tuples becoming all visible is
pointless.Presumably this is why pd_prune_xid stores a txid and isn't just a boolean.
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

От
Andres Freund
Дата:
On 2014-10-19 20:43:29 -0500, Jim Nasby wrote:
> On 10/19/14, 11:41 AM, Andres Freund wrote:
> >On 2014-10-18 21:36:48 -0500, Jim Nasby wrote:
> >>The "weird" part is that if it's not doing a freeze it will just punt
> >>on a page if it can't get the cleanup lock.
> >
> >I don't think that's particularly wierd. Otherwise vacuum can get stuck
> >behind a single very hot page - leading to much, much more bloat.
> >
> >>I have to believe that could seriously screw up autovacuum scheduling.
> >
> >Why?
> 
> I'm worried there could be some pathological cases where we'd skip a
> large number of pages, perhaps if a vacuum scan and a seqscan ended up
> running alongside each other.

I've seen little evidence of that. The reverse, a stuck autovacuum, is
imo much more likely. For this to be an actual problem you'd need to
encounter many pages that are not locked, but are pinned. That state
doesn't exist for very long.

> Perhaps this is just paranoia, but we have no idea how bad things
> might be, because we don't have any logging for how many pages we
> skipped because we couldn't lock them.

But so what? If we skip individual pages it won't be too bad - and very
likely waiting very long is going to be more painful. The page won't be
marked 'all visible' so the next vacuum will come around to it
again. And it'll also get cleaned up by opportunistic hot pruning.

> Also, if this really is that big a deal for heap pages, how come we
> don't get screwed by it on Btree index pages, where we mandate that we
> acquire a cleanup lock?

Because we never hold pins for btree pages for very long. Whereas we do
that for heap pages. If you e.g. run a cursor forward you can hold a pin
for essentially unbounded time.

> >>Now that we have forks, I'm wondering if it would be best to come up
> >>with a per-page system that could be used to determine when a table
> >>needs background work to be done. The visibility map could serve a lot
> >>of this purpose, but I'm not sure if it would work for getting hint
> >>bits set in the background.
> >
> >It would. Per definition, all tuples that are 'all visible' need to be
> >fully hint bitted.
> >
> >>I think it would also be a win if we had a way to advance relfrozenxid
> >>and relminmxid. Perhaps something that simply remembered the last XID
> >>that touched each page...
> >
> >Not sure what you're getting at here?
> 
> That ultimately, our current method for determining when and what to
> vacuum is rather crude, and likely results in wasted effort during
> scans as well as not firing autovac often enough. Keep in mind that
> autovac started as a user-space utility and the best it could possibly
> do was to keep a table of stats counters.

I agree that we should trigger autovacuum more often. It's
*intentionally* not triggered *at all* for insert only workloads (if you
discount anti wraparound vacuums). I think it's time to change that. For
that we'd need to make vacuums that don't delete any tuples cheaper. We
already rescan only the changed parts of the heaps - but we always scan
indexes fully...

> The visibility map obviously helps cut down on extra work during a
> scan, but it only goes so far in that regard.

Aha.

> Instead of relying on the crude methods, if we reliably tracked
> certain txids on a per-block basis in a fork, we could cheaply scan
> the fork and make an extremely informed decision on how much a vacuum
> would gain us, and exactly what blocks it should hit.

> Let me use freezing as an example. If we had a reliable list of the
> lowest txid for each block of a relation that would allow us to do a
> freeze scan by hitting only blocks with minimum txid within our freeze
> range. The same could be done for multixacts.

It'd also become a prime contention point because you'd need to
constantly update it. In contrast to a simple 'is frozen' bit (akin to
is_visible) which only changes infrequently, and only in one direction.

> If we stored 3 txids for each block in a fork, we could fit
> information for ~680 heap blocks in each fork block. So in a database
> with 680G of heap data, we could fully determine every *block* (not
> table) we needed to vacuum by scanning just 1GB of data. That would
> allow for far better autovacuum scheduling than what we do today.

It's not that simple. Wraparounds and locking complicate it
significantly.

> >I think the big missing piece lest something like Heikki's xid lsn
> >ranges thing gets finished is a freeze map.
> 
> The problem with a simple freeze map is when do you actually set the
> bit?

There's precisely one place where you can set it for normal
operation. During vacuum's scan.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

От
Jim Nasby
Дата:
On 10/20/14, 3:11 PM, Andres Freund wrote:
> On 2014-10-19 20:43:29 -0500, Jim Nasby wrote:
>> On 10/19/14, 11:41 AM, Andres Freund wrote:
>>> On 2014-10-18 21:36:48 -0500, Jim Nasby wrote:
>>>> The "weird" part is that if it's not doing a freeze it will just punt
>>>> on a page if it can't get the cleanup lock.
>>>
>>> I don't think that's particularly wierd. Otherwise vacuum can get stuck
>>> behind a single very hot page - leading to much, much more bloat.
>>>
>>>> I have to believe that could seriously screw up autovacuum scheduling.
>>>
>>> Why?
>>
>> I'm worried there could be some pathological cases where we'd skip a
>> large number of pages, perhaps if a vacuum scan and a seqscan ended up
>> running alongside each other.
>
> I've seen little evidence of that. The reverse, a stuck autovacuum, is
> imo much more likely. For this to be an actual problem you'd need to
> encounter many pages that are not locked, but are pinned. That state
> doesn't exist for very long.

How would you actually get evidence of this... we don't log it. :) (See my proposal at
http://www.postgresql.org/message-id/54446C10.2080203@BlueTreble.com)

>> Perhaps this is just paranoia, but we have no idea how bad things
>> might be, because we don't have any logging for how many pages we
>> skipped because we couldn't lock them.
>
> But so what? If we skip individual pages it won't be too bad - and very
> likely waiting very long is going to be more painful. The page won't be
> marked 'all visible' so the next vacuum will come around to it
> again. And it'll also get cleaned up by opportunistic hot pruning.

Probably true. Hopefully we can start logging it and then we'll know for sure.


>> That ultimately, our current method for determining when and what to
>> vacuum is rather crude, and likely results in wasted effort during
>> scans as well as not firing autovac often enough. Keep in mind that
>> autovac started as a user-space utility and the best it could possibly
>> do was to keep a table of stats counters.
>
> I agree that we should trigger autovacuum more often. It's
> *intentionally* not triggered *at all* for insert only workloads (if you
> discount anti wraparound vacuums). I think it's time to change that. For
> that we'd need to make vacuums that don't delete any tuples cheaper. We
> already rescan only the changed parts of the heaps - but we always scan
> indexes fully...

Or maybe vacuum isn't the right way to handle some of these scenarios. It's become the catch-all for all of this stuff,
butmaybe that doesn't make sense anymore. Certainly when it comes to dealing with inserts there's no reason we *have*
todo anything other than set hint bits and possibly freeze xmin.
 

>> Instead of relying on the crude methods, if we reliably tracked
>> certain txids on a per-block basis in a fork, we could cheaply scan
>> the fork and make an extremely informed decision on how much a vacuum
>> would gain us, and exactly what blocks it should hit.
>
>> Let me use freezing as an example. If we had a reliable list of the
>> lowest txid for each block of a relation that would allow us to do a
>> freeze scan by hitting only blocks with minimum txid within our freeze
>> range. The same could be done for multixacts.
>
> It'd also become a prime contention point because you'd need to
> constantly update it. In contrast to a simple 'is frozen' bit (akin to
> is_visible) which only changes infrequently, and only in one direction.

Actually, the contention on freeze would very possibly be minimal, because it probably doesn't change very often. Even
ifit did, it's OK if the value isn't 100% accurate, so long as the recorded XID is guaranteed older than what's
actuallyon the page.
 

>> If we stored 3 txids for each block in a fork, we could fit
>> information for ~680 heap blocks in each fork block. So in a database
>> with 680G of heap data, we could fully determine every *block* (not
>> table) we needed to vacuum by scanning just 1GB of data. That would
>> allow for far better autovacuum scheduling than what we do today.
>
> It's not that simple. Wraparounds and locking complicate it
> significantly.

I realize what I'm talking about isn't trivial (though, I'm confused by your comment about wraparound since presumably
TransactionIdPrecedes()and it's ilk solve that problem...)
 

My ultimate point here is that we're using what are (today) very crude methods to control what gets vacuumed when, and
Ithink that now that we have resource forks would could do *much* better without a tremendous amount of work. But to
makea big advancement here we'll need to take a step back and rethink some things (like vacuum is the only way to
handlethese problems).
 

Let me put some thought into this.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

От
Josh Berkus
Дата:
On 10/20/2014 05:39 PM, Jim Nasby wrote:
> Or maybe vacuum isn't the right way to handle some of these scenarios.
> It's become the catch-all for all of this stuff, but maybe that doesn't
> make sense anymore. Certainly when it comes to dealing with inserts
> there's no reason we *have* to do anything other than set hint bits and
> possibly freeze xmin.

+1

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com



Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

От
Andres Freund
Дата:
On 2014-10-20 17:43:26 -0700, Josh Berkus wrote:
> On 10/20/2014 05:39 PM, Jim Nasby wrote:
> > Or maybe vacuum isn't the right way to handle some of these scenarios.
> > It's become the catch-all for all of this stuff, but maybe that doesn't
> > make sense anymore. Certainly when it comes to dealing with inserts
> > there's no reason we *have* to do anything other than set hint bits and
> > possibly freeze xmin.
> 
> +1

A page read is a page read. What's the point of heaving another process
do it? Vacuum doesn't dirty pages if they don't have to be
dirtied. Especially stuff like freezing cannot really be dealt with
outside of vacuum unless you make already complex stuff more complex for
a marginal benefit.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

От
Jeff Janes
Дата:
On Mon, Oct 20, 2014 at 5:46 PM, Andres Freund <andres@2ndquadrant.com> wrote:
On 2014-10-20 17:43:26 -0700, Josh Berkus wrote:
> On 10/20/2014 05:39 PM, Jim Nasby wrote:
> > Or maybe vacuum isn't the right way to handle some of these scenarios.
> > It's become the catch-all for all of this stuff, but maybe that doesn't
> > make sense anymore. Certainly when it comes to dealing with inserts
> > there's no reason we *have* to do anything other than set hint bits and
> > possibly freeze xmin.
>
> +1

A page read is a page read. What's the point of heaving another process
do it?

It is only a page read if you have to read the page.  It would seem optimal to have bgwriter adventitiously set hint bits and vm bits, because that is the last point at which the page can be changed without risking that it be written out twice. At that point, it has been given the maximum amount of time it can be given for the interested transactions to have committed and to have aged past the xmin horizon.  I seem to recall that the main problem with that, though, is that you must be attached to a database in order to determine visibility, and bgwriter is not attached to a database.

Cheers,

Jeff

Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

От
Jim Nasby
Дата:
On 10/21/14, 4:36 PM, Jeff Janes wrote:
> On Mon, Oct 20, 2014 at 5:46 PM, Andres Freund <andres@2ndquadrant.com <mailto:andres@2ndquadrant.com>> wrote:
>
>     On 2014-10-20 17:43:26 -0700, Josh Berkus wrote:
>     > On 10/20/2014 05:39 PM, Jim Nasby wrote:
>     > > Or maybe vacuum isn't the right way to handle some of these scenarios.
>     > > It's become the catch-all for all of this stuff, but maybe that doesn't
>     > > make sense anymore. Certainly when it comes to dealing with inserts
>     > > there's no reason we *have* to do anything other than set hint bits and
>     > > possibly freeze xmin.
>     >
>     > +1
>
>     A page read is a page read. What's the point of heaving another process
>     do it?
>
>
> It is only a page read if you have to read the page.  It would seem optimal to have bgwriter adventitiously set hint
bitsand vm bits, because that is the last point at which the page can be changed without risking that it be written out
twice.At that point, it has been given the maximum amount of time it can be given for the interested transactions to
havecommitted and to have aged past the xmin horizon.  I seem to recall that the main problem with that, though, is
thatyou must be attached to a database in order to determine visibility, and bgwriter is not attached to a database.
 

It's also a bit more complex than a simple question of "is the page still in shared buffers". Our *real* last chance is
whenthe page is about to be evicted from the filesystem cache; after that reading it back it will be extremely
expensive(relatively speaking).
 

I think it's worth considering this, because if you have any moderate length transactions on a busy database bgwriter
won'tbe able to help much; you'll be burning through shared buffers too quickly.
 
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

От
Alvaro Herrera
Дата:
Jeff Janes wrote:

> It is only a page read if you have to read the page.  It would seem optimal
> to have bgwriter adventitiously set hint bits and vm bits, because that is
> the last point at which the page can be changed without risking that it be
> written out twice. At that point, it has been given the maximum amount of
> time it can be given for the interested transactions to have committed and
> to have aged past the xmin horizon.  I seem to recall that the main problem
> with that, though, is that you must be attached to a database in order to
> determine visibility, and bgwriter is not attached to a database.

Regarding tuple hint bits, I couldn't find any such limitation in
SetHintBits, other than in MarkBufferDirtyHint there being some code
that would cause trouble: it accesses MyPgXact, which bgwriter would
obviously not have.  Maybe worth some experimentation ...

I'm not sure about vm bits, though.  That's a whole different topic.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

От
Robert Haas
Дата:
On Mon, Oct 27, 2014 at 5:51 PM, Alvaro Herrera
<alvherre@2ndquadrant.com> wrote:
> Jeff Janes wrote:
>> It is only a page read if you have to read the page.  It would seem optimal
>> to have bgwriter adventitiously set hint bits and vm bits, because that is
>> the last point at which the page can be changed without risking that it be
>> written out twice. At that point, it has been given the maximum amount of
>> time it can be given for the interested transactions to have committed and
>> to have aged past the xmin horizon.  I seem to recall that the main problem
>> with that, though, is that you must be attached to a database in order to
>> determine visibility, and bgwriter is not attached to a database.
>
> Regarding tuple hint bits, I couldn't find any such limitation in
> SetHintBits, other than in MarkBufferDirtyHint there being some code
> that would cause trouble: it accesses MyPgXact, which bgwriter would
> obviously not have.  Maybe worth some experimentation ...
>
> I'm not sure about vm bits, though.  That's a whole different topic.

From a theoretical point of view, hint bits and VM bits present
mostly-similar issues; they are basically dependent on the state of
the transaction machinery, which for the most part is cluster-wide, so
it should be fine for a process to do that sort of work without being
bound to a specific database.   I think the biggest problem for either
is that the bgwriter can't look at the system catalogs to find out
whether a particular buffer contains a heap page or an index page (or
something else) and it needs to know that, which probably requires
another buffer flag.

From a practical point of view, there's a major performance difference
between the two operations: setting a VM bit requires emitting a WAL
record, whereas setting hint bits does not.  That might make it too
expensive to be worth doing, at least in some situations.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company