Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables

From: Jim Nasby
Subject: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
Date:
Msg-id: 544468C1.6050101@BlueTreble.com
In reply to: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables  (Andres Freund <andres@2ndquadrant.com>)
Responses: Re: Autovacuum fails to keep visibility map up-to-date in mostly-insert-only-tables
List: pgsql-hackers

On 10/19/14, 11:41 AM, Andres Freund wrote:
> On 2014-10-18 21:36:48 -0500, Jim Nasby wrote:
>> On 10/9/14, 4:19 PM, Andres Freund wrote:
>>> On 2014-10-09 18:16:46 -0300, Alvaro Herrera wrote:
>>>> Andres Freund wrote:
>>>>> On 2014-10-09 18:03:00 -0300, Alvaro Herrera wrote:
>>>>>> Bruce Momjian wrote:
>>>>>>
>>>>>>> I agree this is a serious problem.  We have discussed various options,
>>>>>>> but have not decided on anything.  The TODO list has:
>>>>>>>
>>>>>>>     https://wiki.postgresql.org/wiki/Todo
>>>>>>>
>>>>>>>     Improve setting of visibility map bits for read-only and insert-only
>>>>>>>     workloads
>>>>>>>
>>>>>>>     http://www.postgresql.org/message-id/20130906001437.GA29264@momjian.us
>>>>>>
>>>>>> I hate to repeat myself, but I think autovacuum could be modified to run
>>>>>> actions other than vacuum and analyze.  In this specific case we could
>>>>>> be running a table scan that checks only pages that don't have the
>>>>>> all-visible bit set, and see if it can be set.
>>>>>
>>>>> Isn't that *precisely* what a plain vacuum run does?
>>>>
>>>> Well, it also scans for dead tuples, removes them, and needs to go
>>>> through indexes to remove their references.
>
>>> IIRC it doesn't do most of that if there's no need. And if it's an
>>> insert-only table without rollbacks. I *do* think there are some
>>> optimizations we could make in general.
>>
>> No, it always attempts dead tuple removal.
>
> I said some steps, not all steps. Check it out:
>
>      /* If any tuples need to be deleted, perform final vacuum cycle */
>      /* XXX put a threshold on min number of tuples here? */
>      if (vacrelstats->num_dead_tuples > 0)
>      {
>          /* Log cleanup info before we touch indexes */
>          vacuum_log_cleanup_info(onerel, vacrelstats);
>
>          /* Remove index entries */
>          for (i = 0; i < nindexes; i++)
>              lazy_vacuum_index(Irel[i],
>                                &indstats[i],
>                                vacrelstats);
>          /* Remove tuples from heap */
>          lazy_vacuum_heap(onerel, vacrelstats);
>          vacrelstats->num_index_scans++;
>      }
>
> There's a rub here though. We unconditionally do:
>     /* Do post-vacuum cleanup and statistics update for each index */
>     for (i = 0; i < nindexes; i++)
>         lazy_cleanup_index(Irel[i], indstats[i], vacrelstats);
>
> and that's not particularly cheap. Maybe we should make that conditional
> when there have been no lazy_vacuum_index/heap calls at all?

We could possibly pass lazy_cleanup_index a flag indicating whether we actually removed any tuples.
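
Rough sketch of what I have in mind (the extra boolean parameter is hypothetical;
lazy_cleanup_index() doesn't take one today):

    /*
     * Sketch only: tell the cleanup pass whether this vacuum removed anything.
     * num_index_scans is only incremented after a lazy_vacuum_index()/
     * lazy_vacuum_heap() cycle, so it doubles as "did we delete tuples?".
     */
    for (i = 0; i < nindexes; i++)
        lazy_cleanup_index(Irel[i], indstats[i], vacrelstats,
                           vacrelstats->num_index_scans > 0);

An index AM could then skip the expensive stats recomputation entirely when nothing was removed.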

>> The "weird" part is that if it's not doing a freeze it will just punt
>> on a page if it can't get the cleanup lock.
>
> I don't think that's particularly weird. Otherwise vacuum can get stuck
> behind a single very hot page - leading to much, much more bloat.
>
>> I have to believe that could seriously screw up autovacuum scheduling.
>
> Why?

I'm worried there could be some pathological cases where we'd skip a large number of pages,
perhaps if a vacuum scan and a seqscan ended up running alongside each other.

Perhaps this is just paranoia, but we have no idea how bad things might be, because we don't
have any logging for how many pages we skipped because we couldn't lock them.

Also, if this really is that big a deal for heap pages, how come we don't get screwed by it
on Btree index pages, where we mandate that we acquire a cleanup lock?

>> Now that we have forks, I'm wondering if it would be best to come up
>> with a per-page system that could be used to determine when a table
>> needs background work to be done. The visibility map could serve a lot
>> of this purpose, but I'm not sure if it would work for getting hint
>> bits set in the background.
>
> It would. By definition, all tuples that are 'all visible' need to be
> fully hint bitted.
>
>> I think it would also be a win if we had a way to advance relfrozenxid
>> and relminmxid. Perhaps something that simply remembered the last XID
>> that touched each page...
>
> Not sure what you're getting at here?

That ultimately, our current method for determining when and what to vacuum is rather crude,
and likely results in wasted effort during scans as well as not firing autovac often enough.
Keep in mind that autovac started as a user-space utility, and the best it could possibly do
was to keep a table of stats counters.

The visibility map obviously helps cut down on extra work during a scan, but it only goes so far in that regard.

Instead of relying on these crude methods, if we reliably tracked certain txids on a per-block
basis in a fork, we could cheaply scan the fork and make an extremely informed decision on how
much a vacuum would gain us, and exactly which blocks it should hit.

Let me use freezing as an example. If we had a reliable list of the lowest txid for each block
of a relation, that would allow us to do a freeze scan by hitting only blocks with a minimum
txid within our freeze range. The same could be done for multixacts.
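
To make that concrete, a freeze pass driven by such a fork might look roughly like this
(minxid_fork_get() and lazy_freeze_block() are made-up names for the fork lookup and the
per-block work; TransactionIdIsNormal(), TransactionIdPrecedes() and FreezeLimit already exist):

    BlockNumber blkno;

    for (blkno = 0; blkno < nblocks; blkno++)
    {
        /* Hypothetical: read this block's lowest XID out of the fork */
        TransactionId minxid = minxid_fork_get(onerel, blkno);

        /* Skip any block with nothing old enough to freeze */
        if (!TransactionIdIsNormal(minxid) ||
            !TransactionIdPrecedes(minxid, FreezeLimit))
            continue;

        lazy_freeze_block(onerel, blkno, vacrelstats);
    }

Blocks that don't qualify are never read at all.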

That's just one example. We could do something similar for background hinting (oldest xmin/xmax
of all non-hinted tuples), and whether it's worth trying to vacuum (I think that'd be a
combination of oldest non-locking xmax and seeing how much room the FSM has listed for the page).

If we stored 3 txids for each block in a fork, we could fit information for ~680 heap blocks in
each fork block. So in a database with 680G of heap data, we could fully determine every *block*
(not table) we needed to vacuum by scanning just 1GB of data. That would allow for far better
autovacuum scheduling than what we do today.
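
For reference, the math works out like this (the entry layout is purely illustrative):

    /* Hypothetical fork entry: three 4-byte XIDs per heap block */
    typedef struct BlockVacInfo
    {
        TransactionId min_unfrozen_xid;     /* drives freeze scans */
        TransactionId min_unhinted_xid;     /* drives background hinting */
        TransactionId max_nonlocking_xmax;  /* drives vacuum decisions */
    } BlockVacInfo;                         /* 12 bytes */

    /*
     * (BLCKSZ - page header) / sizeof(BlockVacInfo)
     *     = (8192 - 24) / 12        = ~680 heap blocks per fork page
     * 680GB of heap / 8KB blocks    = ~89M heap blocks
     * 89M / 680 entries per page    = ~131K fork pages = ~1GB of fork
     */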

> I think the big missing piece, unless something like Heikki's xid lsn
> ranges thing gets finished, is a freeze map.

The problem with a simple freeze map is: when do you actually set the bit? If you do it while
the transaction that created all the tuples is still running, then any attempt to use the map
prior to those tuples becoming all visible is pointless. Presumably this is why pd_prune_xid
stores a txid and isn't just a boolean.
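
A per-page XID lets each reader test against its own horizon instead. That's roughly what
PageIsPrunable() in bufpage.h does today (simplified):

    if (TransactionIdIsValid(((PageHeader) page)->pd_prune_xid) &&
        TransactionIdPrecedes(((PageHeader) page)->pd_prune_xid, OldestXmin))
    {
        /* only now is it worth visiting the page -- and only now could
         * a freeze-map analog safely be acted on */
    }
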
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


