Piggybacking vacuum I/O

From: Heikki Linnakangas
Subject: Piggybacking vacuum I/O
Date:
Msg-id: 45B4CF83.8060109@enterprisedb.com
Responses: Re: Piggybacking vacuum I/O  ("Jim C. Nasby" <jim@nasby.net>)
           Re: Piggybacking vacuum I/O  (ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>)
           Re: Piggybacking vacuum I/O  ("Pavan Deolasee" <pavan.deolasee@gmail.com>)
List: pgsql-hackers
I've been looking at the way we do vacuums.

The fundamental performance issue is that a vacuum generates 
nheapblocks+nindexblocks+ndirtyblocks I/Os. Vacuum cost delay helps to 
spread the cost like part payment, but the total is the same. In an I/O 
bound system, the extra I/O directly leads to less throughput.

Therefore, we need to do less I/O. The dead space map (DSM) helps by 
allowing us to skip blocks that don't need vacuuming, reducing the # of 
I/Os to 2*ndirtyblocks+nindexblocks. That's great, but it doesn't help 
us if the dead tuples are spread uniformly.
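
To put rough, made-up numbers on that: for a 1,000,000-block table with 
300,000 blocks of indexes and 100,000 blocks containing dead tuples, a 
plain vacuum touches on the order of 1,000,000 + 300,000 + 100,000 = 
1,400,000 blocks, while a DSM-driven vacuum touches about 2*100,000 + 
300,000 = 500,000. But if the dead tuples are spread evenly over the 
heap, ndirtyblocks approaches nheapblocks and the saving disappears.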

If we could piggyback the vacuum I/Os onto the I/Os that we're doing 
anyway, vacuum ideally wouldn't have to issue any I/O of its own. I've 
tried to figure out a way to do that.

Vacuum is done in 3 phases:

1. Scan heap
2. Vacuum index
3. Vacuum heap

Instead of doing a sequential scan, we could perform the 1st phase by 
watching the buffer pool, scanning blocks for dead tuples when they're 
in memory and keeping track of which pages we've seen. When all pages 
have been seen, the tid list is sorted and the 1st phase is done.
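
Roughly what I have in mind for the phase-1 bookkeeping, as a standalone 
sketch. The types and the tuple_is_dead() / page_max_offset() 
placeholders are stand-ins for the real bufmgr and visibility machinery, 
not existing functions:

#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

typedef uint32_t BlockNumber;
typedef uint16_t OffsetNumber;

typedef struct { BlockNumber blkno; OffsetNumber offnum; } TidEntry;

typedef struct
{
    uint8_t    *seen;       /* one bit per heap block in the relation */
    BlockNumber nblocks;
    BlockNumber nseen;
    TidEntry   *tids;       /* dead tuples collected so far */
    size_t      ntids;
    size_t      maxtids;
} PiggybackScan;

/* placeholder for the real visibility check */
extern bool tuple_is_dead(BlockNumber blkno, OffsetNumber off);
/* placeholder: number of line pointers on the page */
extern OffsetNumber page_max_offset(BlockNumber blkno);

static int
tid_cmp(const void *a, const void *b)
{
    const TidEntry *x = a;
    const TidEntry *y = b;

    if (x->blkno != y->blkno)
        return x->blkno < y->blkno ? -1 : 1;
    return (int) x->offnum - (int) y->offnum;
}

/*
 * Called (hypothetically) whenever a heap page of the target relation is
 * found resident in the buffer pool.  Returns true once every block has
 * been seen and the tid list is sorted, i.e. phase 1 is complete.
 */
bool
piggyback_scan_page(PiggybackScan *scan, BlockNumber blkno)
{
    OffsetNumber off;
    OffsetNumber maxoff;

    if (scan->seen[blkno / 8] & (1 << (blkno % 8)))
        return false;           /* already covered this block */

    maxoff = page_max_offset(blkno);
    for (off = 1; off <= maxoff; off++)
    {
        if (!tuple_is_dead(blkno, off))
            continue;
        if (scan->ntids == scan->maxtids)
        {
            scan->maxtids = scan->maxtids ? scan->maxtids * 2 : 1024;
            scan->tids = realloc(scan->tids,
                                 scan->maxtids * sizeof(TidEntry));
        }
        scan->tids[scan->ntids].blkno = blkno;
        scan->tids[scan->ntids].offnum = off;
        scan->ntids++;
    }

    scan->seen[blkno / 8] |= (uint8_t) (1 << (blkno % 8));
    scan->nseen++;

    if (scan->nseen == scan->nblocks)
    {
        qsort(scan->tids, scan->ntids, sizeof(TidEntry), tid_cmp);
        return true;            /* phase 1 done */
    }
    return false;
}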

In theory, the index vacuum could also be done that way, but let's 
assume for now that indexes would be scanned like they are currently.

The 3rd phase can be performed similarly to the 1st phase. Whenever a 
page enters the buffer pool, we check the tid list and remove any 
matching tuples from the page. When the list is empty, vacuum is complete.
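
Phase 3 then becomes a lookup in that sorted list whenever a page shows 
up. Again just a sketch; the actual on-page removal is stubbed out 
behind a placeholder:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef uint32_t BlockNumber;
typedef uint16_t OffsetNumber;

typedef struct { BlockNumber blkno; OffsetNumber offnum; } TidEntry;

/* placeholder for the real work: mark the listed line pointers on this
 * page unused and defragment the page */
extern void remove_tuples_from_page(BlockNumber blkno,
                                    const TidEntry *tids, size_t ntids);

/*
 * Called (hypothetically) whenever a heap page enters the buffer pool
 * during phase 3.  tids[] is the sorted list built in phase 1; *ntids
 * shrinks as entries are consumed.  Returns true once the list is
 * empty, i.e. the vacuum is complete.
 */
bool
piggyback_vacuum_page(BlockNumber blkno, TidEntry *tids, size_t *ntids)
{
    size_t      lo = 0;
    size_t      hi = *ntids;
    size_t      end;

    /* binary search for the first entry belonging to this block */
    while (lo < hi)
    {
        size_t      mid = lo + (hi - lo) / 2;

        if (tids[mid].blkno < blkno)
            lo = mid + 1;
        else
            hi = mid;
    }

    /* the entries for this block are contiguous in the sorted list */
    end = lo;
    while (end < *ntids && tids[end].blkno == blkno)
        end++;

    if (end > lo)
    {
        remove_tuples_from_page(blkno, &tids[lo], end - lo);
        /* close the gap in the tid list */
        memmove(&tids[lo], &tids[end], (*ntids - end) * sizeof(TidEntry));
        *ntids -= end - lo;
    }
    return *ntids == 0;
}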

Of course, there are some issues in the design as described above. For 
example, the vacuum might take a long time if there are cold spots in 
the table. In fact, a block full of dead tuples might never be visited 
again.

A variation of the scheme would be to keep scanning pages that are in 
cache until the tid list reaches a predefined size, instead of keeping 
track of which pages have already been seen. That would deal better with 
tables with hot and cold spots, but it couldn't advance relfrozenxid 
because there would be no guarantee that all pages are visited. Also, we 
could start the 1st phase of the next vacuum while we're still in the 
3rd phase of the previous one.

Also, after we've seen 95% of the pages or a timeout expires, we could 
fetch the rest of them with random I/O to let the vacuum finish.
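
The policy could look something like this; the 95% threshold, the 
timeout and the list size cap are arbitrary illustrations, not proposed 
defaults:

#include <stddef.h>
#include <stdint.h>
#include <time.h>

typedef uint32_t BlockNumber;

typedef enum
{
    PIGGYBACK_KEEP_WAITING,     /* keep scanning pages as they show up */
    PIGGYBACK_FETCH_REST,       /* read the remaining pages explicitly */
    PIGGYBACK_LIST_FULL         /* tid list full: run phases 2 and 3 now */
} PiggybackDecision;

PiggybackDecision
piggyback_decide(BlockNumber nseen, BlockNumber nblocks,
                 size_t ntids, size_t max_tids,
                 time_t started, time_t timeout_secs)
{
    double      seen_fraction;

    if (ntids >= max_tids)
        return PIGGYBACK_LIST_FULL;

    seen_fraction = nblocks > 0 ? (double) nseen / nblocks : 1.0;

    if (seen_fraction >= 0.95 || time(NULL) - started > timeout_secs)
        return PIGGYBACK_FETCH_REST;

    return PIGGYBACK_KEEP_WAITING;
}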

I'm not sure how exactly this would be implemented. Perhaps bgwriter or 
autovacuum would do it, or a new background process. Presumably the 
process would need access to relcache.

One issue is that if we're trying to vacuum every table simultaneously 
this way, we'll need more overall memory for the tid lists. I'm hoping 
there's a way to implement this without requiring shared memory for the 
tid lists, because that would make the memory management a nightmare. 
Also, we'd need changes to the bufmgr API to support this.
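
The bufmgr change might be no more than a hook along these lines; this 
is purely hypothetical, nothing like it exists today:

#include <stdint.h>

typedef uint32_t Oid;
typedef uint32_t BlockNumber;

/*
 * Hypothetical callback type: invoked by bufmgr after a page of a
 * watched relation has been read into (or found in) the buffer pool,
 * while the page is still pinned.
 */
typedef void (*BufferSeenHook) (Oid relid, BlockNumber blkno, void *arg);

/* Hypothetical registration API for the piggybacking process. */
extern void RegisterBufferSeenHook(Oid relid, BufferSeenHook hook,
                                   void *arg);
extern void UnregisterBufferSeenHook(Oid relid);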

This would work nicely with the DSM. The list of pages that need to be 
visited in phase 1 could be initialized from the DSM, largely avoiding 
the problem with cold spots.
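
Seeding the phase-1 bookkeeping from the DSM could then be as simple as 
pre-marking every block the DSM considers clean as already seen, 
assuming the DSM is (or can be read as) one bit per heap block:

#include <stdint.h>

typedef uint32_t BlockNumber;

/*
 * Hypothetical: mark the blocks the dead space map considers clean as
 * already "seen", so phase 1 only has to wait for the dirty ones.
 */
void
piggyback_init_from_dsm(uint8_t *seen, BlockNumber *nseen,
                        const uint8_t *dsm_dirty, BlockNumber nblocks)
{
    BlockNumber b;

    *nseen = 0;
    for (b = 0; b < nblocks; b++)
    {
        if (!(dsm_dirty[b / 8] & (1 << (b % 8))))
        {
            seen[b / 8] |= (uint8_t) (1 << (b % 8));
            (*nseen)++;
        }
    }
}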

Any thoughts before I start experimenting?

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

