Re: optimizing vacuum truncation scans

From Jeff Janes
Subject Re: optimizing vacuum truncation scans
Date
Msg-id CAMkU=1yfq8vDvS8o+3ubNL6PjixLwN78T4PVjRY1Ef+cu44bKw@mail.gmail.com
In reply to Re: optimizing vacuum truncation scans  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: optimizing vacuum truncation scans  (Robert Haas <robertmhaas@gmail.com>)
Re: optimizing vacuum truncation scans  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Re: optimizing vacuum truncation scans  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On Wed, Jul 22, 2015 at 6:59 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Jun 29, 2015 at 1:54 AM, Jeff Janes <jeff.janes@gmail.com> wrote:
>> Attached is a patch that implements the vm scan for truncation.  It
>> introduces a variable to hold the last blkno which was skipped during the
>> forward portion.  Any blocks after both this blkno and after the last
>> inspected nonempty page (which the code is already tracking) must have been
>> observed to be empty by the current vacuum.  Any other process rendering the
>> page nonempty is required to clear the vm bit, and no other process can set
>> the bit again during the vacuum's lifetime.  So if the bit is still set, the
>> page is still empty without needing to inspect it.
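
To make the rule in the quoted paragraph concrete, here is a toy model of the backward truncation scan.  It is only a sketch, not the attached patch: ToyRel, toy_read_page_is_empty, toy_count_nondeletable_pages and the first_trusted parameter are hypothetical names invented for illustration, and the arrays stand in for the visibility map and the on-disk pages.

```c
#include <stdbool.h>

typedef unsigned int BlockNumber;

/* Hypothetical stand-in for a relation; not a PostgreSQL structure. */
typedef struct ToyRel
{
    BlockNumber nblocks;      /* current length of the relation in blocks */
    const bool *vm_bit;       /* vm_bit[b]: is the visibility-map bit set now? */
    const bool *page_empty;   /* page_empty[b]: what reading block b would show */
    unsigned    pages_read;   /* number of simulated page reads */
} ToyRel;

/* Simulates reading a block from disk and checking whether it is empty. */
static bool
toy_read_page_is_empty(ToyRel *rel, BlockNumber blkno)
{
    rel->pages_read++;
    return rel->page_empty[blkno];
}

/*
 * Returns the number of blocks that must be kept.
 *
 * nonempty_pages: one past the last page the forward pass saw as nonempty.
 * first_trusted:  one past the last block the forward pass skipped
 *                 (0 if it skipped nothing).
 *
 * Blocks at or after both limits were observed empty by this vacuum, so a
 * still-set vm bit is taken as proof that they are still empty.
 */
BlockNumber
toy_count_nondeletable_pages(ToyRel *rel, BlockNumber nonempty_pages,
                             BlockNumber first_trusted)
{
    BlockNumber blkno = rel->nblocks;

    while (blkno > nonempty_pages)
    {
        blkno--;

        /* Trusted block with the bit still set: no need to read it. */
        if (blkno >= first_trusted && rel->vm_bit[blkno])
            continue;

        /* Otherwise fall back to inspecting the page itself. */
        if (!toy_read_page_is_empty(rel, blkno))
            return blkno + 1;
    }
    return nonempty_pages;
}
```

In this model a block whose bit was cleared by a concurrent insert is read and stops the scan, while trusted blocks with the bit still set cost no I/O at all.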

> Urgh.  So if we do this, that forever precludes having HOT pruning set
> the all-visible bit.

I wouldn't say forever, as it would be easy to revert the change if something more important came along that conflicted with it.  I don't think this change would grow tentacles across the code that make it hard to revert; you would just have to take the performance hit (and by that time, maybe HDDs will truly be dead anyway, so we won't care anymore).  But yes, that is definitely a downside.  HOT pruning is one example, but one could also envision having some other process (bgwriter?) set vm bits on unindexed tables.  Or if we invent some efficient way to know that no expiring tids for a certain block range are stored in indexes, other jobs could also set the vm bit on indexed tables.  Or parallel vacuums in the same table, not that I really see a reason to have those.
 
> At the least we'd better document that carefully
> so that nobody breaks it later.  But I wonder if there isn't some
> better approach, because I would certainly rather that we didn't
> foreclose the possibility of doing something like that in the future.

But where do we document it (other than in-place)?  README.HOT doesn't seem sufficient, and there is no README.vm.

I guess we could add an "Assert(InRecovery || running_a_vacuum);" to visibilitymap_set with a comment there, except that I don't know how to implement running_a_vacuum so that it covers manual vacuums as well as autovacuum.  Perhaps assert that we hold a SHARE UPDATE EXCLUSIVE lock on the rel?
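
As a rough illustration of that assertion idea, here is a self-contained toy model.  Nothing in it is real PostgreSQL code: ToyBackend, toy_may_set_vm_bit and toy_visibilitymap_set are hypothetical names, and the two booleans merely stand in for "we are in recovery" and "this backend holds SHARE UPDATE EXCLUSIVE on the rel".

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-backend state; not a PostgreSQL structure. */
typedef struct ToyBackend
{
    bool in_recovery;                   /* replaying WAL */
    bool holds_share_update_exclusive;  /* lock vacuum takes on the rel */
} ToyBackend;

/*
 * The proposed invariant: only recovery, or a backend holding the lock that
 * vacuum holds, may set a vm bit.  Since that lock mode conflicts with
 * itself, at most one such backend can exist per relation at a time.
 */
bool
toy_may_set_vm_bit(const ToyBackend *backend)
{
    return backend->in_recovery || backend->holds_share_update_exclusive;
}

/* Toy stand-in for the bit-setting routine, guarded by the assertion. */
void
toy_visibilitymap_set(const ToyBackend *backend, bool *vm_bit)
{
    assert(toy_may_set_vm_bit(backend));
    *vm_bit = true;
}
```

The point of checking the lock rather than a "running_a_vacuum" flag is that it would cover manual and automatic vacuums alike without needing to tell them apart.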

The advantage of the other approach, just forcing kernel read-ahead to work for us, is that it doesn't impose any of these restrictions on future development.  The disadvantage is that I don't know how to auto-tune it, or auto-disable it for SSDs, and it will never be quite as efficient.
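
A minimal sketch of what the read-ahead approach could look like, assuming a Linux-like system that provides posix_fadvise.  The toy_ names and both constants are hypothetical, and the fixed window of 32 blocks is an arbitrary guess, which is exactly the tuning problem mentioned above.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <fcntl.h>
#include <sys/types.h>

#define TOY_BLCKSZ          8192   /* assumed block size in bytes */
#define TOY_PREFETCH_BLOCKS 32     /* arbitrary window; no auto-tuning here */

typedef unsigned int BlockNumber;

/* First block of the aligned prefetch window that contains blkno. */
BlockNumber
toy_prefetch_window_start(BlockNumber blkno)
{
    return blkno - (blkno % TOY_PREFETCH_BLOCKS);
}

/*
 * Ask the kernel to start reading the whole window containing blkno, so
 * that a backward scan finds the lower-numbered blocks already cached.
 * The caller must pass blkno < nblocks.  Returns 0 on success or an error
 * number, as posix_fadvise does.
 */
int
toy_prefetch_window(int fd, BlockNumber blkno, BlockNumber nblocks)
{
    BlockNumber start = toy_prefetch_window_start(blkno);
    BlockNumber end = start + TOY_PREFETCH_BLOCKS;

    if (end > nblocks)
        end = nblocks;

    return posix_fadvise(fd,
                         (off_t) start * TOY_BLCKSZ,
                         (off_t) (end - start) * TOY_BLCKSZ,
                         POSIX_FADV_WILLNEED);
}
```

A backward scan would call toy_prefetch_window each time it crosses into a new window, that is, whenever the current block is the last one of its window.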
 
Cheers,

Jeff
