Re: [PERFORM] encouraging index-only scans

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: [PERFORM] encouraging index-only scans
Дата
Msg-id 20130917231050.GD29545@awork2.anarazel.de
обсуждение исходный текст
Ответ на Re: [PERFORM] encouraging index-only scans  (Jim Nasby <jim@nasby.net>)
Ответы Re: [PERFORM] encouraging index-only scans  (Jim Nasby <jim@nasby.net>)
Re: [PERFORM] encouraging index-only scans  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-hackers
On 2013-09-17 11:37:35 -0500, Jim Nasby wrote:
> On 9/7/13 12:34 AM, Andres Freund wrote:
> >What I was thinking of was to keep track of the oldest xids on pages
> >that cannot be marked all visible. I haven't thought about the
> >statistics part much, but what if we binned the space between
> >[RecentGlobalXmin, ->nextXid) into 10 bins and counted the number of
> >pages falling into each bin. Then after the vacuum finished we could
> >compute how far RecentGlobalXmin would have to progress to make another
> >vacuum worthwile by counting the number of pages from the lowest bin
> >upwards and use the bin's upper limit as the triggering xid.
> >
> >Now, we'd definitely need to amend that scheme by something that handles
> >pages that are newly written to, but it seems something like that
> >wouldn't be too hard to implement and would make autovacuum more useful.
> 
> If we're binning by XID though you're still dependent on scanning to
> build that range. Anything that creates dead tuples will also be be
> problematic, because it's going to unset VM bits on you, and you won't
> know if it's due to INSERTS or dead tuples.

I don't think that's all that much of a problem. In the end, it's a good
idea to look at pages shortly after they have been filled/been
touched. Setting hint bits at that point avoid repetitive IO and in many
cases we will already be able to mark them all-visible.
The binning idea was really about sensibly estimating whether a new scan
already makes sense which is currently very hard to judge.

I generally think the current logic for triggering VACUUMs via
autovacuum doesn't really make all that much sense in the days where we
have the visibility map.

> What if we maintained XID stats for ranges of pages in a separate
> fork? Call it the XidStats fork. Presumably the interesting pieces
> would be min(xmin) and max(xmax) for pages that aren't all visible. If
> we did that at a granularity of, say, 1MB worth of pages[1] we're
> talking 8 bytes per MB, or 1 XidStats page per GB of heap. (Worst case
> alignment bumps that up to 2 XidStats pages per GB of heap.)

Yes, I have thought about similar ideas as well, but I came to the
conclusion that it's not worth it. If you want to make the boundaries
precise and the xidstats fork small, you're introducing new contention
points because every DML will need to make sure it's correct.
Also, the amount of code that would require seems to be bigger than
justified by the increase of precision when to vacuum.


Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: [RFC] Extend namespace of valid guc names
Следующее
От: Andres Freund
Дата:
Сообщение: Re: INSERT...ON DUPLICATE KEY LOCK FOR UPDATE