Re: [PERFORM] encouraging index-only scans

Поиск
Список
Период
Сортировка
От Jim Nasby
Тема Re: [PERFORM] encouraging index-only scans
Дата
Msg-id 5238854F.5040708@nasby.net
обсуждение исходный текст
Ответ на Re: [PERFORM] encouraging index-only scans  (Andres Freund <andres@2ndquadrant.com>)
Ответы Re: [PERFORM] encouraging index-only scans  (Andres Freund <andres@2ndquadrant.com>)
Список pgsql-hackers
On 9/7/13 12:34 AM, Andres Freund wrote:
> What I was thinking of was to keep track of the oldest xids on pages
> that cannot be marked all visible. I haven't thought about the
> statistics part much, but what if we binned the space between
> [RecentGlobalXmin, ->nextXid) into 10 bins and counted the number of
> pages falling into each bin. Then after the vacuum finished we could
> compute how far RecentGlobalXmin would have to progress to make another
> vacuum worthwile by counting the number of pages from the lowest bin
> upwards and use the bin's upper limit as the triggering xid.
>
> Now, we'd definitely need to amend that scheme by something that handles
> pages that are newly written to, but it seems something like that
> wouldn't be too hard to implement and would make autovacuum more useful.

If we're binning by XID though you're still dependent on scanning to build that range. Anything that creates dead
tupleswill also be be problematic, because it's going to unset VM bits on you, and you won't know if it's due to
INSERTSor dead tuples.
 

What if we maintained XID stats for ranges of pages in a separate fork? Call it the XidStats fork. Presumably the
interestingpieces would be min(xmin) and max(xmax) for pages that aren't all visible. If we did that at a granularity
of,say, 1MB worth of pages[1] we're talking 8 bytes per MB, or 1 XidStats page per GB of heap. (Worst case alignment
bumpsthat up to 2 XidStats pages per GB of heap.)
 

Having both min(xmin) and max(xmax) for a range of pages would allow for very granular operation of vacuum. Instead of
hittingevery heap page that's not all-visible, it would only hit those that are not visible and where min(xmin) or
max(xmax)were less than RecentGlobalXmin.
 

One concern is maintaining this data. A key point is that we don't have to update it every time it changes; if the
min/maxare only off by a few hundred XIDs there's no point to updating the XidStats page. We'd obviously need the
XidStatspage to be read in, but even a 100GB heap would be either 100 or 200 XidStats pages.
 

[1]: There's a trade-off between how much space we 'waste' on XidStats pages and how many heap pages we potentially
haveto scan in the range. We'd want to see what this looked like in a real system. The thing that helps here is that
regardlessof what the stats for a particular heap range are, you're not going to scan any pages in that range that are
alreadyall-visible.
 
-- 
Jim C. Nasby, Data Architect                       jim@nasby.net
512.569.9461 (cell)                         http://jim.nasby.net



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Noah Misch
Дата:
Сообщение: Re: relscan_details.h
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: Freezing without write I/O