Re: index-only scans

Поиск
Список
Период
Сортировка
От Heikki Linnakangas
Тема Re: index-only scans
Дата
Msg-id 4E46ED29.7020000@enterprisedb.com
обсуждение исходный текст
Ответ на Re: index-only scans  (Kääriäinen Anssi <anssi.kaariainen@thl.fi>)
Ответы Re: index-only scans  (Jim Nasby <jim@nasby.net>)
Re: index-only scans  (Anssi Kääriäinen <anssi.kaariainen@thl.fi>)
Re: index-only scans  (Marti Raudsepp <marti@juffo.org>)
Список pgsql-hackers
On 13.08.2011 23:35, Kääriäinen Anssi wrote:
> """
> Now, suppose that we know that 50% of the heap pages have their
> visibility map bits set.  What's the chance that this query won't need
> a heap fetch?  Well, the chance is 50% *if* you assume that a row
> which has been quiescent for a long time is just as likely to be
> queried as one that has been recently inserted or updated.  However,
> in many real-world use cases, nothing could be farther from the truth.
>
> What do we do about that?
> """
>
> The example is much more realistic if the query is a fetch of N latest rows from a table. Very common use case, and
thewhole relation's visibility statistics are completely wrong for that query.
 

That is somewhat compensated by the fact that tuples that are accessed 
more often are also more likely to be in cache. Fetching the heap tuple 
to check visibility is very cheap when the tuple is in cache.

I'm not sure how far that compensates it, though. I'm sure there's 
typically nevertheless a fairly wide range of pages that have been 
modified since the last vacuum, but not in cache anymore.

> Wouldn't it be great if there was something like pg_stat_statements that would know the statistics per query, derived
fromusage...
 
>
> Even if the statistics are not available per query, the statistics could be calculated from the relation usage: the
weightedvisibility of the pages would be pages_visible_when_read / total_pages_read for the relation. That percentage
wouldminimize the average cost of the plans much better than just the non-weighted visibility percentage.
 
>
> For the above example, if the usage is 90% read the N latest rows and we assume they are never visible, the weighted
visibilitypercentage would be 10% while the non-weighted visibility percentage could be 90%. Even if the visibility
percentagewould be incorrect for the queries reading old rows, by definition of the weighted visibility percentage
therewould not be too many of them.
 
>
> The same idea could of course be used to calculate the effective cache hit ratio for each table. Cache hit ratio
wouldhave the problem of feedback loops, though.
 

Yeah, I'm not excited about making the planner and statistics more 
dynamic. Feedback loops and plan instability are not fun. I think we 
should rather be more aggressive in setting the visibility map bits.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Greg Stark
Дата:
Сообщение: Re: our buffer replacement strategy is kind of lame
Следующее
От: Greg Stark
Дата:
Сообщение: Re: Further news on Clang - spurious warnings