Re: index-only scans

Поиск
Список
Период
Сортировка
От Anssi Kääriäinen
Тема Re: index-only scans
Дата
Msg-id 4E4A4559.7040804@thl.fi
обсуждение исходный текст
Ответ на Re: index-only scans  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Ответы Re: index-only scans  (Cédric Villemain <cedric.villemain.debian@gmail.com>)
Re: index-only scans  (Greg Stark <stark@mit.edu>)
Список pgsql-hackers
On 08/14/2011 12:31 AM, Heikki Linnakangas wrote:
>> The same idea could of course be used to calculate the effective cache hit ratio for each table. Cache hit ratio
wouldhave the problem of feedback loops, though.
 
> Yeah, I'm not excited about making the planner and statistics more
> dynamic. Feedback loops and plan instability are not fun.
I might be a little out of my league here... But I was thinking about 
the cache hit ratio and feedback loops. I understand automatic tuning 
would be hard. But making automatic tuning easier (by using pg_tune for 
example) would be a big plus for most use cases.

To make it easier to tune the page read costs automatically, it would be 
nice if there would be four variables instead of the current two:  - random_page_cost is the cost of reading a random
pagefrom storage. 
 
Currently it is not, it is the cost of accessing a random page, taking 
in account it might be in memory.  - seq_page_cost is the cost of reading pages sequentially from storage  -
memory_page_costis the cost of reading a page in memory  - cache_hit_ratio is the expected cache hit ratio
 

memory_page_cost would be server global, random and seq page costs 
tablespace specific, and cache_hit_ratio relation specific. You would 
get the current behavior by tuning *_page_costs realistically, and 
setting cache_hit_ratio globally so that the expected random_page_cost / 
seq_page_cost stays the same as now.

The biggest advantage of this would be that the correct values are much 
easier to detect automatically compared to current situation. This can 
be done using pg_statio_* views and IO speed testing. They should not be 
tuned automatically by PostgreSQL, at least not the cache_hit_ratio, as 
that leads to the possibility of feedback loops and plan instability. 
The variables would also be much easier to understand.

There is the question if one should be allowed to tune the *_page_costs 
at all. If I am not missing something, it is possible to detect the 
correct values programmatically and they do not change if you do not 
change the hardware. Cache hit ratio is the real reason why they are 
currently so important for tuning.

An example why the current random_page_cost and seq_page_cost tuning is 
not adequate is that you can only set random_page_cost per tablespace. 
That makes perfect sense if random_page_cost would be the cost of 
accessing a page in storage. But it is not, it is a combination of that 
and caching effects, so that it actually varies per relation (and over 
time). How do you set it correctly for a query where one relation is 
fully cached and another one not?

Another problem is that if you use random_page_cost == seq_page_cost, 
you are effectively saying that everything is in cache. But if 
everything is in cache, the cost of page access relative to cpu_*_costs 
is way off. The more random_page_cost and seq_page_cost are different, 
the more they mean the storage access costs. When they are the same, 
they mean the memory page cost. There can be an order of magnitude in 
difference of a storage page cost and a memory page cost. So it is hard 
to tune the cpu_*_costs realistically for cases where sometimes data is 
in cache and sometimes not.

Ok, enough hand waving for one post :) Sorry if this all is obvious / 
discussed before. My googling didn't turn out anything directly related, 
although these have some similarity: - Per-table random_page_cost for tables that we know are always cached 
[http://archives.postgresql.org/pgsql-hackers/2008-04/msg01503.php] - Script to compute random page cost
[http://archives.postgresql.org/pgsql-hackers/2002-09/msg00503.php]
-  The science of optimization in practical terms?
[http://archives.postgresql.org/pgsql-hackers/2009-02/msg00718.php], 
getting really interesting starting from here:
[http://archives.postgresql.org/pgsql-hackers/2009-02/msg00787.php]
 - Anssi



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Magnus Hagander
Дата:
Сообщение: Re: pg_stat_replication vs StandbyReplyMessage
Следующее
От: Andrew Dunstan
Дата:
Сообщение: Re: Re: Should we have an optional limit on the recursion depth of recursive CTEs?