Re: Dynamic gathering the values for seq_page_cost/xxx_cost

Поиск
Список
Период
Сортировка
От Andy Fan
Тема Re: Dynamic gathering the values for seq_page_cost/xxx_cost
Дата
Msg-id CAKU4AWqx_6k9D+2Px-7kRv4SnX-OFCRo3Dyxy+CVJiejsJKgcA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Dynamic gathering the values for seq_page_cost/xxx_cost  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Ответы Re: Dynamic gathering the values for seq_page_cost/xxx_cost
Список pgsql-hackers


On Fri, Sep 25, 2020 at 5:15 PM Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> wrote:
On Tue, Sep 22, 2020 at 10:57 AM Andy Fan <zhihui.fan1213@gmail.com> wrote:
>
>
> My tools set the random_page_cost to 8.6, but based on the fio data, it should be
> set to 12.3 on the same hardware.  and I do see the better plan as well with 12.3.
> Looks too smooth to believe it is true..
>
> The attached result_fio_mytool.tar.gz is my test result.   You can use git show HEAD^^
> is the original plan with 8.6.  git show HEAD^  show the plan changes after we changed
> the random_page_cost. git show HEAD shows the run time statistics changes for these queries.
> I also uploaded the test tool [1] for this for your double check.

The scripts seem to start and stop the server, drop caches for every
query. That's where you are seeing that setting random_page_cost to
fio based ratio provides better plans. But in practice, these costs
need to be set on a server where the queries are run concurrently and
repeatedly. That's where the caching behaviour plays an important 
role. Can we write a tool which can recommend costs for that scenario?
 
I totally agree with this. Actually the first thing I did is to define a
proper IO workload. At the very beginning, I used DIRECT_IO for both seq read
and random read on my SSD, and then found the result is pretty bad per testing
(random_page_cost = ~1.6).  then I realized postgresql relies on the prefetch
which is disabled by DIRECT_IO. After I fixed this, I tested again with the above
scenario (cache hit ratio = 0) to verify my IO model. Per testing, it looks good. 
I am also thinking if the random_page_cost = 1.1 doesn't provide a good result
on my SSD because it ignores the prefects of seq read.

After I am OK with my IO model, I test with the way you see above. but 
I also detect the latency for file system cache hit, which is handled by 
get_fs_cache_latency_us in my code (I ignored the shared buffer hits for now).
and allows user to provides a cache_hit_ratio, the final random_page_cost 
= (real_random_lat) / real_seq_lat, where 
real_xxx_lat = cache_hit_ratio * fs_cache_lat + (1 - cache_hit_ratio) * xxx_lat. 
See function cal_real_lat and cal_random_page_cost.

As for the testing with cache considered, I found how to estimate cache hit
ratio is hard or how to control a hit ratio to test is hard. Recently I am thinking
a method that we can get a page_reads, shared_buffer_hit from pg_kernel
and the real io (without the file system cache hit) at os level (just as what 
iotop/pidstat do). then we can know the shared_buffer hit ratio and file system
cache hit ratio (assume it will be stable after a long run). and then do a testing.
However this would be another branch of manual work and I still have not got
it done until now.

I'd not like to share too many details, but "lucky" many cases I have haven't file
system cache, that makes things a bit easier. What I am doing right now is to 
calculate the random_page_cost with the above algorithm with only shared_buffer
considered.  and test the real benefits with real workload to see how it works. 
If it works well, I think the only thing left is to handle file system cache.

The testing is time consuming since I have to cooperate with many site engineers, 
so any improvement on the design will be much helpful. 
 
How do the fio based cost perform when the queries are run repeatedly?


That probably is not good since I have 280G+ file system cache and I have to
prepare much more than 280G data size for testing. 


--
Best Regards
Andy Fan

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Julien Rouhaud
Дата:
Сообщение: Re: Optimize memory allocation code
Следующее
От: Li Japin
Дата:
Сообщение: Re: Optimize memory allocation code