Re: Performance

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: Performance
Дата
Msg-id 4DA6216E.9020907@fuzzy.cz
обсуждение исходный текст
Ответ на Re: Performance  (Nathan Boley <npboley@gmail.com>)
Ответы Re: Performance  (Claudio Freire <klaussfreire@gmail.com>)
Re: Performance  (Robert Haas <robertmhaas@gmail.com>)
Список pgsql-performance
Dne 14.4.2011 00:05, Nathan Boley napsal(a):
>>> If you model the costing to reflect the reality on your server, good
>>> plans will be chosen.
>>
>> Wouldn't it be "better" to derive those costs from actual performance
>> data measured at runtime?
>>
>> Say, pg could measure random/seq page cost, *per tablespace* even.
>>
>> Has that been tried?
>
> FWIW, awhile ago I wrote a simple script to measure this and found
> that the *actual* random_page / seq_page cost ratio was much higher
> than 4/1.
>
> The problem is that caching effects have a large effect on the time it
> takes to access a random page, and caching effects are very workload
> dependent. So anything automated would probably need to optimize the
> parameter values over a set of 'typical' queries, which is exactly
> what a good DBA does when they set random_page_cost...

Plus there's a separate pagecache outside shared_buffers, which adds
another layer of complexity.

What I was thinking about was a kind of 'autotuning' using real
workload. I mean - measure the time it takes to process a request
(depends on the application - could be time to load a page, process an
invoice, whatever ...) and compute some reasonable metric on it
(average, median, variance, ...). Move the cost variables a bit (e.g.
the random_page_cost) and see how that influences performance. If it
improved, do another step in the same direction, otherwise do step in
the other direction (or do no change the values at all).

Yes, I've had some lectures on non-linear programming so I'm aware that
this won't work if the cost function has multiple extremes (walleys /
hills etc.) but I somehow suppose that's not the case of cost estimates.

Another issue is that when measuring multiple values (processing of
different requests), the decisions may be contradictory so it really
can't be fully automatic.

regards
Tomas

В списке pgsql-performance по дате отправления:

Предыдущее
От: "Kevin Grittner"
Дата:
Сообщение: Re: Performance
Следующее
От: Claudio Freire
Дата:
Сообщение: Re: Performance