Re: A costing analysis tool

From: Greg Stark
Subject: Re: A costing analysis tool
Date:
Msg-id: 87hdbifo0t.fsf@stark.xeocode.com
In reply to: Re: A costing analysis tool  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: A costing analysis tool  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> writes:

> More generally, I think that depending entirely on EXPLAIN ANALYZE
> numbers is a bad idea, because the overhead of EXPLAIN ANALYZE is both
> significant and variable depending on the plan structure.  The numbers
> that I think we must capture are the top-level EXPLAIN cost and the
> actual runtime of the query (*without* EXPLAIN).  Those are the things
> we would like to get to track closely.  EXPLAIN ANALYZE is incredibly
> valuable as context for such numbers, but it's not the thing we actually
> wish to optimize.

If you really want to do this, I think it pays to go down to the raw data that
EXPLAIN is really calculating:

The resulting time is a sum of the time spent in various operations: x
milliseconds spent in sequential block fetches, y milliseconds in random-access
block fetches, z milliseconds in index lookups, etc. Of course we can't really
instrument every one of these operations and record the actual time spent in
each of them, only the total. My point is that the total is a linear equation:
the sum of the average time for each kind of operation, weighted by a factor
representing how many times it occurred.

The cost that the optimizer calculates is similar. It too is a linear equation:
it predicts x' sequential block reads, y' random-access block reads, z' index
lookups, etc., and the resulting cost is just the sum of those counts weighted
by the corresponding cost parameters.
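
To make that concrete, the two equations have exactly the same shape; only the
weights differ. All the counts and numbers below are invented, purely to show
the form:

    # Purely illustrative: counts and per-operation timings are made up.
    counts    = {'seq_page': 1200, 'random_page': 45, 'index_tuple': 3000}
    per_op_ms = {'seq_page': 0.08, 'random_page': 4.0, 'index_tuple': 0.002}

    # What we can actually measure is only the total, not the terms:
    actual_ms = sum(counts[op] * per_op_ms[op] for op in counts)

    # The planner's estimate has the same shape, with its configured cost
    # parameters (random_page_cost and friends) in place of the real
    # timings.  The values here are arbitrary, not the real defaults.
    cost_params = {'seq_page': 1.0, 'random_page': 4.0, 'index_tuple': 0.001}
    estimated_cost = sum(counts[op] * cost_params[op] for op in counts)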

If the optimizer didn't collapse the cost at each node into a single value,
and instead retained the individual operation counts, it could bubble those
values all the way up to the surface and apply the configuration options like
random_page_cost etc. just once, at the top, to calculate the resulting cost.
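
In rough pseudo-Python (none of this is the planner's actual representation,
it's just to show the bookkeeping I mean):

    from dataclasses import dataclass, field

    @dataclass
    class PlanNode:
        # Hypothetical: predicted operation counts for this node alone,
        # e.g. {'seq_page': 100, 'index_tuple': 500}.
        own_counts: dict
        children: list = field(default_factory=list)

    def bubbled_counts(node):
        # A node's own counts plus everything its children bubble up.
        total = dict(node.own_counts)
        for child in node.children:
            for op, n in bubbled_counts(child).items():
                total[op] = total.get(op, 0) + n
        return total

    def plan_cost(root, cost_params):
        # The configuration options (random_page_cost etc.) get applied
        # exactly once, at the top of the tree.
        return sum(n * cost_params.get(op, 0.0)
                   for op, n in bubbled_counts(root).items())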

But then you could gather the raw data so that each sampled query gives you
one linear equation: its predicted operation counts on one side, and the
actual measured time spent in the query on the other. The resulting system of
linear equations could then be thrown at a linear solver to tell you what your
system's "real" random_page_cost, index_tuple_cost, etc. are.
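
That is, one row of counts per sampled query, one measured runtime per query,
and a least-squares fit over the lot. numpy here is just a stand-in for "some
linear solver", and the data is invented:

    import numpy as np

    # One row per query: predicted counts of each operation, in a fixed
    # order (seq pages, random pages, index tuples).  Invented numbers.
    A = np.array([[1200.,  45., 3000.],
                  [  10., 900.,  120.],
                  [5000.,   5.,  800.]])

    # Measured wall-clock runtime of each query, in milliseconds.
    t = np.array([270., 3650., 410.])

    # The fitted coefficients are the machine's "real" per-operation costs.
    per_op_ms, residuals, rank, _ = np.linalg.lstsq(A, t, rcond=None)

Dividing everything through by the sequential-page coefficient would put the
answers back on the planner's scale, where a sequential page fetch costs 1.0.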

Of course these values aren't really constant across queries, and the counts
being fed in are just estimates to begin with, so there won't be an exact
solution; but I bet there's research into how to come up with good compromise
solutions.
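
Even something as crude as constraining the fit to non-negative coefficients
would go some way toward that; scipy's nnls is the obvious off-the-shelf thing
to try (same invented data as above):

    import numpy as np
    from scipy.optimize import nnls

    A = np.array([[1200.,  45., 3000.],
                  [  10., 900.,  120.],
                  [5000.,   5.,  800.]])
    t = np.array([270., 3650., 410.])

    # Non-negative least squares: noisy, correlated counts can otherwise
    # happily produce a negative "cost" for some operation.
    per_op_ms, residual_norm = nnls(A, t)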

Hell, I bet there are even good algorithms for telling you which parameters
are sketchy: the ones being poorly estimated, or the ones that don't really
correspond to a constant-cost operation in the real world.
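
One way to get at that, assuming the least-squares framing above: look at the
standard error of each fitted coefficient, since a parameter whose error bar
swamps its value is exactly the sketchy kind.

    import numpy as np

    def coefficient_stderrs(A, t, fitted):
        # Textbook OLS diagnostics: estimate the variance of each fitted
        # per-operation cost from the residuals and (A^T A)^-1.
        resid = t - A @ fitted
        dof = max(A.shape[0] - A.shape[1], 1)
        sigma2 = (resid @ resid) / dof
        cov = sigma2 * np.linalg.pinv(A.T @ A)
        return np.sqrt(np.diag(cov))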

Doing all this would mean a lot more far-reaching changes than just looking at
EXPLAIN output, of course :/

-- 
greg


