Re: Column correlation drifts, index ignored again

From: Tom Lane
Subject: Re: Column correlation drifts, index ignored again
Msg-id: 23533.1077647386@sss.pgh.pa.us
In reply to: Re: Column correlation drifts, index ignored again  (Josh Berkus <josh@agliodbs.com>)
Responses: Re: Column correlation drifts, index ignored again
List: pgsql-performance

Josh Berkus <josh@agliodbs.com> writes:
> Kevin,
>> 1.  set enable_seqscan = on
>> 2.  set random_page_cost = <some really high value to force seqscans>
>> 3.  EXPLAIN ANALYZE query
>> 4.  record the ratio of estimated to actual scan times.
>> 5.  set enable_seqscan = off
>> 6.  set random_page_cost = <rough estimate of what it should be>
>> 7.  EXPLAIN ANALYZE query
>> 8.  record the actual index scan time(s)
>> 9.  tweak random_page_cost
>> 10.  EXPLAIN query
>> 11.  If ratio of estimate to actual (recorded in step 8) is much
>> different than that recorded in step 4, then go back to step 9.
>> Reduce random_page_cost if the random ratio is larger than the
>> sequential ratio, increase if it's smaller.

> Nice, we ought to post that somewhere people can find it in the future.

If we post it as recommended procedure we had better put big caveat
notices on it.  The pitfalls with doing this are:

1. If you repeat the sequence exactly as given, you will be homing in on
a RANDOM_PAGE_COST that describes your system's behavior with a fully
cached query.  It is to be expected that you will end up with 1.0 or
something very close to it.  The only way to avoid that is to use a
query that is large enough to blow out your kernel's RAM cache, which of
course will take long enough that iterating step 10 will be no fun,
and people will be mighty tempted to take shortcuts.

2. Of course, you are computing a RANDOM_PAGE_COST that is relevant to
just this single query.  Prudence would suggest repeating the process
with several different queries and taking some sort of average.
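
To make the ratio comparison from the quoted procedure concrete, here is a
minimal Python sketch that combines measurements from several queries, as
pitfall 2 suggests. It is an illustration under stated assumptions, not a
recommended tool: the `Measurement` fields, the `suggest` helper, the use of a
geometric mean as the "some sort of average", and the 10% tolerance standing in
for "much different" are all hypothetical choices. The numbers fed to it would
come from your own EXPLAIN and EXPLAIN ANALYZE output.

```python
import math
from dataclasses import dataclass


@dataclass
class Measurement:
    """Numbers for one query from the quoted procedure (names are hypothetical)."""
    seq_estimated_cost: float  # planner estimate for the forced seqscan (step 3)
    seq_actual_ms: float       # actual seqscan time (step 3)
    idx_estimated_cost: float  # planner estimate for the index scan (step 10)
    idx_actual_ms: float       # actual index scan time (step 8)


def geometric_mean(values):
    """Geometric mean of positive numbers; an assumed choice of average."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))


def suggest(measurements, tolerance=0.10):
    """Return 'reduce', 'increase', or 'keep' for random_page_cost.

    Follows the rule in step 11: reduce if the random (index) ratio of
    estimate to actual is larger than the sequential ratio, increase if
    it is smaller. The tolerance is an assumption, not part of the rule.
    """
    seq_ratio = geometric_mean(
        m.seq_estimated_cost / m.seq_actual_ms for m in measurements
    )
    idx_ratio = geometric_mean(
        m.idx_estimated_cost / m.idx_actual_ms for m in measurements
    )
    relative = idx_ratio / seq_ratio
    if relative > 1 + tolerance:
        return "reduce"
    if relative < 1 - tolerance:
        return "increase"
    return "keep"
```

Note that this only automates the arithmetic. It does nothing about pitfall 1:
if the measured queries ran from cache, the suggestion will still drift toward
a value near 1.0.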

When I did the experiments that led up to choosing 4.0 as the default,
some years ago, it took several days of thrashing the disks on a couple
of different machines before I had numbers that I didn't think were
mostly noise :-(.  I am *real* suspicious of any replacement numbers
that have been derived in just a few minutes.

            regards, tom lane
