From: Jim C. Nasby
Subject: Re: Sort and index
Date:
Msg-id: 20050424220146.GN58835@decibel.org
In reply to: Re: Sort and index  (Tom Lane)
Responses: Re: Sort and index  (Manfred Koizar)
List: pgsql-performance

Discussion tree:

Sort and index  (Andrei Gaspar)
 Re: Sort and index  ("Dave Held")
  Re: Sort and index  (Andrei Gaspar)
  Re: Sort and index  (Michael Fuhr)
   Re: Sort and index  (Andrei Gaspar)
  Re: Sort and index  ("Jim C. Nasby")
   Re: Sort and index  (Tom Lane)
    Re: Sort and index  ("Jim C. Nasby")
     Re: Sort and index  ("Jim C. Nasby")
      Re: Sort and index  (Tom Lane)
       Re: Sort and index  ("Jim C. Nasby")
        Re: Sort and index  (Tom Lane)
         Re: Sort and index  ("Jim C. Nasby")
          Re: Sort and index  (Manfred Koizar)
           Re: Sort and index  ("Jim C. Nasby")
            Re: Sort and index  (Manfred Koizar)
             Re: Sort and index  ("Jim C. Nasby")

On Sat, Apr 23, 2005 at 01:00:40AM -0400, Tom Lane wrote:
> "Jim C. Nasby" <> writes:
> >> Feel free to propose better cost equations.
>
> > Where would I look in code to see what's used now?
>
> All the gold is hidden in src/backend/optimizer/path/costsize.c.
>
>             regards, tom lane

After setting up a second test that orders the table by a highly
non-correlated column, I think I've found part of the problem. The
estimated index scan cost for (project_id, id, date) is
0.00..100117429.34 while the estimate for work_units is
0.00..103168408.62; almost no difference, even though project_id
correlation is .657 while work_units correlation is .116. This is with
random_page_cost set to 1.1; if I set it much higher I can't force the
index scan (BTW, would it make more sense to set the cost of a disabled
seqscan to either pages or tuples * disable_cost?), but even with only a
10% overhead on random page fetches it seems logical that the two
estimates should be much farther apart. If you look at the results of
the initial run (http://stats.distributed.net/~decibel/timing.log),
you'll see that the cost of the index scan is way overestimated. Looking
at the code, run_cost is calculated as

    run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost);

where csquared is indexCorrelation^2. Why is indexCorrelation squared?
The comments say a linear interpolation between min_IO and max_IO is
used, but ISTM that if it were linear then indexCorrelation would be
used instead of csquared.
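
To make the effect concrete, here's a toy standalone sketch of that
interpolation (the min_IO_cost/max_IO_cost endpoints below are made-up
placeholders, not real planner numbers). Squaring means a correlation
of .657 only moves the estimate .432 of the way from max_IO_cost toward
min_IO_cost, while .116 moves it a mere .013 of the way, so both
estimates end up hugging the max end:

    /*
     * Toy comparison of the squared vs. straight-linear interpolation
     * done in cost_index() (src/backend/optimizer/path/costsize.c).
     * The cost endpoints here are hypothetical placeholders.
     */
    #include <stdio.h>

    static double
    interp_run_cost(double min_io, double max_io, double corr, int squared)
    {
        double  c = squared ? corr * corr : corr;

        /* costsize.c: run_cost += max_IO_cost + csquared * (min_IO_cost - max_IO_cost) */
        return max_io + c * (min_io - max_io);
    }

    int
    main(void)
    {
        double  min_io = 1.0e6;     /* pretend fully-correlated cost */
        double  max_io = 1.0e8;     /* pretend fully-random cost */
        double  corrs[] = {0.657, 0.116};
        int     i;

        for (i = 0; i < 2; i++)
            printf("corr %.3f: squared %.4g, linear %.4g\n",
                   corrs[i],
                   interp_run_cost(min_io, max_io, corrs[i], 1),
                   interp_run_cost(min_io, max_io, corrs[i], 0));
        return 0;
    }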

By the way, I'm running a test for ordering by work_units right now, and
I included code to allocate and zero 3.3G of memory (out of 4G) between
steps to clear the kernel buffers. This brought the seqscan times up to
~6800 seconds, so it seems there was in fact buffering going on in the
first test. The second test has been running an index scan for over 14
hours now, so clearly a seqscan+sort is the way to go for a highly
uncorrelated index (at least one that won't fit in
effective_cache_size).
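
For reference, the buffer-clearing step was essentially the following
(a minimal sketch; 3.3G is the figure from the test above, and whether
a single malloc() that large succeeds depends on the platform and
address-space limits):

    /*
     * Sketch of the cache-flushing step: allocate and zero ~3.3G so
     * the kernel evicts cached relation pages before the next timing
     * run.  Run it between test steps.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>

    int
    main(void)
    {
        size_t  bytes = (size_t) 3300 * 1024 * 1024;
        char   *buf = malloc(bytes);

        if (buf == NULL)
        {
            perror("malloc");
            return 1;
        }
        memset(buf, 0, bytes);  /* touch every page so it is really allocated */
        free(buf);
        return 0;
    }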
--
Jim C. Nasby, Database Consultant               
Give your computer some brain candy! www.distributed.net Team #1828

Windows: "Where do you want to go today?"
Linux: "Where do you want to go tomorrow?"
FreeBSD: "Are you guys coming, or what?"

