Re: new correlation metric

Поиск

Список

Период

Сортировка

От	Jeff Davis
Тема	Re: new correlation metric
Дата	26 октября 2008 г. 17:08:39
Msg-id	1225051712.5795.1.camel@localhost.localdomain обсуждение исходный текст
Ответ на	Re: new correlation metric (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы	Re: new correlation metric Re: new correlation metric
Список	pgsql-hackers

Дерево обсуждения

On Sun, 2008-10-26 at 12:44 -0400, Tom Lane wrote:
> I wonder whether we ought to rethink the problem entirely.
[...]

What you say makes a lot of sense.

We would have to take a sample of index leaf pages, but I think we
could get a really useful number from it. For BTree, we can just read
the values in order, and maintain 3 counts:

* same_count: this TID points to the same page as the last one did
* near_count: points to a page that is close enough that readahead or
caching will help
* far_count: points to a page where we don't have a reason to think
readahead or caching will be a major factor

Then, we could use a formula like:

run_cost = min_IO_cost + (max_IO_cost - min_IO_cost)*(far_count/total_count)         + small_factor*(max_IO_cost -
min_IO_cost)*(near_count/total_count)

small_factor should be the one minus the percentage of the near_count
that we expect to find in cache (because it was recently read or due to
readahead).

I think from these numbers most of the concerns are taken into account.
Heikki's almost-in-order case is solved because we could recognize that
the pages are close enough to be in cache still.

Interleaving might still cause a lower estimate, but it seems like any
optimistic estimate we could make for that case is pretty risky. If the
caching effects or readahead aren't as helpful as we expected, it would
mean very bad performance.

Of course, I agree that we need some empiral testing.

> useful stats into someplace or other.  We might need to invent some
> other catalog besides pg_statistic if we want to represent per-index
> properties like correlation.  A minimal solution would be to add a
> correlation column to pg_class or pg_index, but that doesn't scale well
> if you imagine that different index AMs might want different sorts of
> stats.

Why can't we just use pg_statistic with the starelid set to the index
oid?

Regards,Jeff Davis

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: new correlation metric