Re: index problems (again)

Поиск

Список

Период

Сортировка

От	Geoff Winkless
Тема	Re: index problems (again)
Дата	7 марта 2016 г. 19:38:00
Msg-id	CAEzk6fdjPxaVNXXH0TiqPacJDOvt7z-Fma=iXtKOrpYY5CivZg@mail.gmail.com обсуждение исходный текст
Ответ на	Re: index problems (again) (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы	Re: index problems (again) Re: index problems (again) Re: index problems (again)
Список	pgsql-general

Дерево обсуждения

On 7 March 2016 at 16:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In English, what that plan is trying to do is scan the index
> in sc_id order until it hits a row with scdate in the target range.
> The first such row, by definition, has the correct min(sc_id) value.
> The problem is that we're guessing at how soon we'll hit such a row.
> If the columns are independent, then the planner can guess based on how
> many rows in the whole table have scdate in the target range, and it
> will probably be about right.  But that estimate can fall down very
> badly if sc_id and scdate increase together, because then the target
> rows aren't randomly distributed in the index sequence but could all be
> all the way at the far end of the index.

I'm sorry, I'm obviously not being clear. I already accepted this
argument when Victor gave it, although I believe that in part it falls
down because sc_id is also (potentially) randomly distributed so it's
not like you're doing a sequential table scan (it might work better on
a clustered table, but we don't have those :) )

So you still have an extra layer of indirection into a large table
with lots of random accesses.

> If we had cross-column correlation stats we could detect this pitfall,
> but without that it's hard to do.

But as far as I can see, apart from the absolute extremes, the
index-only scan is _always_ going to be quicker than the index+table
scan. It doesn't matter whether or not the distribution is random or
skewed, the index-only scan is going to be better (or approximately
equally as good). We can see that by the massive speedup I get by
using index(scid,scdate), which in all other respects is going to
suffer from exactly the same problem from that the scid-only index
suffers.

And the real advantage: at the extremes, the index-only worst-case is
minimally worse than the best case. Whereas the worst-case of the
index-scan-plus-table-compare method is horrific.

I don't believe you need any further statistics than what is currently
available to be able to make that judgement, and that's why I believe
it's suboptimal.

Geoff

В списке pgsql-general по дате отправления:

Предыдущее

От: Adrian Klaver
Дата: 07 марта 2016 г., 19:35:43
Сообщение: Re: Inserting JSON via Java PreparedStatment

Следующее

От: Tom Lane
Дата: 07 марта 2016 г., 19:44:49
Сообщение: Re: index problems (again)

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: index problems (again)

Предыдущее

Следующее