Re: index problems (again)

Поиск
Список
Период
Сортировка
От Geoff Winkless
Тема Re: index problems (again)
Дата
Msg-id CAEzk6fdjPxaVNXXH0TiqPacJDOvt7z-Fma=iXtKOrpYY5CivZg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: index problems (again)  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: index problems (again)  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: index problems (again)  (Jeff Janes <jeff.janes@gmail.com>)
Re: index problems (again)  ("Peter J. Holzer" <hjp-pgsql@hjp.at>)
Список pgsql-general
On 7 March 2016 at 16:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> In English, what that plan is trying to do is scan the index
> in sc_id order until it hits a row with scdate in the target range.
> The first such row, by definition, has the correct min(sc_id) value.
> The problem is that we're guessing at how soon we'll hit such a row.
> If the columns are independent, then the planner can guess based on how
> many rows in the whole table have scdate in the target range, and it
> will probably be about right.  But that estimate can fall down very
> badly if sc_id and scdate increase together, because then the target
> rows aren't randomly distributed in the index sequence but could all be
> all the way at the far end of the index.

I'm sorry, I'm obviously not being clear. I already accepted this
argument when Victor gave it, although I believe that in part it falls
down because sc_id is also (potentially) randomly distributed so it's
not like you're doing a sequential table scan (it might work better on
a clustered table, but we don't have those :) )

So you still have an extra layer of indirection into a large table
with lots of random accesses.

> If we had cross-column correlation stats we could detect this pitfall,
> but without that it's hard to do.

But as far as I can see, apart from the absolute extremes, the
index-only scan is _always_ going to be quicker than the index+table
scan. It doesn't matter whether or not the distribution is random or
skewed, the index-only scan is going to be better (or approximately
equally as good). We can see that by the massive speedup I get by
using index(scid,scdate), which in all other respects is going to
suffer from exactly the same problem from that the scid-only index
suffers.

And the real advantage: at the extremes, the index-only worst-case is
minimally worse than the best case. Whereas the worst-case of the
index-scan-plus-table-compare method is horrific.

I don't believe you need any further statistics than what is currently
available to be able to make that judgement, and that's why I believe
it's suboptimal.

Geoff


В списке pgsql-general по дате отправления:

Предыдущее
От: Adrian Klaver
Дата:
Сообщение: Re: Inserting JSON via Java PreparedStatment
Следующее
От: Tom Lane
Дата:
Сообщение: Re: index problems (again)