Re: Planner enhancement suggestion.

Поиск
Список
Период
Сортировка
От Jim C. Nasby
Тема Re: Planner enhancement suggestion.
Дата
Msg-id 20060307183715.GU82989@pervasive.com
обсуждение исходный текст
Ответ на Re: Planner enhancement suggestion.  (PFC <lists@peufeu.com>)
Список pgsql-performance
On Tue, Mar 07, 2006 at 07:09:15PM +0100, PFC wrote:
>
> >The problem is that you're now talking about doing 2 index scans instead
> >of just one and a sort.
>
>     It depends on what you call an index scan :
>     a- Scanning just the index (no heap lookup) to create a bitmap

Sure, and then you scan the other index and read the heap at the same
time (b). Your plan requires doing both. The question is: in what cases
will it be faster to scan the extra index and build the bitmap vs. just
doing a sort.

>     b- Scanning the index and hitting the heap in index order to
>     retrieve the  rows
>
>     (a) should be quite fast, because indexes generally use less space
>     than  the main table, and have good locality of reference. (b) is OK if the
> table fits in memory, but if it has to seek on every row from the heap...

If the table fits in memory, who cares? A sort should be damn fast at
that point, because you're dealing with a small set of data.

>     So, when doing :
>     SELECT * FROM products WHERE category=C ORDER BY price LIMIT 20;
>
>     If the category contains few products, using the index on category
>     then  sorting is good.
>     However, if the category contains many items, postgres is likely to
>     use  the index on price to avoid the sort. It needs to lose time fetching

Have you actually seen this behavior? My experience is that you have to
have a correlation somewhere over 80-90% before an index scan is favored
over a seqscan + sort (which as I mentioned before appears to be
broken).

>     The bitmap trick I proposed in my previous post would be even more
> interesting if the table is clustered on category (which seems a
> reasonable thing to do).

In which case it's highly unlikely that using the price index will buy
you anything.

>     Does the cost estimator know about this kind of correlation ?

Yes. The problem is that the index scan cost estimator figures out a
best and worst case cost, and then interpolates between the two using
correlation^2. IMO it should be using abs(correlation) to do this, and
there's some data at http://stats.distributed.net/~decibel/ that backs
this up. There's also been some discussions on -hackers (search the
archives for "index cost correlation nasby"), but I've not had time to
follow up on this. If you wanted to test a new index cost formula it
would be a one line change to the code.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

В списке pgsql-performance по дате отправления:

Предыдущее
От: PFC
Дата:
Сообщение: Re: Planner enhancement suggestion.
Следующее
От: "Jim C. Nasby"
Дата:
Сообщение: Re: t1000/t2000 sun-servers