Re: Adding additional index causes 20,000x slowdown for certain select queries - postgres 9.0.3

From Timothy Garnett
Subject Re: Adding additional index causes 20,000x slowdown for certain select queries - postgres 9.0.3
Date
Msg-id AANLkTinBzY5U69UkfTV6MnkFHVbeUp=iccYEVYg8hOJ0@mail.gmail.com
In reply to Re: Adding additional index causes 20,000x slowdown for certain select queries - postgres 9.0.3  (Shaun Thomas <sthomas@peak6.com>)
Responses Re: Adding additional index causes 20,000x slowdown for certain select queries - postgres 9.0.3  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-performance
Hi all,

The bl_number is nearly unique per row (a small portion of values are duplicated across a handful of rows).
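For reference, a quick way to confirm how close to unique bl_number really is (a hypothetical check, not something we ran for this thread):

SELECT count(*) AS total_rows, count(DISTINCT bl_number) AS distinct_bl_numbers
  FROM customs_records;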

We need the unique constraint on the pair of bl_number and month, but looking at current usage we don't currently select on just month (though we expect usage scenarios that do in the not too distant future, e.g. pulling out all the records that match a given month). For the time being we've gone with the suggestion here of flipping the order of the index columns to (bl_number, month), which restores the original performance (since the new index can no longer be used by the query).
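The flipped index looks roughly like this (a sketch; the index name is made up, but the UNIQUE comes from the constraint we need on the pair):

CREATE UNIQUE INDEX index_customs_records_on_bl_number_and_month
   ON customs_records (bl_number, month);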

We'd still be interested in other suggestions for convincing the query planner not to pick the bad plan in this case (since we'll eventually need an index on month) without having to use the slower CTE form (sketched after the two points below). To me the problem seems twofold:
 (1) the planner doesn't know there's a correlation between month and particular buyer_ids (some are randomly distributed across months);
 (2) even in cases where there isn't a correlation (not all of our buyer_ids are correlated with month), it still seems really surprising to me that the planner thought this plan would be faster. The estimated selectivity of the buyer fields is 48k / 45 million ~ 1/1000, so for LIMIT 100 it should expect to backward-index-scan ~100k rows, vs. looking up the expected 48k rows and doing a top-100 sort on them. I'd expect the latter plan to be faster in almost all situations (unless we're clustered on month, perhaps, but we're actually clustered on supplier_id, buyer_id, which I'd think would favor the latter plan as well).
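For completeness, the slower CTE form mentioned above looks roughly like this (a sketch with a placeholder filter value; on 9.0 the CTE acts as an optimization fence, so the planner filters on the buyer fields first and only then sorts the ~48k matching rows for the top 100):

WITH matching AS (
    SELECT * FROM customs_records
     WHERE buyer_id = 12345  -- placeholder; the real query filters on the buyer fields
)
SELECT * FROM matching
 ORDER BY month DESC
 LIMIT 100 OFFSET 0;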

(An aside: there's also likely some benefit from clustering in the original plan, before the new index, since we cluster on supplier_id, buyer_id, and a given buyer_id, while having up to 100k rows, will generally only have a few supplier_ids.)
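A hypothetical way to see that physical ordering: the correlation column in pg_stats shows how closely each column's values follow the on-disk row order, so after clustering on supplier_id, buyer_id those two should be near 1.0 while month may not be:

SELECT attname, n_distinct, correlation
  FROM pg_stats
 WHERE tablename = 'customs_records'
   AND attname IN ('supplier_id', 'buyer_id', 'month');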

Tim

On Wed, Mar 16, 2011 at 1:05 PM, Shaun Thomas <sthomas@peak6.com> wrote:
On 03/15/2011 01:23 PM, Timothy Garnett wrote:

         Column          |          Type
-------------------------+------------------------
 id                      | integer
 bl_number                | character varying(16)
 month                   | date
 buyer_id                | integer
 supplier_id             | integer

Ok. In your table description, you don't really talk about the distribution of bl_number. But this part of your query:


ORDER BY month DESC LIMIT 100 OFFSET 0

Is probably tricking the planner into using that index. But here's the fun thing about dates: we almost always want them in order of most recent to least recent. So you might want to try again with your index_customs_records_on_month_and_bl_number declared like this instead:

CREATE INDEX index_customs_records_on_month_and_bl_number
   ON customs_records (month DESC, bl_number);

Or, if bl_number is more selective anyway, but you need both columns for other queries and you want this one to ignore it:

CREATE INDEX index_customs_records_on_month_and_bl_number
   ON customs_records (bl_number, month DESC);

Either way, I bet you'll find that your other queries that use this index are also doing a backwards index scan, which will always be slower by about two orders of magnitude, since backwards reads act basically like random reads.
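An easy way to check is to run EXPLAIN on one of those queries and look for "Index Scan Backward" in the plan; for example (placeholder filter value):

EXPLAIN ANALYZE
SELECT * FROM customs_records
 WHERE buyer_id = 12345   -- placeholder
 ORDER BY month DESC
 LIMIT 100;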

The effect you're getting is clearly exaggerated, and I've run into it on occasion for effectively the entire history of PostgreSQL. Normally increasing the statistics on the affected columns and re-analyzing fixes it, but on a composite index, that won't necessarily be the case.
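For example, something along these lines (the statistics target of 500 is just a starting point):

ALTER TABLE customs_records ALTER COLUMN month SET STATISTICS 500;
ALTER TABLE customs_records ALTER COLUMN buyer_id SET STATISTICS 500;
ANALYZE customs_records;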

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870
sthomas@peak6.com

______________________________________________

See  http://www.peak6.com/email_disclaimer.php
for terms and conditions related to this email
