Re: Query using SeqScan instead of IndexScan
От | Brendan Duddridge |
---|---|
Тема | Re: Query using SeqScan instead of IndexScan |
Дата | |
Msg-id | A4EB422F-8A4E-44E8-BB22-E18C5EA7A7A0@clickspace.com обсуждение исходный текст |
Ответ на | Re: Query using SeqScan instead of IndexScan ("chris smith" <dmagick@gmail.com>) |
Список | pgsql-performance |
Ah I see. Ok, well we have a very wide variety here... category_id | count -------------+------- 1000521 | 31145 1001211 | 22991 1001490 | 22019 1001628 | 12472 1000046 | 10480 1000087 | 10338 1001223 | 10020 1001560 | 9532 1000954 | 8633 1001314 | 8191 1001482 | 8140 1001556 | 7959 1001481 | 7850 [snip...] 1001133 | 1 1000532 | 1 1000691 | 1 1000817 | 1 1000783 | 1 1000689 | 1 (1157 rows) So what's the best kind of query to handle this kind of data to make it fast in all cases? I'd like get down to sub-second response times. currently we have: select cp.product_id from category_product cp, product_attribute_value pav where cp.category_id = 1001082 and cp.product_id = pav.product_id; Thanks, ____________________________________________________________________ Brendan Duddridge | CTO | 403-277-5591 x24 | brendan@clickspace.com ClickSpace Interactive Inc. Suite L100, 239 - 10th Ave. SE Calgary, AB T2G 0V9 http://www.clickspace.com On Mar 31, 2006, at 6:23 PM, chris smith wrote: > On 4/1/06, Brendan Duddridge <brendan@clickspace.com> wrote: >> Hi Jim, >> >> I'm not quite sure what you mean by the correlation of category_id? > > It means how many distinct values does it have (at least that's my > understanding of it ;) ). > > select category_id, count(*) from category_product group by > category_id; > > will show you how many category_id's there are and how many products > are in each category. > > Having a lot of products in one category (or having a small amount of > categories) can slow things down because the db can't use the index > effectively.. which might be what you're seeing (hence why it's fast > for some categories, slow for others). > > >> On Mar 31, 2006, at 8:59 AM, Jim C. Nasby wrote: >> >>> What's the correlation of category_id? The current index scan cost >>> estimator places a heavy penalty on anything with a correlation much >>> below about 90%. >>> >>> On Wed, Mar 29, 2006 at 08:12:28PM -0700, Brendan Duddridge wrote: >>>> Hi, >>>> >>>> I have a query that is using a sequential scan instead of an index >>>> scan. I've turned off sequential scans and it is in fact faster >>>> with >>>> the index scan. >>>> >>>> Here's my before and after. >>>> >>>> Before: >>>> >>>> ssdev=# SET enable_seqscan TO DEFAULT; >>>> ssdev=# explain analyze select cp.product_id >>>> from category_product cp, product_attribute_value pav >>>> where cp.category_id = 1001082 and cp.product_id = >>>> pav.product_id; >>>> >>>> >>>> QUERY PLAN >>>> ------------------------------------------------------------------- >>>> -- >>>> --- >>>> ------------------------------------------------------------------- >>>> -- >>>> --- >>>> ------------------------------ >>>> Hash Join (cost=25.52..52140.59 rows=5139 width=4) (actual >>>> time=4.521..2580.520 rows=19695 loops=1) >>>> Hash Cond: ("outer".product_id = "inner".product_id) >>>> -> Seq Scan on product_attribute_value pav >>>> (cost=0.00..40127.12 >>>> rows=2387312 width=4) (actual time=0.039..1469.295 rows=2385846 >>>> loops=1) >>>> -> Hash (cost=23.10..23.10 rows=970 width=4) (actual >>>> time=2.267..2.267 rows=1140 loops=1) >>>> -> Index Scan using >>>> x_category_product__category_id_fk_idx >>>> on category_product cp (cost=0.00..23.10 rows=970 width=4) (actual >>>> time=0.122..1.395 rows=1140 loops=1) >>>> Index Cond: (category_id = 1001082) >>>> Total runtime: 2584.221 ms >>>> (7 rows) >>>> >>>> >>>> After: >>>> >>>> ssdev=# SET enable_seqscan TO false; >>>> ssdev=# explain analyze select cp.product_id >>>> from category_product cp, product_attribute_value pav >>>> where cp.category_id = 1001082 and cp.product_id = >>>> pav.product_id; >>>> >>>> >>>> QUERY PLAN >>>> ------------------------------------------------------------------- >>>> -- >>>> --- >>>> ------------------------------------------------------------------- >>>> -- >>>> --- >>>> ------------------------------------- >>>> Nested Loop (cost=0.00..157425.22 rows=5139 width=4) (actual >>>> time=0.373..71.177 rows=19695 loops=1) >>>> -> Index Scan using x_category_product__category_id_fk_idx on >>>> category_product cp (cost=0.00..23.10 rows=970 width=4) (actual >>>> time=0.129..1.438 rows=1140 loops=1) >>>> Index Cond: (category_id = 1001082) >>>> -> Index Scan using product_attribute_value__product_id_fk_idx >>>> on product_attribute_value pav (cost=0.00..161.51 rows=61 width=4) >>>> (actual time=0.016..0.053 rows=17 loops=1140) >>>> Index Cond: ("outer".product_id = pav.product_id) >>>> Total runtime: 74.747 ms >>>> (6 rows) >>>> >>>> There's quite a big difference in speed there. 2584.221 ms vs. >>>> 74.747 >>>> ms. >>>> >>>> Any ideas what I can do to improve this without turning sequential >>>> scanning off? >>>> >>>> Thanks, >>>> >>>> ___________________________________________________________________ >>>> _ >>>> Brendan Duddridge | CTO | 403-277-5591 x24 | >>>> brendan@clickspace.com >>>> >>>> ClickSpace Interactive Inc. >>>> Suite L100, 239 - 10th Ave. SE >>>> Calgary, AB T2G 0V9 >>>> >>>> http://www.clickspace.com >>>> >>> >>> >>> >>> -- >>> Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com >>> Pervasive Software http://pervasive.com work: 512-231-6117 >>> vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461 >>> >>> ---------------------------(end of >>> broadcast)--------------------------- >>> TIP 1: if posting/reading through Usenet, please send an appropriate >>> subscribe-nomail command to majordomo@postgresql.org so that >>> your >>> message can get through to the mailing list cleanly >>> >> >> >> >> > > > -- > Postgresql & php tutorials > http://www.designmagick.com/ >
Вложения
В списке pgsql-performance по дате отправления: