Re: Non-unique index performance
От | Richard Huxton |
---|---|
Тема | Re: Non-unique index performance |
Дата | |
Msg-id | 42BBCFF2.3000200@archonet.com обсуждение исходный текст |
Ответ на | Non-unique index performance (Sezai YILMAZ <sezai.yilmaz@pro-g.com.tr>) |
Ответы |
Re: Non-unique index performance
(Sezai YILMAZ <sezai.yilmaz@pro-g.com.tr>)
|
Список | pgsql-general |
Sezai YILMAZ wrote: > Hello! > > I have a table eith name person as described below. It has an unique > index for id column (it is also primary key) and has an index for parent > column. > > If I run a query with where clause on id column it uses the index (look > at the first explain analyze result; it says "Index Scan using...") and > the query for 582856 rows table results in 225,893 ms. > > But, if I run another query with where clause on parent column it does > not use the index (look at the second explain analyze result; it says > "Seq Scan using...") and the query for 582856 rows table results in > 11192.913 ms. > > Why the difference of both queries is so dramatical for unique and > non-unique indexed columns? Why PostgreSQL does not use the non-unique > indexes (it says that it does sequential scan)? Because it thinks it will be faster/cheaper. > I have to use an index on non-unique column. What is the solution for > that? Is there a way to speed up non-unique indexes? You don't want to force it to use an index, you want it to make better estimates. Let's have a look... > ***************************************************************** > test=> \d person > Table "public.person" > Column | Type | Modifiers > ---------+-----------------------+----------- > name | character varying(30) | > surname | character varying(30) | > id | integer | not null > parent | integer | > Indexes: > "person_pkey" primary key, btree (id) > "parent_ndx" btree (parent) OK - all very simple. And you've said there are about 580,000 rows. > test=> explain analyze select id,name from person where id in ('17201', > '338191', '244319', '515209', '20415'); Why are you quoting integers? > test=> explain analyze select * from person where parent in ('17201', > '338191', '244319', '515209', '20415'); > QUERY PLAN > ----------------------------------------------------------------------------------------------------------------- > > Seq Scan on person (cost=0.00..35871.26 rows=14427 width=32) (actual > time=0.063..11192.809 rows=5 loops=1) > Filter: ((parent = 17201) OR (parent = 338191) OR (parent = 244319) OR > (parent = 515209) OR (parent = 20415)) > Total runtime: 11192.913 ms Hmm - for some reason it's expecting 14427 rows to be returned. If there were that many matches, then it might well be a better choice than going back and fore between the index and the table all the time. So - we need to find out why it thinks there will be so many rows returned. 1. VACUUM FULL ANALYSE person; 2. re-run the explain That will make sure that the table's statistics are up-to-date. If that hasn't helped, then perhaps we need to educate PG about the distribution of values in "parent". 2. ALTER TABLE person ALTER COLUMN parent SET STATISTICS 100; 3. ANALYSE person; 4. re-run the explain If that still doesn't work, keep increasing the statistics (max. value 1000). Let us know how that works for you. -- Richard Huxton Archonet Ltd
В списке pgsql-general по дате отправления: