Обсуждение: Index Scan Backward vs. Sort/Sequential Scan when using ORDER BY

Поиск
Список
Период
Сортировка

Index Scan Backward vs. Sort/Sequential Scan when using ORDER BY

От
Keith Bussey
Дата:
Greetings,

I have stumbled upon a confusing aspect of PostgreSQL queries involving ORDER 
BY.

In trying to figure out just why my ORDER BY queries were so slow, I came 
across something interesting.

First, let me give you 2 very similar queries:

1) SELECT p.uid    FROM client_profiles p    INNER JOIN client_profiles_2 c    USING(uid)    WHERE
(p.profiles_gender='M')   AND (p.profiles_orientation[2] = 'F' OR p.profiles_orientation[1]='M')       ORDER BY
c.profiles_2_last_updateDESC   LIMIT 5;
 

2) SELECT p.uid    FROM client_profiles p    INNER JOIN client_profiles_2 c    USING(uid)    WHERE
(p.profiles_gender='F')   AND (p.profiles_orientation[2] = 'F' OR p.profiles_orientation[1]='M')       ORDER BY
c.profiles_2_last_updateDESC   LIMIT 5;
 

The only difference is in #1, p.profiles_gender='M' while in #2 
p.profiles_gender='F'.

SELECT count(uid) 
FROM client_profiles 
WHERE profiles_gender='M';
----------------------
408526

SELECT count(uid) 
FROM client_profiles 
WHERE profiles_gender='F';
----------------------
54713

Here are the EXPLAINs:

1) EXPLAIN #1:
Limit  (cost=0.00..1763.83 rows=5 width=24) ->  Nested Loop  (cost=0.00..2203068.58 rows=6245 width=24)       ->  Index
ScanBackward using index_client_profiles_2_last_up on 
 
client_profiles_2 c  (cost=0.00..239553.52 rows=394263 width=16)       ->  Index Scan using client_profiles_pkey on
client_profilesp  
 
(cost=0.00..4.97 rows=1 width=8)

2) EXPLAIN #2:
Limit  (cost=36046.44..36046.44 rows=5 width=24) ->  Sort  (cost=36046.44..36046.44 rows=160 width=24)       ->  Nested
Loop (cost=0.00..36040.58 rows=160 width=24)             ->  Index Scan using index_client_profiles_gender on 
 
client_profiles p  (cost=0.00..35064.98 rows=198 width=8)             ->  Index Scan using client_profiles_2_pkey on 
client_profiles_2 c  (cost=0.00..4.91 rows=1 width=16)   
Now the only reason I can see to explain this is because there are many more 
p.profiles_gender='M' than p.profiles_gender='F', Postgres knows its faster 
to do a Index Scan Backward than a Sort/Sequential Scan (and trust me, it 
is!). However it thinks the opposite when I am searching for 
p.profiles_gender='F' and form my results, that just isn't true.

Does anyone have any insight as to how I can 'force' postgres to use Index 
Backward Scan for #2???

Or, perhaps another method of making my ORDER BY faster ??

Your help would be greatly appreciated, Thanks

-- 
Keith Bussey
kbussey@wisol.com
Programmer - WISOL.com
(514) 398-9994 ext. 225


Re: Index Scan Backward vs. Sort/Sequential Scan when using ORDER BY

От
Tom Lane
Дата:
Keith Bussey <kbussey@wisol.com> writes:
> In trying to figure out just why my ORDER BY queries were so slow, I came 
> across something interesting.

The issue here seems to be that Postgres is drastically underestimating
the number of rows that will come out of the indexscan in the second
case:

>               ->  Index Scan using index_client_profiles_gender on 
> client_profiles p  (cost=0.00..35064.98 rows=198 width=8)

198 rows out when you have 54713 females seems a tad low; if it is
indeed much too low, that would explain why the planner mistakenly
prefers this plan.

It'd be interesting to look at the EXPLAIN estimate and actual results for

SELECT count(*) FROM client_profiles p    WHERE (p.profiles_gender='F');

SELECT count(*) FROM client_profiles p    WHERE (p.profiles_gender='F')    AND (p.profiles_orientation[2] = 'F' OR
p.profiles_orientation[1]='M');

I suspect the main problem may be lack of stats about the array element
distributions.  Does profiles_orientation really need to be an array,
or could you break it out into separate fields?
        regards, tom lane