Обсуждение: Slow query with 'or' clause

Поиск
Список
Период
Сортировка

Slow query with 'or' clause

От
philippe
Дата:
Hi

I'm using posstgresql 8.1.4 on linux 2.6
shared_buffers = tested with 3000 and 10000
temp_buffers = 1000
work_mem = 4096
effective_cache_size = 65536
random_page_cost = 2

I have a query which I think is anormaly slow with à 'OR'


select count(*) from client_contact
left join client_company using(cli_id)
where (cli_mail = 'xxx@xxx.xx') OR
(lower(cli_nom) = 'xxxxxx' and zipcode = '10001');

if I split this query in 2 query like this

first
select count(*) from client_contact
left join client_company using(cli_id)
where (cli_mail = 'xxx@xxx.xx')

second
select count(*) from client_contact
left join client_company using(cli_id)
where (lower(cli_nom) = 'xxxxxx' and zipcode = '10001');

each query are under 100 ms

Why postgresql think scanning index on cli_nom and cli_mail is not a good thing
with the OR clause ?


I hope you can help me understanding the problem
regards,



explain analyse
select count(*) from client_contact
left join client_company using(cli_id)
where (cli_mail = 'xxx@xxx.xx') OR
(lower(cli_nom) = 'xxxxxx' and zipcode = '10001');

QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=37523.98..37523.99 rows=1 width=0) (actual
time=3871.086..3871.087 rows=1 loops=1)
   ->  Merge Left Join  (cost=0.00..36719.10 rows=321952 width=0) (actual
time=3871.058..3871.058 rows=0 loops=1)
         Merge Cond: ("outer".cli_id = "inner".cli_id)
         Filter: ((("outer".cli_mail)::text = 'xxx@xxx.xx'::text) OR
((lower(("outer".cli_nom)::text) = 'xxxxxx'::text) AND (("inner".zipcode)::text
= '10001'::text)))
         ->  Index Scan using client_pkey on client_contact
(cost=0.00..14801.29 rows=321952 width=38) (actual time=0.110..1130.134
rows=321152 loops=1)
         ->  Index Scan using client_company_cli_id_idx on client_company
(cost=0.00..13891.30 rows=321114 width=12) (actual time=0.097..1171.905
rows=321152 loops=1)
 Total runtime: 3871.443 ms

explain analyse
select count(*) from client_contact
left join client_company using(cli_id)
where (cli_mail = 'xxx@xxx.xx')

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2750.11..2750.12 rows=1 width=0) (actual time=23.930..23.932
rows=1 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..2750.08 rows=11 width=0) (actual
time=23.918..23.918 rows=0 loops=1)
         ->  Index Scan using email_client on client_contact
(cost=0.00..2711.33 rows=11 width=4) (actual time=23.913..23.913 rows=0
loops=1)
               Index Cond: ((cli_mail)::text = 'xxx@xxx.xx'::text)
         ->  Index Scan using client_company_cli_id_idx on client_company
(cost=0.00..3.51 rows=1 width=4) (never executed)
               Index Cond: ("outer".cli_id = client_company.cli_id)
 Total runtime: 24.018 ms


explain analyse
select count(*) from client_contact
left join client_company using(cli_id)
where
(lower(cli_nom) = 'xxxxxx' and zipcode = '10001');

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=107.18..107.19 rows=1 width=0) (actual time=84.935..84.936
rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..107.17 rows=1 width=0) (actual
time=84.928..84.928 rows=0 loops=1)
         ->  Index Scan using client_contact_cli_nom_idx on client_contact
(cost=0.00..40.19 rows=19 width=4) (actual time=84.832..84.835 rows=1 loops=1)
               Index Cond: (lower((cli_nom)::text) = 'xxxxxx'::text)
         ->  Index Scan using client_company_cli_id_idx on client_company
(cost=0.00..3.51 rows=1 width=4) (actual time=0.083..0.083 rows=0 loops=1)
               Index Cond: ("outer".cli_id = client_company.cli_id)
               Filter: ((zipcode)::text = '10001'::text)
 Total runtime: 85.013 ms

Re: Slow query with 'or' clause

От
Richard Huxton
Дата:
philippe wrote:
> explain analyse
> select count(*) from client_contact
> left join client_company using(cli_id)
> where (cli_mail = 'xxx@xxx.xx') OR
> (lower(cli_nom) = 'xxxxxx' and zipcode = '10001');
>
> QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>  Aggregate  (cost=37523.98..37523.99 rows=1 width=0) (actual
> time=3871.086..3871.087 rows=1 loops=1)
>    ->  Merge Left Join  (cost=0.00..36719.10 rows=321952 width=0) (actual
> time=3871.058..3871.058 rows=0 loops=1)

This is the root of the problem - it's expecting to match over 320000
rows rather than 0.

I'm guessing there's a lot of correlation between cli_mail and cli_nom
(you're expecting them to match the same clients) but the planner
doesn't know this.

If this is a common query, you could try an index on zipcode - that
might cut down the other side.

However, I have to ask why you're using a left-join? Do you really have
rows in client_contact without a matching cli_id in client_company?

--
   Richard Huxton
   Archonet Ltd

Re: Slow query with 'or' clause

От
philippe
Дата:
Selon Richard Huxton <dev@archonet.com>:

> philippe wrote:
> > explain analyse
> > select count(*) from client_contact
> > left join client_company using(cli_id)
> > where (cli_mail = 'xxx@xxx.xx') OR
> > (lower(cli_nom) = 'xxxxxx' and zipcode = '10001');
> >
> > QUERY PLAN
> >
>

-------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> >  Aggregate  (cost=37523.98..37523.99 rows=1 width=0) (actual
> > time=3871.086..3871.087 rows=1 loops=1)
> >    ->  Merge Left Join  (cost=0.00..36719.10 rows=321952 width=0) (actual
> > time=3871.058..3871.058 rows=0 loops=1)
>
> This is the root of the problem - it's expecting to match over 320000
> rows rather than 0.
>
> I'm guessing there's a lot of correlation between cli_mail and cli_nom
> (you're expecting them to match the same clients) but the planner
> doesn't know this.
>
> If this is a common query, you could try an index on zipcode - that
> might cut down the other side.
>
> However, I have to ask why you're using a left-join? Do you really have
> rows in client_contact without a matching cli_id in client_company?
>

You are right, I was focused on server perf and I should have analysed my query.

Query time is ok now.

thanks you !!



> --
>    Richard Huxton
>    Archonet Ltd
>
> ---------------------------(end of broadcast)---------------------------
> TIP 6: explain analyze is your friend
>