Обсуждение: or kills performance

Поиск
Список
Период
Сортировка

or kills performance

От
"Sim Zacks"
Дата:
Does it make sense for a simple or in a where clause to kill performance?
The statement with the OR takes 2500 ms and the statement without the OR
takes 190 ms:

select c.partid,c.pnid,c.leadstateid,e.stock from assembliesbatch a join
partsassembly b on a.assemblyid=b.assemblyid
join manufacturerpartpn c on c.partid=b.partid join assemblies d on
d.assemblyid=a.assemblyid
join stock e on e.pnid=c.pnid
where a.batchid=382 and e.stock>0 and e.ownerid=1 and
--THIS LINE CAUSING PROBLEM
((d.leadfree and leadstateid in (1,3)) or (not d.leadfree and leadstateid in
(2,3,4)))
order by partid,leadstateid

In this particular query, the first condition does not return any rows. When
I run it as
    and not d.leadfree and leadstateid in (2,3,4)
then it takes only 190 ms and when I use only the first clause it takes less
then that and returns no rows.

Following are the 2 explain analyze results. (With Or first)
"Sort  (cost=253.83..253.83 rows=1 width=16) (actual time=8368.145..8368.352
rows=62 loops=1)"
"  Sort Key: c.partid, c.leadstateid"
"  ->  Nested Loop  (cost=9.61..253.82 rows=1 width=16) (actual
time=14.358..8367.822 rows=62 loops=1)"
"        ->  Nested Loop  (cost=9.61..243.77 rows=1 width=24) (actual
time=14.295..8353.241 rows=699 loops=1)"
"              ->  Nested Loop  (cost=9.61..94.60 rows=16 width=20) (actual
time=0.100..3605.150 rows=291711 loops=1)"
"                    Join Filter: (((NOT "inner".leadfree) AND
(("outer".leadstateid = 2) OR ("outer".leadstateid = 3) OR
("outer".leadstateid = 4))) OR ("inner".leadfree AND (("outer".leadstateid =
1) OR ("outer".leadstateid = 3))))"
"                    ->  Nested Loop  (cost=0.00..44.14 rows=3 width=16)
(actual time=0.070..27.924 rows=793 loops=1)"
"                          ->  Index Scan using ownerids on stock e
(cost=0.00..26.13 rows=3 width=8) (actual time=0.039..4.433 rows=793
loops=1)"
"                                Index Cond: (ownerid = 1)"
"                                Filter: (stock > 0)"
"                          ->  Index Scan using manufacturerpartpn_pkey on
manufacturerpartpn c  (cost=0.00..5.99 rows=1 width=12) (actual
time=0.010..0.015 rows=1 loops=793)"
"                                Index Cond: ("outer".pnid = c.pnid)"
"                    ->  Materialize  (cost=9.61..13.80 rows=419 width=5)
(actual time=0.003..1.444 rows=419 loops=793)"
"                          ->  Seq Scan on assemblies d  (cost=0.00..9.19
rows=419 width=5) (actual time=0.008..1.931 rows=419 loops=1)"
"              ->  Index Scan using ix_080c8ff0_5017_42a2_a174_28095b85106e_
on assembliesbatch a  (cost=0.00..9.31 rows=1 width=4) (actual
time=0.009..0.009 rows=0 loops=291711)"
"                    Index Cond: ("outer".assemblyid = a.assemblyid)"
"                    Filter: (batchid = 382)"
"        ->  Index Scan using idx_u_assidpartid on partsassembly b
(cost=0.00..10.02 rows=2 width=8) (actual time=0.012..0.012 rows=0
loops=699)"
"              Index Cond: (("outer".partid = b.partid) AND (b.assemblyid =
"outer".assemblyid))"
"Total runtime: 8368.708 ms"

(Without OR)
"Sort  (cost=1251.95..1251.95 rows=1 width=16) (actual time=634.110..634.333
rows=62 loops=1)"
"  Sort Key: c.partid, c.leadstateid"
"  ->  Nested Loop  (cost=9.71..1251.94 rows=1 width=16) (actual
time=3.455..633.817 rows=62 loops=1)"
"        ->  Hash Join  (cost=9.71..1168.03 rows=9 width=24) (actual
time=3.428..370.329 rows=16405 loops=1)"
"              Hash Cond: ("outer".assemblyid = "inner".assemblyid)"
"              ->  Nested Loop  (cost=0.00..1154.62 rows=368 width=20)
(actual time=0.080..235.833 rows=16472 loops=1)"
"                    ->  Nested Loop  (cost=0.00..44.16 rows=1 width=16)
(actual time=0.053..25.756 rows=699 loops=1)"
"                          ->  Index Scan using ownerids on stock e
(cost=0.00..26.13 rows=3 width=8) (actual time=0.023..4.123 rows=793
loops=1)"
"                                Index Cond: (ownerid = 1)"
"                                Filter: (stock > 0)"
"                          ->  Index Scan using manufacturerpartpn_pkey on
manufacturerpartpn c  (cost=0.00..6.00 rows=1 width=12) (actual
time=0.010..0.013 rows=1 loops=793)"
"                                Index Cond: ("outer".pnid = c.pnid)"
"                                Filter: ((leadstateid = 2) OR (leadstateid
= 3) OR (leadstateid = 4))"
"                    ->  Index Scan using partidpa on partsassembly b
(cost=0.00..1105.87 rows=368 width=8) (actual time=0.011..0.115 rows=24
loops=699)"
"                          Index Cond: ("outer".partid = b.partid)"
"              ->  Hash  (cost=9.19..9.19 rows=210 width=4) (actual
time=3.324..3.324 rows=0 loops=1)"
"                    ->  Seq Scan on assemblies d  (cost=0.00..9.19 rows=210
width=4) (actual time=0.014..1.735 rows=417 loops=1)"
"                          Filter: (NOT leadfree)"
"        ->  Index Scan using ix_080c8ff0_5017_42a2_a174_28095b85106e_ on
assembliesbatch a  (cost=0.00..9.31 rows=1 width=4) (actual
time=0.009..0.009 rows=0 loops=16405)"
"              Index Cond: ("outer".assemblyid = a.assemblyid)"
"              Filter: (batchid = 382)"
"Total runtime: 634.672 ms"

Thank You
Sim



Re: or kills performance

От
Tom Lane
Дата:
"Sim Zacks" <sim@compulab.co.il> writes:
> Does it make sense for a simple or in a where clause to kill performance?

Did you get the parenthesization correct?  Remember that AND binds more
tightly than OR.

            regards, tom lane

Re: or kills performance

От
Sim Zacks
Дата:
My goal is a specific batchid, stock>0, ownerid=1 and either leadfree with a state of 1 or 3 OR not leadfree with a
stateof 2,3 or 4 

The parenthesis are correct:
where a.batchid=382 and e.stock>0 and e.ownerid=1 and
(
    (d.leadfree and leadstateid in (1,3) )
     or
    (not d.leadfree and leadstateid in (2,3,4) )
)
order by partid,leadstateid

I checked my indices and every field in both the where clause and the join is already indexed.

I tried unioning the 2 queries and it was much faster then with the OR statement. (Took 200 ms as opposed to 2000 ms).
Theunion will work, but it seems like overkill for a simple or clause. Is this the recommended way to do it? 

select c.partid,c.pnid,c.leadstateid,e.stock from
assemblies d join assembliesbatch a on d.assemblyid=a.assemblyid
join partsassembly b on b.assemblyid=d.assemblyid
join manufacturerpartpn c on c.partid=b.partid
join stock e on e.pnid=c.pnid
where a.batchid=382 and e.stock>0 and e.ownerid=1 and
not d.leadfree and leadstateid in (2,3,4)
union
select c.partid,c.pnid,c.leadstateid,e.stock from
assemblies d join assembliesbatch a on d.assemblyid=a.assemblyid
join partsassembly b on b.assemblyid=d.assemblyid
join manufacturerpartpn c on c.partid=b.partid
join stock e on e.pnid=c.pnid
where a.batchid=382 and e.stock>0 and e.ownerid=1 and
d.leadfree and leadstateid in (1,3)
order by partid,leadstateid


________________________________________________________________________________

"Sim Zacks" <sim@compulab.co.il> writes:
> Does it make sense for a simple or in a where clause to kill performance?

Did you get the parenthesization correct?  Remember that AND binds more
tightly than OR.

                        regards, tom lane


Re: or kills performance

От
Tom Lane
Дата:
Sim Zacks <sim@compulab.co.il> writes:
> The parenthesis are correct:
> where a.batchid=382 and e.stock>0 and e.ownerid=1 and
> (
>     (d.leadfree and leadstateid in (1,3) )
>      or
>     (not d.leadfree and leadstateid in (2,3,4) )
> )

[ goes back and looks more closely ]  The row count estimates in your
EXPLAINs are so far off that it's a wonder you got an OK plan for either
query.  Have you ANALYZEd these tables recently?

The direct source of the problem seems to be that leadfree and
leadstateid come from different tables, so you're taking what had been
independent filter conditions for the two tables and converting them into
a join condition that can't be applied until after the join.  However it
doesn't look to me like that is really a huge deal, because apparently
these conditions are pretty unselective and you'd be reading most of
each table anyway.  What is really causing the problem is the choice to
join partsassembly last in the slow query; in the faster query, that's
joined before joining to assemblies and assembliesbatch, and apparently
that reduces the number of joins to assembliesbatch very considerably.

With the rowcount estimates so far off, though, it's really just luck
that the planner makes a good join order choice in either case.  And it
doesn't look like the conditions are too hard for the planner to figure
out ... I think it must be working with obsolete statistics.

            regards, tom lane

Re: or kills performance

От
Sim Zacks
Дата:
I am using autovacuum, and I assumed it was working correctly. Is there a way to tell if it is not doing what it is
supposedto? 
This is the result of ps for the autovacuum PID.
The docs say that it should automatically analyze.
  PID TTY      STAT   TIME COMMAND
10184 ?        Ss     0:32 /usr/bin/pg_autovacuum -D -v 1000 -V 2 -s 300 -S 2 -L
/var/lib/postgresql/data/pg_autovacuum.log

My log files show nothing other then an error because I renamed a (different) database and it received a fatal error
fora month, but that stopped last week (before I sent this message) when I rebooted the server There has been nothing
inthe log file since that. 

This is running on gentoo and the pg_autovacuum is the default settings that starts automatically when the server
boots.
Is there a way to check when the last analyze was done? I don't want to run analyze now until I can check that to see
wherethe problem is coming from. 

Thank you
Sim

________________________________________________________________________________

Sim Zacks <sim@compulab.co.il> writes:

> [ goes back and looks more closely ]  The row count estimates in your
> EXPLAINs are so far off that it's a wonder you got an OK plan for either
> query.  Have you ANALYZEd these tables recently?
>
>
>
>                         regards, tom lane