Обсуждение: Re-ordering of OR conditions
IF I run the following with the a < 2900 condition first, the more expensive EXISTS only gets executed when needed, but if I change the order of the OR's, the EXISTS is always executed. It would be good if the optimizer could re-order the OR conditions based on estimated cost (granted, this wouldn't work very well if you've got functions in the OR, but it'd still be useful): select * from a where a < 2900 or exists (select * from b where b.a = a.a); Here's a full example. Note the loops count for the Subplan between both cases: decibel=# create table a as select * from generate_series(1,3000) a; SELECT decibel=# create table b as select a,b from a, generate_series(1,100) b where a > 10; SELECT decibel=# create index b__a on b(a); CREATE INDEX decibel=# explain analyze select * from a where a < 2900 or exists (select * from b where b.a = a.a); QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------ Seq Scan on a (cost=0.00..8080.41 rows=1997 width=4) (actual time=0.014..1.784 rows=3000 loops=1) Filter: ((a < 2900) OR (subplan)) SubPlan -> Index Scan using b__a on b (cost=0.00..4006.44rows=1495 width=8) (actual time=0.009..0.009 rows=1 loops=101) Index Cond: (a = $0) Total runtime: 2.151 ms (6 rows) decibel=# explain analyze select * from a where exists (select * from b where b.a = a.a) or a < 2000; QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------- Seq Scan on a (cost=0.00..8080.41 rows=1997 width=4) (actual time=0.067..37.011 rows=3000 loops=1) Filter: ((subplan) OR (a < 2000)) SubPlan -> Index Scan using b__a on b (cost=0.00..4006.44rows=1495 width=8) (actual time=0.011..0.011 rows=1 loops=3000) Index Cond: (a = $0) Total runtime: 37.497 ms (6 rows) decibel=# (This is on HEAD as of a few minutes ago) -- Jim Nasby jim.nasby@enterprisedb.com EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
"Jim Nasby" <jim.nasby@enterprisedb.com> writes: > IF I run the following with the a < 2900 condition first, the more > expensive EXISTS only gets executed when needed, but if I change the > order of the OR's, the EXISTS is always executed. It would be good if > the optimizer could re-order the OR conditions based on estimated > cost (granted, this wouldn't work very well if you've got functions > in the OR, but it'd still be useful): I looked at this for a bit. It's in principle do-able but I'm not sure it's a good idea. The problem is that while AND'ed condition lists are usually fairly short and hence cheap to sort, OR'ed condition lists are not infrequently very long --- nobody blinks an eye at hundreds of items in an IN-list for instance. I'm afraid we'd waste a lot more cycles sorting than we could hope to regain. regards, tom lane
On Feb 9, 2007, at 10:46 AM, Tom Lane wrote: > "Jim Nasby" <jim.nasby@enterprisedb.com> writes: >> IF I run the following with the a < 2900 condition first, the more >> expensive EXISTS only gets executed when needed, but if I change the >> order of the OR's, the EXISTS is always executed. It would be good if >> the optimizer could re-order the OR conditions based on estimated >> cost (granted, this wouldn't work very well if you've got functions >> in the OR, but it'd still be useful): > > I looked at this for a bit. It's in principle do-able but I'm not > sure it's a good idea. The problem is that while AND'ed condition > lists are usually fairly short and hence cheap to sort, OR'ed > condition > lists are not infrequently very long --- nobody blinks an eye at > hundreds of items in an IN-list for instance. I'm afraid we'd waste > a lot more cycles sorting than we could hope to regain. Do people actually do that with OR lists though? My understanding is that now IN lists are converted to arrays, so I'd think that wouldn't be an issue there. Is it easy for the planner to discern between simple OR expressions and stuff like EXISTS? If so it might be worth automatically pushing EXISTS to the end... -- Jim Nasby jim.nasby@enterprisedb.com EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Fri, 2007-02-09 at 11:46 -0500, Tom Lane wrote: > "Jim Nasby" <jim.nasby@enterprisedb.com> writes: > > IF I run the following with the a < 2900 condition first, the more > > expensive EXISTS only gets executed when needed, but if I change the > > order of the OR's, the EXISTS is always executed. It would be good if > > the optimizer could re-order the OR conditions based on estimated > > cost (granted, this wouldn't work very well if you've got functions > > in the OR, but it'd still be useful): > > I looked at this for a bit. It's in principle do-able but I'm not > sure it's a good idea. The problem is that while AND'ed condition > lists are usually fairly short and hence cheap to sort, OR'ed condition > lists are not infrequently very long --- nobody blinks an eye at > hundreds of items in an IN-list for instance. I'm afraid we'd waste > a lot more cycles sorting than we could hope to regain. Seems like the planner could decide ahead of time whether sorting the conditions at execution time was likely to be effective or not. Perhaps limiting it to at most 5 conditions, where at least one of those was a function or a join condition? That would be a fairly cheap test at planning time, but potentially a good win at execution time. The OR'ed condition is common condition when the schema uses complex sub-classing. Now we have function costs it seems more likely this idea would get used in practice. Anyway, not necessarily for you to do, but sounds like a useful idea all the same. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com