Обсуждение: self join revisited
How hard would it be to teach planer to optimize self join? While this query which demonstrates it is not that common SELECT count(*) FROM big_table a INNER JOIN big_table b ON a.id = b.id; This type of query (self joining large table) is very common (at least in our environment because of heavy usage of views). It would be great if Postgres could rewrite this query SELECT bt1.id, bt1.total, sq.id, sq.total FROM big_table bt1 INNER JOIN small_table st1 on st1.big_id = bt1.id INNER JOIN ( SELECT bt2.id, st2.total FROM big_table bt2 INNER JOIN small_table st2 on st2.big_id = bt2.id WHERE st2.total > 100 ) sq ON sq.id = bt1.id WHERE st1.total<200 like this SELECT bt1.id, bt1.total, bt1.id, st2.total FROM big_table bt1 INNER JOIN small_table st1 on st1.big_id = bt1.id INNER JOIN small_table st2 on st2.big_id = bt1.id AND st2.total > 100 WHERE st1.total<200 Regards, Rikard
On Wed, 1 Apr 2009, Rikard Pavelic wrote: > It would be great if Postgres could rewrite this query > > SELECT bt1.id, bt1.total, sq.id, sq.total > FROM > big_table bt1 > INNER JOIN small_table st1 on st1.big_id = bt1.id > INNER JOIN > ( > SELECT bt2.id, st2.total > FROM > big_table bt2 > INNER JOIN small_table st2 on st2.big_id = bt2.id > WHERE > st2.total > 100 > ) sq ON sq.id = bt1.id > WHERE > st1.total<200 > > like this > > SELECT bt1.id, bt1.total, bt1.id, st2.total > FROM > big_table bt1 > INNER JOIN small_table st1 on st1.big_id = bt1.id > INNER JOIN small_table st2 on st2.big_id = bt1.id AND st2.total > 100 > WHERE > st1.total<200 Those queries are only equivalent if big_table.id is unique. However, even so some benefit could be gained from a self-join algorithm. For instance, if given some rather evil cleverness, it could be adapted to calculate overlaps very quickly. However, a self-join is very similar to a merge join, and the benefit over a standard merge join would be small. Matthew -- "We did a risk management review. We concluded that there was no risk of any management." -- Hugo Mills <hugo@carfax.nildram.co.uk>
Rikard Pavelic <rikard.pavelic@zg.htnet.hr> writes: > It would be great if Postgres could rewrite this query AFAICS those queries are not going to produce the same results, so Postgres would be 100% incorrect to rewrite it like that for you. (If they do produce the same results, it would depend on a bunch of assumptions you have not stated.) regards, tom lane
Tom Lane wrote: > Rikard Pavelic <rikard.pavelic@zg.htnet.hr> writes: > >> It would be great if Postgres could rewrite this query >> > > AFAICS those queries are not going to produce the same results, > so Postgres would be 100% incorrect to rewrite it like that for you. > > (If they do produce the same results, it would depend on a bunch > of assumptions you have not stated.) > > regards, tom lane > Can I try again? :) How hard would it be to teach the planner about preserving uniqueness of relations in subqueries? And using that information to remove unnecessary self joins on unique sets? I can try to rewrite some queries to test it on real data for how much gain it would provide. Regards, Rikard
> Can I try again? :) > > How hard would it be to teach the planner about preserving uniqueness of > relations in subqueries? > And using that information to remove unnecessary self joins on unique sets? > > I can try to rewrite some queries to test it on real data for how much > gain it would provide. I think join removal is a swell idea. In fact, I went so far as to post a patch to implement it. :-) http://archives.postgresql.org/message-id/603c8f070901181956j9541113xe68df5985d558a97@mail.gmail.com It's a slightly different problem, because I'm looking at removing left joins that provably don't change the output set due to a sufficiently strong uniqueness contraint on the nullable side of the join, and you're looking at removing self-joins that provably don't change the output set, which I believe requires establishing that all the columns of some unique index are constrained to be equal between the two copies of the table. But the two problems are very similar in that you need an efficient way to assess whether there's an adequate unique constraint, and some of the infrastructure could probably be shared. The problem from a coding perspective seems to be how and where to do the test for unique-ness. In either the left-join case or the self-join case, you need to verify that one of the relations involved has a unique index whose column list is equal to or a superset of the available merge-joinable clauses (or perhaps hash-joinable clauses). I ended up putting the logic in sort_inner_and_outer(), but that's making the decision to drop the join fairly late in the game. It would be nice to make it earlier, before we start the dynamic programming algorithm, especially for self-join removal, where throwing away the join after it's been constructed involves moving the quals around. create_unique_path() also does some interesting stuff in this area, but I haven't figured out how much of that might be applicable to join removal. ...Robert