Re: Proposal for Merge Join for Non '=' Operators

Поиск
Список
Период
Сортировка
От Dilip kumar
Тема Re: Proposal for Merge Join for Non '=' Operators
Дата
Msg-id 4205E661176A124FAF891E0A6BA91352663061AD@szxeml509-mbs.china.huawei.com
обсуждение исходный текст
Ответ на Re: Proposal for Merge Join for Non '=' Operators  (Hadi Moshayedi <hadi@moshayedi.net>)
Список pgsql-hackers

On 29 April 2014 13:28, Hadi Moshayedi Wrote,

 

>This looks like a great improvement. Repeating Nicolas's question, do you have a real-world example of such joins?

 

I can think of some scenario where, user need to self-join and find the comparison  with other tuples, For example, list down all the employee which has less salary compare to others employees and count of the employees who are earning more than that emp. Like query given below

 

“select ta.emp_name, count(*) from t1 as ta, t1 as tb where ta.emp_salary<tb.emp_salary group by ta.emp_name;”

 

>In my experience, I see more queries like "self-join table A and table B where A.time BETWEEN B.time - '1 week' and B.time", similar to what Nicolas and Tom mentioned. As an example, "count users who placed   an order in the week following their registration".

 

Currently I have implemented very basic POC which can work only for a < b query, I think actual patch can be enhanced for these type of queries also.

 

>Can you send a patch so we can also try it?

 

Patch is attached in the mail, but for testing we need to take care of some points

1.      Patch is implemented only for a<b type of queries (only for table with one integer field, this can be modified in create_nestloop_plan if needed, I have written for basic test with integer).

 

2.      What changes are done

There is no changes done in planner cost calculation, so hack is put while generating the plan.

IF planner has selected NLJ plan, and enable material is set to off (this is the hint to select special Merge Join)

Then add sort node above left and right tree for NLJ.

 

3.      So if you want to test with normal NLJ no need to change anything, and if you want to test using this merge join just run ‘set enable_material=off’;

 

 

postgres=# explain select count(*) from t1 as ta, t1 as tb where ta.a<tb.a;

                                QUERY PLAN

---------------------------------------------------------------------------

Aggregate  (cost=396625.51..396625.52 rows=1 width=0)

   ->  Nested Loop  (cost=0.00..375758.83 rows=8346672 width=0)

         Join Filter: (ta.a < tb.a)

         ->  Seq Scan on t1 ta  (cost=0.00..73.04 rows=5004 width=4)

         ->  Materialize  (cost=0.00..98.06 rows=5004 width=4)

               ->  Seq Scan on t1 tb  (cost=0.00..73.04 rows=5004 width=4)

Planning time: 0.291 ms

(7 rows)

 

Now For enabling this merge Join

postgres=# set enable_material=off;

SET

postgres=# explain select count(*) from t1 as ta, t1 as tb where ta.a<tb.a;

                                QUERY PLAN

---------------------------------------------------------------------------

Aggregate  (cost=699432.08..699432.09 rows=1 width=0)

   ->  Nested Loop  (cost=0.00..678565.40 rows=8346672 width=0)

         Join Filter: (ta.a < tb.a)

         ->  Sort  (cost=380.51..393.02 rows=5004 width=4)

               Sort Key: ta.a

               ->  Seq Scan on t1 ta  (cost=0.00..73.04 rows=5004 width=4)

         ->  Sort  (cost=380.51..393.02 rows=5004 width=4)

               Sort Key: tb.a

               ->  Seq Scan on t1 tb  (cost=0.00..73.04 rows=5004 width=4)

Planning time: 0.286 ms

(10 rows)

 

Thanks & Regards,

 Dilip Kumar

 

 

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Langote
Дата:
Сообщение: Re: Allowing empty target list in SELECT (1b4f7f93b4693858cb983af3cd557f6097dab67b)
Следующее
От: Christoph Berg
Дата:
Сообщение: Re: includedir_internal headers are not self-contained