On 09 April 2014 19:14, Tom Lane Wrote:
>
> I'm not following how this would help much. Consider "a.x < b.y" where
> we know that the inputs are sorted by x/y. I think you are saying that
> for a given b row, we could scan forward from the beginning of a, but
> stop as soon as we find a row with a.x >= b.y. Great ... but instead
> of comparing M*N rows, we are comparing M*N/2 rows. So it's still
> O(N^2), though with a slightly smaller constant. Is this going to be
> worth the added time to sort the inputs?
>
> It's slightly more promising for range constraints (that is, "a.x
> between b.y and b.z") but even then you have to assume that the ranges
> are narrow, at which point a nestloop join using an inner indexscan on
> a.x might do about as well.
>
> So personally, I suspect there's a reason why this isn't a standard
> join algorithm already. If you want to try it, I'd suggest that you
> focus on getting to where you can do some performance testing ASAP,
> without doing much code polishing or worrying about teaching the
> planner how to cost it.
As per your suggestion have done quick hack and done the performance testing for one specific scenario.
I shall perform some more test, for that I need to do some more hack in the code and I will post them soon..
Test Scenario:
Create table t1 (a int, b int);
Create table t2 (a int, b int);
Random record inserted in t1 and t2, as per attached files. (10K records are inserted in both the tables)
Performance is taken for the query : select count(*) from t1,t2 where t1.b < t2.b;
Test Result:
Nest Loop Join : Time: 36038.842 ms
Merge Join : Time: 19774.975 ms
Number of record selected: 42291979
Thanks & Regards,
Dilip