Обсуждение: Hash function constant in the Hash join algorithm

Поиск
Список
Период
Сортировка

Hash function constant in the Hash join algorithm

От
Ravi Kiran
Дата:
Hi, 

As part part of my project, I had to compare the time taken by the Hashjoin algorithm to that of Nested loop algorithm for the  inner join and Natural join queries, under two cases. I used six  tables to join each other each containing 5000 rows each approximately.

The two cases are given below.

1)When the hash function of the of hash join algorithm is not made to return a constant value, In this case the time taken by the hashjoin algorithm for the inner join is 1.45 secs and for the natural join is 4.58 secs and the time taken by the nested loop algorithm for inner join is 11.49 secs, and for natural join 46.94 secs.

2) when the hash function of the hash join algorithm is made to return a  constant value. Here, in the program hashfunc.c, I made all the hash functions return a constant value. In this case the time taken by the hashjoin algorithm for inner join is  8.44 secs and for natural join is 29.18 secs, and the time taken by the Nested loop algorithm for inner join is 11.66 secs and for natural join is 46.9 secs.

According to my understanding when a hashfunction is made to return a constant, all the tuples are hashed into a single bucket, and therefore the hashjoin algorithm is converted into a nestedloop algorithm. 

From the above two cases, I am understanding that even even when the hash function is made to return a constant, The hashjoin agorithm is taking significantly lower time compared to nested loop. 

Could anyone explain why does the hashjoin algorithm takes significantly lower time compared to nested loop when the hash function is made to return a constant value or have I done any mistake at any part of the code?

Note - I am using the postgresql9.4 version for coding. I know the post is long, There might be some grammar mistakes, I regret if  I caused any convenience to anyone.

Thank you.



--
Regards,

K.Ravikiran

Re: Hash function constant in the Hash join algorithm

От
Tom Lane
Дата:
Ravi Kiran <ravi.kolanpaka@gmail.com> writes:
> From the above two cases, I am understanding that even even when the hash
> function is made to return a constant, The hashjoin agorithm is taking
> significantly lower time compared to nested loop.

> Could anyone explain why does the hashjoin algorithm takes significantly
> lower time compared to nested loop when the hash function is made to return
> a constant value or have I done any mistake at any part of the code?

Well, hashjoin would suck the inner relation into memory (into its
hashtable) whereas nestloop would rescan the inner relation each time,
implying more buffer-access overhead, tuple visibility checking, etc.

A closer comparison might be to a nestloop that has a Materialize node
above its inner relation.

            regards, tom lane