Обсуждение: Block nested loop join

Поиск
Список
Период
Сортировка

Block nested loop join

От
"Bramandia Ramadhana"
Дата:
Hi all,

I am new to postgresql. I am currently doing research to optimize the query performance of RDBMS, specifically postgresql. Hence, I am currently reading out the code to understand the implementation of various query evaluation algorithm in postgresql.

Currently, I am investigating the nested loop join algorithm in nodeNestloop.c. After reading the code, my understanding is that it performs simple nested loop join (not block nested loop join). Is this true? Does postgresql support block nested loop join? If it does, where is it? I might miss some buffering/caching mechanism.

I appreciate any helps/advice.

Regards,

Bramandia R.

Re: Block nested loop join

От
Heikki Linnakangas
Дата:
Bramandia Ramadhana wrote:
> Currently, I am investigating the nested loop join algorithm in
> nodeNestloop.c. After reading the code, my understanding is that it performs
> simple nested loop join (not block nested loop join). Is this true?

Yep.

> Does postgresql support block nested loop join? 

Nope.

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Block nested loop join

От
Gregory Stark
Дата:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

>> Does postgresql support block nested loop join? 
>
> Nope.

We do support Hash Join though so I think the only difference is that we can't
use the hash join for cartesian joins.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!


Re: Block nested loop join

От
"Bramandia Ramadhana"
Дата:
Thanks for the clarifications.

Just for curiosity, is there any reason of not having block nested-loop join implementation? Is it rarely useful?

As far as I am aware of, in the case of cross product of two tables, block nested-loop join is the most efficient algorithm.

Regards,

Bramandia R.

On Fri, Oct 10, 2008 at 3:30 PM, Gregory Stark <stark@enterprisedb.com> wrote:

Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:

>> Does postgresql support block nested loop join?
>
> Nope.

We do support Hash Join though so I think the only difference is that we can't
use the hash join for cartesian joins.

--
 Gregory Stark
 EnterpriseDB          http://www.enterprisedb.com
 Ask me about EnterpriseDB's PostGIS support!

Re: Block nested loop join

От
Gregory Stark
Дата:
"Bramandia Ramadhana" <bramandia@gmail.com> writes:

> Thanks for the clarifications.
>
> Just for curiosity, is there any reason of not having block nested-loop join
> implementation? Is it rarely useful?

Oh, actually it occurs to me that we do implement something analogous to a
degenerate block nested loop where we materialize one side of the nested loop.

It looks "backward" since the materialized side is the "inner" side of the
loop but it's basically equivalent to a block nested loop with the entire
outer side in a single block.

So the use case of a real block nested loop would be doing a cartesian join of
two large tables where neither fits in RAM. That does seem like it might be
kind of narrow given how large the output would be.

But we won't know unless someone does the experiment.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's Slony Replication
support!


Re: Block nested loop join

От
Tom Lane
Дата:
Gregory Stark <stark@enterprisedb.com> writes:
> So the use case of a real block nested loop would be doing a cartesian join of
> two large tables where neither fits in RAM. That does seem like it might be
> kind of narrow given how large the output would be.

Yeah.  If you have a hashable join condition then our existing batched
hash join code should be roughly equivalent to this method.  So the use
case is joining a couple of large tables with an un-hashable,
un-indexable join condition (or none at all, ie cross product) and that
just isn't something we hear people wanting to do a lot.  I can't really
see why we'd bother maintaining extra code for block nested loop.
        regards, tom lane


Re: Block nested loop join

От
Gregory Stark
Дата:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> So the use case of a real block nested loop would be doing a cartesian join of
>> two large tables where neither fits in RAM. That does seem like it might be
>> kind of narrow given how large the output would be.
>
> Yeah.  If you have a hashable join condition then our existing batched
> hash join code should be roughly equivalent to this method.  So the use
> case is joining a couple of large tables with an un-hashable,
> un-indexable join condition (or none at all, ie cross product) and that
> just isn't something we hear people wanting to do a lot.  I can't really
> see why we'd bother maintaining extra code for block nested loop.

Hm, I hadn't thought of other non-hashable join conditions.

I wonder how much code it would be though if we just hacked hash join to
support returning the full cartesian product. Ie, instead of doing a hash
lookup do a full scan of the hash and recheck the join condition if any for
every combination.

That seems like it would be a pretty small change to HashJoin and would
effectively support precisely this join type.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production
Tuning


Re: Block nested loop join

От
"Bramandia Ramadhana"
Дата:
Dear All,

I took a look at the source code for hash join this morning and I realized that the block nested loop join is somewhat similar to that.

Thanks for the discussions.

Bramandia R.

On Fri, Oct 10, 2008 at 8:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Gregory Stark <stark@enterprisedb.com> writes:
> So the use case of a real block nested loop would be doing a cartesian join of
> two large tables where neither fits in RAM. That does seem like it might be
> kind of narrow given how large the output would be.

Yeah.  If you have a hashable join condition then our existing batched
hash join code should be roughly equivalent to this method.  So the use
case is joining a couple of large tables with an un-hashable,
un-indexable join condition (or none at all, ie cross product) and that
just isn't something we hear people wanting to do a lot.  I can't really
see why we'd bother maintaining extra code for block nested loop.

                       regards, tom lane