Re: Proposal for Merge Join for Non '=' Operators

Поиск
Список
Период
Сортировка
От Dilip kumar
Тема Re: Proposal for Merge Join for Non '=' Operators
Дата
Msg-id 4205E661176A124FAF891E0A6BA913526595F4CA@SZXEML507-MBS.china.huawei.com
обсуждение исходный текст
Ответ на Re: Proposal for Merge Join for Non '=' Operators  (Nicolas Barbier <nicolas.barbier@gmail.com>)
Ответы Re: Proposal for Merge Join for Non '=' Operators  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On 09 April 2014 13:31, Nicolas Barbier Wrote
> Do you have a real-world example use case of such joins, to offset the
> extra planner time that will likely have to be paid (even for queries
> for which the functionality ends up not being used)?
> 
> I guess there might be queries that join on “values that are not too
> far apart” or something, but as those cases (there typically not being
> a lot of “inner” rows that join with each “outer” row) are probably
> executed efficiently using a nested loop + index scan, I don’t see the
> point (yet). Are you aiming for the case where the inner relation is
> difficult to compute and cannot be accessed using an index scan?

I think this will be more useful when Both the relation are Big and are of almost equal size.

Here we can compare the cost of existing methods and new approach..
For such case planner can select 
Either    (1). NLJ [Outer(seq scan) JOIN Inner (Materialize))OR  (2). NLJ [Outer(seq scan) JOIN Inner (Index)) As you
mentioned.   
 
In approach (1), cost will be:    NLJ Cost : Outer Tuple * Inner Tuple    + I/O cost : Seq Page Cost (Inner rel pages +
outerrel pages).
 
    * And relation can be too big for materialization.

In approach (2), Cost will be:        NLJ Cost : OuterTuple * Inner Tuple in Path    { Inner Tuple in Path -> Join
selectivity* Number of inner tuple}    + I/O Cost : OuterTuple * Index Rescan Cost    { Index Rescan Cost will depend
upon(how many pages fetched in scan)} 
 
    * This will be costly because I/O cost will increase for doing multiple index rescan(for each outer).

In New Approach(3) Cost will be:    (since here for each outer tuple we need not to scan complete Inner Tuple.)    MJ
Cost: Outer Tuple * Inner Tuple in Path + (every outer tuple need to scan some tuple before reach to qualifying tuple)
 + I/O Cost : Index Scan Cost                   {Only one scan}      
 
    So for This Best case cost will be : MJ Cost : Outer Tuple * Inner Tuple in Path + I/O Cost : Index Scan Cost
    Worst case cost will be:  MJ Cost : Outer Tuple * Inner Tuple + I/O Cost : Index Scan Cost
 
    So for many case approach(3) can be cheaper, that can be detected by planner cost calculation.


> > selecting this new cost calculation can be implemented in planner
> 
> Hmm. Of course, the difficult part will be adding support for this in
> the planner, but you don’t seem to have any plans for implementing that?
>

Yes, I have plan to implement this part, but it's not completed yet.

Thanks & Regards,
Dilip



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Meskes
Дата:
Сообщение: Re: Pointer to structure in ECPG
Следующее
От: Ashutosh Bapat
Дата:
Сообщение: Re: Pointer to structure in ECPG