Re: 9.3 Pre-proposal: Range Merge Join

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: 9.3 Pre-proposal: Range Merge Join
Дата
Msg-id CA+TgmoYSKvnGDB7ZRHaus-AEHq_dLUKkGxOcgALRpMr=it1PHw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: 9.3 Pre-proposal: Range Merge Join  (Jeff Davis <pgsql@j-davis.com>)
Ответы Re: 9.3 Pre-proposal: Range Merge Join  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: 9.3 Pre-proposal: Range Merge Join  (Jeff Davis <pgsql@j-davis.com>)
Список pgsql-hackers
On Mon, Apr 16, 2012 at 1:43 PM, Jeff Davis <pgsql@j-davis.com> wrote:
> On Mon, 2012-04-16 at 02:52 -0400, Tom Lane wrote:
>> Jeff Davis <pgsql@j-davis.com> writes:
>> >  1. Order the ranges on both sides by the lower bound, then upper bound.
>> > Empty ranges can be excluded entirely.
>> >  2. Left := first range on left, Right := first range on right
>> >  3. If Left or Right is empty, terminate.
>> >  4. If lower(Left) > upper(Right), discard Right, goto 2
>> >  5. If lower(Right) > upper(Left), discard Left, goto 2
>> >  6. return (Left, Right) as joined tuple
>> >  7. Right := next range on right
>> >  8. goto 3
>>
>> This is surely not correct in detail.  As written, it will be impossible
>> for any tuple on the right side to be joined to more than one left-side
>> tuple.  You will need something analogous to the mark-and-restore
>> rewinding logic in standard mergejoin to make this work.
>
> Every time you discard a tuple on the left, you go to step 2, which
> rewinds the right side back to the first non-discarded tuple.
>
> So, implemented using mark and restore:
>
>  * start off with the marks at the first tuple on each side
>  * "discard" means move the mark down a tuple
>  * setting it back to the first range means restoring to the mark
>  * going to the next range (step 7) just means getting another
>    tuple, without changing the mark
>
> Only one side really needs the mark and restore logic, but it was easier
> to write the pseudocode in a symmetrical way (except step 7).

I'm actually not sure these are equivalent formulations.  Suppose one
side has [i,i] where i ranges from 1 to 10000000 and the other side
the exact same thing plus [1,10000000].  That one really big range
will come up second on the right side, and you will not be able to
discard it until you reach the end of the join.  If you just keep
rewinding the right side, you're going to end up with O(n^2) behavior,
whereas if you can discard tuples "from the middle" on the right side,
then you will get O(n) behavior, which is a big difference.  In other
words, in your original algorithm the tuples that you discard in step
4 or 5 need not be the first remaining tuple on whichever side of the
join we're talking about.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Greg Smith
Дата:
Сообщение: Re: Bug tracker tool we need
Следующее
От: Robert Haas
Дата:
Сообщение: Re: 9.3 Pre-proposal: Range Merge Join