Re: UNION ALL has higher cost than inheritance

Поиск

Список

Период

Сортировка

От	Robert Haas
Тема	Re: UNION ALL has higher cost than inheritance
Дата	8 ноября 2010 г. 18:56:40
Msg-id	AANLkTi=-jCsqj3v-pTZfOg+KbEr1S-AzSiOSdkU=mbvX@mail.gmail.com обсуждение исходный текст
Ответ на	Re: UNION ALL has higher cost than inheritance (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы	Re: UNION ALL has higher cost than inheritance (Tom Lane <tgl@sss.pgh.pa.us>)
Список	pgsql-hackers

Дерево обсуждения

On Mon, Nov 8, 2010 at 1:19 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> I did some hacking on this and came up with the attached patch, which
>> could use a bit more work on the comments but passes regression tests.
>> However, this just solves the issue of being smart about top-level
>> UNION ALL cases.  It might be worth looking into using MergeAppend
>> for the sorting required for other types of set operations.  That would
>> involve quite a different patch, and I'm not sure if it'd remove the
>> need for this one or not.
>
> I spent some more time thinking about this.  It seems difficult to make
> any real further progress without essentially throwing out the current
> approach to planning set-operation trees and starting over.  What we'd
> want is to generate Paths representing the different ways a set-op could
> be implemented and then pick the best one.  I can see the general shape
> of how to do that: a set-op should be thought of as generating a
> constrained join relation in a fashion similar to an OUTER JOIN
> construct, and then the different Paths are possible implementations for
> the join rel.  We'd need new Path types for hashed or sorted UNION,
> INTERSECT, EXCEPT, as well as some analog of the SpecialJoinInfo
> struct (or extend that struct to cover these cases).

It strikes me that INTERSECT ALL and EXCEPT ALL are pretty much JUST a
semi- or anti-join (on the entire set of output columns).   But
without ALL it's completely different.

> It strikes me that maybe top-level DISTINCT could be married into this
> too, since it's basically the same as a one-input UNION operator.  But
> not sure what to do with DISTINCT ON.
>
> Another thought here is that the current definition of the SetOp
> execution node type could be improved.  Instead of taking a single
> pre-merged input tuple stream, it'd be better if it had two inputs
> like a JOIN node.  Then we could eliminate the "flag" column, which
> would simplify matters for planning and reduce the amount of data
> that has to be passed through sorting.  SetOp itself would need to
> borrow some of the capability of MergeAppend so that it could read
> two sorted streams in parallel and keep them in sync.
>
> But this all looks like a pretty substantial amount of work, and
> given the low level of user demand for improving the performance of
> set operations, it seems to belong fairly far down the to-do list.
> So I'm not going to tackle it now.  Barring objection, I'll clean up
> yesterday's patch a bit more and commit it.

I agree.  If we had infinite resources it would be nice to tackle
this, but I think we have bigger fish to fry.  In particular, I wonder
if you've thought any more about the generalized inner-indexscan
machinery, or taken a look at any of the issues around KNNGIST.  I'd
like to see our limited supply of planner-fu invested in those areas,
or perhaps in making inner join removal work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Peter Eisentraut
Дата: 08 ноября 2010 г., 18:49:36
Сообщение: Re: Should we use make -k on the buildfarm?

Следующее

От: Robert Haas
Дата: 08 ноября 2010 г., 18:59:58
Сообщение: Re: How to share the result data of separated plan

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: UNION ALL has higher cost than inheritance

Предыдущее

Следующее