Re: [HACKERS] subselect

Поиск
Список
Период
Сортировка
От Bruce Momjian
Тема Re: [HACKERS] subselect
Дата
Msg-id 199801052051.PAA29341@candle.pha.pa.us
обсуждение исходный текст
Список pgsql-hackers
>
> Bruce Momjian wrote:
> >
> > > always inner one in this case. The same for handling ALL modifier.
> > > Note, that we generaly can't use aggregates here: we can't add MAX to
> > > subquery in the case of > ALL (subquery), because of > ALL should return FALSE
> > > if subquery returns NULL(s) but aggregates don't take NULLs into account.
> >
> > OK, here are my ideas.  First, I think you have to handle subselects in
> > the outer node because a subquery could have its own subquery.  Also, we
>
> I hope that this is no matter: if results of subquery (with/without sub-subqueries)
> will go into temp table then this table will be re-scanned for each outer tuple.

OK, sounds good.

>
> > now have a field in Aggreg to all us to 'usenulls'.
>                                            ^^^^^^^^
>  This can't help:
>
> vac=> select * from x;
> y
> -
> 1
> 2
> 3
>  <<< this is NULL
> (4 rows)
>
> vac=> select max(y) from x;
> max
> ---
>   3
>
> ==> we can't replace
>
> select * from A where A.a > ALL (select y from x);
>                                  ^^^^^^^^^^^^^^^
>            (NULL will be returned and so A.a > ALL is FALSE - this is what
>             Sybase does, is it right ?)
> with
>
> select * from A where A.a > (select max(y) from x);

I agree.  I don't see how we can ever replace an '> ALL (y)' with '> ALL
(max(y))'. This sounds too detailed for the system to deal with.  If
they do ALL, we have to implement ALL, without any use of aggregates to
try and second-guess their request.

>                              ^^^^^^^^^^^^^^^^^^^^
> just because of we lose knowledge about NULLs here.

Yep.  And it is too much work.  If they want to replace the query with
max(), let them do it, if not, we do what they requested.

>
> Also, I would like to handle ANY and ALL modifiers for all bool
> operators, either built-in or user-defined, for all data types -
> isn't PostgreSQL OO-like RDBMS -:)

OK, sounds good to me.

>
> > OK, here it is.  I recommend we pass the outer and subquery through
> > the parser and optimizer separately.
>
> I don't like this. I would like to get parse-tree from parser for
> entire query and let optimizer (on upper level) decide how to rewrite
> parse-tree and what plans to produce and how these plans should be
> merged. Note, that I don't object your methods below, but only where
> to place handling of this. I don't understand why should we add
> new part to the system which will do optimizer' work (parse-tree -->
> execution plan) and deal with optimizer nodes. Imho, upper optimizer
> level is nice place to do this.

I am confused.  Do you want one flat query and want to pass the whole
thing into the optimizer?  That brings up some questions:

How do we want to do this?  Well, we could easily have the two queries
share the same range table by making the subquery have the proper
alias's/refnames.

However, how do we represent the join and correlated joins to the
subquery.  We can do the correlated stuff by having the outer columns
reference the inner queries range table entries that we added, but how
to represent the subquery WHERE clause, and the join of the outer to
inner queries?

In:

    select *
    from taba
    where col1 = (select col2
              from tabb
              where taba.col3 = tabb.col4)

How do we represent join of col1 to tabb.col2?  I guess we have a new
node type for IN and NOT IN and ANY, and we put that operator in the
parse grammar.

So I assume you are suggesting we flatten the query, to merge the range
tables of the two queries, and the WHERE clauses of the two queries, add
the proper WHERE conditionals to join the two range tables for
correlated queries, and have the IN, NOT IN, ALL nodes in the WHERE
clause, and have the optimizer figure out how to handle the issues.

How do we handle aggregates in the subquery?  Currently the optimizer
does those last, but we must put them above the materialized node.  And
if we merge the outer and subquery to produce one flat query, how do we
tell the optimizer to make sure the aggregate is in a node that can be
materialized?

---------------------------------------------------------------------------

If you don't want to flatten the outer query and subquery into one
query, I am really confused.  There certainly will be stuff that needs
to be put into the upper optimizer, to properly handle the two plans and
make sure they are merged into one plan.

Are you suggesting we put the IN node in the upper optimizer, and the
correlation stuff.  That sounds good.

> > I realize making merging the two plans and doing IN and NOT IN is the
>                    ^^^^^^^^^^^^^^^^^^^^^
> This is very easy to do! As I already said we have just change sub1
> access path (SeqScan of sub1) with SeqScan of Material node with
> subquery plan.

Good.  Makes sense.  This is what I was suggesting.

>
> > real challenge, but I hoped this would give us a start.
>
> Decision about how to record subquery stuff in to parse-tree
> would be very good start -:)
>
> BTW, note that for _expression_ subqueries (which are introduced without
> IN, EXISTS, ALL, ANY - this follows Sybase' naming) - as in your examples -
> we have to check that subquery returns single tuple...

Yes, I realize this.

--
Bruce Momjian
maillist@candle.pha.pa.us

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Vadim B. Mikheev"
Дата:
Сообщение: Re: [QUESTIONS] where to find SPI support (and what is the latest version?)
Следующее
От: "Vadim B. Mikheev"
Дата:
Сообщение: Re: [HACKERS] subselect