Обсуждение: Re: [SQL] cursor and update + view

Поиск
Список
Период
Сортировка

Re: [SQL] cursor and update + view

От
Vadim Mikheev
Дата:
Jan Wieck wrote:
>
> > My other problem is with the views: it seems not work if there is an
> > aggregate command in the SELECT. It's only my experience, or others
> > noticed it also? (The details are on the pgsql-novice (subject: view
> > and aggregate command problem))
>
>     Aggregates  in  view's  are  still  buggy.  It is possible to
>     change the grouping when doing a join from such  a  view  and
>     another relation in a way, that crashes the backend.

We'll have to implement subqueries in FROM someday - this would
allow handle aggregates in views...

Vadim

Re: [HACKERS] Re: [SQL] cursor and update + view

От
jwieck@debis.com (Jan Wieck)
Дата:
Vadim wrote:

>
> Jan Wieck wrote:
> >
> > > My other problem is with the views: it seems not work if there is an
> > > aggregate command in the SELECT. It's only my experience, or others
> > > noticed it also? (The details are on the pgsql-novice (subject: view
> > > and aggregate command problem))
> >
> >     Aggregates  in  view's  are  still  buggy.  It is possible to
> >     change the grouping when doing a join from such  a  view  and
> >     another relation in a way, that crashes the backend.
>
> We'll have to implement subqueries in FROM someday - this would
> allow handle aggregates in views...

    You're  right.  In the current implementation, any rangetable
    entry (RTE) that is finally (after rewrite) used in  any  Var
    node  of  the  querytree  results in a scan of that relation.
    Having  an  RTE  that  contains  its   own   subquery   would
    materialize the view internally and help us out.

    This kind of subquery RTE will also be the base for functions
    that return result  sets  (SETOF  complex  type).  These  are
    broken too.

    It  will  be  a  little tricky to pull out the qualifications
    that restrict the subquery RTE, so the view must not entirely
    get  materialized.  At least any constant expression compared
    against one of the views attributes must go into.

    There is another thing that I would like to have. The current
    rule  system  tries  to  turn  a  qualification  that uses an
    aggregate column of a view into a subquery. This  is  because
    the  planner  doesn't  support plain aggregate expressions in
    the qualification. If it would be possible  to  have  another
    type  of Var node that points to a targetlist entry, we could
    put the aggregates from the qualification  into  junk  TLE's.
    TGL - are you listening - I think it's your code I'm ulyfying
    here :-).


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

Re: [HACKERS] Re: [SQL] cursor and update + view

От
Vadim Mikheev
Дата:
Jan Wieck wrote:
>
>     There is another thing that I would like to have. The current
>     rule  system  tries  to  turn  a  qualification  that uses an
>     aggregate column of a view into a subquery. This  is  because
>     the  planner  doesn't  support plain aggregate expressions in
>     the qualification. If it would be possible  to  have  another
>     type  of Var node that points to a targetlist entry, we could
>     put the aggregates from the qualification  into  junk  TLE's.

This thing would also handled by subqueries in FROM!
Having support in planner/executor for queries like this:

select * from A, (select c, max(d) as m from B group by c) SQ
where SQ.c = A.x and SQ.m = A.y

rule system will be able to put _any_ VIEW' query into
FROM clause...

Vadim

Re: [HACKERS] Re: [SQL] cursor and update + view

От
jwieck@debis.com (Jan Wieck)
Дата:
Vadim wrote:

> This thing would also handled by subqueries in FROM!
> Having support in planner/executor for queries like this:
>
> select * from A, (select c, max(d) as m from B group by c) SQ
> where SQ.c = A.x and SQ.m = A.y
>
> rule system will be able to put _any_ VIEW' query into
> FROM clause...

    Possible - but IMHO the wrong thing to do. As it is now for a
    view that has no aggregate,  the  rule  system  rewrites  the
    query  to  something that is the same as if the user resolved
    the view definition by hand and  used  all  the  real  tables
    instead. Have the following:

        CREATE TABLE t1 (a int4, b int4);
        CREATE TABLE t2 (a int4, c int4);
        CREATE TABLE t3 (a int4);
        CREATE VIEW v1 AS SELECT t1.a, t1.b, t2.c
           FROM t1, t2 WHERE t1.a = t2.a;

    Now do a

        SELECT t3.a, v1.b, v1.c FROM t3, v1
            WHERE t3.a = v1.a;

    The current rewrite system builds a querytree that is exactly
    that what would have been produced by the parser if  you  had
    typed

        SELECT t3.a, t1.b, t2.c FROM t3, t1, t2
            WHERE t3.a = t1.a AND t1.a = t2.a;

    Now  the  planner/optimizer has _ALL_ the tables that need to
    be scanned and _ALL_ the qualifications in  _ONE_  querytree.
    It  is  the  job of the optimizer to decide which is the best
    join path for this access. To make a good decision, it  needs
    all this information plus the VACUUM statistics.

    If  we put any view into a subquery RTE, we force the planner
    to materialize the  view  and  do  a  nestloop  over  t3  and
    materialized  v1  where  possibly  using  t1  or  t2  as  the
    outermost scanrelation would be better.

    Stonebraker & Co where absolutely right when they spoke about
    productional  rule  systems.  And what PostgreSQL does now is
    how I understood them.

        "Production rule systems  are  conceptually  simple,  but
        there  are  many subtle points involved in actually using
        them."
                                                -- Stonebraker

    I think the different grouping requirements  for  subsets  of
    data when using aggregate columns in views is just one of the
    the problems he addressed with the above statement.

    We should build a subquery  RTE  only  if  Query->hasAggs  is
    true.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

Re: [HACKERS] Re: [SQL] cursor and update + view

От
Vadim Mikheev
Дата:
Jan Wieck wrote:
>
> > rule system will be able to put _any_ VIEW' query into
> > FROM clause...
>
>     Possible - but IMHO the wrong thing to do. As it is now for a

"Will be able" doesn't mean "will DO" -:))
I just said that subqueries in FROM will resolve all
problems with aggregates in VIEWs.

>     If  we put any view into a subquery RTE, we force the planner
>     to materialize the  view  and  do  a  nestloop  over  t3  and
         ^^^^^^^^^^^
Do you mean creating some tmp table etc?
No - it's not required.

>     materialized  v1  where  possibly  using  t1  or  t2  as  the
>     outermost scanrelation would be better.

  SELECT t3.a, v1.b, v1.c
  FROM t3,
       (SELECT t1.a, t1.b, t2.c FROM t1, t2 WHERE t1.a = t2.a) v1
  WHERE t3.a = v1.a;

can be planned as

    Nestloop
    SubPlan
           ...what is costless for subquery...
    Seq/Index scan on t3

- no materialization...

On the other hand, as we talk about query optimization - why
rule system should do optimizer' work? Why not just put
_any_ VIEW' query into FROM and let optimizer decide
could query be rewritten as join or not? Ppl do strange
things sometimes -:) Sometimes they use subqueries in
WHERE while joins could be used and our optimizer don't
try to catch this. I know that Sybase does.
And, imho, we should implement this ... sometime -:))

Vadim

Re: [HACKERS] Re: [SQL] cursor and update + view

От
jwieck@debis.com (Jan Wieck)
Дата:
Vadim wrote:

>
> Jan Wieck wrote:
> >
> >     If  we put any view into a subquery RTE, we force the planner
> >     to materialize the  view  and  do  a  nestloop  over  t3  and
>          ^^^^^^^^^^^
> Do you mean creating some tmp table etc?
> No - it's not required.

    Sometimes  a  sortset  is  required (grouping, nesting etc.).
    With materialize I meant the same thing the executor does for
    a scan, merge or iter node. They return in memory tuples from
    a relation or a temp file. In our new case  it's  mostly  the
    same as a scan node that does the view selection inside.  And
    it returs the same tuples as a SELECT * from the view  would.
    That's internal, on the fly materialization of the view.

> On the other hand, as we talk about query optimization - why
> rule system should do optimizer' work? Why not just put
> _any_ VIEW' query into FROM and let optimizer decide
> could query be rewritten as join or not? Ppl do strange
> things sometimes -:) Sometimes they use subqueries in
> WHERE while joins could be used and our optimizer don't
> try to catch this. I know that Sybase does.
> And, imho, we should implement this ... sometime -:))

    Depends on where the optimization is done. If we do it on the
    parsetree (Query struct), it's the job of  the  rule  system.
    The optimizer does not have to modify the parsetree. If it is
    done on the way from the parsetree to the plan, it is the job
    of the optimizer.

    If  it  is  possible to do it on the parsetree, I would do it
    there.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

Re: [HACKERS] Re: [SQL] cursor and update + view

От
Vadim Mikheev
Дата:
Jan Wieck wrote:
> 
> > On the other hand, as we talk about query optimization - why
> > rule system should do optimizer' work? Why not just put
> > _any_ VIEW' query into FROM and let optimizer decide
> > could query be rewritten as join or not? Ppl do strange
> > things sometimes -:) Sometimes they use subqueries in
> > WHERE while joins could be used and our optimizer don't
> > try to catch this. I know that Sybase does.
> > And, imho, we should implement this ... sometime -:))
> 
>     Depends on where the optimization is done. If we do it on the
>     parsetree (Query struct), it's the job of  the  rule  system.
>     The optimizer does not have to modify the parsetree. If it is
>     done on the way from the parsetree to the plan, it is the job
>     of the optimizer.
> 
>     If  it  is  possible to do it on the parsetree, I would do it
>     there.

Subquery --> Join transformation/optimization implemented in
rule system will be used for Views only. Being implemented
in optimizer it will be used in all cases.

Vadim


Re: [HACKERS] Re: [SQL] cursor and update + view

От
jwieck@debis.com (Jan Wieck)
Дата:
Vadim wrote:

> Subquery --> Join transformation/optimization implemented in
> rule system will be used for Views only. Being implemented
> in optimizer it will be used in all cases.

    Right  for the current rule system, because it looks only for
    pg_rewrite entries to apply. Since it  is  called  for  every
    optimizable statement, it could do this as a last step on the
    querylist to be returned. Even if there  where  no  rules  to
    apply.

    I still think that it's the right place to do. Transforming a
    subselect into a join means to modify the users input,  doing
    something  different finally.  This is kind of rewriting like
    for view rules. Reading the debug  output  "After  rewriting"
    someone  should  be  able to see which relations get scanned,
    where and which of their attributes are used for what.

    "On the other hand" I  thought  a  little  deeper  about  the
    transformation itself. On the first thought it looked so easy
    but on the third I confused myself a little.  Let's  take  an
    easy subquery

        SELECT A.f1 FROM A
            WHERE A.f2 IN (SELECT B.f1 FROM B WHERE B.f2 = 'x');

    This  will  return  any A.f1 where f2 is referenced by a B.f1
    WHERE B.f2 = 'x'. Regardless how often it is  referenced,  it
    will  only  be  returned once.  I cannot think of a join that
    can do this. The join

        SELECT A.f1 FROM A, B
            WHERE A.f2 = B.f1 AND B.f2 = 'x';

    will return A.f1 as many times as there are duplicates  in  B
    that  match.  And DISTINCT doesn't help here because it would
    remove duplicate A.f1's too  (what  isn't  the  same  as  the
    initial subselect does).

    Could  you  give  me  an  example  where a subquery could get
    translated into a join that produces exactly the same output,
    no matter if there are duplicates or not?


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#======================================== jwieck@debis.com (Jan Wieck) #

Re: [HACKERS] Re: [SQL] cursor and update + view

От
Vadim Mikheev
Дата:
Jan Wieck wrote:
> 
> Vadim wrote:
> 
> > Subquery --> Join transformation/optimization implemented in
> > rule system will be used for Views only. Being implemented
> > in optimizer it will be used in all cases.
> 
>     Right  for the current rule system, because it looks only for
>     pg_rewrite entries to apply. Since it  is  called  for  every
>     optimizable statement, it could do this as a last step on the
>     querylist to be returned. Even if there  where  no  rules  to
>     apply.
> 
>     I still think that it's the right place to do. Transforming a
>     subselect into a join means to modify the users input,  doing
>     something  different finally.  This is kind of rewriting like
>     for view rules. Reading the debug  output  "After  rewriting"
>     someone  should  be  able to see which relations get scanned,
>     where and which of their attributes are used for what.
...
>     Could  you  give  me  an  example  where a subquery could get
>     translated into a join that produces exactly the same output,
>     no matter if there are duplicates or not?


Sybase' example:

select title, price 
from titles 
where price =   (select price    from titles    where title = "Straight Talk About Computers")

select t1.title, t1.price 
from titles t1, titles t2
where t1.price = t2.price and      t2.title = "Straight Talk About Computers"

- yes, executor should ensure that there was only one
record with t2.title equal "Straight Talk About Computers",
this could be flagged in some way...

On the other hand, I'm not sure that we have to concern
about such cases and about "unwise" queries at all -:)
So, forgive me for noise -:)

There are many other things to do and handling subqueries
in FROM is one of them.

Vadim