Обсуждение: Re: [SQL] cursor and update + view
Jan Wieck wrote: > > > My other problem is with the views: it seems not work if there is an > > aggregate command in the SELECT. It's only my experience, or others > > noticed it also? (The details are on the pgsql-novice (subject: view > > and aggregate command problem)) > > Aggregates in view's are still buggy. It is possible to > change the grouping when doing a join from such a view and > another relation in a way, that crashes the backend. We'll have to implement subqueries in FROM someday - this would allow handle aggregates in views... Vadim
Vadim wrote:
>
> Jan Wieck wrote:
> >
> > > My other problem is with the views: it seems not work if there is an
> > > aggregate command in the SELECT. It's only my experience, or others
> > > noticed it also? (The details are on the pgsql-novice (subject: view
> > > and aggregate command problem))
> >
> > Aggregates in view's are still buggy. It is possible to
> > change the grouping when doing a join from such a view and
> > another relation in a way, that crashes the backend.
>
> We'll have to implement subqueries in FROM someday - this would
> allow handle aggregates in views...
You're right. In the current implementation, any rangetable
entry (RTE) that is finally (after rewrite) used in any Var
node of the querytree results in a scan of that relation.
Having an RTE that contains its own subquery would
materialize the view internally and help us out.
This kind of subquery RTE will also be the base for functions
that return result sets (SETOF complex type). These are
broken too.
It will be a little tricky to pull out the qualifications
that restrict the subquery RTE, so the view must not entirely
get materialized. At least any constant expression compared
against one of the views attributes must go into.
There is another thing that I would like to have. The current
rule system tries to turn a qualification that uses an
aggregate column of a view into a subquery. This is because
the planner doesn't support plain aggregate expressions in
the qualification. If it would be possible to have another
type of Var node that points to a targetlist entry, we could
put the aggregates from the qualification into junk TLE's.
TGL - are you listening - I think it's your code I'm ulyfying
here :-).
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #
Jan Wieck wrote: > > There is another thing that I would like to have. The current > rule system tries to turn a qualification that uses an > aggregate column of a view into a subquery. This is because > the planner doesn't support plain aggregate expressions in > the qualification. If it would be possible to have another > type of Var node that points to a targetlist entry, we could > put the aggregates from the qualification into junk TLE's. This thing would also handled by subqueries in FROM! Having support in planner/executor for queries like this: select * from A, (select c, max(d) as m from B group by c) SQ where SQ.c = A.x and SQ.m = A.y rule system will be able to put _any_ VIEW' query into FROM clause... Vadim
Vadim wrote:
> This thing would also handled by subqueries in FROM!
> Having support in planner/executor for queries like this:
>
> select * from A, (select c, max(d) as m from B group by c) SQ
> where SQ.c = A.x and SQ.m = A.y
>
> rule system will be able to put _any_ VIEW' query into
> FROM clause...
Possible - but IMHO the wrong thing to do. As it is now for a
view that has no aggregate, the rule system rewrites the
query to something that is the same as if the user resolved
the view definition by hand and used all the real tables
instead. Have the following:
CREATE TABLE t1 (a int4, b int4);
CREATE TABLE t2 (a int4, c int4);
CREATE TABLE t3 (a int4);
CREATE VIEW v1 AS SELECT t1.a, t1.b, t2.c
FROM t1, t2 WHERE t1.a = t2.a;
Now do a
SELECT t3.a, v1.b, v1.c FROM t3, v1
WHERE t3.a = v1.a;
The current rewrite system builds a querytree that is exactly
that what would have been produced by the parser if you had
typed
SELECT t3.a, t1.b, t2.c FROM t3, t1, t2
WHERE t3.a = t1.a AND t1.a = t2.a;
Now the planner/optimizer has _ALL_ the tables that need to
be scanned and _ALL_ the qualifications in _ONE_ querytree.
It is the job of the optimizer to decide which is the best
join path for this access. To make a good decision, it needs
all this information plus the VACUUM statistics.
If we put any view into a subquery RTE, we force the planner
to materialize the view and do a nestloop over t3 and
materialized v1 where possibly using t1 or t2 as the
outermost scanrelation would be better.
Stonebraker & Co where absolutely right when they spoke about
productional rule systems. And what PostgreSQL does now is
how I understood them.
"Production rule systems are conceptually simple, but
there are many subtle points involved in actually using
them."
-- Stonebraker
I think the different grouping requirements for subsets of
data when using aggregate columns in views is just one of the
the problems he addressed with the above statement.
We should build a subquery RTE only if Query->hasAggs is
true.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #
Jan Wieck wrote:
>
> > rule system will be able to put _any_ VIEW' query into
> > FROM clause...
>
> Possible - but IMHO the wrong thing to do. As it is now for a
"Will be able" doesn't mean "will DO" -:))
I just said that subqueries in FROM will resolve all
problems with aggregates in VIEWs.
> If we put any view into a subquery RTE, we force the planner
> to materialize the view and do a nestloop over t3 and
^^^^^^^^^^^
Do you mean creating some tmp table etc?
No - it's not required.
> materialized v1 where possibly using t1 or t2 as the
> outermost scanrelation would be better.
SELECT t3.a, v1.b, v1.c
FROM t3,
(SELECT t1.a, t1.b, t2.c FROM t1, t2 WHERE t1.a = t2.a) v1
WHERE t3.a = v1.a;
can be planned as
Nestloop
SubPlan
...what is costless for subquery...
Seq/Index scan on t3
- no materialization...
On the other hand, as we talk about query optimization - why
rule system should do optimizer' work? Why not just put
_any_ VIEW' query into FROM and let optimizer decide
could query be rewritten as join or not? Ppl do strange
things sometimes -:) Sometimes they use subqueries in
WHERE while joins could be used and our optimizer don't
try to catch this. I know that Sybase does.
And, imho, we should implement this ... sometime -:))
Vadim
Vadim wrote:
>
> Jan Wieck wrote:
> >
> > If we put any view into a subquery RTE, we force the planner
> > to materialize the view and do a nestloop over t3 and
> ^^^^^^^^^^^
> Do you mean creating some tmp table etc?
> No - it's not required.
Sometimes a sortset is required (grouping, nesting etc.).
With materialize I meant the same thing the executor does for
a scan, merge or iter node. They return in memory tuples from
a relation or a temp file. In our new case it's mostly the
same as a scan node that does the view selection inside. And
it returs the same tuples as a SELECT * from the view would.
That's internal, on the fly materialization of the view.
> On the other hand, as we talk about query optimization - why
> rule system should do optimizer' work? Why not just put
> _any_ VIEW' query into FROM and let optimizer decide
> could query be rewritten as join or not? Ppl do strange
> things sometimes -:) Sometimes they use subqueries in
> WHERE while joins could be used and our optimizer don't
> try to catch this. I know that Sybase does.
> And, imho, we should implement this ... sometime -:))
Depends on where the optimization is done. If we do it on the
parsetree (Query struct), it's the job of the rule system.
The optimizer does not have to modify the parsetree. If it is
done on the way from the parsetree to the plan, it is the job
of the optimizer.
If it is possible to do it on the parsetree, I would do it
there.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #
Jan Wieck wrote: > > > On the other hand, as we talk about query optimization - why > > rule system should do optimizer' work? Why not just put > > _any_ VIEW' query into FROM and let optimizer decide > > could query be rewritten as join or not? Ppl do strange > > things sometimes -:) Sometimes they use subqueries in > > WHERE while joins could be used and our optimizer don't > > try to catch this. I know that Sybase does. > > And, imho, we should implement this ... sometime -:)) > > Depends on where the optimization is done. If we do it on the > parsetree (Query struct), it's the job of the rule system. > The optimizer does not have to modify the parsetree. If it is > done on the way from the parsetree to the plan, it is the job > of the optimizer. > > If it is possible to do it on the parsetree, I would do it > there. Subquery --> Join transformation/optimization implemented in rule system will be used for Views only. Being implemented in optimizer it will be used in all cases. Vadim
Vadim wrote:
> Subquery --> Join transformation/optimization implemented in
> rule system will be used for Views only. Being implemented
> in optimizer it will be used in all cases.
Right for the current rule system, because it looks only for
pg_rewrite entries to apply. Since it is called for every
optimizable statement, it could do this as a last step on the
querylist to be returned. Even if there where no rules to
apply.
I still think that it's the right place to do. Transforming a
subselect into a join means to modify the users input, doing
something different finally. This is kind of rewriting like
for view rules. Reading the debug output "After rewriting"
someone should be able to see which relations get scanned,
where and which of their attributes are used for what.
"On the other hand" I thought a little deeper about the
transformation itself. On the first thought it looked so easy
but on the third I confused myself a little. Let's take an
easy subquery
SELECT A.f1 FROM A
WHERE A.f2 IN (SELECT B.f1 FROM B WHERE B.f2 = 'x');
This will return any A.f1 where f2 is referenced by a B.f1
WHERE B.f2 = 'x'. Regardless how often it is referenced, it
will only be returned once. I cannot think of a join that
can do this. The join
SELECT A.f1 FROM A, B
WHERE A.f2 = B.f1 AND B.f2 = 'x';
will return A.f1 as many times as there are duplicates in B
that match. And DISTINCT doesn't help here because it would
remove duplicate A.f1's too (what isn't the same as the
initial subselect does).
Could you give me an example where a subquery could get
translated into a join that produces exactly the same output,
no matter if there are duplicates or not?
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me. #
#======================================== jwieck@debis.com (Jan Wieck) #
Jan Wieck wrote: > > Vadim wrote: > > > Subquery --> Join transformation/optimization implemented in > > rule system will be used for Views only. Being implemented > > in optimizer it will be used in all cases. > > Right for the current rule system, because it looks only for > pg_rewrite entries to apply. Since it is called for every > optimizable statement, it could do this as a last step on the > querylist to be returned. Even if there where no rules to > apply. > > I still think that it's the right place to do. Transforming a > subselect into a join means to modify the users input, doing > something different finally. This is kind of rewriting like > for view rules. Reading the debug output "After rewriting" > someone should be able to see which relations get scanned, > where and which of their attributes are used for what. ... > Could you give me an example where a subquery could get > translated into a join that produces exactly the same output, > no matter if there are duplicates or not? Sybase' example: select title, price from titles where price = (select price from titles where title = "Straight Talk About Computers") select t1.title, t1.price from titles t1, titles t2 where t1.price = t2.price and t2.title = "Straight Talk About Computers" - yes, executor should ensure that there was only one record with t2.title equal "Straight Talk About Computers", this could be flagged in some way... On the other hand, I'm not sure that we have to concern about such cases and about "unwise" queries at all -:) So, forgive me for noise -:) There are many other things to do and handling subqueries in FROM is one of them. Vadim