Обсуждение: Common Table Expressions (WITH RECURSIVE) patch

Поиск
Список
Период
Сортировка

Common Table Expressions (WITH RECURSIVE) patch

От
Jeff Davis
Дата:
These are my initial comments about the Common Table Expressions (CTE)
patch, also known as WITH [RECURSIVE]. These comments are based on the
patch here:

http://archives.postgresql.org/pgsql-patches/2008-08/msg00021.php

This is a semantically complex feature, and the standard is fairly
complex as well. So I'm approaching this by making my own
interpretations from the standard first (I included my interpretations
and section references at the end of this email) and comparing to the
behavior of the patch.

The following examples may be inconsistent with the standard. Some
have already been mentioned, and I don't think they all need to be
fixed for 8.4, but I mention them here for completeness.

* Mutual Recursion:
 with recursive   foo(i) as (values(1) union all select i+1 from bar where i < 10),   bar(i) as (values(1) union all
selecti+1 from foo where i < 10) select * from foo; ERROR:  mutual recursive call is not supported
 
 The standard allows mutual recursion.

* Single Evaluation:
 with   foo(i) as (select random() as i) select * from foo union all select * from foo;          i -------------------
0.233165248762816  0.62126633618027 (2 rows)
 
 The standard specifies that non-recursive WITH should be evaluated once.

* RHS only:
 with recursive   foo(i) as (select i+1 from foo where i < 10 union all values(1)) select * from foo; ERROR:  Left hand
sideof UNION ALL must be a non-recursive term in a
 
recursive query
 The standard does not require that the recursive term be on the RHS.

* UNION ALL only:
 with recursive   foo(i) as (values(1) union select i+1 from foo where i < 10) select * from foo; ERROR:  non-recursive
termand recursive term must be combined with
 
UNION ALL
 The standard seems to allow UNION ALL, UNION, INTERSECT, and EXCEPT (when the recursive term is not on the RHS of the
EXCEPT).

* Binary recursion and subselect strangeness:
 with recursive foo(i) as   (values (1)   union all   select * from     (select i+1 from foo where i < 10     union all
   select i+1 from foo where i < X) t) select * from foo;
 
 Produces 10 rows of output regardless of what "X" is. This should be
fixed for 8.4. Also, this is non-linear recursion, which the standard seems to
disallow.

* Multiple recursive references:
 with recursive foo(i) as   (values (1)   union all   select i+1 from foo where i < 10   union all   select i+1 from
foowhere i < 20) select * from foo; ERROR:  Left hand side of UNION ALL must be a non-recursive term in a
 
recursive query
 If we're going to allow non-linear recursion (which the standard does not), this seems like it should be a valid
case.

* Strange result with except:
 with recursive foo(i) as   (values (1)   union all   select * from   (select i+1 from foo where i < 10   except
selecti+1 from foo where i < 5) t) select * from foo; ERROR:  table "foo" has 0 columns available but 1 columns
specified
 This query works if you replace "except" with "union". This should be
fixed for 8.4.

* Aggregates allowed:
 with recursive foo(i) as   (values(1)   union all   select max(i)+1 from foo where i < 10) select * from foo;
 Aggregates should be blocked according to the standard. Also, causes an infinite loop. This should be fixed for 8.4.

* DISTINCT should supress duplicates:
 with recursive foo(i) as   (select distinct * from (values(1),(2)) t   union all   select distinct i+1 from foo where
i< 10) select * from foo;
 
 This outputs a lot of duplicates, but they should be supressed
according to the standard. This query is essentially the same as
supporting UNION for recursive queries, so we should either fix both for
8.4 or block both for consistency.

* outer joins on a recursive reference should be blocked:
 with recursive foo(i) as   (values(1)   union all   select i+1 from foo left join (values(1)) t on (i=column1)) select
*from foo;
 
 Causes an infinite loop, but the standard says using an outer join in this situation should be prohibited. This should
befixed for 8.4.
 

* ORDER BY, LIMIT, and OFFSET are rejected for recursive queries. The
standard does not seem to say that these should be rejected.


The following are my interpretations of relevant parts of the SQL
standard (200n), and the associated sections. These are only my
interpretation, so let me know if you interpret the standard
differently.

Non-linear recursion forbidden: 7.13: Syntax Rules: 2.g.ii   My interpretation of 2.g.ii.2 is that WQN[k] and WQN[l]
maybe the   same <query name>. 7.13: Syntax Rules: 2.g.iv
 

EXCEPT can't be used for recursive queries if a recursive reference
appears on the RHS: 7.13: Syntax Rules: 2.g.iii.1

INTERSECT ALL/EXCEPT ALL can't be used for recursive queries: 7.13: Syntax Rules: 2.g.iii.5

recursive references must appear in FROM clause: 7.13: Syntax Rules: 2.g.iii.3   My interpretation of this rule is that
itdoes not allow a   recursive reference in a subquery in the targetlist or a subquery   in the where clause.
 

stratum defined: 7.13: Syntax Rules: 2.f 7.13: Syntax Rules: 2.g.i.1

recursive query must have anchor for every stratum: 7.13: Syntax Rules: 2.g.i.3

outer joins not allowed to join recursive references: 7.13: Syntax Rules: 2.g.iii.6 7.13: Syntax Rules: 2.g.iii.7

Aggregates/HAVING disallowed: 7.13: Syntax Rules: 2.g.iii.4.B

Mutual recursion defined: 7.13: Syntax Rules: 2.g.i.1

Evaluate each WITH entry once, even if it's referenced multiple times: 7.13: General Rules: 1 7.13: General Rules: 2.b
Seealso:
 
http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php

Evaluation order with mutual recursion: 7.13: General Rules: 2.a 7.13: General Rules: 2.b

Evaluation semantics: 7.13: General Rules: 2.c

DISTINCT: 7.13: General Rules: 2.c.iv 7.13: General Rules: 2.c.ix.3.A

I will provide comments about the code and documentation soon. This is a
very useful feature.

Regards,Jeff Davis



Re: Common Table Expressions (WITH RECURSIVE) patch

От
Andrew Gierth
Дата:
>>>>> "Jeff" == Jeff Davis <pgsql@j-davis.com> writes:
Jeff> * Mutual Recursion:

This limitation isn't at all uncommon in other implementations; DB2
docs for example say:

"If more than one common table expression is defined in the same
statement, cyclic references between the common table expressions are
not permitted. A cyclic reference occurs when two common table
expressions dt1 and dt2 are created such that dt1 refers to dt2 and
dt2 refers to dt1."


http://publib.boulder.ibm.com/infocenter/iadthelp/v7r0/index.jsp?topic=/com.ibm.etools.iseries.langref2.doc/rbafzmst295.htm

MSSQL's documentation is less clear, but it also appears not to allow
mutual recursion (not allowing forward references to CTEs).

"A CTE can reference itself and previously defined CTEs in the same
WITH clause. Forward referencing is not allowed."

http://msdn.microsoft.com/en-us/library/ms175972.aspx
Jeff> * RHS only:
Jeff>   with recursiveJeff>     foo(i) as (select i+1 from foo where i < 10 union all values(1))Jeff>   select * from
foo;Jeff>  ERROR:  Left hand side of UNION ALL must be a non-recursive term in aJeff> recursive query
 
Jeff>   The standard does not require that the recursive term be onJeff>   the RHS.

Again, the standard may not, but existing implementations do:

MSSQL:

"The recursive CTE definition must contain at least two CTE query
definitions, an anchor member and a recursive member. Multiple anchor
members and recursive members can be defined; however, all anchor
member query definitions must be put before the first recursive member
definition. All CTE query definitions are anchor members unless they
reference the CTE itself. "

DB2:

"The following restrictions apply to a recursive common-table-expression:   * A list of column-names must be specified
followingthe     table-identifier.   * The UNION ALL set operator must be specified.   * The first fullselect of the
firstunion (the initialization     fullselect) must not include a reference to the     common-table-expression itself
inany FROM clause.
 
"
Jeff> * UNION ALL only:
Jeff>   with recursiveJeff>     foo(i) as (values(1) union select i+1 from foo where i < 10)Jeff>   select * from
foo;Jeff>  ERROR:  non-recursive term and recursive term must be combined withJeff> UNION ALL
 
Jeff>   The standard seems to allow UNION ALL, UNION, INTERSECT, andJeff>   EXCEPT (when the recursive term is not on
theRHS of theJeff>   EXCEPT).
 

Again, existing implementations disagree. See above for DB2, and for
MSSQL:

"Anchor members must be combined by one of these set operators: UNION
ALL, UNION, INTERSECT, or EXCEPT. UNION ALL is the only set operator
allowed between the last anchor member and first recursive member, and
when combining multiple recursive members."
Jeff> * Binary recursion and subselect strangeness:
Jeff>   with recursive foo(i) asJeff>     (values (1)Jeff>     union allJeff>     select * fromJeff>       (select i+1
fromfoo where i < 10Jeff>       union allJeff>       select i+1 from foo where i < X) t)Jeff>   select * from foo;
 
Jeff>   Produces 10 rows of output regardless of what "X" is. ThisJeff> should be fixed for 8.4.  Also, this is
non-linearrecursion,Jeff> which the standard seems to disallow.
 

That looks like it should be disallowed somehow.
Jeff> * Multiple recursive references:[snip]
Jeff>   If we're going to allow non-linear recursion (which theJeff>   standard does not), this seems like it should be
avalidJeff>   case.
 

We probably shouldn't allow it (as above).

[snip * Strange result with except: which looks like a bug]
Jeff> * Aggregates allowed: which
Jeff>   with recursive foo(i) asJeff>     (values(1)Jeff>     union allJeff>     select max(i)+1 from foo where i <
10)Jeff>  select * from foo;
 
Jeff>   Aggregates should be blocked according to the standard.Jeff>   Also, causes an infinite loop. This should be
fixedfor 8.4.
 

Does the standard require anywhere that non-conforming statements must
be diagnosed? (seems impractical, since it would forbid extensions)
Jeff> * DISTINCT should supress duplicates:
Jeff>   with recursive foo(i) asJeff>     (select distinct * from (values(1),(2)) tJeff>     union allJeff>     select
distincti+1 from foo where i < 10)Jeff>   select * from foo;
 
Jeff>   This outputs a lot of duplicates, but they should beJeff> supressed according to the standard. This query is
essentiallyJeff>the same as supporting UNION for recursive queries, so weJeff> should either fix both for 8.4 or block
bothfor consistency.
 

Yeah, though the standard's use of DISTINCT in this way is something
of a violation of the POLA.
Jeff> * outer joins on a recursive reference should be blocked:
Jeff>   with recursive foo(i) asJeff>     (values(1)Jeff>     union allJeff>     select i+1 from foo left join
(values(1))t on (i=column1))Jeff>   select * from foo;
 
Jeff>   Causes an infinite loop, but the standard says using an outerJeff>   join in this situation should be
prohibited.This should beJeff>   fixed for 8.4.
 

No. This has already been discussed; it's neither possible nor desirable
to diagnose all cases which can result in infinite loops, and there are
important types of queries which would be unnecessarily forbidden.

Besides, you've misread the spec here: it prohibits the recursive
reference ONLY on the nullable side of the join. You cite:
Jeff> outer joins not allowed to join recursive references:Jeff>   7.13: Syntax Rules: 2.g.iii.6Jeff>   7.13: Syntax
Rules:2.g.iii.7
 

6) WQEi shall not contain a <qualified join> QJ in which:
  A) QJ immediately contains a <join type> that specifies FULL and a     <table reference> or <table factor> that
containsa <query name>     referencing WQNj.
 

[no recursive FULL JOIN at all]
  B) QJ immediately contains a <join type> that specifies LEFT and a     <table factor> following the <join type> that
containsa <query     name> referencing WQNj.
 

[note "following the <join type>", so  WQNj LEFT JOIN foo is allowed,
but foo LEFT JOIN WQNj is not]
  C) QJ immediately contains a <join type> that specifies RIGHT and a     <table reference> preceding the <join type>
thatcontains a     <query name> referencing WQNj.
 

[note "preceding the <join type>", so  foo RIGHT JOIN WQNj is allowed,
but WQNj RIGHT JOIN foo is not]

para (7) is identical but for natural rather than qualified joins.
Jeff> * ORDER BY, LIMIT, and OFFSET are rejected for recursiveJeff> queries. The standard does not seem to say that
theseshould beJeff> rejected.
 

Note that supporting those in subqueries (including CTEs) is a separate
optional feature of the standard.

-- 
Andrew (irc:RhodiumToad)


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Gregory Stark
Дата:
Jeff Davis <pgsql@j-davis.com> writes:

> * Mutual Recursion:
>
>   with recursive
>     foo(i) as (values(1) union all select i+1 from bar where i < 10),
>     bar(i) as (values(1) union all select i+1 from foo where i < 10)
>   select * from foo;
>   ERROR:  mutual recursive call is not supported
>
>   The standard allows mutual recursion.

This seems to be a point of confusion. I originally read the standard and
concluded that mutual recursion was an optional feature. Itagaki-san showed me
a copy of the spec where it seemed there was a clear blanket prohibition on
mutually recursive queries and in fact anything but simple linearly expandable
queries. I wonder if there are different versions of the spec floating around
on this point.

Take a second look at your spec and read on to where it defines "linear" and
"expandable". If it doesn't define those terms then it's definitely different
from what I read. If it does, read on to see what it does with them. The main
reason to define them appeared to be to use them to say that supporting mutual
recursion is not required.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Jeff Davis
Дата:
On Mon, 2008-09-08 at 21:13 +0100, Gregory Stark wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
> 
> > * Mutual Recursion:
> >
> >   with recursive
> >     foo(i) as (values(1) union all select i+1 from bar where i < 10),
> >     bar(i) as (values(1) union all select i+1 from foo where i < 10)
> >   select * from foo;
> >   ERROR:  mutual recursive call is not supported
> >
> >   The standard allows mutual recursion.
> 
> This seems to be a point of confusion. I originally read the standard and
> concluded that mutual recursion was an optional feature. Itagaki-san showed me
> a copy of the spec where it seemed there was a clear blanket prohibition on
> mutually recursive queries and in fact anything but simple linearly expandable
> queries. I wonder if there are different versions of the spec floating around
> on this point.
> 
> Take a second look at your spec and read on to where it defines "linear" and
> "expandable". If it doesn't define those terms then it's definitely different
> from what I read. If it does, read on to see what it does with them. The main
> reason to define them appeared to be to use them to say that supporting mutual
> recursion is not required.

I think we're reading the same version of the spec. I'm reading 200n.

My interpretation (Syntax Rule 2.h) is that expandable is only used to
determine whether it can contain a <search or cycle>.

That being said, I don't think it should be a requirement for 8.4. The
CTE patch is an important feature, and we shouldn't hold it up over
something comparatively obscure like mutual recursion. As Andrew Gierth
cited, other systems don't support it anyway. It should be kept in mind
for future work though.

Regards,Jeff Davis



Re: Common Table Expressions (WITH RECURSIVE) patch

От
Jeff Davis
Дата:
On Mon, 2008-09-08 at 18:08 +0100, Andrew Gierth wrote:
>  Jeff> * Mutual Recursion:
> 
> This limitation isn't at all uncommon in other implementations; DB2
> docs for example say:

As with some other things in my list, this doesn't need to be supported
in 8.4. I just wanted to lay out my interpretation of the standard, and
places that we might (currently) fall short of it.

The fact that other DBMSs don't support mutual recursion is a good
indication that it's not important immediately.

>  Jeff>   The standard does not require that the recursive term be on
>  Jeff>   the RHS.
> 
> Again, the standard may not, but existing implementations do:
> 

Again, I don't think we need this for 8.4.

However, I think it's probably more important than mutual recursion.

>  Jeff> * UNION ALL only:
> 
>  Jeff>   with recursive
>  Jeff>     foo(i) as (values(1) union select i+1 from foo where i < 10)
>  Jeff>   select * from foo;
>  Jeff>   ERROR:  non-recursive term and recursive term must be combined with
>  Jeff> UNION ALL
> 
>  Jeff>   The standard seems to allow UNION ALL, UNION, INTERSECT, and
>  Jeff>   EXCEPT (when the recursive term is not on the RHS of the
>  Jeff>   EXCEPT).
> 
> Again, existing implementations disagree. See above for DB2, and for
> MSSQL:
> 

And again, I agree that it's not important for 8.4.

At some point we need to determine what the goalposts are though. Are we
copying existing implementations, or are we implementing the standard?

>  Jeff>   Produces 10 rows of output regardless of what "X" is. This
>  Jeff> should be fixed for 8.4.  Also, this is non-linear recursion,
>  Jeff> which the standard seems to disallow.
> 
> That looks like it should be disallowed somehow.

Agreed. I think it should just throw an error, probably.

> [snip * Strange result with except: which looks like a bug]
> 
>  Jeff> * Aggregates allowed: which
> 
>  Jeff>   with recursive foo(i) as
>  Jeff>     (values(1)
>  Jeff>     union all
>  Jeff>     select max(i)+1 from foo where i < 10)
>  Jeff>   select * from foo;
> 
>  Jeff>   Aggregates should be blocked according to the standard.
>  Jeff>   Also, causes an infinite loop. This should be fixed for 8.4.
> 
> Does the standard require anywhere that non-conforming statements must
> be diagnosed? (seems impractical, since it would forbid extensions)
> 

2.g.iii.4.B explicitly says aggregates should be rejected, unless I have
misinterpreted.

>  
> Yeah, though the standard's use of DISTINCT in this way is something
> of a violation of the POLA.
> 

I agree that's kind of a funny requirement. But that's pretty typical of
the SQL standard. If DB2 or SQL Server follow the standard here, we
should, too. If not, it's open for discussion.

> No. This has already been discussed; it's neither possible nor desirable
> to diagnose all cases which can result in infinite loops, and there are
> important types of queries which would be unnecessarily forbidden.

I didn't say we should forbid all infinite loops. But we should forbid
ones that the standard tells us to forbid.

> Besides, you've misread the spec here: it prohibits the recursive
> reference ONLY on the nullable side of the join. You cite:
> 

Thank you for the correction. It does properly reject the outer joins
that the standard says should be rejected.

>  Jeff> * ORDER BY, LIMIT, and OFFSET are rejected for recursive
>  Jeff> queries. The standard does not seem to say that these should be
>  Jeff> rejected.
> 
> Note that supporting those in subqueries (including CTEs) is a separate
> optional feature of the standard.
> 

I don't feel strongly about this either way, but I prefer that we are
consistent when possible. We do support these things in a subquery, so
shouldn't we support them in all subqueries?

Regards,Jeff Davis




Re: Common Table Expressions (WITH RECURSIVE) patch

От
Andrew Gierth
Дата:
>>>>> "Jeff" == Jeff Davis <jdavis@truviso.com> writes:
Jeff> Aggregates should be blocked according to the standard.Jeff> Also, causes an infinite loop. This should be fixed
for8.4.
 
>> Does the standard require anywhere that non-conforming statements>> must be diagnosed? (seems impractical, since it
wouldforbid>> extensions)
 
Jeff> 2.g.iii.4.B explicitly says aggregates should be rejected,Jeff> unless I have misinterpreted.

Yes, you've misinterpreted. When the spec says that a query "shall
not" do such-and-such, it means that a conforming client isn't allowed
to do that, it does NOT mean that the server is required to produce an
error when it sees it.

Chapter and verse on this is given in the Framework doc, at 6.3.3.2:
 In the Syntax Rules, the term "shall" defines conditions that are required to be true of syntactically conforming SQL
language.When such conditions depend on the contents of one or more schemas, they are required to be true just before
theactions specified by the General Rules are performed. The treatment of language that does not conform to the SQL
Formatsand Syntax Rules is implementation-dependent.  If any condition required by Syntax Rules is not satisfied when
theevaluation of Access or General Rules is attempted and the implementation is neither processing non-conforming SQL
languagenor processing conforming SQL language in a non-conforming manner, then an exception condition is raised:
"syntaxerror or access rule violation".
 

Including an aggregate violates a "shall" in a syntax rule, therefore the
query is non-conforming, therefore the server can either process it in an
implementation-dependent manner or reject it with an exception.
>> Yeah, though the standard's use of DISTINCT in this way is something>> of a violation of the POLA.
Jeff> I agree that's kind of a funny requirement. But that's prettyJeff> typical of the SQL standard. If DB2 or SQL
Serverfollow theJeff> standard here, we should, too. If not, it's open for discussion.
 

MSSQL does not:

"The following items are not allowed in the CTE_query_definition of arecursive member:
   * SELECT DISTINCT   * GROUP BY   * HAVING   * Scalar aggregation   * TOP   * LEFT, RIGHT, OUTER JOIN (INNER JOIN is
allowed)  * Subqueries   * A hint applied to a recursive reference to a CTE inside a     CTE_query_definition.
 
"

For DB2 the docs do not appear to specify either way; they don't seem to
forbid the use of SELECT DISTINCT inside a recursive CTE, but neither do
they seem to mention any unusual effect of including it.
Jeff> * ORDER BY, LIMIT, and OFFSET are rejected for recursiveJeff> queries. The standard does not seem to say that
theseshould beJeff> rejected.>> Note that supporting those in subqueries (including CTEs) is a>> separate optional
featureof the standard.
 
Jeff> I don't feel strongly about this either way, but I prefer that weJeff> are consistent when possible. We do
supportthese things in aJeff> subquery, so shouldn't we support them in all subqueries?
 

Ideally we should. DB2 appears to (other than OFFSET which it doesn't
seem to have at all). But it's not at all clear that it would be either
useful or easy to do so.

-- 
Andrew.


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Jeff Davis
Дата:
On Mon, 2008-09-08 at 22:53 +0100, Andrew Gierth wrote:
> Yes, you've misinterpreted. When the spec says that a query "shall
> not" do such-and-such, it means that a conforming client isn't allowed
> to do that, it does NOT mean that the server is required to produce an
> error when it sees it.
> 

Interesting. Thanks for the clear reference.

So, we either need to reject it or define some implementation-dependent
behavior, right?

The patch currently rejects HAVING, which means that it's a little
difficult to use an aggregate effectively. I lean toward rejecting
aggregates if we reject HAVING, for consistency. If we allow it, we
should allow HAVING as well.

>  Jeff> I agree that's kind of a funny requirement. But that's pretty
>  Jeff> typical of the SQL standard. If DB2 or SQL Server follow the
>  Jeff> standard here, we should, too. If not, it's open for discussion.
> 
> MSSQL does not:

Thanks again for a reference.

If MSSQL rejects SELECT DISTINCT for a recursive <query expression>,
then I think we should reject it. Right now the patch allows it but
provides a result that is inconsistent with the standard.

If we reject SELECT DISTINCT for recursive queries now, we can always
meet the standard later, or decide that the standard behavior is too
likely to cause confusion and just continue to block it.

> Ideally we should. DB2 appears to (other than OFFSET which it doesn't
> seem to have at all). But it's not at all clear that it would be either
> useful or easy to do so.

Ok. If it's easy to support it should probably be done.

As an aside, you seem to be an expert with the standard, have you had a
chance to look at the question I asked earlier?:

http://archives.postgresql.org/pgsql-hackers/2008-09/msg00487.php

Regards,Jeff Davis



Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tatsuo Ishii
Дата:
Thanks for the review.

[I dropped y-asaba@sraoss.co.jp from the Cc list since he has left our
company and the email address is being deleted.]

I'm going to look into issues which are seem to be bug (of course if
you know what to fix, patches are always welcome:-).

> These are my initial comments about the Common Table Expressions (CTE)
> patch, also known as WITH [RECURSIVE]. These comments are based on the
> patch here:
> 
> http://archives.postgresql.org/pgsql-patches/2008-08/msg00021.php
> 
> This is a semantically complex feature, and the standard is fairly
> complex as well. So I'm approaching this by making my own
> interpretations from the standard first (I included my interpretations
> and section references at the end of this email) and comparing to the
> behavior of the patch.
> 
> The following examples may be inconsistent with the standard. Some
> have already been mentioned, and I don't think they all need to be
> fixed for 8.4, but I mention them here for completeness.
> 
> * Mutual Recursion:
> 
>   with recursive
>     foo(i) as (values(1) union all select i+1 from bar where i < 10),
>     bar(i) as (values(1) union all select i+1 from foo where i < 10)
>   select * from foo;
>   ERROR:  mutual recursive call is not supported
> 
>   The standard allows mutual recursion.

The discussion seems to agree that let it leave for post 8.4.

> * Single Evaluation:
> 
>   with
>     foo(i) as (select random() as i)
>   select * from foo union all select * from foo;
>            i
>   -------------------
>    0.233165248762816
>     0.62126633618027
>   (2 rows)
> 
>   The standard specifies that non-recursive WITH should be evaluated
>   once.

What shall we do? I don't think there's a easy way to fix this. Maybe
we should not allow WITH clause without RECURISVE?

> * RHS only:
> 
>   with recursive
>     foo(i) as (select i+1 from foo where i < 10 union all values(1))
>   select * from foo;
>   ERROR:  Left hand side of UNION ALL must be a non-recursive term in a
> recursive query
> 
>   The standard does not require that the recursive term be on the RHS.

The discussion seems to agree that let it leave for post 8.4.

> * UNION ALL only:
> 
>   with recursive
>     foo(i) as (values(1) union select i+1 from foo where i < 10)
>   select * from foo;
>   ERROR:  non-recursive term and recursive term must be combined with
> UNION ALL
> 
>   The standard seems to allow UNION ALL, UNION, INTERSECT, and EXCEPT
>   (when the recursive term is not on the RHS of the EXCEPT).

The discussion seems to agree that let it leave for post 8.4.

> * Binary recursion and subselect strangeness:
> 
>   with recursive foo(i) as
>     (values (1)
>     union all
>     select * from
>       (select i+1 from foo where i < 10
>       union all
>       select i+1 from foo where i < X) t)
>   select * from foo;
> 
>   Produces 10 rows of output regardless of what "X" is. This should be
> fixed for 8.4.
>   Also, this is non-linear recursion, which the standard seems to
> disallow.

I will try to fix this. However detecting the query being not a
non-linear one is not so easy.

> * Multiple recursive references:
> 
>   with recursive foo(i) as
>     (values (1)
>     union all
>     select i+1 from foo where i < 10
>     union all
>     select i+1 from foo where i < 20)
>   select * from foo;
>   ERROR:  Left hand side of UNION ALL must be a non-recursive term in a
> recursive query
> 
>   If we're going to allow non-linear recursion (which the standard
>   does not), this seems like it should be a valid case.

I will try to disallow this.

> * Strange result with except:
> 
>   with recursive foo(i) as
>     (values (1)
>     union all
>     select * from
>     (select i+1 from foo where i < 10
>     except
>     select i+1 from foo where i < 5) t)
>   select * from foo;
>   ERROR:  table "foo" has 0 columns available but 1 columns specified
> 
>   This query works if you replace "except" with "union". This should be
> fixed for 8.4.

I will try to fix this.

> * Aggregates allowed:
> 
>   with recursive foo(i) as
>     (values(1)
>     union all
>     select max(i)+1 from foo where i < 10)
>   select * from foo;
> 
>   Aggregates should be blocked according to the standard.
>   Also, causes an infinite loop. This should be fixed for 8.4.

I will try to fix this.

> * DISTINCT should supress duplicates:
> 
>   with recursive foo(i) as
>     (select distinct * from (values(1),(2)) t
>     union all
>     select distinct i+1 from foo where i < 10)
>   select * from foo;
> 
>   This outputs a lot of duplicates, but they should be supressed
> according to the standard. This query is essentially the same as
> supporting UNION for recursive queries, so we should either fix both for
> 8.4 or block both for consistency.

I'm not sure if it's possible to fix this. Will look into.

> * outer joins on a recursive reference should be blocked:
> 
>   with recursive foo(i) as
>     (values(1)
>     union all
>     select i+1 from foo left join (values(1)) t on (i=column1))
>   select * from foo;
> 
>   Causes an infinite loop, but the standard says using an outer join
>   in this situation should be prohibited. This should be fixed for 8.4.

Not an issue, I think.

> * ORDER BY, LIMIT, and OFFSET are rejected for recursive queries. The
> standard does not seem to say that these should be rejected.
> 
> 
> The following are my interpretations of relevant parts of the SQL
> standard (200n), and the associated sections. These are only my
> interpretation, so let me know if you interpret the standard
> differently.
> 
> Non-linear recursion forbidden:
>   7.13: Syntax Rules: 2.g.ii
>     My interpretation of 2.g.ii.2 is that WQN[k] and WQN[l] may be the
>     same <query name>.
>   7.13: Syntax Rules: 2.g.iv
> 
> EXCEPT can't be used for recursive queries if a recursive reference
> appears on the RHS:
>   7.13: Syntax Rules: 2.g.iii.1
> 
> INTERSECT ALL/EXCEPT ALL can't be used for recursive queries:
>   7.13: Syntax Rules: 2.g.iii.5
> 
> recursive references must appear in FROM clause:
>   7.13: Syntax Rules: 2.g.iii.3
>     My interpretation of this rule is that it does not allow a
>     recursive reference in a subquery in the targetlist or a subquery
>     in the where clause.
> 
> stratum defined:
>   7.13: Syntax Rules: 2.f
>   7.13: Syntax Rules: 2.g.i.1
> 
> recursive query must have anchor for every stratum:
>   7.13: Syntax Rules: 2.g.i.3
> 
> outer joins not allowed to join recursive references:
>   7.13: Syntax Rules: 2.g.iii.6
>   7.13: Syntax Rules: 2.g.iii.7
> 
> Aggregates/HAVING disallowed:
>   7.13: Syntax Rules: 2.g.iii.4.B
> 
> Mutual recursion defined:
>   7.13: Syntax Rules: 2.g.i.1
> 
> Evaluate each WITH entry once, even if it's referenced multiple times:
>   7.13: General Rules: 1
>   7.13: General Rules: 2.b
>   See also:
> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php
> 
> Evaluation order with mutual recursion:
>   7.13: General Rules: 2.a
>   7.13: General Rules: 2.b
> 
> Evaluation semantics:
>   7.13: General Rules: 2.c
> 
> DISTINCT:
>   7.13: General Rules: 2.c.iv
>   7.13: General Rules: 2.c.ix.3.A
> 
> I will provide comments about the code and documentation soon. This is a
> very useful feature.

Thanks. Enclosed is the latest patch to adopt CVS HEAD.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tatsuo Ishii
Дата:
Thanks for the review.

[I dropped y-asaba@sraoss.co.jp from the Cc list since he has left our
company and the email address is being deleted.]

I'm going to look into issues which are seem to be bug (of course if
you know what to fix, patches are always welcome:-).

> These are my initial comments about the Common Table Expressions (CTE)
> patch, also known as WITH [RECURSIVE]. These comments are based on the
> patch here:
> 
> http://archives.postgresql.org/pgsql-patches/2008-08/msg00021.php
> 
> This is a semantically complex feature, and the standard is fairly
> complex as well. So I'm approaching this by making my own
> interpretations from the standard first (I included my interpretations
> and section references at the end of this email) and comparing to the
> behavior of the patch.
> 
> The following examples may be inconsistent with the standard. Some
> have already been mentioned, and I don't think they all need to be
> fixed for 8.4, but I mention them here for completeness.
> 
> * Mutual Recursion:
> 
>   with recursive
>     foo(i) as (values(1) union all select i+1 from bar where i < 10),
>     bar(i) as (values(1) union all select i+1 from foo where i < 10)
>   select * from foo;
>   ERROR:  mutual recursive call is not supported
> 
>   The standard allows mutual recursion.

The discussion seems to agree that let it leave for post 8.4.

> * Single Evaluation:
> 
>   with
>     foo(i) as (select random() as i)
>   select * from foo union all select * from foo;
>            i
>   -------------------
>    0.233165248762816
>     0.62126633618027
>   (2 rows)
> 
>   The standard specifies that non-recursive WITH should be evaluated
>   once.

What shall we do? I don't think there's an easy way to fix this as Tom
suggested. Maybe we should not allow WITH clause without RECURISVE for
8.4?

> * RHS only:
> 
>   with recursive
>     foo(i) as (select i+1 from foo where i < 10 union all values(1))
>   select * from foo;
>   ERROR:  Left hand side of UNION ALL must be a non-recursive term in a
> recursive query
> 
>   The standard does not require that the recursive term be on the RHS.

The discussion seems to agree that let it leave for post 8.4.

> * UNION ALL only:
> 
>   with recursive
>     foo(i) as (values(1) union select i+1 from foo where i < 10)
>   select * from foo;
>   ERROR:  non-recursive term and recursive term must be combined with
> UNION ALL
> 
>   The standard seems to allow UNION ALL, UNION, INTERSECT, and EXCEPT
>   (when the recursive term is not on the RHS of the EXCEPT).

The discussion seems to agree that let it leave for post 8.4.

> * Binary recursion and subselect strangeness:
> 
>   with recursive foo(i) as
>     (values (1)
>     union all
>     select * from
>       (select i+1 from foo where i < 10
>       union all
>       select i+1 from foo where i < X) t)
>   select * from foo;
> 
>   Produces 10 rows of output regardless of what "X" is. This should be
> fixed for 8.4.
>   Also, this is non-linear recursion, which the standard seems to
> disallow.

I will try to fix this. However detecting the query being not a
non-linear one is not so easy.

> * Multiple recursive references:
> 
>   with recursive foo(i) as
>     (values (1)
>     union all
>     select i+1 from foo where i < 10
>     union all
>     select i+1 from foo where i < 20)
>   select * from foo;
>   ERROR:  Left hand side of UNION ALL must be a non-recursive term in a
> recursive query
> 
>   If we're going to allow non-linear recursion (which the standard
>   does not), this seems like it should be a valid case.

I will try to disallow this.

> * Strange result with except:
> 
>   with recursive foo(i) as
>     (values (1)
>     union all
>     select * from
>     (select i+1 from foo where i < 10
>     except
>     select i+1 from foo where i < 5) t)
>   select * from foo;
>   ERROR:  table "foo" has 0 columns available but 1 columns specified
> 
>   This query works if you replace "except" with "union". This should be
> fixed for 8.4.

I will try to fix this.

> * Aggregates allowed:
> 
>   with recursive foo(i) as
>     (values(1)
>     union all
>     select max(i)+1 from foo where i < 10)
>   select * from foo;
> 
>   Aggregates should be blocked according to the standard.
>   Also, causes an infinite loop. This should be fixed for 8.4.

I will try to fix this.

> * DISTINCT should supress duplicates:
> 
>   with recursive foo(i) as
>     (select distinct * from (values(1),(2)) t
>     union all
>     select distinct i+1 from foo where i < 10)
>   select * from foo;
> 
>   This outputs a lot of duplicates, but they should be supressed
> according to the standard. This query is essentially the same as
> supporting UNION for recursive queries, so we should either fix both for
> 8.4 or block both for consistency.

I'm not sure if it's possible to fix this. Will look into.

> * outer joins on a recursive reference should be blocked:
> 
>   with recursive foo(i) as
>     (values(1)
>     union all
>     select i+1 from foo left join (values(1)) t on (i=column1))
>   select * from foo;
> 
>   Causes an infinite loop, but the standard says using an outer join
>   in this situation should be prohibited. This should be fixed for 8.4.

Not an issue, I think.

> * ORDER BY, LIMIT, and OFFSET are rejected for recursive queries. The
> standard does not seem to say that these should be rejected.
> 
> 
> The following are my interpretations of relevant parts of the SQL
> standard (200n), and the associated sections. These are only my
> interpretation, so let me know if you interpret the standard
> differently.
> 
> Non-linear recursion forbidden:
>   7.13: Syntax Rules: 2.g.ii
>     My interpretation of 2.g.ii.2 is that WQN[k] and WQN[l] may be the
>     same <query name>.
>   7.13: Syntax Rules: 2.g.iv
> 
> EXCEPT can't be used for recursive queries if a recursive reference
> appears on the RHS:
>   7.13: Syntax Rules: 2.g.iii.1
> 
> INTERSECT ALL/EXCEPT ALL can't be used for recursive queries:
>   7.13: Syntax Rules: 2.g.iii.5
> 
> recursive references must appear in FROM clause:
>   7.13: Syntax Rules: 2.g.iii.3
>     My interpretation of this rule is that it does not allow a
>     recursive reference in a subquery in the targetlist or a subquery
>     in the where clause.
> 
> stratum defined:
>   7.13: Syntax Rules: 2.f
>   7.13: Syntax Rules: 2.g.i.1
> 
> recursive query must have anchor for every stratum:
>   7.13: Syntax Rules: 2.g.i.3
> 
> outer joins not allowed to join recursive references:
>   7.13: Syntax Rules: 2.g.iii.6
>   7.13: Syntax Rules: 2.g.iii.7
> 
> Aggregates/HAVING disallowed:
>   7.13: Syntax Rules: 2.g.iii.4.B
> 
> Mutual recursion defined:
>   7.13: Syntax Rules: 2.g.i.1
> 
> Evaluate each WITH entry once, even if it's referenced multiple times:
>   7.13: General Rules: 1
>   7.13: General Rules: 2.b
>   See also:
> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php
> 
> Evaluation order with mutual recursion:
>   7.13: General Rules: 2.a
>   7.13: General Rules: 2.b
> 
> Evaluation semantics:
>   7.13: General Rules: 2.c
> 
> DISTINCT:
>   7.13: General Rules: 2.c.iv
>   7.13: General Rules: 2.c.ix.3.A
> 
> I will provide comments about the code and documentation soon. This is a
> very useful feature.

Thanks. Enclosed is the latest patch to adopt CVS HEAD.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

Re: Common Table Expressions (WITH RECURSIVE) patch

От
"Robert Haas"
Дата:
>> * Single Evaluation:
>>
>>   with
>>     foo(i) as (select random() as i)
>>   select * from foo union all select * from foo;
>>            i
>>   -------------------
>>    0.233165248762816
>>     0.62126633618027
>>   (2 rows)
>>
>>   The standard specifies that non-recursive WITH should be evaluated
>>   once.
>
> What shall we do? I don't think there's a easy way to fix this. Maybe
> we should not allow WITH clause without RECURISVE?

ISTM that kind of misses the point.  Even if it's WITH RECURSIVE
rather than simply WITH, one wouldn't expect multiple evaluations of
any non-recursive portion of the query.

...Robert


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Jeff Davis
Дата:
On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote:
> Thanks for the review.
> 
> >   The standard specifies that non-recursive WITH should be evaluated
> >   once.
> 
> What shall we do? I don't think there's a easy way to fix this. Maybe
> we should not allow WITH clause without RECURISVE?

My interpretation of 7.13: General Rules: 2.b is that it should be
single evaluation, even if RECURSIVE is present.

The previous discussion was here:

http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php

The important arguments in the thread seemed to be:

1. People will generally expect single evaluation, so might be
disappointed if they can't use this feature for that purpose.

2. It's a spec violation in the case of volatile functions.

3. "I think this is a "must fix" because of the point about volatile
functions --- changing it later will result in user-visible semantics
changes, so we have to get it right the first time."

I don't entirely agree with #3. It is user-visible, but only in the
sense that someone is depending on undocumented multiple-evaluation
behavior.

Tom Lane said that multiple evaluation is grounds for rejection:
http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php

Is there hope of correcting this before November?

> I will try to fix this. However detecting the query being not a
> non-linear one is not so easy.

If we don't allow mutual recursion, the only kind of non-linear
recursion that might exist would be multiple references to the same
recursive query name in a recursive query, is that correct?

> > * DISTINCT should supress duplicates:
> > 
> >   with recursive foo(i) as
> >     (select distinct * from (values(1),(2)) t
> >     union all
> >     select distinct i+1 from foo where i < 10)
> >   select * from foo;
> > 
> >   This outputs a lot of duplicates, but they should be supressed
> > according to the standard. This query is essentially the same as
> > supporting UNION for recursive queries, so we should either fix both for
> > 8.4 or block both for consistency.
> 
> I'm not sure if it's possible to fix this. Will look into.
> 

Can't we just reject queries with top-level DISTINCT, similar to how
UNION is rejected?

> > * outer joins on a recursive reference should be blocked:
> > 
> >   with recursive foo(i) as
> >     (values(1)
> >     union all
> >     select i+1 from foo left join (values(1)) t on (i=column1))
> >   select * from foo;
> > 
> >   Causes an infinite loop, but the standard says using an outer join
> >   in this situation should be prohibited. This should be fixed for 8.4.
> 
> Not an issue, I think.

Agreed, Andrew Gierth corrected me here.

Regards,Jeff Davis



Re: Common Table Expressions (WITH RECURSIVE) patch

От
"Robert Haas"
Дата:
> 3. "I think this is a "must fix" because of the point about volatile
> functions --- changing it later will result in user-visible semantics
> changes, so we have to get it right the first time."
>
> I don't entirely agree with #3. It is user-visible, but only in the
> sense that someone is depending on undocumented multiple-evaluation
> behavior.

What makes you think it's going to be undocumented?  Single versus
multiple evaluation is a keep aspect of this feature and certainly
needs to be documented one way or the other.  I can't understand why
we would introduce a standard syntax with non-standard behavior, but
if we do, it certainly had better be mentioned in the documentation.

I think that the most likely result of a CTE implementation that
doesn't guarantee single evaluation is that people simply won't use
it.  But anyone who does will expect that their queries will return
the same results in release N and release N+1, for all values of N.
The only way that an incompatible change of this type won't break
people's applications is if they're not using the feature in the first
place, in which case there is no point in committing it anyway.

I wonder if the whole approach to this patch is backward.  Instead of
worrying about how to implement WITH RECURSIVE, maybe it would be
better to implement a really solid, spec-compliant version of WITH,
and add the RECURSIVE functionality in a later patch/release.

...Robert


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tatsuo Ishii
Дата:
> On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote:
> > Thanks for the review.
> > 
> > >   The standard specifies that non-recursive WITH should be evaluated
> > >   once.
> > 
> > What shall we do? I don't think there's a easy way to fix this. Maybe
> > we should not allow WITH clause without RECURISVE?
> 
> My interpretation of 7.13: General Rules: 2.b is that it should be
> single evaluation, even if RECURSIVE is present.
> 
> The previous discussion was here:
> 
> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php
> 
> The important arguments in the thread seemed to be:
> 
> 1. People will generally expect single evaluation, so might be
> disappointed if they can't use this feature for that purpose.
> 
> 2. It's a spec violation in the case of volatile functions.
> 
> 3. "I think this is a "must fix" because of the point about volatile
> functions --- changing it later will result in user-visible semantics
> changes, so we have to get it right the first time."
> 
> I don't entirely agree with #3. It is user-visible, but only in the
> sense that someone is depending on undocumented multiple-evaluation
> behavior.
> 
> Tom Lane said that multiple evaluation is grounds for rejection:
> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php
> 
> Is there hope of correcting this before November?

According to Tom, to implement "single evaluation" we need to make big
infrastructure enhancement which is likely slip the schedule for 8.4
release which Tom does not want.

So as long as Tom and other people think that is a "must fix", there
seems no hope probably.

Anyway I will continue to work on existing patches...
--
Tatsuo Ishii
SRA OSS, Inc. Japan

> > I will try to fix this. However detecting the query being not a
> > non-linear one is not so easy.
> 
> If we don't allow mutual recursion, the only kind of non-linear
> recursion that might exist would be multiple references to the same
> recursive query name in a recursive query, is that correct?
> 
> > > * DISTINCT should supress duplicates:
> > > 
> > >   with recursive foo(i) as
> > >     (select distinct * from (values(1),(2)) t
> > >     union all
> > >     select distinct i+1 from foo where i < 10)
> > >   select * from foo;
> > > 
> > >   This outputs a lot of duplicates, but they should be supressed
> > > according to the standard. This query is essentially the same as
> > > supporting UNION for recursive queries, so we should either fix both for
> > > 8.4 or block both for consistency.
> > 
> > I'm not sure if it's possible to fix this. Will look into.
> > 
> 
> Can't we just reject queries with top-level DISTINCT, similar to how
> UNION is rejected?
> 
> > > * outer joins on a recursive reference should be blocked:
> > > 
> > >   with recursive foo(i) as
> > >     (values(1)
> > >     union all
> > >     select i+1 from foo left join (values(1)) t on (i=column1))
> > >   select * from foo;
> > > 
> > >   Causes an infinite loop, but the standard says using an outer join
> > >   in this situation should be prohibited. This should be fixed for 8.4.
> > 
> > Not an issue, I think.
> 
> Agreed, Andrew Gierth corrected me here.
> 
> Regards,
>     Jeff Davis
> 
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers


Re: Common Table Expressions (WITH RECURSIVE) patch

От
"Pavel Stehule"
Дата:
Hello

2008/9/9 Tatsuo Ishii <ishii@postgresql.org>:
>> On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote:
>> > Thanks for the review.
>> >
>> > >   The standard specifies that non-recursive WITH should be evaluated
>> > >   once.
>> >
>> > What shall we do? I don't think there's a easy way to fix this. Maybe
>> > we should not allow WITH clause without RECURISVE?
>>
>> My interpretation of 7.13: General Rules: 2.b is that it should be
>> single evaluation, even if RECURSIVE is present.
>>
>> The previous discussion was here:
>>
>> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php
>>
>> The important arguments in the thread seemed to be:
>>
>> 1. People will generally expect single evaluation, so might be
>> disappointed if they can't use this feature for that purpose.
>>
>> 2. It's a spec violation in the case of volatile functions.
>>
>> 3. "I think this is a "must fix" because of the point about volatile
>> functions --- changing it later will result in user-visible semantics
>> changes, so we have to get it right the first time."
>>
>> I don't entirely agree with #3. It is user-visible, but only in the
>> sense that someone is depending on undocumented multiple-evaluation
>> behavior.
>>
>> Tom Lane said that multiple evaluation is grounds for rejection:
>> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php
>>
>> Is there hope of correcting this before November?
>
> According to Tom, to implement "single evaluation" we need to make big
> infrastructure enhancement which is likely slip the schedule for 8.4
> release which Tom does not want.

why? why don't use a materialisation?

>
> So as long as Tom and other people think that is a "must fix", there
> seems no hope probably.
>
> Anyway I will continue to work on existing patches...
> --

I would to see your patch in core early. I am working on grouping sets
and I cannot finish my patch before your patch will be commited.

Regards
Pavel Stehule

> Tatsuo Ishii
> SRA OSS, Inc. Japan
>
>> > I will try to fix this. However detecting the query being not a
>> > non-linear one is not so easy.
>>
>> If we don't allow mutual recursion, the only kind of non-linear
>> recursion that might exist would be multiple references to the same
>> recursive query name in a recursive query, is that correct?
>>
>> > > * DISTINCT should supress duplicates:
>> > >
>> > >   with recursive foo(i) as
>> > >     (select distinct * from (values(1),(2)) t
>> > >     union all
>> > >     select distinct i+1 from foo where i < 10)
>> > >   select * from foo;
>> > >
>> > >   This outputs a lot of duplicates, but they should be supressed
>> > > according to the standard. This query is essentially the same as
>> > > supporting UNION for recursive queries, so we should either fix both for
>> > > 8.4 or block both for consistency.
>> >
>> > I'm not sure if it's possible to fix this. Will look into.
>> >
>>
>> Can't we just reject queries with top-level DISTINCT, similar to how
>> UNION is rejected?
>>
>> > > * outer joins on a recursive reference should be blocked:
>> > >
>> > >   with recursive foo(i) as
>> > >     (values(1)
>> > >     union all
>> > >     select i+1 from foo left join (values(1)) t on (i=column1))
>> > >   select * from foo;
>> > >
>> > >   Causes an infinite loop, but the standard says using an outer join
>> > >   in this situation should be prohibited. This should be fixed for 8.4.
>> >
>> > Not an issue, I think.
>>
>> Agreed, Andrew Gierth corrected me here.
>>
>> Regards,
>>       Jeff Davis
>>
>>
>> --
>> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-hackers
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tatsuo Ishii
Дата:
> Hello
> 
> 2008/9/9 Tatsuo Ishii <ishii@postgresql.org>:
> >> On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote:
> >> > Thanks for the review.
> >> >
> >> > >   The standard specifies that non-recursive WITH should be evaluated
> >> > >   once.
> >> >
> >> > What shall we do? I don't think there's a easy way to fix this. Maybe
> >> > we should not allow WITH clause without RECURISVE?
> >>
> >> My interpretation of 7.13: General Rules: 2.b is that it should be
> >> single evaluation, even if RECURSIVE is present.
> >>
> >> The previous discussion was here:
> >>
> >> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php
> >>
> >> The important arguments in the thread seemed to be:
> >>
> >> 1. People will generally expect single evaluation, so might be
> >> disappointed if they can't use this feature for that purpose.
> >>
> >> 2. It's a spec violation in the case of volatile functions.
> >>
> >> 3. "I think this is a "must fix" because of the point about volatile
> >> functions --- changing it later will result in user-visible semantics
> >> changes, so we have to get it right the first time."
> >>
> >> I don't entirely agree with #3. It is user-visible, but only in the
> >> sense that someone is depending on undocumented multiple-evaluation
> >> behavior.
> >>
> >> Tom Lane said that multiple evaluation is grounds for rejection:
> >> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php
> >>
> >> Is there hope of correcting this before November?
> >
> > According to Tom, to implement "single evaluation" we need to make big
> > infrastructure enhancement which is likely slip the schedule for 8.4
> > release which Tom does not want.
> 
> why? why don't use a materialisation?

See:
http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php

> >
> > So as long as Tom and other people think that is a "must fix", there
> > seems no hope probably.
> >
> > Anyway I will continue to work on existing patches...
> > --
> 
> I would to see your patch in core early. I am working on grouping sets
> and I cannot finish my patch before your patch will be commited.
> 
> Regards
> Pavel Stehule
> 
> > Tatsuo Ishii
> > SRA OSS, Inc. Japan
> >
> >> > I will try to fix this. However detecting the query being not a
> >> > non-linear one is not so easy.
> >>
> >> If we don't allow mutual recursion, the only kind of non-linear
> >> recursion that might exist would be multiple references to the same
> >> recursive query name in a recursive query, is that correct?
> >>
> >> > > * DISTINCT should supress duplicates:
> >> > >
> >> > >   with recursive foo(i) as
> >> > >     (select distinct * from (values(1),(2)) t
> >> > >     union all
> >> > >     select distinct i+1 from foo where i < 10)
> >> > >   select * from foo;
> >> > >
> >> > >   This outputs a lot of duplicates, but they should be supressed
> >> > > according to the standard. This query is essentially the same as
> >> > > supporting UNION for recursive queries, so we should either fix both for
> >> > > 8.4 or block both for consistency.
> >> >
> >> > I'm not sure if it's possible to fix this. Will look into.
> >> >
> >>
> >> Can't we just reject queries with top-level DISTINCT, similar to how
> >> UNION is rejected?
> >>
> >> > > * outer joins on a recursive reference should be blocked:
> >> > >
> >> > >   with recursive foo(i) as
> >> > >     (values(1)
> >> > >     union all
> >> > >     select i+1 from foo left join (values(1)) t on (i=column1))
> >> > >   select * from foo;
> >> > >
> >> > >   Causes an infinite loop, but the standard says using an outer join
> >> > >   in this situation should be prohibited. This should be fixed for 8.4.
> >> >
> >> > Not an issue, I think.
> >>
> >> Agreed, Andrew Gierth corrected me here.
> >>
> >> Regards,
> >>       Jeff Davis
> >>
> >>
> >> --
> >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> >> To make changes to your subscription:
> >> http://www.postgresql.org/mailpref/pgsql-hackers
> >
> > --
> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-hackers
> >


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tatsuo Ishii
Дата:
> Hello
> 
> 2008/9/9 Tatsuo Ishii <ishii@postgresql.org>:
> >> On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote:
> >> > Thanks for the review.
> >> >
> >> > >   The standard specifies that non-recursive WITH should be evaluated
> >> > >   once.
> >> >
> >> > What shall we do? I don't think there's a easy way to fix this. Maybe
> >> > we should not allow WITH clause without RECURISVE?
> >>
> >> My interpretation of 7.13: General Rules: 2.b is that it should be
> >> single evaluation, even if RECURSIVE is present.
> >>
> >> The previous discussion was here:
> >>
> >> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php
> >>
> >> The important arguments in the thread seemed to be:
> >>
> >> 1. People will generally expect single evaluation, so might be
> >> disappointed if they can't use this feature for that purpose.
> >>
> >> 2. It's a spec violation in the case of volatile functions.
> >>
> >> 3. "I think this is a "must fix" because of the point about volatile
> >> functions --- changing it later will result in user-visible semantics
> >> changes, so we have to get it right the first time."
> >>
> >> I don't entirely agree with #3. It is user-visible, but only in the
> >> sense that someone is depending on undocumented multiple-evaluation
> >> behavior.
> >>
> >> Tom Lane said that multiple evaluation is grounds for rejection:
> >> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php
> >>
> >> Is there hope of correcting this before November?
> >
> > According to Tom, to implement "single evaluation" we need to make big
> > infrastructure enhancement which is likely slip the schedule for 8.4
> > release which Tom does not want.
> 
> why? why don't use a materialisation?

See:
http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php

> > So as long as Tom and other people think that is a "must fix", there
> > seems no hope probably.
> >
> > Anyway I will continue to work on existing patches...
> > --
> 
> I would to see your patch in core early. I am working on grouping sets
> and I cannot finish my patch before your patch will be commited.
> 
> Regards
> Pavel Stehule
> 
> > Tatsuo Ishii
> > SRA OSS, Inc. Japan
> >
> >> > I will try to fix this. However detecting the query being not a
> >> > non-linear one is not so easy.
> >>
> >> If we don't allow mutual recursion, the only kind of non-linear
> >> recursion that might exist would be multiple references to the same
> >> recursive query name in a recursive query, is that correct?
> >>
> >> > > * DISTINCT should supress duplicates:
> >> > >
> >> > >   with recursive foo(i) as
> >> > >     (select distinct * from (values(1),(2)) t
> >> > >     union all
> >> > >     select distinct i+1 from foo where i < 10)
> >> > >   select * from foo;
> >> > >
> >> > >   This outputs a lot of duplicates, but they should be supressed
> >> > > according to the standard. This query is essentially the same as
> >> > > supporting UNION for recursive queries, so we should either fix both for
> >> > > 8.4 or block both for consistency.
> >> >
> >> > I'm not sure if it's possible to fix this. Will look into.
> >> >
> >>
> >> Can't we just reject queries with top-level DISTINCT, similar to how
> >> UNION is rejected?
> >>
> >> > > * outer joins on a recursive reference should be blocked:
> >> > >
> >> > >   with recursive foo(i) as
> >> > >     (values(1)
> >> > >     union all
> >> > >     select i+1 from foo left join (values(1)) t on (i=column1))
> >> > >   select * from foo;
> >> > >
> >> > >   Causes an infinite loop, but the standard says using an outer join
> >> > >   in this situation should be prohibited. This should be fixed for 8.4.
> >> >
> >> > Not an issue, I think.
> >>
> >> Agreed, Andrew Gierth corrected me here.
> >>
> >> Regards,
> >>       Jeff Davis
> >>
> >>
> >> --
> >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> >> To make changes to your subscription:
> >> http://www.postgresql.org/mailpref/pgsql-hackers
> >
> > --
> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> > To make changes to your subscription:
> > http://www.postgresql.org/mailpref/pgsql-hackers
> >


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tatsuo Ishii
Дата:
> > * Aggregates allowed:
> > 
> >   with recursive foo(i) as
> >     (values(1)
> >     union all
> >     select max(i)+1 from foo where i < 10)
> >   select * from foo;
> > 
> >   Aggregates should be blocked according to the standard.
> >   Also, causes an infinite loop. This should be fixed for 8.4.
> 
> I will try to fix this.

We already reject:
   select max(i) from foo where i < 10)

But max(i)+1 seems to slip the check. I looked into this I found the
patch tried to detect the case before analyzing(see
parser/parse_cte.c) which is not a right thing I think.

I think we could detect the case by adding more checking in
parseCheckAggregates():
/* * Check if there's aggregate function in a recursive term. */foreach(l, qry->rtable){    RangeTblEntry *rte =
(RangeTblEntry*) lfirst(l);
 
    if (qry->hasAggs && rte->rtekind == RTE_RECURSIVE &&        rte->self_reference)    {        ereport(ERROR,
      (errcode(ERRCODE_SYNTAX_ERROR),                 errmsg("aggregate functions in a recursive term not allowed")));
 }}
 

What do you think?
--
Tatsuo Ishii
SRA OSS, Inc. Japan


Re: Common Table Expressions (WITH RECURSIVE) patch

От
"Pavel Stehule"
Дата:
2008/9/9 Tatsuo Ishii <ishii@sraoss.co.jp>:
>> Hello
>>
>> 2008/9/9 Tatsuo Ishii <ishii@postgresql.org>:
>> >> On Tue, 2008-09-09 at 13:45 +0900, Tatsuo Ishii wrote:
>> >> > Thanks for the review.
>> >> >
>> >> > >   The standard specifies that non-recursive WITH should be evaluated
>> >> > >   once.
>> >> >
>> >> > What shall we do? I don't think there's a easy way to fix this. Maybe
>> >> > we should not allow WITH clause without RECURISVE?
>> >>
>> >> My interpretation of 7.13: General Rules: 2.b is that it should be
>> >> single evaluation, even if RECURSIVE is present.
>> >>
>> >> The previous discussion was here:
>> >>
>> >> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php
>> >>

I am  blind, I didn't find any reason, why materialisation isn't useable.

Regards
Pavel

>> >> The important arguments in the thread seemed to be:
>> >>
>> >> 1. People will generally expect single evaluation, so might be
>> >> disappointed if they can't use this feature for that purpose.
>> >>
>> >> 2. It's a spec violation in the case of volatile functions.
>> >>
>> >> 3. "I think this is a "must fix" because of the point about volatile
>> >> functions --- changing it later will result in user-visible semantics
>> >> changes, so we have to get it right the first time."
>> >>
>> >> I don't entirely agree with #3. It is user-visible, but only in the
>> >> sense that someone is depending on undocumented multiple-evaluation
>> >> behavior.
>> >>
>> >> Tom Lane said that multiple evaluation is grounds for rejection:
>> >> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01318.php
>> >>
>> >> Is there hope of correcting this before November?
>> >
>> > According to Tom, to implement "single evaluation" we need to make big
>> > infrastructure enhancement which is likely slip the schedule for 8.4
>> > release which Tom does not want.
>>
>> why? why don't use a materialisation?
>
> See:
> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php
>
>> >
>> > So as long as Tom and other people think that is a "must fix", there
>> > seems no hope probably.
>> >
>> > Anyway I will continue to work on existing patches...
>> > --
>>
>> I would to see your patch in core early. I am working on grouping sets
>> and I cannot finish my patch before your patch will be commited.
>>
>> Regards
>> Pavel Stehule
>>
>> > Tatsuo Ishii
>> > SRA OSS, Inc. Japan
>> >
>> >> > I will try to fix this. However detecting the query being not a
>> >> > non-linear one is not so easy.
>> >>
>> >> If we don't allow mutual recursion, the only kind of non-linear
>> >> recursion that might exist would be multiple references to the same
>> >> recursive query name in a recursive query, is that correct?
>> >>
>> >> > > * DISTINCT should supress duplicates:
>> >> > >
>> >> > >   with recursive foo(i) as
>> >> > >     (select distinct * from (values(1),(2)) t
>> >> > >     union all
>> >> > >     select distinct i+1 from foo where i < 10)
>> >> > >   select * from foo;
>> >> > >
>> >> > >   This outputs a lot of duplicates, but they should be supressed
>> >> > > according to the standard. This query is essentially the same as
>> >> > > supporting UNION for recursive queries, so we should either fix both for
>> >> > > 8.4 or block both for consistency.
>> >> >
>> >> > I'm not sure if it's possible to fix this. Will look into.
>> >> >
>> >>
>> >> Can't we just reject queries with top-level DISTINCT, similar to how
>> >> UNION is rejected?
>> >>
>> >> > > * outer joins on a recursive reference should be blocked:
>> >> > >
>> >> > >   with recursive foo(i) as
>> >> > >     (values(1)
>> >> > >     union all
>> >> > >     select i+1 from foo left join (values(1)) t on (i=column1))
>> >> > >   select * from foo;
>> >> > >
>> >> > >   Causes an infinite loop, but the standard says using an outer join
>> >> > >   in this situation should be prohibited. This should be fixed for 8.4.
>> >> >
>> >> > Not an issue, I think.
>> >>
>> >> Agreed, Andrew Gierth corrected me here.
>> >>
>> >> Regards,
>> >>       Jeff Davis
>> >>
>> >>
>> >> --
>> >> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> >> To make changes to your subscription:
>> >> http://www.postgresql.org/mailpref/pgsql-hackers
>> >
>> > --
>> > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
>> > To make changes to your subscription:
>> > http://www.postgresql.org/mailpref/pgsql-hackers
>> >
>


Re: Common Table Expressions (WITH RECURSIVE) patch

От
"Robert Haas"
Дата:
>>> >> My interpretation of 7.13: General Rules: 2.b is that it should be
>>> >> single evaluation, even if RECURSIVE is present.
>>> >>
>>> >> The previous discussion was here:
>>> >>
>>> >> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php
>
> I am  blind, I didn't find any reason, why materialisation isn't useable.

I believe it's because of these two (closely related) problems:

# The basic
# implementation clearly ought to be to dump the result of the subquery
# into a tuplestore and then have the upper level read out from that.
# However, we don't have any infrastructure for having multiple
# upper-level RTEs reference the same tuplestore.  (Perhaps the InitPlan
# infrastructure could be enhanced in that direction, but it's not ready
# for non-scalar outputs today.)  Also, I think we'd have to teach
# tuplestore how to support multiple readout cursors.  For example,
# consider
#     WITH foo AS (SELECT ...) SELECT ... FROM foo a, foo b WHERE ...
# If the planner chooses to do the join as a nested loop then each
# Scan node needs to keep track of its own place in the tuplestore,
# concurrently with the other node having a different place.

...Robert


Re: Common Table Expressions (WITH RECURSIVE) patch

От
"Pavel Stehule"
Дата:
2008/9/9 Robert Haas <robertmhaas@gmail.com>:
>>>> >> My interpretation of 7.13: General Rules: 2.b is that it should be
>>>> >> single evaluation, even if RECURSIVE is present.
>>>> >>
>>>> >> The previous discussion was here:
>>>> >>
>>>> >> http://archives.postgresql.org/pgsql-hackers/2008-07/msg01292.php
>>
>> I am  blind, I didn't find any reason, why materialisation isn't useable.
>
> I believe it's because of these two (closely related) problems:
>
> # The basic
> # implementation clearly ought to be to dump the result of the subquery
> # into a tuplestore and then have the upper level read out from that.
> # However, we don't have any infrastructure for having multiple
> # upper-level RTEs reference the same tuplestore.  (Perhaps the InitPlan
> # infrastructure could be enhanced in that direction, but it's not ready
> # for non-scalar outputs today.)  Also, I think we'd have to teach
> # tuplestore how to support multiple readout cursors.  For example,
> # consider
> #       WITH foo AS (SELECT ...) SELECT ... FROM foo a, foo b WHERE ...
> # If the planner chooses to do the join as a nested loop then each
> # Scan node needs to keep track of its own place in the tuplestore,
> # concurrently with the other node having a different place.
>

hmm. I solve similar problem in grouping sets :( etc

SELECT ... FROM ... GROUP BY GROUPING SETS (a,b)

is almost same as

With foo AS (SELECT ... FROM) SELECT ... FROM foo GROUP BY a UNION ALL
SELECT ... FROM foo GROUP BY b;

Regards
Pavel





> ...Robert
>


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Jeff Davis
Дата:
On Tue, 2008-09-09 at 18:51 +0200, Pavel Stehule wrote:
> hmm. I solve similar problem in grouping sets :( etc
> 

How did you solve it?

Regards,Jeff Davis



Re: Common Table Expressions (WITH RECURSIVE) patch

От
"Pavel Stehule"
Дата:
2008/9/9 Jeff Davis <pgsql@j-davis.com>:
> On Tue, 2008-09-09 at 18:51 +0200, Pavel Stehule wrote:
>> hmm. I solve similar problem in grouping sets :( etc
>>

I have special executor node - feeder, it hold one tuple and others
nodes read from this node. It's usable for hash aggregates.

Pavel

plan is like:

grouping sets
--> seq scan ...
--> hash aggg   --> feeder
--> hash agg  --> feeder



>
> How did you solve it?
>
> Regards,
>        Jeff Davis
>
>


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tom Lane
Дата:
"Robert Haas" <robertmhaas@gmail.com> writes:
>> I am  blind, I didn't find any reason, why materialisation isn't useable.

> I believe it's because of these two (closely related) problems:

> # The basic
> # implementation clearly ought to be to dump the result of the subquery
> # into a tuplestore and then have the upper level read out from that.
> # However, we don't have any infrastructure for having multiple
> # upper-level RTEs reference the same tuplestore.  (Perhaps the InitPlan
> # infrastructure could be enhanced in that direction, but it's not ready
> # for non-scalar outputs today.)  Also, I think we'd have to teach
> # tuplestore how to support multiple readout cursors.  For example,
> # consider
> #     WITH foo AS (SELECT ...) SELECT ... FROM foo a, foo b WHERE ...
> # If the planner chooses to do the join as a nested loop then each
> # Scan node needs to keep track of its own place in the tuplestore,
> # concurrently with the other node having a different place.

The amount of new code needed for that seems a pittance compared to the
size of the patch already, so I'm not seeing why Tatsuo-san considers
it infeasible.
        regards, tom lane


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Jeff Davis
Дата:
On Tue, 2008-09-09 at 09:47 -0400, Robert Haas wrote:
> > 3. "I think this is a "must fix" because of the point about volatile
> > functions --- changing it later will result in user-visible semantics
> > changes, so we have to get it right the first time."
> >
> > I don't entirely agree with #3. It is user-visible, but only in the
> > sense that someone is depending on undocumented multiple-evaluation
> > behavior.
> 
> What makes you think it's going to be undocumented?  Single versus
> multiple evaluation is a keep aspect of this feature and certainly
> needs to be documented one way or the other.  I can't understand why
> we would introduce a standard syntax with non-standard behavior, but
> if we do, it certainly had better be mentioned in the documentation.
> 

I meant that -- hypothetically if we did accept the feature as-is -- the
number of evaluations would be documented to be undefined, not N. That
would avoid the backwards-compatibility problem.

This one point is probably not worth discussing now, because argument
#1 and #2 stand on their own.

Regards,Jeff Davis



Re: Common Table Expressions (WITH RECURSIVE) patch

От
"Robert Haas"
Дата:
> I meant that -- hypothetically if we did accept the feature as-is -- the
> number of evaluations would be documented to be undefined, not N. That
> would avoid the backwards-compatibility problem.
>
> This one point is probably not worth discussing now, because argument
> #1 and #2 stand on their own.

Agreed.  Plus, both Tom and Pavel seem to think this is a relatively
solvable problem.

...Robert


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tatsuo Ishii
Дата:
> > * Single Evaluation:
> > 
> >   with
> >     foo(i) as (select random() as i)
> >   select * from foo union all select * from foo;
> >            i
> >   -------------------
> >    0.233165248762816
> >     0.62126633618027
> >   (2 rows)
> > 
> >   The standard specifies that non-recursive WITH should be evaluated
> >   once.
> 
> What shall we do? I don't think there's an easy way to fix this as Tom
> suggested. Maybe we should not allow WITH clause without RECURISVE for
> 8.4?

This is a still remaing issue...

> > * Binary recursion and subselect strangeness:
> > 
> >   with recursive foo(i) as
> >     (values (1)
> >     union all
> >     select * from
> >       (select i+1 from foo where i < 10
> >       union all
> >       select i+1 from foo where i < X) t)
> >   select * from foo;
> > 
> >   Produces 10 rows of output regardless of what "X" is. This should be
> > fixed for 8.4.
> >   Also, this is non-linear recursion, which the standard seems to
> > disallow.
> 
> I will try to fix this. However detecting the query being not a
> non-linear one is not so easy.

I have implemented rejection of non-linear recursion and now this type
of query will not be executed anyway.

> > * Multiple recursive references:
> > 
> >   with recursive foo(i) as
> >     (values (1)
> >     union all
> >     select i+1 from foo where i < 10
> >     union all
> >     select i+1 from foo where i < 20)
> >   select * from foo;
> >   ERROR:  Left hand side of UNION ALL must be a non-recursive term in a
> > recursive query
> > 
> >   If we're going to allow non-linear recursion (which the standard
> >   does not), this seems like it should be a valid case.
> 
> I will try to disallow this.

Non-linear recursion is not allowed now.

> > * Strange result with except:
> > 
> >   with recursive foo(i) as
> >     (values (1)
> >     union all
> >     select * from
> >     (select i+1 from foo where i < 10
> >     except
> >     select i+1 from foo where i < 5) t)
> >   select * from foo;
> >   ERROR:  table "foo" has 0 columns available but 1 columns specified
> > 
> >   This query works if you replace "except" with "union". This should be
> > fixed for 8.4.
> 
> I will try to fix this.

This is a non-linear recursion too and will not be executed anyway.

> > * Aggregates allowed:
> > 
> >   with recursive foo(i) as
> >     (values(1)
> >     union all
> >     select max(i)+1 from foo where i < 10)
> >   select * from foo;
> > 
> >   Aggregates should be blocked according to the standard.
> >   Also, causes an infinite loop. This should be fixed for 8.4.
> 
> I will try to fix this.

Fixed.

> > * DISTINCT should supress duplicates:
> > 
> >   with recursive foo(i) as
> >     (select distinct * from (values(1),(2)) t
> >     union all
> >     select distinct i+1 from foo where i < 10)
> >   select * from foo;
> > 
> >   This outputs a lot of duplicates, but they should be supressed
> > according to the standard. This query is essentially the same as
> > supporting UNION for recursive queries, so we should either fix both for
> > 8.4 or block both for consistency.
> 
> I'm not sure if it's possible to fix this. Will look into.

Ok, now this type of DISTINCT is not allowed.

Included is the latest patches against CVS HEAD.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

Re: Common Table Expressions (WITH RECURSIVE) patch

От
David Fetter
Дата:
On Mon, Sep 15, 2008 at 06:46:16PM +0900, Tatsuo Ishii wrote:
> > > * Single Evaluation:
> > > 
> > >   with
> > >     foo(i) as (select random() as i)
> > >   select * from foo union all select * from foo;
> > >            i
> > >   -------------------
> > >    0.233165248762816
> > >     0.62126633618027
> > >   (2 rows)
> > > 
> > >   The standard specifies that non-recursive WITH should be
> > >   evaluated once.
> > 
> > What shall we do? I don't think there's an easy way to fix this as
> > Tom suggested. Maybe we should not allow WITH clause without
> > RECURISVE for 8.4?
> 
> This is a still remaing issue...

I don't think that the spec explicitly forbids this.

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Jeff Davis
Дата:
On Mon, 2008-09-15 at 22:41 -0700, David Fetter wrote:
> On Mon, Sep 15, 2008 at 06:46:16PM +0900, Tatsuo Ishii wrote:
> > > > * Single Evaluation:
> > > > 
> > > >   with
> > > >     foo(i) as (select random() as i)
> > > >   select * from foo union all select * from foo;
> > > >            i
> > > >   -------------------
> > > >    0.233165248762816
> > > >     0.62126633618027
> > > >   (2 rows)
> > > > 
> > > >   The standard specifies that non-recursive WITH should be
> > > >   evaluated once.
> > > 
> > > What shall we do? I don't think there's an easy way to fix this as
> > > Tom suggested. Maybe we should not allow WITH clause without
> > > RECURISVE for 8.4?
> > 
> > This is a still remaing issue...
> 
> I don't think that the spec explicitly forbids this.
> 

This has been discussed before.

Regardless of the nuances of the spec (and whether it says so explicitly
or implicitly), there are people in the community that see single
evaluation as important, and important enough to be a showstopper.

Tom Lane has suggested that this is a reasonable amount of work to
complete, and minor in comparison to the rest of the patch.

I think the right approach is to try to complete it so that everyone is
happy. I will work on this, but unfortunately I don't have a lot of time
right now, so I can't make any promises.

The rest of the patch looks good so far (there are still a few things I
want to look at), so I think this is the biggest open issue.

Regards,Jeff Davis





Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tom Lane
Дата:
Tatsuo Ishii <ishii@postgresql.org> writes:
> Included is the latest patches against CVS HEAD.

I spent some time reading this patch.  Here are a few comments in
no particular order:

RangeRecursive node lacks copyfuncs/equalfuncs support.

Query.recursive is missed in equalfuncs.c.  But rather than fix that,
get rid of it entirely.  AFAICS the only use is in qual_is_pushdown_safe,
and what is the value of that test?  The callers know perfectly well
whether they are operating on a recursive RTE or not.  You might as well
just delete all the useless qual-pushdown-attempt code from
set_recursion_pathlist, and not need to touch qual_is_pushdown_safe
at all.

Is physical_tlist optimization sensible for RecursiveScan?  We seem
to use it for every other Scan node type.

I dislike putting state into ExecutorState; that makes it impossible
to have multiple recursion nodes in one plan tree.  It would probably
be better for the Recursion and RecursiveScan nodes to talk to each
other directly (compare HashJoin/Hash); although since they are not
adjacent in the plan tree I admit I'm not sure how to do that.

es_disallow_tuplestore doesn't seem to need to be in ExecutorState
at all, it could as well be in RecursionState.

I don't really like the way that Append nodes are being abused here.
It would be better to allow nodeRecursion.c to duplicate a little code
from nodeAppend.c, and have the child plans be direct children of
the Recursion node.  BTW, is it actually possible to have more than
two children?  I didn't spend enough time analyzing the restrictions
in parse_cte to be sure.  If there are always just two then you could
simplify the representation by treating it like a join node instead
of an append.  (The RTE_RECURSIVE representation sure makes it look
like there can be only two...)

Mark/restore support seems useless ... note the comment on
ExecSupportsMarkRestore (which should be updated if this code
isn't removed).

RecursiveScan claims to support backwards fetch, but does not in fact
contain code to do so.  (Given that it will always be underneath
Recursion, which can't do backwards fetch, I see little point in adding
such code; fix execAmi.c instead.)

ExecInitRecursion doesn't seem to be on the same page about whether
it supports backward scan as execAmi.c, either.

This comment in nodeRecursivescan.c seems bogus:/* * Do not initialize scan tuple type, result tuple type and *
projectioninfo in ExecInitRecursivescan. These types are * initialized after initializing Recursion node. */
 
because the code seems to be doing exactly what the comment says it
doesn't.

Numerous comments appear to have been copied-and-pasted and not modified
from the original.  Somebody will have to go over all that text.

ruleutils.c fails completely for non-recursive WITH.  It *must* regenerate
such a query with a WITH clause, not as a flattened subquery which is what
you seem to be doing.  This isn't negotiable because the semantics are
different.  This will mean at least some change in the parsetree
representation.  Perhaps we could add a bool to subquery RTEs to mark them
as coming from a nonrecursive WITH?  The tests added for RTE_RECURSIVE
seem a bit ugly too.  If it thinks that can't happen it should Assert so,
not just fall through silently.

commentary for ParseState.p_ctenamespace is gratuitously unlike the
comment style for the other fields, and p_recursive_namespace isn't
documented at all.

ParseState.p_in_with_clause is unused, should be removed.

The WithClause struct definition is poorly commented.  It should be
stated that it is used only pre-parse-analysis (assuming that continues
to be true after you get done fixing ruleutils.c...), and it doesn't
say what the elements of the subquery list are (specifically, what
node type).  A lot of the other added structs and fields could use
better commenting too.

For that matter "subquery" is a poor name for WithClause's list of CTEs,
especially so since it's hard to search for.  It should be a plural name
and I'd be inclined to use something like "ctes" not "subqueries".
The term "subquery" is too overloaded already, so any place you can
refer to a WITH-list member as a CTE you should do so.

WithClause node may need a location field, and almost certainly has to
be handled somehow in exprLocation().

The error reports in parse_cte.c *desperately* need error locations.

Why does transformWithClause do parse_sub_analyze twice?
I'm not sure that's even safe, and it's surely unnecessary.
Also, what happens if a subquery isn't a SelectStmt?  Silently
doing nothing doesn't seem like a good plan there.

Why are we putting essentially the same information into both
p_recursive_namespace and p_ctenamespace?  Is there really a need
for both lists?  The code added to transformFromClauseItem
seems quite wrong since it searches both lists even if it found a
match in the first one.  This whole area looks like it needs
refactoring.

Costing is all bogus, but we knew that...

Why does set_recursion_pathlist think that the subquery might have
useful pathkeys?  We know it must always be a UNION ALL, no?

PlanState.has_recursivescan seems like a complete kluge.  Can't it just be
removed?  It looks to me like it is working around bugs that hopefully aren't
there anymore.  There is certainly no reason why a recursive CTE should be
more in need of rescanning than any other kind of plan.  If it is needed then
the current implementation is completely broken anyway, since it would only
detect a RecursiveScan node that is directly underneath an agg or hash node.

Please pay some attention to keeping things in logical, consistent orders.
For instance the withClause field was inserted into _copySelectStmt()
in a different place from where it was inserted in the actual struct
declaration, which is confusing.

parseTypeString() ought to check for null withClause.

expression_tree_walker/mutator support seems entirely broken for
RTE_RECURSIVE RTEs.  Shouldn't it be recursing into the subquery?

Missed adding non_recursive_query to the "zap unneeded substructure" part
of set_plan_references (assuming it really is unneeded).

There seem to be quite a number of places where RTE_SUBQUERY RTEs
are handled but the patch fails to add RTE_RECURSIVE handling ...

It's a really bad idea to use RTE subquery field over again for
RTE_RECURSIVE, especially without any comment saying you did that.
I would suggest two pointers in the RTE_RECURSIVE field list instead.

Do we really have to make RECURSIVE a fully reserved keyword?
(Actually, the patch makes it worse than reserved, by failing to
add it to the reserved_keywords list.)

checkCteTargetList is completely broken: it will only notice illegal
sublinks that are at the very top level of a targetlist expression.
checkWhereClause is very far short of adequate as well.  Need to recurse
here, or find some other way.  Given that the subexpressions haven't been
analyzed yet, this seems a bit messy --- expression_tree_walker doesn't
know about pre-analysis node trees, so you can't use it.  I'd suggest
replacing this whole set of routines with just one recursive routine that
doesn't make pre-assumptions about which node types can be found where.
Alternatively, is there any way of delaying the validity checks until
*after* parse analysis of the expressions, so that you could use
expression_tree_walker et al?

BTW, it seems like a lot of the logic there could be simplified by depending
on the enum ordering RECURSIVE_OTHER > RECURSIVE_SELF > NON_RECURSIVE.
There are a number of places that are taking the larger of two values
in baroque, hard-to-follow ways.

I wonder if checkCteSelectStmt is detecting nonlinearity correctly.
Since RECURSIVE_OTHER dominates RECURSIVE_SELF, couldn't it fail to
miss the problem in something like (self union (self union other)) ?
Maybe what you really need is a bitmask:
NON_RECURSIVE = 0,
RECURSIVE_SELF = 1,
RECURSIVE_OTHER = 2,
RECURSIVE_BOTH = 3    /* the OR of RECURSIVE_SELF and RECURSIVE_OTHER */
and then you can merge two values via OR instead of MAX.
        regards, tom lane


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tom Lane
Дата:
Jeff Davis <pgsql@j-davis.com> writes:
> I think the right approach is to try to complete it so that everyone is
> happy. I will work on this, but unfortunately I don't have a lot of time
> right now, so I can't make any promises.

I think there are two significant bits there:

* Fixing the parsetree representation so that the distinction between
a CTE and an RTE that references the CTE is preserved.

* Implementing some kind of "multiple readout tuplestore" to permit
multiple RTEs to scan a CTE plan without causing any row to be
evaluated more than once.

The first of these seems relatively straightforward: the WITH clause
has to be preserved explicitly in the Query representation, and RTEs
for CTEs should just carry indexes into the WITH list, not copies of
the subqueries.  (Hm, we might need both an index and a levelsup
counter, to deal with nested queries...)  This would solve ruleutils'
problem, too.

I haven't thought much about the multiple readout tuplestore though.
Anyone have a clear idea how to do that?  In particular, can the
existing tuplestore code be enhanced to do it (without sacrificing
performance in the simple case), or do we need something new?
        regards, tom lane


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Greg Stark
Дата:
I had either on that a while back. I think the existing tuplestore can  
be made to work without bad performance penaltiesbin the usual case.  
The trick was to do it without making the code unreadable.

I can send the code I had but I doubt you want to use it as-is.

-- 
greg



Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tom Lane
Дата:
Greg Stark <greg.stark@enterprisedb.com> writes:
> I had either on that a while back. I think the existing tuplestore can  
> be made to work without bad performance penaltiesbin the usual case.  
> The trick was to do it without making the code unreadable.

> I can send the code I had but I doubt you want to use it as-is.

Sure.  Maybe someone else will see a way to make it more readable.
        regards, tom lane


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tatsuo Ishii
Дата:
> Do we really have to make RECURSIVE a fully reserved keyword?

According to the standard, RECURSIVE is a reserved keyword, I believe.

> (Actually, the patch makes it worse than reserved, by failing to
> add it to the reserved_keywords list.)

Yes, it's a bug. Will fix...
--
Tatsuo Ishii
SRA OSS, Inc. Japan


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tom Lane
Дата:
Tatsuo Ishii <ishii@postgresql.org> writes:
>> Do we really have to make RECURSIVE a fully reserved keyword?

> According to the standard, RECURSIVE is a reserved keyword, I believe.

Sure, but our general rule is to make keywords no more reserved than
is absolutely necessary to make the bison grammar unambiguous.  I
haven't tested, but I'm thinking that if WITH is fully reserved then
RECURSIVE shouldn't have to be.
        regards, tom lane


Re: Common Table Expressions (WITH RECURSIVE) patch

От
"Pavel Stehule"
Дата:
2008/9/17 Tom Lane <tgl@sss.pgh.pa.us>:
> Tatsuo Ishii <ishii@postgresql.org> writes:
>>> Do we really have to make RECURSIVE a fully reserved keyword?
>
>> According to the standard, RECURSIVE is a reserved keyword, I believe.
>
> Sure, but our general rule is to make keywords no more reserved than
> is absolutely necessary to make the bison grammar unambiguous.  I
> haven't tested, but I'm thinking that if WITH is fully reserved then
> RECURSIVE shouldn't have to be.

I am not sure, if these rule is good. Somebody who develop on
postgresql should have a problems when they will be port to other
databases in future. Reserved words in standards should be respected.

regards
Pavel Stehule

>
>                        regards, tom lane
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>


Re: Common Table Expressions (WITH RECURSIVE) patch

От
"Robert Haas"
Дата:
> I am not sure, if these rule is good. Somebody who develop on
> postgresql should have a problems when they will be port to other
> databases in future. Reserved words in standards should be respected.

I disagree.  I have never ported an app written for PostgreSQL to
another database system, and have no plans to start.  The fact that
some other database system might barf on a particular bit of SQL is
insufficient reason for PostgreSQL to do the same thing.

If people want to write code that will work on multiple databases,
they should of course avoid using any SQL reserved words for anything
other than their reserved purposes.  But there is no reason for the
database system to unilaterally shove that down everyone's throat.  It
is very easy to overdo the idea of protecting users from themselves.

...Robert


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tom Lane
Дата:
"Robert Haas" <robertmhaas@gmail.com> writes:
>> I am not sure, if these rule is good. Somebody who develop on
>> postgresql should have a problems when they will be port to other
>> databases in future. Reserved words in standards should be respected.

> I disagree.  I have never ported an app written for PostgreSQL to
> another database system, and have no plans to start.  The fact that
> some other database system might barf on a particular bit of SQL is
> insufficient reason for PostgreSQL to do the same thing.

> If people want to write code that will work on multiple databases,
> they should of course avoid using any SQL reserved words for anything
> other than their reserved purposes.

More than that, they have to actually test their SQL on each target DB.
Every DB (including us) is going to have some reserved words that are
not in the standard; so imagining that Postgres can all by itself
protect you from this type of problem is doomed to failure anyway.
        regards, tom lane


Re: Common Table Expressions (WITH RECURSIVE) patch

От
"Kevin Grittner"
Дата:
>>> Tom Lane <tgl@sss.pgh.pa.us> wrote: 
> "Robert Haas" <robertmhaas@gmail.com> writes:
>>> I am not sure, if these rule is good. Somebody who develop on
>>> postgresql should have a problems when they will be port to other
>>> databases in future. Reserved words in standards should be
respected.
> 
>> If people want to write code that will work on multiple databases,
>> they should of course avoid using any SQL reserved words for
anything
>> other than their reserved purposes.
> 
> More than that, they have to actually test their SQL on each target
DB.
> Every DB (including us) is going to have some reserved words that
are
> not in the standard; so imagining that Postgres can all by itself
> protect you from this type of problem is doomed to failure anyway.
If someone wants portable code, they can use a development tool which
wraps ALL identifiers in quotes, every time.  That's what we do.  The
important thing is that, to the extent practicable, standard SQL code
is accepted and behaves in compliance with the standard.  I don't see
that it does anything to compromise that if you support additional,
non-standard syntax for extensions.
-Kevin


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tatsuo Ishii
Дата:
> Is physical_tlist optimization sensible for RecursiveScan?  We seem
> to use it for every other Scan node type.

To enable physical_tlist optimization, it seems build_physical_tlist,
use_physical_tlist and disuse_physical_tlist need to be
changed. build_physical_tlist and use_physical_tlist have been already
patched and only disuse_physical_tlist needs to be patched. Any other
place I miss to enable the optimization?
--
Tatsuo Ishii
SRA OSS, Inc. Japan


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tom Lane
Дата:
Tatsuo Ishii <ishii@postgresql.org> writes:
>> Is physical_tlist optimization sensible for RecursiveScan?  We seem
>> to use it for every other Scan node type.

> To enable physical_tlist optimization, it seems build_physical_tlist,
> use_physical_tlist and disuse_physical_tlist need to be
> changed. build_physical_tlist and use_physical_tlist have been already
> patched and only disuse_physical_tlist needs to be patched. Any other
> place I miss to enable the optimization?

IIRC, the comment for build_physical_tlist hadn't been patched, but
yeah that seems like about it.
        regards, tom lane


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tatsuo Ishii
Дата:
> > To enable physical_tlist optimization, it seems build_physical_tlist,
> > use_physical_tlist and disuse_physical_tlist need to be
> > changed. build_physical_tlist and use_physical_tlist have been already
> > patched and only disuse_physical_tlist needs to be patched. Any other
> > place I miss to enable the optimization?
> 
> IIRC, the comment for build_physical_tlist hadn't been patched, but
> yeah that seems like about it.

Yeah, I need to fix sloppy comments in the existing patches all over
the places:-)
--
Tatsuo Ishii
SRA OSS, Inc. Japan


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tatsuo Ishii
Дата:
Tom, thanks for the review.

Here is an in-progress report. Patches against CVS HEAD attached.
(uncommented items are work-in-progress).
--
Tatsuo Ishii
SRA OSS, Inc. Japan

> Tatsuo Ishii <ishii@postgresql.org> writes:
> > Included is the latest patches against CVS HEAD.
> 
> I spent some time reading this patch.  Here are a few comments in
> no particular order:
> 
> RangeRecursive node lacks copyfuncs/equalfuncs support.

Functions added.
> Query.recursive is missed in equalfuncs.c.  But rather than fix that,
> get rid of it entirely.  AFAICS the only use is in qual_is_pushdown_safe,
> and what is the value of that test?  The callers know perfectly well
> whether they are operating on a recursive RTE or not.  You might as well
> just delete all the useless qual-pushdown-attempt code from
> set_recursion_pathlist, and not need to touch qual_is_pushdown_safe
> at all.

Query.recursive removed and qual_is_pushdown_safe is untouched.

> Is physical_tlist optimization sensible for RecursiveScan?  We seem
> to use it for every other Scan node type.

Fixed and physical_tlist optimization is enabled for RecursiveScan I
believe.

> I dislike putting state into ExecutorState; that makes it impossible
> to have multiple recursion nodes in one plan tree.  It would probably
> be better for the Recursion and RecursiveScan nodes to talk to each
> other directly (compare HashJoin/Hash); although since they are not
> adjacent in the plan tree I admit I'm not sure how to do that.
> 
> es_disallow_tuplestore doesn't seem to need to be in ExecutorState
> at all, it could as well be in RecursionState.

It was for a workaround to avoid an infinit recursion in some
cases. Discussion came to the conclusion that it's user's
responsibilty to avoid such that case (otherwise the semantics of our
recursive query becomes to be different from the one defined in the
standard) I believe.

es_disallow_tuplestore removed.

> I don't really like the way that Append nodes are being abused here.
> It would be better to allow nodeRecursion.c to duplicate a little code
> from nodeAppend.c, and have the child plans be direct children of
> the Recursion node.  BTW, is it actually possible to have more than
> two children?  I didn't spend enough time analyzing the restrictions
> in parse_cte to be sure.  If there are always just two then you could
> simplify the representation by treating it like a join node instead
> of an append.  (The RTE_RECURSIVE representation sure makes it look
> like there can be only two...)
> 
> Mark/restore support seems useless ... note the comment on
> ExecSupportsMarkRestore (which should be updated if this code
> isn't removed).
> 
> RecursiveScan claims to support backwards fetch, but does not in fact
> contain code to do so.  (Given that it will always be underneath
> Recursion, which can't do backwards fetch, I see little point in adding
> such code; fix execAmi.c instead.)
> 
> ExecInitRecursion doesn't seem to be on the same page about whether
> it supports backward scan as execAmi.c, either.
> 
> This comment in nodeRecursivescan.c seems bogus:
>     /*
>      * Do not initialize scan tuple type, result tuple type and
>      * projection info in ExecInitRecursivescan. These types are
>      * initialized after initializing Recursion node.
>      */
> because the code seems to be doing exactly what the comment says it
> doesn't.
> 
> Numerous comments appear to have been copied-and-pasted and not modified
> from the original.  Somebody will have to go over all that text.
> 
> ruleutils.c fails completely for non-recursive WITH.  It *must* regenerate
> such a query with a WITH clause, not as a flattened subquery which is what
> you seem to be doing.  This isn't negotiable because the semantics are
> different.  This will mean at least some change in the parsetree
> representation.  Perhaps we could add a bool to subquery RTEs to mark them
> as coming from a nonrecursive WITH?  The tests added for RTE_RECURSIVE
> seem a bit ugly too.  If it thinks that can't happen it should Assert so,
> not just fall through silently.
> 
> commentary for ParseState.p_ctenamespace is gratuitously unlike the
> comment style for the other fields, and p_recursive_namespace isn't
> documented at all.
> 
> ParseState.p_in_with_clause is unused, should be removed.

Done.

> The WithClause struct definition is poorly commented.  It should be
> stated that it is used only pre-parse-analysis (assuming that continues
> to be true after you get done fixing ruleutils.c...), and it doesn't
> say what the elements of the subquery list are (specifically, what
> node type).  A lot of the other added structs and fields could use
> better commenting too.
> 
> For that matter "subquery" is a poor name for WithClause's list of CTEs,
> especially so since it's hard to search for.  It should be a plural name
> and I'd be inclined to use something like "ctes" not "subqueries".
> The term "subquery" is too overloaded already, so any place you can
> refer to a WITH-list member as a CTE you should do so.
> 
> WithClause node may need a location field, and almost certainly has to
> be handled somehow in exprLocation().
> 
> The error reports in parse_cte.c *desperately* need error locations.
> 
> Why does transformWithClause do parse_sub_analyze twice?
> I'm not sure that's even safe, and it's surely unnecessary.
> Also, what happens if a subquery isn't a SelectStmt?  Silently
> doing nothing doesn't seem like a good plan there.
> 
> Why are we putting essentially the same information into both
> p_recursive_namespace and p_ctenamespace?  Is there really a need
> for both lists?  The code added to transformFromClauseItem
> seems quite wrong since it searches both lists even if it found a
> match in the first one.  This whole area looks like it needs
> refactoring.
> 
> Costing is all bogus, but we knew that...
> 
> Why does set_recursion_pathlist think that the subquery might have
> useful pathkeys?  We know it must always be a UNION ALL, no?
> 
> PlanState.has_recursivescan seems like a complete kluge.  Can't it just be
> removed?  It looks to me like it is working around bugs that hopefully aren't
> there anymore.  There is certainly no reason why a recursive CTE should be
> more in need of rescanning than any other kind of plan.  If it is needed then
> the current implementation is completely broken anyway, since it would only
> detect a RecursiveScan node that is directly underneath an agg or hash node.
> 
> Please pay some attention to keeping things in logical, consistent orders.
> For instance the withClause field was inserted into _copySelectStmt()
> in a different place from where it was inserted in the actual struct
> declaration, which is confusing.
> 
> parseTypeString() ought to check for null withClause.

Done.

> expression_tree_walker/mutator support seems entirely broken for
> RTE_RECURSIVE RTEs.  Shouldn't it be recursing into the subquery?
> 
> Missed adding non_recursive_query to the "zap unneeded substructure" part
> of set_plan_references (assuming it really is unneeded).
> 
> There seem to be quite a number of places where RTE_SUBQUERY RTEs
> are handled but the patch fails to add RTE_RECURSIVE handling ...
> 
> It's a really bad idea to use RTE subquery field over again for
> RTE_RECURSIVE, especially without any comment saying you did that.
> I would suggest two pointers in the RTE_RECURSIVE field list instead.
> 
> Do we really have to make RECURSIVE a fully reserved keyword?
> (Actually, the patch makes it worse than reserved, by failing to
> add it to the reserved_keywords list.)

Changed to unreserved keyword.

> checkCteTargetList is completely broken: it will only notice illegal
> sublinks that are at the very top level of a targetlist expression.
> checkWhereClause is very far short of adequate as well.  Need to recurse
> here, or find some other way.  Given that the subexpressions haven't been
> analyzed yet, this seems a bit messy --- expression_tree_walker doesn't
> know about pre-analysis node trees, so you can't use it.  I'd suggest
> replacing this whole set of routines with just one recursive routine that
> doesn't make pre-assumptions about which node types can be found where.
> Alternatively, is there any way of delaying the validity checks until
> *after* parse analysis of the expressions, so that you could use
> expression_tree_walker et al?
> 
> BTW, it seems like a lot of the logic there could be simplified by depending
> on the enum ordering RECURSIVE_OTHER > RECURSIVE_SELF > NON_RECURSIVE.
> There are a number of places that are taking the larger of two values
> in baroque, hard-to-follow ways.
> 
> I wonder if checkCteSelectStmt is detecting nonlinearity correctly.
> Since RECURSIVE_OTHER dominates RECURSIVE_SELF, couldn't it fail to
> miss the problem in something like (self union (self union other)) ?
> Maybe what you really need is a bitmask:
> NON_RECURSIVE = 0,
> RECURSIVE_SELF = 1,
> RECURSIVE_OTHER = 2,
> RECURSIVE_BOTH = 3    /* the OR of RECURSIVE_SELF and RECURSIVE_OTHER */
> and then you can merge two values via OR instead of MAX.
> 
>             regards, tom lane

Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tatsuo Ishii
Дата:
> Why does set_recursion_pathlist think that the subquery might have
> useful pathkeys?  We know it must always be a UNION ALL, no?

Right. But someday we might implement "UNION" (without ALL) then we
have useful pathkeys...

Or shall I completely remove the step to generate patheys and do not
pass pathkeys to create_recursion_path?
pathkeys = convert_subquery_pathkeys(root, rel, subroot->query_pathkeys);
--
Tatsuo Ishii
SRA OSS, Inc. Japan


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tom Lane
Дата:
Tatsuo Ishii <ishii@postgresql.org> writes:
>> Why does set_recursion_pathlist think that the subquery might have
>> useful pathkeys?  We know it must always be a UNION ALL, no?

> Right. But someday we might implement "UNION" (without ALL) then we
> have useful pathkeys...

Good point.  Might as well leave it in.
        regards, tom lane


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tatsuo Ishii
Дата:
> PlanState.has_recursivescan seems like a complete kluge.  Can't it just be
> removed?  It looks to me like it is working around bugs that hopefully aren't
> there anymore.  There is certainly no reason why a recursive CTE should be
> more in need of rescanning than any other kind of plan.

I don't think so. Recursion plan needs the hash table used by sublan
be re-created at each recursion loop stage. Remember that in each
evaluation of recursive plan, the recursive name is replaced by a
working table which is holding previous evalution result of recursion
stage. Thus the hash table corresponding to the work table needs to
be re-created.

> If it is needed then
> the current implementation is completely broken anyway, since it would only
> detect a RecursiveScan node that is directly underneath an agg or hash node.

Yeah, that's right. What I have in my mind is to implement something
similar to UpdateChangedParamSet family like mechanism which will
inherit working table change event to child node.
--
Tatsuo Ishii
SRA OSS, Inc. Japan


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tom Lane
Дата:
Tatsuo Ishii <ishii@postgresql.org> writes:
>> PlanState.has_recursivescan seems like a complete kluge.  Can't it just be
>> removed?  It looks to me like it is working around bugs that hopefully aren't
>> there anymore.  There is certainly no reason why a recursive CTE should be
>> more in need of rescanning than any other kind of plan.

> I don't think so. Recursion plan needs the hash table used by sublan
> be re-created at each recursion loop stage. Remember that in each
> evaluation of recursive plan, the recursive name is replaced by a
> working table which is holding previous evalution result of recursion
> stage. Thus the hash table corresponding to the work table needs to
> be re-created.

Oh, I see.  I keep getting confused about whether RecursiveScan is at the
top or the bottom of the recursion plan tree :-(.  Maybe it would help
to use a different name for it?  RecursionInjector or something like
that?

>> If it is needed then
>> the current implementation is completely broken anyway, since it would only
>> detect a RecursiveScan node that is directly underneath an agg or hash node.

> Yeah, that's right. What I have in my mind is to implement something
> similar to UpdateChangedParamSet family like mechanism which will
> inherit working table change event to child node.

I think it could actually *be* UpdateChangedParamSet, if you just
associate some otherwise-unused Param with each RecursiveScan node,
and have the Recursion node signal a change of that Param when it
revises the work table.

In fact, why not combine that with getting rid of the klugy addition to
ExecutorState?  Make the param be actually useful: it could contain
some internal datastructure that passes the work table down to the
RecursiveScan node.
        regards, tom lane


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tom Lane
Дата:
Jeff Davis <pgsql@j-davis.com> writes:
> Here is a patch that is an initial attempt to reorganize the parse tree
> representation.

Oh dear, we seem to have spent yesterday doing the same work :-(

I'll go over this and try to merge it with my own WIP.

> * There are a couple of other rough points in places where it's hard to
> traverse up the parse tree or query tree.

Yeah, I'd been running into that issue too.  Adding an explicit pointer
to the CTE into the RTE doesn't work because it renders the parse tree
un-copiable (at least without something a lot more sophisticated than
copyObject; and saving/loading rule parsetrees would be tough too).

What I've got at the moment is that creation of an RTE_CTE RTE copies
the CTE's lists of output column types/typmods into the RTE.  This
eliminates the need for expandRTE and a couple of other places to be
able to find the CTE; everything they need is in the RTE.  So far as I
can see, everyplace else that might need to find the CTE from the RTE
is in places that either have a ParseState available, or have some
comparable structure that could provide a way to search upwards for
CTEs (eg, in ruleutils the "deparse context" will need to track
uplevel CTE lists as well as rtables).

It is a bit tedious though.  Can anyone think of another way that would
still preserve the notion of multiple RTEs being links to the same CTE
rather than independent subqueries?
        regards, tom lane


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Gregory Stark
Дата:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Yeah, I'd been running into that issue too.  Adding an explicit pointer
> to the CTE into the RTE doesn't work because it renders the parse tree
> un-copiable (at least without something a lot more sophisticated than
> copyObject; and saving/loading rule parsetrees would be tough too).

Well the alternative to direct pointers is as you did with subqueries, turning
the set into a flat array and storing indexes into the array. I'm not sure if
that applies here or not.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com Ask me about EnterpriseDB's PostGIS support!


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tom Lane
Дата:
Gregory Stark <stark@enterprisedb.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Yeah, I'd been running into that issue too.  Adding an explicit pointer
>> to the CTE into the RTE doesn't work because it renders the parse tree
>> un-copiable (at least without something a lot more sophisticated than
>> copyObject; and saving/loading rule parsetrees would be tough too).

> Well the alternative to direct pointers is as you did with subqueries, turning
> the set into a flat array and storing indexes into the array. I'm not sure if
> that applies here or not.

I think that just changes the problem into "where can I find the array?" ...

The real issue here is that simplicity of copying etc requires that
child nodes in a parse tree never have back-links leading up to their
parent.  If we were willing to drop that requirement for Query then
we'd not need any auxiliary data structures to chase up to upper-level
rtables or CTEs.  I'm not sure this cure is better than the disease
though.
        regards, tom lane


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tom Lane
Дата:
Attached is the result of a couple of days hacking on the WITH RECURSIVE
patch.  This moves us a lot closer to having sanity in the parsing
phase, though I'm still quite unhappy with the second half of the
processing in parse_cte.c.  I added some infrastructure to make the
first half's search for CTE references reasonably bulletproof, but the
second half needs to be converted to use the same infrastructure, and
I didn't do that yet because I didn't understand what it was doing.
In particular, what the heck is the exception in findCteName that allows
some other CTE's non_recursive_term to be stored into the
non_recursive_term for the current one?  That seems mighty broken.

There are a number of unfinished corner cases (look for XXX in the
patch) but they aren't in the way of further progress.  The next big
thing seems to be to figure out exactly how to do multiple references
to CTE outputs, so that we can de-bogotify the planner.

            regards, tom lane


Вложения

Re: Common Table Expressions (WITH RECURSIVE) patch

От
Jeff Davis
Дата:
I am re-sending this message to -hackers from yesterday, because the
first time it didn't appear to make it through. This time I gzipped the
patch. This is just for the archives (and to give context to the
replies), and this message is superseded by Tom's patch here:

http://archives.postgresql.org/pgsql-hackers/2008-09/msg01521.php


On Thu, 2008-09-18 at 12:55 +0900, Tatsuo Ishii wrote:
> Tom, thanks for the review.
>
> Here is an in-progress report. Patches against CVS HEAD attached.
> (uncommented items are work-in-progress).

Here is a patch that is an initial attempt to reorganize the parse tree
representation.

The goal of this patch is to separate the RTEs from the CTEs, so that we
can, for example, have multiple RTEs refer to the same CTE. This will
hopefully allow us to materialize a volatile query once, and have
several RTEs refer to that same value, which will meet the SQL standard.

Notes:

* It makes a p_cte_table in the ParseState, which is a list of
CteTblEntries. This replaces p_ctenamespace, which was a list of
RangeSubselects.

* It copies the p_cte_table into Query.cte_table

* I introduced a new type of RTE, RTE_CTE, which holds a cte_index and
cte_levelsup. This is used to find the CTE that the RTE references.

Weak points:

* It does not change the behavior of recursive queries. That is a little
more complicated, so I wanted to wait for feedback on my patch so far.

* I don't understand set_subquery_pathlist, or that general area of the
code. I made a new set_cte_pathlist, that is basically the same thing,
except I used a hack "dummy_subquery" variable in the RTE to pass along
a pointer to the subquery of the CTE. I think this dummy variable can be
removed, but I just don't understand that part of the code well enough
to know what should happen. And if it does require a subquery at that
point, I'll need to find a way of locating the right cte_table from
inside that function. Any advice here would be appreciated.

* There are a couple of other rough points in places where it's hard to
traverse up the parse tree or query tree.

I can probably work around these weak points, but I wanted to send the
patch to avoid a lot of conflicts or problems later. Tell me whether you
think this is moving in the right direction.

Regards,
    Jeff Davis

Вложения

Re: Common Table Expressions (WITH RECURSIVE) patch

От
Greg Stark
Дата:

greg

On 24 Sep 2008, at 02:45, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The next big
> thing seems to be to figure out exactly how to do multiple references
> to CTE outputs, so that we can de-bogotify the planner.

I've looked and don't seem to still have the source tree where I  
worked on this. I remember how I made the changes to tuplestore which  
was mostly mechanical. The trick I think will be in adding a special  
purpose executor method which passes the call site to the node below.  
This depends on the observation that if we always memoize results then  
each call site can only have one active call. That is, we don't need  
to maintain a full stack tree.   


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tatsuo Ishii
Дата:
Tom,

> > WithClause node may need a location field, and almost certainly has to
> > be handled somehow in exprLocation().
> > 
> > The error reports in parse_cte.c *desperately* need error locations.

Included is a patch for this against your cte-0923.patch.gz. Most
errors now have error locations, but some do not. I'm going to think
more to enhance this.
--
Tatsuo Ishii
SRA OSS, Inc. Japan
*** pgsql/src/backend/parser/parse_cte.c    2008-09-25 11:06:12.000000000 +0900
--- pgsql.patched/src/backend/parser/parse_cte.c    2008-09-25 10:46:41.000000000 +0900
***************
*** 239,245 ****     if (query->intoClause)         ereport(ERROR,                 (errcode(ERRCODE_SYNTAX_ERROR),
!                  errmsg("subquery in WITH cannot have SELECT INTO")));      /* Compute the derived fields if not done
yet*/     if (!cte->cterecursive)
 
--- 239,247 ----     if (query->intoClause)         ereport(ERROR,                 (errcode(ERRCODE_SYNTAX_ERROR),
!                  errmsg("subquery in WITH cannot have SELECT INTO"),
!                  parser_errposition(pstate,
!                                     exprLocation((Node *) query->intoClause))));      /* Compute the derived fields
ifnot done yet */     if (!cte->cterecursive)
 
***************
*** 561,625 ****                      lresult != NON_RECURSIVE)                 ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
!                          errmsg("Left hand side of UNION ALL must be a non-recursive term in a recursive query")));
          else if (stmt->op == SETOP_INTERSECT)                 ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
!                          errmsg("non-recursive term and recursive term must not be combined with INTERSECT")));
      else if (stmt->op == SETOP_EXCEPT)                 ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
!                          errmsg("non-recursive term and recursive term must not be combined with EXCEPT")));
   else if (stmt->op == SETOP_UNION && stmt->all != true)                 ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
!                          errmsg("non-recursive term and recursive term must be combined with UNION ALL")));
  else if (stmt->op == SETOP_UNION && stmt->all == true &&                      rarg->op == SETOP_UNION)
ereport(ERROR,                         (errcode(ERRCODE_SYNTAX_ERROR),
 
!                          errmsg("Right hand side of UNION ALL must not contain UNION operation")));              else
if(stmt->sortClause)                 ereport(ERROR,                         (errcode(ERRCODE_SYNTAX_ERROR),
 
!                          errmsg("ORDER BY in a recursive query not allowed")));              else if
(stmt->limitOffset|| stmt->limitCount)                 ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
!                          errmsg("LIMIT OFFSET in a recursive query not allowed")));              else if
(stmt->lockingClause)                ereport(ERROR,                         (errcode(ERRCODE_SYNTAX_ERROR),
 
!                          errmsg("FOR UPDATE in a recursive query not allowed")));              else if (lresult ==
NON_RECURSIVE&& rresult == RECURSIVE_SELF)             {                 if (larg->distinctClause)
ereport(ERROR,                            (errcode(ERRCODE_SYNTAX_ERROR),
 
!                              errmsg("DISTINCT in a non recursive term not allowed")));                  if
(rarg->distinctClause)                    ereport(ERROR,                             (errcode(ERRCODE_SYNTAX_ERROR),
 
!                              errmsg("DISTINCT in a recursive term not allowed")));                  if
(rarg->groupClause)                    ereport(ERROR,                             (errcode(ERRCODE_SYNTAX_ERROR),
 
!                              errmsg("GROUP BY in a recursive term not allowed")));                  if
(rarg->havingClause)                    ereport(ERROR,                             (errcode(ERRCODE_SYNTAX_ERROR),
 
!                              errmsg("HAVING in a recursive term not allowed")));                  /*
*Save non_recursive_term.
 
--- 563,646 ----                      lresult != NON_RECURSIVE)                 ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
!                          errmsg("Left hand side of UNION ALL must be a non-recursive term in a recursive query"),
!                          parser_errposition(cstate->pstate, cte->location)));              else if (stmt->op ==
SETOP_INTERSECT)                ereport(ERROR,                         (errcode(ERRCODE_SYNTAX_ERROR),
 
!                          errmsg("non-recursive term and recursive term must not be combined with INTERSECT"),
!                          parser_errposition(cstate->pstate, cte->location)));              else if (stmt->op ==
SETOP_EXCEPT)                ereport(ERROR,                         (errcode(ERRCODE_SYNTAX_ERROR),
 
!                          errmsg("non-recursive term and recursive term must not be combined with EXCEPT"),
!                          parser_errposition(cstate->pstate, cte->location)));              else if (stmt->op ==
SETOP_UNION&& stmt->all != true)                 ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
!                          errmsg("non-recursive term and recursive term must be combined with UNION ALL"),
!                          parser_errposition(cstate->pstate, cte->location)));              else if (stmt->op ==
SETOP_UNION&& stmt->all == true &&                      rarg->op == SETOP_UNION)                 ereport(ERROR,
               (errcode(ERRCODE_SYNTAX_ERROR),
 
!                          errmsg("Right hand side of UNION ALL must not contain UNION operation"),
!                          parser_errposition(cstate->pstate, cte->location)));              else if (stmt->sortClause)
               ereport(ERROR,                         (errcode(ERRCODE_SYNTAX_ERROR),
 
!                          errmsg("ORDER BY in a recursive query not allowed"),
!                          parser_errposition(cstate->pstate,
!                                             exprLocation((Node *) stmt->sortClause))));              else if
(stmt->limitOffset|| stmt->limitCount)                 ereport(ERROR,
(errcode(ERRCODE_SYNTAX_ERROR),
!                          errmsg("LIMIT OFFSET in a recursive query not allowed"),
!                          parser_errposition(cstate->pstate,
!                                             exprLocation((Node *) stmt->limitCount))));              else if
(stmt->lockingClause)                ereport(ERROR,                         (errcode(ERRCODE_SYNTAX_ERROR),
 
!                          errmsg("FOR UPDATE in a recursive query not allowed"),
!                          parser_errposition(cstate->pstate,
!                                             exprLocation((Node *) stmt->lockingClause))));              else if
(lresult== NON_RECURSIVE && rresult == RECURSIVE_SELF)             {                 if (larg->distinctClause)
          ereport(ERROR,                             (errcode(ERRCODE_SYNTAX_ERROR),
 
!                              errmsg("DISTINCT in a non recursive term not allowed"),
!                              parser_errposition(cstate->pstate,
!                                                 exprLocation((Node *) larg->distinctClause))));                  if
(rarg->distinctClause)                    ereport(ERROR,                             (errcode(ERRCODE_SYNTAX_ERROR),
 
!                              errmsg("DISTINCT in a recursive term not allowed"),
!                              parser_errposition(cstate->pstate,
!                                                 exprLocation((Node *) rarg->distinctClause))));                  if
(rarg->groupClause)                    ereport(ERROR,                             (errcode(ERRCODE_SYNTAX_ERROR),
 
!                              errmsg("GROUP BY in a recursive term not allowed"),
!                              parser_errposition(cstate->pstate,
!                                                 exprLocation((Node *) rarg->groupClause))));                  if
(rarg->havingClause)                    ereport(ERROR,                             (errcode(ERRCODE_SYNTAX_ERROR),
 
!                              errmsg("HAVING in a recursive term not allowed"),
!                              parser_errposition(cstate->pstate,
!                                                 exprLocation((Node *) rarg->havingClause))));                  /*
            * Save non_recursive_term.
 
***************
*** 668,674 ****         if (checkCteTargetList(cstate, n->targetList, myindex) != NON_RECURSIVE)
ereport(ERROR,                    (errcode(ERRCODE_SYNTAX_ERROR),
 
!                      (errmsg("target list having subquery which uses recursive name in a recursive term not
allowed"))));    }      if (n->fromClause)
 
--- 689,697 ----         if (checkCteTargetList(cstate, n->targetList, myindex) != NON_RECURSIVE)
ereport(ERROR,                    (errcode(ERRCODE_SYNTAX_ERROR),
 
!                      (errmsg("target list having subquery which uses recursive name in a recursive term not
allowed"),
!                       parser_errposition(cstate->pstate,
!                                          exprLocation((Node *) n->targetList)))));     }      if (n->fromClause)
***************
*** 680,687 ****         if (checkCteWhereClause(cstate, n->whereClause, myindex) != NON_RECURSIVE)
ereport(ERROR,                    (errcode(ERRCODE_SYNTAX_ERROR),
 
!                      (errmsg("WHERE clause having subqury which uses recursive name in a recursive term not
allowed"))));
!      }      return r;
--- 703,711 ----         if (checkCteWhereClause(cstate, n->whereClause, myindex) != NON_RECURSIVE)
ereport(ERROR,                    (errcode(ERRCODE_SYNTAX_ERROR),
 
!                      (errmsg("WHERE clause having subqury which uses recursive name in a recursive term not
allowed"),
!                       parser_errposition(cstate->pstate,
!                                          exprLocation((Node *) n->whereClause)))));     }      return r;
*** pgsql/src/test/regress/expected/recursive.out    2008-09-25 11:06:12.000000000 +0900
--- pgsql.patched/src/test/regress/expected/recursive.out    2008-09-25 11:11:47.000000000 +0900
***************
*** 404,423 ****
--- 404,433 ---- WITH RECURSIVE x(n) AS (SELECT 1 UNION SELECT n+1 FROM x)     SELECT * FROM x; ERROR:  non-recursive
termand recursive term must be combined with UNION ALL
 
+ LINE 1: WITH RECURSIVE x(n) AS (SELECT 1 UNION SELECT n+1 FROM x)
+                        ^ -- INTERSECT WITH RECURSIVE x(n) AS (SELECT 1 INTERSECT SELECT n+1 FROM x)     SELECT * FROM
x;ERROR:  non-recursive term and recursive term must not be combined with INTERSECT
 
+ LINE 1: WITH RECURSIVE x(n) AS (SELECT 1 INTERSECT SELECT n+1 FROM x...
+                        ^ WITH RECURSIVE x(n) AS (SELECT 1 INTERSECT ALL SELECT n+1 FROM x)     SELECT * FROM x;
ERROR: non-recursive term and recursive term must not be combined with INTERSECT
 
+ LINE 1: WITH RECURSIVE x(n) AS (SELECT 1 INTERSECT ALL SELECT n+1 FR...
+                        ^ -- EXCEPT WITH RECURSIVE x(n) AS (SELECT 1 EXCEPT SELECT n+1 FROM x)     SELECT * FROM x;
ERROR: non-recursive term and recursive term must not be combined with EXCEPT
 
+ LINE 1: WITH RECURSIVE x(n) AS (SELECT 1 EXCEPT SELECT n+1 FROM x)
+                        ^ WITH RECURSIVE x(n) AS (SELECT 1 EXCEPT ALL SELECT n+1 FROM x)     SELECT * FROM x; ERROR:
non-recursiveterm and recursive term must not be combined with EXCEPT
 
+ LINE 1: WITH RECURSIVE x(n) AS (SELECT 1 EXCEPT ALL SELECT n+1 FROM ...
+                        ^ -- no non-recursive term WITH RECURSIVE x(n) AS (SELECT n FROM x)     SELECT * FROM x;
***************
*** 428,433 ****
--- 438,445 ---- WITH RECURSIVE x(n) AS (SELECT n FROM x UNION ALL SELECT 1)     SELECT * FROM x; ERROR:  Left hand
sideof UNION ALL must be a non-recursive term in a recursive query
 
+ LINE 1: WITH RECURSIVE x(n) AS (SELECT n FROM x UNION ALL SELECT 1)
+                        ^ CREATE TEMPORARY TABLE y (a INTEGER); INSERT INTO y SELECT generate_series(1, 10); -- LEFT
JOIN
***************
*** 453,470 ****
--- 465,490 ----                           WHERE n IN (SELECT * FROM x))   SELECT * FROM x; ERROR:  WHERE clause having
subqurywhich uses recursive name in a recursive term not allowed
 
+ LINE 2:                           WHERE n IN (SELECT * FROM x))
+                                         ^ WITH RECURSIVE x(n) AS (SELECT 1 UNION ALL SELECT n+1 FROM x
          WHERE n = 1 AND n IN (SELECT * FROM x))   SELECT * FROM x; ERROR:  WHERE clause having subqury which uses
recursivename in a recursive term not allowed
 
+ LINE 2:                           WHERE n = 1 AND n IN (SELECT * FRO...
+                                         ^ -- GROUP BY WITH RECURSIVE x(n) AS (SELECT 1 UNION ALL SELECT n+1 FROM x
GROUPBY n)   SELECT * FROM x; ERROR:  GROUP BY in a recursive term not allowed
 
+ LINE 1: ...VE x(n) AS (SELECT 1 UNION ALL SELECT n+1 FROM x GROUP BY n)
+                                                                      ^ -- HAVING WITH RECURSIVE x(n) AS (SELECT 1
UNIONALL SELECT n+1 FROM x HAVING n < 10)   SELECT * FROM x; ERROR:  HAVING in a recursive term not allowed
 
+ LINE 1: ...x(n) AS (SELECT 1 UNION ALL SELECT n+1 FROM x HAVING n < 10)
+                                                                 ^ -- aggregate functions WITH RECURSIVE x(n) AS
(SELECT1 UNION ALL SELECT count(*) FROM x)   SELECT * FROM x;
 
***************
*** 485,494 ****
--- 505,518 ---- WITH RECURSIVE x(n) AS (SELECT 1 UNION ALL SELECT n+1 FROM x ORDER BY 1)   SELECT * FROM x; ERROR:
ORDERBY in a recursive query not allowed
 
+ LINE 1: ...VE x(n) AS (SELECT 1 UNION ALL SELECT n+1 FROM x ORDER BY 1)
+                                                                      ^ -- LIMIT/OFFSET WITH RECURSIVE x(n) AS (SELECT
1UNION ALL SELECT n+1 FROM x LIMIT 10 OFFSET 1)   SELECT * FROM x; ERROR:  LIMIT OFFSET in a recursive query not
allowed
+ LINE 1: ...n) AS (SELECT 1 UNION ALL SELECT n+1 FROM x LIMIT 10 OFFSET ...
+                                                              ^ -- FOR UPDATE WITH RECURSIVE x(n) AS (SELECT 1 UNION
ALLSELECT n+1 FROM x FOR UPDATE)   SELECT * FROM x;
 
***************
*** 499,504 ****
--- 523,530 ----     SELECT (SELECT * FROM x) FROM x WHERE id < 5 ) SELECT * FROM x; ERROR:  target list having
subquerywhich uses recursive name in a recursive term not allowed
 
+ LINE 3:     SELECT (SELECT * FROM x) FROM x WHERE id < 5
+                    ^ -- mutual recursive query WITH RECURSIVE   x (id) AS (SELECT 1 UNION ALL SELECT id+1 FROM y
WHEREid < 5), 

Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tom Lane
Дата:
Greg Stark <greg.stark@enterprisedb.com> writes:
> On 24 Sep 2008, at 02:45, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The next big
>> thing seems to be to figure out exactly how to do multiple references
>> to CTE outputs, so that we can de-bogotify the planner.

> I've looked and don't seem to still have the source tree where I  
> worked on this. I remember how I made the changes to tuplestore which  
> was mostly mechanical. The trick I think will be in adding a special  
> purpose executor method which passes the call site to the node below.  
> This depends on the observation that if we always memoize results then  
> each call site can only have one active call. That is, we don't need  
> to maintain a full stack tree. 

I looked at the tuplestore code a bit and decided that this actually
doesn't need to be hard at all.  Tuplestore already has a notion of a
write position and an independent read position, and we don't need more
than one write position.  So what I think we should do is add "get read
position" and "set read position" functions to tuplestore.c, and have
each of the reader nodes remember its own read position.  That is,
each reader has to do
set_read_position(tupstore, &local_read_position);tuple = tuplestore_gettuple(tupstore,
...);get_read_position(tupstore,&local_read_position);
 

rather than just tuplestore_gettuple.  The set/get functions will be
cheap enough that this is no big deal.  (Or maybe we should just
provide a wrapper function that does this sequence?)
        regards, tom lane


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Dimitri Fontaine
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi,

Le 30 sept. 08 à 20:03, Tom Lane a écrit :
>     set_read_position(tupstore, &local_read_position);
>     tuple = tuplestore_gettuple(tupstore, ...);
>     get_read_position(tupstore, &local_read_position);
>
> rather than just tuplestore_gettuple.  The set/get functions will be
> cheap enough that this is no big deal.  (Or maybe we should just
> provide a wrapper function that does this sequence?)

It seems to me to share some ideas with the MemoryContext concept:
what about a TupstoreContext associated with tuplestore, you get a
common default one if you don't register your own, and usetuplestore_gettuple(MyTupstoreContext, ...);

Maybe some other API would benefit from the idea?

Regards,
- --
dim


-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (Darwin)

iEYEARECAAYFAkjigF4ACgkQlBXRlnbh1bkycQCgqs/+JBOd0SiN4xvKwLgEgi9F
BOYAoLm0Se6zs8cEAnoTlH6de7pLLh/l
=kzm1
-----END PGP SIGNATURE-----


Re: Common Table Expressions (WITH RECURSIVE) patch

От
"Hitoshi Harada"
Дата:
Hi,

2008/10/1 Dimitri Fontaine <dfontaine@hi-media.com>:
> -----BEGIN PGP SIGNED MESSAGE-----
> Hash: SHA1
>
> Hi,
>
> Le 30 sept. 08 à 20:03, Tom Lane a écrit :
>>
>>        set_read_position(tupstore, &local_read_position);
>>        tuple = tuplestore_gettuple(tupstore, ...);
>>        get_read_position(tupstore, &local_read_position);
>>
>> rather than just tuplestore_gettuple.  The set/get functions will be
>> cheap enough that this is no big deal.  (Or maybe we should just
>> provide a wrapper function that does this sequence?)
>
> It seems to me to share some ideas with the MemoryContext concept: what
> about a TupstoreContext associated with tuplestore, you get a common default
> one if you don't register your own, and use
>        tuplestore_gettuple(MyTupstoreContext, ...);
>
> Maybe some other API would benefit from the idea?
>

I'm just working on tuplestore recording multiple positions for my
window function project. Attached patch is still in progress but seems
it works in a situation.

>From my work, the setting/getting read position and delegate savig
positions to the caller will probably have problems, because of memory
control for saving positions and tuplestore status changing (memory ->
BufFile). Instead, I decided it'd better that we can indicate the row
number by integer.

Regards,

--
Hitoshi Harada

Вложения

Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tom Lane
Дата:
Here are the results of a couple more days' hacking on the CTE patch.

* I cleaned up the processing in the second part of parse_cte.c, where
we are trying to check for validity of a recursive query.  The
conditions that it's checking for are not exactly the same as what was
being looked for previously, so this could do with a bit of review.

* I got rid of the kluges in the executor in favor of treating the
working table as a PARAM_EXEC Param.  Also renamed the plan node
types to RecursiveUnion and WorkTableScan --- I'm not wedded to these
choices, but they seemed more transparent than the former names.

* I have not yet tackled the problem of ensuring single evaluation of
CTEs, but there's a few bits of infrastructure for it.

There are various small loose ends denoted by XXX in the patch, but
the main remaining issue is definitely the single-evaluation business.

            regards, tom lane


Вложения

Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tom Lane
Дата:
"Hitoshi Harada" <umi.tanuki@gmail.com> writes:
>> It seems to me to share some ideas with the MemoryContext concept: what
>> about a TupstoreContext associated with tuplestore, you get a common default
>> one if you don't register your own, and use
>> tuplestore_gettuple(MyTupstoreContext, ...);

> I'm just working on tuplestore recording multiple positions for my
> window function project. Attached patch is still in progress but seems
> it works in a situation.

> From my work, the setting/getting read position and delegate savig
> positions to the caller will probably have problems, because of memory
> control for saving positions and tuplestore status changing (memory ->
> BufFile). Instead, I decided it'd better that we can indicate the row
> number by integer.

That seems unreasonably inefficient --- it takes lots more file I/O,
and there's really no need for random access, at least not in what I'm
doing.

I looked at the tuplestore code some more, and my idea of allowing
callers to store their current position externally to tuplestore indeed
doesn't work.  The problem is that if dumptuples() happens there's no
way to convert an index-based position to a file-based one.  The current
code can handle it because it has access to all the positions (the read,
write, and mark pointers) but any positions stored externally wouldn't
get converted.

So it seems like the appropriate generalization is to have an array of
read positions inside the tuplestore and allow callers to say "read
using position N", plus some API to allow positions to be allocated to
different requestors.  We could get rid of the separate mark pointer by
implementing an API that allows position X to be copied to position Y.
But the actual value of a position (a tuple number or file position
info) would never be exposed to callers.

Comments?
        regards, tom lane


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tom Lane
Дата:
"Greg Stark" <stark@enterprisedb.com> writes:
> On Wed, Oct 1, 2008 at 2:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So it seems like the appropriate generalization is to have an array of
>> read positions inside the tuplestore and allow callers to say "read
>> using position N", plus some API to allow positions to be allocated to
>> different requestors.

> One other reason the tuplestore should know the position of all the
> readers is that ideally it would want to be able to discard any tuples
> older than the oldest read position. That also means it needs to know
> when all the call sites have allocated their position and don't need
> to reset it.

Good point.  So we'd need per-position capability flags, not
per-tuplestore.

I hadn't realized that this would be relevant to window functions.
Now that I know that, I propose fixing tuplestore for multiple
positions and committing it separately, before I go back to the CTE
patch.  Then Hitoshi-san will have something he can work with too.
        regards, tom lane


Re: Common Table Expressions (WITH RECURSIVE) patch

От
"Greg Stark"
Дата:
On Wed, Oct 1, 2008 at 2:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> So it seems like the appropriate generalization is to have an array of
> read positions inside the tuplestore and allow callers to say "read
> using position N", plus some API to allow positions to be allocated to
> different requestors.  We could get rid of the separate mark pointer by
> implementing an API that allows position X to be copied to position Y.
> But the actual value of a position (a tuple number or file position
> info) would never be exposed to callers.


That's basicaly what had done (though i had n "readers" which
encapsulated the current pointer and mark).

One other reason the tuplestore should know the position of all the
readers is that ideally it would want to be able to discard any tuples
older than the oldest read position. That also means it needs to know
when all the call sites have allocated their position and don't need
to reset it.



-- 
greg


Re: Common Table Expressions (WITH RECURSIVE) patch

От
"Hitoshi Harada"
Дата:
>> One other reason the tuplestore should know the position of all the
>> readers is that ideally it would want to be able to discard any tuples
>> older than the oldest read position. That also means it needs to know
>> when all the call sites have allocated their position and don't need
>> to reset it.
>
> Good point.  So we'd need per-position capability flags, not
> per-tuplestore.
>
> I hadn't realized that this would be relevant to window functions.
> Now that I know that, I propose fixing tuplestore for multiple
> positions and committing it separately, before I go back to the CTE
> patch.  Then Hitoshi-san will have something he can work with too.
>

Yes, tuplestore multiple positioning will give the greate help to the
window function. Ideally, it is better that tuplestore'd have all the
positions and have some kind of capability to discard old rows so that
it can stay in TSS_MEM, which helps window function's sliding frame.
But it isn't really critical, just performance matter.

Regards,



-- 
Hitoshi Harada


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tom Lane
Дата:
"Hitoshi Harada" <umi.tanuki@gmail.com> writes:
>> I hadn't realized that this would be relevant to window functions.
>> Now that I know that, I propose fixing tuplestore for multiple
>> positions and committing it separately, before I go back to the CTE
>> patch.  Then Hitoshi-san will have something he can work with too.

> Yes, tuplestore multiple positioning will give the greate help to the
> window function. Ideally, it is better that tuplestore'd have all the
> positions and have some kind of capability to discard old rows so that
> it can stay in TSS_MEM, which helps window function's sliding frame.

Okay, there's a patch in CVS HEAD that works this way.  Let me know if
it needs further tweaking for your purposes.
        regards, tom lane


Re: Common Table Expressions (WITH RECURSIVE) patch

От
"Hitoshi Harada"
Дата:
2008/10/2 Tom Lane <tgl@sss.pgh.pa.us>:
> "Hitoshi Harada" <umi.tanuki@gmail.com> writes:
>>> I hadn't realized that this would be relevant to window functions.
>>> Now that I know that, I propose fixing tuplestore for multiple
>>> positions and committing it separately, before I go back to the CTE
>>> patch.  Then Hitoshi-san will have something he can work with too.
>
>> Yes, tuplestore multiple positioning will give the greate help to the
>> window function. Ideally, it is better that tuplestore'd have all the
>> positions and have some kind of capability to discard old rows so that
>> it can stay in TSS_MEM, which helps window function's sliding frame.
>
> Okay, there's a patch in CVS HEAD that works this way.  Let me know if
> it needs further tweaking for your purposes.
>
>                        regards, tom lane
>

Hmm, I've looked over the patch. Logically window functions can access
arbitrary rows that have been stored in a frame. Thus I had thought
tuplestore should hold all the positions and allow arbitrary random
access indicated by integer. Maybe those functionalities can be
abstracted by the window function API itself. For this matter it seems
that you'd better to look at my future patch.
Everything else is great deal. By improving tuplestore_trim(), sliding
frame will be performed better than I'd thought.

Regards,


-- 
Hitoshi Harada


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tom Lane
Дата:
"Hitoshi Harada" <umi.tanuki@gmail.com> writes:
> 2008/10/2 Tom Lane <tgl@sss.pgh.pa.us>:
>> Okay, there's a patch in CVS HEAD that works this way.  Let me know if
>> it needs further tweaking for your purposes.

> Hmm, I've looked over the patch. Logically window functions can access
> arbitrary rows that have been stored in a frame. Thus I had thought
> tuplestore should hold all the positions and allow arbitrary random
> access indicated by integer. Maybe those functionalities can be
> abstracted by the window function API itself. For this matter it seems
> that you'd better to look at my future patch.

Well, the problem with defining it as "arbitrary" random access is that
there's no way for the tuplestore to throw away old data.

The scheme that I have in mind here is that you keep (at least) two read
pointers, one that is where you're actually reading the data and one
that is nailing down the oldest point you might need to return to.
This is a generalization of the previous mark/restore logic to allow any
number of pairs of mark and restore points.
        regards, tom lane


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Greg Stark
Дата:

On 2 Oct 2008, at 05:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:

> "Hitoshi Harada" <umi.tanuki@gmail.com> writes:
>
>> Hmm, I've looked over the patch. Logically window functions can  
>> access
>> arbitrary rows that have been stored in a frame. Thus I had thought
>> tuplestore should hold all the positions and allow arbitrary random
>> access indicated by integer. Maybe those functionalities can be
>> abstracted by the window function API itself. For this matter it  
>> seems
>> that you'd better to look at my future patch.
>
> Well, the problem with defining it as "arbitrary" random access is  
> that
> there's no way for the tuplestore to throw away old data.

And that there's no way to make it work if the tuplestore has spilled  
to disk.




Re: Common Table Expressions (WITH RECURSIVE) patch

От
"Hitoshi Harada"
Дата:
2008/10/2 Greg Stark <greg.stark@enterprisedb.com>:
>
>
> On 2 Oct 2008, at 05:44 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
>> "Hitoshi Harada" <umi.tanuki@gmail.com> writes:
>>
>>> Hmm, I've looked over the patch. Logically window functions can access
>>> arbitrary rows that have been stored in a frame. Thus I had thought
>>> tuplestore should hold all the positions and allow arbitrary random
>>> access indicated by integer. Maybe those functionalities can be
>>> abstracted by the window function API itself. For this matter it seems
>>> that you'd better to look at my future patch.
>>
>> Well, the problem with defining it as "arbitrary" random access is that
>> there's no way for the tuplestore to throw away old data.
>
> And that there's no way to make it work if the tuplestore has spilled to
> disk.
>

In my purpose the "old data" can always be indicated by an integer row
position. So the real problem here is how you store all the row
positions for arbitrary random access closer to O(1). Yes, you can go
to certain row by fetching tuples manytimes from a marked row but it's
inefficient. I know my patch sent before is also inefficent. But
essentially the window function needs high cost.

I'll try to process my work with the patch but maybe more work on
tuplestore will be needed after we see the real problem that I don't
see now either.

Regards,



-- 
Hitoshi Harada


Re: Common Table Expressions (WITH RECURSIVE) patch

От
Tom Lane
Дата:
I've successfully taught the WITH patch to do single evaluation of WITH
queries.  I've also been through all of the planner and executor code
for it and am now feeling pretty happy with the whole thing.  There are
still a small number of loose ends (see XXX in the patch), but I don't
believe any of them represent significant work --- I just left them
undone because they weren't in the way of testing anything else.

One point of interest (at least to Hitoshi-san) is that my first cut
at the multiple-readout tuplestore turned out to be not quite the right
thing.  In the original code, if the read pointer was "at eof" (meaning
the previous gettuple call had returned null), then a puttuple call
would move the read pointer forward over the added tuple, keeping it
"at eof".  This was done because nodeMaterial.c wanted it and no other
callers cared particularly.  I had generalized that to the idea that
*all* read pointers that are "at eof" should get moved; but this turns
out to be a really bad idea, at least for nodeCtescan's usage.  What
seems to be the right thing is for only the "active" read pointer to
be moved forward, with inactive ones dropping out of "at eof" state.
This seems reasonable because the point is to not have to reprocess
a tuple you know you just stuck into the tuplestore --- but the other
readers of the tuplestore won't know that, and need to see the tuple
you added.  But it might be that we need to make the behavior
configurable somehow, if the window-functions patch turns out to
need something different.

Barring surprises in the loose ends, I expect to be able to commit
this in a couple of days.

            regards, tom lane


Вложения

CYCLE and SEARCH [was Re: Common Table Expressions (WITH RECURSIVE) patch]

От
David Fetter
Дата:
On Fri, Oct 03, 2008 at 03:55:56PM -0400, Tom Lane wrote:
> I've successfully taught the WITH patch to do single evaluation of
> WITH queries.  I've also been through all of the planner and
> executor code for it and am now feeling pretty happy with the whole
> thing.  There are still a small number of loose ends (see XXX in the
> patch), but I don't believe any of them represent significant work
> --- I just left them undone because they weren't in the way of
> testing anything else.

Great!

How hard would it be to add the infrastructure for CYCLE?  Here's the
kind of awful hack I've been doing lately in order to detect and
prevent cycles:

CREATE TABLE adjacency(   id INTEGER NOT NULL,   parent_id INTEGER
);

INSERT INTO adjacency VALUES(1,NULL),
(2,1),(3,1),(4,1),
(5,2),(6,2),(7,2),(8,3),(9,3),(10,4),
(11,5),(12,5),(13,6),(14,7),(15,8),
(9,1); /* Cycle! */

WITH RECURSIVE t(node, path) AS (   SELECT id, ARRAY[id]   FROM adjacency   WHERE parent_id IS NULL
UNION ALL   SELECT a1.id, t.path || a1.id   FROM adjacency a1 JOIN t ON (a1.parent_id = t.node)   WHERE a1.id <>
ANY(t.path)/* Remove cycle using awful hack :P */
 
)
SELECT
CASE WHEN array_upper(path,1)>1 THEN '+-' ELSE '' END ||
REPEAT('--', array_upper(path,1)-2) ||
node AS "Branch"
FROM t
ORDER BY path;

I suspect that some kind of hash structure, instantiated only when a
CYCLE clause is specified, could help out with a much, much more
efficient implementation of cycle prevention.

Adding SEARCH will be a lot more complicated, as DEPTH FIRST is
completely different from how the implementation works now.  Any ideas
on how this might be approached?

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: CYCLE and SEARCH [was Re: Common Table Expressions (WITH RECURSIVE) patch]

От
Tom Lane
Дата:
David Fetter <david@fetter.org> writes:
> How hard would it be to add the infrastructure for CYCLE?

Personally I'm more interested in getting the recursive UNION (without
ALL) case to work.

I think that this might just be a matter of drawing on support that's
already there: have RecursiveUnion maintain a hash table of rows it's
already seen, and discard anything coming from either the non-recursive
term or the recursive term that is found to be already present in the
hash table.

Two objections to this are (a) it doesn't work for datatypes without
hash equality support, and (b) it would fail for recursion output
exceeding available memory.  However, the alternative of using
sort-and-uniq to detect duplicates seems pretty horrid in this situation
--- you'd need a fresh sort and mergejoin-like scan for every iteration
of the recursive term.  And it would become impractical to return rows
immediately after they're emitted by the subqueries.  So I'd be willing
to live with only supporting the hash implementation.

If no objections, I'll look at that next week ...
        regards, tom lane