Обсуждение: WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows

Поиск
Список
Период
Сортировка

WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows

От
Tom Lane
Дата:
I looked a bit at the SQL:2008 spec for a CYCLE clause for WITH
RECURSIVE.  It is interesting to see that it is just syntactic sugar,
because *they spell out how to expand it into regular SQL*.  More,
they defined it in such a way that it's hard to optimize at all,
because the "path" column is exposed to the user; you don't really
have any choice about how to do it.  There are some ugly and unnecessary
choices in there too, like insisting that the cycle mark column be
char(1).

So I am not feeling very excited about implementing the syntax per se
(and I note that DB2 doesn't seem to have done so either).  Instead
we should document some examples of how to do cycle detection at the
SQL level.  However, it would be nice if the spec's approach to cycle
detection actually worked well in Postgres.  There are a couple of
things we seem to be missing, according to some experiments I just
did with trying to translate the spec's code into Postgres:

* The spec assumes that ARRAY[ROW(some columns)] works, ie, that you can
have an array of an anonymous record type.  We don't allow that right
now, but it seems like a useful thing to have --- at least as a
transient value within a query.  I'm not sure there's a case for
allowing such things to go to disk.

* The spec writes this to detect whether a row of an anonymous record
type is present in an array of that same anonymous record type:ROW(some columns) IN (SELECT P.* FROM TABLE(array
variable)P) 
 
We haven't got the TABLE() syntax; you can sort of emulate it with a SRF
but only for arrays of named rowtypes.  For an anonymous rowtype,
it's very unclear to me how the rowtype would be communicated at
parse time so that the P.* notation could be expanded properly.

* Instead of the above, we could try to makeROW(some columns) = ANY (array variable)
work.  This is shorter than the above syntax and would presumably have
a lot less overhead too.  But it doesn't work right now, not even for
named rowtypes much less anonymous ones.

I'm thinking that addressing these pieces would be a generally good
thing to do, above and beyond potential uses in recursive queries.
        regards, tom lane


Re: WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows

От
"Merlin Moncure"
Дата:
On Tue, Oct 7, 2008 at 9:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> * Instead of the above, we could try to make
>        ROW(some columns) = ANY (array variable)
> work.  This is shorter than the above syntax and would presumably have
> a lot less overhead too.  But it doesn't work right now, not even for
> named rowtypes much less anonymous ones.

By extension, would this also mean things like
select row(1,2,3)::foo = foo from foo;
or
select (1,2,3)::foo = (1,2,3)::foo;
or
select (1,2,3) = (1,2,3)::foo;

Would work (presumably as row-wise comparison does)? Also,
create index foo_idx on foo(foo);
select * from foo where (1,2,3)::foo = foo;

would be very nice.

> I'm thinking that addressing these pieces would be a generally good
> thing to do, above and beyond potential uses in recursive queries.

absolutely.

merlin


Re: WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows

От
Tom Lane
Дата:
"Merlin Moncure" <mmoncure@gmail.com> writes:
> On Tue, Oct 7, 2008 at 9:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> * Instead of the above, we could try to make
>> ROW(some columns) = ANY (array variable)
>> work.  This is shorter than the above syntax and would presumably have
>> a lot less overhead too.  But it doesn't work right now, not even for
>> named rowtypes much less anonymous ones.

> By extension, would this also mean things like
> select row(1,2,3)::foo = foo from foo;
> Would work (presumably as row-wise comparison does)?

Well, it turned out to be easier than I thought to get the base case
working --- all that's necessary is to define an array type for RECORD
and add generic comparison functions, and the cases that are needed for
recursive cycle detection Just Work!  See attached WIP patch, and
particularly note the new test case in with.sql.

The part that actually seems to need some thought is the relationship
between this and operations on named rowtypes.  In the patch I tweaked
parse_coerce.c to treat coercion from a named rowtype's array type to
record[] as an allowed binary-compatible case, but I didn't do the
other direction yet (I'm not fully convinced that it's necessary).

What I'm really not clear about is the extent to which record[] ought
to be treated like a polymorphic type --- should we consider that
it acts like an "anyrecordarray" type, or is that a distinct notion?
Do we even want that?  record itself is not considered a polymorphic
type, though it has some similar qualities.

Another point worth making is that I made the comparisons work like
IS NOT DISTINCT, ie, they don't return NULL just because some field of
the row is NULL.  This is *not* like SQL-spec row comparison, but we
can't build a btree opclass from these functions if they insist on
returning null for null fields.  (Our array comparisons work like this,
too.)  I'm not sure how big a deal that is either way, but I am pretty
sure that IS NOT DISTINCT is the semantics you need to have if you want
cycle detection to work reliably.  (Hm, is that a bug in the spec?
They say to use = rather than DISTINCT in cycle checking ...)

Comments?  Do we want to go forward with this?

            regards, tom lane


Вложения

Re: WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows

От
"Merlin Moncure"
Дата:
On Wed, Oct 8, 2008 at 4:27 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Merlin Moncure" <mmoncure@gmail.com> writes:
>> On Tue, Oct 7, 2008 at 9:58 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> * Instead of the above, we could try to make
>>> ROW(some columns) = ANY (array variable)
>>> work.  This is shorter than the above syntax and would presumably have
>>> a lot less overhead too.  But it doesn't work right now, not even for
>>> named rowtypes much less anonymous ones.
>
>> By extension, would this also mean things like
>> select row(1,2,3)::foo = foo from foo;
>> Would work (presumably as row-wise comparison does)?
>
> Well, it turned out to be easier than I thought to get the base case
> working --- all that's necessary is to define an array type for RECORD
> and add generic comparison functions, and the cases that are needed for
> recursive cycle detection Just Work!  See attached WIP patch, and
> particularly note the new test case in with.sql.
>
> The part that actually seems to need some thought is the relationship
> between this and operations on named rowtypes.  In the patch I tweaked
> parse_coerce.c to treat coercion from a named rowtype's array type to
> record[] as an allowed binary-compatible case, but I didn't do the
> other direction yet (I'm not fully convinced that it's necessary).
>
> What I'm really not clear about is the extent to which record[] ought
> to be treated like a polymorphic type --- should we consider that
> it acts like an "anyrecordarray" type, or is that a distinct notion?
> Do we even want that?  record itself is not considered a polymorphic
> type, though it has some similar qualities.
>
> Another point worth making is that I made the comparisons work like
> IS NOT DISTINCT, ie, they don't return NULL just because some field of
> the row is NULL.  This is *not* like SQL-spec row comparison, but we
> can't build a btree opclass from these functions if they insist on
> returning null for null fields.  (Our array comparisons work like this,
> too.)  I'm not sure how big a deal that is either way, but I am pretty
> sure that IS NOT DISTINCT is the semantics you need to have if you want
> cycle detection to work reliably.  (Hm, is that a bug in the spec?
> They say to use = rather than DISTINCT in cycle checking ...)
>
> Comments?  Do we want to go forward with this?

The record ops work as promised.  IMO this patch stands in its own
merits with or without the CTE changes.  I played around with it and
noticed a couple of oddities:

select foo = foo from foo;  --works
but

select distinct foo from foo; --yields
ERROR:  could not identify an equality operator for type foo
also,

select foo from foo order by foo;
ERROR:  could not identify an ordering operator for type foo

postgres=# create index foo_idx on foo((foo));
ERROR:  cache lookup failed for type 0

The above are probably not required to fulfill the CTE
requirements...but would be nice.

merlin


Re: WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows

От
Tom Lane
Дата:
"Merlin Moncure" <mmoncure@gmail.com> writes:
> The record ops work as promised.  IMO this patch stands in its own
> merits with or without the CTE changes.  I played around with it and
> noticed a couple of oddities:

> select foo = foo from foo;  --works
> but

> select distinct foo from foo; --yields
> ERROR:  could not identify an equality operator for type foo
> also,

> select foo from foo order by foo;
> ERROR:  could not identify an ordering operator for type foo

Yeah, these are because of the incomplete handling of named record
types.  I'm not sure how far we want to go in that direction.

> postgres=# create index foo_idx on foo((foo));
> ERROR:  cache lookup failed for type 0

Hm, that's not good ...
        regards, tom lane


Re: WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows

От
Tom Lane
Дата:
I wrote:
> "Merlin Moncure" <mmoncure@gmail.com> writes:
>> select foo from foo order by foo;
>> ERROR:  could not identify an ordering operator for type foo

> Yeah, these are because of the incomplete handling of named record
> types.  I'm not sure how far we want to go in that direction.

On looking closer, all these cases fail because I forgot to teach
IsBinaryCoercible() that any composite type should be considered
binary-coercible to RECORD.  Which is clearly sensible.

I'm inclined to apply the patch with binary-coercibility adjustments
and not try to turn RECORD or RECORD[] into full-fledged polymorphic
types.  It's not immediately clear what the use of that would be
anyway.
        regards, tom lane


Re: WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows

От
"Merlin Moncure"
Дата:
On Mon, Oct 13, 2008 at 9:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> "Merlin Moncure" <mmoncure@gmail.com> writes:
>>> select foo from foo order by foo;
>>> ERROR:  could not identify an ordering operator for type foo
>
>> Yeah, these are because of the incomplete handling of named record
>> types.  I'm not sure how far we want to go in that direction.
>
> On looking closer, all these cases fail because I forgot to teach
> IsBinaryCoercible() that any composite type should be considered
> binary-coercible to RECORD.  Which is clearly sensible.
>
> I'm inclined to apply the patch with binary-coercibility adjustments
> and not try to turn RECORD or RECORD[] into full-fledged polymorphic
> types.  It's not immediately clear what the use of that would be
> anyway.


...meaning, that you would not be able to create a function taking
generic 'record' as a parameter?  In that case I agree...any chance of
getting an updated patch?

merlin


Re: WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows

От
Tom Lane
Дата:
"Merlin Moncure" <mmoncure@gmail.com> writes:
> On Mon, Oct 13, 2008 at 9:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I'm inclined to apply the patch with binary-coercibility adjustments
>> and not try to turn RECORD or RECORD[] into full-fledged polymorphic
>> types.  It's not immediately clear what the use of that would be
>> anyway.

> ...meaning, that you would not be able to create a function taking
> generic 'record' as a parameter?

Well, you've never been able to do that, although for many of the PLs
there doesn't seem to be any very fundamental reason why not.  But
I was actually wondering about something beyond that: should we have the
equivalent of the polymorphic-type behaviors for composites?  That would
mean rules along the line of "all records mentioned in the call and
result are the same composite type" and "record[] means the array type
corresponding to whichever type record is".

We don't seem to need these things in order to solve the recursion cycle
detection problem, so I'm not very excited about pursuing the line of
thought any further right now.

> In that case I agree...any chance of
> getting an updated patch?

See CVS HEAD ...
        regards, tom lane


Re: WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows

От
Alvaro Herrera
Дата:
Tom Lane escribió:
> "Merlin Moncure" <mmoncure@gmail.com> writes:
> > On Mon, Oct 13, 2008 at 9:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I'm inclined to apply the patch with binary-coercibility adjustments
> >> and not try to turn RECORD or RECORD[] into full-fledged polymorphic
> >> types.  It's not immediately clear what the use of that would be
> >> anyway.
> 
> > ...meaning, that you would not be able to create a function taking
> > generic 'record' as a parameter?
> 
> Well, you've never been able to do that, although for many of the PLs
> there doesn't seem to be any very fundamental reason why not.

Yeah, it seems an arbitrary restriction for no very good reason.  When I
was working on PL/php (years ago) I tried to make it work because I
found it useful for some use case I was trying, but couldn't.  I don't
remember the details (and PL/php has been pretty much abandoned since
then anyway.)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.


Re: WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows

От
David Fetter
Дата:
On Mon, Oct 13, 2008 at 07:01:29PM -0400, Tom Lane wrote:
> "Merlin Moncure" <mmoncure@gmail.com> writes:
> > On Mon, Oct 13, 2008 at 9:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> I'm inclined to apply the patch with binary-coercibility
> >> adjustments and not try to turn RECORD or RECORD[] into
> >> full-fledged polymorphic types.  It's not immediately clear what
> >> the use of that would be anyway.
> 
> > ...meaning, that you would not be able to create a function taking
> > generic 'record' as a parameter?
> 
> Well, you've never been able to do that, although for many of the
> PLs there doesn't seem to be any very fundamental reason why not.
> But I was actually wondering about something beyond that: should we
> have the equivalent of the polymorphic-type behaviors for
> composites?  That would mean rules along the line of "all records
> mentioned in the call and result are the same composite type" and
> "record[] means the array type corresponding to whichever type
> record is".

+1 :)

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate


Re: WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows

От
"Merlin Moncure"
Дата:
On Mon, Oct 13, 2008 at 7:01 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Merlin Moncure" <mmoncure@gmail.com> writes:
>> On Mon, Oct 13, 2008 at 9:56 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I'm inclined to apply the patch with binary-coercibility adjustments
>>> and not try to turn RECORD or RECORD[] into full-fledged polymorphic
>>> types.  It's not immediately clear what the use of that would be
>>> anyway.
>
>> ...meaning, that you would not be able to create a function taking
>> generic 'record' as a parameter?
>
> Well, you've never been able to do that, although for many of the PLs
> there doesn't seem to be any very fundamental reason why not.  But
> I was actually wondering about something beyond that: should we have the
> equivalent of the polymorphic-type behaviors for composites?  That would
> mean rules along the line of "all records mentioned in the call and
> result are the same composite type" and "record[] means the array type
> corresponding to whichever type record is".
>
> We don't seem to need these things in order to solve the recursion cycle
> detection problem, so I'm not very excited about pursuing the line of
> thought any further right now.

Here is another use-case solved by the patch.  Previously, there was
no easy way to index on a composite function result.  The following
works in HEAD:

create function func(f foo, a out int, b out int) returns record ...

create index foo_idx on foo(func(foo));

The point being, you can now directly index the return composite vs.
trying to deal with the headache of expanding the function result.
You might see this in userland if trying to index on a parsed
expression.  Really nice...composites are first class citizens now...

merlin


Re: WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows

От
Tom Lane
Дата:
"Merlin Moncure" <mmoncure@gmail.com> writes:
> Here is another use-case solved by the patch.  Previously, there was
> no easy way to index on a composite function result.  The following
> works in HEAD:

> create function func(f foo, a out int, b out int) returns record ...

> create index foo_idx on foo(func(foo));

Urk ... "works" for small values of "work", perhaps.  Did you try using
the index from a fresh session?

We could support this for named composite types but not for anonymous
record types.  I'm not quite sure how to enforce that distinction
considering that the opclass is defined to take "record".  Maybe we
should apply CheckAttributeType() to index column types?
        regards, tom lane


Re: WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows

От
"Merlin Moncure"
Дата:
On Tue, Oct 14, 2008 at 10:59 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Merlin Moncure" <mmoncure@gmail.com> writes:
>> Here is another use-case solved by the patch.  Previously, there was
>> no easy way to index on a composite function result.  The following
>> works in HEAD:
>
>> create function func(f foo, a out int, b out int) returns record ...
>
>> create index foo_idx on foo(func(foo));
>
> Urk ... "works" for small values of "work", perhaps.  Did you try using
> the index from a fresh session?
>
> We could support this for named composite types but not for anonymous
> record types.  I'm not quite sure how to enforce that distinction
> considering that the opclass is defined to take "record".  Maybe we
> should apply CheckAttributeType() to index column types?

yup.  being able to do this with anonymous types would be nice
though...put a thumbtack on that :-)

merlin


Re: WITH RECURSIVE ... CYCLE in vanilla SQL: issues with arrays of rows

От
Bruce Momjian
Дата:
Added to TODO:

Add support for WITH RECURSIVE ... CYCLE
   * http://archives.postgresql.org/pgsql-hackers/2008-10/msg00291.php 

---------------------------------------------------------------------------

Tom Lane wrote:
> I looked a bit at the SQL:2008 spec for a CYCLE clause for WITH
> RECURSIVE.  It is interesting to see that it is just syntactic sugar,
> because *they spell out how to expand it into regular SQL*.  More,
> they defined it in such a way that it's hard to optimize at all,
> because the "path" column is exposed to the user; you don't really
> have any choice about how to do it.  There are some ugly and unnecessary
> choices in there too, like insisting that the cycle mark column be
> char(1).
> 
> So I am not feeling very excited about implementing the syntax per se
> (and I note that DB2 doesn't seem to have done so either).  Instead
> we should document some examples of how to do cycle detection at the
> SQL level.  However, it would be nice if the spec's approach to cycle
> detection actually worked well in Postgres.  There are a couple of
> things we seem to be missing, according to some experiments I just
> did with trying to translate the spec's code into Postgres:
> 
> * The spec assumes that ARRAY[ROW(some columns)] works, ie, that you can
> have an array of an anonymous record type.  We don't allow that right
> now, but it seems like a useful thing to have --- at least as a
> transient value within a query.  I'm not sure there's a case for
> allowing such things to go to disk.
> 
> * The spec writes this to detect whether a row of an anonymous record
> type is present in an array of that same anonymous record type:
>     ROW(some columns) IN (SELECT P.* FROM TABLE(array variable) P) 
> We haven't got the TABLE() syntax; you can sort of emulate it with a SRF
> but only for arrays of named rowtypes.  For an anonymous rowtype,
> it's very unclear to me how the rowtype would be communicated at
> parse time so that the P.* notation could be expanded properly.
> 
> * Instead of the above, we could try to make
>     ROW(some columns) = ANY (array variable)
> work.  This is shorter than the above syntax and would presumably have
> a lot less overhead too.  But it doesn't work right now, not even for
> named rowtypes much less anonymous ones.
> 
> I'm thinking that addressing these pieces would be a generally good
> thing to do, above and beyond potential uses in recursive queries.
> 
>             regards, tom lane
> 
> -- 
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--  Bruce Momjian  <bruce@momjian.us>        http://momjian.us EnterpriseDB
http://enterprisedb.com
 + If your life is a hard drive, Christ can be your backup. +