Обсуждение: Boolean operators without commutators vs. ALL/ANY

Поиск

Список

Период

Сортировка

Boolean operators without commutators vs. ALL/ANY

От

Florian Pflug

Дата:

12 июня 2011 г., 08:46:18

Hi

I've recently wanted to define a check constraint on an array
column that verifies that all array entries match some regular
expression. Unfortunately, t

The most natural way of expressing such a check would be CHECK ('<regexp>' ~ ANY(field)),
but that doesn't work, because "~" expects the *value*
to be the left argument and the *pattern* to be the right.

The next try was CHECK (ANY(field) ~ '<regexp>'),
but that doesn't even parse.

Ok, so then use UNNEST() and BOOL_AND() I figured, and wrote CHECK ((SELECT BOOL_AND(v ~ '<regexp>') FROM UNNEST(field)
v)).
But that of course lead to nothing but ERROR: cannot use subquery in check constraint

So I the end, I had to wrap the sub-query in a SQL-language
function and use that in the check constraint. While this
solved my immediate problem, the necessity of doing that
highlights a few problems

(A) "~" is an extremely bad name for the regexp-matching
operators, since it's visual form is symmetric but it's
behaviour isn't. This doesn't only make its usage very
error-prone, it also makes it very hard to come up with
sensible name for an commutator of "~". I suggest that we
add "=~" as an alias for "~", "~=" as an commutator
for "=~", and deprecate "~". The same holds for "~~".
We might want to do this starting with 9.1. 

(B) There should be a way to use ANY()/ALL() with the
array elements becoming the left arguments of the operator.
Ideally, we'd support "ANY(<array>) <operator> <value>",
but if that's not possible grammar-wise, I suggest we extend
the OPERATOR() syntax to allow <value> OPERATOR(COMMUTATOR <operator>) ANY(<array>).
OPERATOR(COMMUTATOR <operator>) would use the COMMUTATOR
of the specified operator if one exists, and otherwise
use the original operator with the arguments swapped.

(C) Why do we forbid sub-queries in CHECK constraints?
I do realize that any non-IMMUTABLE CHECK constraint is
a foot-gun, but since we already allow STABLE and even
VOLATILE functions to be used inside CHECK constraint,
forbidding sub-queries seems a bit pointless...

best regards,
Florian Pflug

Re: Boolean operators without commutators vs. ALL/ANY

От

Robert Haas

Дата:

13 июня 2011 г., 00:13:07

On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote:
> So I the end, I had to wrap the sub-query in a SQL-language
> function and use that in the check constraint. While this
> solved my immediate problem, the necessity of doing that
> highlights a few problems
>
> (A) "~" is an extremely bad name for the regexp-matching
> operators, since it's visual form is symmetric but it's
> behaviour isn't. This doesn't only make its usage very
> error-prone, it also makes it very hard to come up with
> sensible name for an commutator of "~". I suggest that we
> add "=~" as an alias for "~", "~=" as an commutator
> for "=~", and deprecate "~". The same holds for "~~".

Does any other database or programming language implement it this way?

> (B) There should be a way to use ANY()/ALL() with the
> array elements becoming the left arguments of the operator.
> Ideally, we'd support "ANY(<array>) <operator> <value>",
> but if that's not possible grammar-wise, I suggest we extend
> the OPERATOR() syntax to allow
>  <value> OPERATOR(COMMUTATOR <operator>) ANY(<array>).
> OPERATOR(COMMUTATOR <operator>) would use the COMMUTATOR
> of the specified operator if one exists, and otherwise
> use the original operator with the arguments swapped.

It seems to me that if we provided some way of handling this, your
first proposal would be moot; and I have to say I like the idea of
allowing this a lot more than tinkering with the operator names.  I'm
not crazy about the proposed syntax, though; it seems cumbersome, and
it's really only needed for SOME/ALL/ANY, not in general operator
expressions.  Since ANY is a reserved keyword, I believe we could
allow something like "expr op ANY BACKWARD ( ... )" -- or some other
keyword in lieu of BACKWARD if you prefer.

Hath the spec anything to say about this?

> (C) Why do we forbid sub-queries in CHECK constraints?
> I do realize that any non-IMMUTABLE CHECK constraint is
> a foot-gun, but since we already allow STABLE and even
> VOLATILE functions to be used inside CHECK constraint,
> forbidding sub-queries seems a bit pointless...

Dunno.  Maybe it's just an implementation restriction?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Boolean operators without commutators vs. ALL/ANY

От

Tom Lane

Дата:

13 июня 2011 г., 00:45:05

Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote:
>> (B) There should be a way to use ANY()/ALL() with the
>> array elements becoming the left arguments of the operator.

> It seems to me that if we provided some way of handling this, your
> first proposal would be moot; and I have to say I like the idea of
> allowing this a lot more than tinkering with the operator names.

There are syntactic reasons not to do that.  It'd be a lot easier just
to provide a commutator operator for ~.

>> (C) Why do we forbid sub-queries in CHECK constraints?

> Dunno.  Maybe it's just an implementation restriction?

(1) We don't want to invoke the planner in the places where we'd
have to do so to make that work.

(2) It's just about inevitable that a sub-query would have results
dependent on other rows beside the one being checked.  As such, it
would be trying to enforce semantics that you simply can't enforce
via CHECK.  (And yes, you can bypass that with a function, but guess
what: it still won't actually work.)
        regards, tom lane

Re: Boolean operators without commutators vs. ALL/ANY

От

Robert Haas

Дата:

13 июня 2011 г., 01:01:00

On Sun, Jun 12, 2011 at 11:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote:
>>> (B) There should be a way to use ANY()/ALL() with the
>>> array elements becoming the left arguments of the operator.
>
>> It seems to me that if we provided some way of handling this, your
>> first proposal would be moot; and I have to say I like the idea of
>> allowing this a lot more than tinkering with the operator names.
>
> There are syntactic reasons not to do that.  It'd be a lot easier just
> to provide a commutator operator for ~.

Details?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Boolean operators without commutators vs. ALL/ANY

От

Florian Pflug

Дата:

13 июня 2011 г., 04:01:56

On Jun13, 2011, at 05:12 , Robert Haas wrote:
> On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote:
>> So I the end, I had to wrap the sub-query in a SQL-language
>> function and use that in the check constraint. While this
>> solved my immediate problem, the necessity of doing that
>> highlights a few problems
>>
>> (A) "~" is an extremely bad name for the regexp-matching
>> operators, since it's visual form is symmetric but it's
>> behaviour isn't. This doesn't only make its usage very
>> error-prone, it also makes it very hard to come up with
>> sensible name for an commutator of "~". I suggest that we
>> add "=~" as an alias for "~", "~=" as an commutator
>> for "=~", and deprecate "~". The same holds for "~~".
>
> Does any other database or programming language implement it this way?

Ruby has "=~", which returns the position of the regexp's first
match, or nil if there is none.

$ ruby -e "puts 'hello' =~ /l+/"
2
$ ruby -e "puts 'hello' =~ /x+/"
nil

>> (B) There should be a way to use ANY()/ALL() with the
>> array elements becoming the left arguments of the operator.
>> Ideally, we'd support "ANY(<array>) <operator> <value>",
>> but if that's not possible grammar-wise, I suggest we extend
>> the OPERATOR() syntax to allow
>>  <value> OPERATOR(COMMUTATOR <operator>) ANY(<array>).
>> OPERATOR(COMMUTATOR <operator>) would use the COMMUTATOR
>> of the specified operator if one exists, and otherwise
>> use the original operator with the arguments swapped.
>
> It seems to me that if we provided some way of handling this, your
> first proposal would be moot; and I have to say I like the idea of
> allowing this a lot more than tinkering with the operator names.

Well, the issue of "~" being anti-self-explanatory remains
independent from whether we do (B) or not.

> I'm
> not crazy about the proposed syntax, though; it seems cumbersome, and
> it's really only needed for SOME/ALL/ANY, not in general operator
> expressions.  Since ANY is a reserved keyword, I believe we could
> allow something like "expr op ANY BACKWARD ( ... )" -- or some other
> keyword in lieu of BACKWARD if you prefer.

Hm, that's less bulky but more kludgy, I'd say. But wait a minute...

If ANY and ALL are reserved anyway, should it be possible to
make "(ANY(..) <op> <expr>)" and "(ALL(...) <op> <expr>)"
work grammar-wise? (Note the enclosing parens)

I just tried that, and it seems to work. bison doesn't report
and conflicts, the regression tests still succeed, and
I get the following

postgres=# select (all(array[1,2]) = 1);
ERROR:  ANY()/ALL() <op> <expr> is not yet implemented at character 9
STATEMENT:  select (all(array[1,2]) = 1);
ERROR:  ANY()/ALL() <op> <expr> is not yet implemented
LINE 1: select (all(array[1,2]) = 1);
                ^
I've attached a patch with the changes to gram.y.

best regards,
Florian Pflug

Вложения

pg_anyall_reversed.v0.patch

Re: Boolean operators without commutators vs. ALL/ANY

От

"Stephen J. Butler"

Дата:

13 июня 2011 г., 04:19:29

On Sun, Jun 12, 2011 at 6:46 AM, Florian Pflug <fgp@phlo.org> wrote:
> (B) There should be a way to use ANY()/ALL() with the
> array elements becoming the left arguments of the operator.

FWIW, in case people were unaware, this is getting close to Perl 6
junctions/superpositions. See:
<http://dev.perl.org/perl6/doc/design/exe/E06.html> "The Wonderful
World of Junctions". Beyond "any" and "all" they propose "one" and
"none".

Re: Boolean operators without commutators vs. ALL/ANY

От

Florian Pflug

Дата:

13 июня 2011 г., 05:32:08

On Jun13, 2011, at 05:44 , Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote:
>>> (C) Why do we forbid sub-queries in CHECK constraints?
> 
>> Dunno.  Maybe it's just an implementation restriction?
> 
> (1) We don't want to invoke the planner in the places where we'd
> have to do so to make that work.

Hm, OK, I get that.

> (2) It's just about inevitable that a sub-query would have results
> dependent on other rows beside the one being checked.  As such, it
> would be trying to enforce semantics that you simply can't enforce
> via CHECK.  (And yes, you can bypass that with a function, but guess
> what: it still won't actually work.)

Yeah, I never expected non-immutable CHECK constraints to work. I was
simply pointing out that UNNEST() allows one to write all kinds of
interesting CHECK constraints, all of which are immutable.

Now, if supporting these would increase code complexity, or cause
a performance drop for non-sub-query CHECK constraints, I'm perfectly
fine with leaving them unsupported. I just wanted to make sure we
aren't simply nannying the user - especially since it's a nanny who's
out-smarted by function calls.

best regards,
Florian Pflug

Re: Boolean operators without commutators vs. ALL/ANY

От

Tom Lane

Дата:

13 июня 2011 г., 10:50:04

Robert Haas <robertmhaas@gmail.com> writes:
> On Sun, Jun 12, 2011 at 11:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> There are syntactic reasons not to do that.  It'd be a lot easier just
>> to provide a commutator operator for ~.

> Details?

Well, for one, it becomes unobvious what
A op ANY (B) op C

means.  This has come up before, and I seem to recall that we identified
some more-compelling problems, but that's the best I can do before
consuming any caffeine.

In any case, if you dig around enough to notice all the infrastructure
that's involved with this, you'll definitely come to the conclusion that
it'd be a lot less work to just add the missing commutator operators.
        regards, tom lane

Re: Boolean operators without commutators vs. ALL/ANY

От

Robert Haas

Дата:

13 июня 2011 г., 11:07:25

On Mon, Jun 13, 2011 at 3:01 AM, Florian Pflug <fgp@phlo.org> wrote:
> On Jun13, 2011, at 05:12 , Robert Haas wrote:
>> On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote:
>>> So I the end, I had to wrap the sub-query in a SQL-language
>>> function and use that in the check constraint. While this
>>> solved my immediate problem, the necessity of doing that
>>> highlights a few problems
>>>
>>> (A) "~" is an extremely bad name for the regexp-matching
>>> operators, since it's visual form is symmetric but it's
>>> behaviour isn't. This doesn't only make its usage very
>>> error-prone, it also makes it very hard to come up with
>>> sensible name for an commutator of "~". I suggest that we
>>> add "=~" as an alias for "~", "~=" as an commutator
>>> for "=~", and deprecate "~". The same holds for "~~".
>>
>> Does any other database or programming language implement it this way?
>
> Ruby has "=~", which returns the position of the regexp's first
> match, or nil if there is none.
>
> $ ruby -e "puts 'hello' =~ /l+/"
> 2
> $ ruby -e "puts 'hello' =~ /x+/"
> nil

Sure.  Some languages use =~ and some use just ~... I was just
wondering if anyone thought the commutator of =~ was ~=...

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Boolean operators without commutators vs. ALL/ANY

От

Andrew Dunstan

Дата:

13 июня 2011 г., 11:19:40

On 06/13/2011 10:07 AM, Robert Haas wrote:
> Some languages use =~ and some use just ~... I was just
> wondering if anyone thought the commutator of =~ was ~=...

My feeling is it's a bit dangerous. It's too easy to fat-finger the 
reverse op, and get something quite unintended.

cheers

andrew (whose lack of coordination sadly increases with age)

Re: Boolean operators without commutators vs. ALL/ANY

От

David Fetter

Дата:

13 июня 2011 г., 12:54:45

On Mon, Jun 13, 2011 at 09:01:45AM +0200, Florian Pflug wrote:
> Hm, that's less bulky but more kludgy, I'd say. But wait a minute...
> 
> If ANY and ALL are reserved anyway, should it be possible to
> make "(ANY(..) <op> <expr>)" and "(ALL(...) <op> <expr>)"
> work grammar-wise? (Note the enclosing parens)

This would be a very, very useful feature. :)

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Boolean operators without commutators vs. ALL/ANY

От

Florian Pflug

Дата:

14 июня 2011 г., 07:06:10

On Jun13, 2011, at 16:19 , Andrew Dunstan wrote:
> On 06/13/2011 10:07 AM, Robert Haas wrote:
>> Some languages use =~ and some use just ~... I was just
>> wondering if anyone thought the commutator of =~ was ~=...
>
> My feeling is it's a bit dangerous. It's too easy to fat-finger the reverse op, and get something quite unintended.

Well, but with "~" you need to *remember* that the regexp
goes on the right side and the text on the left. That seems
worse than the risk of fat-fingering "=~" and getting "~=".

At, at least, have looked up the argument order of "~"
countless in the past...

best regards,
Florian Pflug

Re: Boolean operators without commutators vs. ALL/ANY

От

Florian Pflug

Дата:

14 июня 2011 г., 07:10:26

On Jun13, 2011, at 05:44 , Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote:
>>> (B) There should be a way to use ANY()/ALL() with the
>>> array elements becoming the left arguments of the operator.
> 
>> It seems to me that if we provided some way of handling this, your
>> first proposal would be moot; and I have to say I like the idea of
>> allowing this a lot more than tinkering with the operator names.
> 
> There are syntactic reasons not to do that.  It'd be a lot easier just
> to provide a commutator operator for ~.

My suggestion would be the add a commutator for "~" as a short-term
solution (preferably in 9.1).

Since "~" doesn't inspire any obvious names for a possible commutator,
I suggest adding "=~" and "~=".

Is there any support for that proposal?

In the long term, I'd like to add support for "(ANY() <op> <expr>)"
(Note the enclosing parens). I've checked that this works grammar-wise,
but haven't no idea how much tweaking the executor needs to support
that...

best regards,
Florian Pflug

Re: Boolean operators without commutators vs. ALL/ANY

От

Robert Haas

Дата:

14 июня 2011 г., 09:29:15

On Tue, Jun 14, 2011 at 6:10 AM, Florian Pflug <fgp@phlo.org> wrote:
> On Jun13, 2011, at 05:44 , Tom Lane wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote:
>>>> (B) There should be a way to use ANY()/ALL() with the
>>>> array elements becoming the left arguments of the operator.
>>
>>> It seems to me that if we provided some way of handling this, your
>>> first proposal would be moot; and I have to say I like the idea of
>>> allowing this a lot more than tinkering with the operator names.
>>
>> There are syntactic reasons not to do that.  It'd be a lot easier just
>> to provide a commutator operator for ~.
>
> My suggestion would be the add a commutator for "~" as a short-term
> solution (preferably in 9.1).

I don't think we want to bump catversion again before release if we
can avoid it.  And I don't see this as being a terribly urgent problem
- it's not like this is a new regression, and I can't remember hearing
any complaints about it prior to two days ago.

> Since "~" doesn't inspire any obvious names for a possible commutator,
> I suggest adding "=~" and "~=".
>
> Is there any support for that proposal?

I'm OK with adding a commutator but I guess I don't see the point of
adding a synonym for ~ along the way.  The existing use of ~ is
consistent with, for example, awk, so it's not like we've dreamed up
something utterly crazy that we now need to fix.  I'd suggest we just
come up with some arbitrary variant, like ~~ or <~ or #~ or
!#!%@~bikeshed++!.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Boolean operators without commutators vs. ALL/ANY

От

Florian Pflug

Дата:

14 июня 2011 г., 10:38:12

On Jun14, 2011, at 14:29 , Robert Haas wrote:
> On Tue, Jun 14, 2011 at 6:10 AM, Florian Pflug <fgp@phlo.org> wrote:
>> On Jun13, 2011, at 05:44 , Tom Lane wrote:
>>> Robert Haas <robertmhaas@gmail.com> writes:
>>>> On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote:
>>>>> (B) There should be a way to use ANY()/ALL() with the
>>>>> array elements becoming the left arguments of the operator.
>>>
>>>> It seems to me that if we provided some way of handling this, your
>>>> first proposal would be moot; and I have to say I like the idea of
>>>> allowing this a lot more than tinkering with the operator names.
>>>
>>> There are syntactic reasons not to do that.  It'd be a lot easier just
>>> to provide a commutator operator for ~.
>>
>> My suggestion would be the add a commutator for "~" as a short-term
>> solution (preferably in 9.1).
>
> I don't think we want to bump catversion again before release if we
> can avoid it.  And I don't see this as being a terribly urgent problem
> - it's not like this is a new regression, and I can't remember hearing
> any complaints about it prior to two days ago.

Hm, OK, that makes sense...

>> Since "~" doesn't inspire any obvious names for a possible commutator,
>> I suggest adding "=~" and "~=".
>>
>> Is there any support for that proposal?
>
> I'm OK with adding a commutator but I guess I don't see the point of
> adding a synonym for ~ along the way.  The existing use of ~ is
> consistent with, for example, awk, so it's not like we've dreamed up
> something utterly crazy that we now need to fix.  I'd suggest we just
> come up with some arbitrary variant, like ~~ or <~ or #~ or
> !#!%@~bikeshed++!.


That, however, I'm not at all happy with. Quite frankly, operator
naming is already a bit of a mess, and readability of queries
suffers as a result. The geometric types are especially vile
offenders in this regard, but the various array-related operators
aren't poster children either.

I think we should try to work towards more mnemonic operator
naming, not add to the mess by defining commutator pairs whose
names bear no visual resemblance whatsoever to one each other.

I'm not wedded to "=~", it's just the only name I could come
up which (a) has a natural commutator (b) gives visual indication of which argument constitutes the     text and which
thepattern (c) there is precedent for. 

BTW, there's actually precedent for a commutator of "~", namely
"@". Some of the geometric types (polygon, box, circle, point,
path) use "~" as a commutator for "@" (which stands for "contains").
But IMHO that mainly proves that the geometric types are vile
offenders when it comes to readability...

The pair ("@", "~" ) is also the only pair of commutators whose
names are totally unrelated to each other. Given a suitable
definition of a reverse() function for text [1], the following query

select       o1.oprleft::regtype || ' ' || o1.oprname || ' ' || o1.oprright::regtype as opr,       o2.oprleft::regtype
||' ' || o2.oprname || ' ' || o2.oprright::regtype as com,       o1.oprcode as opr_code,       o2.oprcode as com_code 
from pg_operator o1
join pg_operator o2 on o1.oprcom = o2.oid or o2.oprcom = o1.oid
where       o1.oid < o2.oid and       o1.oprname <> reverse(translate(o2.oprname, '<>', '><')) and       o1.oprname <>
translate(o2.oprname,'<>', '><'); 

produces
       opr        |        com        |      opr_code       |     com_code
-------------------+-------------------+---------------------+-------------------polygon @ polygon | polygon ~ polygon
|poly_contained      | poly_containbox @ box         | box ~ box         | box_contained       | box_containcircle @
circle  | circle ~ circle   | circle_contained    | circle_containpoint @ path      | path ~ point      | on_ppath
     | path_contain_ptpoint @ polygon   | polygon ~ point   | pt_contained_poly   | poly_contain_ptpoint @ circle    |
circle~ point    | pt_contained_circle | circle_contain_pt 
(6 rows)

best regards,
Florian Pflug

[1] I used
create or replace function reverse(text) returns text as $$
select string_agg(substring($1, i, 1), '') from generate_series(length($1), 1, -1) i
$$ language sql immutable;

[WIP] Support for "ANY/ALL(array) op scalar" (Was: Re: Boolean operators without commutators vs. ALL/ANY)

От

Florian Pflug

Дата:

15 июня 2011 г., 22:56:00

On Jun13, 2011, at 17:41 , David Fetter wrote:
> On Mon, Jun 13, 2011 at 09:01:45AM +0200, Florian Pflug wrote:
>> Hm, that's less bulky but more kludgy, I'd say. But wait a minute...
>>
>> If ANY and ALL are reserved anyway, should it be possible to
>> make "(ANY(..) <op> <expr>)" and "(ALL(...) <op> <expr>)"
>> work grammar-wise? (Note the enclosing parens)
>
> This would be a very, very useful feature. :)

There, you dared me. So here's a patch :-P

So far it supports (ANY(array) op scalar) and (ALL(array) op scalar),
but not (ANY/ALL(subselect) op scalar). "ANY/ALL op scalar" is
supported independent from whether <op> has a commutator or not.
In the latter case, the clause isn't indexable, but thats no
different from the case "const op field". Both ANY(...) = ctid
and ctid = ANY(...) are correctly executed as TID scans (if the
array expression is a pseudo-constant, that is).

The patch adds a fields "aryArgIdx" (0 or 1) to ScalarArrayOpExpr
which stores the index of the array-valued argument. Thus, for
the traditional "scalar op ANY/ALL(array)" variant, aryArgIdx == 1,
for "ANY/ALL(array) op scalar" it's zero.

I've updates all places that I could find which inspect
ScalarArrayOpExprs to take the aryArgIdx into account. And boy,
those were quite a few, which explains the size of the patch.
Most of the changes are pretty trivial, though.

The indexing support works similar to the case "scalar op scalar",
i.e. match_clause_to_indexcol() consideres the clause to be index-able
if the scalar argument matches an index definition, and relies on
fix_indexqual_references() to commute the ScalarArrayOpExprs if
the scalar argument is on the right side instead of on the left.

I noticed that ANY/ALL is hardly exercised by the regression tests
at all, so I added a (pretty exhaustive, I think) test any_all.
The test exercises the traditional and the reversed form of
ANY/ALL and verify that an index is used if possible.

Comments are extremely welcome, especially ones regarding
the overall approach taken in this patch. If people consider
that to be acceptable, I'd try to add the missing features
and add documentation.

best regards,
Florian Pflug

Вложения

pg_anyall_reversed.v0.patch

Re: [WIP] Support for "ANY/ALL(array) op scalar" (Was: Re: Boolean operators without commutators vs. ALL/ANY)

От

Tom Lane

Дата:

15 июня 2011 г., 23:20:05

Florian Pflug <fgp@phlo.org> writes:
> Comments are extremely welcome, especially ones regarding
> the overall approach taken in this patch. If people consider
> that to be acceptable, I'd try to add the missing features
> and add documentation.

Quite honestly, I don't like this one bit and would rather you not
pursue the idea.  There is no such syntax in the standard, and
presumably that's not because the SQL committee never thought of it.
They may have some incompatible idea in mind for the future, who knows?
But in any case, this won't provide any functionality whatever that we
couldn't provide at much less effort and risk, just by providing
commutator operators for the few missing cases.

(FWIW, I've come around to liking the idea of using =~ and the obvious
variants of that for regex operators, mainly because of the Perl
precedent.)
        regards, tom lane

Re: Boolean operators without commutators vs. ALL/ANY

От

Peter Eisentraut

Дата:

16 июня 2011 г., 01:38:26

On mån, 2011-06-13 at 10:19 -0400, Andrew Dunstan wrote:
> On 06/13/2011 10:07 AM, Robert Haas wrote:
> > Some languages use =~ and some use just ~... I was just
> > wondering if anyone thought the commutator of =~ was ~=...
> 
> My feeling is it's a bit dangerous. It's too easy to fat-finger the 
> reverse op, and get something quite unintended.

Yes, it looked highly dangerous to me as well.

Re: Boolean operators without commutators vs. ALL/ANY

От

Peter Eisentraut

Дата:

16 июня 2011 г., 01:39:27

On tis, 2011-06-14 at 15:38 +0200, Florian Pflug wrote:
> BTW, there's actually precedent for a commutator of "~", namely
> "@". Some of the geometric types (polygon, box, circle, point,
> path) use "~" as a commutator for "@" (which stands for "contains"). 

I wouldn't have a problem with naming the reverse operator "@".

Re: [WIP] Support for "ANY/ALL(array) op scalar" (Was: Re: Boolean operators without commutators vs. ALL/ANY)

От

Peter Eisentraut

Дата:

16 июня 2011 г., 01:40:45

On ons, 2011-06-15 at 22:19 -0400, Tom Lane wrote:
> (FWIW, I've come around to liking the idea of using =~ and the obvious
> variants of that for regex operators, mainly because of the Perl
> precedent.)

Maybe I'm not completely up to date on this, but I observe that Perl
itself doesn't appear to have a commutator for =~ .

Re: Boolean operators without commutators vs. ALL/ANY

От

Tom Lane

Дата:

16 июня 2011 г., 01:50:29

Peter Eisentraut <peter_e@gmx.net> writes:
> On tis, 2011-06-14 at 15:38 +0200, Florian Pflug wrote:
>> BTW, there's actually precedent for a commutator of "~", namely
>> "@". Some of the geometric types (polygon, box, circle, point,
>> path) use "~" as a commutator for "@" (which stands for "contains"). 

> I wouldn't have a problem with naming the reverse operator "@".

We deprecated those names for the geometric operators largely because
there wasn't any visual correlation between the commutator pairs.
I can't see introducing the same pairing for regex operators if we
already decided the geometric case was a bad idea.
        regards, tom lane

Re: [WIP] Support for "ANY/ALL(array) op scalar" (Was: Re: Boolean operators without commutators vs. ALL/ANY)

От

Tom Lane

Дата:

16 июня 2011 г., 02:12:01

Peter Eisentraut <peter_e@gmx.net> writes:
> On ons, 2011-06-15 at 22:19 -0400, Tom Lane wrote:
>> (FWIW, I've come around to liking the idea of using =~ and the obvious
>> variants of that for regex operators, mainly because of the Perl
>> precedent.)

> Maybe I'm not completely up to date on this, but I observe that Perl
> itself doesn't appear to have a commutator for =~ .

Nope, it doesn't.  But that doesn't mean we don't need one.  Aside from
the ANY/ALL business, the index infrastructure is asymmetrical: it will
only deal with indexable WHERE clauses that have the index column on the
left.  So those are two very good reasons to make sure that
operators returning boolean all have commutators.  (I will refrain for
the moment from speculating whether we'll ever have an index type that
supports regexp match directly as an indexable operator...)

At the moment, this query:

select oid::regoperator, oprcode from pg_operator
where oprkind = 'b' and oprresult = 'bool'::regtype and oprcom = 0;

says we have 83 such operators without commutators.  Of these, if I'm
counting correctly, 26 are the LIKE and regex operators in question.
It looks like at least another twenty do in fact constitute commutator
pairs, they're just not documented as such via oprcom.  The remaining
forty or less are a morass of functions for obsolete types, duplicate
names for the same function, etc.  I don't think it's unreasonable at
all to establish an expectation that all non-legacy binary boolean
operators should come in commutator pairs.
        regards, tom lane

Re: [WIP] Support for "ANY/ALL(array) op scalar" (Was: Re: Boolean operators without commutators vs. ALL/ANY)

От

Florian Pflug

Дата:

16 июня 2011 г., 06:16:34

On Jun16, 2011, at 04:19 , Tom Lane wrote:
> Florian Pflug <fgp@phlo.org> writes:
>> Comments are extremely welcome, especially ones regarding
>> the overall approach taken in this patch. If people consider
>> that to be acceptable, I'd try to add the missing features
>> and add documentation.
> 
> Quite honestly, I don't like this one bit and would rather you not
> pursue the idea.  There is no such syntax in the standard, and
> presumably that's not because the SQL committee never thought of it.
> They may have some incompatible idea in mind for the future, who knows?

The SQL standard doesn't have CREATE OPERATOR though, so for them
the asymmetry of the ANY/ALL constructs don't translate to a missing
feature, though. For us, however it does. We might try to doge that
by decreeing that boolean operators better have commutators, but that
doesn't help for non-core-defined operators.

So I'd very much like us to provide some way to get the effect
of "ANY/ALL op scalar" without having to resort to UNNEST and BOOL_AND/OR.
But I'm absolutely not wedded to the syntax "ANY/ALL op scalar".

One other idea I've been kicking around is to generate commutators
automatically for all binary operators that return boolean. If no name
for the commutator is specified (i.e. if CREATE OPERATOR is called without
a value for COMMUTATOR), it's be named "COMMUTATOR <op>", and would thus
be useably only via OPERATOR(...).

Implementation-wise we'd need to add a flag to FmgrInfo which tells
the fmgr to swap the function's arguments, and would need to centralize
the translation of operator OIDs to FmgrInfos.

best regards,
Florian Pflug

Re: Boolean operators without commutators vs. ALL/ANY

От

Robert Haas

Дата:

16 июня 2011 г., 14:55:03

On Thu, Jun 16, 2011 at 12:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> We deprecated those names for the geometric operators largely because
> there wasn't any visual correlation between the commutator pairs.
> I can't see introducing the same pairing for regex operators if we
> already decided the geometric case was a bad idea.

I'm having trouble avoiding the conclusion that we're trying to shove
a round peg into a square hole.  The idea that we have to have a
commutator for every operator just because we don't handle left and
right symmetrically sits poorly with me.  I can't really argue with
your statement that it's the easiest way to address Florian's gripe,
but because it almost surely is.  But it still feels like a kludge.
The syntax foo = ANY(bar) is really quite a poorly-designed syntax,
because the top-level operation is really "ANY", and it has three
arguments: foo, =, bar.  If the SQL committee had standardized on
ANY(foo = $0, bar) or some such thing we wouldn't be having this
conversation.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Boolean operators without commutators vs. ALL/ANY

От

Tom Lane

Дата:

16 июня 2011 г., 15:14:31

Robert Haas <robertmhaas@gmail.com> writes:
> I'm having trouble avoiding the conclusion that we're trying to shove
> a round peg into a square hole.  The idea that we have to have a
> commutator for every operator just because we don't handle left and
> right symmetrically sits poorly with me.  I can't really argue with
> your statement that it's the easiest way to address Florian's gripe,
> but because it almost surely is.  But it still feels like a kludge.
> The syntax foo = ANY(bar) is really quite a poorly-designed syntax,
> because the top-level operation is really "ANY", and it has three
> arguments: foo, =, bar.  If the SQL committee had standardized on
> ANY(foo = $0, bar) or some such thing we wouldn't be having this
> conversation.

[ shrug... ]  Take it up with the committee.  The syntax is what it is,
and we should select our operators to fit it, not vice versa.
        regards, tom lane

Re: Boolean operators without commutators vs. ALL/ANY

От

Florian Pflug

Дата:

16 июня 2011 г., 15:22:55

On Jun16, 2011, at 19:54 , Robert Haas wrote:
> On Thu, Jun 16, 2011 at 12:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We deprecated those names for the geometric operators largely because
>> there wasn't any visual correlation between the commutator pairs.
>> I can't see introducing the same pairing for regex operators if we
>> already decided the geometric case was a bad idea.
> 
> I'm having trouble avoiding the conclusion that we're trying to shove
> a round peg into a square hole.  The idea that we have to have a
> commutator for every operator just because we don't handle left and
> right symmetrically sits poorly with me.  I can't really argue with
> your statement that it's the easiest way to address Florian's gripe,
> but because it almost surely is.  But it still feels like a kludge.

Well, I think there are basically three choices here, kludge or no
kludge.

(1) We either decree once and for all that binary operations ought to
have commutators, modify CREATE TYPE to issue a warning if you
create one without, add the missing ones, and add a check for
that to opr_sanity (possibly excluding some deprecated operators).

or

(2) We arrange for commutators of binary operators to be created
automatically. 

or

(3) Or we bit the bullet and provide something similar to
"ANY/ALL op scalar". We do have the liberty to pick whatever syntax we
feel comfortable with, though, since we're out of SQL standard territory
anyway.

What I *wouldn't* like us to is just a few missing commutators and be
done with it. That pretty much guarantees that this issue will pop up
again some time in the future.

I personally prefer (3), but would also be content with (1), and be
ready to provide a patch for that. To be fair, (1) really doesn't seem
that kludgy if one takes into account that all indexable operators must
have commutators anyway.

I haven't checked how viable (2) actually is, but I dare say that it's
probably quite a bit of work. Essentially, we'd need a way to automatically
swap a function's argument before invoking the function, which I'm not
sure that fmgr can cleanly be persuaded to do.

Now all that's required is to agree on a way forward ;-)

best regards,
Florian Pflug

Re: Boolean operators without commutators vs. ALL/ANY

От

Robert Haas

Дата:

16 июня 2011 г., 16:02:36

On Thu, Jun 16, 2011 at 2:22 PM, Florian Pflug <fgp@phlo.org> wrote:
> On Jun16, 2011, at 19:54 , Robert Haas wrote:
>> On Thu, Jun 16, 2011 at 12:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> We deprecated those names for the geometric operators largely because
>>> there wasn't any visual correlation between the commutator pairs.
>>> I can't see introducing the same pairing for regex operators if we
>>> already decided the geometric case was a bad idea.
>>
>> I'm having trouble avoiding the conclusion that we're trying to shove
>> a round peg into a square hole.  The idea that we have to have a
>> commutator for every operator just because we don't handle left and
>> right symmetrically sits poorly with me.  I can't really argue with
>> your statement that it's the easiest way to address Florian's gripe,
>> but because it almost surely is.  But it still feels like a kludge.
>
> Well, I think there are basically three choices here, kludge or no
> kludge.
>
> (1) We either decree once and for all that binary operations ought to
> have commutators, modify CREATE TYPE to issue a warning if you
> create one without, add the missing ones, and add a check for
> that to opr_sanity (possibly excluding some deprecated operators).
>
> or
>
> (2) We arrange for commutators of binary operators to be created
> automatically.
>
> or
>
> (3) Or we bit the bullet and provide something similar to
> "ANY/ALL op scalar". We do have the liberty to pick whatever syntax we
> feel comfortable with, though, since we're out of SQL standard territory
> anyway.
>
> What I *wouldn't* like us to is just a few missing commutators and be
> done with it. That pretty much guarantees that this issue will pop up
> again some time in the future.
>
> I personally prefer (3), but would also be content with (1), and be
> ready to provide a patch for that. To be fair, (1) really doesn't seem
> that kludgy if one takes into account that all indexable operators must
> have commutators anyway.
>
> I haven't checked how viable (2) actually is, but I dare say that it's
> probably quite a bit of work. Essentially, we'd need a way to automatically
> swap a function's argument before invoking the function, which I'm not
> sure that fmgr can cleanly be persuaded to do.
>
> Now all that's required is to agree on a way forward ;-)

Well, Tom seems pretty strongly in favor of #1, or some variant of it,
and while I don't find that to be enormously elegant it does have the
virtue of being quite a bit less work than any of the other options.
I think the chances of that being a complete and permanent solution
are less than 50%, but perhaps it's close enough for government work.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Boolean operators without commutators vs. ALL/ANY

От

Tom Lane

Дата:

16 июня 2011 г., 16:49:30

Florian Pflug <fgp@phlo.org> writes:
> Well, I think there are basically three choices here, kludge or no
> kludge.

> (1) We either decree once and for all that binary operations ought to
> have commutators, modify CREATE TYPE to issue a warning if you
> create one without, add the missing ones, and add a check for
> that to opr_sanity (possibly excluding some deprecated operators).

> or

> (2) We arrange for commutators of binary operators to be created
> automatically. 

> or

> (3) Or we bit the bullet and provide something similar to
> "ANY/ALL op scalar". We do have the liberty to pick whatever syntax we
> feel comfortable with, though, since we're out of SQL standard territory
> anyway.

All three of these are massive overkill.  What we need is a general
policy that providing commutators is a good idea.  We do not need to try
to make it 100.00% with an enforcement mechanism.  As for #2, what's
your plan for automatically selecting a commutator operator name?

(Having said that, I *was* thinking of adding an opr_sanity test ... but
not expecting that we'd get it to find zero rows.)
        regards, tom lane

Re: Boolean operators without commutators vs. ALL/ANY

От

Peter Eisentraut

Дата:

16 июня 2011 г., 17:07:45

On tor, 2011-06-16 at 00:50 -0400, Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > On tis, 2011-06-14 at 15:38 +0200, Florian Pflug wrote:
> >> BTW, there's actually precedent for a commutator of "~", namely
> >> "@". Some of the geometric types (polygon, box, circle, point,
> >> path) use "~" as a commutator for "@" (which stands for
> "contains"). 
> 
> > I wouldn't have a problem with naming the reverse operator "@".
> 
> We deprecated those names for the geometric operators largely because
> there wasn't any visual correlation between the commutator pairs.
> I can't see introducing the same pairing for regex operators if we
> already decided the geometric case was a bad idea.

I actually reported the exact issue that Florian reported a while ago
and we had this same sort of discussion.  I think I'm running with a
custom operator named ~~~ somewhere in production.  So yay for adding a
commutator in any case.

I don't really agree that visual correlation needs to trump everything.
If say
   foo =~ bar

and
   foo ~= bar

were to produce completely different results, this would introduce bugs
all over the place.  Most programming languages would get away with this
kind of issue because the pattern has a different data type than the
string to be matched against, so mistakes will be caught.

Looking at the list of geometric operators, I can't help but feel that
the silliness of operator naming is reaching its limits.  We can
probably come up with a few more for this particular problem, but long
term we might want to think of other solutions, such as attaching the
optimization information to functions instead, and/or inventing an infix
function call syntax like in Haskell.

Re: Boolean operators without commutators vs. ALL/ANY

От

Tom Lane

Дата:

16 июня 2011 г., 18:33:31

Peter Eisentraut <peter_e@gmx.net> writes:
> I don't really agree that visual correlation needs to trump everything.
> If say
>     foo =~ bar
> and
>     foo ~= bar
> were to produce completely different results, this would introduce bugs
> all over the place.

Huh?  That's about like arguing that standard mathematical notation is
broken because a < b and a > b don't produce the same result.
        regards, tom lane

Re: Boolean operators without commutators vs. ALL/ANY

От

Florian Pflug

Дата:

16 июня 2011 г., 19:59:48

On Jun16, 2011, at 21:49 , Tom Lane wrote:
> All three of these are massive overkill.  What we need is a general
> policy that providing commutators is a good idea.  We do not need to try
> to make it 100.00% with an enforcement mechanism.

What parts of (1) do you think are overkill exactly, then?

> As for #2, what's
> your plan for automatically selecting a commutator operator name?

I figured we'd name it "COMMUTATOR <op>" or something along this line.
That'd mean it'd only be useable with the OPERATOR() syntax, but that's
way better than nothing. Or we could even make the COMMUTATOR argument
mandatory for binary operators returning boolean. After all, if a 
commutator doesn't require a second function, than I fail to see why
you'd ever want to define a predicate without a commutator.

In any case, yeah, (2) is pretty hand-weavy. I included so that we'd
have all the options on the table, not because I think it's particularly
elegant, easy, or interesting to implement (actually, it's probably
none of these).

> (Having said that, I *was* thinking of adding an opr_sanity test ... but
> not expecting that we'd get it to find zero rows.)

Well, as long as there is some regression test failure for
missing commutators of newly added binary boolean operators, I'm
content.

best regards,
Florian Pflug

Re: Boolean operators without commutators vs. ALL/ANY

От

Alvaro Herrera

Дата:

16 июня 2011 г., 22:42:57

Excerpts from Tom Lane's message of jue jun 16 17:33:17 -0400 2011:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > I don't really agree that visual correlation needs to trump everything.
> > If say
> >     foo =~ bar
> > and
> >     foo ~= bar
> > were to produce completely different results, this would introduce bugs
> > all over the place.
> 
> Huh?  That's about like arguing that standard mathematical notation is
> broken because a < b and a > b don't produce the same result.

The difference is that the mnemonic for > and < is very simple and in
widespread knowledge; not something I would say for =~'s rule of "the ~
is on the side of the regexp".  I know I used to get it wrong in Perl
(i.e. I wrote ~= occasionally).
To make matters worse, our delimiters for regexes are the same as for
strings, the single quote.  So you get

foo =~ 'bar'    /* foo is the text column, bar is the regex */
'bar' =~ foo    /* no complaint but it's wrong */

'bar' ~= foo    /* okay */
'foo' ~= bar    /* no complaint but it's wrong */

How do I tell which is the regex here?  If we used, say, /, that would
be a different matter:

foo =~ /bar/
/bar/ ~= foo    /* both okay */

If we had that and you get it wrong, the parser would immediately barf
at you if you got it wrong:

/bar/ =~ foo    /* wrong: LHS wanted text, got regex */
foo ~= /bar/    /* wrong: LHS wanted regex, got text */

(Note: I'm not suggesting we use / as delimiter.  This is just an
example.)

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Boolean operators without commutators vs. ALL/ANY

От

Florian Pflug

Дата:

17 июня 2011 г., 05:46:43

On Jun17, 2011, at 03:42 , Alvaro Herrera wrote:
> To make matters worse, our delimiters for regexes are the same as for
> strings, the single quote.  So you get
> 
> foo =~ 'bar'    /* foo is the text column, bar is the regex */
> 'bar' =~ foo    /* no complaint but it's wrong */
> 
> 'bar' ~= foo    /* okay */
> 'foo' ~= bar    /* no complaint but it's wrong */
> 
> How do I tell which is the regex here?  If we used, say, /, that would
> be a different matter:

How is this different from the situation today where the operator
is just "~"?

best regards,
Florian Pflug

Re: Boolean operators without commutators vs. ALL/ANY

От

Alvaro Herrera

Дата:

17 июня 2011 г., 10:37:01

Excerpts from Florian Pflug's message of vie jun 17 04:46:32 -0400 2011:
> On Jun17, 2011, at 03:42 , Alvaro Herrera wrote:
> > To make matters worse, our delimiters for regexes are the same as for
> > strings, the single quote.  So you get
> > 
> > foo =~ 'bar'    /* foo is the text column, bar is the regex */
> > 'bar' =~ foo    /* no complaint but it's wrong */
> > 
> > 'bar' ~= foo    /* okay */
> > 'foo' ~= bar    /* no complaint but it's wrong */
> > 
> > How do I tell which is the regex here?  If we used, say, /, that would
> > be a different matter:
> 
> How is this different from the situation today where the operator
> is just "~"?

Err, we don't have commutators today?

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Boolean operators without commutators vs. ALL/ANY

От

Florian Pflug

Дата:

17 июня 2011 г., 11:04:06

On Jun17, 2011, at 15:36 , Alvaro Herrera wrote:
> Excerpts from Florian Pflug's message of vie jun 17 04:46:32 -0400 2011:
>> On Jun17, 2011, at 03:42 , Alvaro Herrera wrote:
>>> To make matters worse, our delimiters for regexes are the same as for
>>> strings, the single quote.  So you get
>>> 
>>> foo =~ 'bar'    /* foo is the text column, bar is the regex */
>>> 'bar' =~ foo    /* no complaint but it's wrong */
>>> 
>>> 'bar' ~= foo    /* okay */
>>> 'foo' ~= bar    /* no complaint but it's wrong */
>>> 
>>> How do I tell which is the regex here?  If we used, say, /, that would
>>> be a different matter:
>> 
>> How is this different from the situation today where the operator
>> is just "~"?
> 
> Err, we don't have commutators today?

So? How does that reduce that risk of somebody writing "pattern ~ text"
instead of "text ~ pattern"? Modifying your quote from above
--------
foo ~ 'bar'    /* foo is the text column, bar is the regex */
'bar' ~ foo    /* no complaint but it's wrong */

How do I tell which is the regex here?
--------

How is that worse than the situation with "=~" and "~="?

"=~" and "~=" at least don't *look* symmetric when they really are
not, which is the heart of the complaint, and also what makes defining
a sensible commutator impossible.

Also, do you have a better suggestion for how we can fix my original
gripe? Adding support for 'ANY/ALL op scalar" was shot down by Tom,
so it looks like we need a commutator for "~". "@" is severely disliked
by Tom, on the grounds that it's already been deprecated in other places.
"=~" is argued against by you and Robert Haas (I think). We're running
out of options here...

best regards,
Florian Pflug

Re: Boolean operators without commutators vs. ALL/ANY

От

Alvaro Herrera

Дата:

17 июня 2011 г., 11:20:17

Excerpts from Florian Pflug's message of vie jun 17 10:03:56 -0400 2011:
> On Jun17, 2011, at 15:36 , Alvaro Herrera wrote:
> > Excerpts from Florian Pflug's message of vie jun 17 04:46:32 -0400 2011:
> >> On Jun17, 2011, at 03:42 , Alvaro Herrera wrote:
> >>> To make matters worse, our delimiters for regexes are the same as for
> >>> strings, the single quote.  So you get
> >>> 
> >>> foo =~ 'bar'    /* foo is the text column, bar is the regex */
> >>> 'bar' =~ foo    /* no complaint but it's wrong */
> >>> 
> >>> 'bar' ~= foo    /* okay */
> >>> 'foo' ~= bar    /* no complaint but it's wrong */
> >>> 
> >>> How do I tell which is the regex here?  If we used, say, /, that would
> >>> be a different matter:
> >> 
> >> How is this different from the situation today where the operator
> >> is just "~"?
> > 
> > Err, we don't have commutators today?
> 
> 
> So? How does that reduce that risk of somebody writing "pattern ~ text"
> instead of "text ~ pattern"? Modifying your quote from above
> --------
> foo ~ 'bar'    /* foo is the text column, bar is the regex */
> 'bar' ~ foo    /* no complaint but it's wrong */
> 
> How do I tell which is the regex here?
> --------

The regex is always to the right of the operator.

> How is that worse than the situation with "=~" and "~="?

With =~ it is to the right, with ~= it is to the left.

I have sometimes needed to look up which is which on ~ and ~~.
I assume that whichever way we go here, we're still going to have to
look up operator definitions in docs or online help.  This kind of help
doesn't, err, help all that much:

alvherre=# \doS ~
                                             Listado de operadores Esquema   | Nombre | Tipo arg izq | Tipo arg der |
Tiporesultado |                Descripción                 
 
------------+--------+--------------+--------------+----------------+--------------------------------------------
...pg_catalog | ~      | text         | text         | boolean        | matches regular expression, case-sensitive

Note that there's no way to tell which is the regex here.  It'd be a lot
better if the description was explicit about it.  (Or, alternatively,
use a different data type for regexes than plain text ... but that has
been in the Todo list for years ...)

> "=~" and "~=" at least don't *look* symmetric when they really are
> not, which is the heart of the complaint, and also what makes defining
> a sensible commutator impossible.

> Also, do you have a better suggestion for how we can fix my original
> gripe? Adding support for 'ANY/ALL op scalar" was shot down by Tom,
> so it looks like we need a commutator for "~". "@" is severely disliked
> by Tom, on the grounds that it's already been deprecated in other places.
> "=~" is argued against by you and Robert Haas (I think). We're running
> out of options here...

Have ~ keep its existing semantics, use ~= for the commutator?  There
are a lot more chars allowed in operator names anyway, it doesn't seem
to me like we need to limit ourselves to ~, = and @.

I *do* like the idea of having commutate-ability for ANY/ALL, having
needed it a couple of times in the past.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Boolean operators without commutators vs. ALL/ANY

От

Andrew Dunstan

Дата:

17 июня 2011 г., 11:47:26


On 06/17/2011 10:20 AM, Alvaro Herrera wrote:
> alvherre=# \doS ~
>
>                                                Listado de operadores
>    Esquema   | Nombre | Tipo arg izq | Tipo arg der | Tipo resultado |                Descripción
> ------------+--------+--------------+--------------+----------------+--------------------------------------------
> ...
>   pg_catalog | ~      | text         | text         | boolean        | matches regular expression, case-sensitive
>
> Note that there's no way to tell which is the regex here.  It'd be a lot
> better if the description was explicit about it.  (Or, alternatively,
> use a different data type for regexes than plain text ... but that has
> been in the Todo list for years ...)


+1 for improving the description.

>
> Have ~ keep its existing semantics, use ~= for the commutator?  There
> are a lot more chars allowed in operator names anyway, it doesn't seem
> to me like we need to limit ourselves to ~, = and @.


Yeah, maybe something like ~< for the commutator. (I know, we're 
bikeshedding somewhat.)


> I *do* like the idea of having commutate-ability for ANY/ALL, having
> needed it a couple of times in the past.
>

Indeed. me too.

cheers

andrew

Re: Boolean operators without commutators vs. ALL/ANY

От

Florian Pflug

Дата:

17 июня 2011 г., 11:49:54

On Jun17, 2011, at 16:20 , Alvaro Herrera wrote:
> Excerpts from Florian Pflug's message of vie jun 17 10:03:56 -0400 2011:
>> So? How does that reduce that risk of somebody writing "pattern ~ text"
>> instead of "text ~ pattern"? Modifying your quote from above
>> --------
>> foo ~ 'bar'    /* foo is the text column, bar is the regex */
>> 'bar' ~ foo    /* no complaint but it's wrong */
>>
>> How do I tell which is the regex here?
>> --------
>
> The regex is always to the right of the operator.

Which is something you have to remember... It's not in any
way deducible from "foo ~ bar" alone.

>> How is that worse than the situation with "=~" and "~="?
>
> With =~ it is to the right, with ~= it is to the left.

It's always where the tilde is. Yeah, you have to remember that.
Just as today you have to remember that the pattern goes on the
right side.

> I have sometimes needed to look up which is which on ~ and ~~.
> I assume that whichever way we go here, we're still going to have to
> look up operator definitions in docs or online help.  This kind of help
> doesn't, err, help all that much:
>
> alvherre=# \doS ~
>
>                                              Listado de operadores
>  Esquema   | Nombre | Tipo arg izq | Tipo arg der | Tipo resultado |                Descripción
> ------------+--------+--------------+--------------+----------------+--------------------------------------------
> ...
> pg_catalog | ~      | text         | text         | boolean        | matches regular expression, case-sensitive
>
> Note that there's no way to tell which is the regex here.  It'd be a lot
> better if the description was explicit about it.

I'm all for it, let's change the description then! Shall I submit a patch?

> (Or, alternatively,
> use a different data type for regexes than plain text ... but that has
> been in the Todo list for years ...)

I actually like that idea. Since we probably don't want a type for every
kind of pattern we support (like, similar to, regexp), such a type wouldn't
be much more than a synonym for text though. I personally don't have a
problem with that, but I somehow feel there's gonna be quite some pushback...

>> Also, do you have a better suggestion for how we can fix my original
>> gripe? Adding support for 'ANY/ALL op scalar" was shot down by Tom,
>> so it looks like we need a commutator for "~". "@" is severely disliked
>> by Tom, on the grounds that it's already been deprecated in other places.
>> "=~" is argued against by you and Robert Haas (I think). We're running
>> out of options here...
>
> Have ~ keep its existing semantics, use ~= for the commutator?

So how does that make it any easier to tell what
  foo ~ bar
and
 foo ~= bar
mean? With that, neither the "pattern is always on the right" nor
the "pattern goes where the tilde is" mnemonic works.

Also, do we really want to end up with a large number of commutator
pairs with totally unrelated names? I fear that this *will* seriously
harm readability of SQL statements, and we'll regret it badly.

best regards,
Florian Pflug

Re: Boolean operators without commutators vs. ALL/ANY

От

"Ross J. Reedstrom"

Дата:

17 июня 2011 г., 12:15:27

On Fri, Jun 17, 2011 at 10:20:04AM -0400, Alvaro Herrera wrote:
> Excerpts from Florian Pflug's message of vie jun 17 10:03:56 -0400 2011:
>
> > How is that worse than the situation with "=~" and "~="?
>
> With =~ it is to the right, with ~= it is to the left.

To throw my user opinion into this ring (as a long time user of regexes
in many different systems) I've always taken the ~ to be short hand for
the 'approximately' notation (a squiggly equals) which has good semantic
match in my mind: a regex match is sort of a fuzzy equality. With that
model, the suggested pair is fairly mnemonic - the 'fuzzy' part i(the
pattern) is next to the squiggles, the 'concrete' part goes by the
equals.

> I have sometimes needed to look up which is which on ~ and ~~.

which has no such directionality, so yeah, no hinting there.

Ross
--
Ross Reedstrom, Ph.D.                                 reedstrm@rice.edu
Systems Engineer & Admin, Research Scientist        phone: 713-348-6166
Connexions                  http://cnx.org            fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE

Re: Boolean operators without commutators vs. ALL/ANY

От

Florian Pflug

Дата:

17 июня 2011 г., 12:21:19

On Jun17, 2011, at 17:15 , Ross J. Reedstrom wrote:
> On Fri, Jun 17, 2011 at 10:20:04AM -0400, Alvaro Herrera wrote:
>> Excerpts from Florian Pflug's message of vie jun 17 10:03:56 -0400 2011:
>> 
>>> How is that worse than the situation with "=~" and "~="?
>> 
>> With =~ it is to the right, with ~= it is to the left.
> 
> To throw my user opinion into this ring (as a long time user of regexes
> in many different systems) I've always taken the ~ to be short hand for
> the 'approximately' notation (a squiggly equals) which has good semantic
> match in my mind: a regex match is sort of a fuzzy equality. With that
> model, the suggested pair is fairly mnemonic - the 'fuzzy' part i(the
> pattern) is next to the squiggles, the 'concrete' part goes by the
> equals.

Hey, that's my mnemonic device! ;-)

best regards,
Florian Pflug

Re: Boolean operators without commutators vs. ALL/ANY

От

Alvaro Herrera

Дата:

17 июня 2011 г., 12:46:39

Excerpts from Florian Pflug's message of vie jun 17 10:49:46 -0400 2011:
> On Jun17, 2011, at 16:20 , Alvaro Herrera wrote:
> > Excerpts from Florian Pflug's message of vie jun 17 10:03:56 -0400 2011:
> >> So? How does that reduce that risk of somebody writing "pattern ~ text"
> >> instead of "text ~ pattern"? Modifying your quote from above
> >> --------
> >> foo ~ 'bar'    /* foo is the text column, bar is the regex */
> >> 'bar' ~ foo    /* no complaint but it's wrong */
> >> 
> >> How do I tell which is the regex here?
> >> --------
> > 
> > The regex is always to the right of the operator.
> 
> Which is something you have to remember... It's not in any
> way deducible from "foo ~ bar" alone.

Maybe, but the mnemonic rule seems quite a bit easier (to me anyway).
In my head I think of ~ as "matches", so "text matches regex", whereas
"regex matches text" doesn't make as much sense.  (Hmm now that I see
it, maybe in english this is not so clear, but in spanish the difference
is pretty obvious).

> >> How is that worse than the situation with "=~" and "~="?
> > 
> > With =~ it is to the right, with ~= it is to the left.
> 
> It's always where the tilde is. Yeah, you have to remember that.
> Just as today you have to remember that the pattern goes on the
> right side.

Well, the mnemonic would be that ~ is still "text matches regex", while
~= is "the weird operator that goes the other way around", so it's still
pretty clear.

> > I have sometimes needed to look up which is which on ~ and ~~.
> > I assume that whichever way we go here, we're still going to have to
> > look up operator definitions in docs or online help.  This kind of help
> > doesn't, err, help all that much:
> > 
> > alvherre=# \doS ~
> > 
> >                                              Listado de operadores
> >  Esquema   | Nombre | Tipo arg izq | Tipo arg der | Tipo resultado |                Descripción                 
> > ------------+--------+--------------+--------------+----------------+--------------------------------------------
> > ...
> > pg_catalog | ~      | text         | text         | boolean        | matches regular expression, case-sensitive
> > 
> > Note that there's no way to tell which is the regex here.  It'd be a lot
> > better if the description was explicit about it.
> 
> I'm all for it, let's change the description then! Shall I submit a patch?

Yes, please.

> > (Or, alternatively,
> > use a different data type for regexes than plain text ... but that has
> > been in the Todo list for years ...)
> 
> I actually like that idea. Since we probably don't want a type for every
> kind of pattern we support (like, similar to, regexp), such a type wouldn't
> be much more than a synonym for text though. I personally don't have a
> problem with that, but I somehow feel there's gonna be quite some pushback...

Hmm, why?  Maybe that's something we can discuss.


> Also, do we really want to end up with a large number of commutator
> pairs with totally unrelated names? I fear that this *will* seriously
> harm readability of SQL statements, and we'll regret it badly.

Hmm.

I guess this wouldn't be much of a problem if you could use ANY/ALL with
a function instead of an operator, c.f. map().

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Boolean operators without commutators vs. ALL/ANY

От

Robert Haas

Дата:

17 июня 2011 г., 13:00:22

On Fri, Jun 17, 2011 at 11:46 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> I guess this wouldn't be much of a problem if you could use ANY/ALL with
> a function instead of an operator, c.f. map().

Yeah.  Or really what you want is a lambda-expression, rather than a
predefined function.

fold(bool_and, map { val ~ $0 } array)

I suspect that's darn hard to make work though.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Boolean operators without commutators vs. ALL/ANY

От

"Ross J. Reedstrom"

Дата:

17 июня 2011 г., 13:03:49

On Fri, Jun 17, 2011 at 05:21:10PM +0200, Florian Pflug wrote:
> On Jun17, 2011, at 17:15 , Ross J. Reedstrom wrote:
> > On Fri, Jun 17, 2011 at 10:20:04AM -0400, Alvaro Herrera wrote:
> >> Excerpts from Florian Pflug's message of vie jun 17 10:03:56 -0400 2011:
> >>
> >>> How is that worse than the situation with "=~" and "~="?
> >>
> >> With =~ it is to the right, with ~= it is to the left.
> >
> > To throw my user opinion into this ring (as a long time user of regexes
> > in many different systems) I've always taken the ~ to be short hand for
> > the 'approximately' notation (a squiggly equals) which has good semantic
> > match in my mind: a regex match is sort of a fuzzy equality. With that
> > model, the suggested pair is fairly mnemonic - the 'fuzzy' part i(the
> > pattern) is next to the squiggles, the 'concrete' part goes by the
> > equals.
>
> Hey, that's my mnemonic device! ;-)
>

Ah, good, so since this is almost mathematics, and we have two
instances, that's a proof then. :-)

Ross
--
Ross Reedstrom, Ph.D.                                 reedstrm@rice.edu
Systems Engineer & Admin, Research Scientist        phone: 713-348-6166
Connexions                  http://cnx.org            fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE

Re: Boolean operators without commutators vs. ALL/ANY

От

Florian Pflug

Дата:

17 июня 2011 г., 13:13:16

On Jun17, 2011, at 18:00 , Robert Haas wrote:
> On Fri, Jun 17, 2011 at 11:46 AM, Alvaro Herrera
> <alvherre@commandprompt.com> wrote:
>> I guess this wouldn't be much of a problem if you could use ANY/ALL with
>> a function instead of an operator, c.f. map().
> 
> Yeah.  Or really what you want is a lambda-expression, rather than a
> predefined function.
> 
> fold(bool_and, map { val ~ $0 } array)

Yeah, to bad we can't just write SELECT BOOL_AND(val ~ e) FROM UNNEST(array)
Hey...wait a minute... ;-)

(I guess you actually meant fold(bool_and, map { val ~ $0 } array)
which the equivalent sub-select SELECT BOOL_AND(e ~ val) FROM UNNEST(array))

Still, you can't put that into a CHECK constraint (because it
counts as sub-select) and it's considerable longer and harder
to read then val = ANY(array)

best regards,
Florian Pflug

Re: Boolean operators without commutators vs. ALL/ANY

От

Florian Pflug

Дата:

17 июня 2011 г., 13:29:38

On Jun17, 2011, at 17:46 , Alvaro Herrera wrote:
> Excerpts from Florian Pflug's message of vie jun 17 10:49:46 -0400 2011:
> Maybe, but the mnemonic rule seems quite a bit easier (to me anyway).
> In my head I think of ~ as "matches", so "text matches regex", whereas
> "regex matches text" doesn't make as much sense.  (Hmm now that I see
> it, maybe in english this is not so clear, but in spanish the difference
> is pretty obvious).

I can't really argue with that, only state for that record that it's
different for me. I think of "~" as "similar" or "approximately equal",
and hence intuitively expect it to be symmetric. Whether or not
"matches" technically implies some direction or not I cannot say as
I'm not an english native speaker myself. But if I had to guess, I'd say
it doesn't.

>>>> How is that worse than the situation with "=~" and "~="?
>>>
>>> With =~ it is to the right, with ~= it is to the left.
>>
>> It's always where the tilde is. Yeah, you have to remember that.
>> Just as today you have to remember that the pattern goes on the
>> right side.
>
> Well, the mnemonic would be that ~ is still "text matches regex", while
> ~= is "the weird operator that goes the other way around", so it's still
> pretty clear.

Again, that depends on a person's background. For me it'd be
"~= is the regexp matching operator" and "~ is for some strange
reasons its commutator".

>>> 'm all for it, let's change the description then! Shall I submit a patch?
>
> Yes, please.

Will do, but after we've reached an overall agreement about the fate
or "~" and friends.

>>> (Or, alternatively,
>>> use a different data type for regexes than plain text ... but that has
>>> been in the Todo list for years ...)
>>
>> I actually like that idea. Since we probably don't want a type for every
>> kind of pattern we support (like, similar to, regexp), such a type wouldn't
>> be much more than a synonym for text though. I personally don't have a
>> problem with that, but I somehow feel there's gonna be quite some pushback...
>
> Hmm, why?  Maybe that's something we can discuss.

Ok, I'll start a new thread for this.

best regards,
Florian Pflug

Re: [WIP] Support for "ANY/ALL(array) op scalar" (Was: Re: Boolean operators without commutators vs. ALL/ANY)

От

Greg Stark

Дата:

19 июня 2011 г., 10:49:45

On Thu, Jun 16, 2011 at 6:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>  (I will refrain for
> the moment from speculating whether we'll ever have an index type that
> supports regexp match directly as an indexable operator...)

Fwiw I looked into this at one point and have some ideas if anyone is
keen to try it.

--
greg

Re: [WIP] Support for "ANY/ALL(array) op scalar" (Was: Re: Boolean operators without commutators vs. ALL/ANY)

От

David Fetter

Дата:

19 июня 2011 г., 13:15:00

On Sun, Jun 19, 2011 at 02:48:58PM +0100, Greg Stark wrote:
> On Thu, Jun 16, 2011 at 6:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > (I will refrain for the moment from speculating whether we'll ever
> > have an index type that supports regexp match directly as an
> > indexable operator...)
> 
> Fwiw I looked into this at one point and have some ideas if anyone
> is keen to try it.

Please post them :)

Cheers,
David.
-- 
David Fetter <david@fetter.org> http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter      XMPP: david.fetter@gmail.com
iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Re: Boolean operators without commutators vs. ALL/ANY

От

Greg Stark

Дата:

19 июня 2011 г., 22:16:54

On Fri, Jun 17, 2011 at 3:49 PM, Florian Pflug <fgp@phlo.org> wrote:
>> The regex is always to the right of the operator.
>
> Which is something you have to remember... It's not in any
> way deducible from "foo ~ bar" alone.

Except that it's always been this way, going back to perl4 or tcl or
their predecessors. The regexp binding operator always has the regexp
on the right.

>>> How is that worse than the situation with "=~" and "~="?
>>
>> With =~ it is to the right, with ~= it is to the left.
>
> It's always where the tilde is. Yeah, you have to remember that.

And when you get it wrong it will fail silently. No errors, just wrong results.

While I've never accidentally written /foo/ =~ $_ in perl I have
*frequently* forgotten whether the operator is ~= or =~. Actually I
forget that pretty much every time I start writing some perl. I just
put whichever comes first and if I get an error I reverse it.

I can see the temptation to make it symmetric but it's going to cause
an awful lot of confusion.

Perhaps we could name the operators ~~= and =~~ and then have a =~
short-cut for compatibility? (and ~ too I guess?)

-- 
greg

Re: Boolean operators without commutators vs. ALL/ANY

От

Florian Pflug

Дата:

20 июня 2011 г., 07:56:01

On Jun20, 2011, at 03:16 , Greg Stark wrote:
> On Fri, Jun 17, 2011 at 3:49 PM, Florian Pflug <fgp@phlo.org> wrote:
>>> The regex is always to the right of the operator.
>>
>> Which is something you have to remember... It's not in any
>> way deducible from "foo ~ bar" alone.
>
> Except that it's always been this way, going back to perl4 or tcl or
> their predecessors. The regexp binding operator always has the regexp
> on the right.

Yeah. The strength of that argument really depends on one's
prior exposure to these languages, though...

>>>> How is that worse than the situation with "=~" and "~="?
>>>
>>> With =~ it is to the right, with ~= it is to the left.
>>
>> It's always where the tilde is. Yeah, you have to remember that.
>
> And when you get it wrong it will fail silently. No errors, just wrong results.

Yeah, but this is hardly the only case where you'll get
unintended results if you mix up operator names.

Now, one might argue, I guess, that mixing up "=~" and "~="
or more likely than mixing up, say, "~" and "~~". But ultimately,
whether or not that is highly dependent on one's personal background,
so we're unlikely to ever reach agreement on that...

> While I've never accidentally written /foo/ =~ $_ in perl I have
> *frequently* forgotten whether the operator is ~= or =~. Actually I
> forget that pretty much every time I start writing some perl. I just
> put whichever comes first and if I get an error I reverse it.

Yeah, the nice thing in perl (and ruby also, which is *my* background)
is that regexp's and strings are distinguished by the type system,
and also by the parser.

The latter (i.e. regexp literals enclosed by /../) probably isn't
desirably for postgres, but the former definitely is (i.e. distinguishing
regexp's and text in the type system). Please see the thread
"Adding a distinct pattern type to resolve the ~ commutator stalemate"
for the details of the proposal.

> I can see the temptation to make it symmetric but it's going to cause
> an awful lot of confusion.

I do believe that by adding a distinct type we can actually *reduce*
confusion. It makes "text ~ pattern" readable even for people who
don't intuitively know that the pattern always goes on the right.

best regards,
Florian Pflug

Re: Boolean operators without commutators vs. ALL/ANY

От

Alvaro Herrera

Дата:

20 июня 2011 г., 14:22:58

Excerpts from Florian Pflug's message of lun jun 20 06:55:42 -0400 2011:

> The latter (i.e. regexp literals enclosed by /../) probably isn't
> desirably for postgres, but the former definitely is (i.e. distinguishing
> regexp's and text in the type system). Please see the thread
> "Adding a distinct pattern type to resolve the ~ commutator stalemate"
> for the details of the proposal. 

'your text' ~ regexp 'your.*foo'    column ~ regexp 'your.*foo'

So you could do

regexp 'foo.*bar' ~ 'your text'

and it's immediately clear what's up.

The question is what to do wrt implicit casting of text to regexp.
If we don't, there's a backwards compatibility hit.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: Boolean operators without commutators vs. ALL/ANY

От

Florian Pflug

Дата:

20 июня 2011 г., 14:32:20

On Jun20, 2011, at 19:22 , Alvaro Herrera wrote:
> Excerpts from Florian Pflug's message of lun jun 20 06:55:42 -0400 2011:
>> The latter (i.e. regexp literals enclosed by /../) probably isn't
>> desirably for postgres, but the former definitely is (i.e. distinguishing
>> regexp's and text in the type system). Please see the thread
>> "Adding a distinct pattern type to resolve the ~ commutator stalemate"
>> for the details of the proposal. 
> 
> 'your text' ~ regexp 'your.*foo'
>     column ~ regexp 'your.*foo'
> 
> So you could do
> 
> regexp 'foo.*bar' ~ 'your text'
> 
> and it's immediately clear what's up.
> 
> The question is what to do wrt implicit casting of text to regexp.
> If we don't, there's a backwards compatibility hit.


No, we certainly musn't allow text to be implicitly converted to
regexp, for otherwise e.g. "varchar ~ varchar" becomes ambiguous.

I posted a primitive prototype for a pattern type on said thread,
which seems to do everything we require without causing compatibility
problems.

best regards,
Florian Pflug

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Boolean operators without commutators vs. ALL/ANY

Вложения

Вложения