Обсуждение: Boolean operators without commutators vs. ALL/ANY
Hi I've recently wanted to define a check constraint on an array column that verifies that all array entries match some regular expression. Unfortunately, t The most natural way of expressing such a check would be CHECK ('<regexp>' ~ ANY(field)), but that doesn't work, because "~" expects the *value* to be the left argument and the *pattern* to be the right. The next try was CHECK (ANY(field) ~ '<regexp>'), but that doesn't even parse. Ok, so then use UNNEST() and BOOL_AND() I figured, and wrote CHECK ((SELECT BOOL_AND(v ~ '<regexp>') FROM UNNEST(field) v)). But that of course lead to nothing but ERROR: cannot use subquery in check constraint So I the end, I had to wrap the sub-query in a SQL-language function and use that in the check constraint. While this solved my immediate problem, the necessity of doing that highlights a few problems (A) "~" is an extremely bad name for the regexp-matching operators, since it's visual form is symmetric but it's behaviour isn't. This doesn't only make its usage very error-prone, it also makes it very hard to come up with sensible name for an commutator of "~". I suggest that we add "=~" as an alias for "~", "~=" as an commutator for "=~", and deprecate "~". The same holds for "~~". We might want to do this starting with 9.1. (B) There should be a way to use ANY()/ALL() with the array elements becoming the left arguments of the operator. Ideally, we'd support "ANY(<array>) <operator> <value>", but if that's not possible grammar-wise, I suggest we extend the OPERATOR() syntax to allow <value> OPERATOR(COMMUTATOR <operator>) ANY(<array>). OPERATOR(COMMUTATOR <operator>) would use the COMMUTATOR of the specified operator if one exists, and otherwise use the original operator with the arguments swapped. (C) Why do we forbid sub-queries in CHECK constraints? I do realize that any non-IMMUTABLE CHECK constraint is a foot-gun, but since we already allow STABLE and even VOLATILE functions to be used inside CHECK constraint, forbidding sub-queries seems a bit pointless... best regards, Florian Pflug
On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote: > So I the end, I had to wrap the sub-query in a SQL-language > function and use that in the check constraint. While this > solved my immediate problem, the necessity of doing that > highlights a few problems > > (A) "~" is an extremely bad name for the regexp-matching > operators, since it's visual form is symmetric but it's > behaviour isn't. This doesn't only make its usage very > error-prone, it also makes it very hard to come up with > sensible name for an commutator of "~". I suggest that we > add "=~" as an alias for "~", "~=" as an commutator > for "=~", and deprecate "~". The same holds for "~~". Does any other database or programming language implement it this way? > (B) There should be a way to use ANY()/ALL() with the > array elements becoming the left arguments of the operator. > Ideally, we'd support "ANY(<array>) <operator> <value>", > but if that's not possible grammar-wise, I suggest we extend > the OPERATOR() syntax to allow > <value> OPERATOR(COMMUTATOR <operator>) ANY(<array>). > OPERATOR(COMMUTATOR <operator>) would use the COMMUTATOR > of the specified operator if one exists, and otherwise > use the original operator with the arguments swapped. It seems to me that if we provided some way of handling this, your first proposal would be moot; and I have to say I like the idea of allowing this a lot more than tinkering with the operator names. I'm not crazy about the proposed syntax, though; it seems cumbersome, and it's really only needed for SOME/ALL/ANY, not in general operator expressions. Since ANY is a reserved keyword, I believe we could allow something like "expr op ANY BACKWARD ( ... )" -- or some other keyword in lieu of BACKWARD if you prefer. Hath the spec anything to say about this? > (C) Why do we forbid sub-queries in CHECK constraints? > I do realize that any non-IMMUTABLE CHECK constraint is > a foot-gun, but since we already allow STABLE and even > VOLATILE functions to be used inside CHECK constraint, > forbidding sub-queries seems a bit pointless... Dunno. Maybe it's just an implementation restriction? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote: >> (B) There should be a way to use ANY()/ALL() with the >> array elements becoming the left arguments of the operator. > It seems to me that if we provided some way of handling this, your > first proposal would be moot; and I have to say I like the idea of > allowing this a lot more than tinkering with the operator names. There are syntactic reasons not to do that. It'd be a lot easier just to provide a commutator operator for ~. >> (C) Why do we forbid sub-queries in CHECK constraints? > Dunno. Maybe it's just an implementation restriction? (1) We don't want to invoke the planner in the places where we'd have to do so to make that work. (2) It's just about inevitable that a sub-query would have results dependent on other rows beside the one being checked. As such, it would be trying to enforce semantics that you simply can't enforce via CHECK. (And yes, you can bypass that with a function, but guess what: it still won't actually work.) regards, tom lane
On Sun, Jun 12, 2011 at 11:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote: >>> (B) There should be a way to use ANY()/ALL() with the >>> array elements becoming the left arguments of the operator. > >> It seems to me that if we provided some way of handling this, your >> first proposal would be moot; and I have to say I like the idea of >> allowing this a lot more than tinkering with the operator names. > > There are syntactic reasons not to do that. It'd be a lot easier just > to provide a commutator operator for ~. Details? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Jun13, 2011, at 05:12 , Robert Haas wrote: > On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote: >> So I the end, I had to wrap the sub-query in a SQL-language >> function and use that in the check constraint. While this >> solved my immediate problem, the necessity of doing that >> highlights a few problems >> >> (A) "~" is an extremely bad name for the regexp-matching >> operators, since it's visual form is symmetric but it's >> behaviour isn't. This doesn't only make its usage very >> error-prone, it also makes it very hard to come up with >> sensible name for an commutator of "~". I suggest that we >> add "=~" as an alias for "~", "~=" as an commutator >> for "=~", and deprecate "~". The same holds for "~~". > > Does any other database or programming language implement it this way? Ruby has "=~", which returns the position of the regexp's first match, or nil if there is none. $ ruby -e "puts 'hello' =~ /l+/" 2 $ ruby -e "puts 'hello' =~ /x+/" nil >> (B) There should be a way to use ANY()/ALL() with the >> array elements becoming the left arguments of the operator. >> Ideally, we'd support "ANY(<array>) <operator> <value>", >> but if that's not possible grammar-wise, I suggest we extend >> the OPERATOR() syntax to allow >> <value> OPERATOR(COMMUTATOR <operator>) ANY(<array>). >> OPERATOR(COMMUTATOR <operator>) would use the COMMUTATOR >> of the specified operator if one exists, and otherwise >> use the original operator with the arguments swapped. > > It seems to me that if we provided some way of handling this, your > first proposal would be moot; and I have to say I like the idea of > allowing this a lot more than tinkering with the operator names. Well, the issue of "~" being anti-self-explanatory remains independent from whether we do (B) or not. > I'm > not crazy about the proposed syntax, though; it seems cumbersome, and > it's really only needed for SOME/ALL/ANY, not in general operator > expressions. Since ANY is a reserved keyword, I believe we could > allow something like "expr op ANY BACKWARD ( ... )" -- or some other > keyword in lieu of BACKWARD if you prefer. Hm, that's less bulky but more kludgy, I'd say. But wait a minute... If ANY and ALL are reserved anyway, should it be possible to make "(ANY(..) <op> <expr>)" and "(ALL(...) <op> <expr>)" work grammar-wise? (Note the enclosing parens) I just tried that, and it seems to work. bison doesn't report and conflicts, the regression tests still succeed, and I get the following postgres=# select (all(array[1,2]) = 1); ERROR: ANY()/ALL() <op> <expr> is not yet implemented at character 9 STATEMENT: select (all(array[1,2]) = 1); ERROR: ANY()/ALL() <op> <expr> is not yet implemented LINE 1: select (all(array[1,2]) = 1); ^ I've attached a patch with the changes to gram.y. best regards, Florian Pflug
Вложения
On Sun, Jun 12, 2011 at 6:46 AM, Florian Pflug <fgp@phlo.org> wrote: > (B) There should be a way to use ANY()/ALL() with the > array elements becoming the left arguments of the operator. FWIW, in case people were unaware, this is getting close to Perl 6 junctions/superpositions. See: <http://dev.perl.org/perl6/doc/design/exe/E06.html> "The Wonderful World of Junctions". Beyond "any" and "all" they propose "one" and "none".
On Jun13, 2011, at 05:44 , Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote: >>> (C) Why do we forbid sub-queries in CHECK constraints? > >> Dunno. Maybe it's just an implementation restriction? > > (1) We don't want to invoke the planner in the places where we'd > have to do so to make that work. Hm, OK, I get that. > (2) It's just about inevitable that a sub-query would have results > dependent on other rows beside the one being checked. As such, it > would be trying to enforce semantics that you simply can't enforce > via CHECK. (And yes, you can bypass that with a function, but guess > what: it still won't actually work.) Yeah, I never expected non-immutable CHECK constraints to work. I was simply pointing out that UNNEST() allows one to write all kinds of interesting CHECK constraints, all of which are immutable. Now, if supporting these would increase code complexity, or cause a performance drop for non-sub-query CHECK constraints, I'm perfectly fine with leaving them unsupported. I just wanted to make sure we aren't simply nannying the user - especially since it's a nanny who's out-smarted by function calls. best regards, Florian Pflug
Robert Haas <robertmhaas@gmail.com> writes: > On Sun, Jun 12, 2011 at 11:44 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> There are syntactic reasons not to do that. It'd be a lot easier just >> to provide a commutator operator for ~. > Details? Well, for one, it becomes unobvious what A op ANY (B) op C means. This has come up before, and I seem to recall that we identified some more-compelling problems, but that's the best I can do before consuming any caffeine. In any case, if you dig around enough to notice all the infrastructure that's involved with this, you'll definitely come to the conclusion that it'd be a lot less work to just add the missing commutator operators. regards, tom lane
On Mon, Jun 13, 2011 at 3:01 AM, Florian Pflug <fgp@phlo.org> wrote: > On Jun13, 2011, at 05:12 , Robert Haas wrote: >> On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote: >>> So I the end, I had to wrap the sub-query in a SQL-language >>> function and use that in the check constraint. While this >>> solved my immediate problem, the necessity of doing that >>> highlights a few problems >>> >>> (A) "~" is an extremely bad name for the regexp-matching >>> operators, since it's visual form is symmetric but it's >>> behaviour isn't. This doesn't only make its usage very >>> error-prone, it also makes it very hard to come up with >>> sensible name for an commutator of "~". I suggest that we >>> add "=~" as an alias for "~", "~=" as an commutator >>> for "=~", and deprecate "~". The same holds for "~~". >> >> Does any other database or programming language implement it this way? > > Ruby has "=~", which returns the position of the regexp's first > match, or nil if there is none. > > $ ruby -e "puts 'hello' =~ /l+/" > 2 > $ ruby -e "puts 'hello' =~ /x+/" > nil Sure. Some languages use =~ and some use just ~... I was just wondering if anyone thought the commutator of =~ was ~=... -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 06/13/2011 10:07 AM, Robert Haas wrote: > Some languages use =~ and some use just ~... I was just > wondering if anyone thought the commutator of =~ was ~=... My feeling is it's a bit dangerous. It's too easy to fat-finger the reverse op, and get something quite unintended. cheers andrew (whose lack of coordination sadly increases with age)
On Mon, Jun 13, 2011 at 09:01:45AM +0200, Florian Pflug wrote: > Hm, that's less bulky but more kludgy, I'd say. But wait a minute... > > If ANY and ALL are reserved anyway, should it be possible to > make "(ANY(..) <op> <expr>)" and "(ALL(...) <op> <expr>)" > work grammar-wise? (Note the enclosing parens) This would be a very, very useful feature. :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Jun13, 2011, at 16:19 , Andrew Dunstan wrote: > On 06/13/2011 10:07 AM, Robert Haas wrote: >> Some languages use =~ and some use just ~... I was just >> wondering if anyone thought the commutator of =~ was ~=... > > My feeling is it's a bit dangerous. It's too easy to fat-finger the reverse op, and get something quite unintended. Well, but with "~" you need to *remember* that the regexp goes on the right side and the text on the left. That seems worse than the risk of fat-fingering "=~" and getting "~=". At, at least, have looked up the argument order of "~" countless in the past... best regards, Florian Pflug
On Jun13, 2011, at 05:44 , Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote: >>> (B) There should be a way to use ANY()/ALL() with the >>> array elements becoming the left arguments of the operator. > >> It seems to me that if we provided some way of handling this, your >> first proposal would be moot; and I have to say I like the idea of >> allowing this a lot more than tinkering with the operator names. > > There are syntactic reasons not to do that. It'd be a lot easier just > to provide a commutator operator for ~. My suggestion would be the add a commutator for "~" as a short-term solution (preferably in 9.1). Since "~" doesn't inspire any obvious names for a possible commutator, I suggest adding "=~" and "~=". Is there any support for that proposal? In the long term, I'd like to add support for "(ANY() <op> <expr>)" (Note the enclosing parens). I've checked that this works grammar-wise, but haven't no idea how much tweaking the executor needs to support that... best regards, Florian Pflug
On Tue, Jun 14, 2011 at 6:10 AM, Florian Pflug <fgp@phlo.org> wrote: > On Jun13, 2011, at 05:44 , Tom Lane wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote: >>>> (B) There should be a way to use ANY()/ALL() with the >>>> array elements becoming the left arguments of the operator. >> >>> It seems to me that if we provided some way of handling this, your >>> first proposal would be moot; and I have to say I like the idea of >>> allowing this a lot more than tinkering with the operator names. >> >> There are syntactic reasons not to do that. It'd be a lot easier just >> to provide a commutator operator for ~. > > My suggestion would be the add a commutator for "~" as a short-term > solution (preferably in 9.1). I don't think we want to bump catversion again before release if we can avoid it. And I don't see this as being a terribly urgent problem - it's not like this is a new regression, and I can't remember hearing any complaints about it prior to two days ago. > Since "~" doesn't inspire any obvious names for a possible commutator, > I suggest adding "=~" and "~=". > > Is there any support for that proposal? I'm OK with adding a commutator but I guess I don't see the point of adding a synonym for ~ along the way. The existing use of ~ is consistent with, for example, awk, so it's not like we've dreamed up something utterly crazy that we now need to fix. I'd suggest we just come up with some arbitrary variant, like ~~ or <~ or #~ or !#!%@~bikeshed++!. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Jun14, 2011, at 14:29 , Robert Haas wrote: > On Tue, Jun 14, 2011 at 6:10 AM, Florian Pflug <fgp@phlo.org> wrote: >> On Jun13, 2011, at 05:44 , Tom Lane wrote: >>> Robert Haas <robertmhaas@gmail.com> writes: >>>> On Sun, Jun 12, 2011 at 7:46 AM, Florian Pflug <fgp@phlo.org> wrote: >>>>> (B) There should be a way to use ANY()/ALL() with the >>>>> array elements becoming the left arguments of the operator. >>> >>>> It seems to me that if we provided some way of handling this, your >>>> first proposal would be moot; and I have to say I like the idea of >>>> allowing this a lot more than tinkering with the operator names. >>> >>> There are syntactic reasons not to do that. It'd be a lot easier just >>> to provide a commutator operator for ~. >> >> My suggestion would be the add a commutator for "~" as a short-term >> solution (preferably in 9.1). > > I don't think we want to bump catversion again before release if we > can avoid it. And I don't see this as being a terribly urgent problem > - it's not like this is a new regression, and I can't remember hearing > any complaints about it prior to two days ago. Hm, OK, that makes sense... >> Since "~" doesn't inspire any obvious names for a possible commutator, >> I suggest adding "=~" and "~=". >> >> Is there any support for that proposal? > > I'm OK with adding a commutator but I guess I don't see the point of > adding a synonym for ~ along the way. The existing use of ~ is > consistent with, for example, awk, so it's not like we've dreamed up > something utterly crazy that we now need to fix. I'd suggest we just > come up with some arbitrary variant, like ~~ or <~ or #~ or > !#!%@~bikeshed++!. That, however, I'm not at all happy with. Quite frankly, operator naming is already a bit of a mess, and readability of queries suffers as a result. The geometric types are especially vile offenders in this regard, but the various array-related operators aren't poster children either. I think we should try to work towards more mnemonic operator naming, not add to the mess by defining commutator pairs whose names bear no visual resemblance whatsoever to one each other. I'm not wedded to "=~", it's just the only name I could come up which (a) has a natural commutator (b) gives visual indication of which argument constitutes the text and which thepattern (c) there is precedent for. BTW, there's actually precedent for a commutator of "~", namely "@". Some of the geometric types (polygon, box, circle, point, path) use "~" as a commutator for "@" (which stands for "contains"). But IMHO that mainly proves that the geometric types are vile offenders when it comes to readability... The pair ("@", "~" ) is also the only pair of commutators whose names are totally unrelated to each other. Given a suitable definition of a reverse() function for text [1], the following query select o1.oprleft::regtype || ' ' || o1.oprname || ' ' || o1.oprright::regtype as opr, o2.oprleft::regtype ||' ' || o2.oprname || ' ' || o2.oprright::regtype as com, o1.oprcode as opr_code, o2.oprcode as com_code from pg_operator o1 join pg_operator o2 on o1.oprcom = o2.oid or o2.oprcom = o1.oid where o1.oid < o2.oid and o1.oprname <> reverse(translate(o2.oprname, '<>', '><')) and o1.oprname <> translate(o2.oprname,'<>', '><'); produces opr | com | opr_code | com_code -------------------+-------------------+---------------------+-------------------polygon @ polygon | polygon ~ polygon |poly_contained | poly_containbox @ box | box ~ box | box_contained | box_containcircle @ circle | circle ~ circle | circle_contained | circle_containpoint @ path | path ~ point | on_ppath | path_contain_ptpoint @ polygon | polygon ~ point | pt_contained_poly | poly_contain_ptpoint @ circle | circle~ point | pt_contained_circle | circle_contain_pt (6 rows) best regards, Florian Pflug [1] I used create or replace function reverse(text) returns text as $$ select string_agg(substring($1, i, 1), '') from generate_series(length($1), 1, -1) i $$ language sql immutable;
On Jun13, 2011, at 17:41 , David Fetter wrote: > On Mon, Jun 13, 2011 at 09:01:45AM +0200, Florian Pflug wrote: >> Hm, that's less bulky but more kludgy, I'd say. But wait a minute... >> >> If ANY and ALL are reserved anyway, should it be possible to >> make "(ANY(..) <op> <expr>)" and "(ALL(...) <op> <expr>)" >> work grammar-wise? (Note the enclosing parens) > > This would be a very, very useful feature. :) There, you dared me. So here's a patch :-P So far it supports (ANY(array) op scalar) and (ALL(array) op scalar), but not (ANY/ALL(subselect) op scalar). "ANY/ALL op scalar" is supported independent from whether <op> has a commutator or not. In the latter case, the clause isn't indexable, but thats no different from the case "const op field". Both ANY(...) = ctid and ctid = ANY(...) are correctly executed as TID scans (if the array expression is a pseudo-constant, that is). The patch adds a fields "aryArgIdx" (0 or 1) to ScalarArrayOpExpr which stores the index of the array-valued argument. Thus, for the traditional "scalar op ANY/ALL(array)" variant, aryArgIdx == 1, for "ANY/ALL(array) op scalar" it's zero. I've updates all places that I could find which inspect ScalarArrayOpExprs to take the aryArgIdx into account. And boy, those were quite a few, which explains the size of the patch. Most of the changes are pretty trivial, though. The indexing support works similar to the case "scalar op scalar", i.e. match_clause_to_indexcol() consideres the clause to be index-able if the scalar argument matches an index definition, and relies on fix_indexqual_references() to commute the ScalarArrayOpExprs if the scalar argument is on the right side instead of on the left. I noticed that ANY/ALL is hardly exercised by the regression tests at all, so I added a (pretty exhaustive, I think) test any_all. The test exercises the traditional and the reversed form of ANY/ALL and verify that an index is used if possible. Comments are extremely welcome, especially ones regarding the overall approach taken in this patch. If people consider that to be acceptable, I'd try to add the missing features and add documentation. best regards, Florian Pflug
Вложения
Florian Pflug <fgp@phlo.org> writes: > Comments are extremely welcome, especially ones regarding > the overall approach taken in this patch. If people consider > that to be acceptable, I'd try to add the missing features > and add documentation. Quite honestly, I don't like this one bit and would rather you not pursue the idea. There is no such syntax in the standard, and presumably that's not because the SQL committee never thought of it. They may have some incompatible idea in mind for the future, who knows? But in any case, this won't provide any functionality whatever that we couldn't provide at much less effort and risk, just by providing commutator operators for the few missing cases. (FWIW, I've come around to liking the idea of using =~ and the obvious variants of that for regex operators, mainly because of the Perl precedent.) regards, tom lane
On mån, 2011-06-13 at 10:19 -0400, Andrew Dunstan wrote: > On 06/13/2011 10:07 AM, Robert Haas wrote: > > Some languages use =~ and some use just ~... I was just > > wondering if anyone thought the commutator of =~ was ~=... > > My feeling is it's a bit dangerous. It's too easy to fat-finger the > reverse op, and get something quite unintended. Yes, it looked highly dangerous to me as well.
On tis, 2011-06-14 at 15:38 +0200, Florian Pflug wrote: > BTW, there's actually precedent for a commutator of "~", namely > "@". Some of the geometric types (polygon, box, circle, point, > path) use "~" as a commutator for "@" (which stands for "contains"). I wouldn't have a problem with naming the reverse operator "@".
On ons, 2011-06-15 at 22:19 -0400, Tom Lane wrote: > (FWIW, I've come around to liking the idea of using =~ and the obvious > variants of that for regex operators, mainly because of the Perl > precedent.) Maybe I'm not completely up to date on this, but I observe that Perl itself doesn't appear to have a commutator for =~ .
Peter Eisentraut <peter_e@gmx.net> writes: > On tis, 2011-06-14 at 15:38 +0200, Florian Pflug wrote: >> BTW, there's actually precedent for a commutator of "~", namely >> "@". Some of the geometric types (polygon, box, circle, point, >> path) use "~" as a commutator for "@" (which stands for "contains"). > I wouldn't have a problem with naming the reverse operator "@". We deprecated those names for the geometric operators largely because there wasn't any visual correlation between the commutator pairs. I can't see introducing the same pairing for regex operators if we already decided the geometric case was a bad idea. regards, tom lane
Peter Eisentraut <peter_e@gmx.net> writes: > On ons, 2011-06-15 at 22:19 -0400, Tom Lane wrote: >> (FWIW, I've come around to liking the idea of using =~ and the obvious >> variants of that for regex operators, mainly because of the Perl >> precedent.) > Maybe I'm not completely up to date on this, but I observe that Perl > itself doesn't appear to have a commutator for =~ . Nope, it doesn't. But that doesn't mean we don't need one. Aside from the ANY/ALL business, the index infrastructure is asymmetrical: it will only deal with indexable WHERE clauses that have the index column on the left. So those are two very good reasons to make sure that operators returning boolean all have commutators. (I will refrain for the moment from speculating whether we'll ever have an index type that supports regexp match directly as an indexable operator...) At the moment, this query: select oid::regoperator, oprcode from pg_operator where oprkind = 'b' and oprresult = 'bool'::regtype and oprcom = 0; says we have 83 such operators without commutators. Of these, if I'm counting correctly, 26 are the LIKE and regex operators in question. It looks like at least another twenty do in fact constitute commutator pairs, they're just not documented as such via oprcom. The remaining forty or less are a morass of functions for obsolete types, duplicate names for the same function, etc. I don't think it's unreasonable at all to establish an expectation that all non-legacy binary boolean operators should come in commutator pairs. regards, tom lane
On Jun16, 2011, at 04:19 , Tom Lane wrote: > Florian Pflug <fgp@phlo.org> writes: >> Comments are extremely welcome, especially ones regarding >> the overall approach taken in this patch. If people consider >> that to be acceptable, I'd try to add the missing features >> and add documentation. > > Quite honestly, I don't like this one bit and would rather you not > pursue the idea. There is no such syntax in the standard, and > presumably that's not because the SQL committee never thought of it. > They may have some incompatible idea in mind for the future, who knows? The SQL standard doesn't have CREATE OPERATOR though, so for them the asymmetry of the ANY/ALL constructs don't translate to a missing feature, though. For us, however it does. We might try to doge that by decreeing that boolean operators better have commutators, but that doesn't help for non-core-defined operators. So I'd very much like us to provide some way to get the effect of "ANY/ALL op scalar" without having to resort to UNNEST and BOOL_AND/OR. But I'm absolutely not wedded to the syntax "ANY/ALL op scalar". One other idea I've been kicking around is to generate commutators automatically for all binary operators that return boolean. If no name for the commutator is specified (i.e. if CREATE OPERATOR is called without a value for COMMUTATOR), it's be named "COMMUTATOR <op>", and would thus be useably only via OPERATOR(...). Implementation-wise we'd need to add a flag to FmgrInfo which tells the fmgr to swap the function's arguments, and would need to centralize the translation of operator OIDs to FmgrInfos. best regards, Florian Pflug
On Thu, Jun 16, 2011 at 12:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > We deprecated those names for the geometric operators largely because > there wasn't any visual correlation between the commutator pairs. > I can't see introducing the same pairing for regex operators if we > already decided the geometric case was a bad idea. I'm having trouble avoiding the conclusion that we're trying to shove a round peg into a square hole. The idea that we have to have a commutator for every operator just because we don't handle left and right symmetrically sits poorly with me. I can't really argue with your statement that it's the easiest way to address Florian's gripe, but because it almost surely is. But it still feels like a kludge. The syntax foo = ANY(bar) is really quite a poorly-designed syntax, because the top-level operation is really "ANY", and it has three arguments: foo, =, bar. If the SQL committee had standardized on ANY(foo = $0, bar) or some such thing we wouldn't be having this conversation. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > I'm having trouble avoiding the conclusion that we're trying to shove > a round peg into a square hole. The idea that we have to have a > commutator for every operator just because we don't handle left and > right symmetrically sits poorly with me. I can't really argue with > your statement that it's the easiest way to address Florian's gripe, > but because it almost surely is. But it still feels like a kludge. > The syntax foo = ANY(bar) is really quite a poorly-designed syntax, > because the top-level operation is really "ANY", and it has three > arguments: foo, =, bar. If the SQL committee had standardized on > ANY(foo = $0, bar) or some such thing we wouldn't be having this > conversation. [ shrug... ] Take it up with the committee. The syntax is what it is, and we should select our operators to fit it, not vice versa. regards, tom lane
On Jun16, 2011, at 19:54 , Robert Haas wrote: > On Thu, Jun 16, 2011 at 12:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> We deprecated those names for the geometric operators largely because >> there wasn't any visual correlation between the commutator pairs. >> I can't see introducing the same pairing for regex operators if we >> already decided the geometric case was a bad idea. > > I'm having trouble avoiding the conclusion that we're trying to shove > a round peg into a square hole. The idea that we have to have a > commutator for every operator just because we don't handle left and > right symmetrically sits poorly with me. I can't really argue with > your statement that it's the easiest way to address Florian's gripe, > but because it almost surely is. But it still feels like a kludge. Well, I think there are basically three choices here, kludge or no kludge. (1) We either decree once and for all that binary operations ought to have commutators, modify CREATE TYPE to issue a warning if you create one without, add the missing ones, and add a check for that to opr_sanity (possibly excluding some deprecated operators). or (2) We arrange for commutators of binary operators to be created automatically. or (3) Or we bit the bullet and provide something similar to "ANY/ALL op scalar". We do have the liberty to pick whatever syntax we feel comfortable with, though, since we're out of SQL standard territory anyway. What I *wouldn't* like us to is just a few missing commutators and be done with it. That pretty much guarantees that this issue will pop up again some time in the future. I personally prefer (3), but would also be content with (1), and be ready to provide a patch for that. To be fair, (1) really doesn't seem that kludgy if one takes into account that all indexable operators must have commutators anyway. I haven't checked how viable (2) actually is, but I dare say that it's probably quite a bit of work. Essentially, we'd need a way to automatically swap a function's argument before invoking the function, which I'm not sure that fmgr can cleanly be persuaded to do. Now all that's required is to agree on a way forward ;-) best regards, Florian Pflug
On Thu, Jun 16, 2011 at 2:22 PM, Florian Pflug <fgp@phlo.org> wrote: > On Jun16, 2011, at 19:54 , Robert Haas wrote: >> On Thu, Jun 16, 2011 at 12:50 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> We deprecated those names for the geometric operators largely because >>> there wasn't any visual correlation between the commutator pairs. >>> I can't see introducing the same pairing for regex operators if we >>> already decided the geometric case was a bad idea. >> >> I'm having trouble avoiding the conclusion that we're trying to shove >> a round peg into a square hole. The idea that we have to have a >> commutator for every operator just because we don't handle left and >> right symmetrically sits poorly with me. I can't really argue with >> your statement that it's the easiest way to address Florian's gripe, >> but because it almost surely is. But it still feels like a kludge. > > Well, I think there are basically three choices here, kludge or no > kludge. > > (1) We either decree once and for all that binary operations ought to > have commutators, modify CREATE TYPE to issue a warning if you > create one without, add the missing ones, and add a check for > that to opr_sanity (possibly excluding some deprecated operators). > > or > > (2) We arrange for commutators of binary operators to be created > automatically. > > or > > (3) Or we bit the bullet and provide something similar to > "ANY/ALL op scalar". We do have the liberty to pick whatever syntax we > feel comfortable with, though, since we're out of SQL standard territory > anyway. > > What I *wouldn't* like us to is just a few missing commutators and be > done with it. That pretty much guarantees that this issue will pop up > again some time in the future. > > I personally prefer (3), but would also be content with (1), and be > ready to provide a patch for that. To be fair, (1) really doesn't seem > that kludgy if one takes into account that all indexable operators must > have commutators anyway. > > I haven't checked how viable (2) actually is, but I dare say that it's > probably quite a bit of work. Essentially, we'd need a way to automatically > swap a function's argument before invoking the function, which I'm not > sure that fmgr can cleanly be persuaded to do. > > Now all that's required is to agree on a way forward ;-) Well, Tom seems pretty strongly in favor of #1, or some variant of it, and while I don't find that to be enormously elegant it does have the virtue of being quite a bit less work than any of the other options. I think the chances of that being a complete and permanent solution are less than 50%, but perhaps it's close enough for government work. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Florian Pflug <fgp@phlo.org> writes: > Well, I think there are basically three choices here, kludge or no > kludge. > (1) We either decree once and for all that binary operations ought to > have commutators, modify CREATE TYPE to issue a warning if you > create one without, add the missing ones, and add a check for > that to opr_sanity (possibly excluding some deprecated operators). > or > (2) We arrange for commutators of binary operators to be created > automatically. > or > (3) Or we bit the bullet and provide something similar to > "ANY/ALL op scalar". We do have the liberty to pick whatever syntax we > feel comfortable with, though, since we're out of SQL standard territory > anyway. All three of these are massive overkill. What we need is a general policy that providing commutators is a good idea. We do not need to try to make it 100.00% with an enforcement mechanism. As for #2, what's your plan for automatically selecting a commutator operator name? (Having said that, I *was* thinking of adding an opr_sanity test ... but not expecting that we'd get it to find zero rows.) regards, tom lane
On tor, 2011-06-16 at 00:50 -0400, Tom Lane wrote: > Peter Eisentraut <peter_e@gmx.net> writes: > > On tis, 2011-06-14 at 15:38 +0200, Florian Pflug wrote: > >> BTW, there's actually precedent for a commutator of "~", namely > >> "@". Some of the geometric types (polygon, box, circle, point, > >> path) use "~" as a commutator for "@" (which stands for > "contains"). > > > I wouldn't have a problem with naming the reverse operator "@". > > We deprecated those names for the geometric operators largely because > there wasn't any visual correlation between the commutator pairs. > I can't see introducing the same pairing for regex operators if we > already decided the geometric case was a bad idea. I actually reported the exact issue that Florian reported a while ago and we had this same sort of discussion. I think I'm running with a custom operator named ~~~ somewhere in production. So yay for adding a commutator in any case. I don't really agree that visual correlation needs to trump everything. If say foo =~ bar and foo ~= bar were to produce completely different results, this would introduce bugs all over the place. Most programming languages would get away with this kind of issue because the pattern has a different data type than the string to be matched against, so mistakes will be caught. Looking at the list of geometric operators, I can't help but feel that the silliness of operator naming is reaching its limits. We can probably come up with a few more for this particular problem, but long term we might want to think of other solutions, such as attaching the optimization information to functions instead, and/or inventing an infix function call syntax like in Haskell.
Peter Eisentraut <peter_e@gmx.net> writes: > I don't really agree that visual correlation needs to trump everything. > If say > foo =~ bar > and > foo ~= bar > were to produce completely different results, this would introduce bugs > all over the place. Huh? That's about like arguing that standard mathematical notation is broken because a < b and a > b don't produce the same result. regards, tom lane
On Jun16, 2011, at 21:49 , Tom Lane wrote: > All three of these are massive overkill. What we need is a general > policy that providing commutators is a good idea. We do not need to try > to make it 100.00% with an enforcement mechanism. What parts of (1) do you think are overkill exactly, then? > As for #2, what's > your plan for automatically selecting a commutator operator name? I figured we'd name it "COMMUTATOR <op>" or something along this line. That'd mean it'd only be useable with the OPERATOR() syntax, but that's way better than nothing. Or we could even make the COMMUTATOR argument mandatory for binary operators returning boolean. After all, if a commutator doesn't require a second function, than I fail to see why you'd ever want to define a predicate without a commutator. In any case, yeah, (2) is pretty hand-weavy. I included so that we'd have all the options on the table, not because I think it's particularly elegant, easy, or interesting to implement (actually, it's probably none of these). > (Having said that, I *was* thinking of adding an opr_sanity test ... but > not expecting that we'd get it to find zero rows.) Well, as long as there is some regression test failure for missing commutators of newly added binary boolean operators, I'm content. best regards, Florian Pflug
Excerpts from Tom Lane's message of jue jun 16 17:33:17 -0400 2011: > Peter Eisentraut <peter_e@gmx.net> writes: > > I don't really agree that visual correlation needs to trump everything. > > If say > > foo =~ bar > > and > > foo ~= bar > > were to produce completely different results, this would introduce bugs > > all over the place. > > Huh? That's about like arguing that standard mathematical notation is > broken because a < b and a > b don't produce the same result. The difference is that the mnemonic for > and < is very simple and in widespread knowledge; not something I would say for =~'s rule of "the ~ is on the side of the regexp". I know I used to get it wrong in Perl (i.e. I wrote ~= occasionally). To make matters worse, our delimiters for regexes are the same as for strings, the single quote. So you get foo =~ 'bar' /* foo is the text column, bar is the regex */ 'bar' =~ foo /* no complaint but it's wrong */ 'bar' ~= foo /* okay */ 'foo' ~= bar /* no complaint but it's wrong */ How do I tell which is the regex here? If we used, say, /, that would be a different matter: foo =~ /bar/ /bar/ ~= foo /* both okay */ If we had that and you get it wrong, the parser would immediately barf at you if you got it wrong: /bar/ =~ foo /* wrong: LHS wanted text, got regex */ foo ~= /bar/ /* wrong: LHS wanted regex, got text */ (Note: I'm not suggesting we use / as delimiter. This is just an example.) -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Jun17, 2011, at 03:42 , Alvaro Herrera wrote: > To make matters worse, our delimiters for regexes are the same as for > strings, the single quote. So you get > > foo =~ 'bar' /* foo is the text column, bar is the regex */ > 'bar' =~ foo /* no complaint but it's wrong */ > > 'bar' ~= foo /* okay */ > 'foo' ~= bar /* no complaint but it's wrong */ > > How do I tell which is the regex here? If we used, say, /, that would > be a different matter: How is this different from the situation today where the operator is just "~"? best regards, Florian Pflug
Excerpts from Florian Pflug's message of vie jun 17 04:46:32 -0400 2011: > On Jun17, 2011, at 03:42 , Alvaro Herrera wrote: > > To make matters worse, our delimiters for regexes are the same as for > > strings, the single quote. So you get > > > > foo =~ 'bar' /* foo is the text column, bar is the regex */ > > 'bar' =~ foo /* no complaint but it's wrong */ > > > > 'bar' ~= foo /* okay */ > > 'foo' ~= bar /* no complaint but it's wrong */ > > > > How do I tell which is the regex here? If we used, say, /, that would > > be a different matter: > > How is this different from the situation today where the operator > is just "~"? Err, we don't have commutators today? -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Jun17, 2011, at 15:36 , Alvaro Herrera wrote: > Excerpts from Florian Pflug's message of vie jun 17 04:46:32 -0400 2011: >> On Jun17, 2011, at 03:42 , Alvaro Herrera wrote: >>> To make matters worse, our delimiters for regexes are the same as for >>> strings, the single quote. So you get >>> >>> foo =~ 'bar' /* foo is the text column, bar is the regex */ >>> 'bar' =~ foo /* no complaint but it's wrong */ >>> >>> 'bar' ~= foo /* okay */ >>> 'foo' ~= bar /* no complaint but it's wrong */ >>> >>> How do I tell which is the regex here? If we used, say, /, that would >>> be a different matter: >> >> How is this different from the situation today where the operator >> is just "~"? > > Err, we don't have commutators today? So? How does that reduce that risk of somebody writing "pattern ~ text" instead of "text ~ pattern"? Modifying your quote from above -------- foo ~ 'bar' /* foo is the text column, bar is the regex */ 'bar' ~ foo /* no complaint but it's wrong */ How do I tell which is the regex here? -------- How is that worse than the situation with "=~" and "~="? "=~" and "~=" at least don't *look* symmetric when they really are not, which is the heart of the complaint, and also what makes defining a sensible commutator impossible. Also, do you have a better suggestion for how we can fix my original gripe? Adding support for 'ANY/ALL op scalar" was shot down by Tom, so it looks like we need a commutator for "~". "@" is severely disliked by Tom, on the grounds that it's already been deprecated in other places. "=~" is argued against by you and Robert Haas (I think). We're running out of options here... best regards, Florian Pflug
Excerpts from Florian Pflug's message of vie jun 17 10:03:56 -0400 2011: > On Jun17, 2011, at 15:36 , Alvaro Herrera wrote: > > Excerpts from Florian Pflug's message of vie jun 17 04:46:32 -0400 2011: > >> On Jun17, 2011, at 03:42 , Alvaro Herrera wrote: > >>> To make matters worse, our delimiters for regexes are the same as for > >>> strings, the single quote. So you get > >>> > >>> foo =~ 'bar' /* foo is the text column, bar is the regex */ > >>> 'bar' =~ foo /* no complaint but it's wrong */ > >>> > >>> 'bar' ~= foo /* okay */ > >>> 'foo' ~= bar /* no complaint but it's wrong */ > >>> > >>> How do I tell which is the regex here? If we used, say, /, that would > >>> be a different matter: > >> > >> How is this different from the situation today where the operator > >> is just "~"? > > > > Err, we don't have commutators today? > > > So? How does that reduce that risk of somebody writing "pattern ~ text" > instead of "text ~ pattern"? Modifying your quote from above > -------- > foo ~ 'bar' /* foo is the text column, bar is the regex */ > 'bar' ~ foo /* no complaint but it's wrong */ > > How do I tell which is the regex here? > -------- The regex is always to the right of the operator. > How is that worse than the situation with "=~" and "~="? With =~ it is to the right, with ~= it is to the left. I have sometimes needed to look up which is which on ~ and ~~. I assume that whichever way we go here, we're still going to have to look up operator definitions in docs or online help. This kind of help doesn't, err, help all that much: alvherre=# \doS ~ Listado de operadores Esquema | Nombre | Tipo arg izq | Tipo arg der | Tiporesultado | Descripción ------------+--------+--------------+--------------+----------------+-------------------------------------------- ...pg_catalog | ~ | text | text | boolean | matches regular expression, case-sensitive Note that there's no way to tell which is the regex here. It'd be a lot better if the description was explicit about it. (Or, alternatively, use a different data type for regexes than plain text ... but that has been in the Todo list for years ...) > "=~" and "~=" at least don't *look* symmetric when they really are > not, which is the heart of the complaint, and also what makes defining > a sensible commutator impossible. > Also, do you have a better suggestion for how we can fix my original > gripe? Adding support for 'ANY/ALL op scalar" was shot down by Tom, > so it looks like we need a commutator for "~". "@" is severely disliked > by Tom, on the grounds that it's already been deprecated in other places. > "=~" is argued against by you and Robert Haas (I think). We're running > out of options here... Have ~ keep its existing semantics, use ~= for the commutator? There are a lot more chars allowed in operator names anyway, it doesn't seem to me like we need to limit ourselves to ~, = and @. I *do* like the idea of having commutate-ability for ANY/ALL, having needed it a couple of times in the past. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On 06/17/2011 10:20 AM, Alvaro Herrera wrote: > alvherre=# \doS ~ > > Listado de operadores > Esquema | Nombre | Tipo arg izq | Tipo arg der | Tipo resultado | Descripción > ------------+--------+--------------+--------------+----------------+-------------------------------------------- > ... > pg_catalog | ~ | text | text | boolean | matches regular expression, case-sensitive > > Note that there's no way to tell which is the regex here. It'd be a lot > better if the description was explicit about it. (Or, alternatively, > use a different data type for regexes than plain text ... but that has > been in the Todo list for years ...) +1 for improving the description. > > Have ~ keep its existing semantics, use ~= for the commutator? There > are a lot more chars allowed in operator names anyway, it doesn't seem > to me like we need to limit ourselves to ~, = and @. Yeah, maybe something like ~< for the commutator. (I know, we're bikeshedding somewhat.) > I *do* like the idea of having commutate-ability for ANY/ALL, having > needed it a couple of times in the past. > Indeed. me too. cheers andrew
On Jun17, 2011, at 16:20 , Alvaro Herrera wrote: > Excerpts from Florian Pflug's message of vie jun 17 10:03:56 -0400 2011: >> So? How does that reduce that risk of somebody writing "pattern ~ text" >> instead of "text ~ pattern"? Modifying your quote from above >> -------- >> foo ~ 'bar' /* foo is the text column, bar is the regex */ >> 'bar' ~ foo /* no complaint but it's wrong */ >> >> How do I tell which is the regex here? >> -------- > > The regex is always to the right of the operator. Which is something you have to remember... It's not in any way deducible from "foo ~ bar" alone. >> How is that worse than the situation with "=~" and "~="? > > With =~ it is to the right, with ~= it is to the left. It's always where the tilde is. Yeah, you have to remember that. Just as today you have to remember that the pattern goes on the right side. > I have sometimes needed to look up which is which on ~ and ~~. > I assume that whichever way we go here, we're still going to have to > look up operator definitions in docs or online help. This kind of help > doesn't, err, help all that much: > > alvherre=# \doS ~ > > Listado de operadores > Esquema | Nombre | Tipo arg izq | Tipo arg der | Tipo resultado | Descripción > ------------+--------+--------------+--------------+----------------+-------------------------------------------- > ... > pg_catalog | ~ | text | text | boolean | matches regular expression, case-sensitive > > Note that there's no way to tell which is the regex here. It'd be a lot > better if the description was explicit about it. I'm all for it, let's change the description then! Shall I submit a patch? > (Or, alternatively, > use a different data type for regexes than plain text ... but that has > been in the Todo list for years ...) I actually like that idea. Since we probably don't want a type for every kind of pattern we support (like, similar to, regexp), such a type wouldn't be much more than a synonym for text though. I personally don't have a problem with that, but I somehow feel there's gonna be quite some pushback... >> Also, do you have a better suggestion for how we can fix my original >> gripe? Adding support for 'ANY/ALL op scalar" was shot down by Tom, >> so it looks like we need a commutator for "~". "@" is severely disliked >> by Tom, on the grounds that it's already been deprecated in other places. >> "=~" is argued against by you and Robert Haas (I think). We're running >> out of options here... > > Have ~ keep its existing semantics, use ~= for the commutator? So how does that make it any easier to tell what foo ~ bar and foo ~= bar mean? With that, neither the "pattern is always on the right" nor the "pattern goes where the tilde is" mnemonic works. Also, do we really want to end up with a large number of commutator pairs with totally unrelated names? I fear that this *will* seriously harm readability of SQL statements, and we'll regret it badly. best regards, Florian Pflug
On Fri, Jun 17, 2011 at 10:20:04AM -0400, Alvaro Herrera wrote: > Excerpts from Florian Pflug's message of vie jun 17 10:03:56 -0400 2011: > > > How is that worse than the situation with "=~" and "~="? > > With =~ it is to the right, with ~= it is to the left. To throw my user opinion into this ring (as a long time user of regexes in many different systems) I've always taken the ~ to be short hand for the 'approximately' notation (a squiggly equals) which has good semantic match in my mind: a regex match is sort of a fuzzy equality. With that model, the suggested pair is fairly mnemonic - the 'fuzzy' part i(the pattern) is next to the squiggles, the 'concrete' part goes by the equals. > I have sometimes needed to look up which is which on ~ and ~~. which has no such directionality, so yeah, no hinting there. Ross -- Ross Reedstrom, Ph.D. reedstrm@rice.edu Systems Engineer & Admin, Research Scientist phone: 713-348-6166 Connexions http://cnx.org fax: 713-348-3665 Rice University MS-375, Houston, TX 77005 GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE
On Jun17, 2011, at 17:15 , Ross J. Reedstrom wrote: > On Fri, Jun 17, 2011 at 10:20:04AM -0400, Alvaro Herrera wrote: >> Excerpts from Florian Pflug's message of vie jun 17 10:03:56 -0400 2011: >> >>> How is that worse than the situation with "=~" and "~="? >> >> With =~ it is to the right, with ~= it is to the left. > > To throw my user opinion into this ring (as a long time user of regexes > in many different systems) I've always taken the ~ to be short hand for > the 'approximately' notation (a squiggly equals) which has good semantic > match in my mind: a regex match is sort of a fuzzy equality. With that > model, the suggested pair is fairly mnemonic - the 'fuzzy' part i(the > pattern) is next to the squiggles, the 'concrete' part goes by the > equals. Hey, that's my mnemonic device! ;-) best regards, Florian Pflug
Excerpts from Florian Pflug's message of vie jun 17 10:49:46 -0400 2011: > On Jun17, 2011, at 16:20 , Alvaro Herrera wrote: > > Excerpts from Florian Pflug's message of vie jun 17 10:03:56 -0400 2011: > >> So? How does that reduce that risk of somebody writing "pattern ~ text" > >> instead of "text ~ pattern"? Modifying your quote from above > >> -------- > >> foo ~ 'bar' /* foo is the text column, bar is the regex */ > >> 'bar' ~ foo /* no complaint but it's wrong */ > >> > >> How do I tell which is the regex here? > >> -------- > > > > The regex is always to the right of the operator. > > Which is something you have to remember... It's not in any > way deducible from "foo ~ bar" alone. Maybe, but the mnemonic rule seems quite a bit easier (to me anyway). In my head I think of ~ as "matches", so "text matches regex", whereas "regex matches text" doesn't make as much sense. (Hmm now that I see it, maybe in english this is not so clear, but in spanish the difference is pretty obvious). > >> How is that worse than the situation with "=~" and "~="? > > > > With =~ it is to the right, with ~= it is to the left. > > It's always where the tilde is. Yeah, you have to remember that. > Just as today you have to remember that the pattern goes on the > right side. Well, the mnemonic would be that ~ is still "text matches regex", while ~= is "the weird operator that goes the other way around", so it's still pretty clear. > > I have sometimes needed to look up which is which on ~ and ~~. > > I assume that whichever way we go here, we're still going to have to > > look up operator definitions in docs or online help. This kind of help > > doesn't, err, help all that much: > > > > alvherre=# \doS ~ > > > > Listado de operadores > > Esquema | Nombre | Tipo arg izq | Tipo arg der | Tipo resultado | Descripción > > ------------+--------+--------------+--------------+----------------+-------------------------------------------- > > ... > > pg_catalog | ~ | text | text | boolean | matches regular expression, case-sensitive > > > > Note that there's no way to tell which is the regex here. It'd be a lot > > better if the description was explicit about it. > > I'm all for it, let's change the description then! Shall I submit a patch? Yes, please. > > (Or, alternatively, > > use a different data type for regexes than plain text ... but that has > > been in the Todo list for years ...) > > I actually like that idea. Since we probably don't want a type for every > kind of pattern we support (like, similar to, regexp), such a type wouldn't > be much more than a synonym for text though. I personally don't have a > problem with that, but I somehow feel there's gonna be quite some pushback... Hmm, why? Maybe that's something we can discuss. > Also, do we really want to end up with a large number of commutator > pairs with totally unrelated names? I fear that this *will* seriously > harm readability of SQL statements, and we'll regret it badly. Hmm. I guess this wouldn't be much of a problem if you could use ANY/ALL with a function instead of an operator, c.f. map(). -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Fri, Jun 17, 2011 at 11:46 AM, Alvaro Herrera <alvherre@commandprompt.com> wrote: > I guess this wouldn't be much of a problem if you could use ANY/ALL with > a function instead of an operator, c.f. map(). Yeah. Or really what you want is a lambda-expression, rather than a predefined function. fold(bool_and, map { val ~ $0 } array) I suspect that's darn hard to make work though. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Fri, Jun 17, 2011 at 05:21:10PM +0200, Florian Pflug wrote: > On Jun17, 2011, at 17:15 , Ross J. Reedstrom wrote: > > On Fri, Jun 17, 2011 at 10:20:04AM -0400, Alvaro Herrera wrote: > >> Excerpts from Florian Pflug's message of vie jun 17 10:03:56 -0400 2011: > >> > >>> How is that worse than the situation with "=~" and "~="? > >> > >> With =~ it is to the right, with ~= it is to the left. > > > > To throw my user opinion into this ring (as a long time user of regexes > > in many different systems) I've always taken the ~ to be short hand for > > the 'approximately' notation (a squiggly equals) which has good semantic > > match in my mind: a regex match is sort of a fuzzy equality. With that > > model, the suggested pair is fairly mnemonic - the 'fuzzy' part i(the > > pattern) is next to the squiggles, the 'concrete' part goes by the > > equals. > > Hey, that's my mnemonic device! ;-) > Ah, good, so since this is almost mathematics, and we have two instances, that's a proof then. :-) Ross -- Ross Reedstrom, Ph.D. reedstrm@rice.edu Systems Engineer & Admin, Research Scientist phone: 713-348-6166 Connexions http://cnx.org fax: 713-348-3665 Rice University MS-375, Houston, TX 77005 GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E F888 D3AE 810E 88F0 BEDE
On Jun17, 2011, at 18:00 , Robert Haas wrote: > On Fri, Jun 17, 2011 at 11:46 AM, Alvaro Herrera > <alvherre@commandprompt.com> wrote: >> I guess this wouldn't be much of a problem if you could use ANY/ALL with >> a function instead of an operator, c.f. map(). > > Yeah. Or really what you want is a lambda-expression, rather than a > predefined function. > > fold(bool_and, map { val ~ $0 } array) Yeah, to bad we can't just write SELECT BOOL_AND(val ~ e) FROM UNNEST(array) Hey...wait a minute... ;-) (I guess you actually meant fold(bool_and, map { val ~ $0 } array) which the equivalent sub-select SELECT BOOL_AND(e ~ val) FROM UNNEST(array)) Still, you can't put that into a CHECK constraint (because it counts as sub-select) and it's considerable longer and harder to read then val = ANY(array) best regards, Florian Pflug
On Jun17, 2011, at 17:46 , Alvaro Herrera wrote: > Excerpts from Florian Pflug's message of vie jun 17 10:49:46 -0400 2011: > Maybe, but the mnemonic rule seems quite a bit easier (to me anyway). > In my head I think of ~ as "matches", so "text matches regex", whereas > "regex matches text" doesn't make as much sense. (Hmm now that I see > it, maybe in english this is not so clear, but in spanish the difference > is pretty obvious). I can't really argue with that, only state for that record that it's different for me. I think of "~" as "similar" or "approximately equal", and hence intuitively expect it to be symmetric. Whether or not "matches" technically implies some direction or not I cannot say as I'm not an english native speaker myself. But if I had to guess, I'd say it doesn't. >>>> How is that worse than the situation with "=~" and "~="? >>> >>> With =~ it is to the right, with ~= it is to the left. >> >> It's always where the tilde is. Yeah, you have to remember that. >> Just as today you have to remember that the pattern goes on the >> right side. > > Well, the mnemonic would be that ~ is still "text matches regex", while > ~= is "the weird operator that goes the other way around", so it's still > pretty clear. Again, that depends on a person's background. For me it'd be "~= is the regexp matching operator" and "~ is for some strange reasons its commutator". >>> 'm all for it, let's change the description then! Shall I submit a patch? > > Yes, please. Will do, but after we've reached an overall agreement about the fate or "~" and friends. >>> (Or, alternatively, >>> use a different data type for regexes than plain text ... but that has >>> been in the Todo list for years ...) >> >> I actually like that idea. Since we probably don't want a type for every >> kind of pattern we support (like, similar to, regexp), such a type wouldn't >> be much more than a synonym for text though. I personally don't have a >> problem with that, but I somehow feel there's gonna be quite some pushback... > > Hmm, why? Maybe that's something we can discuss. Ok, I'll start a new thread for this. best regards, Florian Pflug
On Thu, Jun 16, 2011 at 6:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > (I will refrain for > the moment from speculating whether we'll ever have an index type that > supports regexp match directly as an indexable operator...) Fwiw I looked into this at one point and have some ideas if anyone is keen to try it. -- greg
On Sun, Jun 19, 2011 at 02:48:58PM +0100, Greg Stark wrote: > On Thu, Jun 16, 2011 at 6:11 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > (I will refrain for the moment from speculating whether we'll ever > > have an index type that supports regexp match directly as an > > indexable operator...) > > Fwiw I looked into this at one point and have some ideas if anyone > is keen to try it. Please post them :) Cheers, David. -- David Fetter <david@fetter.org> http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fetter@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
On Fri, Jun 17, 2011 at 3:49 PM, Florian Pflug <fgp@phlo.org> wrote: >> The regex is always to the right of the operator. > > Which is something you have to remember... It's not in any > way deducible from "foo ~ bar" alone. Except that it's always been this way, going back to perl4 or tcl or their predecessors. The regexp binding operator always has the regexp on the right. >>> How is that worse than the situation with "=~" and "~="? >> >> With =~ it is to the right, with ~= it is to the left. > > It's always where the tilde is. Yeah, you have to remember that. And when you get it wrong it will fail silently. No errors, just wrong results. While I've never accidentally written /foo/ =~ $_ in perl I have *frequently* forgotten whether the operator is ~= or =~. Actually I forget that pretty much every time I start writing some perl. I just put whichever comes first and if I get an error I reverse it. I can see the temptation to make it symmetric but it's going to cause an awful lot of confusion. Perhaps we could name the operators ~~= and =~~ and then have a =~ short-cut for compatibility? (and ~ too I guess?) -- greg
On Jun20, 2011, at 03:16 , Greg Stark wrote: > On Fri, Jun 17, 2011 at 3:49 PM, Florian Pflug <fgp@phlo.org> wrote: >>> The regex is always to the right of the operator. >> >> Which is something you have to remember... It's not in any >> way deducible from "foo ~ bar" alone. > > Except that it's always been this way, going back to perl4 or tcl or > their predecessors. The regexp binding operator always has the regexp > on the right. Yeah. The strength of that argument really depends on one's prior exposure to these languages, though... >>>> How is that worse than the situation with "=~" and "~="? >>> >>> With =~ it is to the right, with ~= it is to the left. >> >> It's always where the tilde is. Yeah, you have to remember that. > > And when you get it wrong it will fail silently. No errors, just wrong results. Yeah, but this is hardly the only case where you'll get unintended results if you mix up operator names. Now, one might argue, I guess, that mixing up "=~" and "~=" or more likely than mixing up, say, "~" and "~~". But ultimately, whether or not that is highly dependent on one's personal background, so we're unlikely to ever reach agreement on that... > While I've never accidentally written /foo/ =~ $_ in perl I have > *frequently* forgotten whether the operator is ~= or =~. Actually I > forget that pretty much every time I start writing some perl. I just > put whichever comes first and if I get an error I reverse it. Yeah, the nice thing in perl (and ruby also, which is *my* background) is that regexp's and strings are distinguished by the type system, and also by the parser. The latter (i.e. regexp literals enclosed by /../) probably isn't desirably for postgres, but the former definitely is (i.e. distinguishing regexp's and text in the type system). Please see the thread "Adding a distinct pattern type to resolve the ~ commutator stalemate" for the details of the proposal. > I can see the temptation to make it symmetric but it's going to cause > an awful lot of confusion. I do believe that by adding a distinct type we can actually *reduce* confusion. It makes "text ~ pattern" readable even for people who don't intuitively know that the pattern always goes on the right. best regards, Florian Pflug
Excerpts from Florian Pflug's message of lun jun 20 06:55:42 -0400 2011: > The latter (i.e. regexp literals enclosed by /../) probably isn't > desirably for postgres, but the former definitely is (i.e. distinguishing > regexp's and text in the type system). Please see the thread > "Adding a distinct pattern type to resolve the ~ commutator stalemate" > for the details of the proposal. 'your text' ~ regexp 'your.*foo' column ~ regexp 'your.*foo' So you could do regexp 'foo.*bar' ~ 'your text' and it's immediately clear what's up. The question is what to do wrt implicit casting of text to regexp. If we don't, there's a backwards compatibility hit. -- Álvaro Herrera <alvherre@commandprompt.com> The PostgreSQL Company - Command Prompt, Inc. PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Jun20, 2011, at 19:22 , Alvaro Herrera wrote: > Excerpts from Florian Pflug's message of lun jun 20 06:55:42 -0400 2011: >> The latter (i.e. regexp literals enclosed by /../) probably isn't >> desirably for postgres, but the former definitely is (i.e. distinguishing >> regexp's and text in the type system). Please see the thread >> "Adding a distinct pattern type to resolve the ~ commutator stalemate" >> for the details of the proposal. > > 'your text' ~ regexp 'your.*foo' > column ~ regexp 'your.*foo' > > So you could do > > regexp 'foo.*bar' ~ 'your text' > > and it's immediately clear what's up. > > The question is what to do wrt implicit casting of text to regexp. > If we don't, there's a backwards compatibility hit. No, we certainly musn't allow text to be implicitly converted to regexp, for otherwise e.g. "varchar ~ varchar" becomes ambiguous. I posted a primitive prototype for a pattern type on said thread, which seems to do everything we require without causing compatibility problems. best regards, Florian Pflug