Обсуждение: Modest proposal for making bpchar less inconsistent

Поиск
Список
Период
Сортировка

Modest proposal for making bpchar less inconsistent

От
Tom Lane
Дата:
It struck me that the real reason that we keep getting gripes about
the weird behavior of CHAR(n) is that these functions (and, hence,
their corresponding operators) fail to obey the "trailing blanks
aren't significant" rule:

               regprocedure                |        prosrc        
-------------------------------------------+----------------------
 bpcharlike(character,text)                | textlike
 bpcharnlike(character,text)               | textnlike
 bpcharicregexeq(character,text)           | texticregexeq
 bpcharicregexne(character,text)           | texticregexne
 bpcharregexeq(character,text)             | textregexeq
 bpcharregexne(character,text)             | textregexne
 bpchariclike(character,text)              | texticlike
 bpcharicnlike(character,text)             | texticnlike

They're just relying on binary compatibility of bpchar to text ...
but of course textlike etc. think trailing blanks are significant.

Every other primitive operation we have for bpchar correctly ignores
the trailing spaces.

We could fix this, and save some catalog space too, if we simply
deleted these functions/operators and let such calls devolve
into implicit casts to text.

This might annoy people who are actually writing trailing spaces
in their patterns to make such cases work.  But I think there
are probably not too many such people, and having real consistency
here is worth something.

            regards, tom lane



Re: Modest proposal for making bpchar less inconsistent

От
Pavel Stehule
Дата:


Dne pá 13. 9. 2019 16:43 uživatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
It struck me that the real reason that we keep getting gripes about
the weird behavior of CHAR(n) is that these functions (and, hence,
their corresponding operators) fail to obey the "trailing blanks
aren't significant" rule:

               regprocedure                |        prosrc       
-------------------------------------------+----------------------
 bpcharlike(character,text)                | textlike
 bpcharnlike(character,text)               | textnlike
 bpcharicregexeq(character,text)           | texticregexeq
 bpcharicregexne(character,text)           | texticregexne
 bpcharregexeq(character,text)             | textregexeq
 bpcharregexne(character,text)             | textregexne
 bpchariclike(character,text)              | texticlike
 bpcharicnlike(character,text)             | texticnlike

They're just relying on binary compatibility of bpchar to text ...
but of course textlike etc. think trailing blanks are significant.

Every other primitive operation we have for bpchar correctly ignores
the trailing spaces.

We could fix this, and save some catalog space too, if we simply
deleted these functions/operators and let such calls devolve
into implicit casts to text.

This might annoy people who are actually writing trailing spaces
in their patterns to make such cases work.  But I think there
are probably not too many such people, and having real consistency
here is worth something.

has sense

Pavel

                        regards, tom lane


Re: Modest proposal for making bpchar less inconsistent

От
Bruce Momjian
Дата:
On Fri, Sep 13, 2019 at 09:50:10PM +0200, Pavel Stehule wrote:
> 
> 
> Dne pá 13. 9. 2019 16:43 uživatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
> 
>     It struck me that the real reason that we keep getting gripes about
>     the weird behavior of CHAR(n) is that these functions (and, hence,
>     their corresponding operators) fail to obey the "trailing blanks
>     aren't significant" rule:
> 
>                    regprocedure                |        prosrc       
>     -------------------------------------------+----------------------
>      bpcharlike(character,text)                | textlike
>      bpcharnlike(character,text)               | textnlike
>      bpcharicregexeq(character,text)           | texticregexeq
>      bpcharicregexne(character,text)           | texticregexne
>      bpcharregexeq(character,text)             | textregexeq
>      bpcharregexne(character,text)             | textregexne
>      bpchariclike(character,text)              | texticlike
>      bpcharicnlike(character,text)             | texticnlike
> 
>     They're just relying on binary compatibility of bpchar to text ...
>     but of course textlike etc. think trailing blanks are significant.
> 
>     Every other primitive operation we have for bpchar correctly ignores
>     the trailing spaces.
> 
>     We could fix this, and save some catalog space too, if we simply
>     deleted these functions/operators and let such calls devolve
>     into implicit casts to text.
> 
>     This might annoy people who are actually writing trailing spaces
>     in their patterns to make such cases work.  But I think there
>     are probably not too many such people, and having real consistency
>     here is worth something.
> 
> 
> has sense

Yes, I think this is a great idea!

-- 
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +



Re: Modest proposal for making bpchar less inconsistent

От
Kyotaro Horiguchi
Дата:
At Sat, 28 Sep 2019 08:22:22 -0400, Bruce Momjian <bruce@momjian.us> wrote in <20190928122222.GA26853@momjian.us>
> On Fri, Sep 13, 2019 at 09:50:10PM +0200, Pavel Stehule wrote:
> > 
> > 
> > Dne pá 13. 9. 2019 16:43 uživatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
> > 
> >     It struck me that the real reason that we keep getting gripes about
> >     the weird behavior of CHAR(n) is that these functions (and, hence,
> >     their corresponding operators) fail to obey the "trailing blanks
> >     aren't significant" rule:
> > 
> >                    regprocedure                |        prosrc       
> >     -------------------------------------------+----------------------
> >      bpcharlike(character,text)                | textlike
> >      bpcharnlike(character,text)               | textnlike
> >      bpcharicregexeq(character,text)           | texticregexeq
> >      bpcharicregexne(character,text)           | texticregexne
> >      bpcharregexeq(character,text)             | textregexeq
> >      bpcharregexne(character,text)             | textregexne
> >      bpchariclike(character,text)              | texticlike
> >      bpcharicnlike(character,text)             | texticnlike
> > 
> >     They're just relying on binary compatibility of bpchar to text ...
> >     but of course textlike etc. think trailing blanks are significant.
> > 
> >     Every other primitive operation we have for bpchar correctly ignores
> >     the trailing spaces.
> > 
> >     We could fix this, and save some catalog space too, if we simply
> >     deleted these functions/operators and let such calls devolve
> >     into implicit casts to text.
> > 
> >     This might annoy people who are actually writing trailing spaces
> >     in their patterns to make such cases work.  But I think there
> >     are probably not too many such people, and having real consistency
> >     here is worth something.
> > 
> > 
> > has sense
> 
> Yes, I think this is a great idea!

I totally agree.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Modest proposal for making bpchar less inconsistent

От
Tom Lane
Дата:
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes:
> At Sat, 28 Sep 2019 08:22:22 -0400, Bruce Momjian <bruce@momjian.us> wrote in <20190928122222.GA26853@momjian.us>
>> On Fri, Sep 13, 2019 at 09:50:10PM +0200, Pavel Stehule wrote:
>>> Dne pá 13. 9. 2019 16:43 uživatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
>>>> It struck me that the real reason that we keep getting gripes about
>>>> the weird behavior of CHAR(n) is that these functions (and, hence,
>>>> their corresponding operators) fail to obey the "trailing blanks
>>>> aren't significant" rule:
>>>> ...
>>>> We could fix this, and save some catalog space too, if we simply
>>>> deleted these functions/operators and let such calls devolve
>>>> into implicit casts to text.

>> Yes, I think this is a great idea!

> I totally agree.

I experimented with this, as per the attached simple patch.  If you just
apply the catalog deletions, no regression test results change, which
says more about our lack of test coverage in this area than anything else.
So I added a few simple test cases too.

However, playing with this more, I'm not sure it's the direction we want
to go.  I realized that the BPCHAR-related code paths in like_support.c
are dead code with this patch, because it's no longer possible to match
a LIKE/regex operator to a bpchar column.  For example, in existing
releases you can do

regression=# create table t(f1 char(20) unique);
CREATE TABLE
regression=# explain select * from t where f1 like 'abcdef';
                               QUERY PLAN
------------------------------------------------------------------------
 Index Only Scan using t_f1_key on t  (cost=0.15..8.17 rows=1 width=24)
   Index Cond: (f1 = 'abcdef'::bpchar)
   Filter: (f1 ~~ 'abcdef'::text)
(3 rows)

regression=# explain select * from t where f1 like 'abcdef%';
                                 QUERY PLAN
----------------------------------------------------------------------------
 Bitmap Heap Scan on t  (cost=4.23..14.39 rows=8 width=24)
   Filter: (f1 ~~ 'abcdef%'::text)
   ->  Bitmap Index Scan on t_f1_key  (cost=0.00..4.23 rows=8 width=0)
         Index Cond: ((f1 >= 'abcdef'::bpchar) AND (f1 < 'abcdeg'::bpchar))
(4 rows)

But with this patch, you just get dumb seqscan plans because the
expression trees now look like "f1::text ~~ constant" which doesn't
match to an index on the bare column f1.

If we wanted to preserve these index optimizations while still
redefining the pattern match operators as ignoring trailing whitespace,
we could keep the operators/functions but change them to point at new
C functions that strip trailing blanks before invoking the pattern
match machinery.  Some thought would need to be given as well to whether
like_fixed_prefix et al need to behave differently to agree with this
behavior.  (Offhand it seems like they might need to strip trailing
blanks from what would otherwise be the fixed prefix, but I'm not
quite sure.)

That would be much more work than this patch of course (though still
not an enormous amount), and I'm not quite sure if it's worth the
trouble.  Is this a case that anyone is using in practice?

            regards, tom lane

diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat
index fa7dc96..1b430ef 100644
--- a/src/include/catalog/pg_operator.dat
+++ b/src/include/catalog/pg_operator.dat
@@ -1166,15 +1166,6 @@
   oprnegate => '<>(bpchar,bpchar)', oprcode => 'bpchareq', oprrest => 'eqsel',
   oprjoin => 'eqjoinsel' },

-{ oid => '1055', oid_symbol => 'OID_BPCHAR_REGEXEQ_OP',
-  descr => 'matches regular expression, case-sensitive',
-  oprname => '~', oprleft => 'bpchar', oprright => 'text', oprresult => 'bool',
-  oprnegate => '!~(bpchar,text)', oprcode => 'bpcharregexeq',
-  oprrest => 'regexeqsel', oprjoin => 'regexeqjoinsel' },
-{ oid => '1056', descr => 'does not match regular expression, case-sensitive',
-  oprname => '!~', oprleft => 'bpchar', oprright => 'text', oprresult => 'bool',
-  oprnegate => '~(bpchar,text)', oprcode => 'bpcharregexne',
-  oprrest => 'regexnesel', oprjoin => 'regexnejoinsel' },
 { oid => '1057', descr => 'not equal',
   oprname => '<>', oprleft => 'bpchar', oprright => 'bpchar',
   oprresult => 'bool', oprcom => '<>(bpchar,bpchar)',
@@ -1444,15 +1435,6 @@
   oprname => '!~~', oprleft => 'text', oprright => 'text', oprresult => 'bool',
   oprnegate => '~~(text,text)', oprcode => 'textnlike', oprrest => 'nlikesel',
   oprjoin => 'nlikejoinsel' },
-{ oid => '1211', oid_symbol => 'OID_BPCHAR_LIKE_OP',
-  descr => 'matches LIKE expression',
-  oprname => '~~', oprleft => 'bpchar', oprright => 'text', oprresult => 'bool',
-  oprnegate => '!~~(bpchar,text)', oprcode => 'bpcharlike',
-  oprrest => 'likesel', oprjoin => 'likejoinsel' },
-{ oid => '1212', descr => 'does not match LIKE expression',
-  oprname => '!~~', oprleft => 'bpchar', oprright => 'text',
-  oprresult => 'bool', oprnegate => '~~(bpchar,text)', oprcode => 'bpcharnlike',
-  oprrest => 'nlikesel', oprjoin => 'nlikejoinsel' },

 # case-insensitive regex hacks
 { oid => '1226', oid_symbol => 'OID_NAME_ICREGEXEQ_OP',
@@ -1475,17 +1457,6 @@
   oprname => '!~*', oprleft => 'text', oprright => 'text', oprresult => 'bool',
   oprnegate => '~*(text,text)', oprcode => 'texticregexne',
   oprrest => 'icregexnesel', oprjoin => 'icregexnejoinsel' },
-{ oid => '1234', oid_symbol => 'OID_BPCHAR_ICREGEXEQ_OP',
-  descr => 'matches regular expression, case-insensitive',
-  oprname => '~*', oprleft => 'bpchar', oprright => 'text', oprresult => 'bool',
-  oprnegate => '!~*(bpchar,text)', oprcode => 'bpcharicregexeq',
-  oprrest => 'icregexeqsel', oprjoin => 'icregexeqjoinsel' },
-{ oid => '1235',
-  descr => 'does not match regular expression, case-insensitive',
-  oprname => '!~*', oprleft => 'bpchar', oprright => 'text',
-  oprresult => 'bool', oprnegate => '~*(bpchar,text)',
-  oprcode => 'bpcharicregexne', oprrest => 'icregexnesel',
-  oprjoin => 'icregexnejoinsel' },

 # timestamptz operators
 { oid => '1320', descr => 'equal',
@@ -2034,17 +2005,6 @@
   oprname => '!~~*', oprleft => 'text', oprright => 'text', oprresult => 'bool',
   oprnegate => '~~*(text,text)', oprcode => 'texticnlike',
   oprrest => 'icnlikesel', oprjoin => 'icnlikejoinsel' },
-{ oid => '1629', oid_symbol => 'OID_BPCHAR_ICLIKE_OP',
-  descr => 'matches LIKE expression, case-insensitive',
-  oprname => '~~*', oprleft => 'bpchar', oprright => 'text',
-  oprresult => 'bool', oprnegate => '!~~*(bpchar,text)',
-  oprcode => 'bpchariclike', oprrest => 'iclikesel',
-  oprjoin => 'iclikejoinsel' },
-{ oid => '1630', descr => 'does not match LIKE expression, case-insensitive',
-  oprname => '!~~*', oprleft => 'bpchar', oprright => 'text',
-  oprresult => 'bool', oprnegate => '~~*(bpchar,text)',
-  oprcode => 'bpcharicnlike', oprrest => 'icnlikesel',
-  oprjoin => 'icnlikejoinsel' },

 # NUMERIC type - OID's 1700-1799
 { oid => '1751', descr => 'negate',
diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat
index 58ea5b9..54a2de2 100644
--- a/src/include/catalog/pg_proc.dat
+++ b/src/include/catalog/pg_proc.dat
@@ -3360,13 +3360,6 @@
   proname => 'mul_d_interval', prorettype => 'interval',
   proargtypes => 'float8 interval', prosrc => 'mul_d_interval' },

-{ oid => '1631',
-  proname => 'bpcharlike', prosupport => 'textlike_support',
-  prorettype => 'bool', proargtypes => 'bpchar text', prosrc => 'textlike' },
-{ oid => '1632',
-  proname => 'bpcharnlike', prorettype => 'bool', proargtypes => 'bpchar text',
-  prosrc => 'textnlike' },
-
 { oid => '1633',
   proname => 'texticlike', prosupport => 'texticlike_support',
   prorettype => 'bool', proargtypes => 'text text', prosrc => 'texticlike' },
@@ -3386,26 +3379,6 @@
   proname => 'like_escape', prorettype => 'text', proargtypes => 'text text',
   prosrc => 'like_escape' },

-{ oid => '1656',
-  proname => 'bpcharicregexeq', prosupport => 'texticregexeq_support',
-  prorettype => 'bool', proargtypes => 'bpchar text',
-  prosrc => 'texticregexeq' },
-{ oid => '1657',
-  proname => 'bpcharicregexne', prorettype => 'bool',
-  proargtypes => 'bpchar text', prosrc => 'texticregexne' },
-{ oid => '1658',
-  proname => 'bpcharregexeq', prosupport => 'textregexeq_support',
-  prorettype => 'bool', proargtypes => 'bpchar text', prosrc => 'textregexeq' },
-{ oid => '1659',
-  proname => 'bpcharregexne', prorettype => 'bool',
-  proargtypes => 'bpchar text', prosrc => 'textregexne' },
-{ oid => '1660',
-  proname => 'bpchariclike', prosupport => 'texticlike_support',
-  prorettype => 'bool', proargtypes => 'bpchar text', prosrc => 'texticlike' },
-{ oid => '1661',
-  proname => 'bpcharicnlike', prorettype => 'bool',
-  proargtypes => 'bpchar text', prosrc => 'texticnlike' },
-
 # Oracle Compatibility Related Functions - By Edmund Mergl <E.Mergl@bawue.de>
 { oid => '868', descr => 'position of substring',
   proname => 'strpos', prorettype => 'int4', proargtypes => 'text text',
diff --git a/src/test/regress/expected/char.out b/src/test/regress/expected/char.out
index 991c771..db2624d 100644
--- a/src/test/regress/expected/char.out
+++ b/src/test/regress/expected/char.out
@@ -1,8 +1,10 @@
 --
 -- CHAR
 --
--- fixed-length by value
--- internally passed by value if <= 4 bytes in storage
+-- This type, known internally as bpchar, is the same as varchar or text
+-- except that trailing blanks are considered insignificant in comparisons
+-- (so we strip them when converting to text), and we blank-pad to the
+-- declared length if there is one.
 SELECT char 'c' = char 'c' AS true;
  true
 ------
@@ -120,3 +122,50 @@ SELECT '' AS four, * FROM CHAR_TBL;
       | abcd
 (4 rows)

+--
+-- Check regex and LIKE comparisons
+--
+SELECT * FROM CHAR_TBL WHERE f1 LIKE 'a';
+  f1
+------
+ a
+(1 row)
+
+SELECT * FROM CHAR_TBL WHERE f1 LIKE 'ab%';
+  f1
+------
+ ab
+ abcd
+ abcd
+(3 rows)
+
+SELECT * FROM CHAR_TBL WHERE f1 ILIKE 'AB%';
+  f1
+------
+ ab
+ abcd
+ abcd
+(3 rows)
+
+SELECT * FROM CHAR_TBL WHERE f1 ~ 'b';
+  f1
+------
+ ab
+ abcd
+ abcd
+(3 rows)
+
+SELECT * FROM CHAR_TBL WHERE f1 ~ '^ab';
+  f1
+------
+ ab
+ abcd
+ abcd
+(3 rows)
+
+SELECT * FROM CHAR_TBL WHERE f1 ~ '^ab$';
+  f1
+------
+ ab
+(1 row)
+
diff --git a/src/test/regress/expected/char_1.out b/src/test/regress/expected/char_1.out
index 8eff75a..758f9ef 100644
--- a/src/test/regress/expected/char_1.out
+++ b/src/test/regress/expected/char_1.out
@@ -1,8 +1,10 @@
 --
 -- CHAR
 --
--- fixed-length by value
--- internally passed by value if <= 4 bytes in storage
+-- This type, known internally as bpchar, is the same as varchar or text
+-- except that trailing blanks are considered insignificant in comparisons
+-- (so we strip them when converting to text), and we blank-pad to the
+-- declared length if there is one.
 SELECT char 'c' = char 'c' AS true;
  true
 ------
@@ -120,3 +122,50 @@ SELECT '' AS four, * FROM CHAR_TBL;
       | abcd
 (4 rows)

+--
+-- Check regex and LIKE comparisons
+--
+SELECT * FROM CHAR_TBL WHERE f1 LIKE 'a';
+  f1
+------
+ a
+(1 row)
+
+SELECT * FROM CHAR_TBL WHERE f1 LIKE 'ab%';
+  f1
+------
+ ab
+ abcd
+ abcd
+(3 rows)
+
+SELECT * FROM CHAR_TBL WHERE f1 ILIKE 'AB%';
+  f1
+------
+ ab
+ abcd
+ abcd
+(3 rows)
+
+SELECT * FROM CHAR_TBL WHERE f1 ~ 'b';
+  f1
+------
+ ab
+ abcd
+ abcd
+(3 rows)
+
+SELECT * FROM CHAR_TBL WHERE f1 ~ '^ab';
+  f1
+------
+ ab
+ abcd
+ abcd
+(3 rows)
+
+SELECT * FROM CHAR_TBL WHERE f1 ~ '^ab$';
+  f1
+------
+ ab
+(1 row)
+
diff --git a/src/test/regress/expected/char_2.out b/src/test/regress/expected/char_2.out
index f54736c..566ba1e 100644
--- a/src/test/regress/expected/char_2.out
+++ b/src/test/regress/expected/char_2.out
@@ -1,8 +1,10 @@
 --
 -- CHAR
 --
--- fixed-length by value
--- internally passed by value if <= 4 bytes in storage
+-- This type, known internally as bpchar, is the same as varchar or text
+-- except that trailing blanks are considered insignificant in comparisons
+-- (so we strip them when converting to text), and we blank-pad to the
+-- declared length if there is one.
 SELECT char 'c' = char 'c' AS true;
  true
 ------
@@ -120,3 +122,50 @@ SELECT '' AS four, * FROM CHAR_TBL;
       | abcd
 (4 rows)

+--
+-- Check regex and LIKE comparisons
+--
+SELECT * FROM CHAR_TBL WHERE f1 LIKE 'a';
+  f1
+------
+ a
+(1 row)
+
+SELECT * FROM CHAR_TBL WHERE f1 LIKE 'ab%';
+  f1
+------
+ ab
+ abcd
+ abcd
+(3 rows)
+
+SELECT * FROM CHAR_TBL WHERE f1 ILIKE 'AB%';
+  f1
+------
+ ab
+ abcd
+ abcd
+(3 rows)
+
+SELECT * FROM CHAR_TBL WHERE f1 ~ 'b';
+  f1
+------
+ ab
+ abcd
+ abcd
+(3 rows)
+
+SELECT * FROM CHAR_TBL WHERE f1 ~ '^ab';
+  f1
+------
+ ab
+ abcd
+ abcd
+(3 rows)
+
+SELECT * FROM CHAR_TBL WHERE f1 ~ '^ab$';
+  f1
+------
+ ab
+(1 row)
+
diff --git a/src/test/regress/sql/char.sql b/src/test/regress/sql/char.sql
index 235ec62..eed81e7 100644
--- a/src/test/regress/sql/char.sql
+++ b/src/test/regress/sql/char.sql
@@ -2,8 +2,10 @@
 -- CHAR
 --

--- fixed-length by value
--- internally passed by value if <= 4 bytes in storage
+-- This type, known internally as bpchar, is the same as varchar or text
+-- except that trailing blanks are considered insignificant in comparisons
+-- (so we strip them when converting to text), and we blank-pad to the
+-- declared length if there is one.

 SELECT char 'c' = char 'c' AS true;

@@ -73,3 +75,14 @@ INSERT INTO CHAR_TBL (f1) VALUES ('abcde');
 INSERT INTO CHAR_TBL (f1) VALUES ('abcd    ');

 SELECT '' AS four, * FROM CHAR_TBL;
+
+--
+-- Check regex and LIKE comparisons
+--
+
+SELECT * FROM CHAR_TBL WHERE f1 LIKE 'a';
+SELECT * FROM CHAR_TBL WHERE f1 LIKE 'ab%';
+SELECT * FROM CHAR_TBL WHERE f1 ILIKE 'AB%';
+SELECT * FROM CHAR_TBL WHERE f1 ~ 'b';
+SELECT * FROM CHAR_TBL WHERE f1 ~ '^ab';
+SELECT * FROM CHAR_TBL WHERE f1 ~ '^ab$';