Обсуждение: Modest proposal for making bpchar less inconsistent
It struck me that the real reason that we keep getting gripes about the weird behavior of CHAR(n) is that these functions (and, hence, their corresponding operators) fail to obey the "trailing blanks aren't significant" rule: regprocedure | prosrc -------------------------------------------+---------------------- bpcharlike(character,text) | textlike bpcharnlike(character,text) | textnlike bpcharicregexeq(character,text) | texticregexeq bpcharicregexne(character,text) | texticregexne bpcharregexeq(character,text) | textregexeq bpcharregexne(character,text) | textregexne bpchariclike(character,text) | texticlike bpcharicnlike(character,text) | texticnlike They're just relying on binary compatibility of bpchar to text ... but of course textlike etc. think trailing blanks are significant. Every other primitive operation we have for bpchar correctly ignores the trailing spaces. We could fix this, and save some catalog space too, if we simply deleted these functions/operators and let such calls devolve into implicit casts to text. This might annoy people who are actually writing trailing spaces in their patterns to make such cases work. But I think there are probably not too many such people, and having real consistency here is worth something. regards, tom lane
Dne pá 13. 9. 2019 16:43 uživatel Tom Lane <tgl@sss.pgh.pa.us> napsal:
It struck me that the real reason that we keep getting gripes about
the weird behavior of CHAR(n) is that these functions (and, hence,
their corresponding operators) fail to obey the "trailing blanks
aren't significant" rule:
regprocedure | prosrc
-------------------------------------------+----------------------
bpcharlike(character,text) | textlike
bpcharnlike(character,text) | textnlike
bpcharicregexeq(character,text) | texticregexeq
bpcharicregexne(character,text) | texticregexne
bpcharregexeq(character,text) | textregexeq
bpcharregexne(character,text) | textregexne
bpchariclike(character,text) | texticlike
bpcharicnlike(character,text) | texticnlike
They're just relying on binary compatibility of bpchar to text ...
but of course textlike etc. think trailing blanks are significant.
Every other primitive operation we have for bpchar correctly ignores
the trailing spaces.
We could fix this, and save some catalog space too, if we simply
deleted these functions/operators and let such calls devolve
into implicit casts to text.
This might annoy people who are actually writing trailing spaces
in their patterns to make such cases work. But I think there
are probably not too many such people, and having real consistency
here is worth something.
has sense
Pavel
regards, tom lane
On Fri, Sep 13, 2019 at 09:50:10PM +0200, Pavel Stehule wrote: > > > Dne pá 13. 9. 2019 16:43 uživatel Tom Lane <tgl@sss.pgh.pa.us> napsal: > > It struck me that the real reason that we keep getting gripes about > the weird behavior of CHAR(n) is that these functions (and, hence, > their corresponding operators) fail to obey the "trailing blanks > aren't significant" rule: > > regprocedure | prosrc > -------------------------------------------+---------------------- > bpcharlike(character,text) | textlike > bpcharnlike(character,text) | textnlike > bpcharicregexeq(character,text) | texticregexeq > bpcharicregexne(character,text) | texticregexne > bpcharregexeq(character,text) | textregexeq > bpcharregexne(character,text) | textregexne > bpchariclike(character,text) | texticlike > bpcharicnlike(character,text) | texticnlike > > They're just relying on binary compatibility of bpchar to text ... > but of course textlike etc. think trailing blanks are significant. > > Every other primitive operation we have for bpchar correctly ignores > the trailing spaces. > > We could fix this, and save some catalog space too, if we simply > deleted these functions/operators and let such calls devolve > into implicit casts to text. > > This might annoy people who are actually writing trailing spaces > in their patterns to make such cases work. But I think there > are probably not too many such people, and having real consistency > here is worth something. > > > has sense Yes, I think this is a great idea! -- Bruce Momjian <bruce@momjian.us> http://momjian.us EnterpriseDB http://enterprisedb.com + As you are, so once was I. As I am, so you will be. + + Ancient Roman grave inscription +
At Sat, 28 Sep 2019 08:22:22 -0400, Bruce Momjian <bruce@momjian.us> wrote in <20190928122222.GA26853@momjian.us> > On Fri, Sep 13, 2019 at 09:50:10PM +0200, Pavel Stehule wrote: > > > > > > Dne pá 13. 9. 2019 16:43 uživatel Tom Lane <tgl@sss.pgh.pa.us> napsal: > > > > It struck me that the real reason that we keep getting gripes about > > the weird behavior of CHAR(n) is that these functions (and, hence, > > their corresponding operators) fail to obey the "trailing blanks > > aren't significant" rule: > > > > regprocedure | prosrc > > -------------------------------------------+---------------------- > > bpcharlike(character,text) | textlike > > bpcharnlike(character,text) | textnlike > > bpcharicregexeq(character,text) | texticregexeq > > bpcharicregexne(character,text) | texticregexne > > bpcharregexeq(character,text) | textregexeq > > bpcharregexne(character,text) | textregexne > > bpchariclike(character,text) | texticlike > > bpcharicnlike(character,text) | texticnlike > > > > They're just relying on binary compatibility of bpchar to text ... > > but of course textlike etc. think trailing blanks are significant. > > > > Every other primitive operation we have for bpchar correctly ignores > > the trailing spaces. > > > > We could fix this, and save some catalog space too, if we simply > > deleted these functions/operators and let such calls devolve > > into implicit casts to text. > > > > This might annoy people who are actually writing trailing spaces > > in their patterns to make such cases work. But I think there > > are probably not too many such people, and having real consistency > > here is worth something. > > > > > > has sense > > Yes, I think this is a great idea! I totally agree. -- Kyotaro Horiguchi NTT Open Source Software Center
Kyotaro Horiguchi <horikyota.ntt@gmail.com> writes: > At Sat, 28 Sep 2019 08:22:22 -0400, Bruce Momjian <bruce@momjian.us> wrote in <20190928122222.GA26853@momjian.us> >> On Fri, Sep 13, 2019 at 09:50:10PM +0200, Pavel Stehule wrote: >>> Dne pá 13. 9. 2019 16:43 uživatel Tom Lane <tgl@sss.pgh.pa.us> napsal: >>>> It struck me that the real reason that we keep getting gripes about >>>> the weird behavior of CHAR(n) is that these functions (and, hence, >>>> their corresponding operators) fail to obey the "trailing blanks >>>> aren't significant" rule: >>>> ... >>>> We could fix this, and save some catalog space too, if we simply >>>> deleted these functions/operators and let such calls devolve >>>> into implicit casts to text. >> Yes, I think this is a great idea! > I totally agree. I experimented with this, as per the attached simple patch. If you just apply the catalog deletions, no regression test results change, which says more about our lack of test coverage in this area than anything else. So I added a few simple test cases too. However, playing with this more, I'm not sure it's the direction we want to go. I realized that the BPCHAR-related code paths in like_support.c are dead code with this patch, because it's no longer possible to match a LIKE/regex operator to a bpchar column. For example, in existing releases you can do regression=# create table t(f1 char(20) unique); CREATE TABLE regression=# explain select * from t where f1 like 'abcdef'; QUERY PLAN ------------------------------------------------------------------------ Index Only Scan using t_f1_key on t (cost=0.15..8.17 rows=1 width=24) Index Cond: (f1 = 'abcdef'::bpchar) Filter: (f1 ~~ 'abcdef'::text) (3 rows) regression=# explain select * from t where f1 like 'abcdef%'; QUERY PLAN ---------------------------------------------------------------------------- Bitmap Heap Scan on t (cost=4.23..14.39 rows=8 width=24) Filter: (f1 ~~ 'abcdef%'::text) -> Bitmap Index Scan on t_f1_key (cost=0.00..4.23 rows=8 width=0) Index Cond: ((f1 >= 'abcdef'::bpchar) AND (f1 < 'abcdeg'::bpchar)) (4 rows) But with this patch, you just get dumb seqscan plans because the expression trees now look like "f1::text ~~ constant" which doesn't match to an index on the bare column f1. If we wanted to preserve these index optimizations while still redefining the pattern match operators as ignoring trailing whitespace, we could keep the operators/functions but change them to point at new C functions that strip trailing blanks before invoking the pattern match machinery. Some thought would need to be given as well to whether like_fixed_prefix et al need to behave differently to agree with this behavior. (Offhand it seems like they might need to strip trailing blanks from what would otherwise be the fixed prefix, but I'm not quite sure.) That would be much more work than this patch of course (though still not an enormous amount), and I'm not quite sure if it's worth the trouble. Is this a case that anyone is using in practice? regards, tom lane diff --git a/src/include/catalog/pg_operator.dat b/src/include/catalog/pg_operator.dat index fa7dc96..1b430ef 100644 --- a/src/include/catalog/pg_operator.dat +++ b/src/include/catalog/pg_operator.dat @@ -1166,15 +1166,6 @@ oprnegate => '<>(bpchar,bpchar)', oprcode => 'bpchareq', oprrest => 'eqsel', oprjoin => 'eqjoinsel' }, -{ oid => '1055', oid_symbol => 'OID_BPCHAR_REGEXEQ_OP', - descr => 'matches regular expression, case-sensitive', - oprname => '~', oprleft => 'bpchar', oprright => 'text', oprresult => 'bool', - oprnegate => '!~(bpchar,text)', oprcode => 'bpcharregexeq', - oprrest => 'regexeqsel', oprjoin => 'regexeqjoinsel' }, -{ oid => '1056', descr => 'does not match regular expression, case-sensitive', - oprname => '!~', oprleft => 'bpchar', oprright => 'text', oprresult => 'bool', - oprnegate => '~(bpchar,text)', oprcode => 'bpcharregexne', - oprrest => 'regexnesel', oprjoin => 'regexnejoinsel' }, { oid => '1057', descr => 'not equal', oprname => '<>', oprleft => 'bpchar', oprright => 'bpchar', oprresult => 'bool', oprcom => '<>(bpchar,bpchar)', @@ -1444,15 +1435,6 @@ oprname => '!~~', oprleft => 'text', oprright => 'text', oprresult => 'bool', oprnegate => '~~(text,text)', oprcode => 'textnlike', oprrest => 'nlikesel', oprjoin => 'nlikejoinsel' }, -{ oid => '1211', oid_symbol => 'OID_BPCHAR_LIKE_OP', - descr => 'matches LIKE expression', - oprname => '~~', oprleft => 'bpchar', oprright => 'text', oprresult => 'bool', - oprnegate => '!~~(bpchar,text)', oprcode => 'bpcharlike', - oprrest => 'likesel', oprjoin => 'likejoinsel' }, -{ oid => '1212', descr => 'does not match LIKE expression', - oprname => '!~~', oprleft => 'bpchar', oprright => 'text', - oprresult => 'bool', oprnegate => '~~(bpchar,text)', oprcode => 'bpcharnlike', - oprrest => 'nlikesel', oprjoin => 'nlikejoinsel' }, # case-insensitive regex hacks { oid => '1226', oid_symbol => 'OID_NAME_ICREGEXEQ_OP', @@ -1475,17 +1457,6 @@ oprname => '!~*', oprleft => 'text', oprright => 'text', oprresult => 'bool', oprnegate => '~*(text,text)', oprcode => 'texticregexne', oprrest => 'icregexnesel', oprjoin => 'icregexnejoinsel' }, -{ oid => '1234', oid_symbol => 'OID_BPCHAR_ICREGEXEQ_OP', - descr => 'matches regular expression, case-insensitive', - oprname => '~*', oprleft => 'bpchar', oprright => 'text', oprresult => 'bool', - oprnegate => '!~*(bpchar,text)', oprcode => 'bpcharicregexeq', - oprrest => 'icregexeqsel', oprjoin => 'icregexeqjoinsel' }, -{ oid => '1235', - descr => 'does not match regular expression, case-insensitive', - oprname => '!~*', oprleft => 'bpchar', oprright => 'text', - oprresult => 'bool', oprnegate => '~*(bpchar,text)', - oprcode => 'bpcharicregexne', oprrest => 'icregexnesel', - oprjoin => 'icregexnejoinsel' }, # timestamptz operators { oid => '1320', descr => 'equal', @@ -2034,17 +2005,6 @@ oprname => '!~~*', oprleft => 'text', oprright => 'text', oprresult => 'bool', oprnegate => '~~*(text,text)', oprcode => 'texticnlike', oprrest => 'icnlikesel', oprjoin => 'icnlikejoinsel' }, -{ oid => '1629', oid_symbol => 'OID_BPCHAR_ICLIKE_OP', - descr => 'matches LIKE expression, case-insensitive', - oprname => '~~*', oprleft => 'bpchar', oprright => 'text', - oprresult => 'bool', oprnegate => '!~~*(bpchar,text)', - oprcode => 'bpchariclike', oprrest => 'iclikesel', - oprjoin => 'iclikejoinsel' }, -{ oid => '1630', descr => 'does not match LIKE expression, case-insensitive', - oprname => '!~~*', oprleft => 'bpchar', oprright => 'text', - oprresult => 'bool', oprnegate => '~~*(bpchar,text)', - oprcode => 'bpcharicnlike', oprrest => 'icnlikesel', - oprjoin => 'icnlikejoinsel' }, # NUMERIC type - OID's 1700-1799 { oid => '1751', descr => 'negate', diff --git a/src/include/catalog/pg_proc.dat b/src/include/catalog/pg_proc.dat index 58ea5b9..54a2de2 100644 --- a/src/include/catalog/pg_proc.dat +++ b/src/include/catalog/pg_proc.dat @@ -3360,13 +3360,6 @@ proname => 'mul_d_interval', prorettype => 'interval', proargtypes => 'float8 interval', prosrc => 'mul_d_interval' }, -{ oid => '1631', - proname => 'bpcharlike', prosupport => 'textlike_support', - prorettype => 'bool', proargtypes => 'bpchar text', prosrc => 'textlike' }, -{ oid => '1632', - proname => 'bpcharnlike', prorettype => 'bool', proargtypes => 'bpchar text', - prosrc => 'textnlike' }, - { oid => '1633', proname => 'texticlike', prosupport => 'texticlike_support', prorettype => 'bool', proargtypes => 'text text', prosrc => 'texticlike' }, @@ -3386,26 +3379,6 @@ proname => 'like_escape', prorettype => 'text', proargtypes => 'text text', prosrc => 'like_escape' }, -{ oid => '1656', - proname => 'bpcharicregexeq', prosupport => 'texticregexeq_support', - prorettype => 'bool', proargtypes => 'bpchar text', - prosrc => 'texticregexeq' }, -{ oid => '1657', - proname => 'bpcharicregexne', prorettype => 'bool', - proargtypes => 'bpchar text', prosrc => 'texticregexne' }, -{ oid => '1658', - proname => 'bpcharregexeq', prosupport => 'textregexeq_support', - prorettype => 'bool', proargtypes => 'bpchar text', prosrc => 'textregexeq' }, -{ oid => '1659', - proname => 'bpcharregexne', prorettype => 'bool', - proargtypes => 'bpchar text', prosrc => 'textregexne' }, -{ oid => '1660', - proname => 'bpchariclike', prosupport => 'texticlike_support', - prorettype => 'bool', proargtypes => 'bpchar text', prosrc => 'texticlike' }, -{ oid => '1661', - proname => 'bpcharicnlike', prorettype => 'bool', - proargtypes => 'bpchar text', prosrc => 'texticnlike' }, - # Oracle Compatibility Related Functions - By Edmund Mergl <E.Mergl@bawue.de> { oid => '868', descr => 'position of substring', proname => 'strpos', prorettype => 'int4', proargtypes => 'text text', diff --git a/src/test/regress/expected/char.out b/src/test/regress/expected/char.out index 991c771..db2624d 100644 --- a/src/test/regress/expected/char.out +++ b/src/test/regress/expected/char.out @@ -1,8 +1,10 @@ -- -- CHAR -- --- fixed-length by value --- internally passed by value if <= 4 bytes in storage +-- This type, known internally as bpchar, is the same as varchar or text +-- except that trailing blanks are considered insignificant in comparisons +-- (so we strip them when converting to text), and we blank-pad to the +-- declared length if there is one. SELECT char 'c' = char 'c' AS true; true ------ @@ -120,3 +122,50 @@ SELECT '' AS four, * FROM CHAR_TBL; | abcd (4 rows) +-- +-- Check regex and LIKE comparisons +-- +SELECT * FROM CHAR_TBL WHERE f1 LIKE 'a'; + f1 +------ + a +(1 row) + +SELECT * FROM CHAR_TBL WHERE f1 LIKE 'ab%'; + f1 +------ + ab + abcd + abcd +(3 rows) + +SELECT * FROM CHAR_TBL WHERE f1 ILIKE 'AB%'; + f1 +------ + ab + abcd + abcd +(3 rows) + +SELECT * FROM CHAR_TBL WHERE f1 ~ 'b'; + f1 +------ + ab + abcd + abcd +(3 rows) + +SELECT * FROM CHAR_TBL WHERE f1 ~ '^ab'; + f1 +------ + ab + abcd + abcd +(3 rows) + +SELECT * FROM CHAR_TBL WHERE f1 ~ '^ab$'; + f1 +------ + ab +(1 row) + diff --git a/src/test/regress/expected/char_1.out b/src/test/regress/expected/char_1.out index 8eff75a..758f9ef 100644 --- a/src/test/regress/expected/char_1.out +++ b/src/test/regress/expected/char_1.out @@ -1,8 +1,10 @@ -- -- CHAR -- --- fixed-length by value --- internally passed by value if <= 4 bytes in storage +-- This type, known internally as bpchar, is the same as varchar or text +-- except that trailing blanks are considered insignificant in comparisons +-- (so we strip them when converting to text), and we blank-pad to the +-- declared length if there is one. SELECT char 'c' = char 'c' AS true; true ------ @@ -120,3 +122,50 @@ SELECT '' AS four, * FROM CHAR_TBL; | abcd (4 rows) +-- +-- Check regex and LIKE comparisons +-- +SELECT * FROM CHAR_TBL WHERE f1 LIKE 'a'; + f1 +------ + a +(1 row) + +SELECT * FROM CHAR_TBL WHERE f1 LIKE 'ab%'; + f1 +------ + ab + abcd + abcd +(3 rows) + +SELECT * FROM CHAR_TBL WHERE f1 ILIKE 'AB%'; + f1 +------ + ab + abcd + abcd +(3 rows) + +SELECT * FROM CHAR_TBL WHERE f1 ~ 'b'; + f1 +------ + ab + abcd + abcd +(3 rows) + +SELECT * FROM CHAR_TBL WHERE f1 ~ '^ab'; + f1 +------ + ab + abcd + abcd +(3 rows) + +SELECT * FROM CHAR_TBL WHERE f1 ~ '^ab$'; + f1 +------ + ab +(1 row) + diff --git a/src/test/regress/expected/char_2.out b/src/test/regress/expected/char_2.out index f54736c..566ba1e 100644 --- a/src/test/regress/expected/char_2.out +++ b/src/test/regress/expected/char_2.out @@ -1,8 +1,10 @@ -- -- CHAR -- --- fixed-length by value --- internally passed by value if <= 4 bytes in storage +-- This type, known internally as bpchar, is the same as varchar or text +-- except that trailing blanks are considered insignificant in comparisons +-- (so we strip them when converting to text), and we blank-pad to the +-- declared length if there is one. SELECT char 'c' = char 'c' AS true; true ------ @@ -120,3 +122,50 @@ SELECT '' AS four, * FROM CHAR_TBL; | abcd (4 rows) +-- +-- Check regex and LIKE comparisons +-- +SELECT * FROM CHAR_TBL WHERE f1 LIKE 'a'; + f1 +------ + a +(1 row) + +SELECT * FROM CHAR_TBL WHERE f1 LIKE 'ab%'; + f1 +------ + ab + abcd + abcd +(3 rows) + +SELECT * FROM CHAR_TBL WHERE f1 ILIKE 'AB%'; + f1 +------ + ab + abcd + abcd +(3 rows) + +SELECT * FROM CHAR_TBL WHERE f1 ~ 'b'; + f1 +------ + ab + abcd + abcd +(3 rows) + +SELECT * FROM CHAR_TBL WHERE f1 ~ '^ab'; + f1 +------ + ab + abcd + abcd +(3 rows) + +SELECT * FROM CHAR_TBL WHERE f1 ~ '^ab$'; + f1 +------ + ab +(1 row) + diff --git a/src/test/regress/sql/char.sql b/src/test/regress/sql/char.sql index 235ec62..eed81e7 100644 --- a/src/test/regress/sql/char.sql +++ b/src/test/regress/sql/char.sql @@ -2,8 +2,10 @@ -- CHAR -- --- fixed-length by value --- internally passed by value if <= 4 bytes in storage +-- This type, known internally as bpchar, is the same as varchar or text +-- except that trailing blanks are considered insignificant in comparisons +-- (so we strip them when converting to text), and we blank-pad to the +-- declared length if there is one. SELECT char 'c' = char 'c' AS true; @@ -73,3 +75,14 @@ INSERT INTO CHAR_TBL (f1) VALUES ('abcde'); INSERT INTO CHAR_TBL (f1) VALUES ('abcd '); SELECT '' AS four, * FROM CHAR_TBL; + +-- +-- Check regex and LIKE comparisons +-- + +SELECT * FROM CHAR_TBL WHERE f1 LIKE 'a'; +SELECT * FROM CHAR_TBL WHERE f1 LIKE 'ab%'; +SELECT * FROM CHAR_TBL WHERE f1 ILIKE 'AB%'; +SELECT * FROM CHAR_TBL WHERE f1 ~ 'b'; +SELECT * FROM CHAR_TBL WHERE f1 ~ '^ab'; +SELECT * FROM CHAR_TBL WHERE f1 ~ '^ab$';