Обсуждение: Issue with CHAR column and "column LIKE column" condition

Поиск
Список
Период
Сортировка

Issue with CHAR column and "column LIKE column" condition

От
Manuel Rigger
Дата:
Hi everyone,

I was surprised by the behavior of LIKE and the CHAR type. Consider
the following statements:

CREATE TABLE t0(c0 CHAR(2)) ;
INSERT INTO t0(c0) VALUES('a');
SELECT * FROM t0 WHERE c0 LIKE c0; -- expected: fetches the row,
actual: does not fetch the row

According to the docs, CHAR values are padded, and the padding is
supposed to be semantically significant in "LIKE and regular
expressions" [1], which is why I would expect the query to be
equivalent to the following:

SELECT * FROM t0 WHERE 'a ' LIKE 'a '; -- fetches the row

It seems that the trailing spaces on the right hand side are
disregarded, but not on the left hand side:

SELECT 'a' LIKE c0 FROM t0; -- unexpected: TRUE
SELECT 'a ' LIKE c0 FROM t0; -- unexpected: FALSE
SELECT c0 LIKE 'a' FROM t0; -- FALSE
SELECT c0 LIKE 'a ' FROM t0; -- TRUE

Is this behavior expected or is this a bug? By the way, this is unlike
what happens in MySQL and SQLite3, where the row would be fetched.

[1] https://www.postgresql.org/docs/11/datatype-character.html

Best,
Manuel



Re: Issue with CHAR column and "column LIKE column" condition

От
Tom Lane
Дата:
Manuel Rigger <rigger.manuel@gmail.com> writes:
> I was surprised by the behavior of LIKE and the CHAR type. Consider
> the following statements:

> CREATE TABLE t0(c0 CHAR(2)) ;
> INSERT INTO t0(c0) VALUES('a');
> SELECT * FROM t0 WHERE c0 LIKE c0; -- expected: fetches the row,
> actual: does not fetch the row

Yeah.  That's because LIKE is not, in fact, symmetric.
The available LIKE operators are

regression=# \do ~~
                                     List of operators
   Schema   | Name | Left arg type | Right arg type | Result type |       Description
------------+------+---------------+----------------+-------------+-------------------------
 pg_catalog | ~~   | bytea         | bytea          | boolean     | matches LIKE expression
 pg_catalog | ~~   | character     | text           | boolean     | matches LIKE expression
 pg_catalog | ~~   | name          | text           | boolean     | matches LIKE expression
 pg_catalog | ~~   | text          | text           | boolean     | matches LIKE expression
(4 rows)

of which the first and third aren't relevant here, and we end up
choosing "character ~~ text" ... so the pattern side receives a
coercion to text, which strips its trailing blanks, and then you
have

    'a '::char(2) ~~ 'a'::text

which doesn't match.

The semantics of char(N) are a mess generally.  It's possible that
this particular case would act less surprisingly if we got rid of
the char ~~ text operator (forcing blank-stripping on both sides),
or adding char ~~ char (preventing any blank-stripping), but
probably somebody out there would complain if we did either,
because it'd break some other case.

Generally speaking, PG developers aren't excited about messing
around with the semantics of char(N) --- we think it's a legacy
datatype that you're best off not using.  Trying to make it act
less surprisingly would be a lot of work with, most likely,
negative return.  Anyone who *is* using it has probably tweaked
their app until they got acceptable results, and would complain
that we broke it.

            regards, tom lane