Обсуждение: Comparison semantics of CHAR data type

Поиск
Список
Период
Сортировка

Comparison semantics of CHAR data type

От
Thomas Fanghaenel
Дата:
I was wondering about the proper semantics of CHAR comparisons in some corner cases that involve control characters with values that are less than 0x20 (space).

Consider the following testcase:

===
create table t (a int, b char(10));

insert into t values (1, 'foo');
insert into t values (2, 'foo ');
insert into t values (3, E'foo\t');
insert into t values (4, E'foo\n');
insert into t values (5, E'foo \n');
insert into t values (6, 'foobar');

select * from t order by b;
===

What's the proper order of these string values in the CHAR domain?  The way I interpret the SQL Standard (and assuming that \t and \n collate lower than a space), it's supposed to be this:

  (3) < (4) < (5) < (1) <= (2) < (6)

Postgres comes up with this:

  (1) <= (2) < (3) < (4) < (5) < (6)

The reason is that the bpchar functions that implement the relative comparison operators for CHAR(n) effectively strip trailing whitespaces before doing the comparison.  One might argue that doing this is not correct.  The standard seems to mandate that all CHAR(n) values are actually considered to be of width n, and that trailing spaces are indeed relevant for comparison.  In other words, stripping them would only be possible if it can be guaranteed that there are no characters in the character set that collate lower than a space.

Any thoughts on this?  I searched the mailing list archives, but couldn't find any relevant discussion.  There were plenty of threads that argue whether or not it's semantically correct to strip trailing spaces from CHAR(n) values, but the issue of characters collating below a space does not seem to have brought up in any of those discussions before.

Cheers,

-- Thomas