Обсуждение: [BUGS] behavior difference in character literal vs national character literal

Поиск
Список
Период
Сортировка

[BUGS] behavior difference in character literal vs national character literal

От
Brad DeJong
Дата:
postgresql 9.6.1 windows 64-bit, pgadmin4 1.6

I am working with a colleague (who mainly works with a different dbms) who brought me the following question - why do these produce different results ...

> select 'a' || ' ' || 'b'
a b
> select 'a' || N' ' || 'b'
ab

(The answer in his specific case is that the database in question uses utf8 so just use character literals.)

Given that PostgreSQL is claiming compliance for SQL F421 National Character, I then decided to dig in a bit more in order to answer the question in general.

I ran "explain (analyze, verbose) select concat(concat('a', ' '), 'b')" which shows ' ' as ' '.

Then I ran "explain (analyze, verbose) select concat(concat('a', N' '), 'b')" which shows N' ' as ''::bpchar.

I do not claim to be a lex/yacc/... export but I think that this is happening because {xnstart} in src/backend/parser/scan.l emits an "NCHAR" and then the literal. Then src/backend/parser/gram.y sees the explicit tag followed by literal as "ConstTypename Sconst" and calls makeStringConstCast(). Because ConstTypename does not have a "VARYING" suffix, the cast is to bpchar which causes the trailing space to be truncated when the || (concat) operator casts bpchar to text (via the anytextcat(any, text) or textanycat(text, any) functions).

To provide evidence for/against that theory, I then ran

> select 'a' || char ' ' || 'b'
ab

and verified that this also trims the trailing space.

Given that PostgreSQL is silently converting NCHAR, NCHAR VARYING, ... to CHAR, CHAR VARYING, ... it seems to me like national character literals should also silently convert to character literals - with exactly the same behavior in regards to trimming of trailing spaces during casts to ::text.

I looked through several old mailing list discussions on NCHAR support in PostgreSQL and did not see anyone advocating that character literals should be handled as varchar while national character literals should be handled as bpchar.

It seems like the lexer could emit "NCHAR VARYING" followed by the literal or it could just emit the literal. Either of those should result in the national character literal being handled the same as a regular character literal.

Re: [BUGS] behavior difference in character literal vs national character literal

От
Tom Lane
Дата:
Brad DeJong <bpd0018@gmail.com> writes:
> Given that PostgreSQL is silently converting NCHAR, NCHAR VARYING, ... to
> CHAR, CHAR VARYING, ... it seems to me like national character literals
> should also silently convert to character literals - with exactly the same
> behavior in regards to trimming of trailing spaces during casts to ::text.

But ' ' without any decoration IS NOT a character literal.  It's a literal
of unknown type, which in cases like this will ultimately default to text,
but in other cases will be deemed to have other types.  N/NCHAR, on the
other hand, defines what the data type is, just as much as if you'd
written an explicit cast.

regression=# select 'a' || char ' ' || 'b';?column? 
----------ab
(1 row)

regression=# select 'a' || nchar ' ' || 'b';?column? 
----------ab
(1 row)

regression=# select 'a' || char varying ' ' || 'b';?column? 
----------a b
(1 row)

regression=# select 'a' || nchar varying ' ' || 'b';?column? 
----------a b
(1 row)

I don't see anything wrong there.  You can certainly quibble with our
basic choice to strip trailing blanks when converting from char to
varchar/text, but NCHAR isn't doing anything different from CHAR.
        regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs