[BUGS] behavior difference in character literal vs national character literal

Поиск
Список
Период
Сортировка
От Brad DeJong
Тема [BUGS] behavior difference in character literal vs national character literal
Дата
Msg-id CAJnrtnwyYxBcBhe0W04Nq3Godx0Ch3PyC-OVPREkGf_0jGztJw@mail.gmail.com
обсуждение исходный текст
Ответы Re: [BUGS] behavior difference in character literal vs national character literal  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs
postgresql 9.6.1 windows 64-bit, pgadmin4 1.6

I am working with a colleague (who mainly works with a different dbms) who brought me the following question - why do these produce different results ...

> select 'a' || ' ' || 'b'
a b
> select 'a' || N' ' || 'b'
ab

(The answer in his specific case is that the database in question uses utf8 so just use character literals.)

Given that PostgreSQL is claiming compliance for SQL F421 National Character, I then decided to dig in a bit more in order to answer the question in general.

I ran "explain (analyze, verbose) select concat(concat('a', ' '), 'b')" which shows ' ' as ' '.

Then I ran "explain (analyze, verbose) select concat(concat('a', N' '), 'b')" which shows N' ' as ''::bpchar.

I do not claim to be a lex/yacc/... export but I think that this is happening because {xnstart} in src/backend/parser/scan.l emits an "NCHAR" and then the literal. Then src/backend/parser/gram.y sees the explicit tag followed by literal as "ConstTypename Sconst" and calls makeStringConstCast(). Because ConstTypename does not have a "VARYING" suffix, the cast is to bpchar which causes the trailing space to be truncated when the || (concat) operator casts bpchar to text (via the anytextcat(any, text) or textanycat(text, any) functions).

To provide evidence for/against that theory, I then ran

> select 'a' || char ' ' || 'b'
ab

and verified that this also trims the trailing space.

Given that PostgreSQL is silently converting NCHAR, NCHAR VARYING, ... to CHAR, CHAR VARYING, ... it seems to me like national character literals should also silently convert to character literals - with exactly the same behavior in regards to trimming of trailing spaces during casts to ::text.

I looked through several old mailing list discussions on NCHAR support in PostgreSQL and did not see anyone advocating that character literals should be handled as varchar while national character literals should be handled as bpchar.

It seems like the lexer could emit "NCHAR VARYING" followed by the literal or it could just emit the literal. Either of those should result in the national character literal being handled the same as a regular character literal.

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Etsuro Fujita
Дата:
Сообщение: Re: [HACKERS] [BUGS] BUG #14759: insert into foreign data partitionsfail
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: [BUGS] signal 11 segfaults with parallel workers