Unicode escapes in literals

From: Peter Eisentraut
Subject: Unicode escapes in literals
Msg-id: 490038DB.5070602@gmx.net
Responses: Re: Unicode escapes in literals  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Unicode escapes in literals  (Peter Eisentraut <peter_e@gmx.net>)
List: pgsql-hackers

I would like to add an escape mechanism to PostgreSQL for entering 
arbitrary Unicode characters into string literals.  Currently the only 
options are to enter the character directly via the keyboard or 
cut-and-paste, which is difficult for a number of reasons (for example, 
when the font doesn't have the character), or to enter the UTF8-encoded 
bytes using E'...' strings, which is hardly usable.

The SQL standard has the following escape syntax for this:
   U&'special character: \xxxx' [ UESCAPE '\' ]

where xxxx is the hexadecimal Unicode codepoint.  So this is pretty much 
just another variant on what the E'...' syntax does.
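
As an illustration only (not part of the proposal, and not PostgreSQL's scanner code), here is a minimal Python sketch of how the body of such a literal could be decoded. The function name decode_unicode_escapes is hypothetical. It handles the four-hex-digit form shown above and assumes that a doubled escape character stands for the escape character itself; it makes no attempt to validate surrogate values.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def decode_unicode_escapes(body: str, escape: str = "\\") -> str:
    """Decode the body of a U&'...' literal (surrounding quotes already removed).

    Hypothetical helper for illustration; 'escape' plays the role of the
    optional UESCAPE character.
    """
    if len(escape) != 1:
        raise ValueError("escape must be a single character")
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != escape:
            # Ordinary character: copy it through unchanged.
            out.append(ch)
            i += 1
            continue
        # Assumption: a doubled escape character stands for itself.
        if body[i + 1 : i + 2] == escape:
            out.append(escape)
            i += 2
            continue
        # Otherwise exactly four hexadecimal digits must follow.
        digits = body[i + 1 : i + 5]
        if len(digits) != 4 or not all(d in HEX_DIGITS for d in digits):
            raise ValueError(f"invalid Unicode escape at offset {i}")
        out.append(chr(int(digits, 16)))
        i += 5
    return "".join(out)


if __name__ == "__main__":
    # U&'special character: \00E9'
    print(decode_unicode_escapes(r"special character: \00E9"))
    # U&'d!0061ta' UESCAPE '!'
    print(decode_unicode_escapes("d!0061ta", escape="!"))
    # The same character entered as raw UTF-8 bytes, as the E'...' route requires
    print("\u00e9".encode("utf-8"))
```

The last line shows why the byte-oriented route is awkward: the user has to work out the UTF-8 byte sequence by hand instead of writing the code point.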

The trick is that since we have user-definable encoding conversion 
routines, we can't convert the Unicode codepoint to the server encoding 
in the scanner stage.  I imagine there are two ways to address this:

1. Only support this syntax when the server encoding is UTF8.  This 
would probably cover most use cases anyway.  We could have limited 
support for characters in the ASCII range for all server encodings.

2. Convert this syntax to a function call.  But that would then create a 
lot of inconsistencies, such as needing functional indexes for matches 
against what should really be a literal.
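
To make option 1 concrete, here is a rough Python sketch of the rule it implies: code points in the ASCII range are accepted for any server encoding, and everything else only when the server encoding is UTF8. The function name codepoint_to_server_bytes is hypothetical, and the sketch assumes that every server encoding is an ASCII superset, so an ASCII code point maps to the same single byte everywhere.

```python
def codepoint_to_server_bytes(codepoint: int, server_encoding: str) -> bytes:
    """Map one Unicode code point to bytes in the server encoding.

    Hypothetical helper illustrating option 1; not PostgreSQL code.
    """
    if not 0 <= codepoint <= 0x10FFFF or 0xD800 <= codepoint <= 0xDFFF:
        raise ValueError("invalid Unicode code point")
    if codepoint < 0x80:
        # Assumption: the server encoding is ASCII-compatible, so the
        # code point is its own single-byte representation.
        return bytes([codepoint])
    if server_encoding.upper() in ("UTF8", "UTF-8"):
        # No conversion routine is needed: just emit the UTF-8 encoding.
        return chr(codepoint).encode("utf-8")
    # Any other encoding would need a (possibly user-defined) conversion
    # routine, which is not available at the scanner stage.
    raise ValueError(
        "Unicode escapes above U+007F require UTF8 server encoding"
    )


if __name__ == "__main__":
    print(codepoint_to_server_bytes(0x41, "LATIN1"))
    print(codepoint_to_server_bytes(0xE9, "UTF8"))
```

With this rule the scanner never has to call an encoding conversion routine, which is the point of restricting the feature to UTF8.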

I'd be happy to start with UTF8 support only.  Other ideas?

