Unicode escapes in literals

From	Peter Eisentraut
Subject	Unicode escapes in literals
Date	October 23, 2008, 05:42:10
Msg-id	490038DB.5070602@gmx.net
Responses	Re: Unicode escapes in literals Re: Unicode escapes in literals
List	pgsql-hackers

I would like to add an escape mechanism to PostgreSQL for entering 
arbitrary Unicode characters into string literals.  We currently have 
only two options.  One is to enter the character directly via the 
keyboard or cut-and-paste, which is difficult for a number of reasons, 
such as when the font doesn't have the character.  The other is to enter 
the UTF8-encoded bytes using the E'...' strings, which is hardly usable.
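To illustrate why spelling out UTF8-encoded bytes is awkward, here is a minimal Python sketch that builds such an E'...' literal for a given string. The helper name e_string_literal is hypothetical and only meant for illustration; it assumes the \xHH byte escapes of E'...' strings and does nothing special for control characters.

```python
def e_string_literal(text: str) -> str:
    """Spell TEXT as an E'...' literal, writing every non-ASCII
    character as its UTF-8 bytes in \\xHH form (illustrative helper)."""
    parts = []
    for ch in text:
        if ch == "'":
            parts.append("''")        # a quote is doubled inside the literal
        elif ch == "\\":
            parts.append("\\\\")      # a backslash must itself be escaped
        elif ord(ch) < 128:
            parts.append(ch)          # plain ASCII is written as-is
        else:
            # One \xHH escape per UTF-8 byte of the character.
            parts.extend(f"\\x{b:02X}" for b in ch.encode("utf-8"))
    return "E'" + "".join(parts) + "'"


print(e_string_literal("price: €"))   # E'price: \xE2\x82\xAC'
```

A single euro sign (U+20AC) already takes three byte escapes, and the author has to work out the UTF-8 encoding by hand.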

The SQL standard has the following escape syntax for it:
   U&'special character: \xxxx' [ UESCAPE '\' ]

where xxxx is the hexadecimal Unicode codepoint.  So this is pretty much 
just another variant on what the E'...' syntax does.
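As a rough sketch of what processing this syntax involves, the following Python function decodes the body of a U&'...' literal. The function name decode_unicode_escapes is hypothetical. It handles only the four-digit \xxxx form described above, and it assumes that a doubled escape character stands for the escape character itself; surrogate pairs and other forms are not treated.

```python
import string

HEX_DIGITS = set(string.hexdigits)


def decode_unicode_escapes(body: str, escape: str = "\\") -> str:
    """Decode the text between the quotes of a U&'...' literal.

    ESCAPE plays the role of the optional UESCAPE character.
    """
    if len(escape) != 1:
        raise ValueError("the escape must be a single character")
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != escape:
            out.append(ch)            # ordinary character, copy through
            i += 1
            continue
        if body[i + 1:i + 2] == escape:
            out.append(escape)        # assumed: doubled escape is a literal one
            i += 2
            continue
        digits = body[i + 1:i + 5]
        if len(digits) != 4 or any(d not in HEX_DIGITS for d in digits):
            raise ValueError(f"invalid Unicode escape at offset {i}")
        out.append(chr(int(digits, 16)))   # xxxx is the hexadecimal codepoint
        i += 5
    return "".join(out)


print(decode_unicode_escapes(r"d\0061t\0061"))              # data
print(decode_unicode_escapes("d!0061t!0061", escape="!"))   # data
```

The second call corresponds to writing U&'d!0061t!0061' UESCAPE '!'.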

The trick is that since we have user-definable encoding conversion 
routines, we can't convert the Unicode codepoint to the server encoding 
in the scanner stage.  I imagine there are two ways to address this:

1. Only support this syntax when the server encoding is UTF8.  This 
would probably cover most use cases anyway.  We could have limited 
support for characters in the ASCII range for all server encodings.

2. Convert this syntax to a function call.  But that would then create a 
lot of inconsistencies, such as needing functional indexes for matches 
against what should really be a literal.
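Option 1 could look roughly like the following Python sketch, which turns one codepoint into server-encoded bytes. This is only an illustration under stated assumptions, not the scanner code: the function name codepoint_to_server_bytes and the way the server encoding is passed in are hypothetical.

```python
def codepoint_to_server_bytes(codepoint: int, server_encoding: str) -> bytes:
    """Convert one Unicode codepoint to bytes in the server encoding,
    following option 1: full support for UTF8, ASCII only elsewhere."""
    if not 0 <= codepoint <= 0x10FFFF or 0xD800 <= codepoint <= 0xDFFF:
        raise ValueError(f"invalid Unicode codepoint: {codepoint:#x}")
    if codepoint < 0x80:
        # The ASCII range is treated as identical in every server encoding.
        return bytes([codepoint])
    if server_encoding.upper() == "UTF8":
        # No conversion routine is needed, only the UTF-8 byte form.
        return chr(codepoint).encode("utf-8")
    # Anything else would require an encoding conversion, so reject it.
    raise ValueError(
        "Unicode escapes above U+007F require server encoding UTF8"
    )


print(codepoint_to_server_bytes(0x20AC, "UTF8"))    # b'\xe2\x82\xac'
print(codepoint_to_server_bytes(0x41, "LATIN1"))    # b'A'
```

The point of the design is that no user-definable conversion routine is ever called, which is what makes it usable at the scanner stage.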

I'd be happy to start with UTF8 support only.  Other ideas?
