Re: UTF8 national character data type support WIP patch and list of open issues.

From: Chapman Flack
Subject: Re: UTF8 national character data type support WIP patch and list of open issues.
Msg-id: df1325b316827117d1086cd0762a402b@anastigmatix.net
In reply to: Re: UTF8 national character data type support WIP patch and list of open issues.  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers

Hi,

Although this is a ten-year-old message, it was the one I found quickly
when looking to see what the current state of play on this might be.

On 2013-09-20 14:22, Robert Haas wrote:
> Hmm.  So under that design, a database could support up to a total of
> two character sets, the one that you get when you say 'foo' and the
> other one that you get when you say n'foo'.
> 
> I guess we could do that, but it seems a bit limited.  If we're going
> to go to the trouble of supporting multiple character sets, why not
> support an arbitrary number instead of just two?

Because that old thread came to an end without mentioning how the
standard approaches that, it seemed worth adding, just to complete the
record.

In the draft of the standard I'm looking at (which is also around a
decade old), n'foo' is nothing but a handy shorthand for _csname'foo'
(which is a syntax we do not accept) for some particular csname that
was chosen when setting up the db.

So really, the standard contemplates letting you have columns of
arbitrary different charsets (CHAR(x) CHARACTER SET csname), and
literals of arbitrary charsets _csname'foo'. Then, as a bit of
sugar, you get to pick which two of those charsets you'd like
to have easy shorter ways of writing, 'foo' or n'foo',
CHAR or NCHAR.

The grammar for csname is kind of funky. It can be just a bare
<SQL language identifier>, which has the nice restricted form
/[A-Za-z][A-Za-z0-9_]*/. But it can also be schema-qualified,
with the schema of course being a full-fledged <identifier>.
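
To make the easy half of that concrete, here is a rough sketch (Python,
not anything in the PostgreSQL tree) of recognizing the unqualified form
_csname'foo' and treating n'foo' as shorthand for it. The function name,
the "UTF8" default for the national charset, and the assumption that
nothing separates the introducer from the opening quote are all mine,
for illustration only. The schema-qualified form is deliberately not
handled.

```python
import re

# <SQL language identifier> as described above: a letter followed by
# letters, digits, or underscores.
SQL_LANGUAGE_IDENTIFIER = r"[A-Za-z][A-Za-z0-9_]*"

# Simplified matcher (an assumption, not the standard's full grammar):
# underscore introducer, unqualified csname, then a quoted string in
# which '' stands for one embedded quote.
INTRODUCED_LITERAL = re.compile(
    r"_(?P<csname>" + SQL_LANGUAGE_IDENTIFIER + r")'(?P<body>(?:[^']|'')*)'"
)


def parse_introduced_literal(text, national_charset="UTF8"):
    """Return (csname, value) for _csname'...' or n'...', else None.

    national_charset is a hypothetical stand-in for the csname that
    would be chosen when setting up the db.
    """
    # n'foo' is treated as sugar for _<national charset>'foo'.
    if text[:2] in ("n'", "N'"):
        text = "_" + national_charset + text[1:]
    match = INTRODUCED_LITERAL.fullmatch(text)
    if match is None:
        return None
    return match.group("csname"), match.group("body").replace("''", "'")
```

With that, parse_introduced_literal("_LATIN1'foo'") gives the pair
("LATIN1", "foo"), and a name that starts with a digit is rejected.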

So yeah, to fully meet this part of the standard, the parser'd
have to know that
  _U&"I am a schema nameZ0021" UESCAPE 'Z'/*hi!*/.LATIN1'foo'
is a string literal, expressing foo, in a character set named
LATIN1, in some cutely-named schema.

Never a dull moment.

Regards,
-Chap


