Re: Unicode string literals versus the world

From: Marko Kreen
Subject: Re: Unicode string literals versus the world
Msg-id e51f66da0904140713r4144a5d9i1382af935de77c4@mail.gmail.com
In reply to: Unicode string literals versus the world  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: Unicode string literals versus the world  (Peter Eisentraut <peter_e@gmx.net>)
List: pgsql-hackers
On 4/14/09, Peter Eisentraut <peter_e@gmx.net> wrote:
> On Tuesday 14 April 2009 14:38:38 Marko Kreen wrote:
>  > I think the problem is that they should not act like E'' strings, but they
>  > should act like plain '' strings - they should follow stdstr setting.
>  >
>  > That way existing tools that may (or may not..) understand E'' and stdstr
>  > settings, but definitely have not heard about U&'' strings can still
>  > parse the SQL without new surprises.
>
>
> Can you be more specific in what "surprises" you expect?  What algorithms do
>  you suppose those "existing tools" use and what expectations do they have?

If the parsing does not happen in 2 passes and does not take the stdstr
setting into account, then the default breakage would be:
  stdstr=off, U&' \' UESCAPE '!'.

And anything whose security or functionality depends on parsing SQL
can be broken that way.
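
To make the mismatch concrete, here is a minimal Python sketch of a
single-pass scanner that finds quoted literals.  It is not taken from any
real tool; scan_literals and its backslash_escapes flag are hypothetical
names.  With the flag on, it behaves like a tool that assumes stdstr=off
and knows nothing about U&'' strings.

```python
def scan_literals(sql, backslash_escapes):
    """Return (body, closed) for each single-quoted literal found in sql.

    backslash_escapes=True mimics a tool that assumes stdstr=off, where a
    backslash escapes the next character inside quotes.  A doubled quote
    ('') is treated as an escaped quote in both modes.
    """
    literals = []
    i, n = 0, len(sql)
    while i < n:
        if sql[i] != "'":
            i += 1
            continue
        i += 1                          # skip the opening quote
        body, closed = [], False
        while i < n:
            ch = sql[i]
            if backslash_escapes and ch == "\\" and i + 1 < n:
                body.append(sql[i:i + 2])   # keep the escape pair verbatim
                i += 2
            elif ch == "'" and i + 1 < n and sql[i + 1] == "'":
                body.append("''")
                i += 2
            elif ch == "'":
                i += 1                  # closing quote
                closed = True
                break
            else:
                body.append(ch)
                i += 1
        literals.append(("".join(body), closed))
    return literals


example = "U&' \\' UESCAPE '!'"         # the SQL text: U&' \' UESCAPE '!'
print(scan_literals(example, backslash_escapes=False))
print(scan_literals(example, backslash_escapes=True))
```

Without backslash escapes the scanner sees two closed literals.  With
them, it swallows " UESCAPE " into the first literal and leaves the last
quote unterminated, so everything after it is tokenized wrongly.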

Broken functionality would be e.g. Slony (or another replication solution)
distributing developer-written SQL code to a bunch of nodes.  It needs to
parse a text file into SQL statements and execute them separately.
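
As an illustration only (this is not Slony's actual code; split_statements
is a hypothetical helper), a script splitter of that kind could look like
the Python sketch below.  It splits on semicolons outside single-quoted
literals, and the backslash_escapes flag stands for the stdstr=off
assumption.

```python
def split_statements(script, backslash_escapes):
    """Split script on ';' outside single-quoted literals.

    A doubled quote ('') closes and immediately reopens the literal,
    which is good enough for deciding where statements end.
    """
    statements = []
    current = []
    in_literal = False
    i, n = 0, len(script)
    while i < n:
        ch = script[i]
        if in_literal:
            if backslash_escapes and ch == "\\" and i + 1 < n:
                current.append(script[i:i + 2])   # escaped character
                i += 2
                continue
            if ch == "'":
                in_literal = False
        elif ch == "'":
            in_literal = True
        elif ch == ";":
            stmt = "".join(current).strip()
            if stmt:
                statements.append(stmt)
            current = []
            i += 1
            continue
        current.append(ch)
        i += 1
    tail = "".join(current).strip()
    if tail:
        statements.append(tail)
    return statements


script = "SELECT U&' \\' UESCAPE '!'; SELECT 'x';"
print(split_statements(script, backslash_escapes=False))
print(split_statements(script, backslash_escapes=True))
```

Treating backslash as an ordinary character yields two statements.
Treating it as an escape makes the splitter lose track of the literal
boundaries, so it never sees a semicolon outside a literal and hands the
whole script over as one chunk.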

There are probably other solutions that need to understand SQL
at least at the token level to function correctly (pgpool, probably
something in Java that depends on it, etc.).

>  > I still stand on my proposal, how about extending E'' strings with
>  > unicode escapes (eg. \uXXXX)?  The E'' strings are already more
>  > clearly defined than '' and they are our "own", we don't need to
>  > consider random standards, but can consider our sanity.
>
>
> This doesn't excite me.  I think the tendency should be to get rid of E''
>  usage, because its definition of escape sequences is single-byte and ASCII
>  centric and thus overall a legacy construct.

Why are you concentrating only on \0xx escapes?  The \\, \n, etc.
seem standard and forward-looking enough.  Yes, Unicode escapes are
missing, but we can add them without breaking anything.
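
A rough Python sketch of what that could mean for a decoder of E''
string bodies: the existing simple escapes stay as they are and \uXXXX
is one more case.  The name decode_e_string is hypothetical, the \uXXXX
handling shows the proposal rather than existing behavior, and the
byte-oriented octal and hex escapes are deliberately left out.

```python
import re

# Simple single-character escapes; any other escaped character is
# taken literally, so \\ gives a backslash and \' gives a quote.
SIMPLE = {"n": "\n", "t": "\t", "r": "\r", "b": "\b", "f": "\f"}

# Either the proposed \uXXXX form or a backslash plus one character.
ESCAPE = re.compile(r"\\(u[0-9A-Fa-f]{4}|.)", re.DOTALL)


def decode_e_string(body):
    """Decode the body of an E'' literal (text between the quotes)."""
    def repl(match):
        esc = match.group(1)
        if len(esc) == 5:               # uXXXX: proposed Unicode escape
            return chr(int(esc[1:], 16))
        return SIMPLE.get(esc, esc)
    return ESCAPE.sub(repl, body)


print(decode_e_string(r"caf\u00e9\n"))
```

Surrogate pairs and code points beyond four hex digits are not handled
here; a real implementation would have to decide how to treat them.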

>  Certainly, we will want to keep
>  around E'' for a long time or forever, but it is a legitimate goal for
>  application writers to not use it, which is after all the reason behind this
>  whole standards-conforming strings project.  I wouldn't want to have a
>  forward-looking feature such as the Unicode escapes be burdened with that kind
>  of legacy behavior.
>
>  Also note that Unicode escapes are also available for identifiers, for which
>  there is no existing E"" that you can add it to.

Well, I was not rejecting the standard quoting, but suggesting
postponing it until the stdstr mess is sorted out.  We can use \uXX
in the meantime, and I think most Postgres users would prefer to keep
using it...

-- 
marko

