Re: Unicode string literals versus the world

From: Marko Kreen
Subject: Re: Unicode string literals versus the world
Msg-id: e51f66da0904140438p599d8debj17114a0976295a13@mail.gmail.com
In reply to: Re: Unicode string literals versus the world (Peter Eisentraut <peter_e@gmx.net>)
Replies: Re: Unicode string literals versus the world (Andrew Dunstan <andrew@dunslane.net>)
Re: Unicode string literals versus the world (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Unicode string literals versus the world (Peter Eisentraut <peter_e@gmx.net>)
List: pgsql-hackers
On 4/14/09, Peter Eisentraut <peter_e@gmx.net> wrote:
> On Saturday 11 April 2009 00:54:25 Tom Lane wrote:
>  > It gets worse though: I have seldom seen such a badly designed piece of
>  > syntax as the Unicode string syntax --- see
>  > http://developer.postgresql.org/pgdocs/postgres/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE
>  >
>  > You scan the string, and then after that they tell you what the escape
>  > character is!?  Not to mention the obvious ambiguity with & as an
>  > operator.
>  >
>  > If we let this go into 8.4, our previous rounds with security holes
>  > caused by careless string parsing will look like a day at the beach.
>  > No frontend that isn't fully cognizant of the Unicode string syntax is
>  > going to parse such things correctly --- it's going to be trivial for
>  > a bad guy to confuse a quoting mechanism as to what's an escape and what
>  > isn't.
>
>
> Note that the escape character marks the Unicode escapes; it doesn't affect the
>  quote characters that delimit the string.  So offhand I can't see any potential
>  for quote confusion/SQL injection type problems.  Please elaborate if you see
>  a problem.
>
>  If there are problems, we could consider getting rid of the UESCAPE clause.
>  Without it, the U&'' strings would behave much like the E'' strings.  But I'd
>  like to understand the problem first.

I think the problem is that they should not act like E'' strings, but like
plain '' strings: they should follow the stdstr (standard_conforming_strings)
setting.

That way, existing tools that may (or may not...) understand E'' strings and
the stdstr setting, but have definitely not heard of U&'' strings, can still
parse the SQL without new surprises.
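To make the boundary concern concrete, here is a minimal Python sketch. The function `split_literals` is hypothetical, not part of PostgreSQL or any client library, and it ignores comments, quoted identifiers and dollar quoting. It finds literal bodies under two assumed rules: plain '' strings with stdstr=on, where only a doubled quote escapes a quote, and E''-style strings, where a backslash also escapes the next character. A tool that has never heard of U& sees `U`, `&` and an ordinary literal, so it applies the first rule; if the server applied the second, the two would disagree about where the literal ends.

```python
def split_literals(sql: str, backslash_escapes: bool) -> list[str]:
    """Return the raw bodies of single-quoted literals found in sql.

    A quote is always escaped by doubling it (''). When backslash_escapes
    is True, a backslash also escapes the following character, as in E''
    strings. An unterminated literal is returned with whatever follows it.
    """
    bodies = []
    i = 0
    while i < len(sql):
        if sql[i] != "'":
            i += 1  # outside any literal: skip ordinary SQL text
            continue
        i += 1  # skip the opening quote
        body = []
        while i < len(sql):
            ch = sql[i]
            if backslash_escapes and ch == "\\" and i + 1 < len(sql):
                body.append(sql[i:i + 2])  # keep the escape pair verbatim
                i += 2
            elif ch == "'":
                if i + 1 < len(sql) and sql[i + 1] == "'":
                    body.append("''")  # doubled quote stays inside the literal
                    i += 2
                else:
                    i += 1  # closing quote
                    break
            else:
                body.append(ch)
                i += 1
        bodies.append("".join(body))
    return bodies


sql = "SELECT U&'x\\', 'y'"
print(split_literals(sql, backslash_escapes=False))  # plain '' reading
print(split_literals(sql, backslash_escapes=True))   # E''-style reading
```

Under the plain reading the text holds two literals, `x\` and `y`. Under the E''-style reading the first literal swallows `, ` and `y` ends up outside any literal, while the last quote opens a new, unterminated one. That kind of client/server disagreement is what the plain-string behaviour for U&'' would avoid.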

If they already act that way then keeping U& should be fine.

And if UESCAPE does not affect the main string parsing, but is handled in a
second pass over the already-parsed string (like bytea \ escapes), then that
should also be fine and should not cause any new surprises.
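A minimal sketch of such a second pass, assuming the U& escape forms described in the PostgreSQL documentation linked above: the escape character followed by four hex digits, or by `+` and six hex digits, with a doubled escape character standing for itself. The helper names are hypothetical. Because the function only ever sees a body that the lexer has already delimited, the escape character chosen by UESCAPE cannot change where the literal ends.

```python
import string


def _is_hex(s: str, n: int) -> bool:
    """True if s consists of exactly n hexadecimal digits."""
    return len(s) == n and all(c in string.hexdigits for c in s)


def decode_unicode_escapes(raw_body: str, escape: str = "\\") -> str:
    """Second pass over the body of an already-delimited U&'...' literal.

    raw_body is the text between the quotes as the first pass found it.
    Doubled quotes are collapsed first, then Unicode escapes are decoded.
    This sketch does not reject surrogate code points.
    """
    body = raw_body.replace("''", "'")
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != escape:
            out.append(ch)
            i += 1
        elif body.startswith(escape, i + 1):
            out.append(escape)  # doubled escape character stands for itself
            i += 2
        elif body.startswith("+", i + 1) and _is_hex(body[i + 2:i + 8], 6):
            out.append(chr(int(body[i + 2:i + 8], 16)))  # escape + six hex digits
            i += 8
        elif _is_hex(body[i + 1:i + 5], 4):
            out.append(chr(int(body[i + 1:i + 5], 16)))  # escape + four hex digits
            i += 5
        else:
            raise ValueError(f"invalid Unicode escape at offset {i}")
    return "".join(out)


print(decode_unicode_escapes("d\\0061t\\+000061"))              # default escape
print(decode_unicode_escapes("d!0061t!+000061", escape="!"))    # UESCAPE '!'
```

Both calls decode to `data`; the only thing UESCAPE changes is which character the second pass looks for.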

But if not, it must go.

I would prefer that such quoting extensions wait until stdstr=on is the
only mode Postgres operates in. Fitting new quoting methods into an
environment with a flippable stdstr setting will be rather painful for
everyone.

I still stand by my proposal: how about extending E'' strings with
Unicode escapes (e.g. \uXXXX)? E'' strings are already more clearly
defined than '' strings, and they are our "own": we don't need to
consider random standards, but can consider our sanity.
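A sketch of how the proposed `\uXXXX` escape could be decoded inside an E'' body, assuming exactly four hex digits follow `\u`. The helper `decode_e_body` is hypothetical, and octal and `\x` escapes are left out for brevity. Since E'' lexing already consumes a backslash together with the character after it, treating `u` as one more escape letter changes only the decoded value, not where the literal ends.

```python
import string

# Single-letter C-style escapes handled by this sketch.
_SIMPLE = {"n": "\n", "t": "\t", "r": "\r", "b": "\b", "f": "\f"}


def decode_e_body(body: str) -> str:
    """Decode the body of an already-delimited E'...' literal.

    Adds a hypothetical backslash-u escape taking exactly four hex digits.
    Any other backslash pair yields the escaped character itself.
    """
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "'" and body.startswith("'", i + 1):
            out.append("'")  # '' inside the literal is one quote
            i += 2
        elif ch != "\\" or i + 1 == len(body):
            out.append(ch)
            i += 1
        elif body[i + 1] == "u":
            digits = body[i + 2:i + 6]
            if len(digits) != 4 or not all(c in string.hexdigits for c in digits):
                raise ValueError(f"bad \\u escape at offset {i}")
            out.append(chr(int(digits, 16)))
            i += 6
        else:
            nxt = body[i + 1]
            out.append(_SIMPLE.get(nxt, nxt))  # \n -> newline, \\ -> \, \' -> '
            i += 2
    return "".join(out)


print(decode_e_body("caf\\u00e9"))   # decodes the escape
print(decode_e_body("x\\\\u0041"))   # escaped backslash: no Unicode escape here
```

The second call shows why this stays unambiguous: `\\` is consumed as a pair first, so the following `u0041` is ordinary text.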

-- 
marko

