Re: Unicode string literals versus the world

From	Marko Kreen
Subject	Re: Unicode string literals versus the world
Date	14 April 2009, 11:38:45
Msg-id	e51f66da0904140438p599d8debj17114a0976295a13@mail.gmail.com
In reply to	Re: Unicode string literals versus the world (Peter Eisentraut <peter_e@gmx.net>)
Responses	Re: Unicode string literals versus the world (Andrew Dunstan <andrew@dunslane.net>) Re: Unicode string literals versus the world (Tom Lane <tgl@sss.pgh.pa.us>) Re: Unicode string literals versus the world (Peter Eisentraut <peter_e@gmx.net>)
List	pgsql-hackers

On 4/14/09, Peter Eisentraut <peter_e@gmx.net> wrote:
> On Saturday 11 April 2009 00:54:25 Tom Lane wrote:
>  > It gets worse though: I have seldom seen such a badly designed piece of
>  > syntax as the Unicode string syntax --- see
>  > http://developer.postgresql.org/pgdocs/postgres/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE
>  >
>  > You scan the string, and then after that they tell you what the escape
>  > character is!?  Not to mention the obvious ambiguity with & as an
>  > operator.
>  >
>  > If we let this go into 8.4, our previous rounds with security holes
>  > caused by careless string parsing will look like a day at the beach.
>  > No frontend that isn't fully cognizant of the Unicode string syntax is
>  > going to parse such things correctly --- it's going to be trivial for
>  > a bad guy to confuse a quoting mechanism as to what's an escape and what
>  > isn't.
>
>
> Note that the escape character marks the Unicode escapes; it doesn't affect the
>  quote characters that delimit the string.  So offhand I can't see any potential
>  for quote confusion/SQL injection type problems.  Please elaborate if you see
>  a problem.
>
>  If there are problems, we could consider getting rid of the UESCAPE clause.
>  Without it, the U&'' strings would behave much like the E'' strings.  But I'd
>  like to understand the problem first.

I think the problem is that they should not act like E'' strings; they
should act like plain '' strings, i.e. follow the stdstr
(standard_conforming_strings) setting.

That way, existing tools that may (or may not) understand E'' and the
stdstr setting, but definitely have not heard about U&'' strings, can
still parse the SQL without new surprises.
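
To make the concern concrete, here is a rough sketch of how a tool that has
never heard of U&'' might look for the end of a string literal.  The name
scan_literal_end and its parameters are made up for illustration; this is
not code from any real driver or from the Postgres lexer.  It assumes the
only rules in play are doubled quotes, plus backslash escapes when the
literal is E'' or stdstr is off.

```python
def scan_literal_end(sql: str, start: int, stdstr: bool, e_prefix: bool = False) -> int:
    """Return the index just past the closing quote of the literal
    whose opening quote is at sql[start] (hypothetical helper)."""
    if sql[start] != "'":
        raise ValueError("start must point at an opening quote")
    # Backslash is an escape in E'' strings, and in plain '' when stdstr=off.
    backslash_escapes = e_prefix or not stdstr
    i = start + 1
    while i < len(sql):
        ch = sql[i]
        if ch == "\\" and backslash_escapes:
            i += 2          # skip the backslash and the escaped character
            continue
        if ch == "'":
            if i + 1 < len(sql) and sql[i + 1] == "'":
                i += 2      # doubled quote stays inside the literal
                continue
            return i + 1    # closing quote found
        i += 1
    raise ValueError("unterminated string literal")


sql = r"U&'a\' || 'b'"
start = sql.index("'")
plain_end = scan_literal_end(sql, start, stdstr=True)                   # plain '' rules
e_style_end = scan_literal_end(sql, start, stdstr=True, e_prefix=True)  # E'' rules
print(repr(sql[start:plain_end]), repr(sql[start:e_style_end]))
```

Under plain '' rules with stdstr=on the tool stops at the first closing
quote; under E''-style rules the same text ends at a later quote.  A tool
that skips the unknown U& prefix and applies plain rules only stays in sync
with the server if U&'' follows those same rules.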

If they already act that way then keeping U& should be fine.

And if UESCAPE does not affect the main string parsing, but is handled in
a second pass over the already-parsed string (like bytea \ escapes), then
that should also be fine and should not cause any new surprises.
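
The two-pass idea could be sketched roughly like this.  The function names
are hypothetical, and the escape forms (\XXXX, \+XXXXXX, doubled escape
character) are my reading of the documented U&'' syntax, not a copy of the
actual lexer.  The point is only that the first pass finds the literal's
extent with plain '' rules, and the escape character matters only in the
second pass.

```python
import re


def unquote_plain(literal: str) -> str:
    """First pass: strip the delimiting quotes and collapse doubled quotes."""
    if len(literal) < 2 or literal[0] != "'" or literal[-1] != "'":
        raise ValueError("not a quoted literal")
    return literal[1:-1].replace("''", "'")


def decode_unicode_escapes(body: str, esc: str = "\\") -> str:
    """Second pass: expand Unicode escapes in an already-parsed string.
    esc plays the role of the UESCAPE character (backslash by default)."""
    e = re.escape(esc)
    pattern = re.compile(
        e + e                              # doubled escape char -> literal char
        + "|" + e + r"\+([0-9A-Fa-f]{6})"  # 6-digit form
        + "|" + e + r"([0-9A-Fa-f]{4})"    # 4-digit form
        + "|" + e                          # anything else is an error
    )

    def repl(m):
        if m.group(1) is not None:
            return chr(int(m.group(1), 16))
        if m.group(2) is not None:
            return chr(int(m.group(2), 16))
        if m.group(0) == esc + esc:
            return esc
        raise ValueError("invalid Unicode escape")

    return pattern.sub(repl, body)


def parse_u_literal(literal: str, esc: str = "\\") -> str:
    """Run both passes; the escape character never affects the first one."""
    return decode_unicode_escapes(unquote_plain(literal), esc)


print(parse_u_literal(r"'d\0061t\+000061'"))
print(parse_u_literal("'d!0061t!+000061'", esc="!"))
```

Both calls should yield the same text, since only the second pass looks at
the escape character.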

But if not, it must go.

I would prefer that such quoting extensions wait until stdstr=on is the
only mode Postgres operates in.  Fitting new quoting rules into an
environment with a flippable stdstr setting will be rather painful for
everyone.

I still stand by my proposal: how about extending E'' strings with
Unicode escapes (e.g. \uXXXX)?  The E'' strings are already more
clearly defined than '' and they are our "own", so we don't need to
consider random standards, but can consider our sanity.
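
As a rough illustration of what I mean, a decoder for the body of an E''
string with an added \uXXXX escape might look like this.  decode_e_body and
the small SIMPLE table are hypothetical and cover only a few of the
existing backslash escapes; quote handling is left out, on the assumption
that the literal's extent was already found by the usual E'' rules.

```python
import string

# A few of the existing E'' escapes, enough for the sketch.
SIMPLE = {"n": "\n", "t": "\t", "\\": "\\", "'": "'"}


def decode_e_body(body: str) -> str:
    """Decode backslash escapes in an E'' string body, including the
    proposed \\uXXXX form (exactly four hex digits)."""
    out = []
    i = 0
    while i < len(body):
        ch = body[i]
        if ch != "\\":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(body):
            raise ValueError("dangling backslash")
        nxt = body[i + 1]
        if nxt == "u":
            hexpart = body[i + 2:i + 6]
            if len(hexpart) != 4 or any(c not in string.hexdigits for c in hexpart):
                raise ValueError("\\u must be followed by four hex digits")
            out.append(chr(int(hexpart, 16)))
            i += 6
        elif nxt in SIMPLE:
            out.append(SIMPLE[nxt])
            i += 2
        else:
            out.append(nxt)   # any other escaped character is taken literally
            i += 2
    return "".join(out)


print(decode_e_body(r"d\u0061t\u0061"))
```

Since backslash is already the escape character in E'' strings no matter
what stdstr says, the new escape would not change where a literal ends.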

-- 
marko
