Обсуждение: Regular expression character escape

Поиск
Список
Период
Сортировка

Regular expression character escape

От
Ronan Dunklau
Дата:
Hello.

I'd like to perform a query using user-submitted input in a regular
expression.

Something along the lines of:

select some_col
from some_table
where some_col ~ ('^' || user_submitted_input || '\d*$')

This query is looking for every value matching the user submitted input
with optional trailing decimal characters.

My problem is: the user can break this using special regular expression
characters (ex: 'test(').

I'm looking for a way to escape regular expression characters from the
input (maybe a function already exists for that purpose?).

I could do it on the client side, but I'm not really sure how python and
postgresql flavors of regular expressions differ.

I've seen that the (?q) modifier is supported (treat the rest of the
regexp as normal characters), but unless I missed something it I can't
use it anywhere else than at the beginning of the regexp.

From the docs at
http://www.postgresql.org/docs/current/static/functions-matching.html:

"Embedded options take effect at the ) terminating the sequence. They
can appear only at the start of an ARE."

Any idea on how to achieve this ?

Thank you.

--
Ronan Dunklau

Re: Regular expression character escape

От
"David Johnston"
Дата:
-----Original Message-----
From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Ronan Dunklau
Sent: Friday, February 24, 2012 6:34 AM
To: pgsql-general@postgresql.org
Subject: [GENERAL] Regular expression character escape

Hello.

I'd like to perform a query using user-submitted input in a regular expression.

Something along the lines of:

select some_col
from some_table
where some_col ~ ('^' || user_submitted_input || '\d*$')

This query is looking for every value matching the user submitted input with optional trailing decimal characters.

My problem is: the user can break this using special regular expression characters (ex: 'test(').

I'm looking for a way to escape regular expression characters from the input (maybe a function already exists for that
purpose?).

I could do it on the client side, but I'm not really sure how python and postgresql flavors of regular expressions
differ.

I've seen that the (?q) modifier is supported (treat the rest of the regexp as normal characters), but unless I missed
somethingit I can't use it anywhere else than at the beginning of the regexp. 

From the docs at
http://www.postgresql.org/docs/current/static/functions-matching.html:

"Embedded options take effect at the ) terminating the sequence. They can appear only at the start of an ARE."

Any idea on how to achieve this ?

Thank you.

--
Ronan Dunklau

------------------------------------------------------------------------------------------------

How about:

WHERE some_col LIKE (user_submitted_input || '%') AND some_col ~ ('^.{' || length_of_user_submitted_input || '}\d*$')

I'd have some reservations regarding multi-byte characters however - but this avoids any escaping of the input string.

You could (should?) write the escaping routine on the server side in a user-defined function:

WHERE some_col ~ ('^' || make_regexp_literal(user_submitted_stringliteral) || '\d*$')

David J.



Re: Regular expression character escape

От
Ronan Dunklau
Дата:
On 24/02/2012 16:38, David Johnston wrote:

> How about:
>
> WHERE some_col LIKE (user_submitted_input || '%') AND some_col ~ ('^.{' || length_of_user_submitted_input || '}\d*$')
>
> I'd have some reservations regarding multi-byte characters however - but this avoids any escaping of the input
string.

That's a clever trick, I might end using it.

> You could (should?) write the escaping routine on the server side in a user-defined function:
>
> WHERE some_col ~ ('^' || make_regexp_literal(user_submitted_stringliteral) || '\d*$')

I totally agree, but I hoped I could use an already existing function
without having to read the whole spec to figure what should and should
not be escaped.


> David J.
>
>



--
Ronan Dunklau

Re: Regular expression character escape

От
Heiko Wundram
Дата:
Am 24.02.2012 17:04, schrieb Ronan Dunklau:
> On 24/02/2012 16:38, David Johnston wrote:
>> You could (should?) write the escaping routine on the server side in a user-defined function:
>>
>> WHERE some_col ~ ('^' || make_regexp_literal(user_submitted_stringliteral) || '\d*$')
>
> I totally agree, but I hoped I could use an already existing function
> without having to read the whole spec to figure what should and should
> not be escaped.

Use the corresponding function of your programming language/framework of
choice. E.g. Python delivers this as re.escape().

--
--- Heiko.

Re: Regular expression character escape

От
Ronan Dunklau
Дата:
On 24/02/2012 17:09, Heiko Wundram wrote:
> Am 24.02.2012 17:04, schrieb Ronan Dunklau:
>> On 24/02/2012 16:38, David Johnston wrote:
>>> You could (should?) write the escaping routine on the server side in
>>> a user-defined function:
>>>
>>> WHERE some_col ~ ('^' ||
>>> make_regexp_literal(user_submitted_stringliteral) || '\d*$')
>>
>> I totally agree, but I hoped I could use an already existing function
>> without having to read the whole spec to figure what should and should
>> not be escaped.
>
> Use the corresponding function of your programming language/framework of
> choice. E.g. Python delivers this as re.escape().

Thank you, but as I wrote in the original post, I don't know how
postgresql and python differ in their regexp syntax. Specifically, I
know that re.escape escapes any non-alphanumeric character, including
accented letters.

If you have any evidence proving that everything will work fine with
re.escape, I'll be more than happy to use it.

--
Ronan Dunklau

Re: Regular expression character escape

От
Heiko Wundram
Дата:
Am 24.02.2012 17:40, schrieb Ronan Dunklau:
> On 24/02/2012 17:09, Heiko Wundram wrote:
>> Use the corresponding function of your programming language/framework of
>> choice. E.g. Python delivers this as re.escape().
>
> Thank you, but as I wrote in the original post, I don't know how
> postgresql and python differ in their regexp syntax. Specifically, I
> know that re.escape escapes any non-alphanumeric character, including
> accented letters.
>
> If you have any evidence proving that everything will work fine with
> re.escape, I'll be more than happy to use it.

This will work (AFAICT, _and_ as far as I've used it): the regex-syntax
of Python is a superset of the regex-syntax of PostgreSQL (both are
based on PCRE, not on the library, but the syntax), and as such you'll
possibly do "too much" quoting, but never too little when simply using
the Python-builtin.

--
--- Heiko.

Re: Regular expression character escape

От
Ronan Dunklau
Дата:
On 24/02/2012 17:43, Heiko Wundram wrote:
> Am 24.02.2012 17:40, schrieb Ronan Dunklau:
>> On 24/02/2012 17:09, Heiko Wundram wrote:
>>> Use the corresponding function of your programming language/framework of
>>> choice. E.g. Python delivers this as re.escape().
>>
>> Thank you, but as I wrote in the original post, I don't know how
>> postgresql and python differ in their regexp syntax. Specifically, I
>> know that re.escape escapes any non-alphanumeric character, including
>> accented letters.
>>
>> If you have any evidence proving that everything will work fine with
>> re.escape, I'll be more than happy to use it.
>
> This will work (AFAICT, _and_ as far as I've used it): the regex-syntax
> of Python is a superset of the regex-syntax of PostgreSQL (both are
> based on PCRE, not on the library, but the syntax), and as such you'll
> possibly do "too much" quoting, but never too little when simply using
> the Python-builtin.

Unfortunately for my use case, "too much" quoting can lead to errors in
postgresql.

I just tested it using the following code:

postgres=# create or replace function escape_re(atext varchar) returns
varchar as $$
  import re
  return re.escape(atext.decode('utf8'))
$$ language plpythonu;

postgres=# select escape_re('testé');
 escape_re
-----------
 test\é

postgres=# select 'testé' ~ escape_re('testé');
ERROR:  invalid regular expression: invalid escape \ sequence

--
Ronan Dunklau

Re: Regular expression character escape

От
Tom Lane
Дата:
Ronan Dunklau <rdunklau@gmail.com> writes:
> Unfortunately for my use case, "too much" quoting can lead to errors in
> postgresql.

AFAIR, the only stuff that's unsafe to insert a backslash before
is ASCII letters.  This is documented in the fine print discussing
regular expressions, btw.

            regards, tom lane