Discussion: Regex problem


Regex problem

From
"Scott Marlowe"
Date:
I'm usually ok at Regex stuff, but this one is driving me a bit crazy.

Here's the string stored in a single field.  I'm trying to extract the LONG DB QUERY part.

---------------------------------------------------------------

 initial time: 0.0001058578491210
 After _request set time: 0.0001859664916992
 Before include modules time: 0.001070976257324
 Before session_start time: 0.003780841827392
 SessionHandler read() start time: 0.004056930541992
 SessionHandler read() query finished: SELECT * FROM sessions WHERE
session_id = 'f5ca5ec95965e8ac99ec9bc31eca84c6' time:
0.005122900009155
 After session start time: 0.005219936370849
 After create db time: 0.005784034729003
 before create new session time: 0.005914926528930
 session call constructor 1 time: 0.005953073501586
 session call constructor (org loaded) time: 0.008623838424682
 session call constructor (finished) time: 0.01247286796569LONG DB
QUERY (db1, 4.9376289844513): UPDATE force_session SET
last_used_timestamp = 'now'::timestamp WHERE orgid = 15723 AND
session_id = 'f5ca5ec95965e8ac99ec9bc31eca84c6New session created
time: 5.03999090194
 Session set up time: 5.040019989013
 Behavior loaded time: 5.040072917938
 Start of page body time: 5.129977941513
 End of page body time: 6.25822091102

---------------------------------------------------------

I'm using this substring to grab part of it:

select substring (notes from E'LONG DB QUERY.+time: [0-9]+.[0-9]+')
from table where id=1;

And that returns this:

LONG DB QUERY (db1, 4.9376289844513): UPDATE force_session SET
last_used_timestamp = 'now'::timestamp WHERE orgid = 15723 AND
session_id = 'f5ca5ec95965e8ac99ec9bc31eca84c6New session created
time: 5.03999090194
           : Session set up time: 5.040019989013
           : Behavior loaded time: 5.040072917938
           : Start of page body time: 5.129977941513
           : End of page body time: 6.25822091102

Which is not surprising.  It's greedy.  So, I turn off the greediness
of the first + with a ? and then I get this

select substring (notes from E'LONG DB QUERY.+?time: [0-9]+.[0-9]+')
from table where id=1;

LONG DB QUERY (db1, 4.9376289844513): UPDATE force_session SET
last_used_timestamp = 'now'::timestamp WHERE orgid = 15723 AND
session_id = 'f5ca5ec95965e8ac99ec9bc31eca84c6New session created
time: 5.0

Now, I'm pretty sure that with the [0-9]+.[0-9]+ I should be getting
5.03999090194 at the end.  I know the . matches any single character
there.  There's only ever one digit before it, but changing the . to
\. doesn't help either.

Any ideas?  I'm guessing some old hand at regex will look at it and
see what I'm doing wrong, but I'm not seeing it.
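For comparison, Scott's expectation matches what a Perl-style backtracking engine does, because in such an engine each quantifier keeps its own greediness. The sketch below runs both of his patterns through Python's `re` module against an abbreviated copy of the field. The e-mail line wrapping inside the LONG DB QUERY entry has been removed, and `re.DOTALL` is set so that `.` also matches newlines, as it does by default in PostgreSQL. The sketch shows Python's behavior only and does not model PostgreSQL's regex engine. `NOTES` and `last_time` are illustrative names.

```python
import re

# Abbreviated copy of the log field from the thread; the e-mail line
# wrapping inside the LONG DB QUERY entry has been removed.
NOTES = (
    " session call constructor (finished) time: 0.01247286796569"
    "LONG DB QUERY (db1, 4.9376289844513): UPDATE force_session SET "
    "last_used_timestamp = 'now'::timestamp WHERE orgid = 15723 AND "
    "session_id = 'f5ca5ec95965e8ac99ec9bc31eca84c6New session created "
    "time: 5.03999090194\n"
    " Session set up time: 5.040019989013\n"
    " End of page body time: 6.25822091102\n"
)

GREEDY = r"LONG DB QUERY.+time: [0-9]+.[0-9]+"       # first query's pattern
LAZY_FIRST = r"LONG DB QUERY.+?time: [0-9]+.[0-9]+"  # second query's pattern


def last_time(pattern: str) -> str:
    """Return the number after the last 'time: ' inside the first match."""
    m = re.search(pattern, NOTES, re.DOTALL)  # DOTALL: '.' also matches '\n'
    if m is None:
        return ""
    return m.group(0).rsplit("time: ", 1)[-1]


if __name__ == "__main__":
    # Greedy .+ backtracks only as far as the LAST "time: " in the field.
    print(last_time(GREEDY))      # prints 6.25822091102
    # Lazy .+? stops at the FIRST "time: ", but in Python the later
    # [0-9]+ quantifiers stay greedy, so the whole number is kept.
    print(last_time(LAZY_FIRST))  # prints 5.03999090194
```

In Python the pattern therefore behaves as Scott expected. The different result in PostgreSQL comes from how its engine assigns greediness, not from the digit classes themselves.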

Re: Regex problem

From
Tom Lane
Date:
"Scott Marlowe" <scott.marlowe@gmail.com> writes:
> ...Which is not surprising.  It's greedy.  So, I turn off the greediness
> of the first + with a ? and then I get this

> select substring (notes from E'LONG DB QUERY.+?time: [0-9]+.[0-9]+')
> from table where id=1;

> LONG DB QUERY (db1, 4.9376289844513): UPDATE force_session SET
> last_used_timestamp = 'now'::timestamp WHERE orgid = 15723 AND
> session_id = 'f5ca5ec95965e8ac99ec9bc31eca84c6New session created
> time: 5.0

> Now, I'm pretty sure that with the [0-9]+.[0-9]+ I should be getting
> 5.03999090194 at the end.

You're getting bit by the fact that the initial non-greedy quantifier
makes the entire regex non-greedy --- see rules in section 9.7.3.5:
http://developer.postgresql.org/pgdocs/postgres/functions-matching.html#POSIX-MATCHING-RULES

If you know that there will always be something after the first time
value, you could do something like

E'(LONG DB QUERY.+?time: [0-9]+\\.[0-9]+)[^0-9]'

to force the issue about how much the second and third quantifiers
match.
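PostgreSQL's rule cannot be reproduced exactly in Python. For this input, however, making every quantifier lazy is a rough stand-in, because it produces the same truncated `5.0` ending. The sketch below uses that approximation to show why the trailing `[^0-9]` helps. The match cannot end until a non-digit follows, so the digit runs are forced to extend to the end of the number.

Capture group 1 plays the role of the parenthesized subexpression. When the pattern contains parentheses, PostgreSQL's `substring(... from pattern)` returns only the part that matched the first parenthesized subexpression. Note also that in an `E'...'` literal a single `\.` is reduced to a plain `.` before the regex engine sees it, which is why the fix writes `\\.`. `NOTES`, `ALL_LAZY`, `FIXED` and `time_in` are illustrative names.

```python
import re

# Abbreviated copy of the log field from the thread (e-mail wrapping removed).
NOTES = (
    "LONG DB QUERY (db1, 4.9376289844513): UPDATE force_session SET "
    "last_used_timestamp = 'now'::timestamp WHERE orgid = 15723 AND "
    "session_id = 'f5ca5ec95965e8ac99ec9bc31eca84c6New session created "
    "time: 5.03999090194\n"
    " Session set up time: 5.040019989013\n"
)

# Rough approximation of a whole-RE non-greedy match: every quantifier is lazy.
ALL_LAZY = r"LONG DB QUERY.+?time: [0-9]+?\.[0-9]+?"
# Same approximation plus the suggested fix: a non-digit must follow the
# number, and only the parenthesized part is kept.
FIXED = r"(LONG DB QUERY.+?time: [0-9]+?\.[0-9]+?)[^0-9]"


def time_in(pattern: str, group: int = 0) -> str:
    """Return the number after the last 'time: ' in the chosen match group."""
    m = re.search(pattern, NOTES, re.DOTALL)
    if m is None:
        return ""
    return m.group(group).rsplit("time: ", 1)[-1]


if __name__ == "__main__":
    print(time_in(ALL_LAZY))  # prints 5.0 (shortest possible digit runs)
    print(time_in(FIXED, 1))  # prints 5.03999090194 ([^0-9] forces the rest)
```

In this sketch, if the number were the last thing in `NOTES`, `FIXED` would find no match at all. That is the limitation behind the caveat that something must always follow the first time value.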

            regards, tom lane

Re: Regex problem

From
"Scott Marlowe"
Date:
On Thu, Jul 10, 2008 at 1:22 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> If you know that there will always be something after the first time
> value, you could do something like
>
> E'(LONG DB QUERY.+?time: [0-9]+\\.[0-9]+)[^0-9]'
>
> to force the issue about how much the second and third quantifiers
> match.

Thanks Tom, that's the exact answer I needed.  Now, it's back to the
bit mines...