Обсуждение: regexp idea

Поиск
Список
Период
Сортировка

regexp idea

От
AI Rumman
Дата:
Hi,

I have a string like:
Gloucester Catholic vs. St. Augustine baseball, South Jersey Non-Public A final, June 5, 2013

I need to extract date part from the string.

I used the follows:
regexp_matches(title,'[.* ]+\ (Jul|August|Sep)[, a-zA-Z0-9]+' )

But it gives me result August as it stops at "Augustine".

In my case, date can be in different formats, some record may use "," or some may not.

Any idea to achieve this?

Thanks.

Re: regexp idea

От
David Johnston
Дата:
rummandba wrote
> Hi,
>
> I have a string like:
> Gloucester Catholic vs. St. Augustine baseball, South Jersey Non-Public A
> final, June 5, 2013
>
> I need to extract date part from the string.
>
> I used the follows:
> regexp_matches(title,'[.* ]+\ (Jul|August|Sep)[, a-zA-Z0-9]+' )
>
> But it gives me result August as it stops at "Augustine".
>
> In my case, date can be in different formats, some record may use "," or
> some may not.
>
> Any idea to achieve this?
>
> Thanks.

Not sure how you expect to match "June" with that particular expression but
to solve the mis-matching of "Augustine" you can use the word-boundary
escapes "\m" (word-start) and "\M" (word-end).

Unless you need fuzzy matching on the month name you should simply list all
twelve months and possible recognized abbreviations as well.

^.*\m(June|July|August|September)\M[, a-zA-Z0-9]+

I'd consider helping more with forming an actual expression but a single
input sample with zero context on how such a string is created gives little
to work with.

Though after the month there likely cannot be a letter so a better
definition would be:

\m(August)[, ]+(\d)+[, ]+(\d+)

HTH

David J.




--
View this message in context: http://postgresql.1045698.n5.nabble.com/regexp-idea-tp5768725p5768731.html
Sent from the PostgreSQL - general mailing list archive at Nabble.com.


Re: regexp idea

От
Rob Sargent
Дата:
On 08/27/2013 12:44 PM, AI Rumman wrote:
> Hi,
>
> I have a string like:
> Gloucester Catholic vs. St. Augustine baseball, South Jersey
> Non-Public A final, June 5, 2013
>
> I need to extract date part from the string.
>
> I used the follows:
> regexp_matches(title,'[.* ]+\ (Jul|August|Sep)[, a-zA-Z0-9]+' )
>
> But it gives me result August as it stops at "Augustine".
>
> In my case, date can be in different formats, some record may use ","
> or some may not.
>
> Any idea to achieve this?
>
> Thanks.
select regexp_replace('Gloucester Catholic vs. St. Augustine baseball,
South Jersey Non-Public A final, June 5, 2013',
E'(^.*)(\\m(June|July|August|Sep))([, a-zA-Z0-9]+)', E'\\2 \\4' );
+----------------+
| regexp_replace |
+----------------+
| June  5, 2013  |
+----------------+
(1 row)



Re: regexp idea

От
AI Rumman
Дата:
Thanks. That's awesome.
Do you have any good guide where I may get more knowledge on REGEXP?


On Tue, Aug 27, 2013 at 3:57 PM, Rob Sargent <robjsargent@gmail.com> wrote:
On 08/27/2013 12:44 PM, AI Rumman wrote:
Hi,

I have a string like:
Gloucester Catholic vs. St. Augustine baseball, South Jersey Non-Public A final, June 5, 2013

I need to extract date part from the string.

I used the follows:
regexp_matches(title,'[.* ]+\ (Jul|August|Sep)[, a-zA-Z0-9]+' )

But it gives me result August as it stops at "Augustine".

In my case, date can be in different formats, some record may use "," or some may not.

Any idea to achieve this?

Thanks.
select regexp_replace('Gloucester Catholic vs. St. Augustine baseball, South Jersey Non-Public A final, June 5, 2013', E'(^.*)(\\m(June|July|August|Sep))([, a-zA-Z0-9]+)', E'\\2 \\4' );
+----------------+
| regexp_replace |
+----------------+
| June  5, 2013  |
+----------------+
(1 row)




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: regexp idea

От
Rob Sargent
Дата:
On 08/27/2013 02:04 PM, AI Rumman wrote:
Thanks. That's awesome.
Do you have any good guide where I may get more knowledge on REGEXP?


On Tue, Aug 27, 2013 at 3:57 PM, Rob Sargent <robjsargent@gmail.com> wrote:
On 08/27/2013 12:44 PM, AI Rumman wrote:
Hi,

I have a string like:
Gloucester Catholic vs. St. Augustine baseball, South Jersey Non-Public A final, June 5, 2013

I need to extract date part from the string.

I used the follows:
regexp_matches(title,'[.* ]+\ (Jul|August|Sep)[, a-zA-Z0-9]+' )

But it gives me result August as it stops at "Augustine".

In my case, date can be in different formats, some record may use "," or some may not.

Any idea to achieve this?

Thanks.
select regexp_replace('Gloucester Catholic vs. St. Augustine baseball, South Jersey Non-Public A final, June 5, 2013', E'(^.*)(\\m(June|July|August|Sep))([, a-zA-Z0-9]+)', E'\\2 \\4' );
+----------------+
| regexp_replace |
+----------------+
| June  5, 2013  |
+----------------+
(1 row)




--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

http://www.postgresql.org/docs/9.1/static/functions-matching.html is the best starting point

Re: regexp idea

От
Alvaro Herrera
Дата:
AI Rumman escribió:
> Thanks. That's awesome.
> Do you have any good guide where I may get more knowledge on REGEXP?

This book is awesome:
http://regex.info/book.html

--
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services