Re: Regular Expression INTRO [long] [OT]

Поиск
Список
Период
Сортировка
От will trillich
Тема Re: Regular Expression INTRO [long] [OT]
Дата
Msg-id 20020410163854.A5944@serensoft.com
обсуждение исходный текст
Ответ на Regular Expression INTRO [long] [OT]  (will trillich <will@serensoft.com>)
Список pgsql-general
On Tue, Apr 09, 2002 at 06:05:43PM -0600, will trillich wrote:
[snip]
> With LIKE, your search values can use the percent (%)
> to tell PostgreSQL that anything can occupy that spot
> -- one character, twelve symbols or zero digits -- and
> still satisfy the search.
>
> On Unix or Linux, this is basically the same as the
> ASTERISK (*) at the command line, when dealing with
> file names:
>
>     # list all files whose names begin with '.bash'
>     ls .bash*
>
>     # list all files containing 'out' anywhere in the name
>     ls *out*
>
>     # list all file names ending with '.pl'
>     ls *.pl
>
>     # list file starting with 'proj', ending with '.c'
>     ls proj*.c

i'm torn between leaving in a full set of shell-glob samples to
pair up with postgres-like samples, and using only one (as
recommended by Holger Klawitter -- thanks!)...

> With PostgreSQL using the LIKE operator, use the
> percent, instead:
>
>     -- list all customers within the 47610 postal code
>     SELECT * FROM cust WHERE zip LIKE '47610%';
>
>     -- display customers who have 'Corp' in their names
>     SELECT * FROM cust WHERE name LIKE '%Corp%';
>
>     -- show customers whose names end in 'LLC'
>     SELECT * FROM cust WHERE name LIKE '%LLC';
>
>     -- documents beginning with 'We', ending with 'rica'
>     SELECT * FROM doc WHERE contents LIKE 'We%rica';
>
> Wherever the '%' appears (using the LIKE operator)
> Postgres allows anything at all to match -- from a
> lengthy string of text, to one single character, to
> a zero-length string -- i.e. nothing at all.
>
>     ...ILIKE 'A%Z'
>     -- matches 'AZ' and 'A to Z' and 'ABC2468XYZ' and 'A@$*Z'

whoops -- this shoulda been LIKE, not ILIKE (at least not yet)

[snip]

> So what's with the BACKSLASH in front of the dot? Well,
> just as LIKE has the UNDERSCORE (_) to denote "any single
> character", REGULAR EXPRESSIONS use the DOT (.) for
> that very same purpose. So we have to "escape" the
> dot to alter its normal interpretation, using the.

using the what? um, "...using the backslash." oops!

[snip]

> And for very common cases, there are handy abbreviations:
>
>     'x{1,1}'  'x'    one (the default)
>     'r{0,1}'  'r?'   zero or one, i.e. 'x' is optional

should be "'r' is optional". (thanks, Holger)

>     'B{0,}'   'B*'   zero or more
>     'z{1,}'   'z+'   one or more

may as well flesh these out, to be parallel with the others:

    'B{0,}'   'B*'   zero or more B's
    'z{1,}'   'z+'   one or more z's

[snip]

> Note that we've prefixed the previous pattern with
>
>     '[^0-9]'
>
> because, within the square brackets of a character class,
> the CARAT (^) means "anything EXCEPT..."
>
> So now, 00000-0000 will NOT match; this is what we're
> after.

maybe i could be more clear, by saying:

    So now, our pattern will NOT match something like
    00000-0000, and since we don't want it to, we're making
    progress.

[snip]

> And, just as in the LIKE operator, unless regular expressions
> ARE anchored at the beginning of a field, you'll defeat any
> index you have for that field. Indexes help alphabetize by
> comparing the beginning of your fields to each other; unless
> you're looking at the beginning of the field, your index can't
> be used.

some of you asked if regexes are actually able to use the index.
well, here's the answer: if they're anchored at the front, YES!

    EXPLAIN
    repo=# explain
    repo-# select * from _table_1015197075 where str ~ '^f';
    NOTICE:  QUERY PLAN:

    Index Scan using _table_ix on _table_1015197075  (cost=0.00..2.01 rows=1 width=24)

    EXPLAIN
    repo=# explain
    repo-# select * from _table_1015197075 where str ~ 'f';
    NOTICE:  QUERY PLAN:

    Seq Scan on _table_1015197075  (cost=100000000.00..100000001.03 rows=1 width=24)

to be honest, i "SET ENABLE_SEQSCAN = false" and created a
quickie table with about fifty rows, with an index on the "str"
column.

with '^f' it DID use the index; without the carat, 'f', it
absolutely COULD NOT use the index... imagine looking for words
in the dictionary that /contain/ the letter 'f', right? (no
mystery here -- but whether the '^f' COULD use the index was the
question, of course -- and it did).

cool. (i suspected, but wasn't certain.)

[snip]

> There are ways to anchor your searches to word boundaries,
> as well -- not just beginning-of-field and end-of-field. See
> your documentation for details.

rewording is in order here -- how about:

    There are ways to anchor your searches to word boundaries,
    as well -- you're not restricted to testing only for
    beginning-of-field or end-of-field. (In fact, that's a more
    likely solution to the problem than what I devised here.)
    See your documentation for information on [[:<:]] and
    [[:>:]].

[snip]

> As for [1] email stuff, it can be (zero or more of):
>
>     '[\\-\\.]'
>
> hyphen or dot, followed by
>
>     '[a-z0-9_]'
>
> alphanumerics (or underscore)
>
>     '+'
>
> one or more times.

how about a little added explanation for clarity?

    This means an address (before the @domain.com) can be
    alphanumerics only, or if it contains a dot or hyphen that
    more alphanumerics must follow the dot or hyphen. So 'me'
    and 'my.self' and 'albert.einstein-newton-john.jr' are all
    acceptable to this part of the pattern.

> And as for [2] site stuff, it can be (one or more of):
>
>     '[\\-\\.]'
>
> hyphen or dot, followed by
>
>     '[a-z0-9_]'
>
> alphanumerics (or underscore)
>
>     '+'
>
> one or more times.

more clarity -- something like:

    So the site stuff means that the domain (@some.where) MUST
    contain at least one dot or hyphen, each of which must be
    followed by alphanumerics. These would NOT match:

        'me..you' -- no alphanumerics after first dot
        'oops-' -- no alphanumerics after hyphen
        '-eesh' -- need alphanumerics before hyphen

    To be strict, that last one WILL match the 'site stuff'
    fragment of the pattern, but it'll fail in the whole pattern,
    because after the '@' we require SOME alphanumerics.


feel free to post comments/feedback to pgsql-user...

--
"We will fight them on the beaches, we will fight them on the
sons of beaches" -- Miguel Churchill, Winston's bastard Mexican
brother.
--lifted from http://www.astray.com/acmemail/stable/documentation.xml


will@serensoft.com
http://sourceforge.net/projects/newbiedoc -- we need your brain!
http://www.dontUthink.com/ -- your brain needs us!

В списке pgsql-general по дате отправления:

Предыдущее
От: Richard Emberson
Дата:
Сообщение: refcursor returned by pl/psql to jdbc
Следующее
От: Damon Cokenias
Дата:
Сообщение: View INSERTs not returning OID in Postgres 7.2.1