Обсуждение: Partial index with regexp not working

Поиск
Список
Период
Сортировка

Partial index with regexp not working

От
"Phoenix Kiula"
Дата:
I have a varchar ID field which captures a user account. If there is
no user id, then we just store the IP address of the user. Realizing
that we would like to index only those tuples where the user ID is not
an IP, I am trying to create a partial index as follows:

CREATE INDEX idx_trades_tid_partial ON trades (trader_id)
    WHERE trader_id ~ '[a-z]' ;

Any "trader_id" that contains an alphabet letter will be registered
user, as IPs only have numbers and dots. But when I do an EXPLAIN
ANALYZE on a query which has a where condition as follows:

   WHERE trader_id = 'johndoe'

It is not using this index at all! It is using no index in fact, it's
trying to do a sequential scan. Any ideas why this partial index is
not working??

TIA

Re: Partial index with regexp not working

От
Richard Huxton
Дата:
Phoenix Kiula wrote:
> CREATE INDEX idx_trades_tid_partial ON trades (trader_id)
>     WHERE trader_id ~ '[a-z]' ;

>    WHERE trader_id = 'johndoe'
>
> It is not using this index at all! It is using no index in fact, it's
> trying to do a sequential scan. Any ideas why this partial index is
> not working??

Why would it? There's no magic that lets PG know one arbitrary condition
correlates with another.

A partial index will only be considered if you test for its condition:

SELECT ... WHERE trader_id = 'johndoe' AND trader_id ~ '[a-z]'

For searching for specific IDs I wouldn't expect a partial index to be
much better than a full index - unless you have a *lot* of IP addresses
and hardly any names.

--
   Richard Huxton
   Archonet Ltd

Re: Partial index with regexp not working

От
Richard Broersma Jr
Дата:
--- Richard Huxton <dev@archonet.com> wrote:
> Phoenix Kiula wrote:
> > CREATE INDEX idx_trades_tid_partial ON trades (trader_id)
> >     WHERE trader_id ~ '[a-z]' ;
>
> >    WHERE trader_id = 'johndoe'
> >
> > It is not using this index at all! It is using no index in fact, it's
> > trying to do a sequential scan. Any ideas why this partial index is
> > not working??
> A partial index will only be considered if you test for its condition:
>
> SELECT ... WHERE trader_id = 'johndoe' AND trader_id ~ '[a-z]'

IIRC, for any index like this to work, doesn't the REGEXP need to be anchored to either the start
or end of the string?

Regards,
Richard Broersma Jr.

Re: Partial index with regexp not working

От
Richard Huxton
Дата:
Richard Broersma Jr wrote:
> --- Richard Huxton <dev@archonet.com> wrote:
>> Phoenix Kiula wrote:
>>> CREATE INDEX idx_trades_tid_partial ON trades (trader_id) WHERE
>>> trader_id ~ '[a-z]' ; WHERE trader_id = 'johndoe'
>>>
>>> It is not using this index at all! It is using no index in fact,
>>> it's trying to do a sequential scan. Any ideas why this partial
>>> index is not working??
>> A partial index will only be considered if you test for its
>> condition:
>>
>> SELECT ... WHERE trader_id = 'johndoe' AND trader_id ~ '[a-z]'
>
> IIRC, for any index like this to work, doesn't the REGEXP need to be
> anchored to either the start or end of the string?

That's true if you are trying to match an expression to the index. For
example:
   SELECT * FROM foo WHERE bar LIKE '%whatever%'
A btree index can't help here for the same reason you can't find someone
in a phone-book by their first name.
   SELECT * FROM foo WHERE bar LIKE 'whatever%'
This *can* use an index, but only if you are in "C" locale or have set
up text/varchar_pattern_ops appropriately. Then it gets converted into
 >= 'whatever' < 'whateves'.

Now in Phoenix's example the regexp is just being used to specify what
values the index covers. A more common example might be:
   CREATE INDEX inv_unpaid_idx ON invoices (client_id) WHERE NOT paid;
This indexes client_id but only for those invoices that haven't been
paid. Useful for a late-debtors report perhaps if you have lots of
invoices but 99% have been paid already.

The planner isn't smart enough to figure out which queries can use this
index by examining them, it just looks for (NOT paid) in the WHERE
clause and if it doesn't find it, ignores the index.

--
   Richard Huxton
   Archonet Ltd

Re: Partial index with regexp not working

От
Tom Lane
Дата:
Richard Huxton <dev@archonet.com> writes:
> The planner isn't smart enough to figure out which queries can use this
> index by examining them, it just looks for (NOT paid) in the WHERE
> clause and if it doesn't find it, ignores the index.

Well, it's a little bit brighter than that: it has some smarts about
btree-indexable comparisons and about null-testing.  For instance, it
can figure out that "x > 3" implies "x > 0", and so a query WHERE x > 3
could use a partial index WHERE x > 0.  Also, assuming that the >
operator is strict, it would recognize that WHERE x IS NOT NULL is
implied.

But there's certainly not anything in there that can make inferences
about regex matches.

            regards, tom lane

Re: Partial index with regexp not working

От
"Phoenix Kiula"
Дата:
On 11/09/2007, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Richard Huxton <dev@archonet.com> writes:
> > The planner isn't smart enough to figure out which queries can use this
> > index by examining them, it just looks for (NOT paid) in the WHERE
> > clause and if it doesn't find it, ignores the index.
>
> Well, it's a little bit brighter than that: it has some smarts about
> btree-indexable comparisons and about null-testing.  For instance, it
> can figure out that "x > 3" implies "x > 0", and so a query WHERE x > 3
> could use a partial index WHERE x > 0.  Also, assuming that the >
> operator is strict, it would recognize that WHERE x IS NOT NULL is
> implied.
>
> But there's certainly not anything in there that can make inferences
> about regex matches.



The suggestion in this thread that a regex index will come into play
only when the WHERE condition specifically mentions it was indeed the
key for me.

In my case, the ratio of alphanumeric values for a column to values
that are just plain IP addresses (numeric-plus-dots) is about 1:15, so
an index on the alphanumeric is a HUGE benefit.

This is a very smart difference from the world of the other major open
source database, so I'd say the planner is bright enough even when it
comes to regex.

Many thanks

Re: Partial index with regexp not working

От
"Phoenix Kiula"
Дата:
On 11/09/2007, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
>
> The suggestion in this thread that a regex index will come into play
> only when the WHERE condition specifically mentions it was indeed the
> key for me.



Ok, I've hit a snag about this index. I think it's to do with how my
regex is structured. Basically this column can have either IP
addresses, or alphanumeric user IDs. If it is not an IP address, it is
a registered user ID. What is the best way of ascertaining that a
column value is *not* an IP address?

I tried this:

select * from trader where trader_id !~ '[0-9]+\.[0-9]+\.[0-9]+\.';

And this works, but I wonder if a partial index on a negative
condition ("!~") will be slower than a positive condition?

Re: Partial index with regexp not working

От
Richard Huxton
Дата:
Phoenix Kiula wrote:
>
> Ok, I've hit a snag about this index. I think it's to do with how my
> regex is structured. Basically this column can have either IP
> addresses, or alphanumeric user IDs. If it is not an IP address, it is
> a registered user ID. What is the best way of ascertaining that a
> column value is *not* an IP address?
>
> I tried this:
>
> select * from trader where trader_id !~ '[0-9]+\.[0-9]+\.[0-9]+\.';
>
> And this works, but I wonder if a partial index on a negative
> condition ("!~") will be slower than a positive condition?

To be honest, I'd probably just have a separate column "uid_type", set
it when creating the user and then just have a partial index WHERE
uid_type='IP'

--
   Richard Huxton
   Archonet Ltd

Re: Partial index with regexp not working

От
Jorge Godoy
Дата:
On Wednesday 12 September 2007 09:34:55 Richard Huxton wrote:

> To be honest, I'd probably just have a separate column "uid_type", set
> it when creating the user and then just have a partial index WHERE
> uid_type='IP'

Or have a separate column with the user ID and have "(anonymous)" when you
have the IP address on the table (maybe even NULL, it all depends on what you
want).


--
Jorge Godoy      <jgodoy@gmail.com>