Discussion: Huge table searching optimization

From:
Oliver Kindernay
Date:

Hi, I have a table with just one column named url (it's not a real URL,
just a random string for testing purposes), of type text. I have lots of
entries in it (it's dynamic; I add and remove them on the fly), 100,000
and more. I've created an index on this table to optimize
"searching". I just want to test whether some "url" is in the table, so
I am using this query:

select url from test2 where url ~* '^URLVALUE\\s*$';

The \\s* is there because of padding. Here is the EXPLAIN ANALYZE output:

postgres=# explain analyze select url from test2  where url ~* '^zyxel\\s*$';
WARNING:  nonstandard use of \\ in a string literal
LINE 1: ...plain analyze select url from test2  where url ~* '^zyxel\\s...
                                                             ^
HINT:  Use the escape string syntax for backslashes, e.g., E'\\'.
                                              QUERY PLAN
-------------------------------------------------------------------------------------------------------
 Seq Scan on test2  (cost=0.00..1726.00 rows=10 width=9) (actual time=156.489..156.502 rows=1 loops=1)
   Filter: (url ~* '^zyxel\\s*$'::text)
 Total runtime: 156.538 ms
(3 rows)

It takes 156 ms, which is too much for my purposes, so I want to reduce
it. What can I use to optimize this query? Again, I just want to test
whether "url" ("zyxel" in this example) is in the table.

Some info:

version():  PostgreSQL 8.4.2 on i486-slackware-linux-gnu, compiled by
GCC gcc (GCC) 4.3.3, 32-bit
RAM: 500 MB
CPU: 2.6 GHz (KVM-virtualized, single core; I don't know the exact type)

Thank you.
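The WARNING in the EXPLAIN ANALYZE output is about string syntax, not the search itself: PostgreSQL 8.4 defaults to standard_conforming_strings = off, so backslashes in an ordinary '...' literal are treated as escapes and escape_string_warning flags them. A minimal sketch of two warning-free spellings of the same pattern, run against a temporary stand-in for test2 (the sample rows are made up):

```sql
-- Temporary stand-in for the test2 table; the rows are illustrative only.
CREATE TEMP TABLE test2 (url text);
INSERT INTO test2 VALUES ('zyxel   '), ('other');

-- Option 1: escape-string syntax, as the HINT suggests. Inside E'...'
-- the sequence \\ always means one backslash, whatever the settings are.
SELECT url FROM test2 WHERE url ~* E'^zyxel\\s*$';

-- Option 2: with standard_conforming_strings on (the default since 9.1),
-- ordinary literals keep backslashes as-is, so a single one is enough.
SET standard_conforming_strings = on;
SELECT url FROM test2 WHERE url ~* '^zyxel\s*$';
```

Both forms hand the same regex, ^zyxel\s*$, to ~*. Neither should change the plan: a case-insensitive pattern whose prefix is all letters gives the planner no fixed prefix to use with a plain btree index.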

From:
hubert depesz lubaczewski
Date:

On Mon, Apr 05, 2010 at 04:28:35PM +0200, Oliver Kindernay wrote:
> i am using this request:
>
> select url from test2 where url ~* '^URLVALUE\\s*$';

Add a trigger that removes spaces from the end of the string on insert and
update, and then use the normal = operator.
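A minimal sketch of this suggestion in PL/pgSQL, on a temporary stand-in for test2. The function, trigger and index names are hypothetical, and it assumes trailing spaces are the only padding (rtrim(text) strips spaces only; a second argument such as E' \t' strips other characters too). It also drops the case-insensitivity of the original ~* query; Tom Lane's reply further down covers folding case as well.

```sql
-- Temporary stand-in for the test2 table.
CREATE TEMP TABLE test2 (url text);

-- Hypothetical trigger function: strip trailing spaces before the row is stored.
-- (On 8.4, run CREATE LANGUAGE plpgsql first if it is not installed.)
CREATE OR REPLACE FUNCTION test2_strip_padding() RETURNS trigger AS $$
BEGIN
    NEW.url := rtrim(NEW.url);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

-- Fire on every INSERT and UPDATE. EXECUTE PROCEDURE works on 8.4;
-- PostgreSQL 11 and later also accept EXECUTE FUNCTION.
CREATE TRIGGER test2_strip_padding_trg
    BEFORE INSERT OR UPDATE ON test2
    FOR EACH ROW EXECUTE PROCEDURE test2_strip_padding();

-- With the padding gone, a plain btree index serves exact lookups.
CREATE INDEX test2_url_idx ON test2 (url);

INSERT INTO test2 VALUES ('zyxel   '), ('other');

-- Existence test with the ordinary = operator.
SELECT EXISTS (SELECT 1 FROM test2 WHERE url = 'zyxel');
```

Because the cleanup happens once per write, each lookup becomes a single index probe instead of a regex evaluated against every row.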

Best regards,

depesz


From:
Andres Freund
Date:

Hi,

On Monday 05 April 2010 16:28:35 Oliver Kindernay wrote:
> i am using this request:
>
> select url from test2 where url ~* '^URLVALUE\\s*$';
>
> there's \\s* because of padding. Here is the analyze:
>
> postgres=# explain analyze select url from test2  where url ~*
> '^zyxel\\s*$'; WARNING:  nonstandard use of \\ in a string literal
> LINE 1: ...plain analyze select url from test2  where url ~* '^zyxel\\s...
>                                                              ^
> HINT:  Use the escape string syntax for backslashes, e.g., E'\\'.
>                                               QUERY PLAN
> ---------------------------------------------------------------------------
> ---------------------------- Seq Scan on test2  (cost=0.00..1726.00 rows=10
> width=9) (actual
> time=156.489..156.502 rows=1 loops=1)
>    Filter: (url ~* '^zyxel\\s*$'::text)
>  Total runtime: 156.538 ms
> (3 rows)
>
> It takes 156 ms, it's too much for my purposes, so i want to decrease
> it. So what can I use for optimizing this request? Again, I just want
> to test if "url" ("zyxel" in this examlpe) is in the table.
>
Depending on your locale, it might be sensible to create a text_pattern_ops
index; see the following link:
http://www.postgresql.org/docs/current/static/indexes-opclass.html

As depesz suggested, though, it would be far better to remove the padding and
do exact lookups.
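A sketch of the index type linked above, on a temporary stand-in for test2 (the index name is hypothetical). text_pattern_ops compares strings character by character instead of by locale collation, so in a non-C locale the planner can turn a left-anchored, case-sensitive pattern into an index range scan. It does not help the case-insensitive ~* query, as Tom Lane points out in the next message.

```sql
-- Temporary stand-in for the test2 table; the rows are illustrative only.
CREATE TEMP TABLE test2 (url text);
INSERT INTO test2 VALUES ('zyxel   '), ('other');

-- B-tree index using the pattern operator class.
CREATE INDEX test2_url_pattern_idx ON test2 (url text_pattern_ops);
ANALYZE test2;

-- Case-sensitive, anchored regex: the fixed prefix 'zyxel' is index-usable.
SELECT url FROM test2 WHERE url ~ E'^zyxel\\s*$';

-- The table is tiny, so the planner would normally pick a seq scan anyway;
-- disabling seq scans here just makes the index usage visible in the plan.
SET enable_seqscan = off;
EXPLAIN SELECT url FROM test2 WHERE url ~ E'^zyxel\\s*$';
RESET enable_seqscan;
```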

Andres

From:
Tom Lane
Date:

Andres Freund writes:
> On Monday 05 April 2010 16:28:35 Oliver Kindernay wrote:
>> i am using this request:
>> select url from test2 where url ~* '^URLVALUE\\s*$';

> Depending on your locale it might be sensible to create a text_pattern_ops
> index - see the following link:
> http://www.postgresql.org/docs/current/static/indexes-opclass.html

text_pattern_ops won't help for a case-insensitive search.  The best bet
here would be to index on a case-folded, blank-removed version of the
url, viz

    create index ... on (normalize(url))

    select ... where normalize(url) = normalize('URLVALUE')

where normalize() is a suitably defined function.
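One way to fill in this sketch, on a temporary stand-in for test2. It assumes "normalized" means lower-cased with trailing whitespace (space, tab, CR, LF) removed, to mirror ~* and \s* in the original query. normalize_url() is a hypothetical name used in place of the placeholder normalize(), since PostgreSQL 13 and later ship a built-in normalize() for Unicode normalization. The function must be IMMUTABLE to appear in an index expression.

```sql
-- Hypothetical normalization: fold case, then strip trailing whitespace.
-- IMMUTABLE is required for use in an index expression.
CREATE OR REPLACE FUNCTION normalize_url(u text) RETURNS text AS $$
    SELECT rtrim(lower(u), E' \t\r\n');
$$ LANGUAGE sql IMMUTABLE;

-- Temporary stand-in for test2, still holding padded, mixed-case values.
CREATE TEMP TABLE test2 (url text);
INSERT INTO test2 VALUES ('ZyXEL   '), ('other');

-- Expression index on the normalized form.
CREATE INDEX test2_url_norm_idx ON test2 (normalize_url(url));
ANALYZE test2;

-- The WHERE clause must use the same expression as the index.
SELECT EXISTS (
    SELECT 1 FROM test2 WHERE normalize_url(url) = normalize_url('zyxel')
);
```

The stored values keep their original spelling; only the index holds the folded form, so the search term has to be passed through the same function.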

Or if it's okay to only store the normalized form of the string,
you could simplify that a bit.
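A sketch of that simplification, under the same kind of assumptions: values are normalized once on the way in, the column is indexed directly, and lookups are plain equality. normalize_url() (fold case, strip trailing whitespace) is a hypothetical helper, and the CHECK constraint that rejects un-normalized values is an added safeguard, not part of the suggestion above.

```sql
-- Hypothetical normalization: fold case, then strip trailing whitespace.
CREATE OR REPLACE FUNCTION normalize_url(u text) RETURNS text AS $$
    SELECT rtrim(lower(u), E' \t\r\n');
$$ LANGUAGE sql IMMUTABLE;

-- Temporary stand-in for test2 that stores only normalized values.
CREATE TEMP TABLE test2 (
    url text CHECK (url = normalize_url(url))
);

-- Ordinary btree index on the column itself.
CREATE INDEX test2_url_idx ON test2 (url);

-- Normalize on the way in...
INSERT INTO test2 VALUES (normalize_url('ZyXEL   ')), (normalize_url('other'));

-- ...and normalize the search term; the lookup is a plain equality on url.
SELECT EXISTS (SELECT 1 FROM test2 WHERE url = normalize_url('zyxel'));
```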

            regards, tom lane

From:
Oliver Kindernay
Date:

Thanks to all; now it takes 0.061 ms :)

2010/4/5 Tom Lane:
> text_pattern_ops won't help for a case-insensitive search.  The best bet
> here would be to index on a case-folded, blank-removed version of the
> url, viz
>
>        create index ... on (normalize(url))
>
>        select ... where normalize(url) = normalize('URLVALUE')
>
> where normalize() is a suitably defined function.