Обсуждение: Huge table searching optimization
Hi, I have table with just on column named url (it's not real url,
just random string for testing purposes), type text. I have lots of
entries in it (it's dynamic, i add and remove them on the fly), 100
000 and more. I've created index on this table to optimize
"searching". I just want to test if some "url" is in in the table, so
i am using this request:
select url from test2 where url ~* '^URLVALUE\\s*$';
there's \\s* because of padding. Here is the analyze:
postgres=# explain analyze select url from test2 where url ~* '^zyxel\\s*$';
WARNING: nonstandard use of \\ in a string literal
LINE 1: ...plain analyze select url from test2 where url ~* '^zyxel\\s...
^
HINT: Use the escape string syntax for backslashes, e.g., E'\\'.
QUERY PLAN
-------------------------------------------------------------------------------------------------------
Seq Scan on test2 (cost=0.00..1726.00 rows=10 width=9) (actual
time=156.489..156.502 rows=1 loops=1)
Filter: (url ~* '^zyxel\\s*$'::text)
Total runtime: 156.538 ms
(3 rows)
It takes 156 ms, it's too much for my purposes, so i want to decrease
it. So what can I use for optimizing this request? Again, I just want
to test if "url" ("zyxel" in this examlpe) is in the table.
Some info:
version(): PostgreSQL 8.4.2 on i486-slackware-linux-gnu, compiled by
GCC gcc (GCC) 4.3.3, 32-bit
Ram: 500 MB
CPU: 2.6 Ghz (it's kvm virtualized, i don't know exact type, it's one core cpu)
Thank you.
On Mon, Apr 05, 2010 at 04:28:35PM +0200, Oliver Kindernay wrote:
> Hi, I have table with just on column named url (it's not real url,
> just random string for testing purposes), type text. I have lots of
> entries in it (it's dynamic, i add and remove them on the fly), 100
> 000 and more. I've created index on this table to optimize
> "searching". I just want to test if some "url" is in in the table, so
> i am using this request:
>
> select url from test2 where url ~* '^URLVALUE\\s*$';
>
> there's \\s* because of padding. Here is the analyze:
>
> postgres=# explain analyze select url from test2 where url ~* '^zyxel\\s*$';
> WARNING: nonstandard use of \\ in a string literal
> LINE 1: ...plain analyze select url from test2 where url ~* '^zyxel\\s...
> ^
> HINT: Use the escape string syntax for backslashes, e.g., E'\\'.
> QUERY PLAN
> -------------------------------------------------------------------------------------------------------
> Seq Scan on test2 (cost=0.00..1726.00 rows=10 width=9) (actual
> time=156.489..156.502 rows=1 loops=1)
> Filter: (url ~* '^zyxel\\s*$'::text)
> Total runtime: 156.538 ms
> (3 rows)
> It takes 156 ms, it's too much for my purposes, so i want to decrease
> it. So what can I use for optimizing this request? Again, I just want
> to test if "url" ("zyxel" in this examlpe) is in the table.
add trigger to remove spaces from end of string on insert and update,
and then use normal = operator.
Best regards,
depesz
--
Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/
jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
Hi,
On Monday 05 April 2010 16:28:35 Oliver Kindernay wrote:
> Hi, I have table with just on column named url (it's not real url,
> just random string for testing purposes), type text. I have lots of
> entries in it (it's dynamic, i add and remove them on the fly), 100
> 000 and more. I've created index on this table to optimize
> "searching". I just want to test if some "url" is in in the table, so
> i am using this request:
>
> select url from test2 where url ~* '^URLVALUE\\s*$';
>
> there's \\s* because of padding. Here is the analyze:
>
> postgres=# explain analyze select url from test2 where url ~*
> '^zyxel\\s*$'; WARNING: nonstandard use of \\ in a string literal
> LINE 1: ...plain analyze select url from test2 where url ~* '^zyxel\\s...
> ^
> HINT: Use the escape string syntax for backslashes, e.g., E'\\'.
> QUERY PLAN
> ---------------------------------------------------------------------------
> ---------------------------- Seq Scan on test2 (cost=0.00..1726.00 rows=10
> width=9) (actual
> time=156.489..156.502 rows=1 loops=1)
> Filter: (url ~* '^zyxel\\s*$'::text)
> Total runtime: 156.538 ms
> (3 rows)
>
> It takes 156 ms, it's too much for my purposes, so i want to decrease
> it. So what can I use for optimizing this request? Again, I just want
> to test if "url" ("zyxel" in this examlpe) is in the table.
>
Depending on your locale it might be sensible to create a text_pattern_ops
index - see the following link:
http://www.postgresql.org/docs/current/static/indexes-opclass.html
Like suggested by depesz it would be far better to remove the padding and do
exact lookups though.
Andres
Andres Freund <andres@anarazel.de> writes: > On Monday 05 April 2010 16:28:35 Oliver Kindernay wrote: >> i am using this request: >> select url from test2 where url ~* '^URLVALUE\\s*$'; > Depending on your locale it might be sensible to create a text_pattern_ops > index - see the following link: > http://www.postgresql.org/docs/current/static/indexes-opclass.html text_pattern_ops won't help for a case-insensitive search. The best bet here would be to index on a case-folded, blank-removed version of the url, viz create index ... on (normalize(url)) select ... where normalize(url) = normalize('URLVALUE') where normalize() is a suitably defined function. Or if it's okay to only store the normalized form of the string, you could simplify that a bit. regards, tom lane
Thanks to all, now it is 0.061 ms :) 2010/4/5 Tom Lane <tgl@sss.pgh.pa.us>: > Andres Freund <andres@anarazel.de> writes: >> On Monday 05 April 2010 16:28:35 Oliver Kindernay wrote: >>> i am using this request: >>> select url from test2 where url ~* '^URLVALUE\\s*$'; > >> Depending on your locale it might be sensible to create a text_pattern_ops >> index - see the following link: >> http://www.postgresql.org/docs/current/static/indexes-opclass.html > > text_pattern_ops won't help for a case-insensitive search. The best bet > here would be to index on a case-folded, blank-removed version of the > url, viz > > create index ... on (normalize(url)) > > select ... where normalize(url) = normalize('URLVALUE') > > where normalize() is a suitably defined function. > > Or if it's okay to only store the normalized form of the string, > you could simplify that a bit. > > regards, tom lane >