Обсуждение: query time

Поиск
Список
Период
Сортировка

query time

От
WireSpot
Дата:
I have a table with about 200.000 entries. Among other things, it
contains an integer field I use as a timestamp, and a variable
character field I use for user names. Certain queries are taking too
long IMO. I'm trying this on both 7.4 and 8.0.

If I do a direct comparison (using =) on the user name field and I
sort by the numeric field, I get about 5 ms. If I do a LIKE on the
user name and I don't sort at all, I get about 5 ms too. But if I use
both LIKE on the user name and sorting on the timestamp, the time
jumps to 2000 ms.

I have indexes on both fields, but I remember reading only one of them
will be used when doing a query.

Is such a serious jump in query times normal or am I doing something wrong?

I'm attaching the explain output for all 3 cases.

Вложения

Re: query time

От
Richard Huxton
Дата:
WireSpot wrote:
> I have a table with about 200.000 entries. Among other things, it
> contains an integer field I use as a timestamp, and a variable
> character field I use for user names. Certain queries are taking too
> long IMO. I'm trying this on both 7.4 and 8.0.
>
> If I do a direct comparison (using =) on the user name field and I
> sort by the numeric field, I get about 5 ms. If I do a LIKE on the
> user name and I don't sort at all, I get about 5 ms too. But if I use
>  both LIKE on the user name and sorting on the timestamp, the time
> jumps to 2000 ms.

> explain analyze select * from log_entries where user_name like
> '%john_doe%' limit 100 offset 0;

This will not (and never will be able to) use an index on user_name.
Think about it, you'd need an index that ordered use_name so that
(john_doe, AAAAAjohn_doe, Zjohn_doe1234) were all next to each other.

If you anchor the search (LIKE 'john_doe%') and are using the C locale
then an index can be used (because it's the same as >='john_doe' AND
<'john_dof').

If you really need to do indexed searches anywhere in a text-field
you'll need to look at tsearch2 in the contrib/ directory. That lets you
build a full-text index, but it's really meant for documents rather than
user names.
--
   Richard Huxton
   Archonet Ltd

Re: query time

От
WireSpot
Дата:
On Wed, 02 Feb 2005 14:48:41 +0000, Richard Huxton <dev@archonet.com> wrote:
> Think about it, you'd need an index that ordered use_name so that
> (john_doe, AAAAAjohn_doe, Zjohn_doe1234) were all next to each other.
>
> If you anchor the search (LIKE 'john_doe%') and are using the C locale
> then an index can be used (because it's the same as >='john_doe' AND
> <'john_dof').

Unfortunately, all my cases are LIKE '%john_doe'. So I'm guessing I'll
never get an index.

How about the response time when doing both LIKE user_name and ORDER
BY timestamp? Why does it get blown out of the water like that, from 5
and 5 to 2000 ms? If a LIKE by itself takes 5 ms and an ORDER by
itself takes 5 ms... Doesn't it grab the results matching the LIKE and
the ORDER only those?

While we're on the subject of indexes, is there any way I can speed up
a SELECT DISTINCT user_name FROM log_entries? With 200.000 entries I
get like 46 seconds on this one.

I apologise if these things are already in the manual, I'm only now
getting used to it and I don't find some things right away.

Re: query time

От
Richard Huxton
Дата:
WireSpot wrote:
> On Wed, 02 Feb 2005 14:48:41 +0000, Richard Huxton <dev@archonet.com> wrote:
>
>>Think about it, you'd need an index that ordered use_name so that
>>(john_doe, AAAAAjohn_doe, Zjohn_doe1234) were all next to each other.
>>
>>If you anchor the search (LIKE 'john_doe%') and are using the C locale
>>then an index can be used (because it's the same as >='john_doe' AND
>><'john_dof').
>
>
> Unfortunately, all my cases are LIKE '%john_doe'. So I'm guessing I'll
> never get an index.

Well, if you want names *ending* in john_doe you can create a function
reverse() and then create a functional index on it. If you want
something in the middle, tough.

> How about the response time when doing both LIKE user_name and ORDER
> BY timestamp? Why does it get blown out of the water like that, from 5
> and 5 to 2000 ms? If a LIKE by itself takes 5 ms and an ORDER by
> itself takes 5 ms... Doesn't it grab the results matching the LIKE and
> the ORDER only those?

In your first '%john_doe%' example you had a LIMIT 100 without a sort.
So - the sequential scan started and when PG found the 100th match it
stopped. If you order by timestamp, it has to find all the matches, sort
them by timestamp and then discard the 101st onwards.

Now, the reason it doesn't use the timestamp index instead is that it
thinks that your LIKE is going to be very specific:
   (cost=0.00..8250.17 rows=1 width=175)
In actual fact, there isn't one matching row there are 15,481:
   (actual time=0.432..1051.036 rows=15481 loops=1)
I'm not sure that there's any way for PG to know how many matches there
are going to be - it keeps track of values, but there's no easy way to
model substrings of a column.

Are you sure you need to match user_name against a double-wildcarded
value? If you do, all I could suggest is perhaps limiting the search to
one day/week/whatever at a time, which might make the timestamp index
seem appealing.

> While we're on the subject of indexes, is there any way I can speed up
> a SELECT DISTINCT user_name FROM log_entries? With 200.000 entries I
> get like 46 seconds on this one.

Not sure there's a simple way to avoid a seq-scan of the whole table. PG
has good concurrency support with MVCC, but the downside is that the
indexes don't store whether something is visible or not. That means you
can't just look at the index to determine what values are visible to
your current transaction.

 > I apologise if these things are already in the manual, I'm only now
 > getting used to it and I don't find some things right away.

No problem - all sensible questions. Worth checking the mailing-list
archives too though.

--
   Richard Huxton
   Archonet Ltd