Re: Hardware upgrade for a high-traffic database

Поиск
Список
Период
Сортировка
От Jason Coene
Тема Re: Hardware upgrade for a high-traffic database
Дата
Msg-id 200408120029.i7C0T30E002389@mail.gotfrag.com
обсуждение исходный текст
Ответ на Re: Hardware upgrade for a high-traffic database  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Hardware upgrade for a high-traffic database  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Hardware upgrade for a high-traffic database  (Pierre-Frédéric Caillaud<lists@boutiquenumerique.com>)
Список pgsql-performance
> You mean you are doing
>     SELECT ... WHERE userid = 42 ORDER BY timestamp DESC LIMIT 5;
> and hoping that separate indexes on userid and timestamp will get the
> job done?  They won't.  There are only two possible plans for this,
> neither very good: select all of user 42's posts and sort them, or
> scan timewise backwards through *all* posts looking for the last 5 from
> user 42.

Wow!  I did try the method you state below (including the WHERE restricted
column in the sort by, and creating a two-column index), and it did execute
much faster (even on odd userid's to avoid cached results as much as
possible).

We have a lot of:

SELECT whatever
    FROM ourtable
    WHERE field1 = X
    AND field2 = Y
    AND field3 = Z
    ORDER BY id DESC
    LIMIT 5

With indexes:

ourtable(id)
ourtable(field1, field2, field3)

Is it standard procedure with postgres to include any fields listed in WHERE
in the ORDER BY, and create a single index for only the ORDER BY fields (in
order of appearance, of course)?

>
> If you do this enough to justify a specialized index, I would suggest a
> two-column index on (userid, timestamp).  You will also need to tweak
> the query, because the planner is not quite smart enough to deduce that
> such an index is applicable to the given sort order:
>     SELECT ... WHERE userid = 42 ORDER BY userid DESC, timestamp DESC
> LIMIT 5;
> This should generate an index-scan-backwards plan that will execute nigh
> instantaneously, because it will only fetch the rows you really want.
>
> You might or might not be able to drop the separate indexes on userid
> and timestamp, depending on what other queries you might have that need
> them.  But you should be paying attention to what plans you are really
> getting (see EXPLAIN) rather than just assuming that some indexes chosen
> at random will do what you need.
>
>             regards, tom lane
>

We do many varied queries on nearly every table - our data is highly
relational, and we have a lot of indexes.  I thought the planner would pick
up the right index via constraints and not require them in ORDER BY...
EXPLAIN ANALYZE says that the indexes are being used, ala:

gf=# EXPLAIN ANALYZE SELECT id FROM comments WHERE userid = 51 ORDER BY
timestamp DESC LIMIT 5;
                                                                  QUERY PLAN
----------------------------------------------------------------------------
-------------------------------------------------------------------
 Limit  (cost=1608.43..1608.45 rows=5 width=8) (actual time=0.292..0.317
rows=5 loops=1)
   ->  Sort  (cost=1608.43..1609.45 rows=407 width=8) (actual
time=0.287..0.295 rows=5 loops=1)
         Sort Key: "timestamp"
         ->  Index Scan using comments_ix_userid on comments
(cost=0.00..1590.79 rows=407 width=8) (actual time=0.031..0.190 rows=35
loops=1)
               Index Cond: (userid = 51)
 Total runtime: 0.375 ms
(6 rows)

Is this the wrong procedure?  Your suggested syntax seems much more
efficient, but I don't quite understand exactly why, as PG is using our
existing indexes...

gf=# EXPLAIN ANALYZE SELECT id FROM comments WHERE userid = 51 ORDER BY
userid DESC, timestamp DESC LIMIT 5;

QUERY PLAN
----------------------------------------------------------------------------
----------------------------------------------------------------------------
----
 Limit  (cost=0.00..19.90 rows=5 width=12) (actual time=0.040..0.076 rows=5
loops=1)
   ->  Index Scan Backward using comments_ix_userid_timestamp on comments
(cost=0.00..1620.25 rows=407 width=12) (actual time=0.035..0.054 rows=5
loops=1)
         Index Cond: (userid = 51)
 Total runtime: 0.134 ms
(4 rows)

Note: This was done after adding an index on comments (userid, timestamp)

Regards,

Jason


В списке pgsql-performance по дате отправления:

Предыдущее
От: Josh Berkus
Дата:
Сообщение: Re: Buld Insert and Index use.
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Hardware upgrade for a high-traffic database