Обсуждение: ORDER BY ... LIMIT and JOIN

От:
Fizu
Дата:

Hello,

I'm trying to optimize the follow query which returns the top users
ordered by ranking. I'll show you my schema and "explain analyze" for
each case.

So, i'm asking two things:

1) Why "ranking" index is not used in the second query when sorting.
2) Am i missing some obvious optimization like a missing index? :)

Schemas:

# \d ranking
            Table "public.ranking"
  Column   |         Type          | Modifiers
-----------+-----------------------+-----------
 ranking   | bigint                |
 score     | double precision      |
 username  | character varying(20) | not null
 variation | bigint                |
Indexes:
    "ranking_tmp_pkey1" PRIMARY KEY, btree (username)
    "idxrank_6057" btree (ranking) CLUSTER


# \d user
                                  Table "public.user"
   Column   |         Type          |                     Modifiers
------------+-----------------------+---------------------------------------------------
 id         | integer               | not null default
nextval('user_id_seq'::regclass)
 username   | character varying(20) | not null
 about      | text                  |
 name       | character varying(50) |
 photo      | text                  |
 country_id | integer               |
Indexes:
    "user_pkey" PRIMARY KEY, btree (username)
    "country_ranking_user_idx" btree (country_id)


Explain:

#  explain analyze SELECT * FROM "ranking" INNER JOIN "user" ON
("ranking"."username" = "user"."username") WHERE "user"."country_id" =
1  ORDER BY "ranking"."ranking" ASC LIMIT 100;

 QUERY PLAN

----------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=13.03..13.04 rows=1 width=180) (actual
time=965.229..965.302 rows=100 loops=1)
   ->  Sort  (cost=13.03..13.04 rows=1 width=180) (actual
time=965.227..965.256 rows=100 loops=1)
         Sort Key: ranking.ranking
         Sort Method:  top-N heapsort  Memory: 56kB
         ->  Nested Loop  (cost=0.00..13.02 rows=1 width=180) (actual
time=0.049..900.847 rows=57309 loops=1)
               ->  Index Scan using country_ranking_user_idx on "user"
 (cost=0.00..6.49 rows=1 width=145) (actual time=0.023..57.633
rows=57309 loops=1)
                     Index Cond: (country_id = 1)
               ->  Index Scan using ranking_tmp_pkey1 on ranking
(cost=0.00..6.52 rows=1 width=35) (actual time=0.013..0.013 rows=1
loops=57309)
                     Index Cond: ((ranking.username)::text =
("user".username)::text)
 Total runtime: 965.412 ms
(10 rows)

# explain analyze SELECT * FROM "ranking" INNER JOIN "user" ON
("ranking"."username" = "user"."username") ORDER BY
"ranking"."ranking" ASC LIMIT 100;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..137.02 rows=100 width=180) (actual
time=0.056..1.973 rows=100 loops=1)
   ->  Nested Loop  (cost=0.00..3081316.65 rows=2248753 width=180)
(actual time=0.055..1.921 rows=100 loops=1)
         ->  Index Scan using idxrank_6057 on ranking
(cost=0.00..70735.73 rows=2248753 width=35) (actual time=0.021..0.076
rows=100 loops=1)
         ->  Index Scan using user_pkey on "user"  (cost=0.00..1.33
rows=1 width=145) (actual time=0.016..0.017 rows=1 loops=100)
               Index Cond: (("user".username)::text = (ranking.username)::text)
 Total runtime: 2.043 ms
(6 rows)


Thanks!
Fz

От:
Michael Andreen
Дата:

On Saturday 08 August 2009 08:02:47 Fizu wrote:
>                ->  Index Scan using country_ranking_user_idx on "user"
>  (cost=0.00..6.49 rows=1 width=145) (actual time=0.023..57.633
> rows=57309 loops=1)
>                      Index Cond: (country_id = 1)

The planner is expecting one user with country_id = 1, but instead there are
57309. Have you analyzed recently? Maybe increasing the statistics target will
help.

/Michael

От:
Fizu
Дата:

On Sat, Aug 8, 2009 at 2:09 PM, Michael Andreen<> wrote:
> The planner is expecting one user with country_id = 1, but instead there are
> 57309. Have you analyzed recently? Maybe increasing the statistics target will
> help.
>
> /Michael


Just after analyze user and ranking it still taking so long to order
by an indexed field.

# explain analyze SELECT * FROM "ranking" INNER JOIN "user" ON
("ranking"."username" = "user"."username") WHERE "user"."country_id" =
5 ORDER BY "ranking"."ranking" ASC LIMIT 100;

     QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=15340.13..15340.38 rows=100 width=178) (actual
time=4955.795..4955.865 rows=100 loops=1)
   ->  Sort  (cost=15340.13..15343.69 rows=1425 width=178) (actual
time=4955.794..4955.820 rows=100 loops=1)
         Sort Key: ranking.ranking
         Sort Method:  top-N heapsort  Memory: 56kB
         ->  Nested Loop  (cost=0.00..15285.67 rows=1425 width=178)
(actual time=20.951..4952.337 rows=1972 loops=1)
               ->  Index Scan using country_ranking_user_idx on "user"
 (cost=0.00..4807.25 rows=1710 width=143) (actual
time=20.923..4898.931 rows=1972 loops=1)
                     Index Cond: (country_id = 5)
               ->  Index Scan using ranking_tmp_pkey on ranking
(cost=0.00..6.12 rows=1 width=35) (actual time=0.024..0.025 rows=1
loops=1972)
                     Index Cond: ((ranking.username)::text =
("user".username)::text)
 Total runtime: 4955.974 ms
(10 rows)

# explain analyze SELECT * FROM "ranking" INNER JOIN "user" ON
("ranking"."username" = "user"."username") ORDER BY
"ranking"."ranking" ASC LIMIT 100;

QUERY PLAN

-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..136.78 rows=100 width=178) (actual
time=0.058..1.870 rows=100 loops=1)
   ->  Nested Loop  (cost=0.00..3116910.51 rows=2278849 width=178)
(actual time=0.056..1.818 rows=100 loops=1)
         ->  Index Scan using idxrank_6224 on ranking
(cost=0.00..71682.17 rows=2278849 width=35) (actual time=0.022..0.065
rows=100 loops=1)
         ->  Index Scan using user_pkey on "user"  (cost=0.00..1.32
rows=1 width=143) (actual time=0.015..0.016 rows=1 loops=100)
               Index Cond: (("user".username)::text = (ranking.username)::text)
 Total runtime: 1.946 ms
(6 rows)


Thank you!
M

От:
Michael Andreen
Дата:

On Sunday 09 August 2009 21:26:08 Fizu wrote:
>                ->  Index Scan using country_ranking_user_idx on "user"
>  (cost=0.00..4807.25 rows=1710 width=143) (actual
> time=20.923..4898.931 rows=1972 loops=1)
>                      Index Cond: (country_id = 5)

The statistics looks good now, but almost all the time is still spent on
fetching users with country_id = 5. The actual ordering is only a tiny part of
the full cost. Why it takes time probably depends on your hardware in relation
to database size. I guess the database doesn't fit in ram? What settings have
you changed?

Clustering users on country_ranking_user_idx would probably help for this
specific case, but if it is a good idea depends on what other queries need to
be fast. If the table or indexes are bloated then clustering on any index or
doing reindex might do it.

/Michael

От:
Robert Haas
Дата:

On Sun, Aug 9, 2009 at 3:26 PM, Fizu<> wrote:
>               ->  Index Scan using country_ranking_user_idx on "user"
>  (cost=0.00..4807.25 rows=1710 width=143) (actual
> time=20.923..4898.931 rows=1972 loops=1)
>                     Index Cond: (country_id = 5)

An index scan that picks up 1972 rows is taking 5 seconds?  I think
there must be something wrong with this index.  Is it possible that
since you apparently weren't analyzing this database, that maybe you
didn't vacuum it either?  If so, you should probably do a VACUUM FULL
on your database and then a database-wide REINDEX, but at a minimum
you should try reindexing this particular index.

...Robert