Обсуждение: Slow SQL lookup due to every field being listed in SORT KEY

Поиск
Список
Период
Сортировка

Slow SQL lookup due to every field being listed in SORT KEY

От
Mason Harding
Дата:
Hi all.  I Have the following query (tested in postgres 8.4 and 9.0rc1)

SELECT distinct event0_.*
FROM event event0_ inner join account account1_ on event0_.account_id_owner=account1_.account_id
LEFT OUTER JOIN friend friendcoll2_ ON account1_.account_id=friendcoll2_.friend_account_id
WHERE (event0_.account_id_owner=2 or friendcoll2_.account_id=2
    AND friendcoll2_.status=2 AND (event0_.is_recomended is null OR event0_.is_recomended=false))
ORDER BY event0_.event_id DESC LIMIT 25

None of the tables listed here have more than a couple of thousand rows, and are all indexed.  If I run that query as is, it will take up to 5 seconds, if I remove the ORDER BY and LIMIT, it will run into about 200 ms.

Bellow is the output from SET enable_seqscan = off;EXPLAIN ANALYZE VERBOSE.  On Postgresql 9.0 this takes 2.3 seconds, on 8.4 it takes 4-5 seconds.  What I am noticing is that the Sort Key contains every row in event, not just event_id.  This seems to be causing the External Disk Merge.  This will use a memory merge if I have work_mem set to less than 30MB. If I set the SELECT to be SELECT distinct event0_.event_id, it will take about 19ms, but I need all rows returned.

Thanks all,
Mason

 Limit  (cost=32124.36..32125.55 rows=25 width=164) (actual time=2233.473..2301.552 rows=25 loops=1)
   Output: event0_.event_id, event0_.account_id_owner, event0_.event_name, event0_.account_id_remote, event0_.path_id, event0_.actual_user_workout_routine, event0_.user_workout_goal_id, event0_.cdate, event0
_.ctime, event0_.calories_burnt, event0_.distance_meters, event0_.duration_seconds, event0_.path_name, event0_.routine_name, event0_.dtype, event0_.item, event0_.is_deleted, event0_.is_recomended
   ->  Unique  (cost=32124.36..32128.26 rows=82 width=164) (actual time=2233.471..2301.544 rows=25 loops=1)
         Output: event0_.event_id, event0_.account_id_owner, event0_.event_name, event0_.account_id_remote, event0_.path_id, event0_.actual_user_workout_routine, event0_.user_workout_goal_id, event0_.cdate,
event0_.ctime, event0_.calories_burnt, event0_.distance_meters, event0_.duration_seconds, event0_.path_name, event0_.routine_name, event0_.dtype, event0_.item, event0_.is_deleted, event0_.is_recomended
         ->  Sort  (cost=32124.36..32124.57 rows=82 width=164) (actual time=2233.470..2299.043 rows=4435 loops=1)
               Output: event0_.event_id, event0_.account_id_owner, event0_.event_name, event0_.account_id_remote, event0_.path_id, event0_.actual_user_workout_routine, event0_.user_workout_goal_id, event0_.c
date, event0_.ctime, event0_.calories_burnt, event0_.distance_meters, event0_.duration_seconds, event0_.path_name, event0_.routine_name, event0_.dtype, event0_.item, event0_.is_deleted, event0_.is_recomended
               Sort Key: event0_.event_id, event0_.account_id_owner, event0_.event_name, event0_.account_id_remote, event0_.path_id, event0_.actual_user_workout_routine, event0_.user_workout_goal_id, event0_
.cdate, event0_.ctime, event0_.calories_burnt, event0_.distance_meters, event0_.duration_seconds, event0_.path_name, event0_.routine_name, event0_.dtype, event0_.item, event0_.is_deleted, event0_.is_recomend
ed
               Sort Method:  external merge  Disk: 6968kB
               ->  Merge Join  (cost=0.00..32121.75 rows=82 width=164) (actual time=0.105..197.393 rows=50895 loops=1)
                     Output: event0_.event_id, event0_.account_id_owner, event0_.event_name, event0_.account_id_remote, event0_.path_id, event0_.actual_user_workout_routine, event0_.user_workout_goal_id, eve
nt0_.cdate, event0_.ctime, event0_.calories_burnt, event0_.distance_meters, event0_.duration_seconds, event0_.path_name, event0_.routine_name, event0_.dtype, event0_.item, event0_.is_deleted, event0_.is_reco
mended
                     Merge Cond: (account1_.account_id = event0_.account_id_owner)
                     Join Filter: ((event0_.account_id_owner = 2) OR ((friendcoll2_.account_id = 2) AND (friendcoll2_.status = 2) AND ((event0_.is_recomended IS NULL) OR (NOT event0_.is_recomended))))
                     ->  Nested Loop Left Join  (cost=0.00..31843.58 rows=2155 width=10) (actual time=0.070..87.681 rows=3859 loops=1)
                           Output: account1_.account_id, friendcoll2_.account_id, friendcoll2_.status
                           ->  Index Scan using "AccountIDPKIndex" on public.account account1_  (cost=0.00..209.05 rows=1890 width=4) (actual time=0.025..0.981 rows=1890 loops=1)
                                 Output: account1_.account_id, account1_.user_name, account1_.password, account1_.account_type, account1_.is_active, account1_.is_quick_reg, account1_.name_last, account1_.nam
e_first, account1_.primary_image_url, account1_.ctime, account1_.cdate, account1_.email_address, account1_.address_street_1, account1_.address_street_2, account1_.address_city, account1_.address_state, accou
nt1_.address_zip_code_1, account1_.address_zip_code_2, account1_.date_of_birth, account1_.phone_home, account1_.phone_mobile, account1_.phone_buisness, account1_.phone_buisness_ext, account1_.lon, account1_.
lat, account1_.dtype, account1_.last_login_date, account1_.about_user_blurb, account1_.middle_initial, account1_.gender, account1_.address_country, account1_.weight_lbs, account1_.network_size, account1_.pri
mary_image_url_thumb, account1_.primary_image_url_small_thumb, account1_.is_activity_partner_listed, account1_.relationship_status, account1_.sec_profile_view, account1_.stats_build, account1_.stats_height_i
nches, account1_.stats_activity_level, account1_.list_profile_age, account1_.opt_in_third_party, account1_.opt_in_exclusive_offers, account1_.opt_in_new_features, account1_.lat_lon_is_current, account1_.woc_
no_of_entries, account1_.woc_weight_in_formula, account1_.woc_value_cardio, account1_.woc_value_strength, account1_.woc_value_body_sculpting, account1_.woc_value_body_flexibility, account1_.woc_value_weight_
management, account1_.woc_value_mental_vitality, account1_.woc_value_heart_health, account1_.woc_value_general_fitness, account1_.is_group, account1_.is_fivi_pro, account1_.is_group_open_invite, account1_.mi
ssion_statement, account1_.twitter_account_name, account1_.primary_photo_media_id, account1_.is_pro_listed, account1_.registered_on, account1_.is_password_autogenerated, account1_.is_set_password_hidden, acc
ount1_.is_notified_on_email_receipt, account1_.is_notified_on_friend_request
                           ->  Index Scan using "friendIdx1" on public.friend friendcoll2_  (cost=0.00..16.59 rows=12 width=10) (actual time=0.042..0.045 rows=1 loops=1890)
                                 Output: friendcoll2_.account_id, friendcoll2_.friend_account_id, friendcoll2_.cdate, friendcoll2_.ctime, friendcoll2_.status, friendcoll2_.friend_id
                                 Index Cond: (account1_.account_id = friendcoll2_.friend_account_id)
                     ->  Materialize  (cost=0.00..207.88 rows=2803 width=164) (actual time=0.024..26.091 rows=241058 loops=1)
                           Output: event0_.event_id, event0_.account_id_owner, event0_.event_name, event0_.account_id_remote, event0_.path_id, event0_.actual_user_workout_routine, event0_.user_workout_goal_i
d, event0_.cdate, event0_.ctime, event0_.calories_burnt, event0_.distance_meters, event0_.duration_seconds, event0_.path_name, event0_.routine_name, event0_.dtype, event0_.item, event0_.is_deleted, event0_.i
s_recomended
                           ->  Index Scan using "eventIdxTstdAccountIdOwner" on public.event event0_  (cost=0.00..200.88 rows=2803 width=164) (actual time=0.020..1.239 rows=2803 loops=1)
                                 Output: event0_.event_id, event0_.account_id_owner, event0_.event_name, event0_.account_id_remote, event0_.path_id, event0_.actual_user_workout_routine, event0_.user_workout_
goal_id, event0_.cdate, event0_.ctime, event0_.calories_burnt, event0_.distance_meters, event0_.duration_seconds, event0_.path_name, event0_.routine_name, event0_.dtype, event0_.item, event0_.is_deleted, eve
nt0_.is_recomended
 Total runtime: 2303.210 ms

Re: Slow SQL lookup due to every field being listed in SORT KEY

От
Stephen Frost
Дата:
* Mason Harding (mason.harding@gmail.com) wrote:
> Hi all.  I Have the following query (tested in postgres 8.4 and 9.0rc1)

Can you provide \d output from all the tables involved..?

Also, what does the query plan look like w/o 'enable_seqscan=off' (which
is not a good setting to use...)?  Increasing work_mem is often a good
idea if your system can afford it based on the number/kind of queries
running concurrently.  Note that you can also increase that setting for
just a single role, single session, or even single query.

    Thanks,

        Stephen

Вложения

Re: Slow SQL lookup due to every field being listed in SORT KEY

От
Tom Lane
Дата:
Mason Harding <mason.harding@gmail.com> writes:
> Hi all.  I Have the following query (tested in postgres 8.4 and 9.0rc1)

> SELECT distinct event0_.*
> FROM event event0_ inner join account account1_ on
> event0_.account_id_owner=account1_.account_id
> LEFT OUTER JOIN friend friendcoll2_ ON
> account1_.account_id=friendcoll2_.friend_account_id
> WHERE (event0_.account_id_owner=2 or friendcoll2_.account_id=2
>     AND friendcoll2_.status=2 AND (event0_.is_recomended is null OR
> event0_.is_recomended=false))
> ORDER BY event0_.event_id DESC LIMIT 25

> None of the tables listed here have more than a couple of thousand rows, and
> are all indexed.  If I run that query as is, it will take up to 5 seconds,
> if I remove the ORDER BY and LIMIT, it will run into about 200 ms.

The reason it's sorting by all the columns is the DISTINCT: that's
implemented by a sort-and-unique type of scheme so it has to be sure
that all the columns are sorted.  You didn't show the non-ORDER-BY
plan, but I suspect it's preferring a hash aggregation approach to
doing the DISTINCT if it doesn't have to produce sorted output.

The easiest way to make that query faster would be to raise work_mem
enough so that the sort doesn't have to spill to disk.

            regards, tom lane

Re: Slow SQL lookup due to every field being listed in SORT KEY

От
Stephen Frost
Дата:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> The reason it's sorting by all the columns is the DISTINCT

You might also verify that you actually need/*should* have the DISTINCT,
if it's included today..  Often developers put that in without
understanding why they're getting dups (which can often be due to
missing pieces from the JOIN clause or misunderstanding of the database
schema...).

    Stephen

Вложения

Re: Slow SQL lookup due to every field being listed in SORT KEY

От
Mason Harding
Дата:
Thanks all for your help.  I didn't really understand why it was sorting on every field, but it now makes sense.  What I ended up doing was replacing the

SELECT DISTINCT * FROM .... JOIN ... WHERE ... ORDER BY... LIMIT ...
with

SELECT * FROM ... WHERE id in (SELECT DISTINCT id FROM .... JOIN ... WHERE ... ) ORDER BY... LIMIT ...
This reduced the lookup time down to 19 ms, which is much faster than just upping the work_mem, as that still took 800ms

Thanks all,
Mason

On Fri, Sep 10, 2010 at 7:03 PM, Stephen Frost <sfrost@snowman.net> wrote:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> The reason it's sorting by all the columns is the DISTINCT

You might also verify that you actually need/*should* have the DISTINCT,
if it's included today..  Often developers put that in without
understanding why they're getting dups (which can often be due to
missing pieces from the JOIN clause or misunderstanding of the database
schema...).

       Stephen

-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.9 (GNU/Linux)

iEYEARECAAYFAkyK43kACgkQrzgMPqB3kihX4ACfVboO4jRzFO3hkckdHfrSeAgF
sysAnjmeoV7BA7uClEY8gXT4nEYhSx0u
=y556
-----END PGP SIGNATURE-----