Обсуждение: weird query plan

Поиск
Список
Период
Сортировка

weird query plan

От
weiping
Дата:
I have a table:
webdigest=# \d wd_urlusermaps
表 "public.wd_urlusermaps"
字段名 | 类型 | 修饰词
---------+-----------------------------+-------------------------------------------------------------
id | integer | not null default nextval('wd_urlusermaps_id_seq'::regclass)
urlid | integer | not null
tag | character varying(512) |
title | character varying(512) |
summary | character varying(1024) |
comment | character varying(1024) |
ctime | timestamp without time zone |
mtime | timestamp without time zone |
share | smallint |
userid | integer |
import | smallint | default 0
索引:
"wd_urlusermaps_pkey" PRIMARY KEY, btree (id) CLUSTER
"urlusermaps_urlid_userid" UNIQUE, btree (urlid, userid)
"urlusermaps_urlid" btree (urlid)
"urlusermaps_userid" btree (userid)
"wd_urlusermaps_ctime_idx" btree (ctime)
"wd_urlusermaps_share_idx" btree ("share")

and target statistic set to 1000, and two different query plan:

webdigest=# explain analyze select A.id as
fav_id,A.urlid,A.tag,A.title,A.summary,A.comment,A.ctime,A.share from
wd_urlusermaps A where share =1 and A.userid='219177' ORDER BY A.id DESC
limit 20 ;
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..4932.56 rows=20 width=96) (actual
time=730.461..2374.435 rows=20 loops=1)
-> Index Scan Backward using wd_urlusermaps_pkey on wd_urlusermaps a
(cost=0.00..269810.77 rows=1094 width=96) (actual time=730.456..2374.367
rows=20 loops=1)
Filter: (("share" = 1) AND (userid = 219177))
Total runtime: 2374.513 ms
(4 rows)

webdigest=# explain analyze select A.id as
fav_id,A.urlid,A.tag,A.title,A.summary,A.comment,A.ctime,A.share from
wd_urlusermaps A where share =1 and A.userid='219177' ORDER BY A.id DESC
limit 40 ;
QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=6805.77..6805.87 rows=40 width=96) (actual time=5.731..5.905
rows=40 loops=1)
-> Sort (cost=6805.77..6808.50 rows=1094 width=96) (actual
time=5.726..5.785 rows=40 loops=1)
Sort Key: id
-> Index Scan using urlusermaps_userid on wd_urlusermaps a
(cost=0.00..6750.55 rows=1094 width=96) (actual time=0.544..5.616
rows=41 loops=1)
Index Cond: (userid = 219177)
Filter: ("share" = 1)
Total runtime: 6.013 ms
(7 rows)

the userid=219177 got 2000+ record and around 40 shared=1, why above 2 query
shows so much difference?

any hint would be greatly appreciated.

-laser


Re: weird query plan

От
weiping
Дата:
sorry, forgot to mention our version, it's postgresql 8.2.3

-laser
> I have a table:
> webdigest=# \d wd_urlusermaps
> 表 "public.wd_urlusermaps"
> 字段名 | 类型 | 修饰词
> ---------+-----------------------------+-------------------------------------------------------------
> id | integer | not null default nextval('wd_urlusermaps_id_seq'::regclass)
> urlid | integer | not null
> tag | character varying(512) |
> title | character varying(512) |
> summary | character varying(1024) |
> comment | character varying(1024) |
> ctime | timestamp without time zone |
> mtime | timestamp without time zone |
> share | smallint |
> userid | integer |
> import | smallint | default 0
> 索引:
> "wd_urlusermaps_pkey" PRIMARY KEY, btree (id) CLUSTER
> "urlusermaps_urlid_userid" UNIQUE, btree (urlid, userid)
> "urlusermaps_urlid" btree (urlid)
> "urlusermaps_userid" btree (userid)
> "wd_urlusermaps_ctime_idx" btree (ctime)
> "wd_urlusermaps_share_idx" btree ("share")
>
> and target statistic set to 1000, and two different query plan:
>
> webdigest=# explain analyze select A.id as
> fav_id,A.urlid,A.tag,A.title,A.summary,A.comment,A.ctime,A.share from
> wd_urlusermaps A where share =1 and A.userid='219177' ORDER BY A.id DESC
> limit 20 ;
> QUERY PLAN
>
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=0.00..4932.56 rows=20 width=96) (actual
> time=730.461..2374.435 rows=20 loops=1)
> -> Index Scan Backward using wd_urlusermaps_pkey on wd_urlusermaps a
> (cost=0.00..269810.77 rows=1094 width=96) (actual time=730.456..2374.367
> rows=20 loops=1)
> Filter: (("share" = 1) AND (userid = 219177))
> Total runtime: 2374.513 ms
> (4 rows)
>
> webdigest=# explain analyze select A.id as
> fav_id,A.urlid,A.tag,A.title,A.summary,A.comment,A.ctime,A.share from
> wd_urlusermaps A where share =1 and A.userid='219177' ORDER BY A.id DESC
> limit 40 ;
> QUERY PLAN
>
---------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=6805.77..6805.87 rows=40 width=96) (actual time=5.731..5.905
> rows=40 loops=1)
> -> Sort (cost=6805.77..6808.50 rows=1094 width=96) (actual
> time=5.726..5.785 rows=40 loops=1)
> Sort Key: id
> -> Index Scan using urlusermaps_userid on wd_urlusermaps a
> (cost=0.00..6750.55 rows=1094 width=96) (actual time=0.544..5.616
> rows=41 loops=1)
> Index Cond: (userid = 219177)
> Filter: ("share" = 1)
> Total runtime: 6.013 ms
> (7 rows)
>
> the userid=219177 got 2000+ record and around 40 shared=1, why above 2 query
> shows so much difference?
>
> any hint would be greatly appreciated.
>
> -laser
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: if posting/reading through Usenet, please send an appropriate
>        subscribe-nomail command to majordomo@postgresql.org so that your
>        message can get through to the mailing list cleanly
>
>
>

different query plan because different limit # (Re: weird query plan)

От
weiping
Дата:
I changed the query to :
EXPLAIN ANALYZE select id from wd_urlusermaps where id in (select id
from wd_urlusermaps where share =1 and userid='219177') order by id desc
limit 20;

and it's much better now (from real execute time), but the cost report
higher
then slower one above, may be I should do some tunning on planner
parameter or
is it a planner bug?
QUERY PLAN

-------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=16118.83..16118.88 rows=20 width=4) (actual
time=17.539..17.619 rows=20 loops=1)
-> Sort (cost=16118.83..16121.57 rows=1094 width=4) (actual
time=17.534..17.560 rows=20 loops=1)
Sort Key: public.wd_urlusermaps.id
-> Nested Loop (cost=6753.28..16063.61 rows=1094 width=4) (actual
time=16.739..17.439 rows=41 loops=1)
-> HashAggregate (cost=6753.28..6764.22 rows=1094 width=4) (actual
time=16.707..16.786 rows=41 loops=1)
-> Index Scan using urlusermaps_userid on wd_urlusermaps
(cost=0.00..6750.55 rows=1094 width=4) (actual time=1.478..16.563
rows=41 loops=1)
Index Cond: (userid = 219177)
Filter: ("share" = 1)
-> Index Scan using wd_urlusermaps_pkey on wd_urlusermaps
(cost=0.00..8.49 rows=1 width=4) (actual time=0.008..0.010 rows=1 loops=41)
Index Cond: (public.wd_urlusermaps.id = public.wd_urlusermaps.id)
Total runtime: 17.762 ms
(11 rows)

> sorry, forgot to mention our version, it's postgresql 8.2.3
>
> -laser
>
>> I have a table:
>> webdigest=# \d wd_urlusermaps
>> 表 "public.wd_urlusermaps"
>> 字段名 | 类型 | 修饰词
>> ---------+-----------------------------+-------------------------------------------------------------
>> id | integer | not null default nextval('wd_urlusermaps_id_seq'::regclass)
>> urlid | integer | not null
>> tag | character varying(512) |
>> title | character varying(512) |
>> summary | character varying(1024) |
>> comment | character varying(1024) |
>> ctime | timestamp without time zone |
>> mtime | timestamp without time zone |
>> share | smallint |
>> userid | integer |
>> import | smallint | default 0
>> 索引:
>> "wd_urlusermaps_pkey" PRIMARY KEY, btree (id) CLUSTER
>> "urlusermaps_urlid_userid" UNIQUE, btree (urlid, userid)
>> "urlusermaps_urlid" btree (urlid)
>> "urlusermaps_userid" btree (userid)
>> "wd_urlusermaps_ctime_idx" btree (ctime)
>> "wd_urlusermaps_share_idx" btree ("share")
>>
>> and target statistic set to 1000, and two different query plan:
>>
>> webdigest=# explain analyze select A.id as
>> fav_id,A.urlid,A.tag,A.title,A.summary,A.comment,A.ctime,A.share from
>> wd_urlusermaps A where share =1 and A.userid='219177' ORDER BY A.id DESC
>> limit 20 ;
>> QUERY PLAN
>>
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
>> Limit (cost=0.00..4932.56 rows=20 width=96) (actual
>> time=730.461..2374.435 rows=20 loops=1)
>> -> Index Scan Backward using wd_urlusermaps_pkey on wd_urlusermaps a
>> (cost=0.00..269810.77 rows=1094 width=96) (actual time=730.456..2374.367
>> rows=20 loops=1)
>> Filter: (("share" = 1) AND (userid = 219177))
>> Total runtime: 2374.513 ms
>> (4 rows)
>>
>> webdigest=# explain analyze select A.id as
>> fav_id,A.urlid,A.tag,A.title,A.summary,A.comment,A.ctime,A.share from
>> wd_urlusermaps A where share =1 and A.userid='219177' ORDER BY A.id DESC
>> limit 40 ;
>> QUERY PLAN
>>
---------------------------------------------------------------------------------------------------------------------------------------------------------
>> Limit (cost=6805.77..6805.87 rows=40 width=96) (actual time=5.731..5.905
>> rows=40 loops=1)
>> -> Sort (cost=6805.77..6808.50 rows=1094 width=96) (actual
>> time=5.726..5.785 rows=40 loops=1)
>> Sort Key: id
>> -> Index Scan using urlusermaps_userid on wd_urlusermaps a
>> (cost=0.00..6750.55 rows=1094 width=96) (actual time=0.544..5.616
>> rows=41 loops=1)
>> Index Cond: (userid = 219177)
>> Filter: ("share" = 1)
>> Total runtime: 6.013 ms
>> (7 rows)
>>
>> the userid=219177 got 2000+ record and around 40 shared=1, why above 2 query
>> shows so much difference?
>>
>> any hint would be greatly appreciated.
>>
>> -laser
>>
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 1: if posting/reading through Usenet, please send an appropriate
>>        subscribe-nomail command to majordomo@postgresql.org so that your
>>        message can get through to the mailing list cleanly
>>
>>
>>
>>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>
>
>

Re: different query plan because different limit # (Re: weird query plan)

От
weiping
Дата:
continue digging shows:
set cpu_tuple_cost to 0.1;
explain analyze select * from wd_urlusermaps where share =1 and
userid='219177' order by id desc limit 20;
SET
时间: 0.256 ms
QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=7063.98..7064.03 rows=20 width=110) (actual
time=6.047..6.130 rows=20 loops=1)
-> Sort (cost=7063.98..7066.71 rows=1094 width=110) (actual
time=6.043..6.070 rows=20 loops=1)
Sort Key: id
-> Index Scan using urlusermaps_userid on wd_urlusermaps
(cost=0.00..7008.76 rows=1094 width=110) (actual time=0.710..5.838
rows=41 loops=1)
Index Cond: (userid = 219177)
Filter: ("share" = 1)
Total runtime: 6.213 ms
(7 rows)

now it's what i need, which means we should increase cpu_tuple_cost for
large
RAM node (we got 16G RAN and the table only serveral hundred M) to avoid
sort
happened too early. is it true?

-laser
> I changed the query to :
> EXPLAIN ANALYZE select id from wd_urlusermaps where id in (select id
> from wd_urlusermaps where share =1 and userid='219177') order by id desc
> limit 20;
>
> and it's much better now (from real execute time), but the cost report
> higher
> then slower one above, may be I should do some tunning on planner
> parameter or
> is it a planner bug?
> QUERY PLAN
>
-------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Limit (cost=16118.83..16118.88 rows=20 width=4) (actual
> time=17.539..17.619 rows=20 loops=1)
> -> Sort (cost=16118.83..16121.57 rows=1094 width=4) (actual
> time=17.534..17.560 rows=20 loops=1)
> Sort Key: public.wd_urlusermaps.id
> -> Nested Loop (cost=6753.28..16063.61 rows=1094 width=4) (actual
> time=16.739..17.439 rows=41 loops=1)
> -> HashAggregate (cost=6753.28..6764.22 rows=1094 width=4) (actual
> time=16.707..16.786 rows=41 loops=1)
> -> Index Scan using urlusermaps_userid on wd_urlusermaps
> (cost=0.00..6750.55 rows=1094 width=4) (actual time=1.478..16.563
> rows=41 loops=1)
> Index Cond: (userid = 219177)
> Filter: ("share" = 1)
> -> Index Scan using wd_urlusermaps_pkey on wd_urlusermaps
> (cost=0.00..8.49 rows=1 width=4) (actual time=0.008..0.010 rows=1 loops=41)
> Index Cond: (public.wd_urlusermaps.id = public.wd_urlusermaps.id)
> Total runtime: 17.762 ms
> (11 rows)
>
>
>> sorry, forgot to mention our version, it's postgresql 8.2.3
>>
>> -laser
>>
>>
>>> I have a table:
>>> webdigest=# \d wd_urlusermaps
>>> 表 "public.wd_urlusermaps"
>>> 字段名 | 类型 | 修饰词
>>> ---------+-----------------------------+-------------------------------------------------------------
>>> id | integer | not null default nextval('wd_urlusermaps_id_seq'::regclass)
>>> urlid | integer | not null
>>> tag | character varying(512) |
>>> title | character varying(512) |
>>> summary | character varying(1024) |
>>> comment | character varying(1024) |
>>> ctime | timestamp without time zone |
>>> mtime | timestamp without time zone |
>>> share | smallint |
>>> userid | integer |
>>> import | smallint | default 0
>>> 索引:
>>> "wd_urlusermaps_pkey" PRIMARY KEY, btree (id) CLUSTER
>>> "urlusermaps_urlid_userid" UNIQUE, btree (urlid, userid)
>>> "urlusermaps_urlid" btree (urlid)
>>> "urlusermaps_userid" btree (userid)
>>> "wd_urlusermaps_ctime_idx" btree (ctime)
>>> "wd_urlusermaps_share_idx" btree ("share")
>>>
>>> and target statistic set to 1000, and two different query plan:
>>>
>>> webdigest=# explain analyze select A.id as
>>> fav_id,A.urlid,A.tag,A.title,A.summary,A.comment,A.ctime,A.share from
>>> wd_urlusermaps A where share =1 and A.userid='219177' ORDER BY A.id DESC
>>> limit 20 ;
>>> QUERY PLAN
>>>
--------------------------------------------------------------------------------------------------------------------------------------------------------------------
>>> Limit (cost=0.00..4932.56 rows=20 width=96) (actual
>>> time=730.461..2374.435 rows=20 loops=1)
>>> -> Index Scan Backward using wd_urlusermaps_pkey on wd_urlusermaps a
>>> (cost=0.00..269810.77 rows=1094 width=96) (actual time=730.456..2374.367
>>> rows=20 loops=1)
>>> Filter: (("share" = 1) AND (userid = 219177))
>>> Total runtime: 2374.513 ms
>>> (4 rows)
>>>
>>> webdigest=# explain analyze select A.id as
>>> fav_id,A.urlid,A.tag,A.title,A.summary,A.comment,A.ctime,A.share from
>>> wd_urlusermaps A where share =1 and A.userid='219177' ORDER BY A.id DESC
>>> limit 40 ;
>>> QUERY PLAN
>>>
---------------------------------------------------------------------------------------------------------------------------------------------------------
>>> Limit (cost=6805.77..6805.87 rows=40 width=96) (actual time=5.731..5.905
>>> rows=40 loops=1)
>>> -> Sort (cost=6805.77..6808.50 rows=1094 width=96) (actual
>>> time=5.726..5.785 rows=40 loops=1)
>>> Sort Key: id
>>> -> Index Scan using urlusermaps_userid on wd_urlusermaps a
>>> (cost=0.00..6750.55 rows=1094 width=96) (actual time=0.544..5.616
>>> rows=41 loops=1)
>>> Index Cond: (userid = 219177)
>>> Filter: ("share" = 1)
>>> Total runtime: 6.013 ms
>>> (7 rows)
>>>
>>> the userid=219177 got 2000+ record and around 40 shared=1, why above 2 query
>>> shows so much difference?
>>>
>>> any hint would be greatly appreciated.
>>>
>>> -laser
>>>
>>>
>>> ---------------------------(end of broadcast)---------------------------
>>> TIP 1: if posting/reading through Usenet, please send an appropriate
>>>        subscribe-nomail command to majordomo@postgresql.org so that your
>>>        message can get through to the mailing list cleanly
>>>
>>>
>>>
>>>
>>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 5: don't forget to increase your free space map settings
>>
>>
>>
>>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faq
>
>
>

Re: different query plan because different limit # (Re: weird query plan)

От
Gregory Stark
Дата:
"weiping" <laserlist@pgsqldb.com> writes:

> -> Index Scan using urlusermaps_userid on wd_urlusermaps
>    (cost=0.00..6750.55 rows=1094 width=4) (actual time=1.478..16.563 rows=41 loops=1)
>    Index Cond: (userid = 219177)
>    Filter: ("share" = 1)

It's estimating 1094 rows and getting 41 rows. You might considering raising
the statistics target for that table.

Does it get accurate estimates for the number of rows for each of these?

explain analyze select * from wd_urlusermaps where userid=219177
explain analyze select * from wd_urlusermaps where share=1

(the latter might take a while)

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: weird query plan

От
Tom Lane
Дата:
weiping <laserlist@pgsqldb.com> writes:
> -> Index Scan using urlusermaps_userid on wd_urlusermaps a
> (cost=0.00..6750.55 rows=1094 width=96) (actual time=0.544..5.616
> rows=41 loops=1)
> Index Cond: (userid = 219177)
> Filter: ("share" = 1)

> the userid=219177 got 2000+ record and around 40 shared=1, why above 2 query
> shows so much difference?

Probably because the rowcount estimate is so far off (1094 vs 41).

Possibly boosting the statistics target would help.

            regards, tom lane