Discussion: Why performance improvement on converting subselect to a function ?
Hi,
For each company_id in a certain table I have to search the same table,
get certain rows, sort them, and pick the top one. I tried using this
subselect:
explain analyze SELECT company_id , (SELECT edition FROM ONLY
public.branding_master b WHERE old_company_id = a.company_id OR company_id =
a.company_id ORDER BY b.company_id DESC LIMIT 1) from public.branding_master
a limit 50;
QUERY PLAN
 Limit  (cost=0.00..3.52 rows=50 width=4) (actual time=463.97..19429.54 rows=50 loops=1)
   ->  Seq Scan on branding_master a  (cost=0.00..6530.79 rows=92679 width=4) (actual time=463.97..19429.28 rows=51 loops=1)
         SubPlan
           ->  Limit  (cost=0.00..168.36 rows=1 width=6) (actual time=66.96..380.94 rows=1 loops=51)
                 ->  Index Scan Backward using branding_master_pkey on branding_master b  (cost=0.00..23990.26 rows=142 width=6) (actual time=66.95..380.93 rows=1 loops=51)
                       Filter: ((old_company_id = $0) OR (company_id = $0))
 Total runtime: 19429.76 msec
(7 rows)
Very slow: 20 secs.
CREATE FUNCTION most_recent_edition (integer) returns integer AS 'SELECT
edition::integer FROM ONLY public.branding_master b WHERE old_company_id = $1
OR company_id = $1 ORDER BY b.company_id DESC LIMIT 1 ' language 'sql';
tradein_clients=# explain analyze SELECT company_id ,
most_recent_edition(company_id) from public.branding_master limit 50;
QUERY PLAN
 Limit  (cost=0.00..3.52 rows=50 width=4) (actual time=208.23..3969.39 rows=50 loops=1)
   ->  Seq Scan on branding_master  (cost=0.00..6530.79 rows=92679 width=4) (actual time=208.22..3969.15 rows=51 loops=1)
 Total runtime: 3969.52 msec
(3 rows)
Time: 4568.33 ms
4 times faster.
But I feel it can be a lot faster still. Can anyone suggest something
to try?
Indexes exist on company_id (pkey) and old_company_id. Most of the chores
are already done [ vacuum full analyze, reindex ].
Regds
mallah.
Rajesh Kumar Mallah <mallah@trade-india.com> writes:
> explain analyze SELECT company_id , (SELECT edition FROM ONLY
> public.branding_master b WHERE old_company_id = a.company_id OR company_id =
> a.company_id ORDER BY b.company_id DESC LIMIT 1) from public.branding_master
> a limit 50;
> Total runtime: 19429.76 msec
> CREATE FUNCTION most_recent_edition (integer) returns integer AS 'SELECT
> edition::integer FROM ONLY public.branding_master b WHERE old_company_id = $1
> OR company_id = $1 ORDER BY b.company_id DESC LIMIT 1 ' language 'sql';
> tradein_clients=# explain analyze SELECT company_id ,
> most_recent_edition(company_id) from public.branding_master limit 50;
> Total runtime: 3969.52 msec
Odd. Apparently the planner is picking a better plan in the function
context than in the subselect context --- which is strange since it
ought to have less information.
AFAIK the only way to see the plan generated for a SQL function's query
is like this:
regression=# create function foo(int) returns int as
regression-# 'select unique1 from tenk1 where unique1 = $1' language sql;
CREATE FUNCTION
regression=# set debug_print_plan TO 1;
SET
regression=# set client_min_messages TO debug;
SET
regression=# select foo(55);
DEBUG: plan:
DETAIL: {RESULT :startup_cost 0.00 :total_cost 0.01 :plan_rows 1 :plan_width 0
:targetlist ({TARGETENTRY :resdom {RESDOM :resno 1 :restype 23 :restypmod -1
:resname foo :ressortgroupref 0 :resorigtbl 0 :resorigcol 0 :resjunk false}
:expr {FUNCEXPR :funcid 706101 :funcresulttype 23 :funcretset false
... (etc etc)
Would you do that and send it along? I'm curious ...
> But i feel it can be lot more faster , can anyone suggest me something
> to try.
Create an index on old_company_id, perhaps.
regards, tom lane
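One more thing that might be worth trying, although nobody in this thread proposed it: the plan above walks branding_master_pkey backward and applies the OR condition only as a filter, so neither index is used to look rows up. Splitting the OR into two single-condition branches may let the planner use the primary key for one branch and the old_company_id index for the other. This is only a sketch under that assumption; the function name most_recent_edition_union is made up for illustration, and whether it helps depends on the data and the PostgreSQL version.

```sql
-- Hypothetical variant of most_recent_edition(); the name is illustrative only.
-- Each branch carries a single equality condition, so each can be matched
-- against its own index. A row that satisfies both conditions shows up twice,
-- which is harmless because only the top row is returned.
CREATE FUNCTION most_recent_edition_union (integer) RETURNS integer AS '
    SELECT edition FROM (
        SELECT edition::integer AS edition, company_id
          FROM ONLY public.branding_master
         WHERE company_id = $1
        UNION ALL
        SELECT edition::integer AS edition, company_id
          FROM ONLY public.branding_master
         WHERE old_company_id = $1
    ) AS candidates
    ORDER BY company_id DESC
    LIMIT 1
' LANGUAGE 'sql';
```

It would be called exactly like the original, for example `SELECT company_id, most_recent_edition_union(company_id) FROM public.branding_master LIMIT 50;`, so the two versions can be timed side by side.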
Tom Lane wrote:
> Rajesh Kumar Mallah <mallah@trade-india.com> writes:
>> explain analyze SELECT company_id , (SELECT edition FROM ONLY
>> public.branding_master b WHERE old_company_id = a.company_id OR company_id =
>> a.company_id ORDER BY b.company_id DESC LIMIT 1) from public.branding_master
>> a limit 50;
>> Total runtime: 19429.76 msec
>> CREATE FUNCTION most_recent_edition (integer) returns integer AS 'SELECT
>> edition::integer FROM ONLY public.branding_master b WHERE old_company_id = $1
>> OR company_id = $1 ORDER BY b.company_id DESC LIMIT 1 ' language 'sql';
>> tradein_clients=# explain analyze SELECT company_id ,
>> most_recent_edition(company_id) from public.branding_master limit 50;
>> Total runtime: 3969.52 msec
> Odd. Apparently the planner is picking a better plan in the function
> context than in the subselect context --- which is strange since it
> ought to have less information.
> AFAIK the only way to see the plan generated for a SQL function's query
> is like this:
> regression=# create function foo(int) returns int as
> regression-# 'select unique1 from tenk1 where unique1 = $1' language sql;
> CREATE FUNCTION
> regression=# set debug_print_plan TO 1;
> SET
> regression=# set client_min_messages TO debug;
> SET
> regression=# select foo(55);
> DEBUG: plan:
> DETAIL: {RESULT :startup_cost 0.00 :total_cost 0.01 :plan_rows 1 :plan_width 0
> :targetlist ({TARGETENTRY :resdom {RESDOM :resno 1 :restype 23 :restypmod -1
> :resname foo :ressortgroupref 0 :resorigtbl 0 :resorigcol 0 :resjunk false}
> :expr {FUNCEXPR :funcid 706101 :funcresulttype 23 :funcretset false
> ... (etc etc)
> Would you do that and send it along? I'm curious ...
Sorry for the delayed response.
tradein_clients=# explain analyze SELECT company_id , data_bank.most_recent_edition(company_id) from public.branding_master limit 50;
--------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..3.57 rows=50 width=4) (actual time=149.52..2179.49 rows=50 loops=1)
-> Seq Scan on branding_master (cost=0.00..6626.52 rows=92752 width=4) (actual time=149.51..2179.30 rows=51 loops=1)
tradein_clients=#
tradein_clients=#
tradein_clients=# explain analyze SELECT company_id , data_bank.most_recent_edition(company_id) from public.branding_master limit 50;
DEBUG: StartTransactionCommand
LOG: plan:
{ LIMIT :startup_cost 0.00 :total_cost 185.65 :rows 1 :width 6 :qptargetlist
({ TARGETENTRY :resdom { RESDOM :resno 1 :restype 23 :restypmod -1 :resname
edition :reskey 0 :reskeyop 0 :ressortgroupref 0 :resjunk false } :expr { EXPR
:typeOid 23 :opType func :oper { FUNC :funcid 313 :funcresulttype 23
:funcretset false :funcformat 1 } :args ({ VAR :varno 1 :varattno 31 :vartype
21 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 31})}} { TARGETENTRY
:resdom { RESDOM :resno 2 :restype 23 :restypmod -1 :resname company_id
:reskey 0 :reskeyop 0 :ressortgroupref 1 :resjunk true } :expr { VAR :varno 1
:varattno 1 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno
1}}) :qpqual <> :lefttree { INDEXSCAN :startup_cost 0.00 :total_cost 24302.69
:rows 131 :width 6 :qptargetlist ({ TARGETENTRY :resdom { RESDOM :resno 1
:restype 23 :restypmod -1 :resname edition :reskey 0 :reskeyop 0
:ressortgroupref 0 :resjunk false } :expr { EXPR :typeOid 23 :opType func
:oper { FUNC :funcid 313 :funcresulttype 23 :funcretset false :funcformat 1 }
:args ({ VAR :varno 1 :varattno 31 :vartype 21 :vartypmod -1 :varlevelsup 0
:varnoold 1 :varoattno 31})}} { TARGETENTRY :resdom { RESDOM :resno 2 :restype
23 :restypmod -1 :resname company_id :reskey 0 :reskeyop 0 :ressortgroupref 1
:resjunk true } :expr { VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1
:varlevelsup 0 :varnoold 1 :varoattno 1}}) :qpqual ({ EXPR :typeOid 16
:opType or :oper <> :args ({ EXPR :typeOid 16 :opType op :oper { OPER :opno
96 :opid 65 :opresulttype 16 :opretset false } :args ({ VAR :varno 1 :varattno
19 :vartype 23 :vartypmod -1 :varlevelsup 0 :varnoold 1 :varoattno 19} {
PARAM :paramkind 12 :paramid 1 :paramname \<unnamed> :paramtype 23 })} { EXPR
:typeOid 16 :opType op :oper { OPER :opno 96 :opid 65 :opresulttype 16
:opretset false } :args ({ VAR :varno 1 :varattno 1 :vartype 23 :vartypmod -1
:varlevelsup 0 :varnoold 1 :varoattno 1} { PARAM :paramkind 12 :paramid 1
:paramname \<unnamed> :paramtype 23 })})}) :lefttree <> :righttree <> :extprm
() :locprm () :initplan <> :nprm 0 :scanrelid 1 :indxid ( 310742439)
:indxqual (<>) :indxqualorig (<>) :indxorderdir -1 } :righttree <> :extprm ()
:locprm () :initplan <> :nprm 0 :limitOffset <> :limitCount { CONST
:consttype 23 :constlen 4 :constbyval true :constisnull false :constvalue 4 [
1 0 0 0 ] }}
DEBUG: CommitTransactionCommand
>> But i feel it can be lot more faster , can anyone suggest me something to try.
> Create an index on old_company_id, perhaps.
It's there already:
branding_master_old_comapany_id btree (old_company_id),
regds , mallah.
> regards, tom lane
Rajesh Kumar Mallah <mallah@trade-india.com> writes:
> Tom Lane wrote:
>> Odd. Apparently the planner is picking a better plan in the function
>> context than in the subselect context --- which is strange since it
>> ought to have less information.
> [ verbose plan snipped ]
Well, that sure seems to be the same plan. Curious that the runtime
wasn't about the same. Perhaps the slow execution of the first query
was a caching effect? If you alternate trying the query both ways,
does the speed difference persist?
regards, tom lane
Re: Why performance improvement on converting subselect to a function ?
From: Rajesh Kumar Mallah
Dear Tom,

The problem was repeatable, in the sense that repeated execution of the
queries made no difference to performance. What led to the degradation was
bumping the effective_cache_size parameter up from 1000 to 64K.

Can anyone point me to the recent guide done by Sridhar and Josh? I want to
see what I mis(read|understood) from there ;-) [ it was on GeneralBits'
Home Page ]

Anyway, the performance gain was from 32 secs to less than a sec when I
restored the cache size from 64K to 1000.

I will post again with more details, but at the moment I have to load my
data_bank :)

Regds
Mallah.

On Wednesday 30 Jul 2003 3:02 am, Tom Lane wrote:
> Rajesh Kumar Mallah <mallah@trade-india.com> writes:
> > Tom Lane wrote:
> >> Odd. Apparently the planner is picking a better plan in the function
> >> context than in the subselect context --- which is strange since it
> >> ought to have less information.
> >
> > [ verbose plan snipped ]
>
> Well, that sure seems to be the same plan. Curious that the runtime
> wasn't about the same. Perhaps the slow execution of the first query
> was a caching effect? If you alternate trying the query both ways,
> does the speed difference persist?
>
> regards, tom lane
Rajesh Kumar Mallah <mallah@trade-india.com> writes:
> What lead to degradation was the bumping off of
> effective_cache_size parameter from 1000 to 64K
Check the plan then; AFAIR the only possible effect of changing
effective_cache_size is to influence which plan the planner picks.
regards, tom lane
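One way to follow this advice without digging through verbose plan dumps is to run the function's inner query by hand under each setting and compare the EXPLAIN ANALYZE output. This is only a sketch: 12345 is a made-up company_id standing in for $1, and a plan built for a literal constant is not guaranteed to match the one built for the parameter inside the function.

```sql
-- 12345 is a hypothetical company_id used in place of the function's $1.
SET effective_cache_size = 1000;
EXPLAIN ANALYZE
SELECT edition::integer
  FROM ONLY public.branding_master b
 WHERE old_company_id = 12345 OR company_id = 12345
 ORDER BY b.company_id DESC
 LIMIT 1;

-- Same query again with the larger setting; compare the plan nodes,
-- not just the runtimes.
SET effective_cache_size = 64000;
EXPLAIN ANALYZE
SELECT edition::integer
  FROM ONLY public.branding_master b
 WHERE old_company_id = 12345 OR company_id = 12345
 ORDER BY b.company_id DESC
 LIMIT 1;
```

If the two outputs show different scan types, that would point to the plan choice as the source of the slowdown.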
Re: Why performance improvement on converting subselect to a function ?
From: "Shridhar Daithankar"
On 30 Jul 2003 at 12:54, Rajesh Kumar Mallah wrote:
> Can any one point me the recent guide done by
> Sridhar and Josh i want to see what i mis(read|understood)
> from there ;-) [ it was on GeneralBits' Home Page ]

http://www.varlena.com/GeneralBits/Tidbits/perf.html

HTH

Bye
Shridhar

--
program, n.: A magic spell cast over a computer allowing it to turn one's
input into error messages. tr.v. To engage in a pastime similar to banging
one's head against a wall, but with fewer opportunities for reward.
Tom Lane wrote:
> Rajesh Kumar Mallah <mallah@trade-india.com> writes:
>> What lead to degradation was the bumping off of
>> effective_cache_size parameter from 1000 to 64K
>
> Check the plan then; AFAIR the only possible effect of changing
> effective_cache_size is to influence which plan the planner picks.
Dear Tom,
Below are the plans for the two cases. I don't know how to read them
accurately; can you please explain them? Also, can anyone point me to some
documentation oriented towards understanding explain analyze output?
Regds
Mallah.
tradein_clients=# SET effective_cache_size = 1000;
SET
tradein_clients=# explain analyze SELECT
pri_key,most_recent_edition(pri_key) from profiles where
source='BRANDING' limit 100;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..25.67 rows=100 width=4) (actual time=141.11..154.71 rows=100 loops=1)
   ->  Seq Scan on profiles  (cost=0.00..15754.83 rows=61385 width=4) (actual time=141.11..154.51 rows=101 loops=1)
         Filter: (source = 'BRANDING'::character varying)
 Total runtime: 154.84 msec
(4 rows)
tradein_clients=# SET effective_cache_size = 64000;
SET
tradein_clients=# explain analyze SELECT
pri_key,most_recent_edition(pri_key) from profiles where
source='BRANDING' limit 100;
QUERY PLAN
----------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..25.67 rows=100 width=4) (actual time=587.61..22884.75 rows=100 loops=1)
   ->  Seq Scan on profiles  (cost=0.00..15754.83 rows=61385 width=4) (actual time=587.60..22884.25 rows=101 loops=1)
         Filter: (source = 'BRANDING'::character varying)
 Total runtime: 22884.97 msec
(4 rows)
tradein_clients=#
>
> regards, tom lane
Rajesh Kumar Mallah <mallah@trade-india.com> writes:
> Below are the plans for two cases. I dont know how to read them accurately
> can u please explain them.

Well, they're the same plan, as far as they go. I suppose that the
runtime difference must come from choosing a different plan inside the
most_recent_edition() function, which we cannot see in the explain
output. As before, turning on logging of verbose query plans is the
only way to look at what the function is doing.

> Also can anyone point to some documentation
> oriented towards understanding explain analyze output?

http://www.postgresql.org/docs/view.php?version=7.3&idoc=0&file=performance-tips.html

regards, tom lane
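Putting that suggestion together with the settings shown earlier in the thread, a session along the following lines could capture the function's internal plan under each effective_cache_size value. It is a sketch only: the two logging commands are copied from the earlier message, the query is the one from the report above reduced to a single row, and it assumes the function's query is planned afresh for each statement so that the changed setting takes effect.

```sql
-- Logging switches as shown earlier in the thread.
SET debug_print_plan TO 1;
SET client_min_messages TO debug;

-- Fast case reported above: note the plan dump printed for the function body.
SET effective_cache_size = 1000;
SELECT pri_key, most_recent_edition(pri_key)
  FROM profiles
 WHERE source = 'BRANDING'
 LIMIT 1;

-- Slow case reported above: compare this plan dump with the previous one,
-- in particular the scan node type and the :indxid it refers to.
SET effective_cache_size = 64000;
SELECT pri_key, most_recent_edition(pri_key)
  FROM profiles
 WHERE source = 'BRANDING'
 LIMIT 1;
```

LIMIT 1 keeps the log output to a single function call per setting, which should be enough to see whether the two plans differ.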