Обсуждение: Need to increase performance of a query
Hi, I have the following query that needs tuning: psrdb=# explain analyze (SELECT psrdb(# MAX(item_rank.rank) AS maxRank psrdb(# FROM psrdb(# item_rank item_rank psrdb(# WHERE psrdb(# item_rank.project_id='proj2783' psrdb(# AND item_rank.pf_id IS NULL psrdb(# psrdb(# ) psrdb-# ORDER BY psrdb-# maxRank DESC; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------- Sort (cost=0.19..0.19 rows=1 width=0) (actual time=12.154..12.155 rows=1 loops=1) Sort Key: ($0) Sort Method: quicksort Memory: 17kB InitPlan -> Limit (cost=0.00..0.17 rows=1 width=8) (actual time=12.129..12.130 rows=1 loops=1) -> Index Scan Backward using item_rank_rank on item_rank (cost=0.00..2933.84 rows=17558 width=8) (actual time=12.126..12.126 rows=1 loops=1) Filter: ((rank IS NOT NULL) AND (pf_id IS NULL) AND ((project_id)::text = 'proj2783'::text)) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=12.140..12.142 rows=1 loops=1) Total runtime: 12.206 ms (9 rows) I have been playing with indexes but it seems that it doesn't make any difference. (I have created an index: item_rank_index" btree (project_id) WHERE (pf_id IS NULL)) Any advice on how to make it run faster? Thanks a lot, Anne
On 2010-06-10 19:50, Anne Rosset wrote: > Any advice on how to make it run faster? What timing do you get if you run it with \t (timing on) and without explain analyze ? I would be surprised if you can get it much faster than what is is.. I may be that a significant portion is "planning cost" so if you run it a lot you might benefit from a prepared statement. -- Jesper
On Thu, Jun 10, 2010 at 10:50:40AM -0700, Anne Rosset wrote: > Any advice on how to make it run faster? First, let me ask a simple question - what runtime for this query will be satisfactory for you? Best regards, depesz -- Linkedin: http://www.linkedin.com/in/depesz / blog: http://www.depesz.com/ jid/gtalk: depesz@depesz.com / aim:depeszhdl / skype:depesz_hdl / gg:6749007
Jesper Krogh wrote: > On 2010-06-10 19:50, Anne Rosset wrote: >> Any advice on how to make it run faster? > > What timing do you get if you run it with \t (timing on) and without > explain analyze ? > > I would be surprised if you can get it much faster than what is is.. I > may be that a > significant portion is "planning cost" so if you run it a lot you > might benefit from > a prepared statement. > > Hi Jesper, Thanks your response: psrdb=# \timing Timing is on. psrdb=# (SELECT psrdb(# MAX(item_rank.rank) AS maxRank psrdb(# FROM psrdb(# item_rank item_rank psrdb(# WHERE psrdb(# item_rank.project_id='proj2783' psrdb(# AND item_rank.pf_id IS NULL psrdb(# psrdb(# ) psrdb-# ORDER BY psrdb-# maxRank DESC; maxrank ------------- 20200000000 (1 row) Time: 12.947 ms It really seems to me that it should take less time. Specially when I see the result with a different where clause like this one: psrdb=# SELECT psrdb-# MAX(item_rank.rank) AS maxRank psrdb-# FROM psrdb-# item_rank item_rank psrdb-# WHERE psrdb-# item_rank.pf_id='plan1408' psrdb-# ORDER BY psrdb-# maxRank DESC; maxrank ------------- 20504000000 (1 row) Time: 2.582 ms Thanks, Anne
Jochen Erwied wrote: > Thursday, June 10, 2010, 8:36:08 PM you wrote: > > >> psrdb=# (SELECT >> psrdb(# MAX(item_rank.rank) AS maxRank >> psrdb(# FROM >> psrdb(# item_rank item_rank >> psrdb(# WHERE >> psrdb(# item_rank.project_id='proj2783' >> psrdb(# AND item_rank.pf_id IS NULL >> psrdb(# >> psrdb(# ) >> psrdb-# ORDER BY >> psrdb-# maxRank DESC; >> > > Don't think it does really matter, but why do you sort a resultset > consisting of only one row? > > Sorry, I should have removed the ORDER by (the full query has a union). So without the ORDER by, here are the results: psrdb=# SELECT psrdb-# MAX(item_rank.rank) AS maxRank psrdb-# FROM psrdb-# item_rank item_rank psrdb-# WHERE psrdb-# item_rank.pf_id='plan1408'; maxrank ------------- 20504000000 (1 row) Time: 1.516 ms psrdb=# SELECT psrdb-# MAX(item_rank.rank) AS maxRank psrdb-# FROM psrdb-# item_rank item_rank psrdb-# WHERE psrdb-# item_rank.project_id='proj2783' psrdb-# AND item_rank.pf_id IS NULL; maxrank ------------- 20200000000 (1 row) Time: 13.177 ms Is there anything that can be done for the second one? Thanks, Anne
On Thu, Jun 10, 2010 at 12:34:07PM -0700, Anne Rosset wrote: > Jochen Erwied wrote: >> Thursday, June 10, 2010, 8:36:08 PM you wrote: >> >> >>> psrdb=# (SELECT >>> psrdb(# MAX(item_rank.rank) AS maxRank >>> psrdb(# FROM >>> psrdb(# item_rank item_rank >>> psrdb(# WHERE >>> psrdb(# item_rank.project_id='proj2783' >>> psrdb(# AND item_rank.pf_id IS NULL >>> psrdb(# >>> psrdb(# ) >>> psrdb-# ORDER BY >>> psrdb-# maxRank DESC; >>> >> >> Don't think it does really matter, but why do you sort a resultset >> consisting of only one row? >> >> > Sorry, I should have removed the ORDER by (the full query has a union). > So without the ORDER by, here are the results: > psrdb=# SELECT > psrdb-# MAX(item_rank.rank) AS maxRank > psrdb-# FROM > psrdb-# item_rank item_rank > psrdb-# WHERE > psrdb-# item_rank.pf_id='plan1408'; > maxrank > ------------- > 20504000000 > (1 row) > > Time: 1.516 ms > psrdb=# SELECT > psrdb-# MAX(item_rank.rank) AS maxRank > psrdb-# FROM > psrdb-# item_rank item_rank > psrdb-# WHERE > psrdb-# item_rank.project_id='proj2783' > psrdb-# AND item_rank.pf_id IS NULL; > maxrank > ------------- > 20200000000 > (1 row) > > Time: 13.177 ms > > Is there anything that can be done for the second one? > > Thanks, > Anne > What about an IS NULL index on pf_id? Regards, Ken
Thursday, June 10, 2010, 9:34:07 PM you wrote: > Time: 1.516 ms > Time: 13.177 ms I'd suppose the first query to scan a lot less rows than the second one. Could you supply an explained plan for the fast query? -- Jochen Erwied | home: jochen@erwied.eu +49-208-38800-18, FAX: -19 Sauerbruchstr. 17 | work: joe@mbs-software.de +49-2151-7294-24, FAX: -50 D-45470 Muelheim | mobile: jochen.erwied@vodafone.de +49-173-5404164
Thursday, June 10, 2010, 8:36:08 PM you wrote: > psrdb=# (SELECT > psrdb(# MAX(item_rank.rank) AS maxRank > psrdb(# FROM > psrdb(# item_rank item_rank > psrdb(# WHERE > psrdb(# item_rank.project_id='proj2783' > psrdb(# AND item_rank.pf_id IS NULL > psrdb(# > psrdb(# ) > psrdb-# ORDER BY > psrdb-# maxRank DESC; Don't think it does really matter, but why do you sort a resultset consisting of only one row? -- Jochen Erwied | home: jochen@erwied.eu +49-208-38800-18, FAX: -19 Sauerbruchstr. 17 | work: joe@mbs-software.de +49-2151-7294-24, FAX: -50 D-45470 Muelheim | mobile: jochen.erwied@vodafone.de +49-173-5404164
Kenneth Marshall wrote: > On Thu, Jun 10, 2010 at 12:34:07PM -0700, Anne Rosset wrote: > >> Jochen Erwied wrote: >> >>> Thursday, June 10, 2010, 8:36:08 PM you wrote: >>> >>> >>> >>>> psrdb=# (SELECT >>>> psrdb(# MAX(item_rank.rank) AS maxRank >>>> psrdb(# FROM >>>> psrdb(# item_rank item_rank >>>> psrdb(# WHERE >>>> psrdb(# item_rank.project_id='proj2783' >>>> psrdb(# AND item_rank.pf_id IS NULL >>>> psrdb(# >>>> psrdb(# ) >>>> psrdb-# ORDER BY >>>> psrdb-# maxRank DESC; >>>> >>>> >>> Don't think it does really matter, but why do you sort a resultset >>> consisting of only one row? >>> >>> >>> >> Sorry, I should have removed the ORDER by (the full query has a union). >> So without the ORDER by, here are the results: >> psrdb=# SELECT >> psrdb-# MAX(item_rank.rank) AS maxRank >> psrdb-# FROM >> psrdb-# item_rank item_rank >> psrdb-# WHERE >> psrdb-# item_rank.pf_id='plan1408'; >> maxrank >> ------------- >> 20504000000 >> (1 row) >> >> Time: 1.516 ms >> psrdb=# SELECT >> psrdb-# MAX(item_rank.rank) AS maxRank >> psrdb-# FROM >> psrdb-# item_rank item_rank >> psrdb-# WHERE >> psrdb-# item_rank.project_id='proj2783' >> psrdb-# AND item_rank.pf_id IS NULL; >> maxrank >> ------------- >> 20200000000 >> (1 row) >> >> Time: 13.177 ms >> >> Is there anything that can be done for the second one? >> >> Thanks, >> Anne >> >> > What about an IS NULL index on pf_id? > > Regards, > Ken > Hi Ken, I have the following index: "item_rank_index2" btree (project_id) WHERE (pf_id IS NULL) Are you suggesting something else? Thanks, Anne
Jochen Erwied wrote: > Thursday, June 10, 2010, 9:34:07 PM you wrote: > > >> Time: 1.516 ms >> > > >> Time: 13.177 ms >> > > I'd suppose the first query to scan a lot less rows than the second one. > Could you supply an explained plan for the fast query? > > Hi Jochen, Here is the explained plan for the fastest query: psrdb=# explain analyze ELECT psrdb-# MAX(item_rank.rank) AS maxRank psrdb-# FROM psrdb-# item_rank item_rank psrdb-# WHERE psrdb-# item_rank.pf_id='plan1408'; ERROR: syntax error at or near "ELECT" at character 17 psrdb=# explain analyze SELECT psrdb-# MAX(item_rank.rank) AS maxRank psrdb-# FROM psrdb-# item_rank item_rank psrdb-# WHERE psrdb-# item_rank.pf_id='plan1408'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=8.28..8.29 rows=1 width=8) (actual time=0.708..0.709 rows=1 loops=1) -> Index Scan using item_rank_pf on item_rank (cost=0.00..8.27 rows=1 width=8) (actual time=0.052..0.407 rows=303 loops=1) Index Cond: ((pf_id)::text = 'plan1408'::text) Total runtime: 0.761 ms (4 rows) Time: 2.140 ms
On 6/10/10 12:34 PM, Anne Rosset wrote: > Jochen Erwied wrote: >> Thursday, June 10, 2010, 8:36:08 PM you wrote: >> >>> psrdb=# (SELECT >>> psrdb(# MAX(item_rank.rank) AS maxRank >>> psrdb(# FROM >>> psrdb(# item_rank item_rank >>> psrdb(# WHERE >>> psrdb(# item_rank.project_id='proj2783' >>> psrdb(# AND item_rank.pf_id IS NULL >>> psrdb(# >>> psrdb(# ) >>> psrdb-# ORDER BY >>> psrdb-# maxRank DESC; >> >> Don't think it does really matter, but why do you sort a resultset >> consisting of only one row? >> > Sorry, I should have removed the ORDER by (the full query has a union). > So without the ORDER by, here are the results: > psrdb=# SELECT > psrdb-# MAX(item_rank.rank) AS maxRank > psrdb-# FROM > psrdb-# item_rank item_rank > psrdb-# WHERE > psrdb-# item_rank.pf_id='plan1408'; > maxrank > ------------- > 20504000000 > (1 row) > > Time: 1.516 ms > psrdb=# SELECT > psrdb-# MAX(item_rank.rank) AS maxRank > psrdb-# FROM > psrdb-# item_rank item_rank > psrdb-# WHERE > psrdb-# item_rank.project_id='proj2783' > psrdb-# AND item_rank.pf_id IS NULL; > maxrank > ------------- > 20200000000 > (1 row) > > Time: 13.177 ms > > Is there anything that can be done for the second one? Postgres normally doesn't index NULL values even if the column is indexed, so it has to do a table scan when your query includesan IS NULL condition. You need to create an index that includes the "IS NULL" condition. create index item_rank_null_idx on item_rank(pf_id) where item_rank.pf_id is null; Craig
Craig James wrote: > On 6/10/10 12:34 PM, Anne Rosset wrote: >> Jochen Erwied wrote: >>> Thursday, June 10, 2010, 8:36:08 PM you wrote: >>> >>>> psrdb=# (SELECT >>>> psrdb(# MAX(item_rank.rank) AS maxRank >>>> psrdb(# FROM >>>> psrdb(# item_rank item_rank >>>> psrdb(# WHERE >>>> psrdb(# item_rank.project_id='proj2783' >>>> psrdb(# AND item_rank.pf_id IS NULL >>>> psrdb(# >>>> psrdb(# ) >>>> psrdb-# ORDER BY >>>> psrdb-# maxRank DESC; >>> >>> Don't think it does really matter, but why do you sort a resultset >>> consisting of only one row? >>> >> Sorry, I should have removed the ORDER by (the full query has a union). >> So without the ORDER by, here are the results: >> psrdb=# SELECT >> psrdb-# MAX(item_rank.rank) AS maxRank >> psrdb-# FROM >> psrdb-# item_rank item_rank >> psrdb-# WHERE >> psrdb-# item_rank.pf_id='plan1408'; >> maxrank >> ------------- >> 20504000000 >> (1 row) >> >> Time: 1.516 ms >> psrdb=# SELECT >> psrdb-# MAX(item_rank.rank) AS maxRank >> psrdb-# FROM >> psrdb-# item_rank item_rank >> psrdb-# WHERE >> psrdb-# item_rank.project_id='proj2783' >> psrdb-# AND item_rank.pf_id IS NULL; >> maxrank >> ------------- >> 20200000000 >> (1 row) >> >> Time: 13.177 ms >> >> Is there anything that can be done for the second one? > > Postgres normally doesn't index NULL values even if the column is > indexed, so it has to do a table scan when your query includes an IS > NULL condition. You need to create an index that includes the "IS > NULL" condition. > > create index item_rank_null_idx on item_rank(pf_id) > where item_rank.pf_id is null; > > Craig > Hi Craig, I tried again after adding your suggested index but I didn't see any improvements: (seems that the index is not used) psrdb=# explain analyze SELECT psrdb-# MAX(item_rank.rank) AS maxRank psrdb-# FROM psrdb-# item_rank item_rank psrdb-# WHERE psrdb-# item_rank.project_id='proj2783' psrdb-# AND item_rank.pf_id IS NULL; QUERY PLAN ---------------------------------------------------------------------------------------------------------------------------------------------------------- Result (cost=0.17..0.18 rows=1 width=0) (actual time=11.942..11.943 rows=1 loops=1) InitPlan -> Limit (cost=0.00..0.17 rows=1 width=8) (actual time=11.931..11.932 rows=1 loops=1) -> Index Scan Backward using item_rank_rank on item_rank (cost=0.00..2933.84 rows=17558 width=8) (actual time=11.926..11.926 rows=1 loops=1) Filter: ((rank IS NOT NULL) AND (pf_id IS NULL) AND ((project_id)::text = 'proj2783'::text)) Total runtime: 11.988 ms (6 rows) Time: 13.654 ms Thanks, Anne
On 10/06/10 22:47, Craig James wrote: > Postgres normally doesn't index NULL values even if the column is > indexed, so it has to do a table scan when your query includes an IS > NULL condition. That was addressed in version 8.3. 8.3 and upwards can use an index for IS NULL. I believe the NULLs were stored in the index in earlier releases too, they just couldn't be searched for. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Heikki Linnakangas wrote: > On 10/06/10 22:47, Craig James wrote: >> Postgres normally doesn't index NULL values even if the column is >> indexed, so it has to do a table scan when your query includes an IS >> NULL condition. > > That was addressed in version 8.3. 8.3 and upwards can use an index > for IS NULL. > > I believe the NULLs were stored in the index in earlier releases too, > they just couldn't be searched for. > I am using postgres 8.3.6. So why doesn't it use my index? Thanks, Anne
On 06/10/2010 12:56 PM, Anne Rosset wrote: > Craig James wrote: >> create index item_rank_null_idx on item_rank(pf_id) >> where item_rank.pf_id is null; >> >> Craig >> > Hi Craig, > I tried again after adding your suggested index but I didn't see any > improvements: (seems that the index is not used) > Filter: ((rank IS NOT NULL) AND (pf_id IS NULL) AND > ((project_id)::text = 'proj2783'::text)) > Total runtime: 11.988 ms > (6 rows) > > Time: 13.654 ms try: create index item_rank_null_idx on item_rank(pf_id) where rank IS NOT NULL AND pf_id IS NULL; Joe
Вложения
On 06/10/2010 01:10 PM, Joe Conway wrote: > try: > > create index item_rank_null_idx on item_rank(pf_id) > where rank IS NOT NULL AND pf_id IS NULL; oops -- that probably should be: create index item_rank_null_idx on item_rank(project_id) where rank IS NOT NULL AND pf_id IS NULL; Joe
Вложения
Joe Conway wrote: > On 06/10/2010 01:10 PM, Joe Conway wrote: > >> try: >> >> create index item_rank_null_idx on item_rank(pf_id) >> where rank IS NOT NULL AND pf_id IS NULL; >> > > oops -- that probably should be: > > create index item_rank_null_idx on item_rank(project_id) > where rank IS NOT NULL AND pf_id IS NULL; > > Joe > > I tried that and it didn't make any difference. Same query plan. Anne
On 10/06/10 23:08, Anne Rosset wrote: > Heikki Linnakangas wrote: >> On 10/06/10 22:47, Craig James wrote: >>> Postgres normally doesn't index NULL values even if the column is >>> indexed, so it has to do a table scan when your query includes an IS >>> NULL condition. >> >> That was addressed in version 8.3. 8.3 and upwards can use an index >> for IS NULL. >> >> I believe the NULLs were stored in the index in earlier releases too, >> they just couldn't be searched for. >> > I am using postgres 8.3.6. So why doesn't it use my index? Well, apparently the planner doesn't think it would be any cheaper. I wonder if this helps: CREATE INDEX item_rank_project_id ON item_rank(project_id, rank, pf_id); And make sure you drop any of the indexes that are not being used, to make sure the planner doesn't choose them instead. (You should upgrade to 8.3.11, BTW. There's been a bunch of bug-fixes in-between, though I don't know if any are related to this, but there's other important fixes there) -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
On 06/10/2010 01:21 PM, Anne Rosset wrote: >> > I tried that and it didn't make any difference. Same query plan. A little experimentation suggests this might work: create index item_rank_project on item_rank(project_id, rank) where pf_id IS NULL; Joe
Вложения
Joe Conway wrote: > On 06/10/2010 01:21 PM, Anne Rosset wrote: > >>> >>> >> I tried that and it didn't make any difference. Same query plan. >> > > A little experimentation suggests this might work: > > create index item_rank_project on item_rank(project_id, rank) where > pf_id IS NULL; > > Joe > > Yes it does. Thanks a lot! Anne