Discussion: Improve Query
So Google hasn't been helpful, and I'm not entirely sure what to look for in the mailing lists to find the answer to my problem, so here goes. I have a query and I have run

explain analyze
select count(*)
from score
where leaderboardid=35 and score <= 6841 and active

The result is

"Aggregate  (cost=2491.06..2491.07 rows=1 width=0) (actual time=38.878..38.878 rows=1 loops=1)"
"  ->  Seq Scan on score  (cost=0.00..2391.17 rows=39954 width=0) (actual time=0.012..30.760 rows=38571 loops=1)"
"        Filter: (active AND (score <= 6841) AND (leaderboardid = 35))"
"Total runtime: 38.937 ms"

I have an index on score, and an index on score, leaderboard, and active. I can't seem to figure out how to create an index that will turn that "Seq Scan" into an index scan. The biggest problem is that the query degrades very quickly with more rows, and I will be getting A LOT MORE rows. What can I do to improve the performance of this query?

Thank you so much,
ZC
Try creating an index on all three columns. Btw, 38 ms is pretty fast. If you run that query very often, do prepare it, because I reckon it takes a few ms to actually create the plan for it.
The plan ought to be different once there are more scores, the table is analyzed, and your statistics target is high enough. At this point you don't have enough data to merit doing anything but a seq scan; the overhead is simply not worth it.
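If the row estimates stay off as the table grows, the per-column statistics target can be raised before re-analyzing. A minimal sketch (not from the thread; assumes the column names used in the query above):

```sql
-- Sketch: increase the sample size the planner keeps for this column,
-- then re-gather statistics so the new target takes effect.
ALTER TABLE score ALTER COLUMN leaderboardid SET STATISTICS 1000;
ANALYZE score;
```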
You could try inserting a lot more rows. I'd create a function to do several million inserts with random numbers and then analyze and rerun.
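As a sketch of that kind of load generation (assuming the table has just the three columns used in the query — adjust to the real schema):

```sql
-- Hypothetical bulk load: several million rows of random data,
-- then refresh planner statistics before re-running EXPLAIN ANALYZE.
INSERT INTO score (leaderboardid, score, active)
SELECT (random() * 100)::int,    -- random leaderboard
       (random() * 10000)::int,  -- random score
       random() < 0.9            -- ~90% of rows active
FROM generate_series(1, 2000000);

ANALYZE score;
```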
I think in the end you're probably going to see a couple of bitmap index scans, ANDing the results together, and then a bitmap heap scan.
Keep in mind that postgres stores statistics information about each column, but doesn't at this point store statistics about two columns together.
Preparing the query might actually hurt performance, because postgres treats a prepare as "plan this query, but I'm not going to tell you the values of the parameters". If you instead let the query replan every time, you will get a different plan for the leaderboards or score ranges that are more popular.
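To illustrate that trade-off (a sketch — the statement name and parameter types here are made up):

```sql
-- A prepared statement is planned once, without knowing the parameter
-- values, so the planner cannot favor a particularly common leaderboard
-- or score range.
PREPARE count_scores(int, int) AS
  SELECT count(*) FROM score
  WHERE leaderboardid = $1 AND score <= $2 AND active;

EXECUTE count_scores(35, 6841);
```

Running the plain SELECT directly instead lets the planner see the literal values each time and pick a value-specific plan.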
On Wed, May 27, 2009 at 8:09 AM, Zach Calvert <zachcalvert@hemerasoftware.com> wrote:
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
I'm running the inserts now via a JDBC call I have, which is then followed up by the query I'm showing and a few others. I have run tests on all of the others, and all of them run index scans and are very fast, 10 ms or less. This one started at 2 milliseconds when the table was empty and is up to 40 milliseconds with 40K inserts. It is degrading fast, and I can't imagine what will happen with 400K rows, let alone 400 million. It is getting slower at a fairly fast clip and I need it to remain fast. Does Postgres just not do count(*) with index scans? Is that my problem? I'm still running the exact same query.

Here are the indexes I have tried:

CREATE INDEX idx_score_score ON score USING btree (score);
CREATE INDEX idx_score_ldbscore ON score USING btree (leaderboardid, score);
CREATE INDEX idx_score_ldbactive ON score USING btree (leaderboardid, active);
CREATE INDEX idx_score_ldbactivescore ON score USING btree (leaderboardid, active, score);
CREATE INDEX idx_score_scoreactiveldb ON score USING btree (score, active, leaderboardid);

Yet still I run

explain analyze
select count(*)
from score
where leaderboardid=35 and active and score <= 6841

and get

"Aggregate  (cost=2641.29..2641.30 rows=1 width=0) (actual time=134.826..134.826 rows=1 loops=1)"
"  ->  Seq Scan on score  (cost=0.00..2536.44 rows=41938 width=0) (actual time=0.011..126.250 rows=40918 loops=1)"
"        Filter: (active AND (score <= 6841) AND (leaderboardid = 35))"
"Total runtime: 48.891 ms"

On Wed, May 27, 2009 at 11:06 AM, Nikolas Everett <nik9000@gmail.com> wrote:
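One way to check whether the planner could use any of those indexes at all (a diagnostic sketch, not advice given in the thread) is to discourage seq scans for the current session and compare the resulting plan and cost:

```sql
-- Diagnostic only: make seq scans look expensive for this session,
-- so EXPLAIN shows which index plan the planner would fall back to.
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT count(*) FROM score
WHERE leaderboardid = 35 AND active AND score <= 6841;

RESET enable_seqscan;
```

If the forced index plan is costed higher than the seq scan, the planner is choosing the seq scan deliberately, not failing to see the index.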
You have to run VACUUM ANALYZE after you've created an index, afaik. And no, count(*) still has to count the rows either way.
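In this thread's terms, that would be (sketch):

```sql
-- Refresh visibility info and planner statistics, then re-check the plan.
VACUUM ANALYZE score;

EXPLAIN ANALYZE
SELECT count(*) FROM score
WHERE leaderboardid = 35 AND active AND score <= 6841;
```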
Still getting a seq scan after doing vacuum analyze. Any other ideas?

2009/5/27 Grzegorz Jaśkiewicz <gryzman@gmail.com>:
Zach Calvert wrote:
> Still getting a seq scan after doing vacuum analyze. Any other ideas?

Try CLUSTERing the table on the (leaderboardid, active, score) index.

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
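As a concrete sketch of that suggestion, using the idx_score_ldbactivescore name from earlier in the thread (note that CLUSTER rewrites the table and takes an exclusive lock, so it is not free on a live system):

```sql
-- Rewrite the table in the physical order of the three-column index,
-- then refresh statistics. CLUSTER locks the table exclusively while
-- it runs, and the ordering is not maintained for future inserts.
CLUSTER score USING idx_score_ldbactivescore;
ANALYZE score;
```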