Discussion: Slow first query despite LIMIT and OFFSET clause
Hope someone can shed light on this. I have a well-performing query that
uses an index. Without OFFSET etc. it returns about 11,000 records. I only
need about 20 of these records at any given time, which is why my LIMIT
and OFFSET try to pull only those 20 or so records. My queries are fast in
general *except* the first time. The first time I pull my 20 records, it
takes quite a lot of time -- about 8-10 seconds -- which is unacceptable
in our case.

My guesses:

1. The first-query slowness may be because the query's data is being read
   into memory (cold cache)?
2. The query uses an ORDER BY ... DESC on another indexed (date) column,
   which may well be slowing down the sort the first time?

Appreciate any thoughts. My query is:

explain analyze SELECT
       testimonials.url
       ,testimonials.alias
       ,testimonials.aliasEntered
       ,testimonials.title
       ,testimonials.modify_date
       ,testimonials.id
       ,visitcount.visit_count
       ,visitcount.unique_count
       ,visitcount.modify_date
       ,coalesce( extract(epoch from now()) - extract(epoch
            from visitcount.modify_date), 0)
       ,(select count(id) from testimonials WHERE
            testimonials.user_id = 'superman' and testimonials.user_known = 1 and
            testimonials.status = 'Y' ) AS total
FROM testimonials
LEFT JOIN visitcount ON testimonials.id = visitcount.id
WHERE
       testimonials.user_id = 'superman'
       and testimonials.user_known = 1
       and testimonials.status = 'Y'
ORDER BY testimonials.modify_date desc
OFFSET 0 LIMIT 10
;

                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=224.68..224.71 rows=10 width=187) (actual time=453.429..453.539 rows=10 loops=1)
   InitPlan
     ->  Aggregate  (cost=63.52..63.53 rows=1 width=8) (actual time=89.268..89.271 rows=1 loops=1)
           ->  Index Scan using new_idx_userknown on testimonials  (cost=0.00..63.41 rows=42 width=8) (actual time=0.039..49.968 rows=10149 loops=1)
                 Index Cond: ((user_id)::text = 'superman'::text)
                 Filter: (status = 'Y'::bpchar)
   ->  Sort  (cost=161.16..161.26 rows=42 width=187) (actual time=453.420..453.464 rows=10 loops=1)
         Sort Key: testimonials.modify_date
         ->  Nested Loop Left Join  (cost=0.00..160.02 rows=42 width=187) (actual time=89.384..395.008 rows=10149 loops=1)
               ->  Index Scan using new_idx_userknown on testimonials  (cost=0.00..63.41 rows=42 width=171) (actual time=0.061..50.990 rows=10149 loops=1)
                     Index Cond: ((user_id)::text = 'superman'::text)
                     Filter: (status = 'Y'::bpchar)
               ->  Index Scan using visitcount_pkey1 on visitcount  (cost=0.00..2.28 rows=1 width=24) (actual time=0.007..0.010 rows=1 loops=10149)
                     Index Cond: (testimonials.id = visitcount.id)
 Total runtime: 461.682 ms
(15 rows)

It's using the following indexes on the "testimonials" table:

    "new_idx_modify_date" btree (modify_date) WITH (fillfactor=75)
    "new_idx_userknown" btree (user_id) WITH (fillfactor=70) WHERE user_known = 1

THANKS!
On Sun, Jan 25, 2009 at 8:41 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> My query is:
>
> [query and EXPLAIN ANALYZE output quoted above; snipped]

Have you analyzed these tables? The estimates and real row counts are
quite different.
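Scott's suggestion amounts to refreshing the planner's statistics by hand. A minimal sketch, assuming direct psql access to the database in question:

```sql
-- Refresh planner statistics for both tables involved in the join.
-- ANALYZE only samples the table and updates pg_statistic; it does not
-- rewrite data, so it is cheap to run at any time.
ANALYZE VERBOSE testimonials;
ANALYZE VERBOSE visitcount;
```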
On Mon, Jan 26, 2009 at 2:26 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Sun, Jan 25, 2009 at 8:41 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
>
>> [query and EXPLAIN ANALYZE output snipped]
>
> Have you analyzed these tables? The estimates and real row counts are
> quite different.

Hi Scott. Yes, there is an autovacuum on both the tables. Should I
additionally do a manual vacuum too?
On Sun, Jan 25, 2009 at 11:58 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> Hi Scott. Yes, there is an autovacuum on both the tables. Should I
> additionally do a manual vacuum too?

Nah, just an analyze. What version of pgsql is this, btw?
On Mon, Jan 26, 2009 at 3:04 PM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Sun, Jan 25, 2009 at 11:58 PM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
>> Hi Scott. Yes, there is an autovacuum on both the tables. Should I
>> additionally do a manual vacuum too?
>
> Nah, just an analyze. What version of pgsql is this, btw?

Actually both vacuum and analyze happen at regular intervals. Are they
not doing their job?

PG 8.2.9
On Mon, Jan 26, 2009 at 12:06 AM, Phoenix Kiula <phoenix.kiula@gmail.com> wrote:
> Actually both vacuum and analyze happen at regular intervals. Are they
> not doing their job?

Hard to say. You could try increasing your stats target on the fields
where the approximation is way off.
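Raising the per-column statistics target could look like the following sketch. The value 1000 is an arbitrary illustration (the default on 8.2 is 10, the maximum 1000); the columns chosen are the ones the plan filters on:

```sql
-- Collect a larger sample for the columns whose row estimates are off,
-- then re-analyze so the new target takes effect.
ALTER TABLE testimonials ALTER COLUMN user_id SET STATISTICS 1000;
ALTER TABLE testimonials ALTER COLUMN status  SET STATISTICS 1000;
ANALYZE testimonials;
```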
On Jan 26, 2009, at 4:41 AM, Phoenix Kiula wrote:
> Appreciate any thoughts.
>
> [query and EXPLAIN ANALYZE output snipped]

Does that query plan look any better without the select count(id) from
testimonials?

If so you may be better off keeping track of those counts in a separate
table updated by triggers on the testimonials table. Whether that really
helps depends on how variable your selectors are to determine those
counts. If those counts are generally very low the benefit will probably
be minimal.

Alban Hertroys

--
If you can't see the forest for the trees, cut the trees and you'll see there is no forest.
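The trigger-maintained counts Alban describes could be sketched roughly as below. The summary table, function, and trigger names are all hypothetical (the real schema isn't shown in the thread), and UPDATEs that change user_known or status would need handling too:

```sql
-- Hypothetical summary table: one row per user with a precomputed count
-- of qualifying testimonials, so the hot query never has to count.
CREATE TABLE testimonial_counts (
    user_id  varchar PRIMARY KEY,
    total    bigint NOT NULL DEFAULT 0
);

-- Keep the count current as qualifying rows come and go.
CREATE OR REPLACE FUNCTION maintain_testimonial_count() RETURNS trigger AS $$
BEGIN
    IF TG_OP = 'INSERT' AND NEW.user_known = 1 AND NEW.status = 'Y' THEN
        UPDATE testimonial_counts SET total = total + 1
         WHERE user_id = NEW.user_id;
    ELSIF TG_OP = 'DELETE' AND OLD.user_known = 1 AND OLD.status = 'Y' THEN
        UPDATE testimonial_counts SET total = total - 1
         WHERE user_id = OLD.user_id;
    END IF;
    RETURN NULL;  -- AFTER trigger; return value is ignored
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER testimonial_count_trg
    AFTER INSERT OR DELETE ON testimonials
    FOR EACH ROW EXECUTE PROCEDURE maintain_testimonial_count();
```

The query's `(select count(id) ...) AS total` subquery would then become a single-row lookup against testimonial_counts.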
On Wed, Jan 28, 2009 at 2:37 AM, Alban Hertroys
<dalroi@solfertje.student.utwente.nl> wrote:
>
> Does that query plan look any better without the select count(id) from
> testimonials?
>
> If so you may be better off keeping track of those counts in a separate
> table updated by triggers on the testimonials table. Whether that really
> helps depends on how variable your selectors are to determine those
> counts. If those counts are generally very low the benefit will probably
> be minimal.

Thanks Alban. We have now made all the triggers and such. That part is
working. I suppose not having the count(id) is helping just with a few
seconds, but the query is still taking about 15 seconds in some cases.

Here are the query and its exec plan again fyi. Any other ideas for
tweaking?

[original query and EXPLAIN ANALYZE output repeated verbatim; snipped]
On Jan 28, 2009, at 1:38 PM, Phoenix Kiula wrote:
> Thanks Alban. We have now made all the triggers and such. That part is
> working. I suppose not having the count(id) is helping just with a few
> seconds, but the query is still taking about 15 seconds in some cases.
>
> Here are the query and its exec plan again fyi. Any other ideas for
> tweaking?

Ah I see, that's the original query and its plan again, not the one after
implementing those triggers! You had me scratching my head for a bit
there, wondering why the count() subquery was still there.

A few things in this query appear to take relatively much time:

- The index scans on new_idx_userknown. What's worrying there is that the
  planner expects only a few rows (42) while in actuality there are quite
  many (10149). This scan is performed twice, too! It seems that the
  statistics behind that index are off. Fixing that may mean changing the
  statistics targets on the columns involved or increasing the frequency
  with which autovacuum visits them.

- The nested loop left join is expensive. That's probably also due to the
  incorrect assumptions the planner makes about the index scans mentioned
  above. It expects to have to loop 42 times, but ends up doing so 10149
  times instead! I believe loops aren't particularly efficient; they'll
  only beat other methods if there are few rows to loop through. The loop
  is taking 395 - 89 = 306 ms for 10149 rows, while the planner expected
  it to take 306 * (42/10149) = 1.3 ms. Quite a difference!

You probably need to do something about new_idx_userknown. A partial
index (as suggested elsewhere) may help make it smaller (easier to fit in
RAM, fewer branches required to find a node), but the bad statistics are
likely to be the real problem here. Without knowing anything about that
particular index and the tables it's indexing, it's hard to tell how to
improve it.
> [query and EXPLAIN ANALYZE output quoted above; snipped]
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

Alban Hertroys

--
If you can't see the forest for the trees, cut the trees and you'll see there is no forest.
On Thu, Jan 29, 2009 at 2:25 AM, Alban Hertroys
<dalroi@solfertje.student.utwente.nl> wrote:
> Ah I see, that's the original query and its plan again, not the one
> after implementing those triggers! You had me scratching my head for a
> bit there, wondering why the count() subquery was still there.

Yes that was just for info. Here is the new query without the count()
in there:

explain analyze SELECT
       testimonials.url
       ,testimonials.alias
       ,testimonials.aliasEntered
       ,testimonials.title
       ,testimonials.modify_date
       ,testimonials.id
       ,visitcount.visit_count
       ,visitcount.unique_count
       ,visitcount.modify_date
       ,coalesce( extract(epoch from now()) - extract(epoch from
            visitcount.modify_date), 0)
FROM testimonials
LEFT OUTER JOIN visitcount USING (id)
WHERE
       testimonials.user_id = 'superman'
       and testimonials.user_known = 1
       and testimonials.status = 'Y'
ORDER BY testimonials.modify_date desc
OFFSET 0 LIMIT 10
;

                                                              QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=61.42..61.44 rows=10 width=162) (actual time=105.400..105.499 rows=10 loops=1)
   ->  Sort  (cost=61.42..61.46 rows=16 width=162) (actual time=105.392..105.425 rows=10 loops=1)
         Sort Key: testimonials.modify_date
         ->  Nested Loop Left Join  (cost=0.00..61.10 rows=16 width=162) (actual time=0.092..94.516 rows=2027 loops=1)
               ->  Index Scan using new_idx_userknown on testimonials  (cost=0.00..24.29 rows=16 width=146) (actual time=0.058..10.983 rows=2027 loops=1)
                     Index Cond: ((user_id)::text = 'superman'::text)
                     Filter: (status = 'Y'::bpchar)
               ->  Index Scan using visitcount_pkey1 on visitcount  (cost=0.00..2.28 rows=1 width=24) (actual time=0.024..0.026 rows=1 loops=2027)
                     Index Cond: (testimonials.id = visitcount.id)
 Total runtime: 105.652 ms
(10 rows)

Note that I have an index on user_id, but because this is a website,
there are several user_ids where we only have the IP. The above query is
only ever needed for registered users, so for just the registered users
we created another partial index:

    "new_idx_userknown" btree (user_id) WHERE user_known = 1

Of course for unregistered users we use user_known = 0, so they are
excluded from this index. Is this not a useful partial index? I think in
this SQL, the user_id is always "superman" and the user_known always 1,
which is why the guesstimate from the planner may be off?

Love to hear thoughts. THANKS!
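For reference, the partial index quoted above corresponds to the first DDL statement below. The second index is a speculative variant, not something from the thread: adding the sort key as a trailing column can let the planner walk one user's entries already in modify_date order, so a LIMIT 10 can stop after ten index entries instead of fetching and sorting all ~10,000:

```sql
-- The existing partial index, written out as DDL:
CREATE INDEX new_idx_userknown ON testimonials (user_id)
    WHERE user_known = 1;

-- Speculative alternative: (user_id, modify_date) supports both the
-- equality filter and the ORDER BY modify_date DESC ... LIMIT pattern.
CREATE INDEX idx_userknown_moddate ON testimonials (user_id, modify_date)
    WHERE user_known = 1;
```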
On Jan 29, 2009, at 1:35 AM, Phoenix Kiula wrote:
> Yes that was just for info. Here is the new query without the count()
> in there:
>
> [query and EXPLAIN ANALYZE output snipped]
>
> Note that I have an index on user_id, but because this is a website,
> there are several user_ids where we only have the IP. The above query
> is only ever needed for registered users, so for just the registered
> users we created another partial index:
>
>     "new_idx_userknown" btree (user_id) WHERE user_known = 1
>
> Of course for unregistered users we use user_known = 0, so they are
> excluded from this index. Is this not a useful partial index? I think
> in this SQL, the user_id is always "superman" and the user_known
> always 1, which is why the guesstimate from the planner may be off?
>
> Love to hear thoughts.

Well, that seems to have got rid of the somewhat expensive index scans on
new_idx_userknown as well (the duplicate entry for the scan being due to
the subquery, of course). What's remaining is the left join.

If I understand correctly you have a PK on visitcount.id and that table
only contains records for people who have a visitcount > 0? That table
gets updated a lot, I'd think? The query plan still shows a bad estimate
on that join; it has improved, but not enough. Does the plan look better
right after you ANALYSE visitcount?

I suspect you either need to autovacuum visitcount more frequently or you
need to increase the statistics size on visitcount.id. You're updating
that table a lot, I think, which creates one new dead row for every
update. Letting vacuum mark the dead rows as reusable more frequently
should also help keep that table and its indexes cleaner, although the
records the indexes point to will be all over the place.

I'm wondering... In highly updated tables it's probably more efficient to
leave the dead rows alone (just marking them dead) and only append the
updated ones at the end of the table? The dead rows will accumulate at
the start of the table while the new ones go to the end. After a while a
large section of the start of the table could just be removed, as it'd
only contain dead rows... This may already be in place of course; I don't
have time now to look into the design specifics, and it seems kind of an
obvious thing to do!

Regards,
Alban Hertroys

--
If you can't see the forest for the trees, cut the trees and you'll see there is no forest.
Phoenix Kiula <phoenix.kiula@gmail.com> writes:
>                     Index Cond: ((user_id)::text = 'superman'::text)
>                     Filter: (status = 'Y'::bpchar)

> Of course for unregistered users we use user_known = 0, so they are
> excluded from this index. Is this not a useful partial index? I think
> in this SQL, the user_id is always "superman" and the user_known
> always 1, which is why the guesstimate from the planner may be off?

Well, the histograms are for each column separately, so the planner will
take the selectivity estimates for user_id = 'superman' and status = 'Y'
and multiply them. If the "status" of 'superman' records is very
different from the status of records as a whole then this will give poor
results.

If that's not the case then raising the statistics target for those two
columns might help. And of course if the table hasn't been analyzed
recently then analyzing it more often is always good.

There isn't really a good solution for cross-column stats. You could
perhaps create a functional index (could still be partial too) on an
expression like

    CASE WHEN status = 'Y' THEN user_id END

which will make Postgres build stats for the result of that expression
specifically. Then if you use that expression exactly as-is in the query
the planner should use those statistics. I think. I haven't tried this...
Tell us how it goes :)

I wonder if we should look at building "partial" histograms for the
columns in partial indexes, effectively equivalent to this... hm...

--
Gregory Stark
EnterpriseDB          http://www.enterprisedb.com
Get trained by Bruce Momjian - ask me about EnterpriseDB's PostgreSQL training!
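Gregory's expression-index idea could be written out roughly as below. He flags it as untried, so treat this strictly as a sketch; the index name is made up, and the query must repeat the expression verbatim for the expression's statistics to be used:

```sql
-- Functional (and still partial) index on the CASE expression;
-- ANALYZE then gathers statistics on the expression's results,
-- capturing the user_id/status correlation in one histogram.
CREATE INDEX idx_status_userid ON testimonials
    ((CASE WHEN status = 'Y' THEN user_id END))
    WHERE user_known = 1;

-- The query would then filter on the identical expression, e.g.:
--   WHERE (CASE WHEN status = 'Y' THEN user_id END) = 'superman'
--     AND user_known = 1
```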