Discussion: Slow Query - Bad Row Estimate
Unfortunately I have not received a response on this question. Is more information needed? Does anyone have any ideas why the estimates may be bad? Or what I might be able to do to speed this up?
thanks
From: Ozer, Pam
Sent: Tuesday, October 26, 2010 4:27 PM
To: 'pgsql-performance@postgresql.org'
Subject: Slow Query- Bad Row Estimate
I have the following query:
select distinct Region.RegionShort as RegionShort
,County.County as County
from Region
join PostalCodeRegionCountyCity on (PostalCodeRegionCountyCity.RegionId=Region.RegionId)
join DealerGroupGeoCache on (DealerGroupGeoCache.RegionId=PostalCodeRegionCountyCity.RegionId)
and (DealerGroupGeoCache.CountyId=PostalCodeRegionCountyCity.CountyId)
and (DealerGroupGeoCache.CityId=PostalCodeRegionCountyCity.CityId)
join County on (PostalCodeRegionCountyCity.CountyId=County.CountyId)
where (DealerGroupGeoCache.DealerGroupId=13) and (PostalCodeRegionCountyCity.RegionId=5)
With the following Explain:
"HashAggregate (cost=6743.96..6747.36 rows=34 width=11) (actual time=854.407..854.425 rows=57 loops=1)"
" -> Nested Loop (cost=0.00..6743.28 rows=34 width=11) (actual time=0.062..762.698 rows=163491 loops=1)"
" -> Nested Loop (cost=0.00..6679.19 rows=34 width=11) (actual time=0.053..260.001 rows=163491 loops=1)"
" -> Index Scan using region_i00 on region (cost=0.00..3.36 rows=1 width=5) (actual time=0.009..0.011 rows=1 loops=1)"
" Index Cond: (regionid = 5)"
" -> Merge Join (cost=0.00..6672.43 rows=34 width=10) (actual time=0.040..189.654 rows=163491 loops=1)"
" Merge Cond: ((postalcoderegioncountycity.countyid = dealergroupgeocache.countyid) AND (postalcoderegioncountycity.cityid = dealergroupgeocache.cityid))"
" -> Index Scan using postalcoderegioncountycity_i06 on postalcoderegioncountycity (cost=0.00..716.05 rows=2616 width=10) (actual time=0.018..1.591 rows=2615 loops=1)"
" Index Cond: (regionid = 5)"
" -> Index Scan using dealergroupgeocache_i01 on dealergroupgeocache (cost=0.00..5719.56 rows=9055 width=10) (actual time=0.015..87.689 rows=163491 loops=1)"
" Index Cond: ((dealergroupgeocache.dealergroupid = 13) AND (dealergroupgeocache.regionid = 5))"
" -> Index Scan using county_i00 on county (cost=0.00..1.77 rows=1 width=12) (actual time=0.002..0.002 rows=1 loops=163491)"
" Index Cond: (county.countyid = dealergroupgeocache.countyid)"
"Total runtime: 854.513 ms"
The statistics were recently updated, and that did not change the bad estimates.
The DealerGroupGeoCache table has 765,392 rows, and the query returns 57 rows.
I am not at all involved in the way the server is set up, so changing the settings is not very likely unless it would make a huge difference.
Is there any way for me to speed up this query without changing the settings?
If not, what changes do you think would be needed?
We are currently running Postgres 8.4 with the following settings:
shared_buffers = 500MB # min 128kB
effective_cache_size = 1000MB
max_connections = 100
temp_buffers = 100MB
work_mem = 100MB
maintenance_work_mem = 500MB
max_files_per_process = 10000
seq_page_cost = 1.0
random_page_cost = 1.1
cpu_tuple_cost = 0.1
cpu_index_tuple_cost = 0.05
cpu_operator_cost = 0.01
default_statistics_target = 1000
autovacuum_max_workers = 1
#log_min_messages = DEBUG1
#log_min_duration_statement = 1000
#log_statement = all
#log_temp_files = 128
#log_lock_waits = on
#log_line_prefix = '%m %u %d %h %p %i %c %l %s'
#log_duration = on
#debug_print_plan = on
Any help is appreciated,
Pam
On 10/29/10 1:54 PM, Ozer, Pam wrote:

>                     ->  Index Scan using dealergroupgeocache_i01 on
> dealergroupgeocache  (cost=0.00..5719.56 rows=9055 width=10) (actual
> time=0.015..87.689 rows=163491 loops=1)

This appears to be your problem here.

a) When was dealergroupgeocache last ANALYZEd?

b) Try increasing the stats target on dealergroupid and regionid to, say, 500, and re-analyzing.

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
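Josh's suggestion corresponds to a per-column statistics override followed by a re-analyze. A minimal sketch (table and column names are from the thread; 500 is the value Josh suggests):

```sql
-- Raise the per-column statistics target so ANALYZE samples more values
-- for these columns, improving the planner's selectivity estimates.
ALTER TABLE dealergroupgeocache ALTER COLUMN dealergroupid SET STATISTICS 500;
ALTER TABLE dealergroupgeocache ALTER COLUMN regionid      SET STATISTICS 500;

-- Rebuild the statistics with the new target.
ANALYZE dealergroupgeocache;
```

A per-column target set this way takes precedence over default_statistics_target for that column only, so it is a cheap, targeted fix compared with raising the global default.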
"Ozer, Pam" <pozer@automotive.com> writes: > Unfortunately I have not received a response on this question. Is more > information needed? Does anyone have any ideas why the estimates may be > bad? Or what I might be able to do to speed this up? The most likely explanation for the bad rowcount estimates is that there is correlation between the regionid/countyid/cityid columns, only the planner doesn't know it. Can you reformulate that data representation at all, or at least avoid depending on it as a join key? regards, tom lane
"Ozer, Pam" <pozer@automotive.com> wrote: > Is more information needed? Table layouts of the tables involved (including indexes) would be interesting. A description of the machine would be useful, including OS, CPUs, RAM, and disk system. I know you said you might have trouble changing the config, but some of these seem problematic. > shared_buffers = 500MB > effective_cache_size = 1000MB > max_connections = 100 > temp_buffers = 100MB So you will allow up to 10GB to be tied up in space reserved for temporary tables, but only expect to cache 1GB of your database? That hardly seems optimal. > work_mem = 100MB That could be another 10GB or more in work memory at any moment, if each connection was running a query which needed one work_mem allocation. > seq_page_cost = 1.0 > random_page_cost = 1.1 > cpu_tuple_cost = 0.1 > cpu_index_tuple_cost = 0.05 > cpu_operator_cost = 0.01 Those settings are OK if the active portion of the database is fully cached. Is it? > default_statistics_target = 1000 If plan times get long with complex queries, you might want to back that off; otherwise, OK. > autovacuum_max_workers = 1 That seems like a bad idea. Allowing multiple workers helps reduce bloat and improve statistics. If autovacuum is affecting performance, you would be better off tweaking the autovacuum cost limits. -Kevin
I am not sure what you mean by reformulate the data representation.  Do you mean that I have to join on all three columns?

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, October 29, 2010 2:18 PM
To: Ozer, Pam
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow Query- Bad Row Estimate
I had just analyzed the dealergroupgeocache table.  Wow.  Thank you.  That did the trick.  Can you give me an explanation of how the default stats work?  I don't think I completely understand what it means when you set it to 500 instead of 1000.

thanks

-----Original Message-----
From: pgsql-performance-owner@postgresql.org On Behalf Of Josh Berkus
Sent: Friday, October 29, 2010 2:10 PM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow Query- Bad Row Estimate
"Ozer, Pam" <pozer@automotive.com> writes: > I am not sure what you mean by reformulate the data representation. Do > you mean do I have to join on all three columns? No, I was wondering if you could change things so that you join on just one column, instead of two that each tell part of the truth. BTW, did you check your current statistics target? If it's small then raising it might possibly fix the problem by itself. regards, tom lane
On 10/29/10 2:47 PM, Ozer, Pam wrote:
> I had just analyzed the dealergroupgeocache table.  Wow.  Thank you.
> That did the trick.  Can you give me an explanation of how the default
> stats work?  I don't think I completely understand what it means when
> you set it to 500 instead of 1000.

You're already at 1000?

--
Josh Berkus
PostgreSQL Experts Inc.
http://www.pgexperts.com
Yes.  The default statistics target was at 1000.  So that would be what the column was using, correct?

-----Original Message-----
From: Josh Berkus [mailto:josh@agliodbs.com]
Sent: Friday, October 29, 2010 2:55 PM
To: Ozer, Pam
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow Query- Bad Row Estimate
"Ozer, Pam" <pozer@automotive.com> writes: > Yes. The default statistics target was at 1000. So that would be what the column was using correct? But you evidently didn't have stats. Perhaps you have autovacuum turned off? What PG version is this anyway? regards, tom lane
It's 8.4.  On the column, stats_target = -1 before I changed it.  Autovacuum is set to on.  I actually did a full analyze of the database and then ran it again.  So what am I missing?

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Friday, October 29, 2010 3:03 PM
To: Ozer, Pam
Cc: Josh Berkus; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Slow Query- Bad Row Estimate
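The -1 Pam saw means the column had no per-column override and fell back to default_statistics_target. One way to inspect this is via the system catalogs (a sketch; pg_attribute.attstattarget is present in 8.4):

```sql
-- attstattarget = -1 means the column uses default_statistics_target;
-- any other value is a per-column override set via ALTER TABLE.
SELECT attname, attstattarget
FROM pg_attribute
WHERE attrelid = 'dealergroupgeocache'::regclass
  AND attnum > 0
  AND NOT attisdropped;
```

So in this thread the fix most likely came from the fresh ANALYZE picking up the per-column targets Josh suggested, not from the global default, which was already high.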