Обсуждение: Wrong plan or what ?
Hi all,
I'm running Postgres7.3.3 and I'm performing this simple select:
select *
from user_logs ul,
user_data ud,
class_default cd
where
ul.id_user = ud.id_user and
ud.id_class = cd.id_class and
cd.id_provider = 39;
these are the number of rows for each table:
user_logs: 1258955
class_default: 31 ( only one with id_provider = 39 )
user_data: 10274;
this is the explain analyze for that query:
QUERY PLAN
Hash Join (cost=265.64..32000.76 rows=40612 width=263) (actual
time=11074.21..11134.28 rows=10 loops=1)
Hash Cond: ("outer".id_user = "inner".id_user)
-> Seq Scan on user_logs ul (cost=0.00..24932.65 rows=1258965 width=48)
(actual time=0.02..8530.21 rows=1258966 loops=1)
-> Hash (cost=264.81..264.81 rows=331 width=215) (actual
time=30.22..30.22 rows=0 loops=1)
-> Nested Loop (cost=0.00..264.81 rows=331 width=215) (actual
time=29.95..30.20 rows=6 loops=1)
-> Seq Scan on class_default cd (cost=0.00..1.39 rows=1
width=55) (actual time=0.08..0.10 rows=1 loops=1)
Filter: (id_provider = 39)
-> Index Scan using idx_user_data_class on user_data ud
(cost=0.00..258.49 rows=395 width=160) (actual time=29.82..29.96 rows=6
loops=1)
Index Cond: (ud.id_class = "outer".id_class)
Total runtime: 11135.65 msec
(10 rows)
I'm able to performe that select with these 3 steps:
SELECT id_class from class_default where id_provider = 39;
id_class
----------
48
(1 row)
SELECT id_user from user_data where id_class in ( 48 );
id_user
---------
10943
10942
10934
10927
10910
10909
(6 rows)
SELECT * from user_logs where id_user in (
10943, 10942, 10934, 10927, 10910, 10909
);
[SNIPPED]
and the time ammount is a couple of milliseconds.
Why the planner or the executor ( I don't know ) do not follow
the same strategy ?
Thank you
Gaetano Mendola
Gaetano, > SELECT * from user_logs where id_user in ( > 10943, 10942, 10934, 10927, 10910, 10909 > ); > [SNIPPED] > Why the planner or the executor ( I don't know ) do not follow > the same strategy ? It is, actually, according to the query plan. Can you post the EXPLAIN ANALYZE for the above query? -- -Josh Berkus Aglio Database Solutions San Francisco
"Josh Berkus" <josh@agliodbs.com> > Gaetano, > > > SELECT * from user_logs where id_user in ( > > 10943, 10942, 10934, 10927, 10910, 10909 > > ); > > [SNIPPED] > > > Why the planner or the executor ( I don't know ) do not follow > > the same strategy ? > > It is, actually, according to the query plan. > > Can you post the EXPLAIN ANALYZE for the above query? Index Scan using idx_user_user_logs, idx_user_user_logs, idx_user_user_logs, idx_user_user_logs, idx_user_user_logs, idx_user_user_logs on user_logs (cost=0.00..5454.21 rows=2498 width=48) (actual time=0.09..0.28 rows=10 loops=1) Index Cond: ((id_user = 10943) OR (id_user = 10942) OR (id_user = 10934) OR (id_user = 10927) OR (id_user = 10910) OR (id_user = 10909)) Total runtime: 0.41 msec (3 rows) Thank you Gaetano PS: if I execute the query I obtain 10 rows instead of 3 that say the explain analyze.
Forget my PS to last message.
Gaetano,
> QUERY PLAN
> Hash Join (cost=265.64..32000.76 rows=40612 width=263) (actual
> time=11074.21..11134.28 rows=10 loops=1)
> Hash Cond: ("outer".id_user = "inner".id_user)
> -> Seq Scan on user_logs ul (cost=0.00..24932.65 rows=1258965 width=48)
> (actual time=0.02..8530.21 rows=1258966 loops=1)
OK, here's your problem
The planner thinks that you're going to get 40162 rows out of the final join,
not 10. If the row estimate was correct, then the Seq Scan would be a
reasonable plan. But it's not. Here's some steps you can take to clear
things up for the planner:
1) Make sure you've VACUUM ANALYZED
2) Adjust the following postgresql.conf statistics:
a) effective_cache_size: increase to 70% of available (not used by other
processes) RAM.
b) random_page_cost: decrease, maybe to 2.
c) default_statistics_target: try increasing to 100
(warning: this will significantly increase the time required to do ANALYZE)
Then test again!
--
-Josh Berkus
Aglio Database Solutions
San Francisco
"Josh Berkus" <josh@agliodbs.com>
> Gaetano,
>
> > QUERY PLAN
> > Hash Join (cost=265.64..32000.76 rows=40612 width=263) (actual
> > time=11074.21..11134.28 rows=10 loops=1)
> > Hash Cond: ("outer".id_user = "inner".id_user)
> > -> Seq Scan on user_logs ul (cost=0.00..24932.65 rows=1258965
width=48)
> > (actual time=0.02..8530.21 rows=1258966 loops=1)
>
> OK, here's your problem
>
> The planner thinks that you're going to get 40162 rows out of the final
join,
> not 10. If the row estimate was correct, then the Seq Scan would be a
> reasonable plan. But it's not. Here's some steps you can take to clear
> things up for the planner:
>
> 1) Make sure you've VACUUM ANALYZED
> 2) Adjust the following postgresql.conf statistics:
> a) effective_cache_size: increase to 70% of available (not used by other
> processes) RAM.
> b) random_page_cost: decrease, maybe to 2.
> c) default_statistics_target: try increasing to 100
> (warning: this will significantly increase the time required to do
ANALYZE)
>
> Then test again!
No improvement at all,
I pushed default_statistics_target to 1000
but the rows expected are still 40612 :-(
Of course I restarted the postmaster and I vacuumed analyze the DB
Thank you
Gaetano