Обсуждение: Wrong plan or what ?
Hi all,
I'm running Postgres7.3.3 and I'm performing this simple select:
select *
from user_logs ul,
user_data ud,
class_default cd
where
ul.id_user = ud.id_user and
ud.id_class = cd.id_class and
cd.id_provider = 39;
these are the number of rows for each table:
user_logs: 1258955
class_default: 31 ( only one with id_provider = 39 )
user_data: 10274;
this is the explain analyze for that query:
QUERY PLAN
Hash Join (cost=265.64..32000.76 rows=40612 width=263) (actual
time=11074.21..11134.28 rows=10 loops=1)
Hash Cond: ("outer".id_user = "inner".id_user)
-> Seq Scan on user_logs ul (cost=0.00..24932.65 rows=1258965 width=48)
(actual time=0.02..8530.21 rows=1258966 loops=1)
-> Hash (cost=264.81..264.81 rows=331 width=215) (actual
time=30.22..30.22 rows=0 loops=1)
-> Nested Loop (cost=0.00..264.81 rows=331 width=215) (actual
time=29.95..30.20 rows=6 loops=1)
-> Seq Scan on class_default cd (cost=0.00..1.39 rows=1
width=55) (actual time=0.08..0.10 rows=1 loops=1)
Filter: (id_provider = 39)
-> Index Scan using idx_user_data_class on user_data ud
(cost=0.00..258.49 rows=395 width=160) (actual time=29.82..29.96 rows=6
loops=1)
Index Cond: (ud.id_class = "outer".id_class)
Total runtime: 11135.65 msec
(10 rows)
I'm able to performe that select with these 3 steps:
SELECT id_class from class_default where id_provider = 39;
id_class
----------
48
(1 row)
SELECT id_user from user_data where id_class in ( 48 );
id_user
---------
10943
10942
10934
10927
10910
10909
(6 rows)
SELECT * from user_logs where id_user in (
10943, 10942, 10934, 10927, 10910, 10909
);
[SNIPPED]
and the time ammount is a couple of milliseconds.
Why the planner or the executor ( I don't know ) do not follow
the same strategy ?
Thank you
Gaetano Mendola
"Josh Berkus" <josh@agliodbs.com> > Gaetano, > > > SELECT * from user_logs where id_user in ( > > 10943, 10942, 10934, 10927, 10910, 10909 > > ); > > [SNIPPED] > > > Why the planner or the executor ( I don't know ) do not follow > > the same strategy ? > > It is, actually, according to the query plan. > > Can you post the EXPLAIN ANALYZE for the above query? Index Scan using idx_user_user_logs, idx_user_user_logs, idx_user_user_logs, idx_user_user_logs, idx_user_user_logs, idx_user_user_logs on user_logs (cost=0.00..5454.21 rows=2498 width=48) (actual time=0.09..0.28 rows=10 loops=1) Index Cond: ((id_user = 10943) OR (id_user = 10942) OR (id_user = 10934) OR (id_user = 10927) OR (id_user = 10910) OR (id_user = 10909)) Total runtime: 0.41 msec (3 rows) Thank you Gaetano PS: if I execute the query I obtain 10 rows instead of 3 that say the explain analyze.
Forget my PS to last message.
"Josh Berkus" <josh@agliodbs.com>
> Gaetano,
>
> > QUERY PLAN
> > Hash Join (cost=265.64..32000.76 rows=40612 width=263) (actual
> > time=11074.21..11134.28 rows=10 loops=1)
> > Hash Cond: ("outer".id_user = "inner".id_user)
> > -> Seq Scan on user_logs ul (cost=0.00..24932.65 rows=1258965
width=48)
> > (actual time=0.02..8530.21 rows=1258966 loops=1)
>
> OK, here's your problem
>
> The planner thinks that you're going to get 40162 rows out of the final
join,
> not 10. If the row estimate was correct, then the Seq Scan would be a
> reasonable plan. But it's not. Here's some steps you can take to clear
> things up for the planner:
>
> 1) Make sure you've VACUUM ANALYZED
> 2) Adjust the following postgresql.conf statistics:
> a) effective_cache_size: increase to 70% of available (not used by other
> processes) RAM.
> b) random_page_cost: decrease, maybe to 2.
> c) default_statistics_target: try increasing to 100
> (warning: this will significantly increase the time required to do
ANALYZE)
>
> Then test again!
No improvement at all,
I pushed default_statistics_target to 1000
but the rows expected are still 40612 :-(
Of course I restarted the postmaster and I vacuumed analyze the DB
Thank you
Gaetano
Gaetano, > SELECT * from user_logs where id_user in ( > 10943, 10942, 10934, 10927, 10910, 10909 > ); > [SNIPPED] > Why the planner or the executor ( I don't know ) do not follow > the same strategy ? It is, actually, according to the query plan. Can you post the EXPLAIN ANALYZE for the above query? -- -Josh Berkus Aglio Database Solutions San Francisco
In response to "Mendola Gaetano":
> I'm running Postgres7.3.3 and I'm performing this simple select:
Looking at your fast three step plan
> SELECT id_class from class_default where id_provider = 39;
> SELECT id_user from user_data where id_class in ( 48 );
> SELECT * from user_logs where id_user in (
> 10943, 10942, 10934, 10927, 10910, 10909 );
I'ld stem for reordering the from and where clauses alike:
select *
from
class_default cd,
user_data ud,
user_logs ul
where
cd.id_provider = 39 and
ud.id_class = cd.id_class and
ul.id_user = ud.id_user;
Personally I dislike implied joins and rather go for _about_ this:
select *
from
( class_default cd
LEFT JOIN user_data ud ON ud.id_class = cd.id_class )
LEFT JOIN user_logs ul ON ul.id_user = ud.id_user,
where
cd.id_provider = 39;
Good luck,
HansH
""HansH"" <hartenhans@op.het.net>
> In response to "Mendola Gaetano":
> > I'm running Postgres7.3.3 and I'm performing this simple select:
>
> Looking at your fast three step plan
> > SELECT id_class from class_default where id_provider = 39;
> > SELECT id_user from user_data where id_class in ( 48 );
> > SELECT * from user_logs where id_user in (
> > 10943, 10942, 10934, 10927, 10910, 10909 );
> I'ld stem for reordering the from and where clauses alike:
> select *
> from
> class_default cd,
> user_data ud,
> user_logs ul
> where
> cd.id_provider = 39 and
> ud.id_class = cd.id_class and
> ul.id_user = ud.id_user;
still wrong:
Hash Join (cost=267.10..32994.34 rows=41881 width=264) (actual
time=6620.17..6847.20 rows=94 loops=1)
Hash Cond: ("outer".id_user = "inner".id_user)
-> Seq Scan on user_logs ul (cost=0.00..25712.15 rows=1298315 width=48)
(actual time=0.01..5381.69 rows=1298351 loops=1)
-> Hash (cost=266.25..266.25 rows=339 width=216) (actual
time=0.89..0.89 rows=0 loops=1)
-> Nested Loop (cost=0.00..266.25 rows=339 width=216) (actual
time=0.16..0.83 rows=21 loops=1)
-> Seq Scan on class_default cd (cost=0.00..1.39 rows=1
width=55) (actual time=0.08..0.09 rows=1 loops=1)
Filter: (id_provider = 39)
-> Index Scan using idx_user_data_class on user_data ud
(cost=0.00..260.00 rows=389 width=161) (actual time=0.06..0.40 rows=21
loops=1)
Index Cond: (ud.id_class = "outer".id_class)
Total runtime: 6847.60 msec
(10 rows)
the returned are 94.
> Personally I dislike implied joins and rather go for _about_ this:
> select *
> from
> ( class_default cd
> LEFT JOIN user_data ud ON ud.id_class = cd.id_class )
> LEFT JOIN user_logs ul ON ul.id_user = ud.id_user,
> where
> cd.id_provider = 39;
worst:
Merge Join (cost=280.48..55717.14 rows=41881 width=264) (actual
time=18113.64..18182.94 rows=105 loops=1)
Merge Cond: ("outer".id_user = "inner".id_user)
-> Index Scan using idx_user_user_logs on user_logs ul
(cost=0.00..51665.66 rows=1298315 width=48) (actual time=10.78..15459.37
rows=1298354 loops=1)
-> Sort (cost=280.48..281.33 rows=339 width=216) (actual
time=1.11..1.20 rows=105 loops=1)
Sort Key: ud.id_user
-> Nested Loop (cost=0.00..266.25 rows=339 width=216) (actual
time=0.14..0.82 rows=21 loops=1)
-> Seq Scan on class_default cd (cost=0.00..1.39 rows=1
width=55) (actual time=0.07..0.07 rows=1 loops=1)
Filter: (id_provider = 39)
-> Index Scan using idx_user_data_class on user_data ud
(cost=0.00..260.00 rows=389 width=161) (actual time=0.05..0.39 rows=21
loops=1)
Index Cond: (ud.id_class = "outer".id_class)
Total runtime: 18185.61 msec
:-(
thank you anyway.
Gaetano