Re: Plan for relatively simple query seems to be very inefficient
| From | Dave Held |
|---|---|
| Subject | Re: Plan for relatively simple query seems to be very inefficient |
| Date | |
| Msg-id | 49E94D0CFCD4DB43AFBA928DDD20C8F9026184A0@asg002.asg.local |
| In reply to | Plan for relatively simple query seems to be very inefficient (Arjen van der Meijden <acmmailing@vulcanus.its.tudelft.nl>) |
| Responses | Re: Plan for relatively simple query seems to be very inefficient |
| List | pgsql-performance |
> -----Original Message-----
> From: Arjen van der Meijden
> [mailto:acmmailing@vulcanus.its.tudelft.nl]
> Sent: Wednesday, April 06, 2005 11:53 AM
> To: performance pgsql
> Subject: [PERFORM] Plan for relatively simple query seems to be very
> inefficient
>
> [...]
> SELECT COUNT(*) FROM
> data_main AS dm,
> postcodes AS p
> WHERE dm.range BETWEEN p.range_from AND p.range_till
> [...]
> Aggregate  (cost=332586.85..332586.85 rows=1 width=0) (actual time=22712.038..22712.039 rows=1 loops=1)
>   ->  Nested Loop  (cost=3.76..328945.96 rows=1456356 width=0) (actual time=0.054..22600.826 rows=82688 loops=1)
I'm still a noob at reading EXPLAIN ANALYZE, but it seems to me
that your statistics are throwing off the planner here. It
estimates 1.4M rows and gets 82K, so it's off by a factor of
nearly 20. Have you considered running ANALYZE (plain VACUUM
doesn't refresh statistics) or upping your statistics target?
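For example, something along these lines (a sketch; the target
value of 100 is just an illustrative number):

-- Collect more detailed statistics on the join columns.
ALTER TABLE data_main ALTER COLUMN range SET STATISTICS 100;
ALTER TABLE postcodes ALTER COLUMN range_from SET STATISTICS 100;
ALTER TABLE postcodes ALTER COLUMN range_till SET STATISTICS 100;
-- ANALYZE is what actually refreshes the planner's estimates.
ANALYZE data_main;
ANALYZE postcodes;

Then re-run the EXPLAIN ANALYZE and see whether the row estimates
move closer to the actual counts.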
> [...]
> When I do something completely bogus, which results in
> coupling each record from data_main to one record from
> postcodes, it's still not very fast, but acceptable:
> [...]
> Aggregate  (cost=10076.98..10076.98 rows=1 width=0) (actual time=1456.016..1456.017 rows=1 loops=1)
>   ->  Merge Join  (cost=8636.81..9913.13 rows=65537 width=0) (actual time=1058.105..1358.571 rows=81920 loops=1)
Looks like the Merge Join is faster than the Nested Loop for this
query. Notice that the row counts are a lot closer to the
estimates, too. This is probably a "good" plan.
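If you want to check how the original BETWEEN query behaves with
the nested loop taken off the table, you can disable it for one
session (purely a diagnostic sketch, not a production setting):

SET enable_nestloop = off;
EXPLAIN ANALYZE
SELECT COUNT(*) FROM
  data_main AS dm,
  postcodes AS p
WHERE dm.range BETWEEN p.range_from AND p.range_till;
RESET enable_nestloop;

If another join type wins there too, that would point back at the
cost estimates rather than the plan shape.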
> [...]
> Doing something similarly bogus, but with fewer results, is
> much faster, even though it should have basically the same
> plan:
>
> SELECT COUNT(*) FROM
> data_main AS dm,
> postcodes AS p
> WHERE dm.range = p.postcode_id
> [...]
> Aggregate  (cost=2138.63..2138.63 rows=1 width=0) (actual time=180.667..180.668 rows=1 loops=1)
>   ->  Hash Join  (cost=4.00..2087.02 rows=20642 width=0) (actual time=180.645..180.645 rows=0 loops=1)
This one I don't understand at all. Clearly, the Hash Join is
the way to go, but the estimates are way off (which probably
explains why this plan isn't chosen in the first place). The
pg_stats query sketched below, after the plan, might show where
the estimate goes wrong.
>         Hash Cond: ("outer".range = "inner".postcode_id)
>         ->  Seq Scan on data_main dm  (cost=0.00..1262.20 rows=81920 width=2) (actual time=0.005..105.548 rows=81920 loops=1)
>         ->  Hash  (cost=3.60..3.60 rows=160 width=2) (actual time=0.592..0.592 rows=0 loops=1)
>               ->  Seq Scan on postcodes p  (cost=0.00..3.60 rows=160 width=2) (actual time=0.025..0.349 rows=160 loops=1)
> Total runtime: 180.807 ms
> (7 rows)
> [...]
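To see what the planner thinks it knows about those two columns,
something like this against pg_stats may help (a diagnostic
sketch; the column picks are illustrative):

SELECT tablename, attname, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename IN ('data_main', 'postcodes')
  AND attname IN ('range', 'postcode_id');

If n_distinct is badly wrong for either column, that alone could
explain the bogus row estimate.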
My completely amateur guess is that the planner can use a
Merge Join or Hash Join for your contrived queries because they
join on a single field with a plain equality (operator =). The
BETWEEN clause turns the join condition into a range predicate,
and that is what forces the Nested Loop. You can see that here:
  ->  Seq Scan on postcodes p  (cost=0.00..3.60 rows=160 width=4) (actual time=0.010..0.396 rows=160 loops=1)
vs. here:
  ->  Index Scan using postcodes_pkey on postcodes p  (cost=0.00..5.76 rows=160 width=2) (actual time=0.034..0.507 rows=160 loops=1)
So the first query forces a SeqScan on postcodes, while the
second can do an IndexScan.
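If the postcode ranges never overlap, one workaround I've seen
for this kind of range join (a sketch, assuming non-overlapping
ranges and an index on range_from; the index name is made up) is
to fetch the single candidate range per row with an
ORDER BY ... LIMIT 1 subquery, so each data_main row costs one
index probe instead of a scan over postcodes:

CREATE INDEX postcodes_range_from_idx ON postcodes (range_from);

SELECT COUNT(*) FROM data_main AS dm
WHERE dm.range <= (
    -- Largest range_from at or below dm.range: the only range
    -- that could contain it, if ranges don't overlap.
    SELECT p.range_till
    FROM postcodes p
    WHERE p.range_from <= dm.range
    ORDER BY p.range_from DESC
    LIMIT 1
);

With non-overlapping ranges this counts the same rows as the
BETWEEN join, but the planner can drive it with an index scan.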
--
David B. Held
Software Engineer/Array Services Group
200 14th Ave. East, Sartell, MN 56377
320.534.3637 320.253.7800 800.752.8129