Re: query speed question

Поиск
Список
Период
Сортировка
От Alban Hertroys
Тема Re: query speed question
Дата
Msg-id 474FD66B-DB38-4DC3-BC25-7BE6EFF2459C@solfertje.student.utwente.nl
обсуждение исходный текст
Ответ на Re: query speed question  (Christopher Condit <condit@sdsc.edu>)
Список pgsql-general
On 10 Sep 2009, at 19:09, Christopher Condit wrote:

> Here are both queries (with explain analyze):
>
> EXPLAIN ANALYZE SELECT B.* FROM B,
> (SELECT lat, lon FROM A WHERE value > 0 AND value < 2 AND depth = 0)
> AS foo WHERE getwoalatitude(B.latitude::numeric) = foo.lat AND
> getwoalongitude(B.longitude::numeric) = foo.lon
>
> after vacuum analyze:
> "Merge Join  (cost=17873237.91..26594735.94 rows=1984509 width=420)
> (actual time=752983.201..941125.197 rows=226941 loops=1)"
> "  Merge Cond: ((a.lat = (getwoalatitude((b.latitude)::numeric)))
> AND (a.lon = (getwoalongitude((b.longitude)::numeric))))"
> "  ->  Sort  (cost=2019.51..2037.61 rows=7239 width=16) (actual
> time=30.704..32.171 rows=2111 loops=1)"
> "        Sort Key: a.lat, a.lon"
> "        Sort Method:  quicksort  Memory: 212kB"
> "        ->  Index Scan using depthidx on a  (cost=0.00..1555.43
> rows=7239 width=16) (actual time=0.533..24.631 rows=2111 loops=1)"
> "              Index Cond: (depth = 0)"
> "              Filter: ((value > 0::numeric) AND (value <
> 2::numeric))"

Here you can see that the estimated row counts are way off compared to
the actual number of rows. That means the statistics for that table
aren't reliable; either you need to ANALYZE them or you need to
increase the statistics size of the columns involved.

This part of the query is the quicker part, but it may be possible
that the planner wouldn't have picked a merge-join with the remainder
if the estimates were closer to reality.

You can paste those queries in http://explain-analyze.info/, although
you'll need to strip the quotes from the plan (not sure where those
get added, psql doesn't add them).

> "  ->  Materialize  (cost=17871190.21..18083338.96 rows=16971900
> width=420) (actual time=751324.751..919278.574 rows=16963350 loops=1)"
> "        ->  Sort  (cost=17871190.21..17913619.96 rows=16971900
> width=420) (actual time=751324.744..820522.604 rows=16963350 loops=1)"
> "              Sort Key: (getwoalatitude((b.latitude)::numeric)),
> (getwoalongitude((b.longitude)::numeric))"
> "              Sort Method:  external merge  Disk: 4599344kB"
> "              ->  Seq Scan on b  (cost=0.00..750696.00
> rows=16971900 width=420) (actual time=1.781..229158.949
> rows=16971901 loops=1)"
> "Total runtime: 942295.914 ms"

Here's where most time is spent. Firstly, an external merge to disk is
going to be slow. I think the setting controlling the amount of memory
available for sorting is work_mem, you can try increasing that in the
client (set work_mem TO <some value>) before executing your query and
see what value is sufficient.

The other issue here is the sequential scan on table b. I think that's
caused by the planner needing to merge to disk though.

> EXPLAIN ANALYZE SELECT b.* FROM b JOIN a ON (getwoalatitude
> (b.latitude::numeric) = a.lat AND getwoalongitude
> (b.longitude::numeric) = a.lon) WHERE a.value > 0 AND a.value < 2
> AND a.depth = 0
>
> "Merge Join  (cost=17873237.91..26594735.94 rows=1984509 width=420)
> (actual time=755255.801..906407.961 rows=226941 loops=1)"
> "  Merge Cond: ((a.lat = (getwoalatitude((b.latitude)::numeric)))
> AND (a.lon = (getwoalongitude((b.longitude)::numeric))))"
> "  ->  Sort  (cost=2019.51..2037.61 rows=7239 width=16) (actual
> time=27.658..29.108 rows=2111 loops=1)"
> "        Sort Key: a.lat, a.lon"
> "        Sort Method:  quicksort  Memory: 212kB"
> "        ->  Index Scan using depthidx on a (cost=0.00..1555.43
> rows=7239 width=16) (actual time=0.467..21.646 rows=2111 loops=1)"
> "              Index Cond: (depth = 0)"
> "              Filter: ((value > 0::numeric) AND (value <
> 2::numeric))"
> "  ->  Materialize  (cost=17871190.21..18083338.96 rows=16971900
> width=420) (actual time=753605.414..884549.890 rows=16963350 loops=1)"
> "        ->  Sort  (cost=17871190.21..17913619.96 rows=16971900
> width=420) (actual time=753605.407..822844.299 rows=16963350 loops=1)"
> "              Sort Key: (getwoalatitude((b.latitude)::numeric)),
> (getwoalongitude((b.longitude)::numeric))"
> "              Sort Method:  external merge  Disk: 4599344kB"
> "              ->  Seq Scan on b  (cost=0.00..750696.00
> rows=16971900 width=420) (actual time=0.095..229888.646
> rows=16971901 loops=1)"
> "Total runtime: 911284.022 ms"

This plan is actually very similar to the original query's plan, it
has the same problems.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4aa939f812071577543529!



В списке pgsql-general по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: COPY command character set
Следующее
От: "Peter Headland"
Дата:
Сообщение: Re: COPY command character set