Re: query speed question

Поиск
Список
Период
Сортировка
От Christopher Condit
Тема Re: query speed question
Дата
Msg-id BAFDED91EF7D5A43A30F897D6C94B1284D22EA56FC@MBX.ad.sdsc.edu
обсуждение исходный текст
Ответ на Re: query speed question  (Bill Moran <wmoran@potentialtech.com>)
Список pgsql-general
> > > > I have two tables that are georeferenced (although in this case
> I'm
> > > not using PostGIS) that I need to join.
> > > > A ( lat | lon | depth | value)
> > > > |A| = 1,100,000
> > > >
> > > > B ( lat | lon | attributes)
> > > > |B| = 14,000,000
> > > >
> > > > A is a special case because the lat / lon values are all at half
> > > degree intervals (ie 33.5, -12.5). The lat/lons in B are arbitrary.
> > > > I've written a function in B called getSpecialLat(latitude) and
> > > getSpecialLon(longitude)  to calculate the correct A latitude and
> built
> > > an index on both functions.
> > > >
> > > > Here's the query that I'm trying, but it's rather slow:
> > > > SELECT B.* FROM B,
> > > > (SELECT lat, lon FROM A WHERE value > 0 AND value < 2 AND depth =
> 0)
> > > AS foo
> > > > WHERE getSpecialLat(B.lat) = foo.lat AND getSpecialLon(B.lon) =
> > > foo.lon
> > > >
> > > > "Nested Loop  (cost=3569.88..32055.02 rows=1414 width=422)"
> > > > "  ->  Index Scan using A_valueidx on A  (cost=0.00..555.26
> rows=6 width=16)"
> > > > "        Index Cond: ((value > 0) AND (value < 2))"
> > > > "        Filter: (depth = 0)"
> > > > "  ->  Bitmap Heap Scan on B  (cost=3569.88..5029.48 rows=424
> width=422)"
> > > > "        Recheck Cond: ((getSpecialLon((B.lon)::numeric) = A.lon)
> AND (getSpecialLat((B.lat)::numeric) = A.lat))"
> > > > "        ->  BitmapAnd  (cost=3569.88..3569.88 rows=424 width=0)"
> > > > "              ->  Bitmap Index Scan on Blonidx
> (cost=0.00..1760.38 rows=84859 width=0)"
> > > > "                    Index Cond: (getSpecialLon((B.lon)::numeric)
> = A.lon)"
> > > > "              ->  Bitmap Index Scan on Blatidx
> (cost=0.00..1766.81 rows=84859 width=0)"
> > > > "                    Index Cond:
> (getSpeicalLat((B.latitude)::numeric) = A.lat)"
> > > >
> > > > Am I missing something in terms of speeding up this query?
> > >
> > > I'd be interested to see if the query rewritten as a JOIN would be
> > > faster.
> >
> > I can write it like this:
> > select b.*
> > from b join a on (getwoalatitude(b.latitude::numeric) = a.lat
> > and getwoalongitude(b.longitude::numeric) = a.lon)
> > where
> > a.value > 0 and a.value < 2 and a.depth = 0
> >
> > which results in this plan:
> > "Nested Loop  (cost=1387.20..13152982.35 rows=1625767 width=422)"
> > "  ->  Index Scan using a_depthidx on a_(cost=0.00..1464.07 rows=6897
> width=16)"
> > "        Index Cond: (depth = 0)"
> > "        Filter: ((value > 0::numeric) AND (value < 2::numeric))"
> > "  ->  Bitmap Heap Scan on b  (cost=1387.20..1686.37 rows=424
> width=422)"
> > "        Recheck Cond: ((getSpecialLon((b.lon)::numeric) = a.lon) AND
> (getSpecialLat((b.lat)::numeric) = a.lat))"
> > "        ->  BitmapAnd  (cost=1387.20..1387.20 rows=424 width=0)"
> > "              ->  Bitmap Index Scan on Blonidx  (cost=0.00..672.15
> rows=84859 width=0)"
> > "                    Index Cond: (getSpecialLon((b.lon)::numeric) =
> a.lon)"
> > "              ->  Bitmap Index Scan on Blatidx  (cost=0.00..672.36
> rows=84859 width=0)"
> > "                    Index Cond: (getSpecialLat((b.lat)::numeric) =
> a.lat)"
> >
> > However it's still taking ages to execute (over five minutes - I
> stopped it before it finished)
>
> Do you really expect that query to return 1.6M rows?  I doubt it, since
> the subselect version only returns 1400.

No - I do not expect it to return 1.6M...

> If you do a vacuum analyze on
> those two tables, does it speed either of the queries up?

Here are the new results (with explain analyze):

EXPLAIN ANALYZE SELECT B.* FROM B,
(SELECT lat, lon FROM A WHERE value > 0 AND value < 2 AND depth = 0) AS foo
WHERE getwoalatitude(B.latitude::numeric) = foo.lat AND getwoalongitude(B.longitude::numeric) = foo.lon

after vacuum analyze:
"Merge Join  (cost=17873237.91..26594735.94 rows=1984509 width=420) (actual time=752983.201..941125.197 rows=226941
loops=1)"
"  Merge Cond: ((a.lat = (getwoalatitude((b.latitude)::numeric))) AND (a.lon =
(getwoalongitude((b.longitude)::numeric))))"
"  ->  Sort  (cost=2019.51..2037.61 rows=7239 width=16) (actual time=30.704..32.171 rows=2111 loops=1)"
"        Sort Key: a.lat, a.lon"
"        Sort Method:  quicksort  Memory: 212kB"
"        ->  Index Scan using depthidx on a  (cost=0.00..1555.43 rows=7239 width=16) (actual time=0.533..24.631
rows=2111loops=1)" 
"              Index Cond: (depth = 0)"
"              Filter: ((value > 0::numeric) AND (value < 2::numeric))"
"  ->  Materialize  (cost=17871190.21..18083338.96 rows=16971900 width=420) (actual time=751324.751..919278.574
rows=16963350loops=1)" 
"        ->  Sort  (cost=17871190.21..17913619.96 rows=16971900 width=420) (actual time=751324.744..820522.604
rows=16963350loops=1)" 
"              Sort Key: (getwoalatitude((b.latitude)::numeric)), (getwoalongitude((b.longitude)::numeric))"
"              Sort Method:  external merge  Disk: 4599344kB"
"              ->  Seq Scan on b  (cost=0.00..750696.00 rows=16971900 width=420) (actual time=1.781..229158.949
rows=16971901loops=1)" 
"Total runtime: 942295.914 ms"


EXPLAIN ANALYZE SELECT b.* FROM b JOIN a
ON (getwoalatitude(b.latitude::numeric) = a.lat AND getwoalongitude(b.longitude::numeric) = a.lon)
WHERE a.value > 0 AND a.value < 2 AND a.depth = 0

"Merge Join  (cost=17873237.91..26594735.94 rows=1984509 width=420) (actual time=755255.801..906407.961 rows=226941
loops=1)"
"  Merge Cond: ((a.lat = (getwoalatitude((b.latitude)::numeric))) AND (a.lon =
(getwoalongitude((b.longitude)::numeric))))"
"  ->  Sort  (cost=2019.51..2037.61 rows=7239 width=16) (actual time=27.658..29.108 rows=2111 loops=1)"
"        Sort Key: a.lat, a.lon"
"        Sort Method:  quicksort  Memory: 212kB"
"        ->  Index Scan using depthidx on a (cost=0.00..1555.43 rows=7239 width=16) (actual time=0.467..21.646
rows=2111loops=1)" 
"              Index Cond: (depth = 0)"
"              Filter: ((value > 0::numeric) AND (value < 2::numeric))"
"  ->  Materialize  (cost=17871190.21..18083338.96 rows=16971900 width=420) (actual time=753605.414..884549.890
rows=16963350loops=1)" 
"        ->  Sort  (cost=17871190.21..17913619.96 rows=16971900 width=420) (actual time=753605.407..822844.299
rows=16963350loops=1)" 
"              Sort Key: (getwoalatitude((b.latitude)::numeric)), (getwoalongitude((b.longitude)::numeric))"
"              Sort Method:  external merge  Disk: 4599344kB"
"              ->  Seq Scan on b  (cost=0.00..750696.00 rows=16971900 width=420) (actual time=0.095..229888.646
rows=16971901loops=1)" 
"Total runtime: 911284.022 ms"

> Try "set enable_nestloop=off" and rerun the two queries.  If that helps
> and analyze didn't then perhaps you need to increase the statistics
> target on those two tables, or perhaps you'll just have to use that
> set command to force the planner to avoid a nested loop.
>
> Hope one of these helps.
Thank, Bill, it's a bit faster. Any other thoughts?

-Chris

В списке pgsql-general по дате отправления:

Предыдущее
От: APseudoUtopia
Дата:
Сообщение: Full-Text Searching: to_tsquery() vs. plainto_tsquery()
Следующее
От: Oleg Bartunov
Дата:
Сообщение: Re: Full-Text Searching: to_tsquery() vs. plainto_tsquery()