Обсуждение: query speed question

Поиск
Список
Период
Сортировка

query speed question

От
Christopher Condit
Дата:

I have two tables that are georeferenced (although in this case I’m not using PostGIS) that I need to join.

A ( lat | lon | depth | value)

|A| = 1,100,000

 

B ( lat | lon | attributes)

|B| = 14,000,000

 

A is a special case because the lat / lon values are all at half degree intervals (ie 33.5, -12.5). The lat/lons in B are arbitrary.

I’ve written a function in B called getSpecialLat(latitude) and getSpecialLon(longitude)  to calculate the correct A latitude and built an index on both functions.

 

Here’s the query that I’m trying, but it’s rather slow:

SELECT B.* FROM B,

(SELECT lat, lon FROM A WHERE value > 0 AND value < 2 AND depth = 0) AS foo

WHERE getSpecialLat(B.lat) = foo.lat AND getSpecialLon(B.lon) = foo.lon

 

"Nested Loop  (cost=3569.88..32055.02 rows=1414 width=422)"

"  ->  Index Scan using A_valueidx on A  (cost=0.00..555.26 rows=6 width=16)"

"        Index Cond: ((value > 0) AND (value < 2))"

"        Filter: (depth = 0)"

"  ->  Bitmap Heap Scan on B  (cost=3569.88..5029.48 rows=424 width=422)"

"        Recheck Cond: ((getSpecialLon((B.lon)::numeric) = A.lon) AND (getSpecialLat((B.lat)::numeric) = A.lat))"

"        ->  BitmapAnd  (cost=3569.88..3569.88 rows=424 width=0)"

"              ->  Bitmap Index Scan on Blonidx  (cost=0.00..1760.38 rows=84859 width=0)"

"                    Index Cond: (getSpecialLon((B.lon)::numeric) = A.lon)"

"              ->  Bitmap Index Scan on Blatidx  (cost=0.00..1766.81 rows=84859 width=0)"

"                    Index Cond: (getSpeicalLat((B.latitude)::numeric) = A.lat)"

 

Am I missing something in terms of speeding up this query?

 

Thanks,

-Chris

Re: query speed question

От
Bill Moran
Дата:
Christopher Condit <condit@sdsc.edu> wrote:
>
> I have two tables that are georeferenced (although in this case I'm not using PostGIS) that I need to join.
> A ( lat | lon | depth | value)
> |A| = 1,100,000
>
> B ( lat | lon | attributes)
> |B| = 14,000,000
>
> A is a special case because the lat / lon values are all at half degree intervals (ie 33.5, -12.5). The lat/lons in B
arearbitrary. 
> I've written a function in B called getSpecialLat(latitude) and getSpecialLon(longitude)  to calculate the correct A
latitudeand built an index on both functions. 
>
> Here's the query that I'm trying, but it's rather slow:
> SELECT B.* FROM B,
> (SELECT lat, lon FROM A WHERE value > 0 AND value < 2 AND depth = 0) AS foo
> WHERE getSpecialLat(B.lat) = foo.lat AND getSpecialLon(B.lon) = foo.lon
>
> "Nested Loop  (cost=3569.88..32055.02 rows=1414 width=422)"
> "  ->  Index Scan using A_valueidx on A  (cost=0.00..555.26 rows=6 width=16)"
> "        Index Cond: ((value > 0) AND (value < 2))"
> "        Filter: (depth = 0)"
> "  ->  Bitmap Heap Scan on B  (cost=3569.88..5029.48 rows=424 width=422)"
> "        Recheck Cond: ((getSpecialLon((B.lon)::numeric) = A.lon) AND (getSpecialLat((B.lat)::numeric) = A.lat))"
> "        ->  BitmapAnd  (cost=3569.88..3569.88 rows=424 width=0)"
> "              ->  Bitmap Index Scan on Blonidx  (cost=0.00..1760.38 rows=84859 width=0)"
> "                    Index Cond: (getSpecialLon((B.lon)::numeric) = A.lon)"
> "              ->  Bitmap Index Scan on Blatidx  (cost=0.00..1766.81 rows=84859 width=0)"
> "                    Index Cond: (getSpeicalLat((B.latitude)::numeric) = A.lat)"
>
> Am I missing something in terms of speeding up this query?

I'd be interested to see if the query rewritten as a JOIN would be faster.

--
Bill Moran
http://www.potentialtech.com

Re: query speed question

От
Christopher Condit
Дата:
> > I have two tables that are georeferenced (although in this case I'm
> not using PostGIS) that I need to join.
> > A ( lat | lon | depth | value)
> > |A| = 1,100,000
> >
> > B ( lat | lon | attributes)
> > |B| = 14,000,000
> >
> > A is a special case because the lat / lon values are all at half
> degree intervals (ie 33.5, -12.5). The lat/lons in B are arbitrary.
> > I've written a function in B called getSpecialLat(latitude) and
> getSpecialLon(longitude)  to calculate the correct A latitude and built
> an index on both functions.
> >
> > Here's the query that I'm trying, but it's rather slow:
> > SELECT B.* FROM B,
> > (SELECT lat, lon FROM A WHERE value > 0 AND value < 2 AND depth = 0)
> AS foo
> > WHERE getSpecialLat(B.lat) = foo.lat AND getSpecialLon(B.lon) =
> foo.lon
> >
> > "Nested Loop  (cost=3569.88..32055.02 rows=1414 width=422)"
> > "  ->  Index Scan using A_valueidx on A  (cost=0.00..555.26 rows=6
> width=16)"
> > "        Index Cond: ((value > 0) AND (value < 2))"
> > "        Filter: (depth = 0)"
> > "  ->  Bitmap Heap Scan on B  (cost=3569.88..5029.48 rows=424
> width=422)"
> > "        Recheck Cond: ((getSpecialLon((B.lon)::numeric) = A.lon) AND
> (getSpecialLat((B.lat)::numeric) = A.lat))"
> > "        ->  BitmapAnd  (cost=3569.88..3569.88 rows=424 width=0)"
> > "              ->  Bitmap Index Scan on Blonidx  (cost=0.00..1760.38
> rows=84859 width=0)"
> > "                    Index Cond: (getSpecialLon((B.lon)::numeric) =
> A.lon)"
> > "              ->  Bitmap Index Scan on Blatidx  (cost=0.00..1766.81
> rows=84859 width=0)"
> > "                    Index Cond:
> (getSpeicalLat((B.latitude)::numeric) = A.lat)"
> >
> > Am I missing something in terms of speeding up this query?
>
> I'd be interested to see if the query rewritten as a JOIN would be
> faster.

I can write it like this:
select b.*
from b join a on (getwoalatitude(b.latitude::numeric) = a.lat
and getwoalongitude(b.longitude::numeric) = a.lon)
where
a.value > 0 and a.value < 2 and a.depth = 0

which results in this plan:
"Nested Loop  (cost=1387.20..13152982.35 rows=1625767 width=422)"
"  ->  Index Scan using a_depthidx on a_(cost=0.00..1464.07 rows=6897 width=16)"
"        Index Cond: (depth = 0)"
"        Filter: ((value > 0::numeric) AND (value < 2::numeric))"
"  ->  Bitmap Heap Scan on b  (cost=1387.20..1686.37 rows=424 width=422)"
"        Recheck Cond: ((getSpecialLon((b.lon)::numeric) = a.lon) AND (getSpecialLat((b.lat)::numeric) = a.lat))"
"        ->  BitmapAnd  (cost=1387.20..1387.20 rows=424 width=0)"
"              ->  Bitmap Index Scan on Blonidx  (cost=0.00..672.15 rows=84859 width=0)"
"                    Index Cond: (getSpecialLon((b.lon)::numeric) = a.lon)"
"              ->  Bitmap Index Scan on Blatidx  (cost=0.00..672.36 rows=84859 width=0)"
"                    Index Cond: (getSpecialLat((b.lat)::numeric) = a.lat)"

However it's still taking ages to execute (over five minutes - I stopped it before it finished)

-Chris


Re: query speed question

От
Alban Hertroys
Дата:
On 3 Sep 2009, at 23:11, Christopher Condit wrote:

> I have two tables that are georeferenced (although in this case I'm
> not using PostGIS) that I need to join.
> A ( lat | lon | depth | value)
> |A| = 1,100,000
>
> B ( lat | lon | attributes)
> |B| = 14,000,000
>
> A is a special case because the lat / lon values are all at half
> degree intervals (ie 33.5, -12.5). The lat/lons in B are arbitrary.
> I've written a function in B called getSpecialLat(latitude) and
> getSpecialLon(longitude)  to calculate the correct A latitude and
> built an index on both functions.

My guess is that those functions round lat and lon values to their
nearest half-degree interval counterpart as in table A?
I assume you marked that function immutable?
Is the return type indeed a numeric, as there are some explicit casts
in the query plan?

> Here's the query that I'm trying, but it's rather slow:
> SELECT B.* FROM B,
> (SELECT lat, lon FROM A WHERE value > 0 AND value < 2 AND depth = 0)
> AS foo
> WHERE getSpecialLat(B.lat) = foo.lat AND getSpecialLon(B.lon) =
> foo.lon
>
> "Nested Loop  (cost=3569.88..32055.02 rows=1414 width=422)"
> "  ->  Index Scan using A_valueidx on A  (cost=0.00..555.26 rows=6
> width=16)"
> "        Index Cond: ((value > 0) AND (value < 2))"
> "        Filter: (depth = 0)"
> "  ->  Bitmap Heap Scan on B  (cost=3569.88..5029.48 rows=424
> width=422)"
> "        Recheck Cond: ((getSpecialLon((B.lon)::numeric) = A.lon)
> AND (getSpecialLat((B.lat)::numeric) = A.lat))"
> "        ->  BitmapAnd  (cost=3569.88..3569.88 rows=424 width=0)"
> "              ->  Bitmap Index Scan on Blonidx  (cost=0.00..1760.38
> rows=84859 width=0)"
> "                    Index Cond: (getSpecialLon((B.lon)::numeric) =
> A.lon)"
> "              ->  Bitmap Index Scan on Blatidx  (cost=0.00..1766.81
> rows=84859 width=0)"
> "                    Index Cond: (getSpeicalLat
> ((B.latitude)::numeric) = A.lat)"
>
> Am I missing something in terms of speeding up this query?


Hard to tell without knowing where most time gets spent. An EXPLAIN
ANALYSE would tell.

You could try comparing integers instead of numerics, simply by
multiplying your half-degree values by 10 (or by 2) and cast them to
int. Integer comparisons are typically faster than numerics. It's hard
to tell whether that does indeed take up a significant amount of time
without the above ;)

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4aa0f4d811866722913219!



Re: query speed question

От
Bill Moran
Дата:
In response to Christopher Condit <condit@sdsc.edu>:

> > > I have two tables that are georeferenced (although in this case I'm
> > not using PostGIS) that I need to join.
> > > A ( lat | lon | depth | value)
> > > |A| = 1,100,000
> > >
> > > B ( lat | lon | attributes)
> > > |B| = 14,000,000
> > >
> > > A is a special case because the lat / lon values are all at half
> > degree intervals (ie 33.5, -12.5). The lat/lons in B are arbitrary.
> > > I've written a function in B called getSpecialLat(latitude) and
> > getSpecialLon(longitude)  to calculate the correct A latitude and built
> > an index on both functions.
> > >
> > > Here's the query that I'm trying, but it's rather slow:
> > > SELECT B.* FROM B,
> > > (SELECT lat, lon FROM A WHERE value > 0 AND value < 2 AND depth = 0)
> > AS foo
> > > WHERE getSpecialLat(B.lat) = foo.lat AND getSpecialLon(B.lon) =
> > foo.lon
> > >
> > > "Nested Loop  (cost=3569.88..32055.02 rows=1414 width=422)"
> > > "  ->  Index Scan using A_valueidx on A  (cost=0.00..555.26 rows=6 width=16)"
> > > "        Index Cond: ((value > 0) AND (value < 2))"
> > > "        Filter: (depth = 0)"
> > > "  ->  Bitmap Heap Scan on B  (cost=3569.88..5029.48 rows=424 width=422)"
> > > "        Recheck Cond: ((getSpecialLon((B.lon)::numeric) = A.lon) AND (getSpecialLat((B.lat)::numeric) = A.lat))"
> > > "        ->  BitmapAnd  (cost=3569.88..3569.88 rows=424 width=0)"
> > > "              ->  Bitmap Index Scan on Blonidx  (cost=0.00..1760.38 rows=84859 width=0)"
> > > "                    Index Cond: (getSpecialLon((B.lon)::numeric) = A.lon)"
> > > "              ->  Bitmap Index Scan on Blatidx  (cost=0.00..1766.81 rows=84859 width=0)"
> > > "                    Index Cond: (getSpeicalLat((B.latitude)::numeric) = A.lat)"
> > >
> > > Am I missing something in terms of speeding up this query?
> >
> > I'd be interested to see if the query rewritten as a JOIN would be
> > faster.
>
> I can write it like this:
> select b.*
> from b join a on (getwoalatitude(b.latitude::numeric) = a.lat
> and getwoalongitude(b.longitude::numeric) = a.lon)
> where
> a.value > 0 and a.value < 2 and a.depth = 0
>
> which results in this plan:
> "Nested Loop  (cost=1387.20..13152982.35 rows=1625767 width=422)"
> "  ->  Index Scan using a_depthidx on a_(cost=0.00..1464.07 rows=6897 width=16)"
> "        Index Cond: (depth = 0)"
> "        Filter: ((value > 0::numeric) AND (value < 2::numeric))"
> "  ->  Bitmap Heap Scan on b  (cost=1387.20..1686.37 rows=424 width=422)"
> "        Recheck Cond: ((getSpecialLon((b.lon)::numeric) = a.lon) AND (getSpecialLat((b.lat)::numeric) = a.lat))"
> "        ->  BitmapAnd  (cost=1387.20..1387.20 rows=424 width=0)"
> "              ->  Bitmap Index Scan on Blonidx  (cost=0.00..672.15 rows=84859 width=0)"
> "                    Index Cond: (getSpecialLon((b.lon)::numeric) = a.lon)"
> "              ->  Bitmap Index Scan on Blatidx  (cost=0.00..672.36 rows=84859 width=0)"
> "                    Index Cond: (getSpecialLat((b.lat)::numeric) = a.lat)"
>
> However it's still taking ages to execute (over five minutes - I stopped it before it finished)

Do you really expect that query to return 1.6M rows?  I doubt it, since
the subselect version only returns 1400.  If you do a vacuum analyze on
those two tables, does it speed either of the queries up?

Try "set enable_nestloop=off" and rerun the two queries.  If that helps
and analyze didn't then perhaps you need to increase the statistics
target on those two tables, or perhaps you'll just have to use that
set command to force the planner to avoid a nested loop.

Hope one of these helps.

--
Bill Moran
http://www.potentialtech.com
http://people.collaborativefusion.com/~wmoran/

Re: query speed question

От
Christopher Condit
Дата:
> > > > I have two tables that are georeferenced (although in this case
> I'm
> > > not using PostGIS) that I need to join.
> > > > A ( lat | lon | depth | value)
> > > > |A| = 1,100,000
> > > >
> > > > B ( lat | lon | attributes)
> > > > |B| = 14,000,000
> > > >
> > > > A is a special case because the lat / lon values are all at half
> > > degree intervals (ie 33.5, -12.5). The lat/lons in B are arbitrary.
> > > > I've written a function in B called getSpecialLat(latitude) and
> > > getSpecialLon(longitude)  to calculate the correct A latitude and
> built
> > > an index on both functions.
> > > >
> > > > Here's the query that I'm trying, but it's rather slow:
> > > > SELECT B.* FROM B,
> > > > (SELECT lat, lon FROM A WHERE value > 0 AND value < 2 AND depth =
> 0)
> > > AS foo
> > > > WHERE getSpecialLat(B.lat) = foo.lat AND getSpecialLon(B.lon) =
> > > foo.lon
> > > >
> > > > "Nested Loop  (cost=3569.88..32055.02 rows=1414 width=422)"
> > > > "  ->  Index Scan using A_valueidx on A  (cost=0.00..555.26
> rows=6 width=16)"
> > > > "        Index Cond: ((value > 0) AND (value < 2))"
> > > > "        Filter: (depth = 0)"
> > > > "  ->  Bitmap Heap Scan on B  (cost=3569.88..5029.48 rows=424
> width=422)"
> > > > "        Recheck Cond: ((getSpecialLon((B.lon)::numeric) = A.lon)
> AND (getSpecialLat((B.lat)::numeric) = A.lat))"
> > > > "        ->  BitmapAnd  (cost=3569.88..3569.88 rows=424 width=0)"
> > > > "              ->  Bitmap Index Scan on Blonidx
> (cost=0.00..1760.38 rows=84859 width=0)"
> > > > "                    Index Cond: (getSpecialLon((B.lon)::numeric)
> = A.lon)"
> > > > "              ->  Bitmap Index Scan on Blatidx
> (cost=0.00..1766.81 rows=84859 width=0)"
> > > > "                    Index Cond:
> (getSpeicalLat((B.latitude)::numeric) = A.lat)"
> > > >
> > > > Am I missing something in terms of speeding up this query?
> > >
> > > I'd be interested to see if the query rewritten as a JOIN would be
> > > faster.
> >
> > I can write it like this:
> > select b.*
> > from b join a on (getwoalatitude(b.latitude::numeric) = a.lat
> > and getwoalongitude(b.longitude::numeric) = a.lon)
> > where
> > a.value > 0 and a.value < 2 and a.depth = 0
> >
> > which results in this plan:
> > "Nested Loop  (cost=1387.20..13152982.35 rows=1625767 width=422)"
> > "  ->  Index Scan using a_depthidx on a_(cost=0.00..1464.07 rows=6897
> width=16)"
> > "        Index Cond: (depth = 0)"
> > "        Filter: ((value > 0::numeric) AND (value < 2::numeric))"
> > "  ->  Bitmap Heap Scan on b  (cost=1387.20..1686.37 rows=424
> width=422)"
> > "        Recheck Cond: ((getSpecialLon((b.lon)::numeric) = a.lon) AND
> (getSpecialLat((b.lat)::numeric) = a.lat))"
> > "        ->  BitmapAnd  (cost=1387.20..1387.20 rows=424 width=0)"
> > "              ->  Bitmap Index Scan on Blonidx  (cost=0.00..672.15
> rows=84859 width=0)"
> > "                    Index Cond: (getSpecialLon((b.lon)::numeric) =
> a.lon)"
> > "              ->  Bitmap Index Scan on Blatidx  (cost=0.00..672.36
> rows=84859 width=0)"
> > "                    Index Cond: (getSpecialLat((b.lat)::numeric) =
> a.lat)"
> >
> > However it's still taking ages to execute (over five minutes - I
> stopped it before it finished)
>
> Do you really expect that query to return 1.6M rows?  I doubt it, since
> the subselect version only returns 1400.

No - I do not expect it to return 1.6M...

> If you do a vacuum analyze on
> those two tables, does it speed either of the queries up?

Here are the new results (with explain analyze):

EXPLAIN ANALYZE SELECT B.* FROM B,
(SELECT lat, lon FROM A WHERE value > 0 AND value < 2 AND depth = 0) AS foo
WHERE getwoalatitude(B.latitude::numeric) = foo.lat AND getwoalongitude(B.longitude::numeric) = foo.lon

after vacuum analyze:
"Merge Join  (cost=17873237.91..26594735.94 rows=1984509 width=420) (actual time=752983.201..941125.197 rows=226941
loops=1)"
"  Merge Cond: ((a.lat = (getwoalatitude((b.latitude)::numeric))) AND (a.lon =
(getwoalongitude((b.longitude)::numeric))))"
"  ->  Sort  (cost=2019.51..2037.61 rows=7239 width=16) (actual time=30.704..32.171 rows=2111 loops=1)"
"        Sort Key: a.lat, a.lon"
"        Sort Method:  quicksort  Memory: 212kB"
"        ->  Index Scan using depthidx on a  (cost=0.00..1555.43 rows=7239 width=16) (actual time=0.533..24.631
rows=2111loops=1)" 
"              Index Cond: (depth = 0)"
"              Filter: ((value > 0::numeric) AND (value < 2::numeric))"
"  ->  Materialize  (cost=17871190.21..18083338.96 rows=16971900 width=420) (actual time=751324.751..919278.574
rows=16963350loops=1)" 
"        ->  Sort  (cost=17871190.21..17913619.96 rows=16971900 width=420) (actual time=751324.744..820522.604
rows=16963350loops=1)" 
"              Sort Key: (getwoalatitude((b.latitude)::numeric)), (getwoalongitude((b.longitude)::numeric))"
"              Sort Method:  external merge  Disk: 4599344kB"
"              ->  Seq Scan on b  (cost=0.00..750696.00 rows=16971900 width=420) (actual time=1.781..229158.949
rows=16971901loops=1)" 
"Total runtime: 942295.914 ms"


EXPLAIN ANALYZE SELECT b.* FROM b JOIN a
ON (getwoalatitude(b.latitude::numeric) = a.lat AND getwoalongitude(b.longitude::numeric) = a.lon)
WHERE a.value > 0 AND a.value < 2 AND a.depth = 0

"Merge Join  (cost=17873237.91..26594735.94 rows=1984509 width=420) (actual time=755255.801..906407.961 rows=226941
loops=1)"
"  Merge Cond: ((a.lat = (getwoalatitude((b.latitude)::numeric))) AND (a.lon =
(getwoalongitude((b.longitude)::numeric))))"
"  ->  Sort  (cost=2019.51..2037.61 rows=7239 width=16) (actual time=27.658..29.108 rows=2111 loops=1)"
"        Sort Key: a.lat, a.lon"
"        Sort Method:  quicksort  Memory: 212kB"
"        ->  Index Scan using depthidx on a (cost=0.00..1555.43 rows=7239 width=16) (actual time=0.467..21.646
rows=2111loops=1)" 
"              Index Cond: (depth = 0)"
"              Filter: ((value > 0::numeric) AND (value < 2::numeric))"
"  ->  Materialize  (cost=17871190.21..18083338.96 rows=16971900 width=420) (actual time=753605.414..884549.890
rows=16963350loops=1)" 
"        ->  Sort  (cost=17871190.21..17913619.96 rows=16971900 width=420) (actual time=753605.407..822844.299
rows=16963350loops=1)" 
"              Sort Key: (getwoalatitude((b.latitude)::numeric)), (getwoalongitude((b.longitude)::numeric))"
"              Sort Method:  external merge  Disk: 4599344kB"
"              ->  Seq Scan on b  (cost=0.00..750696.00 rows=16971900 width=420) (actual time=0.095..229888.646
rows=16971901loops=1)" 
"Total runtime: 911284.022 ms"

> Try "set enable_nestloop=off" and rerun the two queries.  If that helps
> and analyze didn't then perhaps you need to increase the statistics
> target on those two tables, or perhaps you'll just have to use that
> set command to force the planner to avoid a nested loop.
>
> Hope one of these helps.
Thank, Bill, it's a bit faster. Any other thoughts?

-Chris

Re: query speed question

От
Christopher Condit
Дата:
Hi Alban-

> > I have two tables that are georeferenced (although in this case I'm
> > not using PostGIS) that I need to join.
> > A ( lat | lon | depth | value)
> > |A| = 1,100,000
> >
> > B ( lat | lon | attributes)
> > |B| = 14,000,000
> >
> > A is a special case because the lat / lon values are all at half
> > degree intervals (ie 33.5, -12.5). The lat/lons in B are arbitrary.
> > I've written a function in B called getSpecialLat(latitude) and
> > getSpecialLon(longitude)  to calculate the correct A latitude and
> > built an index on both functions.
>
> My guess is that those functions round lat and lon values to their
> nearest half-degree interval counterpart as in table A?
> I assume you marked that function immutable?
> Is the return type indeed a numeric, as there are some explicit casts
> in the query plan?

Yes - the function is immutable. The return type is numeric, but I fixed the casting on input problem...

> > Here's the query that I'm trying, but it's rather slow:
> > SELECT B.* FROM B,
> > (SELECT lat, lon FROM A WHERE value > 0 AND value < 2 AND depth = 0)
> > AS foo
> > WHERE getSpecialLat(B.lat) = foo.lat AND getSpecialLon(B.lon) =
> > foo.lon
> >
> > "Nested Loop  (cost=3569.88..32055.02 rows=1414 width=422)"
> > "  ->  Index Scan using A_valueidx on A  (cost=0.00..555.26 rows=6
> > width=16)"
> > "        Index Cond: ((value > 0) AND (value < 2))"
> > "        Filter: (depth = 0)"
> > "  ->  Bitmap Heap Scan on B  (cost=3569.88..5029.48 rows=424
> > width=422)"
> > "        Recheck Cond: ((getSpecialLon((B.lon)::numeric) = A.lon)
> > AND (getSpecialLat((B.lat)::numeric) = A.lat))"
> > "        ->  BitmapAnd  (cost=3569.88..3569.88 rows=424 width=0)"
> > "              ->  Bitmap Index Scan on Blonidx  (cost=0.00..1760.38
> > rows=84859 width=0)"
> > "                    Index Cond: (getSpecialLon((B.lon)::numeric) =
> > A.lon)"
> > "              ->  Bitmap Index Scan on Blatidx  (cost=0.00..1766.81
> > rows=84859 width=0)"
> > "                    Index Cond: (getSpeicalLat
> > ((B.latitude)::numeric) = A.lat)"
> >
> > Am I missing something in terms of speeding up this query?
>
>
> Hard to tell without knowing where most time gets spent. An EXPLAIN
> ANALYSE would tell.

Here are both queries (with explain analyze):

EXPLAIN ANALYZE SELECT B.* FROM B,
(SELECT lat, lon FROM A WHERE value > 0 AND value < 2 AND depth = 0) AS foo WHERE getwoalatitude(B.latitude::numeric) =
foo.latAND getwoalongitude(B.longitude::numeric) = foo.lon 

after vacuum analyze:
"Merge Join  (cost=17873237.91..26594735.94 rows=1984509 width=420) (actual time=752983.201..941125.197 rows=226941
loops=1)"
"  Merge Cond: ((a.lat = (getwoalatitude((b.latitude)::numeric))) AND (a.lon =
(getwoalongitude((b.longitude)::numeric))))"
"  ->  Sort  (cost=2019.51..2037.61 rows=7239 width=16) (actual time=30.704..32.171 rows=2111 loops=1)"
"        Sort Key: a.lat, a.lon"
"        Sort Method:  quicksort  Memory: 212kB"
"        ->  Index Scan using depthidx on a  (cost=0.00..1555.43 rows=7239 width=16) (actual time=0.533..24.631
rows=2111loops=1)" 
"              Index Cond: (depth = 0)"
"              Filter: ((value > 0::numeric) AND (value < 2::numeric))"
"  ->  Materialize  (cost=17871190.21..18083338.96 rows=16971900 width=420) (actual time=751324.751..919278.574
rows=16963350loops=1)" 
"        ->  Sort  (cost=17871190.21..17913619.96 rows=16971900 width=420) (actual time=751324.744..820522.604
rows=16963350loops=1)" 
"              Sort Key: (getwoalatitude((b.latitude)::numeric)), (getwoalongitude((b.longitude)::numeric))"
"              Sort Method:  external merge  Disk: 4599344kB"
"              ->  Seq Scan on b  (cost=0.00..750696.00 rows=16971900 width=420) (actual time=1.781..229158.949
rows=16971901loops=1)" 
"Total runtime: 942295.914 ms"


EXPLAIN ANALYZE SELECT b.* FROM b JOIN a ON (getwoalatitude(b.latitude::numeric) = a.lat AND
getwoalongitude(b.longitude::numeric)= a.lon) WHERE a.value > 0 AND a.value < 2 AND a.depth = 0 

"Merge Join  (cost=17873237.91..26594735.94 rows=1984509 width=420) (actual time=755255.801..906407.961 rows=226941
loops=1)"
"  Merge Cond: ((a.lat = (getwoalatitude((b.latitude)::numeric))) AND (a.lon =
(getwoalongitude((b.longitude)::numeric))))"
"  ->  Sort  (cost=2019.51..2037.61 rows=7239 width=16) (actual time=27.658..29.108 rows=2111 loops=1)"
"        Sort Key: a.lat, a.lon"
"        Sort Method:  quicksort  Memory: 212kB"
"        ->  Index Scan using depthidx on a (cost=0.00..1555.43 rows=7239 width=16) (actual time=0.467..21.646
rows=2111loops=1)" 
"              Index Cond: (depth = 0)"
"              Filter: ((value > 0::numeric) AND (value < 2::numeric))"
"  ->  Materialize  (cost=17871190.21..18083338.96 rows=16971900 width=420) (actual time=753605.414..884549.890
rows=16963350loops=1)" 
"        ->  Sort  (cost=17871190.21..17913619.96 rows=16971900 width=420) (actual time=753605.407..822844.299
rows=16963350loops=1)" 
"              Sort Key: (getwoalatitude((b.latitude)::numeric)), (getwoalongitude((b.longitude)::numeric))"
"              Sort Method:  external merge  Disk: 4599344kB"
"              ->  Seq Scan on b  (cost=0.00..750696.00 rows=16971900 width=420) (actual time=0.095..229888.646
rows=16971901loops=1)" 
"Total runtime: 911284.022 ms"

Any thoughts?

Thanks,
-Chris

Re: query speed question

От
Alban Hertroys
Дата:
On 10 Sep 2009, at 19:09, Christopher Condit wrote:

> Here are both queries (with explain analyze):
>
> EXPLAIN ANALYZE SELECT B.* FROM B,
> (SELECT lat, lon FROM A WHERE value > 0 AND value < 2 AND depth = 0)
> AS foo WHERE getwoalatitude(B.latitude::numeric) = foo.lat AND
> getwoalongitude(B.longitude::numeric) = foo.lon
>
> after vacuum analyze:
> "Merge Join  (cost=17873237.91..26594735.94 rows=1984509 width=420)
> (actual time=752983.201..941125.197 rows=226941 loops=1)"
> "  Merge Cond: ((a.lat = (getwoalatitude((b.latitude)::numeric)))
> AND (a.lon = (getwoalongitude((b.longitude)::numeric))))"
> "  ->  Sort  (cost=2019.51..2037.61 rows=7239 width=16) (actual
> time=30.704..32.171 rows=2111 loops=1)"
> "        Sort Key: a.lat, a.lon"
> "        Sort Method:  quicksort  Memory: 212kB"
> "        ->  Index Scan using depthidx on a  (cost=0.00..1555.43
> rows=7239 width=16) (actual time=0.533..24.631 rows=2111 loops=1)"
> "              Index Cond: (depth = 0)"
> "              Filter: ((value > 0::numeric) AND (value <
> 2::numeric))"

Here you can see that the estimated row counts are way off compared to
the actual number of rows. That means the statistics for that table
aren't reliable; either you need to ANALYZE them or you need to
increase the statistics size of the columns involved.

This part of the query is the quicker part, but it may be possible
that the planner wouldn't have picked a merge-join with the remainder
if the estimates were closer to reality.

You can paste those queries in http://explain-analyze.info/, although
you'll need to strip the quotes from the plan (not sure where those
get added, psql doesn't add them).

> "  ->  Materialize  (cost=17871190.21..18083338.96 rows=16971900
> width=420) (actual time=751324.751..919278.574 rows=16963350 loops=1)"
> "        ->  Sort  (cost=17871190.21..17913619.96 rows=16971900
> width=420) (actual time=751324.744..820522.604 rows=16963350 loops=1)"
> "              Sort Key: (getwoalatitude((b.latitude)::numeric)),
> (getwoalongitude((b.longitude)::numeric))"
> "              Sort Method:  external merge  Disk: 4599344kB"
> "              ->  Seq Scan on b  (cost=0.00..750696.00
> rows=16971900 width=420) (actual time=1.781..229158.949
> rows=16971901 loops=1)"
> "Total runtime: 942295.914 ms"

Here's where most time is spent. Firstly, an external merge to disk is
going to be slow. I think the setting controlling the amount of memory
available for sorting is work_mem, you can try increasing that in the
client (set work_mem TO <some value>) before executing your query and
see what value is sufficient.

The other issue here is the sequential scan on table b. I think that's
caused by the planner needing to merge to disk though.

> EXPLAIN ANALYZE SELECT b.* FROM b JOIN a ON (getwoalatitude
> (b.latitude::numeric) = a.lat AND getwoalongitude
> (b.longitude::numeric) = a.lon) WHERE a.value > 0 AND a.value < 2
> AND a.depth = 0
>
> "Merge Join  (cost=17873237.91..26594735.94 rows=1984509 width=420)
> (actual time=755255.801..906407.961 rows=226941 loops=1)"
> "  Merge Cond: ((a.lat = (getwoalatitude((b.latitude)::numeric)))
> AND (a.lon = (getwoalongitude((b.longitude)::numeric))))"
> "  ->  Sort  (cost=2019.51..2037.61 rows=7239 width=16) (actual
> time=27.658..29.108 rows=2111 loops=1)"
> "        Sort Key: a.lat, a.lon"
> "        Sort Method:  quicksort  Memory: 212kB"
> "        ->  Index Scan using depthidx on a (cost=0.00..1555.43
> rows=7239 width=16) (actual time=0.467..21.646 rows=2111 loops=1)"
> "              Index Cond: (depth = 0)"
> "              Filter: ((value > 0::numeric) AND (value <
> 2::numeric))"
> "  ->  Materialize  (cost=17871190.21..18083338.96 rows=16971900
> width=420) (actual time=753605.414..884549.890 rows=16963350 loops=1)"
> "        ->  Sort  (cost=17871190.21..17913619.96 rows=16971900
> width=420) (actual time=753605.407..822844.299 rows=16963350 loops=1)"
> "              Sort Key: (getwoalatitude((b.latitude)::numeric)),
> (getwoalongitude((b.longitude)::numeric))"
> "              Sort Method:  external merge  Disk: 4599344kB"
> "              ->  Seq Scan on b  (cost=0.00..750696.00
> rows=16971900 width=420) (actual time=0.095..229888.646
> rows=16971901 loops=1)"
> "Total runtime: 911284.022 ms"

This plan is actually very similar to the original query's plan, it
has the same problems.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.


!DSPAM:737,4aa939f812071577543529!