Обсуждение: performance issue with distance function
I am trying to execute the following command:
SELECT R.*, distance(L1.lat, L1.lon, L2.lat, L2.lon) AS Distance
FROM Restaurants R, Locations L1, Locations L2, FoodTypeRestaurantIDX FTR
WHERE R.Zipcode=L1.Zipcode AND L2.Zipcode = '93705' AND R.Delivery=true AND
R.RestaurantID=FTR.RestaurantID AND FTR.FoodTypeID=1 AND distance(L1.lat,
L1.lon, L2.lat, L2.lon) <= 60
LIMIT 100  OFFSET 0
I would rather execute this command: (but i get the error "Attribute
'distance' not found")
SELECT R.*, distance(L1.lat, L1.lon, L2.lat, L2.lon) AS Distance
FROM Restaurants R, Locations L1, Locations L2, FoodTypeRestaurantIDX FTR
WHERE R.Zipcode=L1.Zipcode AND L2.Zipcode = '93705' AND R.Delivery=true AND
R.RestaurantID=FTR.RestaurantID AND FTR.FoodTypeID=1 AND distance <= 60
LIMIT 100  OFFSET 0
Having that second distance function in the "WHERE" section of my sql
statement is costing me at least 10-20 seconds of execution time.  I am
looking for a solution that will speed this up. Does anyone have any advice.
Thanks in advance.
Ryan
PS. Here are the respective execution plans:
Merge Join  (cost=0.00..3463985.82 rows=4342404 width=202) ->  Nested Loop  (cost=0.00..3461172.63 rows=14735
width=166)      ->  Nested Loop  (cost=0.00..127378.88 rows=147350 width=162)             ->  Index Scan using
restaurantszipcodeidxon restaurants r
 
(cost=0.00..62.50 rows=500 width=138)             ->  Index Scan using locationszipcodeidx on locations l2
(cost=0.00..251.69 rows=295 width=24)       ->  Seq Scan on foodtyperestaurantidx ftr  (cost=0.00..22.50 rows=10
width=4) ->  Index Scan using locationszipcodeidx on locations l1
(cost=0.00..2260.63 rows=29470 width=36)
Hash Join  (cost=74.08..804150.11 rows=1447468 width=266) ->  Nested Loop  (cost=0.00..326410.91 rows=2894936
width=120)      ->  Index Scan using locationszipcodeidx on locations l2
 
(cost=0.00..251.69 rows=295 width=60)       ->  Seq Scan on locations l1  (cost=0.00..664.70 rows=29470
width=60) ->  Hash  (cost=73.96..73.96 rows=50 width=146)       ->  Merge Join  (cost=67.58..73.96 rows=50 width=146)
         ->  Sort  (cost=22.67..22.67 rows=10 width=8)                   ->  Seq Scan on foodtyperestaurantidx ftr
 
(cost=0.00..22.50 rows=10 width=8)             ->  Sort  (cost=44.91..44.91 rows=500 width=138)                   ->
SeqScan on restaurants r  (cost=0.00..22.50
 
rows=500 width=138)
			
		Ryan Littrell wrote: > > I am trying to execute the following command: > > SELECT R.*, distance(L1.lat, L1.lon, L2.lat, L2.lon) AS Distance > FROM Restaurants R, Locations L1, Locations L2, FoodTypeRestaurantIDX FTR > WHERE R.Zipcode=L1.Zipcode AND L2.Zipcode = '93705' AND R.Delivery=true AND > R.RestaurantID=FTR.RestaurantID AND FTR.FoodTypeID=1 AND distance(L1.lat, > L1.lon, L2.lat, L2.lon) <= 60 > LIMIT 100 OFFSET 0 > > I would rather execute this command: (but i get the error "Attribute > 'distance' not found") > > SELECT R.*, distance(L1.lat, L1.lon, L2.lat, L2.lon) AS Distance > FROM Restaurants R, Locations L1, Locations L2, FoodTypeRestaurantIDX FTR > WHERE R.Zipcode=L1.Zipcode AND L2.Zipcode = '93705' AND R.Delivery=true AND > R.RestaurantID=FTR.RestaurantID AND FTR.FoodTypeID=1 AND distance <= 60 > LIMIT 100 OFFSET 0 > > Having that second distance function in the "WHERE" section of my sql > statement is costing me at least 10-20 seconds of execution time. I am > looking for a solution that will speed this up. Does anyone have any advice. > Thanks in advance. > this probably isn't what you want, but would it speed things up if you did an "order by distance" instead of doing the "distance <= 60", then having your application cut the results at 60? that should work from a language perspective, at least, and if the distance function is pretty computationally intensive, it should help. -- Jeff Hoffmann PropertyKey.com
Hi Ryan, There is a bit of a strange way around the distance overhead issue : Create another table with structure like (lat1,long1,zip1,lat2,long2,zip2,distance) and precalculate the distance for each possibility. This means n*(n-1) rows if you have n location rows. You would then include this table in your query and use distance like you wanted to initially ( should work fast provided you index it on lat1,long1,distance) The calculation overhead of distance is then removed from your query ( at the expense of some disk space ). The insert of each new location requires n calculations of distance - you could perform this in the background I guess ! regards Mark