Обсуждение: Advice on geolocation
I use a static postgresql database to serve web pages containing information about board game players and their ratings in various games. There are about 7000 players listed and about 300 games. I don't expect this to grow quickly in the short run. Most of the players are located in the US. One of the planned improvements is to publish contact information and to let people search for other people who live nearby and have common interests in games. I have street addresses for most of these people, though they currenty aren't in the database. I am also in the process of using Daniel Egnor's geocoder project (for the Google contest) working so that I can get Latitude and Longitude for most of these addresses. I expect to be able to do this for something like 5000 people. I am looking for hints on how to best use this information for radius queries. I have tried looking at Gist and PostGIS documentation. It looks like for Gist I would need to create a new datatype (since I don't want to use a flat earth model making point unsuitable) and figure out an efficient indexing scheme. I am pretty sure this is beyond my capability to do (in the short run at least). It wasn't clear to me that PostGIS would be fast for the small amount of data I have as their data models looked pretty general and probably had a lot of overhead. Can people recommend using either of the above approaches or any other open source solution? I also am not sure if using 2d lat+long or 3d rectangular is a better way to go. 3d is simpler as there is no wrap around. If I use a spherical model for the earth, then for any great circle distance I can compute a secant distance and using that as a bounding box distance for searches. This seems a lot easier than doing things with lat+long. If I do this there may be tricks for setting up the coordinates so that indexes work better. Since initially I will only have US data (and even later I expect to have mostly US data) setting up coordinates so that they are used from most discrimating to least discrinimating seems to be a good idea. I didn't see much on this issue at the main parts of the PostGIS site, but I haven't tried to read any of the referenced papers so far.
I've done quite a bit of this as I used to work on a GIS product. Given to lat,lon pairs it is not hard to find the great circle distance between them. This assumes a spherical Earth but good enough for your purposes as the error will be under a few hundred meters What you do is assume a plane defined by three points: your two geolocations and the Earth's center. Next assume a circle the same radius as the Earth (look this up) with center at the Earth's center. Your two geolocations are on the edge of the circle. Find the angle between the points then divide that angle over 360 degrees and multiply by the circumference of the circle (Earth). So we have converted a spherical geometry problem (hard) into a plain geometry problem (easier high school math) The trick to computing the angle between the two locations is to convert lat,long, at one Earth radius to (X,Y,Z) then you have two vectors and can use length,length,length to define a plain triangle. This is an easy one because you can convert it basic plain trig and you don't have to hunt down a specialist in spherical trigonometry --- Bruno Wolff III <bruno@wolff.to> wrote: > I use a static postgresql database to serve web pages containing > information > about board game players and their ratings in various games. There > are > about 7000 players listed and about 300 games. I don't expect this to > grow > quickly in the short run. Most of the players are located in the US. > > One of the planned improvements is to publish contact information and > to let people search for other people who live nearby and have common > interests in games. I have street addresses for most of these people, > though they currenty aren't in the database. I am also in the process > of using Daniel Egnor's geocoder project (for the Google contest) > working so that I can get Latitude and Longitude for most of these > addresses. I expect to be able to do this for something like 5000 > people. > > I am looking for hints on how to best use this information for radius > queries. I have tried looking at Gist and PostGIS documentation. > It looks like for Gist I would need to create a new datatype (since > I don't want to use a flat earth model making point unsuitable) and > figure out an efficient indexing scheme. I am pretty sure this is > beyond my > capability to do (in the short run at least). It wasn't clear to me > that > PostGIS would be fast for the small amount of data I have as their > data > models looked pretty general and probably had a lot of overhead. Can > people > recommend using either of the above approaches or any other open > source > solution? > > I also am not sure if using 2d lat+long or 3d rectangular is a better > way > to go. 3d is simpler as there is no wrap around. If I use a spherical > model for the earth, then for any great circle distance I can compute > a secant distance and using that as a bounding box distance for > searches. > This seems a lot easier than doing things with lat+long. If I do this > there may be tricks for setting up the coordinates so that indexes > work > better. Since initially I will only have US data (and even later I > expect > to have mostly US data) setting up coordinates so that they are used > from > most discrimating to least discrinimating seems to be a good idea. I > didn't > see much on this issue at the main parts of the PostGIS site, but I > haven't > tried to read any of the referenced papers so far. > > ---------------------------(end of > broadcast)--------------------------- > TIP 5: Have you checked our extensive FAQ? > > http://www.postgresql.org/users-lounge/docs/faq.html ===== Chris Albertson Home: 310-376-1029 chrisalbertson90278@yahoo.com Cell: 310-990-7550 Office: 310-336-5189 Christopher.J.Albertson@aero.org __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com
On Fri, Jul 26, 2002 at 09:33:53 -0700, Chris Albertson <chrisalbertson90278@yahoo.com> wrote: > > I've done quite a bit of this as I used to work on a GIS > product. Given to lat,lon pairs it is not hard to find > the great circle distance between them. This assumes a > spherical Earth but good enough for your purposes as the > error will be under a few hundred meters My concern about this is that it will not be indexable. I suspect, but don't know, that this won't really be an issue for the small number (~5000) points of data (especially since I expect other constraints to be used in most queries).
On Fri, 26 Jul 2002, Bruno Wolff III wrote: [snip] > My concern about this is that it will not be indexable. I suspect, but don't > know, that this won't really be an issue for the small number (~5000) points > of data (especially since I expect other constraints to be used in most > queries). Bruno, check out PostGIS a bit more. I believe you'll find that a lot of the fuctionality you're wanting is there or coming soon. I think there is even a distance function. Rod -- "Open Source Software - Sometimes you get more than you paid for..."
--- Bruno Wolff III <bruno@wolff.to> wrote: > On Fri, Jul 26, 2002 at 09:33:53 -0700, > Chris Albertson <chrisalbertson90278@yahoo.com> wrote: > > > > I've done quite a bit of this as I used to work on a GIS > > product. Given to lat,lon pairs it is not hard to find > > the great circle distance between them. This assumes a > > spherical Earth but good enough for your purposes as the > > error will be under a few hundred meters > > My concern about this is that it will not be indexable. I suspect, > but don't > know, that this won't really be an issue for the small number (~5000) > points > of data (especially since I expect other constraints to be used in > most > queries). The way I handle the non-indexable problem in an astronomical database the has tens of millions of stars in it is to fist make a crude querry. I SELECT all stars in a "box" then apply the great circle method only to those. For example if you wanted to find evey point within 500 miles of some point first assume a flat Earth and assume each degree equals 60 miles. This will give you to many hits but you only need to apply the more exact method to a small subset of the data. For many purposes (say within a couple hundred miles.) the "flat Earth" model is good enough if you assume degrees of latitude = 60 miles and degrees of long. = cos(lat)*60 mile per degree. then simply use 60*sqrt(lat^2+long2) to compute the distance (60 is for nautical miles can't remember the stature mile conversion right now.) For bigger distances like USA to Japan you really do need the great circle method. BUt "flat Earth" works for inside the same city ===== Chris Albertson Home: 310-376-1029 chrisalbertson90278@yahoo.com Cell: 310-990-7550 Office: 310-336-5189 Christopher.J.Albertson@aero.org __________________________________________________ Do You Yahoo!? Yahoo! Health - Feel better, live better http://health.yahoo.com
Chris Albertson wrote: > --- Bruno Wolff III <bruno@wolff.to> wrote: > >>On Fri, Jul 26, 2002 at 09:33:53 -0700, >> Chris Albertson <chrisalbertson90278@yahoo.com> wrote: >> >>>I've done quite a bit of this as I used to work on a GIS >>>product. Given to lat,lon pairs it is not hard to find >>>the great circle distance between them. This assumes a >>>spherical Earth but good enough for your purposes as the >>>error will be under a few hundred meters >> >>My concern about this is that it will not be indexable. I suspect, >>but don't >>know, that this won't really be an issue for the small number (~5000) >>points >>of data (especially since I expect other constraints to be used in >>most >>queries). I haven't really followed this entire thread, but FWIW, here is a plpgsql function to get great circle distance given a pair of lat/lons (the Haversine formula). There is also something in contrib (see contrib/earthdistance) to do this, but I've used this on a webhost where I couldn't install my own C libraries. CREATE FUNCTION "geodist" (float8,float8,float8,float8 ) RETURNS float8 AS ' DECLARE lat1 ALIAS FOR $1; lon1 ALIAS FOR $2; lat2 ALIAS FOR $3; lon2 ALIAS FOR $4; dist float8; BEGIN dist := 0.621 * 6371.2 * 2 * atan2( sqrt(abs(0 + pow(sin(radians(lat2)/2 - radians(lat1)/2),2) + cos(radians(lat1)) * cos(radians(lat2)) * pow(sin(radians(lon2)/2 - radians(lon1)/2),2))),sqrt(abs(1 - pow(sin(radians(lat2)/2 - radians(lat1)/2),2) + cos(radians(lat1)) * cos(radians(lat2)) * pow(sin(radians(lon2)/2 - radians(lon1)/2),2)))); return dist; END; ' LANGUAGE 'plpgsql'; I used this for finding US zipcodes within a certain distance of the given zipcode. To improve performance, I also used a "box" around the lat/lon pairs: . . . WHERE abs(z.lat - $lat1d) * 60 * 1.15078 <= $dist and abs(z.long - $lon1d) * 60 * 1.15078 <= $dist and geodist($lat1d,$lon1d,lat,long) <= $dist and z.zip = az.zipcode . . . This limits the data being considered to a square area twice the dimension of your desired distance, and then the distance calc further restricts down to a circle of radius $dist. Hopefully you can get the idea from this snippet. HTH, Joe