Обсуждение: haversine formula with postgreSQL

Поиск
Список
Период
Сортировка

haversine formula with postgreSQL

От
Jonathan
Дата:
Hi!

I am looking at the PHP/MySQL Google Maps API store locator example
here:

http://code.google.com/apis/maps/articles/phpsqlsearch.html

And I'm trying to get this to work with PostgreSQL instead of MySQL.

I've (slightly) modified the haversine formula part of my PHP script
but I keep getting this error:

Invalid query: ERROR: column "distance" does not exist LINE
1: ...ude ) ) ) ) AS distance FROM aaafacilities HAVING distance <...
^

I'm new to this, but it doesn't look like I need to create a column in
my table for distance, or at least the directions didn't say to create
a distance column.

Here is my PHP with SQL:
$query = sprintf("SELECT 'ID', 'FACILITY', 'ADDRESS', latitude,
longitude, ( 3959 * acos( cos( radians('%s') ) * cos( radians
( latitude ) ) * cos( radians( longitude ) - radians('%s') ) + sin
( radians('%s') ) * sin( radians( latitude ) ) ) ) AS distance FROM
aaafacilities HAVING distance < '%s' ORDER BY dist LIMIT 0 OFFSET 20",
  pg_escape_string($center_lat),
  pg_escape_string($center_lng),
  pg_escape_string($center_lat),
  pg_escape_string($radius));

Does anyone have any ideas on how I can get this to work?  I'm not
sure what is wrong, since it doesn't seem like I need to create a
distance column and when I do create one, I get this:

Invalid query: ERROR: column "aaafacilities.latitude" must appear in
the GROUP BY clause or be used in an aggregate function

Thanks for any comments or suggestions.  I appreciate it.  I'm new to
this.

Jonathan Harahush

Re: haversine formula with postgreSQL

От
Scott Marlowe
Дата:
On Thu, Sep 17, 2009 at 1:16 PM, Jonathan <jharahush@gmail.com> wrote:
> Hi!
>
> I am looking at the PHP/MySQL Google Maps API store locator example
> here:
>
> http://code.google.com/apis/maps/articles/phpsqlsearch.html
>
> And I'm trying to get this to work with PostgreSQL instead of MySQL.
>
> I've (slightly) modified the haversine formula part of my PHP script
> but I keep getting this error:
>
> Invalid query: ERROR: column "distance" does not exist LINE
> 1: ...ude ) ) ) ) AS distance FROM aaafacilities HAVING distance <...
> ^
>
> I'm new to this, but it doesn't look like I need to create a column in
> my table for distance, or at least the directions didn't say to create
> a distance column.
>
> Here is my PHP with SQL:
> $query = sprintf("SELECT 'ID', 'FACILITY', 'ADDRESS', latitude,
> longitude, ( 3959 * acos( cos( radians('%s') ) * cos( radians
> ( latitude ) ) * cos( radians( longitude ) - radians('%s') ) + sin
> ( radians('%s') ) * sin( radians( latitude ) ) ) ) AS distance FROM
> aaafacilities HAVING distance < '%s' ORDER BY dist LIMIT 0 OFFSET 20",
>  pg_escape_string($center_lat),
>  pg_escape_string($center_lng),
>  pg_escape_string($center_lat),
>  pg_escape_string($radius));
>
> Does anyone have any ideas on how I can get this to work?  I'm not
> sure what is wrong, since it doesn't seem like I need to create a
> distance column and when I do create one, I get this:

Is that really the whole query?  Why a having with no group by?

Can you do me a favor and print out $query instead of the php stuff?
It might help you as well to troubleshoot to see the real query.

Re: haversine formula with postgreSQL

От
Tom Lane
Дата:
Jonathan <jharahush@gmail.com> writes:
> Here is my PHP with SQL:
> $query = sprintf("SELECT 'ID', 'FACILITY', 'ADDRESS', latitude,
> longitude, ( 3959 * acos( cos( radians('%s') ) * cos( radians
> ( latitude ) ) * cos( radians( longitude ) - radians('%s') ) + sin
> ( radians('%s') ) * sin( radians( latitude ) ) ) ) AS distance FROM
> aaafacilities HAVING distance < '%s' ORDER BY dist LIMIT 0 OFFSET 20",

Sigh, you've been misled by MySQL's nonstandard behavior.  You cannot
refer to output columns of a query in its HAVING clause; it's disallowed
per spec and not logically sensible either.  The simplest way to deal
with it is just to repeat the expression in HAVING.  If you really
really don't want to write it twice, you can use a subquery.

            regards, tom lane

Re: haversine formula with postgreSQL

От
Mike Christensen
Дата:
This behavior kinda gets me sometimes too, especially in WHERE clauses..

I'm a bit curious as to why this is so bad.  I could see why it would
be expensive to do, since your clause wouldn't be indexed - but why is
the syntax itself not allowed?  Repeating the clause isn't gonna gain
you any speed, and might make it even slower since the expression
would have to be evaluated twice right?  Perhaps I'm missing
something..

Mike

On Thu, Sep 17, 2009 at 2:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Jonathan <jharahush@gmail.com> writes:
>> Here is my PHP with SQL:
>> $query = sprintf("SELECT 'ID', 'FACILITY', 'ADDRESS', latitude,
>> longitude, ( 3959 * acos( cos( radians('%s') ) * cos( radians
>> ( latitude ) ) * cos( radians( longitude ) - radians('%s') ) + sin
>> ( radians('%s') ) * sin( radians( latitude ) ) ) ) AS distance FROM
>> aaafacilities HAVING distance < '%s' ORDER BY dist LIMIT 0 OFFSET 20",
>
> Sigh, you've been misled by MySQL's nonstandard behavior.  You cannot
> refer to output columns of a query in its HAVING clause; it's disallowed
> per spec and not logically sensible either.  The simplest way to deal
> with it is just to repeat the expression in HAVING.  If you really
> really don't want to write it twice, you can use a subquery.
>
>                        regards, tom lane
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Re: haversine formula with postgreSQL

От
Jonathan
Дата:
It's the whole query as far as I can tell.  The app takes input from
the user --- the user enters an address and chooses a radius ("show me
all facilities within 5 miles of this address") and then the latitude
and longitude of the address and the radius is passed into the query
so that the database can grab all locations within a certain radius.

The example provided by Google is using MySQL.  The query for MySQL
looks like this:

SELECT address, name, lat, lng, ( 3959 * acos( cos( radians('%s') ) *
cos( radians( lat ) ) * cos( radians( lng ) - radians('%s') ) + sin
( radians('%s') ) * sin( radians( lat ) ) ) ) AS distance FROM markers
HAVING distance < '%s' ORDER BY distance LIMIT 0 , 20

And I'm attempting to change it to work with Postgres and have done
this:

SELECT 'ID', 'FACILITY', 'ADDRESS', latitude, longitude, ( 3959 * acos
( cos( radians('%s') ) * cos( radians( latitude ) ) * cos( radians
( longitude ) - radians('%s') ) + sin( radians('%s') ) * sin( radians
( latitude ) ) ) ) AS distance FROM aaafacilities HAVING distance <
'%s' ORDER BY distance LIMIT 0 OFFSET 20


aaafacilities is my table name in my posgres database.

I'm sorry if this isn't enough info.. like I said, I'm new to this but
definitely interested in learning and figuring this out!

From what I can tell, the database is supposed to calculate and then
output the distance of each "match" but it seems like in the MySQL
example, it can do this without having an actual distance column in
the database.

Thanks again!

Re: haversine formula with postgreSQL

От
Tom Lane
Дата:
Mike Christensen <mike@kitchenpc.com> writes:
> This behavior kinda gets me sometimes too, especially in WHERE clauses..
> I'm a bit curious as to why this is so bad.  I could see why it would
> be expensive to do, since your clause wouldn't be indexed - but why is
> the syntax itself not allowed?

It's not logically sensible: per the implicit execution model defined by
the spec, the output list is not computed until after WHERE/HAVING/etc
are evaluated, so it makes no sense to refer to output expressions in
those clauses.  As an example of why the execution ordering is
important, you'd be pretty unhappy if this threw a division-by-zero
error:

    select 1/avg(x) from tab group by y having avg(x) <> 0;

While we could interpret such a reference as meaning to copy the output
expression into the other clause, it doesn't seem like a particularly
good idea to encourage confusion about what the syntax means.

Also, allowing references to output column names here actually creates
an ambiguity: for instance "distance" could also be a column name
available from some table in the FROM clause.  So it's not exactly
cost-free to allow this; it will likely result in queries being silently
interpreted in some way other than what the author expected.

BTW, there are two cases where Postgres *does* allow such references:

1. In ORDER BY.  This is mainly because the SQL spec used to require it.
It's actually logically consistent because ORDER BY is notionally
executed after forming the output expressions, but it's still confusing.
The spec authors thought better of this idea and removed it in SQL99,
but we're still stuck supporting it for backwards compatibility reasons.

2.  In GROUP BY.  This is, frankly, a mistake, and one I wish we could
have a do-over on.  Again we're stuck with it for compatibility reasons,
but we're not likely to extend the mistake to other clauses.

In both these cases, to reduce the scope for ambiguity problems we only
allow references to output columns as simple ORDER or GROUP list items
(for instance "ORDER BY distance" but not "ORDER BY abs(distance)").
This is all right because it's still frequently useful, but you'd seldom
write a WHERE or HAVING clause that consisted *only* of an output-column
name.  So even if we did extend the behavior it wouldn't help much,
unless we were to fling the doors wide open for ambiguity problems.

            regards, tom lane

Re: haversine formula with postgreSQL

От
"Brent Wood"
Дата:
A bit out in left field,

Writing your own haversine in Postgres seems a bit like reinventing a wooden wheel when you gan get a free pneumatic
one...

Any reason not to just install PostGIS & fully support geometries & projections in Postgres?

You can build the geometries provided to the functions on the fly from lat/lon coordinates stored as numerics in your
SQL,so your DB structures don't even have to change if you don't want them to.. 

http://www.postgis.org/documentation/manual-1.4/ST_Distance_Sphere.html
http://www.postgis.org/documentation/manual-1.4/ST_Distance_Spheroid.html


HTH

   Brent Wood


Brent Wood
DBA/GIS consultant
NIWA, Wellington
New Zealand
>>> Scott Marlowe <scott.marlowe@gmail.com> 09/18/09 11:35 AM >>>
On Thu, Sep 17, 2009 at 1:16 PM, Jonathan <jharahush@gmail.com> wrote:
> Hi!
>
> I am looking at the PHP/MySQL Google Maps API store locator example
> here:
>
> http://code.google.com/apis/maps/articles/phpsqlsearch.html
>
> And I'm trying to get this to work with PostgreSQL instead of MySQL.
>
> I've (slightly) modified the haversine formula part of my PHP script
> but I keep getting this error:
>
> Invalid query: ERROR: column "distance" does not exist LINE
> 1: ...ude ) ) ) ) AS distance FROM aaafacilities HAVING distance <...
> ^
>
> I'm new to this, but it doesn't look like I need to create a column in
> my table for distance, or at least the directions didn't say to create
> a distance column.
>
> Here is my PHP with SQL:
> $query = sprintf("SELECT 'ID', 'FACILITY', 'ADDRESS', latitude,
> longitude, ( 3959 * acos( cos( radians('%s') ) * cos( radians
> ( latitude ) ) * cos( radians( longitude ) - radians('%s') ) + sin
> ( radians('%s') ) * sin( radians( latitude ) ) ) ) AS distance FROM
> aaafacilities HAVING distance < '%s' ORDER BY dist LIMIT 0 OFFSET 20",
>  pg_escape_string($center_lat),
>  pg_escape_string($center_lng),
>  pg_escape_string($center_lat),
>  pg_escape_string($radius));
>
> Does anyone have any ideas on how I can get this to work?  I'm not
> sure what is wrong, since it doesn't seem like I need to create a
> distance column and when I do create one, I get this:

Is that really the whole query?  Why a having with no group by?

Can you do me a favor and print out $query instead of the php stuff?
It might help you as well to troubleshoot to see the real query.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

NIWA is the trading name of the National Institute of Water & Atmospheric Research Ltd.

Re: haversine formula with postgreSQL

От
Jonathan Harahush
Дата:
I do have PostGIS installed and I use it for other things (geoserver), but I'm not knowledgeable enough about it to the point where I understand how to get it to work with the Google Maps API.  I'll look into it.  In the meantime, I was hoping to create something based off of the GMaps/PHP/MySQL example I referenced in an earlier post since I'm still learning.

The reason why I'm using Postgres is because it's installed at work.  We don't use MySQL.

Thanks for all of the help so far!  I appreciate it.

On Thu, Sep 17, 2009 at 5:50 PM, Brent Wood <b.wood@niwa.co.nz> wrote:
A bit out in left field,

Writing your own haversine in Postgres seems a bit like reinventing a wooden wheel when you gan get a free pneumatic one...

Any reason not to just install PostGIS & fully support geometries & projections in Postgres?

You can build the geometries provided to the functions on the fly from lat/lon coordinates stored as numerics in your SQL, so your DB structures don't even have to change if you don't want them to..

http://www.postgis.org/documentation/manual-1.4/ST_Distance_Sphere.html
http://www.postgis.org/documentation/manual-1.4/ST_Distance_Spheroid.html


HTH

  Brent Wood


Brent Wood
DBA/GIS consultant
NIWA, Wellington
New Zealand
>>> Scott Marlowe <scott.marlowe@gmail.com> 09/18/09 11:35 AM >>>
On Thu, Sep 17, 2009 at 1:16 PM, Jonathan <jharahush@gmail.com> wrote:
> Hi!
>
> I am looking at the PHP/MySQL Google Maps API store locator example
> here:
>
> http://code.google.com/apis/maps/articles/phpsqlsearch.html
>
> And I'm trying to get this to work with PostgreSQL instead of MySQL.
>
> I've (slightly) modified the haversine formula part of my PHP script
> but I keep getting this error:
>
> Invalid query: ERROR: column "distance" does not exist LINE
> 1: ...ude ) ) ) ) AS distance FROM aaafacilities HAVING distance <...
> ^
>
> I'm new to this, but it doesn't look like I need to create a column in
> my table for distance, or at least the directions didn't say to create
> a distance column.
>
> Here is my PHP with SQL:
> $query = sprintf("SELECT 'ID', 'FACILITY', 'ADDRESS', latitude,
> longitude, ( 3959 * acos( cos( radians('%s') ) * cos( radians
> ( latitude ) ) * cos( radians( longitude ) - radians('%s') ) + sin
> ( radians('%s') ) * sin( radians( latitude ) ) ) ) AS distance FROM
> aaafacilities HAVING distance < '%s' ORDER BY dist LIMIT 0 OFFSET 20",
>  pg_escape_string($center_lat),
>  pg_escape_string($center_lng),
>  pg_escape_string($center_lat),
>  pg_escape_string($radius));
>
> Does anyone have any ideas on how I can get this to work?  I'm not
> sure what is wrong, since it doesn't seem like I need to create a
> distance column and when I do create one, I get this:

Is that really the whole query?  Why a having with no group by?

Can you do me a favor and print out $query instead of the php stuff?
It might help you as well to troubleshoot to see the real query.

--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

NIWA is the trading name of the National Institute of Water & Atmospheric Research Ltd.

Re: haversine formula with postgreSQL

От
Scott Marlowe
Дата:
On Thu, Sep 17, 2009 at 8:37 PM, Jonathan Harahush <jharahush@gmail.com> wrote:
> I do have PostGIS installed and I use it for other things (geoserver), but
> I'm not knowledgeable enough about it to the point where I understand how to
> get it to work with the Google Maps API.  I'll look into it.  In the
> meantime, I was hoping to create something based off of the GMaps/PHP/MySQL
> example I referenced in an earlier post since I'm still learning.
> The reason why I'm using Postgres is because it's installed at work.  We
> don't use MySQL.
>
> Thanks for all of the help so far!  I appreciate it.

The good news is that while postgresql is more persnickity about SQL
standards and it feels like you're dealing with a harsh task master,
the lessons learned are good ones.  They'll translate to other
database engines, and in the future when mysql grows more standards
compliant too.

Re: haversine formula with postgreSQL

От
Greg Smith
Дата:
On Thu, 17 Sep 2009, Jonathan wrote:

> Invalid query: ERROR: column "distance" does not exist LINE
> 1: ...ude ) ) ) ) AS distance FROM aaafacilities HAVING distance <...
> ^

You can't use distance in a HAVING clause if distance is computed as part
of the query result.  You can rewrite this to use a subquery instead:

SELECT * FROM
   (SELECT 'ID', 'FACILITY', 'ADDRESS', latitude, longitude,
     ( 3959 * acos( cos( radians('%s') ) * cos( radians
     ( latitude ) ) * cos( radians( longitude ) - radians('%s') ) + sin
     ( radians('%s') ) * sin( radians( latitude ) ) ) ) AS distance
    FROM aaafacilities)
HAVING distance < '%s' ORDER BY dist LIMIT 0 OFFSET 20

And I think that might work for you.  The performance is going to be
miserable for large stat sets, because it's going to scan the whole
aaafacilities table every time and recompute every distance, but as an
example goes it's probably acceptable.

Be warned that this example is like a textbook example of how to introduce
a SQL injection vulnerability into your code.  Once you get the syntax
right, you should be using a parameterized query here rather than
generting the query using sprintf before exposing this code to the outside
world.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: haversine formula with postgreSQL

От
Scott Bailey
Дата:
> And I think that might work for you.  The performance is going to be
> miserable for large stat sets, because it's going to scan the whole
> aaafacilities table every time and recompute every distance, but as an
> example goes it's probably acceptable.

Something I did when implementing haversine in Oracle was first
calculate a bounding box for lat/lon and only calculate the distance for
cities w/in the bounding box.

Scott

Re: haversine formula with postgreSQL

От
Merlin Moncure
Дата:
On Fri, Sep 18, 2009 at 5:50 PM, Scott Bailey <artacus@comcast.net> wrote:
>> And I think that might work for you.  The performance is going to be
>> miserable for large stat sets, because it's going to scan the whole
>> aaafacilities table every time and recompute every distance, but as an
>> example goes it's probably acceptable.
>
> Something I did when implementing haversine in Oracle was first calculate a
> bounding box for lat/lon and only calculate the distance for cities w/in the
> bounding box.

This is easily done in postgresql with gist operator on box type.  In
some cases you can actually optimize it even further...you can make a
'inner' bounding box that captures points that do not need the
expensive distance function.  So you use gist to pull points that
'may' be inside the distance, box overlap the points that 'must' be
inside the distance, and distance function the rest.

Also I highly suggest moving the distance calculation inside an
immutable function.

merlin

Re: haversine formula with postgreSQL

От
Rob Wultsch
Дата:
On Thu, Sep 17, 2009 at 2:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Jonathan <jharahush@gmail.com> writes:
> Here is my PHP with SQL:
> $query = sprintf("SELECT 'ID', 'FACILITY', 'ADDRESS', latitude,
> longitude, ( 3959 * acos( cos( radians('%s') ) * cos( radians
> ( latitude ) ) * cos( radians( longitude ) - radians('%s') ) + sin
> ( radians('%s') ) * sin( radians( latitude ) ) ) ) AS distance FROM
> aaafacilities HAVING distance < '%s' ORDER BY dist LIMIT 0 OFFSET 20",

Sigh, you've been misled by MySQL's nonstandard behavior.  You cannot
refer to output columns of a query in its HAVING clause; it's disallowed
per spec and not logically sensible either.  The simplest way to deal
with it is just to repeat the expression in HAVING.  If you really
really don't want to write it twice, you can use a subquery.

                       regards, tom lane

This practice is also a bad habit for MySQL users. I regularly see queries from users that have conditions that logically belong in the WHERE clause but the user shoves it into the HAVING. This is often done without a specific GROUP BY. The MySQL optimizer does not deal with this well.

When would it make logical sense to have a HAVING clause that deals with a column that is not inside a aggregating function?
--
Rob Wultsch
wultsch@gmail.com

Re: haversine formula with postgreSQL

От
Tom Lane
Дата:
Rob Wultsch <wultsch@gmail.com> writes:
> On Thu, Sep 17, 2009 at 2:47 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Jonathan <jharahush@gmail.com> writes:
>>> Here is my PHP with SQL:
>>> $query = sprintf("SELECT 'ID', 'FACILITY', 'ADDRESS', latitude,
>>> longitude, ( 3959 * acos( cos( radians('%s') ) * cos( radians
>>> ( latitude ) ) * cos( radians( longitude ) - radians('%s') ) + sin
>>> ( radians('%s') ) * sin( radians( latitude ) ) ) ) AS distance FROM
>>> aaafacilities HAVING distance < '%s' ORDER BY dist LIMIT 0 OFFSET 20",
>>
>> Sigh, you've been misled by MySQL's nonstandard behavior.  You cannot
>> refer to output columns of a query in its HAVING clause; it's disallowed
>> per spec and not logically sensible either.  The simplest way to deal
>> with it is just to repeat the expression in HAVING.  If you really

> This practice is also a bad habit for MySQL users. I regularly see queries
> from users that have conditions that logically belong in the WHERE clause
> but the user shoves it into the HAVING. This is often done without a
> specific GROUP BY. The MySQL optimizer does not deal with this well.

Actually, that reminds me that there's an even bigger nonstandard
behavior here.  HAVING is *not* just some weird alternative spelling of
WHERE; it implies that the query is grouped.  If you don't have a GROUP
BY clause then the implication is that the query should yield just one
row, and what's more it can't refer to any variables at all except
within aggregate functions, since there are no grouped variables.
MySQL gets this flat wrong.  (In fairness, we used to get it wrong too,
until we read the spec more closely.)

> When would it make logical sense to have a HAVING clause that deals with a
> column that is not inside a aggregating function?

Well, HAVING is supposed to apply to the post-GROUP BY rows.  So 99%
of the time you would want the HAVING condition to involve an aggregate
function --- otherwise you ought to put it in WHERE and filter away
the unwanted rows sooner.  However I can imagine having a
very-expensive-to-execute test that you'd rather execute only once
per group.

            regards, tom lane