Обсуждение: avg() with floating-point types
I have city and postal_code tables linked by city_postal_code through a city_id and postal_code_id. The postal_codes have latitude/longitude, the cities don't. I want to set the city lat/long to the average of the associated postal codes (abstract for a minute on whether that actually makes sense from a geographical perspective), so I have a statement: update city set latitude = city2.lat from (select c.city_id, avg(pc.latitude) as lat from city c left join city_postal_codecpc using (city_id) left join postal_code pc using (postal_code_id) group by c.city_id) city2 where city2.city_id = city.city_id The datatype of both city.latitude and postal_code.latitude is number(16,12). This works, but I would like to understand why there is sometimes a discrepancy between avg(pc.latitude) and what actually gets inserted into the city table -- is it the usual floating-point discrepancy or is there something I can do about it? E.g. after the above update: select c.latitude, avg(pc.latitude), c.latitude-avg(pc.latitude) as diff from city c left join city_postal_code cpc using(city_id) left join postal_code pc using (postal_code_id) group by c.city_id,c.latitude having avg(pc.latitude)!= c.latitude latitude | avg | diff -----------------+---------------------+---------------------36.709374333333 | 36.7093743333333333 | -0.000000000000333341.078385733333| 41.0783857333333333 | -0.000000000000333331.576437888889 | 31.5764378888888889 | 0.000000000000111142.666669666667| 42.6666696666666667 | 0.000000000000333335.104581166667 | 35.1045811666666667 | 0.000000000000333331.263006142857| 31.2630061428571429 | -0.000000000000142938.805648772727 | 38.8056487727272727 | -0.0000000000002727 ... An additional question -- is the UPDATE above written as cleanly as possible (I am not very confident on my understanding of UPDATE-SET-FROM syntax)?
"George Pavlov" <gpavlov@mynewplace.com> writes: > The datatype of both city.latitude and postal_code.latitude is > number(16,12). > This works, but I would like to understand why there is sometimes a > discrepancy between avg(pc.latitude) and what actually gets inserted > into the city table -- is it the usual floating-point discrepancy or is > there something I can do about it? E.g. after the above update: You're forcing the result of the avg() calculation to be rounded to 12 digits when you store it into city.latitude. Your example simply shows that the avg() calculation is being done to more precision than that. regards, tom lane
On Sun, Jan 01, 2006 at 04:40:18PM -0800, George Pavlov wrote: > The datatype of both city.latitude and postal_code.latitude is > number(16,12). > > This works, but I would like to understand why there is sometimes a > discrepancy between avg(pc.latitude) and what actually gets inserted > into the city table -- is it the usual floating-point discrepancy or is > there something I can do about it? E.g. after the above update: You've declared the numeric column as (16,12) so the inserted values are rounded to the 12th decimal place. Notice that that's where the values start to differ: > latitude | avg | diff > -----------------+---------------------+--------------------- > 36.709374333333 | 36.7093743333333333 | -0.0000000000003333 Is such a difference significant to your application? The distance discrepancy is on the order of tens of nanometers, which seems absurdly precise. With lat/lon coordinates five places after the decimal point is about meter-precision, which is probably more than adequate for whatever you're doing. Incidentally, if you're doing anything geospatial then you might want to check out PostGIS: http://www.postgis.org/ If you look around you should be able to find shapefiles with points or polygons for cities and postal codes. -- Michael Fuhr