Обсуждение: Floating point comparison inconsistencies of the geometric types

Поиск

Список

Период

Сортировка

Floating point comparison inconsistencies of the geometric types

От

Emre Hasegeli

Дата:

27 мая 2016 г., 13:43:12

There are those macros defined for the built-in geometric types:

> #define EPSILON                 1.0E-06

> #define FPzero(A)               (fabs(A) <= EPSILON)
> #define FPeq(A,B)               (fabs((A) - (B)) <= EPSILON)
> #define FPne(A,B)               (fabs((A) - (B)) > EPSILON)
> #define FPlt(A,B)               ((B) - (A) > EPSILON)
> #define FPle(A,B)               ((A) - (B) <= EPSILON)
> #define FPgt(A,B)               ((A) - (B) > EPSILON)
> #define FPge(A,B)               ((B) - (A) <= EPSILON)

with this warning:

>  *    XXX These routines were not written by a numerical analyst.

Most of the geometric operators use those macros for comparison, but
those  do not:

* polygon << polygon
* polygon &< polygon
* polygon &> polygon
* polygon >> polygon
* polygon <<| polygon
* polygon &<| polygon
* polygon |&> polygon
* polygon |>> polygon
* box @> point
* point <@ box
* lseg <@ box
* circle @> point
* point <@ circle

This is really a bug that needs to be fixed one way or another.  I think
that it is better to fix it by removing the macros all together.  I
am not sure how useful they are in practice.  I haven't seen anyone
complaining about the above operators not using the macros.  Though,
people often complain about the ones using the macros and the problems
caused by them.

The hackers evidently don't like the macros, either.  That should be
why they are not used on the new operators.  What annoys me most about
this situation is the inconsistency blocks demonstrating our indexes.
Because of this, we had to rip out point type support from BRIN
inclusion operator class which could be useful to PostGIS.

Fixing it has been discussed many times before [1][2][3][4] with
no result.  Here is my plan to fix the situation covering the problems
around it:

1) Reimplement some operators to avoid divisions

The attached patch does it on some of the line operators.

2) Use exact comparison on everywhere except the operators fuzzy
   comparison is certainly required

The attach patch changes all operators except some "lseg" operators.
"lseg" stores two points on a line.  Most of the operations done on it
are lossy.  I don't see a problem treating them differently, those
operators are very unlikely to be index supported.

3) Check numbers for underflow and overflow

I am thinking to use CHECKFLOATVAL on utils/adt/float.c, but it is not
exposed outside at the moment.  Would it be okay to create a new header
file utils/float.h to increase code sharing between float and geometric
types?

4) Implement relative fuzzy comparison for the remaining operators

It is better to completely get rid of those macros while we are on it.
I think we can get away by implementing just a single function for fuzzy
equality, not for other comparisons.  I am inclined to put it to
utils/adt/float.c.  Is this a good idea?

Tom Lane commented on the function posted to the list [1] on 2002:

> Not like that. Perhaps use a fraction of the absolute value of the
> one with larger absolute value.  As-is it's hard to tell how FLT_EPSILON
> is measured.

I cannot image how the function would look like.  I would appreciate
any guidance.

5) Implement default hash operator class for all geometric types

This would solve most complained problem of those types allowing them
to used with DISTINCT and GROUP BY.

6) Implement default btree operator class at least for the point type

This would let the point type to be used with ORDER BY.  It is
relatively straight forward to implement it for the point type.  Is it
a good idea to somehow implement it for other types?

7) Add GiST index support for missing cross type operators

Currently only contained by operators are supported by an out-of-range
strategy number.  I think we can make the operator classes much nicer
by allowing really cross type operator families.

Comments?

[1] https://www.postgresql.org/message-id/flat/D90A5A6C612A39408103E6ECDD77B8290FD4E3@voyager.corporate.connx.com
[2] https://www.postgresql.org/message-id/flat/4A7C2C4B.5020508@netspace.net.au
[3] https://www.postgresql.org/message-id/flat/12549.1346111029@sss.pgh.pa.us
[4] https://www.postgresql.org/message-id/flat/20150512181307.GJ2523@alvh.no-ip.org

Вложения

0001-geo-ops-fpcomp-v01.patch

Re: Floating point comparison inconsistencies of the geometric types

От

Robert Haas

Дата:

01 июня 2016 г., 16:09:03

On Fri, May 27, 2016 at 6:43 AM, Emre Hasegeli <emre@hasegeli.com> wrote:
> There are those macros defined for the built-in geometric types:
>
>> #define EPSILON                 1.0E-06
>
>> #define FPzero(A)               (fabs(A) <= EPSILON)
>> #define FPeq(A,B)               (fabs((A) - (B)) <= EPSILON)
>> #define FPne(A,B)               (fabs((A) - (B)) > EPSILON)
>> #define FPlt(A,B)               ((B) - (A) > EPSILON)
>> #define FPle(A,B)               ((A) - (B) <= EPSILON)
>> #define FPgt(A,B)               ((A) - (B) > EPSILON)
>> #define FPge(A,B)               ((B) - (A) <= EPSILON)
>
> with this warning:
>
>>  *    XXX These routines were not written by a numerical analyst.

I agree that those macros looks like a pile of suck.  It's unclear to
me what purpose they're trying to accomplish, but regardless of what
it is, it's hard for me to believe that they are accomplishing it.
Whether 1.0E-06 is a correct fuzz factor presumably depends greatly on
the scale of the number; e.g. if the values are all like "3" it's
probably fine but if they are like "37142816124856" it's probably not
enough fuzz and if they are all like ".00000004" it's probably way too
much fuzz.

Figuring out what to do about it is harder.  Your proposal seems to be
to remove them except where we need the fuzzy behavior, which doesn't
sound unreasonable, but I don't personally understand why we need it
in some places and not others.  It would be good if some of the people
who are more numerically inclined than I am (and hate floats less, but
then that's everyone) could jump in here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Floating point comparison inconsistencies of the geometric types

От

Kevin Grittner

Дата:

01 июня 2016 г., 17:17:08

On Wed, Jun 1, 2016 at 8:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, May 27, 2016 at 6:43 AM, Emre Hasegeli <emre@hasegeli.com> wrote:
>> There are those macros defined for the built-in geometric types:
>>
>>> #define EPSILON                 1.0E-06
>>
>>> #define FPzero(A)               (fabs(A) <= EPSILON)
>>> #define FPeq(A,B)               (fabs((A) - (B)) <= EPSILON)
>>> #define FPne(A,B)               (fabs((A) - (B)) > EPSILON)
>>> #define FPlt(A,B)               ((B) - (A) > EPSILON)
>>> #define FPle(A,B)               ((A) - (B) <= EPSILON)
>>> #define FPgt(A,B)               ((A) - (B) > EPSILON)
>>> #define FPge(A,B)               ((B) - (A) <= EPSILON)
>>
>> with this warning:
>>
>>>  *    XXX These routines were not written by a numerical analyst.
>
> I agree that those macros looks like a pile of suck.

+1

> It's unclear to
> me what purpose they're trying to accomplish, but regardless of what
> it is, it's hard for me to believe that they are accomplishing it.
> Whether 1.0E-06 is a correct fuzz factor presumably depends greatly on
> the scale of the number; e.g. if the values are all like "3" it's
> probably fine but if they are like "37142816124856" it's probably not
> enough fuzz and if they are all like ".00000004" it's probably way too
> much fuzz.

Also, it's a pretty lame heuristic.  It doesn't really eliminate
*any* problems; it just aims to make them less frequent by shifting
the edges around.  In doing so, it creates whole new classes of
problems:

test=# \set A '''(1.9999996,1.9999996),(1,1)''::box'
test=# \set B '''(2,2),(1,1)''::box'
test=# \set C '''(2.0000004,2.0000004),(1,1)''::box'
test=# select :A = :B, :B = :C, :A = :C;?column? | ?column? | ?column?
----------+----------+----------t        | t        | f
(1 row)

Is the benefit we get from the macros worth destroying the
transitive property of the comparison operators on these types?

> Figuring out what to do about it is harder.  Your proposal seems to be
> to remove them except where we need the fuzzy behavior, which doesn't
> sound unreasonable, but I don't personally understand why we need it
> in some places and not others.

+1

My first inclination is to remove those macros in version 10, but
it would be good to hear from some people using these types on what
the impact of that would be.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Floating point comparison inconsistencies of the geometric types

От

Tom Lane

Дата:

01 июня 2016 г., 17:28:01

Kevin Grittner <kgrittn@gmail.com> writes:
> On Wed, Jun 1, 2016 at 8:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>> Figuring out what to do about it is harder.  Your proposal seems to be
>> to remove them except where we need the fuzzy behavior, which doesn't
>> sound unreasonable, but I don't personally understand why we need it
>> in some places and not others.

> +1

> My first inclination is to remove those macros in version 10, but
> it would be good to hear from some people using these types on what
> the impact of that would be.

As I understand it, the key problem is that tests like "is point on line"
would basically never succeed except in the most trivial cases, because of
roundoff error.  That's not very nice, and it might cascade to larger
problems like object-containment tests failing unexpectedly.  We would
need to go through all the geometric operations and figure out where that
kind of gotcha is significant and what we can do about it.  Seems like a
fair amount of work :-(.  If somebody's willing to do that kind of
investigation, then sure, but I don't think just blindly removing these
macros is going to lead to anything good.

Also, I suppose this means that Robert promises not to make any of his
usual complaints about breaking compatibility?  Because we certainly
would be.
        regards, tom lane

Re: Floating point comparison inconsistencies of the geometric types

От

Jim Nasby

Дата:

01 июня 2016 г., 17:52:38

On 6/1/16 9:27 AM, Tom Lane wrote:
> Kevin Grittner <kgrittn@gmail.com> writes:
>> > On Wed, Jun 1, 2016 at 8:08 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> >> Figuring out what to do about it is harder.  Your proposal seems to be
>>> >> to remove them except where we need the fuzzy behavior, which doesn't
>>> >> sound unreasonable, but I don't personally understand why we need it
>>> >> in some places and not others.
>> > +1
>> > My first inclination is to remove those macros in version 10, but
>> > it would be good to hear from some people using these types on what
>> > the impact of that would be.
> As I understand it, the key problem is that tests like "is point on line"
> would basically never succeed except in the most trivial cases, because of
> roundoff error.  That's not very nice, and it might cascade to larger
> problems like object-containment tests failing unexpectedly.  We would
> need to go through all the geometric operations and figure out where that
> kind of gotcha is significant and what we can do about it.  Seems like a
> fair amount of work :-(.  If somebody's willing to do that kind of
> investigation, then sure, but I don't think just blindly removing these
> macros is going to lead to anything good.

I suspect another wrinkle here is that in the GIS world a single point 
can be represented it multiple reference/coordinate systems, and it 
would have different values in each of them. AIUI the transforms between 
those systems can be rather complicated if different projection methods 
are involved. I don't know if PostGIS depends on what these macros are 
doing or not. If it doesn't, perhaps it would be sufficient to lop of 
the last few bits of the significand. ISTM that'd be much better than 
what the macros currently do.

BTW, I suspect the macro values were chosen specifically for dealing 
with LAT/LONG.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

Re: Floating point comparison inconsistencies of the geometric types

От

Paul Ramsey

Дата:

01 июня 2016 г., 18:03:24

On Wed, Jun 1, 2016 at 7:52 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:

>
> I suspect another wrinkle here is that in the GIS world a single point can
> be represented it multiple reference/coordinate systems, and it would have
> different values in each of them. AIUI the transforms between those systems
> can be rather complicated if different projection methods are involved. I
> don't know if PostGIS depends on what these macros are doing or not. If it
> doesn't, perhaps it would be sufficient to lop of the last few bits of the
> significand. ISTM that'd be much better than what the macros currently do.

We don't depend on these, we have our own :/
The real answer for a GIS system is to have an explicit tolerance
parameter for calculations like distance/touching/containment, but
unfortunately we didn't do that so now we have our own
compatibility/boil the ocean problem if we ever wanted/were funded to
add one.
P.

> BTW, I suspect the macro values were chosen specifically for dealing with
> LAT/LONG.
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 855-TREBLE2 (855-873-2532)   mobile: 512-569-9461
>
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

Re: Floating point comparison inconsistencies of the geometric types

От

Joe Conway

Дата:

01 июня 2016 г., 18:05:49

On 06/01/2016 07:52 AM, Jim Nasby wrote:
> On 6/1/16 9:27 AM, Tom Lane wrote:
>> Kevin Grittner <kgrittn@gmail.com> writes:
>>> > On Wed, Jun 1, 2016 at 8:08 AM, Robert Haas <robertmhaas@gmail.com>
>>> wrote:
>>>> >> Figuring out what to do about it is harder.  Your proposal seems
>>>> to be
>>>> >> to remove them except where we need the fuzzy behavior, which
>>>> doesn't
>>>> >> sound unreasonable, but I don't personally understand why we need it
>>>> >> in some places and not others.
>>> > +1
>>> > My first inclination is to remove those macros in version 10, but
>>> > it would be good to hear from some people using these types on what
>>> > the impact of that would be.
>> As I understand it, the key problem is that tests like "is point on line"
>> would basically never succeed except in the most trivial cases,
>> because of
>> roundoff error.  That's not very nice, and it might cascade to larger
>> problems like object-containment tests failing unexpectedly.  We would
>> need to go through all the geometric operations and figure out where that
>> kind of gotcha is significant and what we can do about it.  Seems like a
>> fair amount of work :-(.  If somebody's willing to do that kind of
>> investigation, then sure, but I don't think just blindly removing these
>> macros is going to lead to anything good.
>
> I suspect another wrinkle here is that in the GIS world a single point
> can be represented it multiple reference/coordinate systems, and it
> would have different values in each of them. AIUI the transforms between
> those systems can be rather complicated if different projection methods
> are involved. I don't know if PostGIS depends on what these macros are
> doing or not. If it doesn't, perhaps it would be sufficient to lop of
> the last few bits of the significand. ISTM that'd be much better than
> what the macros currently do.

IIRC PostGIS uses a function from libgeos to do things like "point
equals" (ST_Equals). I've never looked at that source, but would be
unsurprised to find that it does something similar to this although
probably also more sophisticated.

(looks) yes -- ST_Equals calls GEOSEquals() after some sanity checking...

> BTW, I suspect the macro values were chosen specifically for dealing
> with LAT/LONG.

I think that is it exactly.

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development

Re: Floating point comparison inconsistencies of the geometric types

От

Jim Nasby

Дата:

01 июня 2016 г., 19:00:07

On 6/1/16 10:03 AM, Paul Ramsey wrote:
> We don't depend on these, we have our own :/
> The real answer for a GIS system is to have an explicit tolerance
> parameter for calculations like distance/touching/containment, but
> unfortunately we didn't do that so now we have our own
> compatibility/boil the ocean problem if we ever wanted/were funded to
> add one.

Well it sounds like what's currently happening in Postgres is probably 
going to change, so how might we structure that to help PostGIS? Would 
simply lopping off the last few bits of the significand/mantissa work, 
or is that not enough when different GRSes are involved?
-- 
Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
Experts in Analytics, Data Architecture and PostgreSQL
Data in Trouble? Get it in Treble! http://BlueTreble.com
855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

Re: Floating point comparison inconsistencies of the geometric types

От

Paul Ramsey

Дата:

01 июня 2016 г., 21:00:26

On Wed, Jun 1, 2016 at 8:59 AM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 6/1/16 10:03 AM, Paul Ramsey wrote:
>>
>> We don't depend on these, we have our own :/
>> The real answer for a GIS system is to have an explicit tolerance
>> parameter for calculations like distance/touching/containment, but
>> unfortunately we didn't do that so now we have our own
>> compatibility/boil the ocean problem if we ever wanted/were funded to
>> add one.
>
>
> Well it sounds like what's currently happening in Postgres is probably going
> to change, so how might we structure that to help PostGIS? Would simply
> lopping off the last few bits of the significand/mantissa work, or is that
> not enough when different GRSes are involved?

PostGIS doesn't look at all at what the PgSQL geotypes do, so go
forward w/o fear. Tolerance in geo world is more than vertex rounding
though, it's things like saying that when distance(pt,line) < epsilon
then distance(pt,line) == 0, or similarly for shape touching, etc. One
of the things people find annoying about postgis is that

ST_Intersects(ST_Intersection(a, b), a) can come out as false (a
derived point at a crossing of lines may not exactly intersect either
of the input lines), which is a direct result of our use of exact math
for the boolean intersects test.

Anyways, go forth and do whatever makes sense for PgSQL

P

>
> --
> Jim Nasby, Data Architect, Blue Treble Consulting, Austin TX
> Experts in Analytics, Data Architecture and PostgreSQL
> Data in Trouble? Get it in Treble! http://BlueTreble.com
> 855-TREBLE2 (855-873-2532)   mobile: 512-569-9461

Re: Floating point comparison inconsistencies of the geometric types

От

Tom Lane

Дата:

01 июня 2016 г., 21:52:16

Paul Ramsey <pramsey@cleverelephant.ca> writes:
> One of the things people find annoying about postgis is that
> ST_Intersects(ST_Intersection(a, b), a) can come out as false (a
> derived point at a crossing of lines may not exactly intersect either
> of the input lines), which is a direct result of our use of exact math
> for the boolean intersects test.

That's an interesting comment, because it's more or less exactly the
type of failure that we could expect to get if we remove fuzzy comparisons
from the built-in types.  How much of a problem is it in practice for
PostGIS users?  Do you have any plans to change it?

> Anyways, go forth and do whatever makes sense for PgSQL

I think we're trying to figure out what that is ...
        regards, tom lane

Re: Floating point comparison inconsistencies of the geometric types

От

Robert Haas

Дата:

07 июля 2016 г., 23:23:25

On Wed, Jun 1, 2016 at 10:27 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> As I understand it, the key problem is that tests like "is point on line"
> would basically never succeed except in the most trivial cases, because of
> roundoff error.  That's not very nice, and it might cascade to larger
> problems like object-containment tests failing unexpectedly.  We would
> need to go through all the geometric operations and figure out where that
> kind of gotcha is significant and what we can do about it.  Seems like a
> fair amount of work :-(.  If somebody's willing to do that kind of
> investigation, then sure, but I don't think just blindly removing these
> macros is going to lead to anything good.

Yeah, it does seem to need some research.

> Also, I suppose this means that Robert promises not to make any of his
> usual complaints about breaking compatibility?  Because we certainly
> would be.

Pot, meet Mr. Kettle!

Obviously, the inconvenience caused by any backward incompatibility
has to be balanced against the fact that the new behavior is
presumably better.  But I stridently object to the accusation that of
the two of us I'm the one more concerned with backward-compatibility.
There may be some instances where I've had a more conservative
judgement than you about breaking user-facing stuff, but you've
blocked dozens of changes to the C API that would have enabled
meaningful extension development on the grounds that somebody might
complain when a future release changes the API!  I think behavior
changes that users will notice are of vastly greater significance than
those which will only be observed by developers.

In this particular case, I think that the current behavior is pretty
stupid, and that the built-in geometric types are barely used,
possibly because they have stupid behavior.  So I would be willing to
bet on a well-thought-out change in this area coming out to a net
positive.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Floating point comparison inconsistencies of the geometric types

От

Emre Hasegeli

Дата:

18 июля 2016 г., 23:54:22

My progress so far is attached as 2 patches.  First one introduces an
header file for adt/float.c.  Second one refactors the geometric
operations.

I have removed the fuzzy comparison macros all together.  It is very
hard to keep some of them and maintain consistency.  Many operators
that would benefit from the fuzzy comparison is quite broken at the
moment [1], anyway.  I have also made the comparisons safe against NaN
[2].

[1]
https://www.postgresql.org/message-id/flat/CAE2gYzw_-z%3DV2kh8QqFjenu%3D8MJXzOP44wRW%3DAzzeamrmTT1%3DQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/flat/28685.1468246504%40sss.pgh.pa.us

Вложения

Re: Floating point comparison inconsistencies of the geometric types

От

Kevin Grittner

Дата:

02 сентября 2016 г., 23:16:50

On Mon, Jul 18, 2016 at 3:54 PM, Emre Hasegeli <emre@hasegeli.com> wrote:
> My progress so far is attached as 2 patches.  First one introduces an
> header file for adt/float.c.  Second one refactors the geometric
> operations.

The first patch fails to apply due to bit-rot.  That's easy enough
to correct, but then it runs into warnings on make:

btree_gin.c: In function ‘leftmostvalue_float4’:
btree_gin.c:234:2: error: implicit declaration of function
‘get_float4_infinity’ [-Werror=implicit-function-declaration] return Float4GetDatum(-get_float4_infinity()); ^
btree_gin.c: In function ‘leftmostvalue_float8’:
btree_gin.c:242:2: error: implicit declaration of function
‘get_float8_infinity’ [-Werror=implicit-function-declaration] return Float8GetDatum(-get_float8_infinity()); ^

Please fix.

Something to consider before posting new version -- should we
change some of those macros to static inline (in the .h files) to
avoid double-evaluation hazards?  They might perform as well or
even better that way, and remove a subtle programmer foot-gun.

Changing status to "Waiting on Author".

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Floating point comparison inconsistencies of the geometric types

От

Emre Hasegeli

Дата:

04 сентября 2016 г., 20:42:48

> The first patch fails to apply due to bit-rot.  That's easy enough
> to correct, but then it runs into warnings on make:

Rebased and fixed.

> Something to consider before posting new version -- should we
> change some of those macros to static inline (in the .h files) to
> avoid double-evaluation hazards?  They might perform as well or
> even better that way, and remove a subtle programmer foot-gun.

One reason to keep them as macros is that they are currently
supporting both float4 and float8.  Currently they look like this:

FLOAT_EQ(val1, val2)
FLOAT_PL(result, val1, val2)

I guess if we would use inline functions, they would look like:

FLOAT8_EQ(val1, val2)
result = FLOAT8_PL(val1, val2)

Result would need to be defined again in the function to be checked
for overflow.  I think this way would be more developer-friendly.  I
have gone some trouble to allocate the result to be able to use the
macros.  Do you know if the performance of both would be the same?

I am not much experienced in C.  If you think that inline functions
are better suited, I can rework the patches.

Вложения

Re: Floating point comparison inconsistencies of the geometric types

От

Kevin Grittner

Дата:

10 сентября 2016 г., 00:25:52

On Sun, Sep 4, 2016 at 12:42 PM, Emre Hasegeli <emre@hasegeli.com> wrote:
>> The first patch fails to apply due to bit-rot.  That's easy enough
>> to correct, but then it runs into warnings on make:
>
> Rebased and fixed.

These patches apply and build on top of 5c609a74 with no problems,
but `make check` finds differences per the attached.  Please
investigate why the regression tests are failing and what the
appropriate response is.

>> Something to consider before posting new version -- should we
>> change some of those macros to static inline (in the .h files) to
>> avoid double-evaluation hazards?  They might perform as well or
>> even better that way, and remove a subtle programmer foot-gun.
>
> One reason to keep them as macros is that they are currently
> supporting both float4 and float8.  Currently they look like this:
>
> FLOAT_EQ(val1, val2)
> FLOAT_PL(result, val1, val2)
>
> I guess if we would use inline functions, they would look like:
>
> FLOAT8_EQ(val1, val2)
> result = FLOAT8_PL(val1, val2)
>
> Result would need to be defined again in the function to be checked
> for overflow.  I think this way would be more developer-friendly.  I
> have gone some trouble to allocate the result to be able to use the
> macros.  Do you know if the performance of both would be the same?

In one case where I was concerned that a static inline definition
would not perform as well as a macro, I went so far as to compare
the compiled code for both at a number of call sites in an
optimized build and found that a reasonably current gcc generated
*identical* code for both.  Of course, this will not always be the
case -- in particular for macros which require multiple evaluations
of one or more parameters and they are not simple literals.  In
such cases, the static inline often benchmarks faster because the
results from the potentially expensive expression can be stored on
the stack or in a register, and just referenced again.

> I am not much experienced in C.  If you think that inline functions
> are better suited, I can rework the patches.

I suspect that they will be as fast or faster, and they eliminate
the hazard of multiple evaluation, where a programmer might not be
aware of the multiple evaluation or of some side-effect of an
argument.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Вложения

regression.diffs

Re: Floating point comparison inconsistencies of the geometric types

От

Kevin Grittner

Дата:

27 сентября 2016 г., 20:36:06

On Fri, Sep 9, 2016 at 4:25 PM, Kevin Grittner <kgrittn@gmail.com> wrote:
> On Sun, Sep 4, 2016 at 12:42 PM, Emre Hasegeli <emre@hasegeli.com> wrote:

> These patches apply and build on top of 5c609a74 with no problems,
> but `make check` finds differences per the attached.  Please
> investigate why the regression tests are failing and what the
> appropriate response is.

>> I am not much experienced in C.  If you think that inline functions
>> are better suited, I can rework the patches.
>
> I suspect that they will be as fast or faster, and they eliminate
> the hazard of multiple evaluation, where a programmer might not be
> aware of the multiple evaluation or of some side-effect of an
> argument.

Emre, are you going to address the above?  It would have to be Real
Soon Now.

-- 
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Floating point comparison inconsistencies of the geometric types

От

Emre Hasegeli

Дата:

28 сентября 2016 г., 10:59:17

> Emre, are you going to address the above?  It would have to be Real
> Soon Now.

Yes, I am working on it.  I found more problems, replaced more
algorithms.  That took a lot of time.  I will post the new version
really soon.  I wouldn't feel bad, if you wouldn't have enough time to
review it in this commitfest.

Re: Floating point comparison inconsistencies of the geometric types

От

Emre Hasegeli

Дата:

28 сентября 2016 г., 20:02:23

> These patches apply and build on top of 5c609a74 with no problems,
> but `make check` finds differences per the attached.  Please
> investigate why the regression tests are failing and what the
> appropriate response is.

I fixed the first one and workaround the second with COLLATE "C".  I
have how my changes caused this regression.

"select_views" test runs "SELECT name, #thepath FROM iexit ORDER BY 1,
2" and expects to get rows in this order:

>  I- 580                        Ramp |        8
>  I- 580/I-680                  Ramp |        2

With the collation on my laptop, this is actually true:

> regression=# select 'I- 580/I-680                  Ramp' < 'I- 580                        Ramp';
>  ?column?
> ----------
>  t
> (1 row)

However, on the Linux server, I am testing it is not:

> regression=# select 'I- 580                        Ramp' < 'I- 580/I-680                  Ramp';
>  ?column?
> ----------
>  f
> (1 row)

Do you know how it is not failing on the master?

> I suspect that they will be as fast or faster, and they eliminate
> the hazard of multiple evaluation, where a programmer might not be
> aware of the multiple evaluation or of some side-effect of an
> argument.

I reworked the the patches to use inline functions and fixed the
problems I found.  The new versions are attached.

Вложения

Re: Floating point comparison inconsistencies of the geometric types

От

Kevin Grittner

Дата:

28 сентября 2016 г., 22:04:18

On Wed, Sep 28, 2016 at 12:02 PM, Emre Hasegeli <emre@hasegeli.com> wrote:

>> `make check` finds differences per the attached.  Please
>> investigate why the regression tests are failing and what the
>> appropriate response is.
>
> I fixed the first one and workaround the second with COLLATE "C".  I
> have how my changes caused this regression.
>
> "select_views" test runs "SELECT name, #thepath FROM iexit ORDER BY 1,
> 2" and expects to get rows in this order:
>
>>  I- 580                        Ramp |        8
>>  I- 580/I-680                  Ramp |        2
>
> With the collation on my laptop, this is actually true:
>
>> regression=# select 'I- 580/I-680                  Ramp' < 'I- 580                        Ramp';
>>  ?column?
>> ----------
>>  t
>> (1 row)
>
> However, on the Linux server, I am testing it is not:
>
>> regression=# select 'I- 580                        Ramp' < 'I- 580/I-680                  Ramp';
>>  ?column?
>> ----------
>>  f
>> (1 row)
>
> Do you know how it is not failing on the master?

Well, those two results are not contradictory -- notice that you
switched the order of the values in the comparison.  I don't think
you've really found the explanation yet.

>> [discussing inline static functions compared to macros for min()/max(), etc.]
>> I suspect that they will be as fast or faster, and they eliminate
>> the hazard of multiple evaluation, where a programmer might not be
>> aware of the multiple evaluation or of some side-effect of an
>> argument.
>
> I reworked the the patches to use inline functions and fixed the
> problems I found.  The new versions are attached.

Will take a look and post again.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Floating point comparison inconsistencies of the geometric types

От

Kevin Grittner

Дата:

28 сентября 2016 г., 23:17:46

On Wed, Sep 28, 2016 at 2:04 PM, Kevin Grittner <kgrittn@gmail.com> wrote:

> Will take a look and post again.

I am moving this patch to the next CF.  You'll be hearing from me
sometime after this CF is closed.

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: Floating point comparison inconsistencies of the geometric types

От

Emre Hasegeli

Дата:

29 сентября 2016 г., 11:37:35

> Well, those two results are not contradictory -- notice that you
> switched the order of the values in the comparison.  I don't think
> you've really found the explanation yet.

I am sorry I copy-pasted the wrong example:

"select_views" test runs:

> SELECT name, #thepath FROM iexit ORDER BY 1, 2

and expects to get rows in this order:

>  I- 580                        Ramp |        8
>  I- 580/I-680                  Ramp |        2

with the collation on my laptop, this is actually true:

> regression=# select 'I- 580                        Ramp' < 'I- 580/I-680                  Ramp';
>  ?column?
> ----------
>  t
> (1 row)

on the Linux server I am testing, it is not:

> regression=# select 'I- 580                        Ramp' < 'I- 580/I-680                  Ramp';
>  ?column?
> ----------
>  f
> (1 row)

The later should be the case on your environment as the test was also
failing for you.  This is not consistent with the expected test
result.  Do you know how this test can still pass on the master?

Re: Floating point comparison inconsistencies of the geometric types

От

Kyotaro HORIGUCHI

Дата:

10 ноября 2016 г., 13:35:13

Hello,

At Thu, 29 Sep 2016 10:37:30 +0200, Emre Hasegeli <emre@hasegeli.com> wrote in
<CAE2gYzxijWKwJ-ZPD--QHM+SxMd+vL_81_3Xt0spnCbsqFH=Ug@mail.gmail.com>
> > regression=# select 'I- 580                        Ramp' < 'I- 580/I-680                  Ramp';
> >  ?column?
> > ----------
> >  t
> > (1 row)
> 
> on the Linux server I am testing, it is not:
> 
> > regression=# select 'I- 580                        Ramp' < 'I- 580/I-680                  Ramp';
> >  ?column?
> > ----------
> >  f
> > (1 row)
> 
> The later should be the case on your environment as the test was also
> failing for you.  This is not consistent with the expected test
> result.  Do you know how this test can still pass on the master?

Perhaps you ran the test under the environment LC_COLLATE (or
LANG) of something other than C. The reson for the result is
collation. The following returns expected result.


=# select 'I- 580                        Ramp' < ('I- 580/I-680                  Ramp' COLLATE "C");?column? 
----------t
(1 row)

For a reason I don't know, say, en_US locale behaves in an
unintuitive way.

regression=# select ' ' < ('/' COLLATE "en_US"), ' x' < ('/a' COLLATE "en_US");?column? | ?column? 
----------+----------t        | f
(1 row)


Returning to the issue, the following query should give you the
expected result.

SELECT name, #thepath  FROM iexit ORDER BY name COLLATE "C", 2;

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Floating point comparison inconsistencies of the geometric types

От

Emre Hasegeli

Дата:

10 ноября 2016 г., 15:41:24

> Returning to the issue, the following query should give you the
> expected result.
>
> SELECT name, #thepath  FROM iexit ORDER BY name COLLATE "C", 2;

Yes, I have worked around it like this.  What I couldn't understand is
how my patch can cause this regression.  How is it passes on master
without COLLATE?

Re: Floating point comparison inconsistencies of the geometric types

От

Kyotaro HORIGUCHI

Дата:

11 ноября 2016 г., 05:05:40

Hello,

> > Returning to the issue, the following query should give you the
> > expected result.
> >
> > SELECT name, #thepath  FROM iexit ORDER BY name COLLATE "C", 2;
> 
> Yes, I have worked around it like this.  What I couldn't understand is
> how my patch can cause this regression.  How is it passes on master
> without COLLATE?

Ah, sorry, I understand that *you* added the COLLATE.  Revering
select_views.sql/out to master actually causes a regression error.

The reason for that is that you forgot to edit an alternative
exptect file, which matches for en_US locale.

But the test doesn't for collation and the only difference
between the alternative expect files comes from the difference of
collation for the query. "the workaround" seems to be the right
way to do it. I recommend rather to leave the workaroud as it is
and remove select_views_1.out from the "expected" directory.

Aside from that, I'd like to comment this patch on other points
later.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Floating point comparison inconsistencies of the geometric types

От

Kyotaro HORIGUCHI

Дата:

11 ноября 2016 г., 10:41:00

Hello,

> Aside from that, I'd like to comment this patch on other points
> later.

The start of this patch is that the fact that most of but not all
geometric operators use fuzzy comparson. But Tom pointed out that
the fixed fuzz factor is not reasonable but hard to find how to
modify it.

We can remove the fuzz factor altogether but I think we also
should provide a means usable to do similar things. At least "is
a point on a line" might be useless for most cases without any
fuzzing feature. (Nevertheless, it is a problem only when it is
being used to do that:) If we don't find reasonable policy on
fuzzing operations, it would be the proof that we shouldn't
change the behavior.


The 0001 patch adds many FP comparison functions individually
considering NaN. As the result the sort order logic involving NaN
is scattered around into the functions, then, you implement
generic comparison function using them. It seems inside-out to
me. Defining ordering at one place, then comparison using it
seems to be reasonable.


The 0002 patch repalces many native operators for floating point
numbers by functions having sanity check. This seems to be
addressing to Tom's comment. It is reasonable for comparison
operators but I don't think replacing all arithmetics is so.  For
example, float8_div checks that
- If both of operands are not INF, result should not be INF.- If the devident is not exactly zero, the result should
notbe zero.
 

The second assumption is wrong by itself. For an extreme case,
4.9e-324 / 1.7e+308 becomes exactly zero (by underflow). We
canont assert it to be wrong but the devedent is not zero. The
validity of the result varies according to its meaning. For the
case of box_cn,

> center->x = float8_div(float8_pl(box->high.x, box->low.x), 2.0);
> center->y = float8_div(float8_pl(box->high.y, box->low.y), 2.0);

If the center somehow goes extremely near to the origin, it could
result in a false error.

> =# select @@ box'(-8e-324, -8e-324), (4.9e-324, 4.9e-324)';
> ERROR:  value out of range: underflow

I don't think this underflow is an error, and actually it is a
change of the current behavior without a reasonable reason. More
significant (and maybe unacceptable) side-effect is that it
changes the behavior of ordinary operators. I don't think this is
acceptable. More consideration is needed.

> =# select ('-8e-324'::float8 + '4.9e-324'::float8) / 2.0;
> ERROR:  value out of range: underflow



In regard to fuzzy operations, libgeos seems to have several
types of this kind of feature. (I haven't looked closer into
them). Other than reducing precision seems overkill or
unappliable for PostgreSQL bulitins. As Jim said, can we replace
the fixed scale fuzz factor by precision reduction? Maybe, with a
GUC variable (I hear someone's roaring..) to specify the amount
defaults to fit the current assumption.


https://apt-browse.org/browse/ubuntu/trusty/universe/i386/libgeos++-dev/3.4.2-4ubuntu1/file/usr/include/geos/geom/BinaryOp.h

/** Define this to use PrecisionReduction policy* in an attempt at by-passing binary operation* robustness problems
(handlesTopologyExceptions)*/
 
...* Define this to use TopologyPreserving simplification policy* in an attempt at by-passing binary operation*
robustnessproblems (handles TopologyExceptions)
 
(seems not activated)
...* Use common bits removal policy.* If enabled, this would be tried /before/* Geometry snapping.
...
/** Use snapping policy*/


regards,


-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Floating point comparison inconsistencies of the geometric types

От

Emre Hasegeli

Дата:

13 ноября 2016 г., 16:05:25

> The reason for that is that you forgot to edit an alternative
> exptect file, which matches for en_US locale.

Now I understand.  Thank you for the explanation.

> But the test doesn't for collation and the only difference
> between the alternative expect files comes from the difference of
> collation for the query. "the workaround" seems to be the right
> way to do it. I recommend rather to leave the workaroud as it is
> and remove select_views_1.out from the "expected" directory.

I will do.

Re: Floating point comparison inconsistencies of the geometric types

От

Emre Hasegeli

Дата:

14 ноября 2016 г., 00:41:12

> We can remove the fuzz factor altogether but I think we also
> should provide a means usable to do similar things. At least "is
> a point on a line" might be useless for most cases without any
> fuzzing feature. (Nevertheless, it is a problem only when it is
> being used to do that:) If we don't find reasonable policy on
> fuzzing operations, it would be the proof that we shouldn't
> change the behavior.

It was my initial idea to keep the fuzzy comparison behaviour on some
places, but the more I get into I realised that it is almost
impossible to get this right.  Instead, I re-implemented some
operators to keep precision as much as possible.  The previous "is a
point on a line" operator would *never* give the true result without
the fuzzy comparison.  The new implementation would return true, when
precision is not lost.  I think this is a behaviour people, who are
working with floating points, are prepared to deal with.  By the way,
"is a point on a line" operator is quite wrong with the fuzzy
comparison at the moment [1].

> The 0001 patch adds many FP comparison functions individually
> considering NaN. As the result the sort order logic involving NaN
> is scattered around into the functions, then, you implement
> generic comparison function using them. It seems inside-out to
> me. Defining ordering at one place, then comparison using it
> seems to be reasonable.

I agree that it would be simpler to use the comparison function for
implementing other operators.  I have done it other way around to make
them more optimised.  They are called very often.  I don't think
checking exit code of the comparison function would be optimised the
same way.  I could leave the comparison functions as they are, but
re-implemented them using the others to keep documentation of NaN
comparison in the single place.

> If the center somehow goes extremely near to the origin, it could
> result in a false error.
>
>> =# select @@ box'(-8e-324, -8e-324), (4.9e-324, 4.9e-324)';
>> ERROR:  value out of range: underflow
>
> I don't think this underflow is an error, and actually it is a
> change of the current behavior without a reasonable reason. More
> significant (and maybe unacceptable) side-effect is that it
> changes the behavior of ordinary operators. I don't think this is
> acceptable. More consideration is needed.
>
>> =# select ('-8e-324'::float8 + '4.9e-324'::float8) / 2.0;
>> ERROR:  value out of range: underflow

This is the current behaviour of float datatype.  My patch doesn't
change that.  This problem would probably also apply to multiplying
very small values.  I agree that this is not the ideal behaviour.
Though I am not sure, if we should go to a different direction than
the float datatypes.

I think there is value in making geometric types compatible with the
float.  Users are going to mix them, anyway.  For example, users can
calculate the center of a box manually, and confuse when the built-in
operator behaves differently.

> In regard to fuzzy operations, libgeos seems to have several
> types of this kind of feature. (I haven't looked closer into
> them). Other than reducing precision seems overkill or
> unappliable for PostgreSQL bulitins. As Jim said, can we replace
> the fixed scale fuzz factor by precision reduction? Maybe, with a
> GUC variable (I hear someone's roaring..) to specify the amount
> defaults to fit the current assumption.

I am disinclined to try to implement something complicated for the
geometric types.  I think they are mostly useful for 2 purposes: uses
simple enough to not worth looking for better solutions, and
demonstrating our indexing capabilities.  The inconsistencies harm
both of those.

[1]
https://www.postgresql.org/message-id/flat/CAE2gYzw_-z%3DV2kh8QqFjenu%3D8MJXzOP44wRW%3DAzzeamrmTT1%3DQ%40mail.gmail.com

Re: Floating point comparison inconsistencies of the geometric types

От

Kyotaro HORIGUCHI

Дата:

14 ноября 2016 г., 11:06:19

Hello,

At Sun, 13 Nov 2016 22:41:01 +0100, Emre Hasegeli <emre@hasegeli.com> wrote in
<CAE2gYzymeQXGGmhU1Vc35DpugwfRd-QRK3BM-6TGg0rwHcDN_w@mail.gmail.com>
> > We can remove the fuzz factor altogether but I think we also
> > should provide a means usable to do similar things. At least "is
> > a point on a line" might be useless for most cases without any
> > fuzzing feature. (Nevertheless, it is a problem only when it is
> > being used to do that:) If we don't find reasonable policy on
> > fuzzing operations, it would be the proof that we shouldn't
> > change the behavior.
> 
> It was my initial idea to keep the fuzzy comparison behaviour on some
> places, but the more I get into I realised that it is almost
> impossible to get this right.  Instead, I re-implemented some
> operators to keep precision as much as possible.  The previous "is a
> point on a line" operator would *never* give the true result without
> the fuzzy comparison.  The new implementation would return true, when
> precision is not lost.

There's no way to accurately store numbers other than some
special ones in floating point format where mantissa precision is
limited. Even 0.1 is not stored accurately with a binary
mantissa. (It would be concealed by machine rounding for most
cases though.)  The same is said for numeric. It cannot store 1/3
accurately and doesn't even conceal the incaccuracy. Even if they
could store numbers accurately, anyway , say, the constant pi
does not have infinite precision.  As the result, we have a
tradition that equal operator on real numbers should be avoided
generally. Numerics are provided mainly for financial use, on
which finite-but-high precision decimal arithmetic are performed.

> I think this is a behaviour people, who are
> working with floating points, are prepared to deal with.  

What way to deal with it is in your mind? The problem hides
behind operators. To fix it a user should rewrite a expression
using more primitive operators. For example, (line_a # point_a)
should be rewritten as ((point_a <-> lineseg_a) < EPSILON), or in
more primitive way. I regared this that the operator # just
become useless.


> By the way, "is a point on a line" operator is quite wrong with
> the fuzzy comparison at the moment [1].

Th bug is a isolate problem from the epsilon behavior. It of
course should be fixed, but in "appropriate" way that may defined
in this discussion.

> > The 0001 patch adds many FP comparison functions individually
> > considering NaN. As the result the sort order logic involving NaN
> > is scattered around into the functions, then, you implement
> > generic comparison function using them. It seems inside-out to
> > me. Defining ordering at one place, then comparison using it
> > seems to be reasonable.
> 
> I agree that it would be simpler to use the comparison function for
> implementing other operators.  I have done it other way around to make
> them more optimised.  They are called very often.  I don't think
> checking exit code of the comparison function would be optimised the
> same way.  I could leave the comparison functions as they are, but
> re-implemented them using the others to keep documentation of NaN
> comparison in the single place.

Regarding optimization, at least gcc generates seemingly not so
different code for the two. The both generally generates extended
code directly calling isnan() and so. Have you measured the
performance of the two implement (with -O2, without
--enable-cassert)?  This kind of hand-optimization gets
legitimacy when we see a siginificant difference, according to
the convention here.. I suppose.

At least the comment you dropped by the patch,
intfloat4_cmp_internal(float4 a, float4 b){
-    /*
-     * We consider all NANs to be equal and larger than any non-NAN. This is
-     * somewhat arbitrary; the important thing is to have a consistent sort
-     * order.
-     */

seems very significant and should be kept anywhere relevant.


> > If the center somehow goes extremely near to the origin, it could
> > result in a false error.
> >
> >> =# select @@ box'(-8e-324, -8e-324), (4.9e-324, 4.9e-324)';
> >> ERROR:  value out of range: underflow
> >
> > I don't think this underflow is an error, and actually it is a
> > change of the current behavior without a reasonable reason. More
> > significant (and maybe unacceptable) side-effect is that it
> > changes the behavior of ordinary operators. I don't think this is
> > acceptable. More consideration is needed.
> >
> >> =# select ('-8e-324'::float8 + '4.9e-324'::float8) / 2.0;
> >> ERROR:  value out of range: underflow
> 
> This is the current behaviour of float datatype.  My patch doesn't
> change that. 

Very sorry for the mistake! I somehow saw float8pl on master and
float8_div with your patch. Pleas forget it.

> This problem would probably also apply to multiplying
> very small values.  I agree that this is not the ideal behaviour.
> Though I am not sure, if we should go to a different direction than
> the float datatypes.

Yes, perhaps the behavior of geometric arithmetic should be
considered differently from bare float calculation, but it would
not be the topic for now.

> I think there is value in making geometric types compatible with the
> float.  Users are going to mix them, anyway.  For example, users can
> calculate the center of a box manually, and confuse when the built-in
> operator behaves differently.

That doesn't resolve the problem that some comparison operators
are practically useless, and means that we (maybe) should create
another geometirc data type based on numeric. But it also would
be a crap from the same reason with float-based.

At a glance, by no-tolerance comparison, the following geometric
operators seem to be hardly usable.

#, &&, ?#, ?-, ?|, ?-|, ?||, @>, <@, ~=

And including(le, ge) and exclusing(lt, gt) comparisons seem to
lose their distinction practically.

<<, >>, &<, &>, <<|, |>>, &<|, |&>, <^, >^

I seached pgsql-general ML but counldn't find a complaint about
the current behavior. Even though I'm not familar with PostGIS, I
found that it uses exactly the same EPSILON method with
PostgreSQL.  And libgeos also takes tolerance on comparison by
several ways. From the facts, I think that the users of geometric
types requires any kind of tolerance/fuzzyness factor on
comparison.

If we had an apparent plan to use them for other than
earth-scale(?)  geometric usage, we could design what they should
be. But without such a plan it is just a breakage of the current
usage.

# By the way, 2D geometric functions are inaccurate for
# earth-scale spherical figures , but we ignoring that?


> > In regard to fuzzy operations, libgeos seems to have several
> > types of this kind of feature. (I haven't looked closer into
> > them). Other than reducing precision seems overkill or
> > unappliable for PostgreSQL bulitins. As Jim said, can we replace
> > the fixed scale fuzz factor by precision reduction? Maybe, with a
> > GUC variable (I hear someone's roaring..) to specify the amount
> > defaults to fit the current assumption.
> 
> I am disinclined to try to implement something complicated for the
> geometric types.
>  I think they are mostly useful for 2 purposes: uses
> simple enough to not worth looking for better solutions, and
> demonstrating our indexing capabilities.  The inconsistencies harm
> both of those.

About What kind of (needless) complication you are saying? The
fuzziness seems to me essential for geometric comparisons to work
practically. Addition to that, I don't think that we're not
allowed to change the behavior in such area of released versions
the time after time.

I don't think index scan and tolerant comparison are not
contradicting. Could you let me have an description about the
indexing capabilities and the inconsistencies?

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Floating point comparison inconsistencies of the geometric types

От

Emre Hasegeli

Дата:

14 ноября 2016 г., 13:42:01

> What way to deal with it is in your mind? The problem hides
> behind operators. To fix it a user should rewrite a expression
> using more primitive operators. For example, (line_a # point_a)
> should be rewritten as ((point_a <-> lineseg_a) < EPSILON), or in
> more primitive way. I regared this that the operator # just
> become useless.

Simple equations like this works well with the current algorithm:

> select '(0.1,0.1)'::point @ '(0,0),(1,1)'::line;

The operator does what you expect from it.  Users can use something
like you described to get fuzzy behaviour with an epsilon they choose.

> Regarding optimization, at least gcc generates seemingly not so
> different code for the two. The both generally generates extended
> code directly calling isnan() and so. Have you measured the
> performance of the two implement (with -O2, without
> --enable-cassert)?  This kind of hand-optimization gets
> legitimacy when we see a siginificant difference, according to
> the convention here.. I suppose.

I tested it with this program:

> int main()
> {
>    double  i,
>            j;
>    int     result = 0;
>
>    for (i = 0.1; i < 10000.0; i += 1.0)
>        for (j = 0.1; j < 10000.0; j += 1.0)
>            if (float8_lt(i, j))
>                result = (result + 1) % 10;
>
>    return result;
> }

The one calling cmp() was noticeable slower.

./test1  0.74s user 0.00s system 99% cpu 0.748 total
./test2  0.89s user 0.00s system 99% cpu 0.897 total

This would probably be not much noticeable by calling SQL functions
which are doing a few comparisons only, but it may be necessary to do
many more comparisons on some places.  I don't find the optimised
versions less clear than calling the cmp().  I can change it the other
way, if you find it more clear.

> At least the comment you dropped by the patch,
>
>  int
>  float4_cmp_internal(float4 a, float4 b)
>  {
> -       /*
> -        * We consider all NANs to be equal and larger than any non-NAN. This is
> -        * somewhat arbitrary; the important thing is to have a consistent sort
> -        * order.
> -        */
>
> seems very significant and should be kept anywhere relevant.

I will add it back on the next version.

> I seached pgsql-general ML but counldn't find a complaint about
> the current behavior. Even though I'm not familar with PostGIS, I
> found that it uses exactly the same EPSILON method with
> PostgreSQL.

Is it?  I understood from Paul Ramsey's comment on this thread [1]
that they don't.

> If we had an apparent plan to use them for other than
> earth-scale(?)  geometric usage, we could design what they should
> be. But without such a plan it is just a breakage of the current
> usage.

We give no promises about the geometric types being useful in earth scale.

> About What kind of (needless) complication you are saying? The
> fuzziness seems to me essential for geometric comparisons to work
> practically. Addition to that, I don't think that we're not
> allowed to change the behavior in such area of released versions
> the time after time.

Even when it is a total mess?

> I don't think index scan and tolerant comparison are not
> contradicting. Could you let me have an description about the
> indexing capabilities and the inconsistencies?

The first problem is that some operators are not using the epsilon.
This requires special treatment while developing index support for
operators.  I have tried to support point for BRIN using the box
operators, and failed because of that.

Comparing differences with epsilon is not applicable the same way to
every operator.  Even with simple operators like "point in box" it
covers different distances outside the box depending on where the
point is.  For example, "point <-> box < EPSILON" wouldn't be
equivalent with "point <@ box", when the point is outside corner of
the box.  Things get more complicated with lines.  Because of these,
we are easily violating basic expectations of the operators:

> regression=# select '{1000,0.000001,0}'::line ?|| '{90000,0.00009,0}'::line;
>
> ?column?
> ----------
> f
> (1 row)
>
> regression=# select '{90000,0.00009,0}'::line ?|| '{1000,0.000001,0}'::line;
> ?column?
> ----------
> t
> (1 row)

Another problem is lack of hash and btree operator classes.  In my
experience, the point datatype is by far most used one.  People often
try to use it on DISTINCT, GROUP BY, or ORDER BY clauses and complain
when it doesn't work.  There are many complaints like this on the
archives.  If we get rid of the epsilon, we can easily add those
operator classes.

[1] https://www.postgresql.org/message-id/CACowWR0DBEjCfBscKKumdRLJUkObjB7D%3Diw7-0_ZwSFJM9_gpw%40mail.gmail.com

Re: Floating point comparison inconsistencies of the geometric types

От

Kyotaro HORIGUCHI

Дата:

18 ноября 2016 г., 10:31:22

Hello,

At Mon, 14 Nov 2016 11:41:52 +0100, Emre Hasegeli <emre@hasegeli.com> wrote in
<CAE2gYzyjvRbj4bQZRp-yBo73-77-LDH0WnwPtAtZRFKPxYQRgw@mail.gmail.com>
> > What way to deal with it is in your mind? The problem hides
> > behind operators. To fix it a user should rewrite a expression
> > using more primitive operators. For example, (line_a # point_a)
> > should be rewritten as ((point_a <-> lineseg_a) < EPSILON), or in
> > more primitive way. I regared this that the operator # just
> > become useless.
> 
> Simple equations like this works well with the current algorithm:
> 
> > select '(0.1,0.1)'::point @ '(0,0),(1,1)'::line;

That's too simple for this discussion. If it satisfies someone's
requirement, a geometiric type with integer would go. Any
trigonometric calculations (by other than right angles) would
break it.

> The operator does what you expect from it.  Users can use something
> like you described to get fuzzy behaviour with an epsilon they choose.
> 
> > Regarding optimization, at least gcc generates seemingly not so
> > different code for the two. The both generally generates extended
> > code directly calling isnan() and so. Have you measured the
> > performance of the two implement (with -O2, without
> > --enable-cassert)?  This kind of hand-optimization gets
> > legitimacy when we see a siginificant difference, according to
> > the convention here.. I suppose.
> 
> I tested it with this program:
> 
> > int main()
> > {
> >    double  i,
> >            j;
> >    int     result = 0;
> >
> >    for (i = 0.1; i < 10000.0; i += 1.0)
> >        for (j = 0.1; j < 10000.0; j += 1.0)
> >            if (float8_lt(i, j))
> >                result = (result + 1) % 10;
> >
> >    return result;
> > }
> 
> The one calling cmp() was noticeable slower.
> 
> ./test1  0.74s user 0.00s system 99% cpu 0.748 total
> ./test2  0.89s user 0.00s system 99% cpu 0.897 total
> 
> This would probably be not much noticeable by calling SQL functions
> which are doing a few comparisons only, but it may be necessary to do
> many more comparisons on some places.  I don't find the optimised
> versions less clear than calling the cmp().  I can change it the other
> way, if you find it more clear.

Thanks for the measurment. I had similar numbers (14:13) for
separate code from Pg, but probably this difference is buried
under other steps. The law of stay-stupid says that this kind of
optimization should be consider only after it found to be
problematic, I think.


> > At least the comment you dropped by the patch,
> >
> >  int
> >  float4_cmp_internal(float4 a, float4 b)
> >  {
> > -       /*
> > -        * We consider all NANs to be equal and larger than any non-NAN. This is
> > -        * somewhat arbitrary; the important thing is to have a consistent sort
> > -        * order.
> > -        */
> >
> > seems very significant and should be kept anywhere relevant.
> 
> I will add it back on the next version.
> 
> > I seached pgsql-general ML but counldn't find a complaint about
> > the current behavior. Even though I'm not familar with PostGIS, I
> > found that it uses exactly the same EPSILON method with
> > PostgreSQL.
> 
> Is it?  I understood from Paul Ramsey's comment on this thread [1]
> that they don't.

I saw the repository of PostGIS by myself, but only in
box2d.c. Other objects have their own EPSILONs, EPSILOG_SQLMM,
DBL_EPSLON, FLT_EPSILON... Maybe that's all

|  * Copyright (C) 2008-2011 Paul Ramsey <pramsey@cleverelephant.ca>
|  * Copyright (C) 2008 Mark Cave-Ayland <mark.cave-ayland@siriusit.co.uk>
| 
| #ifndef EPSILON
| #define EPSILON        1.0E-06
| #endif
| #ifndef FPeq
| #define FPeq(A,B)     (fabs((A) - (B)) <= EPSILON)
| #endif

I agree that we should have our own tolerance method, and it migh
be an explicit one. But anyway we need any kind of tolerance on
comparing FP numbers. Removing such tolerance as preparation for
implementing another one is quite natural, but released version
should have any kind of tolerance.

> > If we had an apparent plan to use them for other than
> > earth-scale(?)  geometric usage, we could design what they should
> > be. But without such a plan it is just a breakage of the current
> > usage.
> 
> We give no promises about the geometric types being useful in earth scale.

So, we shouldn't just remove it.

> > About What kind of (needless) complication you are saying? The
> > fuzziness seems to me essential for geometric comparisons to work
> > practically. Addition to that, I don't think that we're not
> > allowed to change the behavior in such area of released versions
> > the time after time.
> 
> Even when it is a total mess?

Yes. The seemingly 'total-mess' is intended for a use with
lon/lat values and probably gave useful result at the time of
development. The mess is rather better than just ruined.

> > I don't think index scan and tolerant comparison are not
> > contradicting. Could you let me have an description about the
> > indexing capabilities and the inconsistencies?
> 
> The first problem is that some operators are not using the epsilon.
> This requires special treatment while developing index support for
> operators.  I have tried to support point for BRIN using the box
> operators, and failed because of that.
> 
> Comparing differences with epsilon is not applicable the same way to
> every operator.  Even with simple operators like "point in box" it
> covers different distances outside the box depending on where the
> point is.  For example, "point <-> box < EPSILON" wouldn't be
> equivalent with "point <@ box", when the point is outside corner of
> the box.  Things get more complicated with lines.  Because of these,
> we are easily violating basic expectations of the operators:

To keep such kind of integrity, we should deeply consider about
errors.

> > regression=# select '{1000,0.000001,0}'::line ?|| '{90000,0.00009,0}'::line;
> >
> > ?column?
> > ----------
> > f
> > (1 row)
> >
> > regression=# select '{90000,0.00009,0}'::line ?|| '{1000,0.000001,0}'::line;
> > ?column?
> > ----------
> > t
> > (1 row)

This kind of thing also appears on exact comparison with more
uncertain way. So we have the EPSILON so that the errors would be
reasonable at least on lon/lat geometrics. The values above are
far out of the scope of the EPSILON from the first. But exact
ones should be just unworkable.

> Another problem is lack of hash and btree operator classes.  In my
> experience, the point datatype is by far most used one.  People often
> try to use it on DISTINCT, GROUP BY, or ORDER BY clauses and complain
> when it doesn't work.  There are many complaints like this on the
> archives.  If we get rid of the epsilon, we can easily add those
> operator classes.

That's a fate of FP numbers. Btree is uasble for such data since
it is constructed using inequality operators but hash is almost
unsuitable without rounding and normalization because of the
nature of floating point numbers. Range scan on bree will work
find but on the other hand equality doesn't work even on btree
index from the same reason to the below.

DSITINCT, GROUP BY are just inpossible without somehow defining
the equiality of two points. Bare FP numbers doesn't fit since
generally they cannot match each other and should match based on
any resolutions of 0.000001, 0.1, 1, or 1000 according to
objective. I think probably those who complain so have different
resolution in their minds.

Again, FP numbers generally cannot match exactly each other. You
have to avoid that.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: Floating point comparison inconsistencies of the geometric types

От

Emre Hasegeli

Дата:

18 ноября 2016 г., 12:58:32

> To keep such kind of integrity, we should deeply consider about
> errors.

My point is that I don't think we can keep integrity together with the
fuzzy behaviour, or at least I don't have the skills to do that.  I
can just leave the more complicated operators like "is a
point on a line" as it is, and only change the basic ones.  Do you
think a smaller patch like this would be acceptable?

> That's a fate of FP numbers. Btree is uasble for such data since
> it is constructed using inequality operators but hash is almost
> unsuitable without rounding and normalization because of the
> nature of floating point numbers. Range scan on bree will work
> find but on the other hand equality doesn't work even on btree
> index from the same reason to the below.

Btree is very useful with floats.  There is no problem with it.

> Again, FP numbers generally cannot match exactly each other. You
> have to avoid that.

Again, they do match very well.  Floating point numbers are no magic.
They are perfectly deterministic.

Re: Floating point comparison inconsistencies of the geometric types

От

Kyotaro HORIGUCHI

Дата:

21 ноября 2016 г., 09:34:14

Hello,

# Sorry for a trash I emitted before..

Perhaps you don't assume any calculations applied on stored
geo-type values. Please be patient to disucuss with me. (Sorry
for perhas hard-to-read English..)

At Fri, 18 Nov 2016 10:58:27 +0100, Emre Hasegeli <emre@hasegeli.com> wrote in
<CAE2gYzxVxKNS7qU74UdHVZTmfXQjxMbFiXH5+16XFy90SRAbXA@mail.gmail.com>
> > To keep such kind of integrity, we should deeply consider about
> > errors.
> 
> My point is that I don't think we can keep integrity together with the
> fuzzy behaviour, or at least I don't have the skills to do that.  I
> can just leave the more complicated operators like "is a
> point on a line" as it is, and only change the basic ones.  Do you
> think a smaller patch like this would be acceptable?

The size of the patch is not a matter. It broken the current
behavior is a matter. It shows strange behavior for some cases
but it designed to work reasonable for some cases.

> > That's a fate of FP numbers. Btree is uasble for such data since
> > it is constructed using inequality operators but hash is almost
> > unsuitable without rounding and normalization because of the
> > nature of floating point numbers. Range scan on bree will work
> > find but on the other hand equality doesn't work even on btree
> > index from the same reason to the below.
> 
> Btree is very useful with floats.  There is no problem with it.

I don't mind btrees on bare floats. It will 'work' if error is
properly considered, or error can be omitted for the
purpose. Point has some kind of physical context or expected
behavior other than that of bare floats which the EPSILON
implies.

> > Again, FP numbers generally cannot match exactly each other. You
> > have to avoid that.
> 
> Again, they do match very well.  Floating point numbers are no magic.
> They are perfectly deterministic.

No magic, yes, so it is always accompanied by error. If you store
points, then searching for *any of them*, it works fine. It's no
magic. If you obtained a ponit from any physical data source, say
GPS receiver, then search identical points from a table, it would
be almost impossible to find a "exactly the same" location. It's
no magic.  Once a calculation has been applied on them, it is no
longer guaranteed to work in the same way. It's also no magic.

CREATE OR REPLACE FUNCTION rotate_point (integer, float8) RETURNS void AS $$ UPDATE p SET x = x * cos($2) - y *
sin($2),y = x * sin($2) + y * cos($2) WHERE i = $1 $$ LANGUAGE SQL;
 
ALTER TABLE p (i int, x float8, y float8);
INSERT INTO p VALUES (0, 1.0, 1.0), (1, 1.0, 1.0);
SELECT rotate_point(0, pi() * 2.0 / 3.0);
SELECT rotate_point(0, pi() * 2.0 / 3.0);
SELECT rotate_point(0, pi() * 2.0 / 3.0);
SELECT * FROM p WHERE i = 0;i | x |         y         
---+---+-------------------0 | 1 | 0.999999999999999

So

SELECT p0.x = p1.x AND p0.y = p1.y FROM p p0, p p1 WHERE p0.i = 0 and p1.i = 1;?column? 
----------f
(1 row)

This just repeats 1/3 rotation 3 times. Ideally p0 and p1 are
identical but this primitive operation generates visible error
that makes the comparison fail. Even though these two points are
expected be identical in some geometric viewpoint. This is a part
of the meaning of the EPSILON.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types

От

Kyotaro HORIGUCHI

Дата:

11 января 2017 г., 11:51:14

Hello, 

At Fri, 18 Nov 2016 10:58:27 +0100, Emre Hasegeli <emre@hasegeli.com> wrote in
<CAE2gYzxVxKNS7qU74UdHVZTmfXQjxMbFiXH5+16XFy90SRAbXA@mail.gmail.com>
> > To keep such kind of integrity, we should deeply consider about
> > errors.
> 
> My point is that I don't think we can keep integrity together with the
> fuzzy behaviour, or at least I don't have the skills to do that.  I
> can just leave the more complicated operators like "is a
> point on a line" as it is, and only change the basic ones.  Do you
> think a smaller patch like this would be acceptable?

The size of the patch is not a problem. I regret that I haven't
made your requirement clear. So as the startpoint of the new
discussion, I briefly summarize the current implement of
geometric comparisons.

- Floating point comparisons for gemetric types
 Comparison related to geometric types is performed by FPeq macro and similars defined in geo_decls.h. This intends to
givetolerance to the comparisons.
 
                A FPeq:     |<=e-|-e=>|    (<= means inclusive, e = epsilon = tolerance) FPne:   ->|  e | e  |<-  (<-
meansexclusive) FPlt:          | e  |<-   FPle:     |<=e | FPgt:   ->|  e | FPge:          | e=>|
 
 These seems reasonable ignoring the tolerance amount issue.


- Consistency between index and non-index scans.
GIST supports geometric types.
=# create table tpnt1(id int, p point);=# insert into tpnt1 (select i + 200, point(i*1.0e-6 / 100.0, i * 1.0e-6 /
100.0)from generate_series(-200, 200) as i);=# create index on tpnt1 using gist (p);=# set enable_seqscan to false;=#
setenable_bitmapscan to true;=# select count(*) from tpnt1 where p ~= point(0, 0);   201=# select count(*) from tpnt1
wherep << point(0, 0);   100=# set enable_seqscan to true;=# set enable_bitmapscan to false;=# select count(*) from
tpnt1where p ~= point(0, 0);   201=# select count(*) from tpnt1 where p << point(0, 0);   100
 

At least for the point type, (bitmap) index scan is consistent
with sequential scan.  I remember that the issue was
"inconsistency between indexed and non-indexed scans over
geometric types" but they seem consistent with each other.

You mentioned b-tree, which don't have predefined opclass for
geometric types. So the "inconsistency" may be mentioning the
difference between geometric values and combinations of plain
(first-class) values. But the two are different things and
apparently using different operators (~= and = for equality) so
the issue is not fit for this. More specifically, "p ~= p0" and
"x = x0 and y = y0" are completely different.


Could you let me (or any other guys on this ml) have more precise
description on the problem and/or what you want to do with this
patch?

regards,
-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types

От

Emre Hasegeli

Дата:

11 января 2017 г., 21:37:59

> - Floating point comparisons for gemetric types
>
>   Comparison related to geometric types is performed by FPeq
>   macro and similars defined in geo_decls.h. This intends to give
>   tolerance to the comparisons.
>
>                  A
>   FPeq:     |<=e-|-e=>|    (<= means inclusive, e = epsilon = tolerance)
>   FPne:   ->|  e | e  |<-  (<- means exclusive)
>   FPlt:          | e  |<-
>   FPle:     |<=e |
>   FPgt:   ->|  e |
>   FPge:          | e=>|
>
>   These seems reasonable ignoring the tolerance amount issue.

I tried to explain below why I don't find this method reasonable.

> At least for the point type, (bitmap) index scan is consistent
> with sequential scan.  I remember that the issue was
> "inconsistency between indexed and non-indexed scans over
> geometric types" but they seem consistent with each other.

You can see on the Git history that it was a lot of trouble to make
the indexes consistent with the operators, and this is limiting
improving indexing of the geometric types.

> You mentioned b-tree, which don't have predefined opclass for
> geometric types. So the "inconsistency" may be mentioning the
> difference between geometric values and combinations of plain
> (first-class) values. But the two are different things and
> apparently using different operators (~= and = for equality) so
> the issue is not fit for this. More specifically, "p ~= p0" and
> "x = x0 and y = y0" are completely different.

Yes, the default btree operator class doesn't have to implement
standard equality and basic comparison operator symbols.  We can
implement it with different symbols.

> Could you let me (or any other guys on this ml) have more precise
> description on the problem and/or what you want to do with this
> patch?

I will try to itemize the current problems with the epsilon method:

= Operator Inconsistencies

My current patches address all of them.

- Only some operators are using the epsilon.

I included the list of the ones which doesn't use the epsilon on
initial post of this thread.  This makes the operators not compatible
with each other.  For example, you cannot say "if box_a @> box_b and
box_b @> point then box_a @> box_b".

- The application of epsilon is implementation dependent.

Epsilon works between two numbers.  There is not always a single way
of implementing geometric operators.  This is surprising to the users.
For example, you cannot say "if box_a @> box_b then box_a <-> box_b <=
epsilon".

This is also creating a high barrier on developing those operators.
Fixing bugs and performance regressions are very likely to change the
results.

- Some operators are just wrong:

Line operators are not well maintained.  The following are giving
wrong results when neither A or B of lines are not exactly 1:

* line # line
* line <-> line
* point ## line
* point ## lseg

They are broken independently from the epsilon, though it is not easy
to fix them without changing the results of the cases that are
currently working.

- Some operators are violating commutative property.

For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a".

- Some operators are violating transitive property.

For example, you cannot say "if point_a ~= point_b and point_b ~=
point_c then point_a ~= point_c".

- The operators with epsilon are not compatible with float operators.

This is also surprising for the users.  For example, you cannot say
"if point_a << point_b then point_a[0] < point_b[0]".

- The operators are not checking for underflow or overflow.
- NaN values are not treated same as the float datatype.

Those are independent problems, though checking them with epsilon
would create additional set of inconsistencies.  Epsilon creates
especially more problems around 0.

= Missing Bits on the Operator Classes

I want to address those in the future, if my patches are accepted.

- There is no GiST or SP-GiST support for cross datatype operators
other than containment.

We can develop nice cross-datatype operator families to support more
operators.  It would have been easier, if we could rely on some
operators to implement others.  Geometric types serve the purpose of
demonstrating our indexing capabilities.  We should make them good
examples, not full of special cases with hidden bugs.

- There is no BRIN support for types other than the box.

BRIN inclusion operator class is datatype agnostic.  It uses some
operators to implement others which doesn't work for anything more the
box vs box operators, because of the inconsistencies.

- GROUP BY and DISTINCT are not working.
- ORDER BY is not working.

There are regular user complaints about this on the mailing list.  We
can easily provide default hash and btree operator classes that would
fix the problem, if we haven't had the epsilon.

Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types

От

Kyotaro HORIGUCHI

Дата:

18 января 2017 г., 14:36:12

Hello.

(I apologize in advance for possible inaccurate wording on maths,
which might cause confusion..)

At Wed, 11 Jan 2017 16:37:59 +0100, Emre Hasegeli <emre@hasegeli.com> wrote in
<CAE2gYzzNufOZqh4HO3Od8urzamNSeX-OXJxfNkRcU3ex9RD8jw@mail.gmail.com>
> > - Floating point comparisons for gemetric types
> >
> >   Comparison related to geometric types is performed by FPeq
> >   macro and similars defined in geo_decls.h. This intends to give
> >   tolerance to the comparisons.
> >
> >                  A
> >   FPeq:     |<=e-|-e=>|    (<= means inclusive, e = epsilon = tolerance)
> >   FPne:   ->|  e | e  |<-  (<- means exclusive)
> >   FPlt:          | e  |<-
> >   FPle:     |<=e |
> >   FPgt:   ->|  e |
> >   FPge:          | e=>|
> >
> >   These seems reasonable ignoring the tolerance amount issue.
> 
> I tried to explain below why I don't find this method reasonable.

Thank you very much for the explanation.

> > At least for the point type, (bitmap) index scan is consistent
> > with sequential scan.  I remember that the issue was
> > "inconsistency between indexed and non-indexed scans over
> > geometric types" but they seem consistent with each other.
> 
> You can see on the Git history that it was a lot of trouble to make
> the indexes consistent with the operators, and this is limiting
> improving indexing of the geometric types.

Yeah. geo_ops.c has the following comment from its first version
in the current repository.

| *    Relational operators for Points.
| *        Since we do have a sense of coordinates being
| *        "equal" to a given accuracy (point_vert, point_horiz),

So, we take it as a requirement for geometric types. All the
difficulties come from the "nature".

> > You mentioned b-tree, which don't have predefined opclass for
> > geometric types. So the "inconsistency" may be mentioning the
> > difference between geometric values and combinations of plain
> > (first-class) values. But the two are different things and
> > apparently using different operators (~= and = for equality) so
> > the issue is not fit for this. More specifically, "p ~= p0" and
> > "x = x0 and y = y0" are completely different.
> 
> Yes, the default btree operator class doesn't have to implement
> standard equality and basic comparison operator symbols.  We can
> implement it with different symbols.

Perhaps I don't understand it. Many opclass are defined for
btree. But since (ordinary) btree handles only one-dimentional,
totally-orderable sets, geometric types even point don't fit
it. Even if we relaxed it by removing EPSILON, multidimentional
data still doesn't fit.

Still we can implement equality mechanism *over* multiple btree
indexes, but it cannot be a opclass for btree.

> > Could you let me (or any other guys on this ml) have more precise
> > description on the problem and/or what you want to do with this
> > patch?
> 
> I will try to itemize the current problems with the epsilon method:
> 
> = Operator Inconsistencies
> 
> My current patches address all of them.

No. It just removes tolerans that is a "requirement" for
coordinates of geometric types. I think we don't have enough
reason to remove it. We could newly define geometric types with
no-tolerance as 'exact_point" or something but it still doesn't
fit btree.

> - Only some operators are using the epsilon.
> 
> I included the list of the ones which doesn't use the epsilon on
> initial post of this thread.  This makes the operators not compatible
> with each other.  For example, you cannot say "if box_a @> box_b and
> box_b @> point then box_a @> box_b".

(It must be "then box_a @> point".)  Both of the operators don't
seem to use EPSILON so the transitivity holds, but perhaps we
should change them to more appropriate shape, that is, to use
EPSILON. Even having the tolerance, we can define the operators
to keep the transitivity.

> - The application of epsilon is implementation dependent.
> 
> Epsilon works between two numbers.  There is not always a single way
> of implementing geometric operators.  This is surprising to the users.
> For example, you cannot say "if box_a @> box_b then box_a <-> box_b <=
> epsilon".

Maybe it should be like "if box_a ~= box_b && box_b @< box_a then
..". I'm not sure off-hand. But I don't see the relationship is
significant.

> This is also creating a high barrier on developing those operators.
> Fixing bugs and performance regressions are very likely to change the
> results.

I agree to you. The effect of the EPSILON is quite arbitrary and
difficult to see their chemistry.

> - Some operators are just wrong:
> 
> Line operators are not well maintained.  The following are giving
> wrong results when neither A or B of lines are not exactly 1:
> 
> * line # line
> * line <-> line
> * point ## line
> * point ## lseg

These are not comparison operators so EPSILON is unrelated. And I
agree that they needs fix.

> They are broken independently from the epsilon, though it is not easy
> to fix them without changing the results of the cases that are
> currently working.

Although I'm not sure they are using EPSILON, I think they don't
need it.

> - Some operators are violating commutative property.
> 
> For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a".

Whether EPSILON is introduced or not, commutativity cannot be
assured for it from calculation error, I suppose.

> - Some operators are violating transitive property.
> 
> For example, you cannot say "if point_a ~= point_b and point_b ~=
> point_c then point_a ~= point_c".

It holds only in ideal (or mathematical) world, or for similars
of integer (which are strictly implemented mathematica world on
computer).  Without the EPSILON, two points hardly match by
floating point error, as I mentioned. I don't have an evidence
though (sorry), mere equality among three floating point numbers
is not assured.


Sorry, I have no more time today, I'll continue tomorrow.


> - The operators with epsilon are not compatible with float operators.
> 
> This is also surprising for the users.  For example, you cannot say
> "if point_a << point_b then point_a[0] < point_b[0]".
> 
> - The operators are not checking for underflow or overflow.
> - NaN values are not treated same as the float datatype.
> 
> Those are independent problems, though checking them with epsilon
> would create additional set of inconsistencies.  Epsilon creates
> especially more problems around 0.
> 
> = Missing Bits on the Operator Classes
> 
> I want to address those in the future, if my patches are accepted.
> 
> - There is no GiST or SP-GiST support for cross datatype operators
> other than containment.
> 
> We can develop nice cross-datatype operator families to support more
> operators.  It would have been easier, if we could rely on some
> operators to implement others.  Geometric types serve the purpose of
> demonstrating our indexing capabilities.  We should make them good
> examples, not full of special cases with hidden bugs.
> 
> - There is no BRIN support for types other than the box.
> 
> BRIN inclusion operator class is datatype agnostic.  It uses some
> operators to implement others which doesn't work for anything more the
> box vs box operators, because of the inconsistencies.
> 
> - GROUP BY and DISTINCT are not working.
> - ORDER BY is not working.
> 
> There are regular user complaints about this on the mailing list.  We
> can easily provide default hash and btree operator classes that would
> fix the problem, if we haven't had the epsilon.
> 

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types

От

Kyotaro HORIGUCHI

Дата:

24 января 2017 г., 16:17:41

Sorry, this is the continuation of the previous comment.

At Wed, 18 Jan 2017 17:36:12 +0900 (Tokyo Standard Time), Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote in
<20170118.173612.13506164.horiguchi.kyotaro@lab.ntt.co.jp>
> Hello.
> 
> (I apologize in advance for possible inaccurate wording on maths,
> which might cause confusion..)
> 
> At Wed, 11 Jan 2017 16:37:59 +0100, Emre Hasegeli <emre@hasegeli.com> wrote in
<CAE2gYzzNufOZqh4HO3Od8urzamNSeX-OXJxfNkRcU3ex9RD8jw@mail.gmail.com>
> > > - Floating point comparisons for gemetric types
> > >
> > >   Comparison related to geometric types is performed by FPeq
> > >   macro and similars defined in geo_decls.h. This intends to give
> > >   tolerance to the comparisons.
> > >
> > >                  A
> > >   FPeq:     |<=e-|-e=>|    (<= means inclusive, e = epsilon = tolerance)
> > >   FPne:   ->|  e | e  |<-  (<- means exclusive)
> > >   FPlt:          | e  |<-
> > >   FPle:     |<=e |
> > >   FPgt:   ->|  e |
> > >   FPge:          | e=>|
> > >
> > >   These seems reasonable ignoring the tolerance amount issue.
> > 
> > I tried to explain below why I don't find this method reasonable.
> 
> Thank you very much for the explanation.
> 
> > > At least for the point type, (bitmap) index scan is consistent
> > > with sequential scan.  I remember that the issue was
> > > "inconsistency between indexed and non-indexed scans over
> > > geometric types" but they seem consistent with each other.
> > 
> > You can see on the Git history that it was a lot of trouble to make
> > the indexes consistent with the operators, and this is limiting
> > improving indexing of the geometric types.
> 
> Yeah. geo_ops.c has the following comment from its first version
> in the current repository.
> 
> | *    Relational operators for Points.
> | *        Since we do have a sense of coordinates being
> | *        "equal" to a given accuracy (point_vert, point_horiz),
> 
> So, we take it as a requirement for geometric types. All the
> difficulties come from the "nature".
> 
> > > You mentioned b-tree, which don't have predefined opclass for
> > > geometric types. So the "inconsistency" may be mentioning the
> > > difference between geometric values and combinations of plain
> > > (first-class) values. But the two are different things and
> > > apparently using different operators (~= and = for equality) so
> > > the issue is not fit for this. More specifically, "p ~= p0" and
> > > "x = x0 and y = y0" are completely different.
> > 
> > Yes, the default btree operator class doesn't have to implement
> > standard equality and basic comparison operator symbols.  We can
> > implement it with different symbols.
> 
> Perhaps I don't understand it. Many opclass are defined for
> btree. But since (ordinary) btree handles only one-dimentional,
> totally-orderable sets, geometric types even point don't fit
> it. Even if we relaxed it by removing EPSILON, multidimentional
> data still doesn't fit.
> 
> Still we can implement equality mechanism *over* multiple btree
> indexes, but it cannot be a opclass for btree.
> 
> > > Could you let me (or any other guys on this ml) have more precise
> > > description on the problem and/or what you want to do with this
> > > patch?
> > 
> > I will try to itemize the current problems with the epsilon method:
> > 
> > = Operator Inconsistencies
> > 
> > My current patches address all of them.
> 
> No. It just removes tolerans that is a "requirement" for
> coordinates of geometric types. I think we don't have enough
> reason to remove it. We could newly define geometric types with
> no-tolerance as 'exact_point" or something but it still doesn't
> fit btree.
> 
> > - Only some operators are using the epsilon.
> > 
> > I included the list of the ones which doesn't use the epsilon on
> > initial post of this thread.  This makes the operators not compatible
> > with each other.  For example, you cannot say "if box_a @> box_b and
> > box_b @> point then box_a @> box_b".
> 
> (It must be "then box_a @> point".)  Both of the operators don't
> seem to use EPSILON so the transitivity holds, but perhaps we
> should change them to more appropriate shape, that is, to use
> EPSILON. Even having the tolerance, we can define the operators
> to keep the transitivity.
> 
> > - The application of epsilon is implementation dependent.
> > 
> > Epsilon works between two numbers.  There is not always a single way
> > of implementing geometric operators.  This is surprising to the users.
> > For example, you cannot say "if box_a @> box_b then box_a <-> box_b <=
> > epsilon".
> 
> Maybe it should be like "if box_a ~= box_b && box_b @< box_a then
> ..". I'm not sure off-hand. But I don't see the relationship is
> significant.
> 
> > This is also creating a high barrier on developing those operators.
> > Fixing bugs and performance regressions are very likely to change the
> > results.
> 
> I agree to you. The effect of the EPSILON is quite arbitrary and
> difficult to see their chemistry.
> 
> > - Some operators are just wrong:
> > 
> > Line operators are not well maintained.  The following are giving
> > wrong results when neither A or B of lines are not exactly 1:
> > 
> > * line # line
> > * line <-> line
> > * point ## line
> > * point ## lseg
> 
> These are not comparison operators so EPSILON is unrelated. And I
> agree that they needs fix.
> 
> > They are broken independently from the epsilon, though it is not easy
> > to fix them without changing the results of the cases that are
> > currently working.
> 
> Although I'm not sure they are using EPSILON, I think they don't
> need it.
> 
> > - Some operators are violating commutative property.
> > 
> > For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a".
> 
> Whether EPSILON is introduced or not, commutativity cannot be
> assured for it from calculation error, I suppose.
> 
> > - Some operators are violating transitive property.
> > 
> > For example, you cannot say "if point_a ~= point_b and point_b ~=
> > point_c then point_a ~= point_c".
> 
> It holds only in ideal (or mathematical) world, or for similars
> of integer (which are strictly implemented mathematica world on
> computer).  Without the EPSILON, two points hardly match by
> floating point error, as I mentioned. I don't have an evidence
> though (sorry), mere equality among three floating point numbers
> is not assured.
> 
> 
> Sorry, I have no more time today, I'll continue tomorrow.


> - The operators with epsilon are not compatible with float operators.
> 
> This is also surprising for the users.  For example, you cannot say
> "if point_a << point_b then point_a[0] < point_b[0]".

It is because "is strictly left of" just doesn't mean their
x-coordinates are in such a relationship. The difference might be
surprising but is a specification (truely not a bug:).

> - The operators are not checking for underflow or overflow.
> - NaN values are not treated same as the float datatype.
> 
> Those are independent problems, though checking them with epsilon
> would create additional set of inconsistencies.  Epsilon creates
> especially more problems around 0.

I dont' understand this precisely but it seems to be reasonable.

> = Missing Bits on the Operator Classes
> 
> I want to address those in the future, if my patches are accepted.
> 
> - There is no GiST or SP-GiST support for cross datatype operators
> other than containment.
> 
> We can develop nice cross-datatype operator families to support more
> operators.  It would have been easier, if we could rely on some
> operators to implement others.  Geometric types serve the purpose of
> demonstrating our indexing capabilities.  We should make them good
> examples, not full of special cases with hidden bugs.
> 
> - There is no BRIN support for types other than the box.
> 
> BRIN inclusion operator class is datatype agnostic.  It uses some
> operators to implement others which doesn't work for anything more the
> box vs box operators, because of the inconsistencies.
> 
> - GROUP BY and DISTINCT are not working.
> - ORDER BY is not working.
> 
> There are regular user complaints about this on the mailing list.  We
> can easily provide default hash and btree operator classes that would
> fix the problem, if we haven't had the epsilon.

This final section seems to mention the application but sorry, it
still don't describe for me what application that this patch
aiming to enable. But I can understand that you want to handle
floating point numbers like integers. The epsilon is surely quite
annoying for the purpose.


Ok, returning to the basis of the current geometric types, it is
described as below. It is not perfect (so needs amendment) but
achived to a certain degree.

1. For geometric types, equality needs to be defined with some  tolerance. Concretely an absolute value of 1E-06, which
is quite random but would be suitable to ignore error of  longitude and latitude.
 

2. Translative comparisons should consider the tolerance to make  a result that does not contradict the equality.

3. It is not sure about other comparsons such like parallel and  perpendicular, but it is hardly usable without any
amountof  tolerance.
 


On the other hand, some people complains about the following.

A. The geometric comparisons are not in relationships that  analogously deducible (or expected) from scalar
comparisons.

B. The tolerance seems prevents us from using indexes.


As a presupposition, we have the common knowlegde, or principle,
about floating point handling on a digital computer.

X. We shouldn't detect equalty of two floating point numbers just  using '=' for real usage. It hardly works for
arbitrary numbers including results of any arithmetics, so it should  incorporate any amount of tolerance that differs
among applications.
 

Sorry for some refrains from the past discussion, but the above X
and 1-3 are the immovable basis in this area. (I belive it . But
any objection are welcome)

You might want to object that float has an equality operator, but
it is because it is an atomic type, which has no specific
purpose. On the contrary geometric types are application types,
which is assumed specific purpose and it should behave
reasonablly against the expected purpose.


That being said, if we want to, for example, use btree index for
each coordinate of geometric types, it is doable by normalizing
the values before applying operators, instead of applying epsilon
arbitrarily in operators.

Normalization may be done both on storing or on-the-fly just
befor usage. It depends on whether we want to preserve precision
of stored values. If we want, normalization should be done just
before comparisons, or on creating indexes.

I don't know much about the normalization method for floating
point numbers but libgeos might give some hints.

And still I don't have an idea how to detect parallelity and
perpendicularity based on the normalization.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types

От

Emre Hasegeli

Дата:

25 января 2017 г., 17:24:18

I am responding both of your emails together.

> Perhaps I don't understand it. Many opclass are defined for
> btree. But since (ordinary) btree handles only one-dimentional,
> totally-orderable sets, geometric types even point don't fit
> it. Even if we relaxed it by removing EPSILON, multidimentional
> data still doesn't fit.

Yes, multidimentional data doesn't fit into btree.  Let's say we would
have to introduce operators called *<, *<=, *>, *>= to support btree
opclass for point.  I agree those operators would be useless, but
btree opclass is used for other purposes too.  "ORDER BY point",
merge-joins, btree index support for ~= would be useful.

Lack of ORDER BY, GROUP BY, and DISTINCT support is the major
inconvenience about the geometric types.  There are many complaints
about this on the mailing list.  Many frameworks and ORMs are not
prepared to deal with getting an error when they use certain types on
those clauses.

>> - Only some operators are using the epsilon.
>>
>> I included the list of the ones which doesn't use the epsilon on
>> initial post of this thread.  This makes the operators not compatible
>> with each other.  For example, you cannot say "if box_a @> box_b and
>> box_b @> point then box_a @> box_b".
>
> (It must be "then box_a @> point".)

Yes, I am sorry.

>  Both of the operators don't
> seem to use EPSILON so the transitivity holds, but perhaps we
> should change them to more appropriate shape, that is, to use
> EPSILON. Even having the tolerance, we can define the operators
> to keep the transitivity.

Yes, that would be another way.  In my view, this would make things
worse, because I believe epsilon approach is completely broken.  The
operators which are not using the epsilon are the only ones people can
sanely use to develop applications.

>> > - The application of epsilon is implementation dependent.
>> >
>> > Epsilon works between two numbers.  There is not always a single way
>> > of implementing geometric operators.  This is surprising to the users.
>> > For example, you cannot say "if box_a @> box_b then box_a <-> box_b <=
>> > epsilon".
>>
>> Maybe it should be like "if box_a ~= box_b && box_b @< box_a then
>> ..". I'm not sure off-hand. But I don't see the relationship is
>> significant.

What I meant was the behaviour being unclear to even people who knows
about the epsilon.  If two boxes are overlapping with each other, one
who knows about EPSILON can expect the distance between them to be
less than EPSILON, but this wouldn't be true.

>> - Some operators are violating commutative property.
>>
>> For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a".
>
> Whether EPSILON is introduced or not, commutativity cannot be
> assured for it from calculation error, I suppose.

It can easily be assured by treating both sides of the operator the
same.  It is actually assured on my patch.

>> - Some operators are violating transitive property.
>>
>> For example, you cannot say "if point_a ~= point_b and point_b ~=
>> point_c then point_a ~= point_c".
>
> It holds only in ideal (or mathematical) world, or for similars
> of integer (which are strictly implemented mathematica world on
> computer).  Without the EPSILON, two points hardly match by
> floating point error, as I mentioned. I don't have an evidence
> though (sorry), mere equality among three floating point numbers
> is not assured.

Of course, it is assured.  Floating point numbers are deterministic.

>> - The operators with epsilon are not compatible with float operators.
>>
>> This is also surprising for the users.  For example, you cannot say
>> "if point_a << point_b then point_a[0] < point_b[0]".
>
> It is because "is strictly left of" just doesn't mean their
> x-coordinates are in such a relationship. The difference might be
> surprising but is a specification (truely not a bug:).

Then what does that mean?  Every operator with EPSILON is currently
surprising in different way.  People use databases to build
applications.  They often need to implement same operations on the
application side.  It is very hard to be bug-compatible with our
geometric types.

>> = Missing Bits on the Operator Classes

> This final section seems to mention the application but sorry, it
> still don't describe for me what application that this patch
> aiming to enable. But I can understand that you want to handle
> floating point numbers like integers. The epsilon is surely quite
> annoying for the purpose.

I will try to itemise what applications I am aiming to enable:

- Applications with very little GIS requirement

PostGIS can be too much work for an application in s different
business domain but has a little requirement to GIS.  The built-in
geometric types are incomplete to support real world geography.
Getting rid of epsilon would make this worse. In contrary, it would
allow people to deal with the difficulties on their own.

- Applications with geometric objects

I believe people could make use the geometric types in many different
business areas, if they would be in better shape.  I am working for a
gaming company.  There is great overlap between the logic of many
games and the geometric types.  I could support using them more, if
they wouldn't be so buggy and impractical.

- Demonstrating indexes

We require indexing features to be committed with an example.
Geometric types have served this purpose many times.  Many operators
are actually added to demonstrate indexes.  These examples are also
used by other projects like PostGIS.  They can re-use part of the
implementations, or the implementation as a starting point.  I believe
we should make those examples as clean as possible.

> Ok, returning to the basis of the current geometric types, it is
> described as below. It is not perfect (so needs amendment) but
> achived to a certain degree.
>
> 1. For geometric types, equality needs to be defined with some
>    tolerance. Concretely an absolute value of 1E-06, which is
>    quite random but would be suitable to ignore error of
>    longitude and latitude.

I don't agree this requirement as it is put.  Returning true to
something like "point(0.000001, 0.000001) ~= point(0.000002,
0.000002)" is obviously wrong.

> 2. Translative comparisons should consider the tolerance to make
>    a result that does not contradict the equality.

The current implementation fails to provide it:

> hasegeli=# select point(0.000001, 0.000001) ~= point(0.000002, 0.000002);
> ?column?
> ----------
> t
> (1 row)
>
> hasegeli=# select point(0.000001, 0.000001) << point(0.0000021, 0.0000021);
> ?column?
> ----------
> t
> (1 row)
>
> hasegeli=# select point(0.000002, 0.000002) << point(0.0000021, 0.0000021);
> ?column?
> ----------
> f
> (1 row)

> 3. It is not sure about other comparsons such like parallel and
>    perpendicular, but it is hardly usable without any amount of
>    tolerance.

I agree.  I don't think they are useable with our tolerance either.

> On the other hand, some people complains about the following.

I believe have listed enough complaints on my previous post.

> As a presupposition, we have the common knowlegde, or principle,
> about floating point handling on a digital computer.
>
> X. We shouldn't detect equalty of two floating point numbers just
>    using '=' for real usage. It hardly works for arbitrary
>    numbers including results of any arithmetics, so it should
>    incorporate any amount of tolerance that differs among
>    applications.
>
> Sorry for some refrains from the past discussion, but the above X
> and 1-3 are the immovable basis in this area. (I belive it . But
> any objection are welcome)
>
> You might want to object that float has an equality operator, but
> it is because it is an atomic type, which has no specific
> purpose. On the contrary geometric types are application types,
> which is assumed specific purpose and it should behave
> reasonablly against the expected purpose.

Are they really?  We have really simple geometric types which are not
for any specific kind of application.  We are not claiming that they
are for earth scale or any other scale.  I think we should aim of
making them predictable, extensible, as general purpose as possible to
let people use them for their purpose.

> That being said, if we want to, for example, use btree index for
> each coordinate of geometric types, it is doable by normalizing
> the values before applying operators, instead of applying epsilon
> arbitrarily in operators.
>
> Normalization may be done both on storing or on-the-fly just
> befor usage. It depends on whether we want to preserve precision
> of stored values. If we want, normalization should be done just
> before comparisons, or on creating indexes.
>
> I don't know much about the normalization method for floating
> point numbers but libgeos might give some hints.

Me neither.  I have been told that storing the minimum and and maximum
possible value is a good strategy for tolerance.  Then all operators
can return true, false, or maybe.  This would be a much bigger and
incompatible change that I don't think I can implement.

We cannot come into an agreement.  Geometric types are not well
maintained.  I am trying my best to put them into a shape.  Please
help me do that.  Let's try to find a way to leave them better than we
found.

Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types

От

Kyotaro HORIGUCHI

Дата:

26 января 2017 г., 15:12:30

Hello, Emre.

At Wed, 25 Jan 2017 12:24:18 +0100, Emre Hasegeli <emre@hasegeli.com> wrote in
<CAE2gYzzLWCdYDPLi_UQXLSJ+pqc7g4WGNHsb8tHkmosKK+rO7g@mail.gmail.com>
> I am responding both of your emails together.
> 
> > Perhaps I don't understand it. Many opclass are defined for
> > btree. But since (ordinary) btree handles only one-dimentional,
> > totally-orderable sets, geometric types even point don't fit
> > it. Even if we relaxed it by removing EPSILON, multidimentional
> > data still doesn't fit.
> 
> Yes, multidimentional data doesn't fit into btree.  Let's say we would
> have to introduce operators called *<, *<=, *>, *>= to support btree
> opclass for point.  I agree those operators would be useless, but
> btree opclass is used for other purposes too.  "ORDER BY point",
> merge-joins, btree index support for ~= would be useful.

Btree cannot be used unless any kind of total-orderedness is
defined but hash index would be usable so.

> Lack of ORDER BY, GROUP BY, and DISTINCT support is the major
> inconvenience about the geometric types.  There are many complaints
> about this on the mailing list.  Many frameworks and ORMs are not
> prepared to deal with getting an error when they use certain types on
> those clauses.

Even though I'm not sure but I don't see a "natural" (or
agreeable by many poeple) ordering of geometric types in
general. Anyway it's quite application (not application program
but the relationship with the real world) specific.

> >> - Only some operators are using the epsilon.
> >>
> >> I included the list of the ones which doesn't use the epsilon on
> >> initial post of this thread.  This makes the operators not compatible
> >> with each other.  For example, you cannot say "if box_a @> box_b and
> >> box_b @> point then box_a @> box_b".
> >
> > (It must be "then box_a @> point".)
> 
> Yes, I am sorry.
> 
> >  Both of the operators don't
> > seem to use EPSILON so the transitivity holds, but perhaps we
> > should change them to more appropriate shape, that is, to use
> > EPSILON. Even having the tolerance, we can define the operators
> > to keep the transitivity.
> 
> Yes, that would be another way.  In my view, this would make things
> worse, because I believe epsilon approach is completely broken.  The
> operators which are not using the epsilon are the only ones people can
> sanely use to develop applications.

What we should not forget is that PostGIS does the same thing and
it is widly used (I believe..). This means it not broken at least
on a certain context. But it is a fact that it also quite
inconvenient to get performance from, say, indexes.

> >> > - The application of epsilon is implementation dependent.
> >> >
> >> > Epsilon works between two numbers.  There is not always a single way
> >> > of implementing geometric operators.  This is surprising to the users.
> >> > For example, you cannot say "if box_a @> box_b then box_a <-> box_b <=
> >> > epsilon".
> >>
> >> Maybe it should be like "if box_a ~= box_b && box_b @< box_a then
> >> ..". I'm not sure off-hand. But I don't see the relationship is
> >> significant.
> 
> What I meant was the behaviour being unclear to even people who knows
> about the epsilon.  If two boxes are overlapping with each other, one
> who knows about EPSILON can expect the distance between them to be
> less than EPSILON, but this wouldn't be true.

Yeah, the EPSILON is mere a fuzz factor so we cannot expect any
kind of stable behavior for differences under the level. But on
the analogy of comparisons of floating point numbers, I suppose
that inequality comparison could be done without the tolerance.

> >> - Some operators are violating commutative property.
> >>
> >> For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a".
> >
> > Whether EPSILON is introduced or not, commutativity cannot be
> > assured for it from calculation error, I suppose.
> 
> It can easily be assured by treating both sides of the operator the
> same.  It is actually assured on my patch.

It surely holds for certain cases. Even how many applicable cases
we guess, finally we cannot proof that it works generally. Just
three times of 1/3 rotation breakes it.

> >> - Some operators are violating transitive property.
> >>
> >> For example, you cannot say "if point_a ~= point_b and point_b ~=
> >> point_c then point_a ~= point_c".
> >
> > It holds only in ideal (or mathematical) world, or for similars
> > of integer (which are strictly implemented mathematica world on
> > computer).  Without the EPSILON, two points hardly match by
> > floating point error, as I mentioned. I don't have an evidence
> > though (sorry), mere equality among three floating point numbers
> > is not assured.
> 
> Of course, it is assured.  Floating point numbers are deterministic.

Hmm, I have nothing more to say if you don't agree that floating
point numbers involving any kind of arithmetic is hardly
deterministic especially not defining its usage.

> >> - The operators with epsilon are not compatible with float operators.
> >>
> >> This is also surprising for the users.  For example, you cannot say
> >> "if point_a << point_b then point_a[0] < point_b[0]".
> >
> > It is because "is strictly left of" just doesn't mean their
> > x-coordinates are in such a relationship. The difference might be
> > surprising but is a specification (truely not a bug:).
> 
> Then what does that mean?  Every operator with EPSILON is currently
> surprising in different way.  People use databases to build
> applications.  They often need to implement same operations on the
> application side.  It is very hard to be bug-compatible with our
> geometric types.

The world of the geometric types in PostgreSQL *is* built
so. There is nothing different with that Windows client can make
different results from PostgreSQL on a Linux server for a simple
floating point arithmetics, or even different binaries made by
different version of compilers on the same platoform can. Relying
on such coherency by accident is a bad policy.

> >> = Missing Bits on the Operator Classes
> 
> > This final section seems to mention the application but sorry, it
> > still don't describe for me what application that this patch
> > aiming to enable. But I can understand that you want to handle
> > floating point numbers like integers. The epsilon is surely quite
> > annoying for the purpose.
> 
> I will try to itemise what applications I am aiming to enable:
> 
> - Applications with very little GIS requirement
> 
> PostGIS can be too much work for an application in s different
> business domain but has a little requirement to GIS.  The built-in
> geometric types are incomplete to support real world geography.
> Getting rid of epsilon would make this worse. In contrary, it would
> allow people to deal with the difficulties on their own.
> 
> - Applications with geometric objects
> 
> I believe people could make use the geometric types in many different
> business areas, if they would be in better shape.  I am working for a
> gaming company.  There is great overlap between the logic of many
> games and the geometric types.  I could support using them more, if
> they wouldn't be so buggy and impractical.
> 
> - Demonstrating indexes
> 
> We require indexing features to be committed with an example.
> Geometric types have served this purpose many times.  Many operators
> are actually added to demonstrate indexes.  These examples are also
> used by other projects like PostGIS.  They can re-use part of the
> implementations, or the implementation as a starting point.  I believe
> we should make those examples as clean as possible.
> 
> > Ok, returning to the basis of the current geometric types, it is
> > described as below. It is not perfect (so needs amendment) but
> > achived to a certain degree.
> >
> > 1. For geometric types, equality needs to be defined with some
> >    tolerance. Concretely an absolute value of 1E-06, which is
> >    quite random but would be suitable to ignore error of
> >    longitude and latitude.
> 
> I don't agree this requirement as it is put.  Returning true to
> something like "point(0.000001, 0.000001) ~= point(0.000002,
> 0.000002)" is obviously wrong.

No. It can be right. For example, it is right for an application
where the precision of data is 0.1 or 1E-6. But it is wrong if
the tolerance designed for its purpose is 1E-7.

> > 2. Translative comparisons should consider the tolerance to make
> >    a result that does not contradict the equality.
> 
> The current implementation fails to provide it:
> 
> > hasegeli=# select point(0.000001, 0.000001) ~= point(0.000002, 0.000002);
> > ?column?
> > ----------
> > t
> > (1 row)
> >
> > hasegeli=# select point(0.000001, 0.000001) << point(0.0000021, 0.0000021);
> > ?column?
> > ----------
> > t
> > (1 row)
> >
> > hasegeli=# select point(0.000002, 0.000002) << point(0.0000021, 0.0000021);
> > ?column?
> > ----------
> > f
> > (1 row)

So this example without any concrete application is
irrelevant.

> > 3. It is not sure about other comparsons such like parallel and
> >    perpendicular, but it is hardly usable without any amount of
> >    tolerance.
> 
> I agree.  I don't think they are useable with our tolerance either.
> 
> > On the other hand, some people complains about the following.
> 
> I believe have listed enough complaints on my previous post.
> 
> > As a presupposition, we have the common knowlegde, or principle,
> > about floating point handling on a digital computer.
> >
> > X. We shouldn't detect equalty of two floating point numbers just
> >    using '=' for real usage. It hardly works for arbitrary
> >    numbers including results of any arithmetics, so it should
> >    incorporate any amount of tolerance that differs among
> >    applications.
> >
> > Sorry for some refrains from the past discussion, but the above X
> > and 1-3 are the immovable basis in this area. (I belive it . But
> > any objection are welcome)
> >
> > You might want to object that float has an equality operator, but
> > it is because it is an atomic type, which has no specific
> > purpose. On the contrary geometric types are application types,
> > which is assumed specific purpose and it should behave
> > reasonablly against the expected purpose.
> 
> Are they really?  We have really simple geometric types which are not

PostGIS or libgeos seems to prove it. They are designed exactly
for this purpose and actually used.

> for any specific kind of application.  We are not claiming that they
> are for earth scale or any other scale.  I think we should aim of
> making them predictable, extensible, as general purpose as possible to
> let people use them for their purpose.

If settable tolerance, and preferablly, with definable method
were introduced, it could be said that generic geometric types.


> > That being said, if we want to, for example, use btree index for
> > each coordinate of geometric types, it is doable by normalizing
> > the values before applying operators, instead of applying epsilon
> > arbitrarily in operators.
> >
> > Normalization may be done both on storing or on-the-fly just
> > befor usage. It depends on whether we want to preserve precision
> > of stored values. If we want, normalization should be done just
> > before comparisons, or on creating indexes.
> >
> > I don't know much about the normalization method for floating
> > point numbers but libgeos might give some hints.
> 
> Me neither.  I have been told that storing the minimum and and maximum
> possible value is a good strategy for tolerance.  Then all operators
> can return true, false, or maybe.  This would be a much bigger and
> incompatible change that I don't think I can implement.
> 
> We cannot come into an agreement.  Geometric types are not well
> maintained.  I am trying my best to put them into a shape.  Please
> help me do that.  Let's try to find a way to leave them better than we
> found.

Although some operators have apparently wrong implement and
fixing it would be a progress, it is not a progress toward your
objective if I don't agree to just remove the tolerance.

# Sorry for the long detour..

So, the union of the two requirements seems to be having such
parameter as a GUC.

Geometric types don't work without any amount of tolerance for
many cases, and surely some people needs the EPSILON but the
amout is quite arbitrary. On the other hand, some other people
want no-tolerance. This seems enough reason to make it variable.


If there's no objection for the above, how about proceeding by
the following steps?

1: Replace fixed EPSILON with a guc parameter 'geometric_tolerance' or such and its default value is 1E-6. This also
shouldhave documentation.
 

2: Fix some currently broken operators to give sane results, considring geometric_tolerance.

Once we get to this stage, setting geometric_tolerance to 0 would
give the preferable behavior for you, and it is mere a concrete
value for a specific application. No problem.

After that, perhaps we may add some indexing method for geometric
types not considering tolerance. Then someone may want make them
support geometric_tolerance.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types

От

Emre Hasegeli

Дата:

26 января 2017 г., 16:53:28

> Even though I'm not sure but I don't see a "natural" (or
> agreeable by many poeple) ordering of geometric types in
> general. Anyway it's quite application (not application program
> but the relationship with the real world) specific.

We can just define it for point as "ORDER BY point.x, point.y".

> What we should not forget is that PostGIS does the same thing and
> it is widly used (I believe..). This means it not broken at least
> on a certain context. But it is a fact that it also quite
> inconvenient to get performance from, say, indexes.

I understand from Paul Ramsey's email [1] on this thread that PostGIS
doesn't currently have a tolerance.

> Yeah, the EPSILON is mere a fuzz factor so we cannot expect any
> kind of stable behavior for differences under the level. But on
> the analogy of comparisons of floating point numbers, I suppose
> that inequality comparison could be done without the tolerance.

What do you mean exactly?

>> >> - Some operators are violating commutative property.
>> >>
>> >> For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a".
>> >
>> > Whether EPSILON is introduced or not, commutativity cannot be
>> > assured for it from calculation error, I suppose.
>>
>> It can easily be assured by treating both sides of the operator the
>> same.  It is actually assured on my patch.
>
> It surely holds for certain cases. Even how many applicable cases
> we guess, finally we cannot proof that it works generally. Just
> three times of 1/3 rotation breakes it.

It is a different story.  We cannot talk about commutative property of
rotation function that we currently don't have, because it would be an
asymmetrical operator.

The parallel operator is currently marked as commutative.  The planner
is free to switch the sides of the operation.  Therefore this is not
only a surprise, but a bug.

> Hmm, I have nothing more to say if you don't agree that floating
> point numbers involving any kind of arithmetic is hardly
> deterministic especially not defining its usage.

The floating point results are not random.  There are certain
guarantees.  The transitive property of equality is one of them.  Our
aim should be making things easier for our users by providing more
guarantees not breaking what is already there.

> The world of the geometric types in PostgreSQL *is* built
> so. There is nothing different with that Windows client can make
> different results from PostgreSQL on a Linux server for a simple
> floating point arithmetics, or even different binaries made by
> different version of compilers on the same platoform can. Relying
> on such coherency by accident is a bad policy.

Yes, the results are not portable.  We should only rely on the results
being stable on the same build.  The epsilon doesn't cure this
problem.  It arguably makes it worse.

> PostGIS or libgeos seems to prove it. They are designed exactly
> for this purpose and actually used.

Yes, PostGIS is a GIS application.  We are not.  Geometric types name
suggests to me them being useful for general purpose.

> So, the union of the two requirements seems to be having such
> parameter as a GUC.

That sounds doable to me.  We can use this opportunity to make all
operators consistent.  So the epsilon would apply to the ones that it
current is not.  We can still add btree and hash opclasses, and make
them give an error when this GUC is not 0.  We can even make this or
another GUC apply to floats making whole system more consistent.

Though, I know the community is against behaviour changing GUCs.  I
will not spend more time on this, before I get positive feedback from
others.

[1] https://www.postgresql.org/message-id/CACowWR0DBEjCfBscKKumdRLJUkObjB7D%3Diw7-0_ZwSFJM9_gpw%40mail.gmail.com

Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types

От

Kyotaro HORIGUCHI

Дата:

26 января 2017 г., 18:11:22

Hello,

At Thu, 26 Jan 2017 11:53:28 +0100, Emre Hasegeli <emre@hasegeli.com> wrote in
<CAE2gYzwYLcx-3ffToVm7JEEniYt1fU31y5BAikXzEqvCbQyTMg@mail.gmail.com>
> > Even though I'm not sure but I don't see a "natural" (or
> > agreeable by many poeple) ordering of geometric types in
> > general. Anyway it's quite application (not application program
> > but the relationship with the real world) specific.
> 
> We can just define it for point as "ORDER BY point.x, point.y".

It's nonsense. Totally for a convenient. Anyone can do so on
their own application but PostgreSQL cannot have it as a
platform.

> > What we should not forget is that PostGIS does the same thing and
> > it is widly used (I believe..). This means it not broken at least
> > on a certain context. But it is a fact that it also quite
> > inconvenient to get performance from, say, indexes.
> 
> I understand from Paul Ramsey's email [1] on this thread that PostGIS
> doesn't currently have a tolerance.

Thank you for the pointer. (My memory is too small as 8bit CPU)
Looking into the source of PostGIS 2.3.0 (Maybe the latest)
Surely EPSILON is used is someplaces. FPeq and similars are also
defined. liblwgeom_internal.h even defines another tolerance
EPSILON_SQLMM..

Paul> The real answer for a GIS system is to have an explicit tolerance
Paul> parameter for calculations like distance/touching/containment, but
Paul> unfortunately we didn't do that so now we have our own
Paul> compatibility/boil the ocean problem if we ever wanted/were funded to
Paul> add one.

This doesn't seem saying PostGIS doesn't have fixed-amount
tolerance. Alhtough we don't necessarily need compatibility with
PostGIS, it makes the problem rather complex since we lose an
apparent start point for this. It would be good if we make geom
comparators to have explicit tolerances as Paul said but maybe it
is overdone. So I proposed the varialbe implicit tolerance.

> > Yeah, the EPSILON is mere a fuzz factor so we cannot expect any
> > kind of stable behavior for differences under the level. But on
> > the analogy of comparisons of floating point numbers, I suppose
> > that inequality comparison could be done without the tolerance.
> 
> What do you mean exactly?

Sorry for poor wording. I'll try in different way. It means
comparison on numbers that contains certain amount of error gives
unstable result for differences small enough comparing their
precision.

> >> >> - Some operators are violating commutative property.
> >> >>
> >> >> For example, you cannot say "if line_a ?|| line_b then line_b ?|| line_a".
> >> >
> >> > Whether EPSILON is introduced or not, commutativity cannot be
> >> > assured for it from calculation error, I suppose.
> >>
> >> It can easily be assured by treating both sides of the operator the
> >> same.  It is actually assured on my patch.
> >
> > It surely holds for certain cases. Even how many applicable cases
> > we guess, finally we cannot proof that it works generally. Just
> > three times of 1/3 rotation breakes it.
> 
> It is a different story.  We cannot talk about commutative property of
> rotation function that we currently don't have, because it would be an
> asymmetrical operator.

That's wrong. Any shpaes represented by geometric types assumed
to get any geometric operatsions such as transision, rotation and
others. It is fundamental for geometric types.

> The parallel operator is currently marked as commutative.  The planner
> is free to switch the sides of the operation.  Therefore this is not
> only a surprise, but a bug.

Strictly it is not commutative, but assuming FP error or EPSILON,
commutation among them would be acceptable. Having larger
tolerance, the defference from commutated expression becomes
relatively small for the type's domain. As far as I know, even
summation of two floating points is not guaranteed to yield
strictly the same result for commutated opeation. But the
difference doesn't affect for most cases and programmers make a
program so that such differences don't matter in the
objective. This is basically the same thing with the case of
geo-types.

> > Hmm, I have nothing more to say if you don't agree that floating
> > point numbers involving any kind of arithmetic is hardly
> > deterministic especially not defining its usage.
> 
> The floating point results are not random.  There are certain
> guarantees.  The transitive property of equality is one of them.  Our
> aim should be making things easier for our users by providing more
> guarantees not breaking what is already there.

Sorry for repeating the same thing but floating point numbers
after getting arithmetic operations must considered that they
have fuzziness or error (the same can occur for even just after
assigning). They must not be handled as exact numbers. Please
study about handling floating point numbers. If such strictness
is required and no arithmetic involved, it is proper to store
them, say, in a string form.

If such a behavior is required but want to use floating points,
maybe it is easier to create a extension conforming such a
specification, rather than chainging the core behavior.

> > The world of the geometric types in PostgreSQL *is* built
> > so. There is nothing different with that Windows client can make
> > different results from PostgreSQL on a Linux server for a simple
> > floating point arithmetics, or even different binaries made by
> > different version of compilers on the same platoform can. Relying
> > on such coherency by accident is a bad policy.
> 
> Yes, the results are not portable.  We should only rely on the results
> being stable on the same build.  The epsilon doesn't cure this
> problem.  It arguably makes it worse.

Yes, EPSILON doesn't improve such cross-platform consistency. It
is totally a different issue. Such exact consistency should not
be expected regardless of EPSILON.

> > PostGIS or libgeos seems to prove it. They are designed exactly
> > for this purpose and actually used.
> 
> Yes, PostGIS is a GIS application.  We are not.  Geometric types name
> suggests to me them being useful for general purpose.
> 
> > So, the union of the two requirements seems to be having such
> > parameter as a GUC.
> 
> That sounds doable to me.  We can use this opportunity to make all
> operators consistent.  So the epsilon would apply to the ones that it
> current is not.  We can still add btree and hash opclasses, and make
> them give an error when this GUC is not 0.  We can even make this or
> another GUC apply to floats making whole system more consistent.

Maybe.

> Though, I know the community is against behaviour changing GUCs.  I
> will not spend more time on this, before I get positive feedback from
> others.

That's too bad. I'm sorry that I wans't very helpful..


> [1] https://www.postgresql.org/message-id/CACowWR0DBEjCfBscKKumdRLJUkObjB7D%3Diw7-0_ZwSFJM9_gpw%40mail.gmail.com

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center

Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types

От

Michael Paquier

Дата:

31 января 2017 г., 10:40:39

On Thu, Jan 26, 2017 at 9:11 PM, Kyotaro HORIGUCHI
<horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>> Though, I know the community is against behaviour changing GUCs.  I
>> will not spend more time on this, before I get positive feedback from
>> others.
>
> That's too bad. I'm sorry that I wasn't very helpful..

You make a constructive discussion in a way that you think makes sense
by providing feedback, there's nothing bad in that IMO, quite the
contrary to be honest.

(I am just showing up here to tell that I have moved this patch to CF 2017-03).
-- 
Michael

Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types

От

Robert Haas

Дата:

31 января 2017 г., 22:31:38

On Thu, Jan 26, 2017 at 5:53 AM, Emre Hasegeli <emre@hasegeli.com> wrote:
> Though, I know the community is against behaviour changing GUCs.  I
> will not spend more time on this, before I get positive feedback from
> others.

As if on cue, let me say that a behavior-changing GUC sounds like a
terrible idea to me.  It's almost never good when a GUC can cause the
same queries to return answers in different sessions, and even worse,
it seems like the GUC might have the effect of letting us build
indexes that are only valid for the value of the GUC with which they
were built.

Backing up a bit here, have we lost track of the problem that we're
trying to solve?  Tom gave his opinion here:

https://www.postgresql.org/message-id/3895.1464791274@sss.pgh.pa.us

But I don't see that the latest patch I can find does anything to fix
that.  Now maybe that's not the problem that Emre is trying to solve,
but then it is not very clear to me what problem he IS trying to
solve.  And I think Kyotaro Horiguchi is trying to solve yet another
problem which is again different.  So IMHO the first task here is to
agree on a clear statement of what we'd like to fix, and then, given a
patch, we can judge whether it's fixed.

Maybe I'm being dumb here and it's clear to you guys what the issues
under discussion are.  If so, apologies for that, but the thread has
gotten too theoretical for me and I can't figure out what the
top-level concern is any more.  I believe we all agree these macros
are bad, but there seems to be no agreement that I can discern on what
would be better or why.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types

От

Emre Hasegeli

Дата:

01 февраля 2017 г., 00:06:39

> Backing up a bit here, have we lost track of the problem that we're
> trying to solve?  Tom gave his opinion here:
>
> https://www.postgresql.org/message-id/3895.1464791274@sss.pgh.pa.us
>
> But I don't see that the latest patch I can find does anything to fix
> that.

This is what he wrote:

> As I understand it, the key problem is that tests like "is point on line"
> would basically never succeed except in the most trivial cases, because of
> roundoff error.  That's not very nice, and it might cascade to larger
> problems like object-containment tests failing unexpectedly.  We would
> need to go through all the geometric operations and figure out where that
> kind of gotcha is significant and what we can do about it.  Seems like a
> fair amount of work :-(.  If somebody's willing to do that kind of
> investigation, then sure, but I don't think just blindly removing these
> macros is going to lead to anything good.

I re-implemented "is point on line" operator on my patch so that it
would, at least, work on the most trivial cases.  This is not very
nice, but it is predictable.  I have tried to prevent it cascade to
larger problems like object-containment tests failing unexpectedly.  I
have gone through all of the geometric operations and re-implemented
the ones with similar problems, too.  It is no work at all compared to
discussions we have to have :-).

My initial idea was to keep the fuzzy behaviour for some operators
like "is point on line", but the more I get into it the less value I
see in doing so.  The same family operators like "is point on line" is
currently badly broken:

> postgres=# select '(1,0)'::point ## '{1,2,0}'::line;
> ?column?
> ----------
> (2,2)
> (1 row)

This makes me wonder if anybody is ever using those operators.  In the
end, I don't care about those operators.  They are unlikely to be
supported by indexes.  I can simplify my patch to leave them as they
are.

> Now maybe that's not the problem that Emre is trying to solve,
> but then it is not very clear to me what problem he IS trying to
> solve.  And I think Kyotaro Horiguchi is trying to solve yet another
> problem which is again different.  So IMHO the first task here is to
> agree on a clear statement of what we'd like to fix, and then, given a
> patch, we can judge whether it's fixed.

I listed the problems I am trying to solve in here:

https://www.postgresql.org/message-id/CAE2gYzzNufOZqh4HO3Od8urzamNSeX-OXJxfNkRcU3ex9RD8jw%40mail.gmail.com

> Maybe I'm being dumb here and it's clear to you guys what the issues
> under discussion are.  If so, apologies for that, but the thread has
> gotten too theoretical for me and I can't figure out what the
> top-level concern is any more.  I believe we all agree these macros
> are bad, but there seems to be no agreement that I can discern on what
> would be better or why.

We couldn't agree on how we should treat on floating point numbers.  I
think we should threat them just like the "float" datatype.

Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types

От

Robert Haas

Дата:

01 февраля 2017 г., 00:49:52

On Tue, Jan 31, 2017 at 1:06 PM, Emre Hasegeli <emre@hasegeli.com> wrote:
> This is what he wrote:
>
>> As I understand it, the key problem is that tests like "is point on line"
>> would basically never succeed except in the most trivial cases, because of
>> roundoff error.  That's not very nice, and it might cascade to larger
>> problems like object-containment tests failing unexpectedly.  We would
>> need to go through all the geometric operations and figure out where that
>> kind of gotcha is significant and what we can do about it.  Seems like a
>> fair amount of work :-(.  If somebody's willing to do that kind of
>> investigation, then sure, but I don't think just blindly removing these
>> macros is going to lead to anything good.
>
> I re-implemented "is point on line" operator on my patch so that it
> would, at least, work on the most trivial cases.

Right.  So you and he do not agree on the goal.  He wants it to work
in MORE than the most trivial cases, and you wrote it so that it will
work in AT LEAST the most trivial cases.  Those are not the same.

> I listed the problems I am trying to solve in here:
>
> https://www.postgresql.org/message-id/CAE2gYzzNufOZqh4HO3Od8urzamNSeX-OXJxfNkRcU3ex9RD8jw%40mail.gmail.com

Those problems boil down to "let's get rid of the tolerance
completely", and there's no consensus on that being a good thing to
do.  In particular, I can't remember anybody but you being
unequivocally in favor of it.

> We couldn't agree on how we should treat on floating point numbers.  I
> think we should threat them just like the "float" datatype.

Got it, but if other people don't agree then this is going nowhere.

Personally, I've got my doubts about how sane it is to make anything
work here with the built-in floating point types.  The CPU is going to
do some kind of black magic, which may differ between machines, and
good luck getting consistent semantics that work everywhere.  For
example it seems entirely plausible (with some implementation) that
you might find that (1, 2) is on the line from (0, 0) to (2, 4) but
that (1, 3) is not on the line from (0, 0) to (2, 6) because the slope
1/2 can be represented exactly in base 2 and the slope 1/3 cannot.  If
you try to fix that kind of thing by introducing a tolerance, then you
might get it wrong when the line goes from (0, 0) to (1000000000,
2000000001).  Now, if we represented values using home-grown
arbitrary-precision arithmetic - like numeric does - then it might be
possible to get such cases right, especially if you supported exact
rather than approximate representations of fractional quantities.
Otherwise, I think there are inevitably going to be artifacts, and
tinkering with this is just changing which artifacts we get without
really solving the problem.  But I am notorious for my low opinion of
floating-point arithmetic; maybe somebody who understands it better or
likes it more can come up with something that's more clearly an
improvement.  Regardless, we have to have an agreement in order to
commit anything here, and I don't see one.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: [HACKERS] Floating point comparison inconsistencies of thegeometric types

От

Emre Hasegeli

Дата:

01 февраля 2017 г., 17:36:21

> Got it, but if other people don't agree then this is going nowhere.

Yes.  As I wrote, I don't particularly care about functions like "is
point on line".  I can prepare a patch to fix as many problems as
possible around those operators by preserving the current epsilon.

I though we were arguing about *all* operators.  Having containment
and placement operators consistent with each other, is the primary
thing I am trying to fix.  Is removing epsilon from them is
acceptable?

We can also stay away from changing operators like "~=" to minimise
compatibility break, if we keep the epsilon on some places.  We can
instead document this operator as "close enough", and introduce
another symbol for really "the same" operator.

That said, there are some places where it is hard to decide to apply
the epsilon or not.  For example, we can keep the epsilon to check for
two lines being parallel, but then should we return the intersection
point, or not?  Those issues may become more clear when I start
working on it, if preserving epsilon for those operators is the way to
go forward.

[HACKERS] Re: Floating point comparison inconsistencies of the geometrictypes

От

David Steele

Дата:

16 марта 2017 г., 22:10:22

On 2/1/17 6:36 AM, Emre Hasegeli wrote:
>> Got it, but if other people don't agree then this is going nowhere.
> 
> Yes.  As I wrote, I don't particularly care about functions like "is
> point on line".  I can prepare a patch to fix as many problems as
> possible around those operators by preserving the current epsilon.
> 
> I though we were arguing about *all* operators.  Having containment
> and placement operators consistent with each other, is the primary
> thing I am trying to fix.  Is removing epsilon from them is
> acceptable?
> 
> We can also stay away from changing operators like "~=" to minimise
> compatibility break, if we keep the epsilon on some places.  We can
> instead document this operator as "close enough", and introduce
> another symbol for really "the same" operator.
> 
> That said, there are some places where it is hard to decide to apply
> the epsilon or not.  For example, we can keep the epsilon to check for
> two lines being parallel, but then should we return the intersection
> point, or not?  Those issues may become more clear when I start
> working on it, if preserving epsilon for those operators is the way to
> go forward.

The current patches do not apply cleanly at cccbdde:

$ git apply ../other/0001-float-header-v03.patch
error: patch failed: contrib/btree_gist/btree_ts.c:1
error: contrib/btree_gist/btree_ts.c: patch does not apply
error: patch failed: contrib/postgres_fdw/postgres_fdw.c:26
error: contrib/postgres_fdw/postgres_fdw.c: patch does not apply
error: patch failed: src/backend/access/gist/gistutil.c:14
error: src/backend/access/gist/gistutil.c: patch does not apply
error: patch failed: src/backend/utils/adt/float.c:339
error: src/backend/utils/adt/float.c: patch does not apply
error: patch failed: src/backend/utils/adt/geo_ops.c:14
error: src/backend/utils/adt/geo_ops.c: patch does not apply
error: patch failed: src/backend/utils/misc/guc.c:68
error: src/backend/utils/misc/guc.c: patch does not apply
error: patch failed: src/include/utils/builtins.h:334
error: src/include/utils/builtins.h: patch does not apply

I don't believe this patch should be in the "Needs review" state anyway.There are clearly a number of issues that need
workand agreement.
 

Given that this thread has been idle since the beginning of February and
no resolution is likely for v10, I'm marking this submission "Returned
with Feedback".

-- 
-David
david@pgmasters.net

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: Floating point comparison inconsistencies of the geometric types

Вложения

Вложения

Вложения

Вложения

Вложения