Обсуждение: [BUGS] BUG #14546: "point" type does not work with "IS DISTINCT"

Поиск

Список

Период

Сортировка

[BUGS] BUG #14546: "point" type does not work with "IS DISTINCT"

От

kcwitt@gmail.com

Дата:

15 февраля 2017 г., 12:14:08

The following bug has been logged on the website:

Bug reference:      14546
Logged by:          Casey Witt
Email address:      kcwitt@gmail.com
PostgreSQL version: 9.6.1
Operating system:   N/A
Description:

The point type is a built-in type, yet does not work with the "IS DISTINCT"
clause and throws the error "42883: operator does not exist: point =
point".

This means that generated SQL template code such as the following does not
work:
UPDATE dcc.tbl_document_coordinates SET lower_left=@newValue WHERE
(lower_left IS NOT DISTINCT FROM @oldValue OR lower_left IS NOT DISTINCT
FROM @newValue) AND id=@primary_key;

The point of the above statement is to update the "lower_left" point unless
somebody has changed it already (IS NOT DISTINCT FROM @original_value) or
somebody else has already changed it to the desired new value (IS NOT
DISTINCT FROM @new_value).

To update this single statement is easy, but the issue is that this
statement is generated automatically by a function and includes all fields
that have changed.

Whatever issue there is with comparing points should be dealt with in
postgres and not pushed to client applications.

Conceptually, there should not be any difference between updating/comparing
an int4 and a point (considering that they are both built-in types).

This is a bug because
https://www.postgresql.org/docs/9.0/static/functions-comparison.html states
that  "Comparison operators are available for all relevant data types."


--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14546: "point" type does not work with "IS DISTINCT"

От

Tom Lane

Дата:

15 февраля 2017 г., 21:29:00

kcwitt@gmail.com writes:
> Whatever issue there is with comparing points should be dealt with in
> postgres and not pushed to client applications.

The problem with this is that it's not entirely clear what definition of
"equality" IS [NOT] DISTINCT FROM should use.

The current definition is "use the operator named =", which is rather ad
hoc, but you can point to chapter and verse in the SQL standard where
X IS [NOT] DISTINCT FROM Y is defined in terms of "X = Y", so it's not
completely nuts either.  Personally I'd prefer it to be defined as "use
the equality operator of the data type's default btree opclass, or hash
opclass if no btree opclass"; but it's likely that that would break some
cross-type cases that work today, so it wouldn't be a panacea.

But in either case, "point" loses because it has neither an operator
named "=" nor a btree or hash opclass.  This is not just an oversight.
There is a "point ~= point" operator but it implements fuzzy equality,
making it unsuitable as a basis for btree or hash behavior, even assuming
that you could invent the linear ordering of points that would also be
needed for a btree opclass.  It would surely not be terribly hard to
invent an exact-equality "=" operator for points, but it's not clear how
useful that would be.  Almost all the existing operators for points are
fuzzy.

The same problems exist for other geometric types, usually worse, because
many of the others do have "=" operators but they compare areas :-(.
Nobody's really wanted to break backwards compatibility enough to bring
some sanity to that mess.

Short answer is I don't think this is likely to change in the near future.

            regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14546: "point" type does not work with "IS DISTINCT"

От

Emre Hasegeli

Дата:

17 февраля 2017 г., 12:21:06

> Nobody's really wanted to break backwards compatibility enough to bring
> some sanity to that mess.

I am trying to bring some sanity to that mess:

https://www.postgresql.org/message-id/flat/CAE2gYzwwxPWbzxY3mtN4WL7W0DCkWo8gnB2ThUHU2XQ9XwgHMg%40mail.gmail.com

Any comment helps to keep the discussion moving.


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14546: "point" type does not work with "IS DISTINCT"

От

"Casey Witt"

Дата:

18 февраля 2017 г., 04:55:07

Tom,

Thanks for the explanation.

It looks like there are three separate issues here for the point type:
1) how to know if two points are equal (really equal, not fuzzy)
2) how to know if two points are "close enough" (fuzzy equal, which I assume
must be very useful for GIS stuff but haven't used myself yet)
3) how to order the points in a btree (ie. which is bigger (0,1) or (1,0))

Does the following make sense:
1) implement an equality operator for a point (x1=x2 AND y1=y2). This would
allow "IS [NOT] DISTINCT" to work out of the box, and most importantly (from
my perspective at least) would allow "UPDATE table SET (p1 = @p1) WHERE p1
IS NOT DISTINCT FROM @p1_original;" to just work.
2) points still have the "~=" operator, so no impact there
3) couldn't a general ordering operator be defined as:
    If x1 < x2 then point1 < point 2
    If x1 > x2 then point 1 > point 2
    If x1 = x2 then
        If y1 < y2 then point1 < point 2
        If y1 > y2 then point1 > point 2
        If y1 = y2 then point1 = point 2

For my use case, the reason it is so important to get the "IS [NOT]
DISTINCT" working for points is because my update queries are generated
automatically based on a list of "dirty fields" in my application front-end.
The application front-end creates a SQL update statement saying "update all
these values which have changed AS LONG AS they haven't been updated by
somebody else in the meantime". The point field is just one of many fields
(and data types) subject to change. So without having "=" defined for point
I only have two [very ugly] options:
1) store the point as two separate numeric columns (but I want to ensure
that I either have a complete point or no point, so then I would have to
also add check constraints), and when there are lots of points in the table
this just "feels icky"
2) update my SQL generator to make a special exception for points (which is
what I am doing now), but again, this just "feels icky".

Regarding the ordering operator, somebody could say that for their case the
y value should be compared before the x value, but does it really matter for
the purposes of implementing a general comparison btree operator? Isn't the
only thing that is truly required is that it is deterministic and guaranteed
to produce the same ordering every time, with minimizing computation being a
distant second place consideration? I say that minimizing computation should
be a "distant" second place to imply that it is better to have an
inefficient implementation of something than no implementation at all.

I know that this discussion of "what is equals" comes up a lot, and in a lot
of different programming languages, and the answer is often "we leave it
undefined because we don't know the exact use case". I have never embraced
this approach, and much prefer the "implement an equality operator that is
rational for the simplest general case".

As an exaggerated example, one could say, we can't implement an equality
operator for "numeric" because somebody may be storing a "length" in that
field and without knowing whether each value is in feet or meters it is not
possible to compare two of these values. But in practice, nobody gets too
excited about this because for most use cases the numeric type works as
expected and when there is a special use case such as storing length we are
accustomed to the design pattern of storing the length value in one column,
the units of measure in another column, write a function to convert the
length to a common base, and then use that function to create an index.

Extending this general idea beyond points then, two "boxes" could be
considered equal if all four points are equal, and could be ordered by the
length of the perimeter (I proposed perimeter because I assume it would be
the cheapest to calculate, but it could be area or anything else as long as
it is the cheapest to calculate and deterministic). Anybody needing more
than that (ie. geometric transforms) would still be able to write functions
to do their transforms which could be used in the index. The key point here
is that if the extra information placing doubt on the equality is not
actually embedded in the data type, then it should not be considered when
determining "built-in" data type equality and ordering operations.

If there is currently no "=" defined for points, then no existing code base
uses the "=" operator for points, so how could defining this operator break
backwards compatibility?

I know that this is a bit of a simplistic view of the situation, and I
appreciate your time to help me understand the real complexities of this
issue.

Thanks,
Casey

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us] 
Sent: Thursday, February 16, 2017 02:29
To: kcwitt@gmail.com
Cc: pgsql-bugs@postgresql.org
Subject: Re: [BUGS] BUG #14546: "point" type does not work with "IS
DISTINCT"

kcwitt@gmail.com writes:
> Whatever issue there is with comparing points should be dealt with in 
> postgres and not pushed to client applications.

The problem with this is that it's not entirely clear what definition of
"equality" IS [NOT] DISTINCT FROM should use.

The current definition is "use the operator named =", which is rather ad
hoc, but you can point to chapter and verse in the SQL standard where X IS
[NOT] DISTINCT FROM Y is defined in terms of "X = Y", so it's not completely
nuts either.  Personally I'd prefer it to be defined as "use the equality
operator of the data type's default btree opclass, or hash opclass if no
btree opclass"; but it's likely that that would break some cross-type cases
that work today, so it wouldn't be a panacea.

But in either case, "point" loses because it has neither an operator named
"=" nor a btree or hash opclass.  This is not just an oversight.
There is a "point ~= point" operator but it implements fuzzy equality,
making it unsuitable as a basis for btree or hash behavior, even assuming
that you could invent the linear ordering of points that would also be
needed for a btree opclass.  It would surely not be terribly hard to invent
an exact-equality "=" operator for points, but it's not clear how useful
that would be.  Almost all the existing operators for points are fuzzy.

The same problems exist for other geometric types, usually worse, because
many of the others do have "=" operators but they compare areas :-(.
Nobody's really wanted to break backwards compatibility enough to bring some
sanity to that mess.

Short answer is I don't think this is likely to change in the near future.

            regards, tom lane

-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] BUG #14546: "point" type does not work with "IS DISTINCT"

От

"Casey Witt"

Дата:

18 февраля 2017 г., 05:37:23

Emre,

Yeah, changing existing Macros is tough, because it can break backwards compatibility.

I had in mind that fixing this for a point would be easy since the "=" and btree operators are not defined, no existing
codebases would be impacted by implementing them, but when I re-read Tom Lanes email I noticed the line that says: 
    "The same problems exist for other geometric types, usually worse, because many of the others do have "=" operators
butthey compare areas :-(. 

I feel like it is tragic that "=" got implemented for geometric types in PostgreSQL using area, because my gut sense is
thatequality in PostgreSQL should be based on the raw values stored, and not some arbitrary transformation of them. In
thiscase I am specifically making a distinction between how PostgreSQL should deal with points (and other geometric
types)vs. how other things (ie. PostGIS) should deal with them. If PostgreSQL can deal with geometry simply and
deterministically(ie. "=" means the raw values are equal, and ordering is based on the cheapest determistic function
available),then that would provide a good foundation for everything else (ie. PostGIS) to define whatever makes sense
inthat environment based on the fundamental definition in PostgreSQL. 

I don't really even like the fact that "~=" is defined for points for three reasons:
1) it is fuzzy, but no way to control the error. If you think of "~=" as being the same as "close to", coordinates on
inIC chip will want to consider "close to" in the nanometer range, whereas coordinates for a house will consider "close
to"in the meter range (if anybody disagrees that coordinates for a house would be in the meter range, that demonstrates
thepoint that the definition of "close to" should not be baked into PostgreSQL because it is use case specific).
AlthoughPostgreSQL doesn't innately understand the difference between nanometers and meters, the point is for the IC
chipcase you may be comparing at the 6th decimal place, whereas for the house case you may want to compare at the first
decimalplace. 
2) there is no expectation that "~=" is defined for every type (unlike "=", which is documented (incorrectly) to be
definedfor every type) 
3) "~=" could be handled by a function which takes two points and an error value (thus allowing the user to use the
errorvalue suitable for their use case) 

The idea here being that PostgreSQL doesn't even know what it is storing or in what context it is being used (as far as
PostgreSQLis concerned, it is just storing bytes that fit the format of some defined type), so it shouldn't be making
assumptionsabout what "close" means. Whereas PostGIS does know what it is storing and in what context it is being used,
soPostGIS can make assumptions about what is close (but note that I have never actually used PostGIS, so this is just
anassumption). 

I am NOT suggesting any change to "~=" because that clearly would break backward compatibility, but I can't help
feelingthat PostgreSQL is building a pretty big "technical debt" by not sorting this out sooner rather than later. 

Casey



-----Original Message-----
From: Emre Hasegeli [mailto:emre@hasegeli.com]
Sent: Friday, February 17, 2017 17:21
To: Tom Lane
Cc: kcwitt@gmail.com; PostgreSQL Bugs
Subject: Re: [BUGS] BUG #14546: "point" type does not work with "IS DISTINCT"

> Nobody's really wanted to break backwards compatibility enough to
> bring some sanity to that mess.

I am trying to bring some sanity to that mess:

https://www.postgresql.org/message-id/flat/CAE2gYzwwxPWbzxY3mtN4WL7W0DCkWo8gnB2ThUHU2XQ9XwgHMg%40mail.gmail.com

Any comment helps to keep the discussion moving.



--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: [BUGS] BUG #14546: "point" type does not work with "IS DISTINCT"

[BUGS] BUG #14546: "point" type does not work with "IS DISTINCT"

Re: [BUGS] BUG #14546: "point" type does not work with "IS DISTINCT"

Re: [BUGS] BUG #14546: "point" type does not work with "IS DISTINCT"

Re: [BUGS] BUG #14546: "point" type does not work with "IS DISTINCT"

Re: [BUGS] BUG #14546: "point" type does not work with "IS DISTINCT"