Обсуждение: Infinity vs Error for division by zero

Поиск
Список
Период
Сортировка

Infinity vs Error for division by zero

От
Matt Pulver
Дата:
Hello,

PostgreSQL FLOAT appears to support +/-Infinity and NaN per the IEEE 754 standard, with expressions such as CAST('NaN' AS FLOAT) and CAST('Infinity' AS FLOAT) and even supports ordering columns of floats that contain NaN.

However the query "SELECT 1.0/0.0;" produces an exception:

ERROR:  division by zero

Question: If Infinity and NaN are supported, then why throw an exception here, instead of returning Infinity? Is it purely for historical reasons, or if it could all be done again, would an exception still be preferred?

For purely integer arithmetic, I can see how an exception would make sense. However for FLOAT, I would expect/prefer Infinity to be returned.

Best regards,
Matt

Re: Infinity vs Error for division by zero

От
Andrew Gierth
Дата:
>>>>> "Matt" == Matt Pulver <mpulver@unitytechgroup.com> writes:

 Matt> ERROR:  division by zero

 Matt> Question: If Infinity and NaN are supported, then why throw an
 Matt> exception here, instead of returning Infinity?

Spec says so:

  4) The dyadic arithmetic operators <plus sign>, <minus sign>,
     <asterisk>, and <solidus> (+, -, *, and /, respectively) specify
     addition, subtraction, multiplication, and division, respectively.
     If the value of a divisor is zero, then an exception condition is
     raised: data exception -- division by zero.

-- 
Andrew (irc:RhodiumToad)


Re: Infinity vs Error for division by zero

От
"David G. Johnston"
Дата:
On Friday, March 1, 2019, Matt Pulver <mpulver@unitytechgroup.com> wrote:
However the query "SELECT 1.0/0.0;" produces an exception:

ERROR:  division by zero

Question: If Infinity and NaN are supported, then why throw an exception here, instead of returning Infinity? Is it purely for historical reasons, or if it could all be done again, would an exception still be preferred?

For purely integer arithmetic, I can see how an exception would make sense. However for FLOAT, I would expect/prefer Infinity to be returned.

1/0 is an illegal operation.  We could return NaN for it but the choice of throwing an error is just as correct.  Returning infinity is strictly incorrect.

Changing the behavior is not going to happen for any existing data types.

David J.

Re: Infinity vs Error for division by zero

От
Andres Freund
Дата:
Hi,

On 2019-03-01 12:46:55 -0500, Matt Pulver wrote:
> PostgreSQL FLOAT appears to support +/-Infinity and NaN per the IEEE 754
> standard, with expressions such as CAST('NaN' AS FLOAT) and CAST('Infinity'
> AS FLOAT) and even supports ordering columns of floats that contain NaN.
> 
> However the query "SELECT 1.0/0.0;" produces an exception:
> 
> ERROR:  division by zero
> 
> 
> Question: If Infinity and NaN are supported, then why throw an exception
> here, instead of returning Infinity? Is it purely for historical reasons,
> or if it could all be done again, would an exception still be preferred?
> 
> For purely integer arithmetic, I can see how an exception would make sense.
> However for FLOAT, I would expect/prefer Infinity to be returned.

It'd be good for performance reasons to not have to check for that, and
for under/overflow. But the historical behaviour has quite some weight,
and there's some language in the standard that can legitimate be
interpreted that both conditions need to be signalled, if I recall
correctly.

Greetings,

Andres Freund


Re: Infinity vs Error for division by zero

От
Chapman Flack
Дата:
On 3/1/19 1:04 PM, David G. Johnston wrote:

> 1/0 is an illegal operation.  We could return NaN for it but the choice of
> throwing an error is just as correct.  Returning infinity is strictly
> incorrect.

That differs from my understanding of how the operations are specified
in IEEE 754 (as summarized in, e.g., [1]).

Andrew posted the relevant part of the SQL spec that requires the
operation to raise 22012.

That's a requirement specific to SQL (which is, of course, what matters
here.)

But if someone wanted to write a user-defined division function or
operator that would return Inf for (anything > 0) / 0 and for
(anything < 0) / -0, and -Inf for (anything < 0) / 0 and for
(anything > 0) / -0, and NaN for (either zero) / (either zero), I think
that function or operator would be fully in keeping with IEEE 754.

-Chap


[1] https://steve.hollasch.net/cgindex/coding/ieeefloat.html#operations


Re: Infinity vs Error for division by zero

От
Andres Freund
Дата:
On 2019-03-01 11:04:04 -0700, David G. Johnston wrote:
> Changing the behavior is not going to happen for any existing data types.

For the overflow case that really sucks, because we're leaving a very
significant amount of performance on the table because we recheck for
overflow in every op. The actual float operation is basically free, but
the overflow check and the calling convention is not. JIT can get of the
latter, but not the former. Which is why we spend like 30% in one of the
TPCH queries doing overflow checks...

I still kinda wonder whether we can make trapping operations work, but
it's not trivial.


Re: Infinity vs Error for division by zero

От
"David G. Johnston"
Дата:
On Friday, March 1, 2019, Chapman Flack <chap@anastigmatix.net> wrote:

But if someone wanted to write a user-defined division function or
operator that would return Inf for (anything > 0) / 0 and for
(anything < 0) / -0, and -Inf for (anything < 0) / 0 and for
(anything > 0) / -0, and NaN for (either zero) / (either zero), I think
that function or operator would be fully in keeping with IEEE 754.

Upon further reading you are correct - IEEE 754 has chosen to treat n/0 differently for n=0 and n<>0 cases.  I'm sure they have their reasons but within the scope of this database, and the core arithmetic functions it provides, those distinctions don't seeming meaningful and having to add query logic to deal with both cases would just be annoying.  I don't use, or have time for the distraction, to understand why such a decision was made and how it could be useful.  Going from an exception to NaN makes sense to me, going instead to infinity - outside of limit expressions which aren't applicable here - does not.

For my part in the queries I have that encounter divide-by-zero I end up transforming the result to zero which is considerably easier to present/absorb along side other valid fractions in a table or chart.

David J.

Re: Infinity vs Error for division by zero

От
Chapman Flack
Дата:
On 3/1/19 2:26 PM, David G. Johnston wrote:

> Upon further reading you are correct - IEEE 754 has chosen to treat n/0
> differently for n=0 and n<>0 cases.  I'm sure they have their reasons but
> ... I don't use,
> or have time for the distraction, to understand why such a decision was
> made and how it could be useful.

The answer may be as simple as the inherent difference between
the cases.

0/0 is funny because of a uniqueness problem. Try to name q such that
0/0 = q, rewritten as q × 0 = 0, and the problem you run into is that
that's true for any value of q. So you would have to make some
completely arbitrary decision to name any value at all as "the" result.

(anything nonzero)/0 is funny because of a representability problem.
n/0 = q, rewritten as q × 0 = n, only has the problem that it's
untrue for every finite value q; they're never big enough. Calling the
result infinity is again a definitional decision, but this time it
is not an arbitrary one among multiple equally good choices; all
finite choices are ruled out, and the definitional choice is fully
consistent with what you see happening as a divisor *approaches* zero.

> For my part in the queries I have that encounter divide-by-zero I end up
> transforming the result to zero which is considerably easier to
> present

Easy to present it may be, but it lacks the mathematical
motivation behind the choice IEEE made ... as a value for q, zero
fails the q × 0 = n test fairly convincingly for nonzero n. :)

Regards,
-Chap


Re: Infinity vs Error for division by zero

От
Matt Pulver
Дата:
On Fri, Mar 1, 2019 at 12:59 PM Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
>>>>> "Matt" == Matt Pulver <mpulver@unitytechgroup.com> writes:

 Matt> ERROR:  division by zero

 Matt> Question: If Infinity and NaN are supported, then why throw an
 Matt> exception here, instead of returning Infinity?

Spec says so:

  4) The dyadic arithmetic operators <plus sign>, <minus sign>,
     <asterisk>, and <solidus> (+, -, *, and /, respectively) specify
     addition, subtraction, multiplication, and division, respectively.
     If the value of a divisor is zero, then an exception condition is
     raised: data exception -- division by zero.

Thank you, that is what I was looking for. In case anyone else is looking for source documentation on the standard, there is a link from https://en.wikipedia.org/wiki/SQL:2003#Documentation_availability to a zip file of the SQL 2003 draft http://www.wiscorp.com/sql_2003_standard.zip where one can confirm this (page 242 of 5WD-02-Foundation-2003-09.pdf).
 

On Fri, Mar 1, 2019 at 2:26 PM David G. Johnston <david.g.johnston@gmail.com> wrote:
On Friday, March 1, 2019, Chapman Flack <chap@anastigmatix.net> wrote:

But if someone wanted to write a user-defined division function or
operator that would return Inf for (anything > 0) / 0 and for
(anything < 0) / -0, and -Inf for (anything < 0) / 0 and for
(anything > 0) / -0, and NaN for (either zero) / (either zero), I think
that function or operator would be fully in keeping with IEEE 754.

Upon further reading you are correct - IEEE 754 has chosen to treat n/0 differently for n=0 and n<>0 cases.  I'm sure they have their reasons but within the scope of this database, and the core arithmetic functions it provides, those distinctions don't seeming meaningful and having to add query logic to deal with both cases would just be annoying.  I don't use, or have time for the distraction, to understand why such a decision was made and how it could be useful.  Going from an exception to NaN makes sense to me, going instead to infinity - outside of limit expressions which aren't applicable here - does not.

For my part in the queries I have that encounter divide-by-zero I end up transforming the result to zero which is considerably easier to present/absorb along side other valid fractions in a table or chart.

In heavy financial/scientific calculations with tables of data, using inf and nan are very useful, much more so than alternatives such as throwing an exception (which row(s) included the error?), or replacing them with NULL or 0. There are many intermediate values where using inf makes sense and results in finite outcomes at the appropriate limit: atan(1.0/0)=pi/2, erf(1.0/0)=1, exp(-1.0/0)=0, etc.

In contrast, nan represents a mathematically indeterminate form, in which the appropriate limit could not be ascertained. E.g. 0.0/0, inf-inf, 0.0*inf, etc. In many applications, I would much rather see calculations carried out via IEEE 754 all the way to the end, with nans and infs, which provides much more useful diagnostic information than an exception that doesn't return any rows at all. As Andres Freund pointed out, it is also more expensive to do the intermediate checks. Just let IEEE 754 do its thing! (More directed at the SQL standard than to PostgreSQL.)

Best regards,
Matt

Re: Infinity vs Error for division by zero

От
Chapman Flack
Дата:
On 3/1/19 3:49 PM, Matt Pulver wrote:

> In many applications, I would much rather see calculations carried out
> via IEEE 754 all the way to the end, with nans and infs, which
> provides much more useful diagnostic information than an exception that
> doesn't return any rows at all. As Andres Freund pointed out, it is also
> more expensive to do the intermediate checks. Just let IEEE 754 do its
> thing! (More directed at the SQL standard than to PostgreSQL.)

I wanted to try this out a little before assuming it would work,
and there seems to be no trouble creating a trivial domain over
float8 (say, CREATE DOMAIN ieeedouble AS float8), and then creating
operators whose operand types are the domain type.

So it seems an extension could easily do that, and supply happily
inf-returning and NaN-returning versions of the operators and
functions, and those will be used whenever operands have the domain
type.

It might even be useful and relatively elegant, while leaving the
SQL-specified base types to have the SQL-specified behavior.

-Chap


Re: Infinity vs Error for division by zero

От
Matt Pulver
Дата:
On Fri, Mar 1, 2019 at 4:51 PM Chapman Flack <chap@anastigmatix.net> wrote:
On 3/1/19 3:49 PM, Matt Pulver wrote:

> In many applications, I would much rather see calculations carried out
> via IEEE 754 all the way to the end, with nans and infs, which
> provides much more useful diagnostic information than an exception that
> doesn't return any rows at all. As Andres Freund pointed out, it is also
> more expensive to do the intermediate checks. Just let IEEE 754 do its
> thing! (More directed at the SQL standard than to PostgreSQL.)

I wanted to try this out a little before assuming it would work,
and there seems to be no trouble creating a trivial domain over
float8 (say, CREATE DOMAIN ieeedouble AS float8), and then creating
operators whose operand types are the domain type.

So it seems an extension could easily do that, and supply happily
inf-returning and NaN-returning versions of the operators and
functions, and those will be used whenever operands have the domain
type.

It might even be useful and relatively elegant, while leaving the
SQL-specified base types to have the SQL-specified behavior.

That would be very useful. I've been wanting this for years, and I'm sure the data users I work with will appreciate it (but don't directly understand this to be the solution).

There are issues relating to ordering and aggregation that perhaps are already transparent to you, but I'll mention anyway for the record. Conceptually, there would be different contexts of ordering:
  1. When writing mathematical functions, <, =, and > are all false when comparing to NaN (NaN != NaN is true.)
  2. In SQL when sorting or aggregating, NaN=NaN. Consider that there are 2^53-2 different double precision representations of NaN at the bit level. Under the same floating point ordering logic used for finite numbers, when applied to inf and nan, we get the following ordering: -nan < -inf < (all finite numbers) < inf < nan. When the bit patterns are taken into consideration, an efficient sort algorithm can be implemented. (Forgive me for stating the obvious, but just mentioning this for whoever is going to take this on.)
I would be most interested to hear of and discuss any other unforeseen complications or side-effects.

Best regards,
Matt

Re: Infinity vs Error for division by zero

От
Tom Lane
Дата:
Chapman Flack <chap@anastigmatix.net> writes:
> I wanted to try this out a little before assuming it would work,
> and there seems to be no trouble creating a trivial domain over
> float8 (say, CREATE DOMAIN ieeedouble AS float8), and then creating
> operators whose operand types are the domain type.

While you can do that to some extent, you don't have a lot of control
over when the parser will use your operator --- basically, it'll only
do so if both inputs are exact type matches.  Maybe that'd be enough
but I think it'd be fragile to use.  (See the "Type Conversion"
chapter in the manual for the gory details, and note that domains
get smashed to their base types mighty readily.)

Using custom operator names would work better/more reliably.

            regards, tom lane


Re: Infinity vs Error for division by zero

От
Chapman Flack
Дата:
On 03/01/19 17:34, Tom Lane wrote:

> but I think it'd be fragile to use.  (See the "Type Conversion"
> chapter in the manual for the gory details, and note that domains
> get smashed to their base types mighty readily.)
> 
> Using custom operator names would work better/more reliably.

Or a new base type (LIKE float8) rather than a domain?

Regards,
-Chap


Re: Infinity vs Error for division by zero

От
Tom Lane
Дата:
Chapman Flack <chap@anastigmatix.net> writes:
> On 03/01/19 17:34, Tom Lane wrote:
>> Using custom operator names would work better/more reliably.

> Or a new base type (LIKE float8) rather than a domain?

Yeah, it'd be more work but you would have control over the
coercion rules.

            regards, tom lane