Обсуждение: Incorrect rounding of double values at max precision

Поиск
Список
Период
Сортировка

Incorrect rounding of double values at max precision

От
Gilleain Torrance
Дата:

Hi,

 

When storing a double in Postgres, it looks like under specific circumstances it can get rounded incorrectly:

 

select round(cast(float8 '42258656681.38498' as numeric), 2), round(numeric '42258656681.38498', 2);

 

which returns either 42258656681.38 or 42258656681.39 depending on whether it is float8 or not.

 

Thanks,

 

Gilleain



Gilleain Torrance
Consultant

Alfa

e: Gilleain.Torrance@alfasystems.com | w: alfasystems.com
t: +44 (0)20 7920 2855 | Moor Place, 1 Fore Street Avenue, London, EC2Y 9DT,


The contents of this communication are not intended to be binding or constitute any form of offer or acceptance or give rise to any legal obligations on behalf of the sender or Alfa. The views or opinions expressed represent those of the author and not necessarily those of Alfa. This email and any attachments are strictly confidential and are intended solely for use by the individual or entity to whom it is addressed. If you are not the addressee (or responsible for delivery of the message to the addressee) you may not copy, forward, disclose or use any part of the message or its attachments. At present the integrity of email across the internet cannot be guaranteed and messages sent via this medium are potentially at risk. All liability is excluded to the extent permitted by law for any claims arising as a result of the use of this medium to transmit information by or to Alfa or its affiliates.

Alfa Financial Software Ltd
Reg. in England No: 0248 2325

Re: Incorrect rounding of double values at max precision

От
Tom Lane
Дата:
Gilleain Torrance <Gilleain.Torrance@alfasystems.com> writes:
> When storing a double in Postgres, it looks like under specific circumstances it can get rounded incorrectly:

> select round(cast(float8 '42258656681.38498' as numeric), 2), round(numeric '42258656681.38498', 2);

> which returns either 42258656681.38 or 42258656681.39 depending on whether it is float8 or not.

I think this is behaving as expected.  float8-to-numeric conversion
rounds the float8 to 15 (DBL_DIG) decimal places, since that's as much
precision as you're guaranteed to have.  So what comes out of the cast
is

regression=# select cast(float8 '42258656681.38498' as numeric);
     numeric
-----------------
 42258656681.385
(1 row)

and then that rounds up to 42258656681.39.  In the other case you
have an exact numeric value of 42258656681.38498, so it's unsurprisingly
rounded to 42258656681.38.

You could quibble about whether numeric round() ought to apply
round-up or round-to-nearest-even when dealing with exact halfway
cases.  If it did the latter, this particular case would match up,
but other cases would not, so I don't think it's a helpful proposal
for this issue.

The other thing we could conceivably do is ask sprintf for more digits.
But since those extra digit(s) aren't fully precise, I'm afraid that
would likewise introduce as many oddities as it fixes.  Still, it's
somewhat interesting to wonder whether applying the Ryu algorithm
would produce better or worse results on average.

            regards, tom lane



Re: Incorrect rounding of double values at max precision

От
Andrew Gierth
Дата:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 > Gilleain Torrance <Gilleain.Torrance@alfasystems.com> writes:
 >> When storing a double in Postgres, it looks like under specific
 >> circumstances it can get rounded incorrectly:

 >> select round(cast(float8 '42258656681.38498' as numeric), 2),
 >> round(numeric '42258656681.38498', 2);

 >> which returns either 42258656681.38 or 42258656681.39 depending on
 >> whether it is float8 or not.

 Tom> I think this is behaving as expected. float8-to-numeric conversion
 Tom> rounds the float8 to 15 (DBL_DIG) decimal places, since that's as
 Tom> much precision as you're guaranteed to have.

Yes. This came up for discussion in the Ryu patch, but did not get much
input; I think some sort of case could be made for making the casts
exact, but the cast can't look at a config GUC without losing its
immutability, and changing the value could have an effect on functional
indexes. So I ended up not touching that at all.

 Tom> The other thing we could conceivably do is ask sprintf for more
 Tom> digits. But since those extra digit(s) aren't fully precise, I'm
 Tom> afraid that would likewise introduce as many oddities as it fixes.
 Tom> Still, it's somewhat interesting to wonder whether applying the
 Tom> Ryu algorithm would produce better or worse results on average.

Hmm.

The Ryu output values will still throw out edge cases similar to the
above; for example, 502.15::float8 / 10 = 50.214999999999996 whereas
502.15::numeric / 10 = 50.215, so rounding the result of that to 2
digits will give a different result.

Perhaps it would make more sense for the float8 to numeric cast to look
at the requested typmod and use that for the conversion? That way we
could make casts like fltval::numeric(20,2) or whatever produce the
correct result without any double-rounding issues. But the nature of
floating point means that this would still throw out occasionally
unexpected values (e.g. the 502.15::float8/10 example would still give
50.21 for a 2-digit result rather than 50.22).

(502.15::float8 is exactly
502.14999999999997726263245567679405212402343750)

I also did consider adding functions to convert a float8 value to the
_exact_ numeric that it represents. This is easy enough to write using
numeric arithmetic (I have SQL versions that I used extensively when
testing the Ryu code) but the performance isn't exceptionally good.
Might be good enough for many applications, though.

-- 
Andrew (irc:RhodiumToad)



Re: Incorrect rounding of double values at max precision

От
Tom Lane
Дата:
Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:
>  Tom> The other thing we could conceivably do is ask sprintf for more
>  Tom> digits. But since those extra digit(s) aren't fully precise, I'm
>  Tom> afraid that would likewise introduce as many oddities as it fixes.
>  Tom> Still, it's somewhat interesting to wonder whether applying the
>  Tom> Ryu algorithm would produce better or worse results on average.

> Hmm.

> The Ryu output values will still throw out edge cases similar to the
> above; for example, 502.15::float8 / 10 = 50.214999999999996 whereas
> 502.15::numeric / 10 = 50.215, so rounding the result of that to 2
> digits will give a different result.

Yeah.  Worse, casting that to numeric currently gives the "correct"
result:

regression=# select  (502.15::float8 / 10)::numeric;
 numeric 
---------
  50.215
(1 row)

while if we changed float8_numeric to apply Ryu, the result would be
50.214999999999996.  So that's not great.  But there are other cases
where the result would be better than before, such as the OP's example
of 42258656681.38498::float8.  I'd like to get my hands around how
many "better" and "worse" cases there would be, but I'm not sure how
to attack that question.

> Perhaps it would make more sense for the float8 to numeric cast to look
> at the requested typmod and use that for the conversion?

As things stand right now, float8_numeric has no idea what the target
typmod is; any typmod-driven rounding happens in a separate function
call afterwards.  I don't recall whether the parser's casting
infrastructure could support merging those steps, and I'm not sure
it matters in most cases.  Commonly, we don't have a target typmod.
(Still, if we do, having two separate rounding steps isn't nice.)

            regards, tom lane



Re: Incorrect rounding of double values at max precision

От
Andrew Gierth
Дата:
>>>>> "Tom" == Tom Lane <tgl@sss.pgh.pa.us> writes:

 >> Perhaps it would make more sense for the float8 to numeric cast to
 >> look at the requested typmod and use that for the conversion?

 Tom> As things stand right now, float8_numeric has no idea what the
 Tom> target typmod is; any typmod-driven rounding happens in a separate
 Tom> function call afterwards. I don't recall whether the parser's
 Tom> casting infrastructure could support merging those steps,

As far as I can tell it does; it looks at whether the cast function
takes a typmod parameter, and if it does, it passes the typmod,
otherwise it generates a separate typmod coercion and stacks that on top
of the cast proper.

So changing the function declaration to include a typmod parameter, and
using it, should just work... but I've not tested it yet.

 Tom> and I'm not sure it matters in most cases. Commonly, we don't have
 Tom> a target typmod. (Still, if we do, having two separate rounding
 Tom> steps isn't nice.)

So we'd still need to decide what to do in the no-typmod case.

-- 
Andrew (irc:RhodiumToad)



RE: Incorrect rounding of double values at max precision

От
Gilleain Torrance
Дата:
> I think this is behaving as expected.  float8-to-numeric conversion
rounds the float8 to 15 (DBL_DIG) decimal places, since that's as much
precision as you're guaranteed to have.

Perhaps, but the original path we took to find this was through JDBC, specifically the one from
https://jdbc.postgresql.org/.Tracking the double (42258656681.38498) through the driver shows that it is converted from
javadouble to float8 bytes, and is then stored as 42258656681.39. Which is definitely not expected, even if it can be
explainedby double rounding. 

The half-up/half-down of the rounding may be a side issue, although also important.

A simple test like this will show the behaviour we see:

   Connection conn = getConnection(); // some db connection
   PreparedStatement pstmt = conn.prepareStatement("INSERT into mytable VALUES (?, ?)");
   double x = 4.225865668138498E10;
   int id = 123;
   pstmt.setObject(1, id);
   pstmt.setDouble(2, x);
   pstmt.execute();

where mytable just has a decimal(13, 2) column and an integer id. When selected afterwards, we get the 42258656681.39
valueinstead of an expected value of 42258656681.38. 

thanks

gilleain

-----Original Message-----
From: Tom Lane <tgl@sss.pgh.pa.us>
Sent: 21 October 2019 19:43
To: Gilleain Torrance <Gilleain.Torrance@alfasystems.com>
Cc: pgsql-bugs@lists.postgresql.org; Andrew Gierth <andrew@tao11.riddles.org.uk>
Subject: Re: Incorrect rounding of double values at max precision

Gilleain Torrance <Gilleain.Torrance@alfasystems.com> writes:
> When storing a double in Postgres, it looks like under specific circumstances it can get rounded incorrectly:

> select round(cast(float8 '42258656681.38498' as numeric), 2), round(numeric '42258656681.38498', 2);

> which returns either 42258656681.38 or 42258656681.39 depending on whether it is float8 or not.

I think this is behaving as expected.  float8-to-numeric conversion
rounds the float8 to 15 (DBL_DIG) decimal places, since that's as much
precision as you're guaranteed to have.  So what comes out of the cast
is

regression=# select cast(float8 '42258656681.38498' as numeric);
     numeric
-----------------
 42258656681.385
(1 row)

and then that rounds up to 42258656681.39.  In the other case you
have an exact numeric value of 42258656681.38498, so it's unsurprisingly
rounded to 42258656681.38.

You could quibble about whether numeric round() ought to apply
round-up or round-to-nearest-even when dealing with exact halfway
cases.  If it did the latter, this particular case would match up,
but other cases would not, so I don't think it's a helpful proposal
for this issue.

The other thing we could conceivably do is ask sprintf for more digits.
But since those extra digit(s) aren't fully precise, I'm afraid that
would likewise introduce as many oddities as it fixes.  Still, it's
somewhat interesting to wonder whether applying the Ryu algorithm
would produce better or worse results on average.

            regards, tom lane



Re: Incorrect rounding of double values at max precision

От
Andrew Gierth
Дата:
>>>>> "Gilleain" == Gilleain Torrance <Gilleain.Torrance@alfasystems.com> writes:

 Gilleain> A simple test like this will show the behaviour we see:

 Gilleain>    Connection conn = getConnection(); // some db connection
 Gilleain>    PreparedStatement pstmt = conn.prepareStatement("INSERT into mytable VALUES (?, ?)");
 Gilleain>    double x = 4.225865668138498E10;
 Gilleain>    int id = 123;
 Gilleain>    pstmt.setObject(1, id);
 Gilleain>    pstmt.setDouble(2, x);
 Gilleain>    pstmt.execute();

 Gilleain> where mytable just has a decimal(13, 2) column and an integer
 Gilleain> id.

I do not see any way to make that code work as you apparently expect in
all cases. Here is a simple counterexample: imagine setting x to 502.215
instead. Currently, that will insert a value of 502.22 into the table,
as you would expect. If we fixed your example above by doing the
conversion with maximum precision, then 502.215 would instead be
inserted as 502.21, because the actual float value that represents
502.215 is equal to 502.21499999999997498889570124447345733642578125.

Or for another example, 4.225865668139500E10 currently inserts as
42258656681.40 in your code, but using maximum precision would cause it
to insert as 42258656681.39 instead (the true value of a float8
'4.225865668139500E10' is 42258656681.39499664306640625.)

So while what we currently do is arguably wrong since it's doing two
rounding steps, fixing it wouldn't actually help your problem but would
just move the failure cases to different values.

-- 
Andrew (irc:RhodiumToad)



Re: Incorrect rounding of double values at max precision

От
Andres Freund
Дата:
Hi,

Re-found this thread due to
https://postgr.es/m/CH2PR19MB3798B24BCC34D3F9949F629C83000%40CH2PR19MB3798.namprd19.prod.outlook.com

On 2019-10-21 22:41:11 -0400, Tom Lane wrote:
> Andrew Gierth <andrew@tao11.riddles.org.uk> writes:
> > The Ryu output values will still throw out edge cases similar to the
> > above; for example, 502.15::float8 / 10 = 50.214999999999996 whereas
> > 502.15::numeric / 10 = 50.215, so rounding the result of that to 2
> > digits will give a different result.
> 
> Yeah.  Worse, casting that to numeric currently gives the "correct"
> result:
> 
> regression=# select  (502.15::float8 / 10)::numeric;
>  numeric 
> ---------
>   50.215
> (1 row)
> 
> while if we changed float8_numeric to apply Ryu, the result would be
> 50.214999999999996.  So that's not great.  But there are other cases
> where the result would be better than before, such as the OP's example
> of 42258656681.38498::float8.  I'd like to get my hands around how
> many "better" and "worse" cases there would be, but I'm not sure how
> to attack that question.

The above bug report has a, in my opinion, pretty egregious cases of
wrongness. From the bug:

postgres[1296822][1]=# select '1234567'::float4::numeric;
┌─────────┐
│ numeric │
├─────────┤
│ 1234570 │
└─────────┘
(1 row)

It seems crazy that we throw away integer precision in the range it's
guaranteed to be accurate (that's what, −16777216 to 16777216 for
float4, more than a magnitude larger than this value). It does feel
different to me that we're throwing away precision that we could know is
not just the result of floating point imprecision.

I can't really see outputting a bunch of "fake precision" post decimal
points digits being comparably bad?


I didn't find much discussion about how we could deal with pg_upgrade
issues if we were to change the behaviour. Detect indexes involving such
casts, and mark them as invalid?

Greetings,

Andres Freund



Re: Incorrect rounding of double values at max precision

От
Tom Lane
Дата:
[ btw, thanks for finding this thread; I searched for relevant
discussion earlier today and couldn't find it ]

Andres Freund <andres@anarazel.de> writes:
> It seems crazy that we throw away integer precision in the range it's
> guaranteed to be accurate (that's what, −16777216 to 16777216 for
> float4, more than a magnitude larger than this value). It does feel
> different to me that we're throwing away precision that we could know is
> not just the result of floating point imprecision.

Meh.  Yeah, we could improve the observed results for float4 values
that are integers between 1M and 16M, and some similarly-sized band
for float8; but to what end?  The most likely practical result is
just to postpone the user's discovery that they're Doing It Wrong.
If you expect exact answers out of float calculations then you are
going to learn an expensive lesson sooner or later.  Better sooner,
before you've stored even more inexact data that you cannot fix.

> I didn't find much discussion about how we could deal with pg_upgrade
> issues if we were to change the behaviour. Detect indexes involving such
> casts, and mark them as invalid?

For what that's worth, I do not think I buy the argument that
float4_numeric's behavior can't ever be changed because it's marked
immutable.  Compare other recent discussions about what "immutable"
really means; or compare the fact that various text search functions
are marked immutable despite being quite dependent on config files
that we don't even track.  If we think this is actually an improvement
I'd be fine with changing it in a major release, and documenting
that users should reindex any indexes whose semantics are affected
(of which there'd be epsilon in the real world, anyway).  But I'm
not convinced that changing this is doing anything except putting
more lipstick on the pig.  float is imprecise, and we do no one
a service by trying to hide that.

            regards, tom lane



Re: Incorrect rounding of double values at max precision

От
Andres Freund
Дата:
Hi,

On 2020-10-20 21:48:52 -0400, Tom Lane wrote:
> Andres Freund <andres@anarazel.de> writes:
> > It seems crazy that we throw away integer precision in the range it's
> > guaranteed to be accurate (that's what, −16777216 to 16777216 for
> > float4, more than a magnitude larger than this value). It does feel
> > different to me that we're throwing away precision that we could know is
> > not just the result of floating point imprecision.
> 
> Meh.  Yeah, we could improve the observed results for float4 values
> that are integers between 1M and 16M, and some similarly-sized band
> for float8; but to what end?

I'm not actually arguing that we should improve it by relying on range
based heuristics. Just that throwing away precision that can't just
argued to have been conjured by float representation issues is an
indicator of our current approach to be quite wrong.


> The most likely practical result is just to postpone the user's
> discovery that they're Doing It Wrong.  If you expect exact answers
> out of float calculations then you are going to learn an expensive
> lesson sooner or later.  Better sooner, before you've stored even more
> inexact data that you cannot fix.

I don't buy this, not even for a second. Why is FLT_DIG = 6 the right
way to hint at that? Why not a precision of 5, 4, 3? Sure 6 digits is
guaranteed to roundtrip the same way, but that's an *extremely* coarse
filter.

And if we like exposing people to floating point imprecision, why is
select (502.15::float8 / 10)::numeric resulting in 50.214999999999996
bad?

Greetings,

Andres Freund



Re: Incorrect rounding of double values at max precision

От
Tom Lane
Дата:
Andres Freund <andres@anarazel.de> writes:
> On 2020-10-20 21:48:52 -0400, Tom Lane wrote:
>> Meh.  Yeah, we could improve the observed results for float4 values
>> that are integers between 1M and 16M, and some similarly-sized band
>> for float8; but to what end?

> I'm not actually arguing that we should improve it by relying on range
> based heuristics. Just that throwing away precision that can't just
> argued to have been conjured by float representation issues is an
> indicator of our current approach to be quite wrong.

I think your argument is founded on the unprovable assumption that
the original input value was an integer.  If we see 12345678::float4,
the "true" value could have been anything between 12345677.5 and
12345678.5.  Printing it as an exact integer isn't going to seem
too exact to a person who knows they put in a fraction.

Or for a person who does know a little bit about float arithmetic, they
might reasonably wonder why 12345.678::float4 is "exactly" converted to
numeric but 1234.5678::float4, with the same number of digits, is not.
(Ryu renders the latter as 1234.5677.)

>> The most likely practical result is just to postpone the user's
>> discovery that they're Doing It Wrong.

> I don't buy this, not even for a second. Why is FLT_DIG = 6 the right
> way to hint at that? Why not a precision of 5, 4, 3? Sure 6 digits is
> guaranteed to roundtrip the same way, but that's an *extremely* coarse
> filter.

It's also the *highest* precision that is guaranteed to round-trip,
which 5,4,3 are not, so that argument is unconvincing too.

> And if we like exposing people to floating point imprecision, why is
> select (502.15::float8 / 10)::numeric resulting in 50.214999999999996
> bad?

The point there was that that's "worse" (seemingly less precise)
than what happens now.

Please notice that I did not say that that one example means we
shouldn't change; any more than I think that isolated other examples
mean we should.  The question I was trying to get at is how
we could decide whether using Ryu for this is an overall win when
some examples will get better and others worse.  No one's stepped
up to that plate yet.

            regards, tom lane