Re: [HACKERS] Patch: Avoid precision error in to_timestamp().

Поиск
Список
Период
Сортировка
От Erik Nordström
Тема Re: [HACKERS] Patch: Avoid precision error in to_timestamp().
Дата
Msg-id CAHuQZDQyT4Qc+hCbmf+N-x_8UUQ30Bncn=wJ002aos5QKqK9pg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] Patch: Avoid precision error in to_timestamp().  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Tom,

You are of course right that you cannot add precision that was not there to begin with. My patch does nothing for input values that cannot be represented accurately to begin with. However, that was not my main point.

The idea with the calculation is this: When you remove the integral/seconds part of the value before converting to integral microseconds, you are creating a number that can be represented with a float at higher accuracy compared to the original value (i.e., simply because there are less digits in the mantissa/significand). Thus when doing 0.312311172485352 * USECS_PER_SEC, you suffer less precision-related errors with regards to microseconds than when doing 14864803242.312311172485352 * USECS_PER_SEC as in the original code. In other words, with this approach there are fewer cases when 1 ULP is bigger than 1 microsecond.

FWIW, this is the output from latest postgres HEAD, which includes your rint() patch:

postgres=# select to_timestamp(14864803242.312311);
         to_timestamp
-------------------------------
 2441-01-17 09:00:42.312312+01
(1 row)

And this is the output with my patch:

postgres=# select to_timestamp(14864803242.312311);
         to_timestamp
-------------------------------
 2441-01-17 09:00:42.312311+01
(1 row)


I don't know why you would get different output (that would be worrying in itself).

In any case, I would prefer a to_timestamp(numeric), although I see no reason not to make the float8 version as accurate as possible anyway.

-Erik



On Thu, Feb 9, 2017 at 3:56 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
=?UTF-8?Q?Erik_Nordstr=C3=B6m?= <erik.nordstrom@gmail.com> writes:
> Thanks for the insightful feedback. You are right, the patch does suffer
> from overflow (and other possible issues when I think about it). Using
> rint(), as you suggest, helps in my original example case, but there are
> still cases when the output is not what you would expect. For instance,
> converting the Unix time 14864803242.312311 gives back the timestamp
> "2441-01-17 09:00:42.312312+01", even if using rint() (see below).

Hm, that particular case works for me using the patch I committed
yesterday (which just added a rint() call to the existing code).
I'm a bit skeptical of the version you propose here because it seems
mighty prone to subtractive cancellation.  You're basically computing
x - int(x) which can't add any precision that wasn't there before.
Looking at successive microsecond values in this range:

regression=# select 14864803242.312310::float8 - 14864803242;
     ?column?
-------------------
 0.312309265136719
(1 row)

regression=# select 14864803242.312311::float8 - 14864803242;
     ?column?
-------------------
 0.312311172485352
(1 row)

regression=# select 14864803242.312312::float8 - 14864803242;
     ?column?
-------------------
 0.312311172485352
(1 row)

regression=# select 14864803242.312313::float8 - 14864803242;
     ?column?
-------------------
 0.312313079833984
(1 row)

Basically, 1 ULP in this range is more than 1 microsecond, so there are
going to be places where you cannot get the right answer.  Reformulating
the computation will just shift the errors to nearby values.  float8
simply doesn't have enough bits to represent this many microseconds since
1970 exactly, and the problem's only going to get worse the further out
you look.

I think we might be better advised to add to_timestamp(numeric) alongside
to_timestamp(float8).  There's plenty of precedent for that (e.g, exp(),
ln()) so I would not expect problems with ambiguous function calls.
It'd be slower though ...

                        regards, tom lane

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: [HACKERS] Patch: Avoid precision error in to_timestamp().
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: [HACKERS] ICU integration