Discussion: pgsql: Change floating-point output format for improved performance.

pgsql: Change floating-point output format for improved performance.

From
Andrew Gierth
Date:
Change floating-point output format for improved performance.

Previously, floating-point output was done by rounding to a specific
decimal precision; by default, to 6 or 15 decimal digits (losing
information) or as requested using extra_float_digits. Drivers that
wanted exact float values, and applications like pg_dump that must
preserve values exactly, set extra_float_digits=3 (or sometimes 2 for
historical reasons, though this isn't enough for float4).

Unfortunately, rounded decimal output is slow enough to become a
noticeable bottleneck when dealing with large result sets, or COPY of
large tables, that contain many floating-point values.

Floating-point output can be done much faster when the output is not
rounded to a specific decimal length, but rather is chosen as the
shortest decimal representation that is closer to the original float
value than to any other value representable in the same precision. The
recently published Ryu algorithm by Ulf Adams is both relatively
simple and remarkably fast.

Accordingly, change float4out/float8out to output shortest decimal
representations if extra_float_digits is greater than 0, and make that
the new default. Applications that need rounded output can set
extra_float_digits back to 0 or below, and take the resulting
performance hit.

We make one concession to portability for systems with buggy
floating-point input: we do not output decimal values that fall
exactly halfway between adjacent representable binary values (which
would rely on the reader doing round-to-nearest-even correctly). This
is known to be a problem at least for VS2013 on Windows.

Our version of the Ryu code originates from
https://github.com/ulfjack/ryu/ at commit c9c3fb1979, but with the
following (significant) modifications:

 - Output format is changed to use fixed-point notation for small
   exponents, as printf would, and also to use lowercase 'e', a
   minimum of 2 exponent digits, and a mandatory sign on the exponent,
   to keep the formatting as close as possible to previous output.

 - The output of exact midpoint values is disabled as noted above.

 - The integer fast-path code is changed somewhat (since we have
   fixed-point output and the upstream did not).

 - Our project style has largely been applied to the code; the one
   exception is C99 declaration-after-statement, which has been retained
   even though our present policy does not allow it.

 - Most of upstream's debugging and conditionals are removed, and we
   use our own configure tests to determine things like uint128
   availability.

Changing the float output format obviously affects a number of
regression tests. This patch uses an explicit setting of
extra_float_digits=0 for test output that is not expected to be
exactly reproducible (e.g. due to numerical instability or differing
algorithms for transcendental functions).

Conversions from floats to numeric are unchanged by this patch. These
may appear in index expressions and it is not yet clear whether any
change should be made, so that can be left for another day.

This patch assumes that the only supported floating point format is
now IEEE format, and the documentation is updated to reflect that.

Code by me, adapting the work of Ulf Adams and other contributors.

References:
https://dl.acm.org/citation.cfm?id=3192369

Reviewed-by: Tom Lane, Andres Freund, Donald Dong
Discussion: https://postgr.es/m/87r2el1bx6.fsf@news-spur.riddles.org.uk

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/02ddd499322ab6f2f0d58692955dc9633c2150fc

Modified Files
--------------
configure                                          |   10 +-
configure.in                                       |    9 +-
contrib/btree_gist/expected/float4.out             |   20 +-
contrib/btree_gist/expected/float8.out             |   20 +-
contrib/cube/expected/cube.out                     |   32 +-
contrib/cube/expected/cube_sci.out                 |   18 +-
contrib/cube/sql/cube.sql                          |    8 +-
contrib/pg_trgm/expected/pg_strict_word_trgm.out   |    2 +
contrib/pg_trgm/expected/pg_trgm.out               |    2 +
contrib/pg_trgm/expected/pg_word_trgm.out          |    2 +
contrib/pg_trgm/sql/pg_strict_word_trgm.sql        |    3 +
contrib/pg_trgm/sql/pg_trgm.sql                    |    3 +
contrib/pg_trgm/sql/pg_word_trgm.sql               |    3 +
contrib/seg/expected/seg.out                       |    6 +-
doc/src/sgml/config.sgml                           |   37 +-
doc/src/sgml/datatype.sgml                         |   89 +-
src/Makefile.global.in                             |    1 +
src/backend/utils/adt/float.c                      |   24 +-
src/backend/utils/misc/guc.c                       |    7 +-
src/backend/utils/misc/postgresql.conf.sample      |    3 +-
src/common/Makefile                                |   15 +-
src/common/d2s.c                                   | 1076 ++++++++++++++++++++
src/common/d2s_full_table.h                        |  358 +++++++
src/common/d2s_intrinsics.h                        |  202 ++++
src/common/digit_table.h                           |   21 +
src/common/f2s.c                                   |  804 +++++++++++++++
src/common/ryu_common.h                            |  133 +++
src/include/common/shortest_dec.h                  |   63 ++
src/test/regress/expected/aggregates.out           |    2 +
src/test/regress/expected/circle.out               |    2 +
.../regress/expected/float4-misrounded-input.out   |  656 ++++++++++--
src/test/regress/expected/float4.out               |  656 ++++++++++--
src/test/regress/expected/float8-small-is-zero.out |  449 +++++++-
src/test/regress/expected/float8.out               |  449 +++++++-
src/test/regress/expected/int8.out                 |   42 +-
src/test/regress/expected/jsonb.out                |    6 +-
src/test/regress/expected/line.out                 |    6 +-
src/test/regress/expected/point.out                |    2 +
src/test/regress/expected/rules.out                |   30 +-
src/test/regress/expected/tsearch.out              |    6 +-
src/test/regress/expected/tstypes.out              |   64 +-
src/test/regress/expected/updatable_views.out      |    2 +
src/test/regress/expected/window.out               |   48 +-
src/test/regress/sql/aggregates.sql                |    3 +
src/test/regress/sql/circle.sql                    |    3 +
src/test/regress/sql/float4.sql                    |  219 ++++
src/test/regress/sql/float8.sql                    |  210 +++-
src/test/regress/sql/point.sql                     |    3 +
src/test/regress/sql/updatable_views.sql           |    3 +
src/tools/msvc/Mkvcbuild.pm                        |    2 +-
50 files changed, 5466 insertions(+), 368 deletions(-)


Re: pgsql: Change floating-point output format for improved performance.

From
Andrew Gierth
Date:
Already aware of the windows breakage, will fix in a sec.

-- 
Andrew (irc:RhodiumToad)


Re: pgsql: Change floating-point output format for improved performance.

From
Andrew Gierth
Date:
>>>>> "Andrew" == Andrew Gierth <andrew@tao11.riddles.org.uk> writes:

Fallout so far:

Cygwin claims to have strtof, but it silently underflows to zero and
misrounds input. Could be fixable with a variant output file?

ICC seems to be miscompiling something, that'll need investigation.

s390 failures: ts_rank isn't as numerically stable as I thought it would
be, that can be fixed by setting extra_float_digits=0 for that test like
I did with some others. To be honest I'm not sure why I didn't do that
before - probably just an oversight.

Cross-version upgrade is the big problem; I have no real idea how to
make that test work short of adding another GUC; revert?

-- 
Andrew (irc:RhodiumToad)


Re: pgsql: Change floating-point output format for improved performance.

From
Andres Freund
Date:
Hi Andrew^2,

On 2019-02-13 16:38:16 +0000, Andrew Gierth wrote:
> Cross-version upgrade is the big problem; I have no real idea how to
> make that test work short of adding another GUC; revert?

Andrew Dunstan might be able to help, although I'm not immediately sure
how...

Greetings,

Andres Freund


Re: pgsql: Change floating-point output format for improved performance.

From
Andrew Dunstan
Date:
On 2/13/19 12:01 PM, Andres Freund wrote:
> Hi Andrew^2,
>
> On 2019-02-13 16:38:16 +0000, Andrew Gierth wrote:
>> Cross-version upgrade is the big problem; I have no real idea how to
>> make that test work short of adding another GUC; revert?
> Andrew Dunstan might be able to help, although I'm not immediately sure
> how...
>

Me either. I can, of course, have the check module drop tables that make
the tests fail, but that seems like a bad course of action in this case.
Meanwhile, I'm going to run a series of tests with different back
branches to see the scope of the problem. Expect to see these in the
buildfarm results today.


cheers


andrew



-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: pgsql: Change floating-point output format for improved performance.

From
Andrew Gierth
Date:
>>>>> "Andrew" == Andrew Dunstan <andrew.dunstan@2ndquadrant.com> writes:

 >>> Cross-version upgrade is the big problem; I have no real idea how to
 >>> make that test work short of adding another GUC; revert?

 >> Andrew Dunstan might be able to help, although I'm not immediately
 >> sure how...

 Andrew> Me either. I can, of course, have the check module drop tables
 Andrew> that make the tests fail, but that seems like bad course of
 Andrew> action in this case. Meanwhile, I'm going to run a series of
 Andrew> tests with different back branches to see the scope of the
 Andrew> problem. Expect to see these in the buildfarm results today.

I think we know the scope of the problem - the test will fail if
upgrading any back branch to HEAD, since the pg_dump output is
different. But this discussion should probably continue on the Ryu
thread on -hackers rather than here.

-- 
Andrew (irc:RhodiumToad)