Re: Adding IEEE 754:2008 decimal floating point and hardware support for it

Поиск
Список
Период
Сортировка
От Craig Ringer
Тема Re: Adding IEEE 754:2008 decimal floating point and hardware support for it
Дата
Msg-id 51B7EE9F.5070808@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: Adding IEEE 754:2008 decimal floating point and hardware support for it  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On 06/12/2013 08:35 AM, Tom Lane wrote:
> Craig Ringer <craig@2ndquadrant.com> writes:
>> Currently DECIMAL is an alias for NUMERIC, Pg's built-in arbitrary
>> precision and scale decimal type. I'd like to explore the possibility of
>> using hardware decimal floating point support in newer processors,
>> compilers and C libraries to enhance DECIMAL / NUMERIC performance.
> 
> As near as I can tell, there is no such hardware support.  The Intel
> paper you reference describes a pure-software library, and states
> "A software implementation was deemed sufficient for the foreseeable
> future".

Indeed... it looks like hardware IEEE 754:2008 decimal fp is limited to
POWER 6 / POWER 7, which is a bit of a niche area for Pg.

Interestingly, some general reading suggests that a lot of mainframe
hardware has had decimal number support for a long time due to high
adoption by the finance industry.

BTW, another relevant Intel paper that goes into the background and
history more is http://www.intel.com/standards/floatingpoint.pdf . The
"Implementation strategies for Decimal Floating-Point Arithmetic"
section (pp9) is interesting, as are the paper's references.


> The source code for that library is apparently available under a
> liberal license.  It might be more useful to eyeball what they did
> and see if we can learn anything towards speeding up the existing
> variable-precision NUMERIC type.

It certainly looks like a fair bit can be learned.

For one thing, there's the representation. The standard specifies
densely packed decimal
(http://en.wikipedia.org/wiki/Densely_packed_decimal) and binary integer
decimal (http://en.wikipedia.org/wiki/Binary_Integer_Decimal).

One nice characteristic of that is that it converts very efficiently
to/from BCD, and is identical to BCD for small values, which would be
nice for us. The Intel paper suggests that BID is generally considered
superior for a software implementation, though.

> Indeed.  I think you're basically between a rock and a hard place there.
> It would be very very difficult to shoehorn such types into the existing
> numeric hierarchy if you wanted any sort of transparency of behavior,
> I fear.

I was afraid of that - I wasn't seeing any ways to do it nicely, but was
hoping someone with more experience with the type system would point out
something wonderful.

> On the other hand, I doubt that it's going to work to make the
> existing numeric type switch to the "hardware" representation for
> suitably-constrained columns, because what are you going to do when,
> say, the result of an addition overflows the hardware width?  You can't
> just throw an error immediately, because you won't know whether the
> output is supposed to be getting shoved back into a limited-width column
> or not.

That does sound like a hard problem. Even if we're going to cram it back
into a small field the user may still want higher precision intermediate
values to be used.

That these means these types would probably to behave more like
smallint/integer/bigint/etc, reporting out-of-range errors rather than
silently promoting. That would be hard to fit into the SQL spec's use of
a single DECIMAL type unless we just redefined DECIMAL as _Decimal128,
which would go down about as well as swallowing tacks.

The standard doesn't really seem to allow for multiple different sized
decimal sub types, it just has the idea of one "DECIMAL" and that's what
you get, with the implementation taking care of all the messy details.

We could take care of those messy details by selecting suitable types
(DECIMAL32, DECIMAL64, DECIMAL128, NUMERIC) for different
DECIMAL(scale,precision) specifications like we do with
FLOAT(precision), but because Pg disregards typmods in intermediate
results that'd cause problems with things that currently work, like:

regress=> SELECT DECIMAL(8,2) '123456.78' * DECIMAL(2,0) '10'; ?column?
------------1234567.80
(1 row)

which currently succeeds despite being out of bounds for the type, since
the type is silently converted to unqualified 'numeric'. So the result
wouldn't be legal as an input but can be produced as an output:

regress=> SELECT DECIMAL(8,2) '1234567.80';
ERROR:  numeric field overflow
DETAIL:  A field with precision 8, scale 2 must round to an absolute
value less than 10^6.

That's pretty ugly, but it's well established behaviour.

We can't carry typmods through calculations without huge and expensive
re-work from what I've seen raised in prior discussions. I think those
were mostly about standards compliance issues with the JDBC driver that
our discarding typmods creates. Without that we'd have to use an
approach like that used for float(p) ... and then magic up a sane way to
deal with the backward compat nightmare.

If it weren't for already treating DECIMAL as an alias for NUMERIC I'd
be all for just using the FLOAT(p) approach.

> And on top of that, you have the very strong likelihood that the
> "hardware" implementation(s) won't behave exactly like our existing
> NUMERIC routines --- for instance, I'd bet a nickel that Intel took more
> care with last-place roundoff than our code does.

Agreed. That's where the standardisation effort and test suite helps,
though - there's a single "correct" result to aim for.

Dealing with people who're relying on subtleties of the current results,
though... that's harder.

> On the whole, I think the effort would be a lot more usefully spent on
> trying to make the existing NUMERIC support go faster.

I guess that makes sense, though I still think there might be real value
in adding IEEE 754:2008 DECIMAL32, DECIMAL64 and DECIMAL128 based on one
of the existing implementations. Even if users have to explicitly select
them it could be exceedingly useful.

For one thing, TPC-H uses 'DECIMAL' heavily.

-- Craig Ringer                   http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training & Services



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Noah Misch
Дата:
Сообщение: Re: JSON and unicode surrogate pairs
Следующее
От: "David E. Wheeler"
Дата:
Сообщение: Re: [PATCH] Exorcise "zero-dimensional" arrays (Was: Re: Should array_length() Return NULL)