Discussion: numeric/decimal docs bug?
In datatype.sgml:
    The type numeric can store numbers of practically unlimited size and precision,...
I think this is simply wrong, since the current implementation of the
numeric and decimal data types limits the precision to 1000.
#define NUMERIC_MAX_PRECISION        1000
Comments?
--
Tatsuo Ishii
			
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> In datatype.sgml:
>      The type numeric can store numbers of practically
>      unlimited size and precision,...
> I think this is simply wrong since the current implementation of
> numeric and decimal data types limit the precision up to 1000.
> #define NUMERIC_MAX_PRECISION        1000
I was thinking just the other day that there's no reason for that
limit to be so low.  Jan, couldn't we bump it up to 8 or 16K or so?
(Not that I'd care to do heavy arithmetic on such numbers, or that
I believe there's any practical use for them ... but why set the
limit lower than we must?)
        regards, tom lane
			
Are there other cases where the pgsql docs say "unlimited" where that
might not be true? I remember when the FAQ stated unlimited columns per
table (it's been corrected now, so that's good).

I'm not asking for every limit to be documented, but when documentation
is written and one does not yet know (or remember) the actual (or even
rough/estimated) limit, it's better to skip it for later than to falsely
say "unlimited". Better to have no signal than noise in this case.

Regards,
Link.

At 11:14 PM 02-03-2002 +0900, Tatsuo Ishii wrote:
>In datatype.sgml:
>
>     The type numeric can store numbers of practically
>     unlimited size and precision,...
>
>I think this is simply wrong since the current implementation of
>numeric and decimal data types limit the precision up to 1000.
>
>#define NUMERIC_MAX_PRECISION        1000
>
>Comments?
Tom Lane writes:
> > #define NUMERIC_MAX_PRECISION        1000
>
> I was thinking just the other day that there's no reason for that
> limit to be so low.  Jan, couldn't we bump it up to 8 or 16K or so?

Why have an arbitrary limit at all?  Set it to INT_MAX, or whatever the
index variables have for a type.

--
Peter Eisentraut   peter_e@gmx.net
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane writes:
> #define NUMERIC_MAX_PRECISION        1000
>> 
>> I was thinking just the other day that there's no reason for that
>> limit to be so low.  Jan, couldn't we bump it up to 8 or 16K or so?
> Why have an arbitrary limit at all?  Set it to INT_MAX,
The hard limit is certainly no more than 64K, since we store these
numbers in half of an atttypmod.  In practice I suspect the limit may
be less; Jan would be more likely to remember...
        regards, tom lane
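To make the "half of an atttypmod" remark concrete, here is a minimal sketch in Python. It assumes the 32-bit type modifier is simply split into two 16-bit halves, one for precision and one for scale; the function names and the exact bit layout are illustrative assumptions, not taken from the PostgreSQL source.

```python
# Illustrative only: pack a precision and a scale into one 32-bit integer,
# 16 bits each. The layout is an assumption made for this sketch.
HALF_BITS = 16
HALF_MASK = (1 << HALF_BITS) - 1  # 65535, the "no more than 64K" ceiling


def pack_typmod(precision, scale):
    """Combine precision (high half) and scale (low half) into one integer."""
    if not 0 <= precision <= HALF_MASK:
        raise ValueError("precision does not fit in 16 bits")
    if not 0 <= scale <= HALF_MASK:
        raise ValueError("scale does not fit in 16 bits")
    return (precision << HALF_BITS) | scale


def unpack_typmod(typmod):
    """Recover (precision, scale) from a value built by pack_typmod."""
    return typmod >> HALF_BITS, typmod & HALF_MASK


packed = pack_typmod(1000, 10)
roundtrip = unpack_typmod(packed)
```

Under this layout any declared precision above 65535 cannot be represented at all, which is why the ceiling is a hard one regardless of what NUMERIC_MAX_PRECISION says.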
			
Tom Lane wrote:
> Peter Eisentraut <peter_e@gmx.net> writes:
> > Tom Lane writes:
> > > #define NUMERIC_MAX_PRECISION        1000
> > >
> > > I was thinking just the other day that there's no reason for that
> > > limit to be so low.  Jan, couldn't we bump it up to 8 or 16K or so?
> >
> > Why have an arbitrary limit at all?  Set it to INT_MAX,
>
> The hard limit is certainly no more than 64K, since we store these
> numbers in half of an atttypmod.  In practice I suspect the limit may
> be less; Jan would be more likely to remember...

It is arbitrary of course. I don't recall completely, have to dig into
the code, but there might be some side effect when mucking with it.

The NUMERIC code increases the actual internal precision when doing
multiply and divide, which happens a gazillion times when doing higher
functions like trigonometry. I think there was some connection between
the max precision and how high this internal precision can grow, so
increasing the precision might affect the computational performance of
such higher functions significantly.

Jan

--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
Jan Wieck wrote:
> > The hard limit is certainly no more than 64K, since we store these
> > numbers in half of an atttypmod.  In practice I suspect the limit may
> > be less; Jan would be more likely to remember...
>
> It is arbitrary of course. I don't recall completely, have to
> dig into the code, but there might be some side effect when
> mucking with it.
>
> The NUMERIC code increases the actual internal precision when
> doing multiply and divide, which happens a gazillion times
> when doing higher functions like trigonometry. I think there
> was some connection between the max precision and how high
> this internal precision can grow, so increasing the precision
> might affect the computational performance of such higher
> functions significantly.

Oh, interesting, maybe we should just leave it alone.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote:
> Jan Wieck wrote:
> > > The hard limit is certainly no more than 64K, since we store these
> > > numbers in half of an atttypmod.  In practice I suspect the limit may
> > > be less; Jan would be more likely to remember...
> >
> > It is arbitrary of course. I don't recall completely, have to
> > dig into the code, but there might be some side effect when
> > mucking with it.
> >
> > The NUMERIC code increases the actual internal precision when
> > doing multiply and divide, which happens a gazillion times
> > when doing higher functions like trigonometry. I think there
> > was some connection between the max precision and how high
> > this internal precision can grow, so increasing the precision
> > might affect the computational performance of such higher
> > functions significantly.
>
> Oh, interesting, maybe we should just leave it alone.

As said, I have to look at the code. I'm pretty sure that it currently
will not use hundreds of digits internally if you use only a few digits
in your schema. So changing it isn't that dangerous.

But who's going to write and run a regression test, ensuring that the
new high limit can really be supported? I didn't even run the
numeric_big test lately, which tests with 500 digits precision at least
... and therefore takes some time (yawn). To increase the number of
digits used, you first have to have some other tool to generate the test
data (I originally used bc(1) with some scripts). Based on that we still
claim that our system deals correctly with up to 1,000 digits precision.

I don't like the idea of bumping up that number to some higher nonsense,
claiming we support 32K digits precision on exact numeric, when no one
ever tested if natural log really returns its result in that precision
instead of a 30,000 digit precise approximation.

I missed some of the discussion, because I considered the 1,000 digits
already being complete nonsense and dropped the thread. So could someone
please enlighten me what the real reason for increasing our precision
is? AFAIR it had something to do with the docs. If it's just because the
docs and the code aren't in sync, I'd vote for changing the docs.

Jan
> Jan Wieck wrote:
> > > The hard limit is certainly no more than 64K, since we store these
> > > numbers in half of an atttypmod.  In practice I suspect the limit may
> > > be less; Jan would be more likely to remember...
> >
> > It is arbitrary of course. I don't recall completely, have to
> > dig into the code, but there might be some side effect when
> > mucking with it.
> >
> > The NUMERIC code increases the actual internal precision when
> > doing multiply and divide, which happens a gazillion times
> > when doing higher functions like trigonometry. I think there
> > was some connection between the max precision and how high
> > this internal precision can grow, so increasing the precision
> > might affect the computational performance of such higher
> > functions significantly.
>
> Oh, interesting, maybe we should just leave it alone.

So are we going to just fix the docs?
--
Tatsuo Ishii
Jan Wieck wrote:
> Bruce Momjian wrote:
> > Jan Wieck wrote:
> > > > The hard limit is certainly no more than 64K, since we store these
> > > > numbers in half of an atttypmod.  In practice I suspect the limit may
> > > > be less; Jan would be more likely to remember...
> > >
> > >     It is arbitrary of course. I don't recall completely, have to
> > >     dig into the code, but there might be some side  effect  when
> > >     mucking with it.
> > >
> > >     The NUMERIC code increases the actual internal precision when
> > >     doing multiply and divide, what  happens  a  gazillion  times
> > >     when  doing higher functions like trigonometry. I think there
> > >     was some connection between the max precision  and  how  high
> > >     this internal precision can grow, so increasing the precision
> > >     might affect the computational  performance  of  such  higher
> > >     functions significantly.
> >
> > Oh, interesting, maybe we should just leave it alone.
> 
>     As  said, I have to look at the code. I'm pretty sure that it
>     currently will not use hundreds of digits internally  if  you
>     use  only  a  few digits in your schema. So changing it isn't
>     that dangerous.
> 
>     But who's going to write and run a regression test,  ensuring
>     that  the  new  high  limit can really be supported. I didn't
>     even run the numeric_big test lately, which  tests  with  500
>     digits  precision  at least ... and therefore takes some time
>     (yawn). Increasing the number of digits used you  first  have
>     to  have  some  other  tool  to  generate  the  test  data (I
>     originally used bc(1) with some scripts). Based  on  that  we
>     still  claim that our system deals correctly with up to 1,000
>     digits precision.
> 
>     I don't like the idea of  bumping  up  that  number  to  some
>     higher  nonsense, claiming we support 32K digits precision on
>     exact numeric, and noone ever tested if  natural  log  really
>     returns  it's  result  in  that precision instead of a 30,000
>     digit precise approximation.
> 
>     I missed some of the discussion,  because  I  considered  the
>     1,000 digits already beeing complete nonsense and dropped the
>     thread. So could someone please enlighten me  what  the  real
>     reason  for  increasing  our  precision  is?   AFAIR  it  had
>     something to do with the docs. If it's just because the  docs
>     and  the code aren't in sync, I'd vote for changing the docs.
I have done a little more research on this.  If you create a numeric
with no precision:
CREATE TABLE test (x numeric);
You can insert numerics that are greater in length than 1000 digits:
INSERT INTO test values ('1111(continues 1010 times)');
You can even do computations on it:
SELECT x+1 FROM test;
1000 is pretty arbitrary.  If we can handle 1000, I can't see how larger
values somehow could fail.
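The literal in the INSERT above is tedious to type by hand, so a short script can generate it and also compute the expected value of x+1 outside the database. This is a sketch in Python; only the table name test and the 1010-digit length come from the example, the variable names are illustrative, and no SQL is actually executed.

```python
from decimal import Decimal, localcontext

DIGITS = 1010
literal = "1" * DIGITS  # a 1010-digit number made only of the digit 1

# SQL text matching the example above; it is only built here, not run.
insert_sql = "INSERT INTO test VALUES ('%s');" % literal

# Cross-check x + 1 with arbitrary-precision decimals.
with localcontext() as ctx:
    ctx.prec = DIGITS + 1  # enough digits that the addition is exact
    expected = Decimal(literal) + 1

expected_text = str(expected)  # what SELECT x+1 should print if no digits are lost
```

Comparing expected_text with the output of the SELECT is the same kind of check that bc(1) was used for when the test data was generated.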
Also, the numeric regression tests take much longer than the other
tests.  I don't see why a test of that length is required, compared to
the other tests.  Probably time to pair it back a little.
--
Bruce Momjian
 
			
Bruce Momjian wrote:
> Jan Wieck wrote:
> >
> >     I missed some of the discussion,  because  I  considered  the
> >     1,000 digits already beeing complete nonsense and dropped the
> >     thread. So could someone please enlighten me  what  the  real
> >     reason  for  increasing  our  precision  is?   AFAIR  it  had
> >     something to do with the docs. If it's just because the  docs
> >     and  the code aren't in sync, I'd vote for changing the docs.
>
> I have done a little more research on this.  If you create a numeric
> with no precision:
>
>    CREATE TABLE test (x numeric);
>
> You can insert numerics that are greater in length than 1000 digits:
>
>    INSERT INTO test values ('1111(continues 1010 times)');
>
> You can even do computations on it:
>
>    SELECT x+1 FROM test;
>
> 1000 is pretty arbitrary.  If we can handle 1000, I can't see how larger
> values somehow could fail.
And I can't see what more than 1,000 digits would be good for.  Bruce,
your research is neat, but IMHO wasted time.

Why do we need to change it now? Is the more important issue (doing the
internal storage representation in base 10,000) done yet? If not, we can
open up for unlimited precision at that time.
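As a rough picture of what a base 10,000 representation means, the sketch below groups four decimal digits into each storage element, so one machine-level operation handles four digits instead of one. It is written in Python with hypothetical helper names and covers only non-negative integers; it is not the layout used by numeric.c.

```python
BASE = 10000  # each element holds four decimal digits
GROUP = 4


def to_base_10000(digits):
    """Split a string of decimal digits into base-10000 elements, most significant first."""
    if not digits or any(ch not in "0123456789" for ch in digits):
        raise ValueError("expected a non-empty string of decimal digits")
    pad = (-len(digits)) % GROUP  # left-pad so the length is a multiple of 4
    padded = "0" * pad + digits
    return [int(padded[i:i + GROUP]) for i in range(0, len(padded), GROUP)]


def add_small(groups, n):
    """Add a small non-negative integer, propagating carries between elements."""
    result = list(groups)
    carry = n
    i = len(result) - 1
    while carry and i >= 0:
        total = result[i] + carry
        result[i] = total % BASE
        carry = total // BASE
        i -= 1
    while carry:  # the number grew by one or more elements
        result.insert(0, carry % BASE)
        carry //= BASE
    return result


example = to_base_10000("123456789")
thousand_digits = to_base_10000("1" * 1000)
```

A 1,000-digit value needs 250 elements in this form, so carry loops like the one in add_small run over a quarter as many elements as a digit-per-element layout would.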
 
Please, adjust the docs for now, drop the issue and let's do something
useful.
> Also, the numeric regression tests takes much longer than the other
> tests.  I don't see why a test of that length is required, compared to
> the other tests.  Probably time to pair it back a little.
What exactly do you mean with "pair it back"?  Shrinking the precision
of the test or reducing its coverage of functionality?

For the former, it only uses 10 of the possible 1,000 digits after the
decimal point.  Run the numeric_big test (which uses 800) at least once
and you'll see what kind of difference precision makes.

And on functionality, it is absolutely insufficient for numerical
functionality that has possible carry, rounding etc. issues, to check a
function just for one single known value, and if it computes that result
correctly, consider it OK for everything.

I thought the actual test is sloppy already ... but it's still too much
for you ... hmmmm.
Jan
			
...
> Also, the numeric regression tests takes much longer than the other
> tests.  I don't see why a test of that length is required, compared to
> the other tests.  Probably time to pair it back a little.
The numeric types are inherently slow. You might look at what effect you
can achieve by restructuring that regression test to more closely
resemble the other tests. In particular, it defines several source
tables, each one of which containing similar initial values. And it
defines a results table, into which intermediate results are placed,
which are then immediately queried for display and comparison to obtain
a test result. If handling the values is slow, we could certainly remove
these intermediate steps and still get most of the test coverage.
On another related topic:
I've been wanting to ask: we have in a few cases moved aggregate
calculations from small, fast data types to using numeric as the
accumulator. It would be nice imho to allow, say, an int8 accumulator
for an int4 data type, rather than requiring numeric.
But not all platforms (I assume) have an int8 data type. So we would
need to be able to fall back to numeric for those platforms which need
to use it. What would it take to make some of the catalogs configurable
or sensitive to configuration results?
                    - Thomas
			
Thomas Lockhart <lockhart@fourpalms.org> writes:
> I've been wanting to ask: we have in a few cases moved aggregate
> calculations from small, fast data types to using numeric as the
> accumulator.
Which ones are you concerned about?  As of 7.2, the only ones that use
numeric accumulators for non-numeric input types are
 aggname  |  basetype   |     aggtransfn      |  transtype
----------+-------------+---------------------+-------------
 avg      | int8        | int8_accum          | _numeric
 sum      | int8        | int8_sum            | numeric
 stddev   | int2        | int2_accum          | _numeric
 stddev   | int4        | int4_accum          | _numeric
 stddev   | int8        | int8_accum          | _numeric
 variance | int2        | int2_accum          | _numeric
 variance | int4        | int4_accum          | _numeric
 variance | int8        | int8_accum          | _numeric
 
All of these seem to have good precision/range arguments for using
numeric accumulators, or to be enough off the beaten track that it's
not worth much angst to optimize them.
        regards, tom lane
			
> Which ones are you concerned about?  As of 7.2, the only ones that use
> numeric accumulators for non-numeric input types are
...
OK, I did imply that I've been wanting to ask this for some time. I
should have asked during the 7.1 era, when this was true for more cases.
:)
> All of these seem to have good precision/range arguments for using
> numeric accumulators, or to be enough off the beaten track that it's
> not worth much angst to optimize them.
Well, they *are* on the beaten track for someone, just not you! ;)
I'd think that things like stddev might be OK with 52 bits of
accumulation, so could be done with doubles. Were they implemented that
way at one time? Do we have a need to provide precision greater than
that, or to guard against the (unlikely) case of having so many values
that a double-based accumulator overflows its ability to see the next
value?
I'll point out that for the case of accumulating so many integers that
they can't work with a double, the alternative implementation of using
numeric may approach infinite computation time.
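The point about a double-based accumulator losing "its ability to see the next value" can be shown in a few lines. This is a sketch in Python, whose float type is an IEEE 754 double; the variable names are illustrative.

```python
# A double has a 53-bit significand (52 bits stored explicitly), so every
# integer up to 2**53 is exact, but 2**53 + 1 is not representable.
LIMIT = 2 ** 53

as_double = float(LIMIT)
next_value_lost = (as_double + 1.0) == as_double  # the added 1 vanishes

# Python integers are exact, standing in for a numeric accumulator here.
exact_next = LIMIT + 1

# How many maximal int4 values can be summed before reaching that point.
INT4_MAX = 2 ** 31 - 1
values_until_limit = LIMIT // INT4_MAX
```

For a plain sum of int4 values the limit is a few million maximal inputs away. Sums of squares get there far sooner, because a single squared int4 can already need up to 62 bits.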
But in any case, I can ask the same question, only reversed:
We now have some aggregate functions which use, say, int4 to accumulate
int4 values, if the target platform does *not* support int8. What would
it take to make the catalogs configurable or able to respond to
configuration results so that, for example, platforms without int8
support could instead use numeric or double values as a substitute?
                     - Thomas
			
Thomas Lockhart <lockhart@fourpalms.org> writes:
>> All of these seem to have good precision/range arguments for using
>> numeric accumulators, or to be enough off the beaten track that it's
>> not worth much angst to optimize them.
> Well, they *are* on the beaten track for someone, just not you! ;)
> I'd think that things like stddev might be OK with 52 bits of
> accumulation, so could be done with doubles.
ISTM that people who are willing to have it done in a double can simply
write stddev(x::float8).  Of course you will rejoin that if they want
it done in a numeric, they can write stddev(x::numeric) ... but since
we are talking about exact inputs, I would prefer that the default
behavior be to carry out the summation without loss of precision.
The stddev calculation *is* subject to problems if you don't do the
summation as accurately as you can.
> Do we have a need to provide precision greater than
> that, or to guard against the (unlikely) case of having so many values
> that a double-based accumulator overflows its ability to see the next
> value?
You don't see the cancellation problems inherent in N*sum(x^2) - sum(x)^2?
You're likely to be subtracting bignums even with not all that many
input values; they just have to be large input values.
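The cancellation in N*sum(x^2) - sum(x)^2 is easy to reproduce with only three large inputs. The sketch below is in Python, with float standing in for a float8 accumulator and Python integers standing in for an exact one; the function names and the sample values are made up for illustration.

```python
def naive_double(values):
    """N*sum(x^2) - sum(x)^2 accumulated in doubles."""
    n = float(len(values))
    sum_x = 0.0
    sum_x2 = 0.0
    for v in values:
        x = float(v)
        sum_x += x
        sum_x2 += x * x  # about 1e18 here, beyond the 53 bits a double can hold exactly
    return n * sum_x2 - sum_x * sum_x


def exact_integer(values):
    """The same expression with exact integer arithmetic."""
    n = len(values)
    return n * sum(v * v for v in values) - sum(values) ** 2


values = [10**9 + 1, 10**9 + 2, 10**9 + 3]

exact = exact_integer(values)   # 3 times the sum of squared deviations (1 + 0 + 1)
approx = naive_double(values)   # the low-order digits were rounded away earlier

# Sample variance is the expression divided by N*(N-1).
sample_variance = exact / (len(values) * (len(values) - 1))
```

The exact expression is 6, giving a sample variance of 1. The double version subtracts two numbers near 9e18 whose low-order digits have already been rounded off, so it does not reproduce that 6.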
> But in any case, I can ask the same question, only reversed:
> We now have some aggregate functions which use, say, int4 to accumulate
> int4 values, if the target platform does *not* support int8. What would
> it take to make the catalogs configurable or able to respond to
> configuration results so that, for example, platforms without int8
> support could instead use numeric or double values as a substitute?
Haven't thought hard about it.  I will say that I don't like the idea
of changing the declared output type of the aggregates across platforms.
Changing the internal implementation (ie, transtype) would be acceptable
--- but I doubt it's worth the trouble.  In most other arguments that
touch on this point, I seem to be one of the few holdouts for insisting
that we worry about int8-less platforms anymore at all ;-).  For those
few old platforms, the 7.2 behavior of avg(int) and sum(int) is no worse
than it was for everyone in all pre-7.1 versions; I am not excited about
expending significant effort to make it better.
        regards, tom lane
			
Jan Wieck wrote:
> Bruce Momjian wrote:
> > Jan Wieck wrote:
> > >
> > >     I missed some of the discussion,  because  I  considered  the
> > >     1,000 digits already beeing complete nonsense and dropped the
> > >     thread. So could someone please enlighten me  what  the  real
> > >     reason  for  increasing  our  precision  is?   AFAIR  it  had
> > >     something to do with the docs. If it's just because the  docs
> > >     and  the code aren't in sync, I'd vote for changing the docs.
> >
> > I have done a little more research on this.  If you create a numeric
> > with no precision:
> >
> >    CREATE TABLE test (x numeric);
> >
> > You can insert numerics that are greater in length than 1000 digits:
> >
> >    INSERT INTO test values ('1111(continues 1010 times)');
> >
> > You can even do computations on it:
> >
> >    SELECT x+1 FROM test;
> >
> > 1000 is pretty arbitrary.  If we can handle 1000, I can't see how larger
> > values somehow could fail.
> 
>     And  I  can't  see  what more than 1,000 digits would be good
>     for.  Bruce, your research is neat, but IMHO wasted time.
> 
>     Why do we need to change it now? Is the more important  issue
>     (doing  the  internal  storage representation in base 10,000,
>     done yet? If not, we can open up for unlimited  precision  at
>     that time.
I certainly would like the 10,000 change done, but few of us are
capable of doing it.  :-(
>     Please,  adjust the docs for now, drop the issue and let's do
>     something useful.
That's how I got started.  The problem is that the limit isn't 1,000.
Looking at NUMERIC_MAX_PRECISION, I see it used in gram.y to prevent
creation of NUMERIC columns that exceed the maximum length, and I see it
used in numeric.c to prevent exponents that exceed the maximum length,
but I don't see other cases that would actually enforce the limit in
INSERT and other cases.
Remember how people complained when I said "unlimited" in the FAQ for
some items that actually had a limit.  Well, in this case, we have a
limit that is only enforced in some places.  I would like to see this
cleared up one way or the other so the docs would be correct.
Jan, any chance on doing the 10,000 change in your spare time?  ;-)
> > Also, the numeric regression tests takes much longer than the other
> > tests.  I don't see why a test of that length is required, compared to
> > the other tests.  Probably time to pair it back a little.
> 
>     What exactly do you mean with "pair it back"?  Shrinking  the
>     precision   of   the   test  or  reducing  it's  coverage  of
>     functionality?
> 
>     For the former, it only uses 10 of the possible 1,000  digits
>     after  the  decimal  point.   Run the numeric_big test (which
>     uses  800)  at  least  once  and  you'll  see  what  kind  of
>     difference precision makes.
> 
>     And  on  functionality,  it  is  absolutely  insufficient for
>     numerical functionality that  has  possible  carry,  rounding
>     etc.  issues,  to  check a function just for one single known
>     value, and if it computes that result correctly, consider  it
>     OK for everything.
> 
>     I  thought  the  actual  test  is sloppy already ... but it's
>     still too much for you ... hmmmm.
Well, our regression tests are not intended to test every possible
NUMERIC combination, just a reasonable subset.  As it is now, I often
think the regression tests have hung because numeric takes so much
longer than any of the other tests.  We have had this code in there for
a while now, and it is not OS-specific stuff, so I think we should just
pair it back so we know it is working.  We already have bignumeric for a
larger test.
--
Bruce Momjian
 
			
Bruce Momjian wrote:
> Well, our regression tests are not intended to test every possible
> NUMERIC combination, just a reasonable subset.  As it is now, I often
> think the regression tests have hung because numeric takes so much
> longer than any of the other tests.  We have had this code in there for
> a while now, and it is not OS-specific stuff, so I think we should just
> pair it back so we know it is working.  We already have bignumeric for a
> larger test.

Bruce,

have you even taken one single look at the test? It does 100 of each
add, sub, mul and div; these are the fast operations that don't really
take much time.

Then it does 10 of each sqrt(), ln(), log10(), pow10() and 10 combined
power(ln()). These are the time-consuming operations, working
iteratively à la Newton, Taylor and Maclaurin. All that is done with 10
digits after the decimal point only!

So again, WHAT exactly do you mean with "pair it back"? Sorry, I don't
get it. Do you want to remove the entire test? Reduce it to an INSERT,
one SELECT (so that we know the input and output functions work) and the
four basic operators used once? Well, that's a hell of a test, makes me
really feel comfortable. Like the mechanic kicking against the tire then
saying "I ain't see noth'n wrong with the brakes, ya sure can make a
trip in the mountains". Yeah, at least once!

Jan
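Jan's remark about the iterative functions can be made concrete with a small Newton iteration for square roots. This is a sketch in Python's decimal module, not the code in numeric.c; newton_sqrt, the five guard digits and the stopping rule are assumptions chosen for the illustration.

```python
from decimal import Context, Decimal, localcontext


def newton_sqrt(value, digits):
    """Square root by Newton's method; returns (result, iteration count)."""
    x = Decimal(value)
    if x <= 0:
        raise ValueError("value must be positive")
    with localcontext() as ctx:
        ctx.prec = digits + 5         # a few guard digits for the intermediate steps
        guess = max(x, Decimal(1))    # starts at or above the true root
        iterations = 0
        while True:
            nxt = (guess + x / guess) / 2  # one full-precision division per step
            iterations += 1
            if nxt >= guess:          # no further decrease: converged
                break
            guess = nxt
        ctx.prec = digits
        result = +guess               # unary plus rounds to the requested digits
    return result, iterations


root_50, _ = newton_sqrt(2, 50)
reference_50 = Decimal(2).sqrt(Context(prec=50))  # library square root for comparison

_, steps_10 = newton_sqrt(2, 10)
_, steps_800 = newton_sqrt(2, 800)
```

Every pass through the loop performs a division at the full working precision, and more digits need more passes, which is consistent with the difference Jan describes between the 10-digit test and the 800-digit numeric_big run.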
Jan Wieck wrote:
> Bruce Momjian wrote:
> > Well, our regression tests are not intended to test every possible
> > NUMERIC combination, just a reasonable subset.  As it is now, I often
> > think the regression tests have hung because numeric takes so much
> > longer than any of the other tests.  We have had this code in there for
> > a while now, and it is not OS-specific stuff, so I think we should just
> > pair it back so we know it is working.  We already have bignumeric for a
> > larger test.
>
> Bruce,
>
> have you even taken one single look at the test? It does 100
> of each add, sub, mul and div; these are the fast operations
> that don't really take much time.
>
> Then it does 10 of each sqrt(), ln(), log10(), pow10() and 10
> combined power(ln()). These are the time-consuming
> operations, working iteratively à la Newton, Taylor and
> Maclaurin. All that is done with 10 digits after the decimal
> point only!
>
> So again, WHAT exactly do you mean with "pair it back"?
> Sorry, I don't get it. Do you want to remove the entire test?
> Reduce it to an INSERT, one SELECT (so that we know the
> input and output functions work) and the four basic
> operators used once? Well, that's a hell of a test, makes me
> really feel comfortable. Like the mechanic kicking against
> the tire then saying "I ain't see noth'n wrong with the
> brakes, ya sure can make a trip in the mountains". Yeah, at
> least once!

Jan, regression is not a test of the level a developer would use to make
sure his code works.  It is merely to make sure the install works on a
limited number of cases.  Having seen zero reports of any numeric
failures since we installed it, and seeing it takes >10x longer
than the other tests, I think it should be paired back.  Do we really
need 10 tests of each complex function?  I think one would do the trick.

--
Bruce Momjian
Bruce Momjian wrote:
> Jan Wieck wrote:
> > Bruce Momjian wrote:
> > > Well, our regression tests are not intended to test every possible
> > > NUMERIC combination, just a reasonable subset.  As it is now, I often
> > > think the regression tests have hung because numeric takes so much
> > > longer than any of the other tests.  We have had this code in there for
> > > a while now, and it is not OS-specific stuff, so I think we should just
> > > pair it back so we know it is working.  We already have bignumeric for a
> > > larger test.
> >
> > Bruce,
> >
> > have you even taken one single look at the test? It does 100
> > of each add, sub, mul and div; these are the fast operations
> > that don't really take much time.
> >
> > Then it does 10 of each sqrt(), ln(), log10(), pow10() and 10
> > combined power(ln()). These are the time-consuming
> > operations, working iteratively à la Newton, Taylor and
> > Maclaurin. All that is done with 10 digits after the decimal
> > point only!
> >
> > So again, WHAT exactly do you mean with "pair it back"?
> > Sorry, I don't get it. Do you want to remove the entire test?
> > Reduce it to an INSERT, one SELECT (so that we know the
> > input and output functions work) and the four basic
> > operators used once? Well, that's a hell of a test, makes me
> > really feel comfortable. Like the mechanic kicking against
> > the tire then saying "I ain't see noth'n wrong with the
> > brakes, ya sure can make a trip in the mountains". Yeah, at
> > least once!
>
> Jan, regression is not a test of the level a developer would use to make
> sure his code works.  It is merely to make sure the install works on a
> limited number of cases.  Having seen zero reports of any numeric
> failures since we installed it, and seeing it takes >10x longer
> than the other tests, I think it should be paired back.  Do we really
> need 10 tests of each complex function?  I think one would do the trick.

You forgot who wrote that code originally. I feel a lot better WITH the
tests in place :-)

And if it's merely to make sure the install worked, man, who is doing
source installations these days and runs the regression tests anyway?
Most people throw in an RPM or the like, only a few serious users
install from sources, and only a fistful of them then run regression.

Isn't it mostly developers and distro-maintainers who use that
directory? I think your entire point isn't just weak, IMNSVHO you don't
really have a point.

Jan
Jan Wieck wrote:
> You forgot who wrote that code originally.  I feel a lot
> better WITH the tests in place :-)
>
> And if it's merely to make sure the install worked, man, who
> is doing source installations these days and runs the
> regression tests anyway? Most people throw in an RPM or the
> like, only a few serious users install from sources, and only
> a fistful of them then run regression.
>
> Isn't it mostly developers and distro-maintainers who use
> that directory? I think your entire point isn't just weak,
> IMNSVHO you don't really have a point.

It is my understanding that RPM does run that test.  My main issue is
why does numeric have to be so much larger than the other tests?  I have
not heard that explained.

--
Bruce Momjian
...
> Jan, regression is not a test of the level a developer would use to make
> sure his code works.  It is merely to make sure the install works on a
> limited number of cases.  Having seen zero reports of any numeric
> failures since we installed it, and seeing it takes >10x times longer
> than the other tests, I think it should be paired back.  Do we really
> need 10 tests of each complex function?  I think one would do the trick.
Whoops. We rely on the regression tests to make sure that previous
behaviors continue to be valid behaviors. Another use is to verify that
a particular installation can reproduce this same test. But regression
testing is a fundamental and essential development tool, precisely
because it covers cases outside the range you might be thinking of
testing as you do development.
As a group, we might tend to underestimate the value of this, which
could be evidenced by the fact that our regression test suite has not
grown substantially more than it has over the years. It could have many
more tests within each module, and bug reports *could* be fed back into
regression updates to make sure that failures do not reappear.
All imho of course ;)
                    - Thomas
			
		...
> It is my understanding that RPM does run that test.  My main issue is
> why does numeric have to be so much larger than the other tests?  I have
> not heard that explained.
afaict it is not larger. It *does* take more time, but the number of
tests is relatively small, or at least compatible with the number of
tests which appear, or should appear, in other tests of data types
covering a large problem space (e.g. date/time).
It does illustrate that BCD-like encodings are expensive, and that
machine-supported math is usually a win. If it is a big deal, jump in
and widen the internal math operations!
                  - Thomas
			
Bruce Momjian wrote:
> Jan Wieck wrote:
> > You forgot who wrote that code originally.  I feel a lot
> > better WITH the tests in place :-)
> >
> > And if it's merely to make sure the install worked, man, who
> > is doing source installations these days and runs the
> > regression tests anyway?  Most people throw in a RPM or the
> > like, only a few serious users install from sources, and only
> > a fistful of them then runs regression.
> >
> > Isn't it mostly developers and distro-maintainers who use
> > that directory?  I think your entire point isn't just weak,
> > IMNSVHO you don't really have a point.
>
> It is my understanding that RPM does run that test.  My main issue is
> why does numeric have to be so much larger than the other tests?  I have
> not heard that explained.
Well, I heard Thomas commenting that it's horribly slowly
implemented (or so, I don't recall his exact wording).  And he's
right.  I think the same test done with float8 would run in
less than a tenth of that time.
That is only an explanation of "why it takes so long".  It is no
argument for or against the test itself.
I think I made my point clear enough, that I consider calling
these functions just once is plain sloppy.  But that's just
my opinion.  What do others think?
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
Jan Wieck <janwieck@yahoo.com> writes:
>     I think I made my point clear enough, that I consider calling
>     these  functions  just once is plain sloppy.  But that's just
>     my opinion. What do others think?
I don't have a problem with the current length of the numeric test.
The original form of it (now shoved over to bigtests) did seem
excessively slow to me ... but I can live with this one.
I do agree that someone ought to reimplement numeric using base10k
arithmetic ... but it's not bugging me so much that I'm likely
to get around to it anytime soon myself ...
Bruce, why is there no TODO item for that project?
        regards, tom lane
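[Editorial sketch, not part of the original message.] For readers unfamiliar with the base-10,000 idea mentioned above: each stored "digit" holds four decimal digits, so a number of a given length needs roughly a quarter as many digit operations as a one-decimal-digit-per-element layout. The code below is a hedged illustration in Python, not PostgreSQL's implementation; the names to_base10k, add_base10k and from_base10k are hypothetical.

```python
BASE = 10_000  # each stored "digit" covers four decimal digits


def to_base10k(n):
    """Split a non-negative integer into base-10000 digits, least significant first."""
    if n < 0:
        raise ValueError("sketch handles non-negative values only")
    digits = []
    while True:
        n, rem = divmod(n, BASE)
        digits.append(rem)
        if n == 0:
            return digits


def add_base10k(a, b):
    """Schoolbook addition over base-10000 digit lists with carry propagation."""
    result = []
    carry = 0
    for i in range(max(len(a), len(b))):
        total = carry
        total += a[i] if i < len(a) else 0
        total += b[i] if i < len(b) else 0
        carry, digit = divmod(total, BASE)
        result.append(digit)
    if carry:
        result.append(carry)  # the sum grew by one base-10000 digit
    return result


def from_base10k(digits):
    """Rebuild the integer from its base-10000 digits."""
    value = 0
    for d in reversed(digits):
        value = value * BASE + d
    return value


if __name__ == "__main__":
    a, b = to_base10k(99_999_999), to_base10k(1)
    print(a, "+", b, "=", add_base10k(a, b))
```

Adding 1 to 99,999,999 takes two loop steps here, where a one-digit-per-element layout would take eight; each step is still a cheap machine-integer operation.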
			
Thomas Lockhart wrote:
> ...
> > It is my understanding that RPM does run that test.  My main issue is
> > why does numeric have to be so much larger than the other tests?  I have
> > not heard that explained.
>
> afaict it is not larger. It *does* take more time, but the number of
> tests is relatively small, or at least compatible with the number of
> tests which appear, or should appear, in other tests of data types
> covering a large problem space (e.g. date/time).
>
> It does illustrate that BCD-like encodings are expensive, and that
> machine-supported math is usually a win. If it is a big deal, jump in
> and widen the internal math operations!
OK, as long as everyone else is fine with the tests, we can leave it
alone.  The concept that the number of tests is realistic, and that they
are just slower than other data types, makes sense.
--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Tom Lane wrote:
> Jan Wieck <janwieck@yahoo.com> writes:
> >     I think I made my point clear enough, that I consider calling
> >     these functions just once is plain sloppy.  But that's just
> >     my opinion. What do others think?
>
> I don't have a problem with the current length of the numeric test.
> The original form of it (now shoved over to bigtests) did seem
> excessively slow to me ... but I can live with this one.
>
> I do agree that someone ought to reimplement numeric using base10k
> arithmetic ... but it's not bugging me so much that I'm likely
> to get around to it anytime soon myself ...
>
> Bruce, why is there no TODO item for that project?
Not sure.  I was aware of it for a while.  Added:
    * Change NUMERIC data type to use base 10,000 internally
--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Tatsuo Ishii wrote:
> > Jan Wieck wrote:
> > > > The hard limit is certainly no more than 64K, since we store these
> > > > numbers in half of an atttypmod.  In practice I suspect the limit may
> > > > be less; Jan would be more likely to remember...
> > >
> > >     It is arbitrary of course. I don't recall completely, have to
> > >     dig into the code, but there might be some side  effect  when
> > >     mucking with it.
> > >
> > >     The NUMERIC code increases the actual internal precision when
> > >     doing multiply and divide, what  happens  a  gazillion  times
> > >     when  doing higher functions like trigonometry. I think there
> > >     was some connection between the max precision  and  how  high
> > >     this internal precision can grow, so increasing the precision
> > >     might affect the computational  performance  of  such  higher
> > >     functions significantly.
> >
> > Oh, interesting, maybe we should just leave it alone.
>
> So are we going to just fix the docs?
OK, I have updated the docs.  Patch attached.
I have also added this to the TODO list:
    * Change NUMERIC to enforce the maximum precision, and increase it
--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Index: datatype.sgml
===================================================================
RCS file: /cvsroot/pgsql/doc/src/sgml/datatype.sgml,v
retrieving revision 1.87
diff -c -r1.87 datatype.sgml
*** datatype.sgml    3 Apr 2002 05:39:27 -0000    1.87
--- datatype.sgml    13 Apr 2002 01:26:54 -0000
***************
*** 506,518 ****
      <title>Arbitrary Precision Numbers</title>
      <para>
!      The type <type>numeric</type> can store numbers of practically
!      unlimited size and precision, while being able to store all
!      numbers and carry out all calculations exactly.  It is especially
!      recommended for storing monetary amounts and other quantities
!      where exactness is required.  However, the <type>numeric</type>
!      type is very slow compared to the floating-point types described
!      in the next section.
      </para>
      <para>
--- 506,517 ----
      <title>Arbitrary Precision Numbers</title>
      <para>
!      The type <type>numeric</type> can store numbers with up to 1,000
!      digits of precision and perform calculations exactly. It is
!      especially recommended for storing monetary amounts and other
!      quantities where exactness is required. However, the
!      <type>numeric</type> type is very slow compared to the
!      floating-point types described in the next section.
      </para>
      <para>
			
Jan Wieck wrote:
> > Oh, interesting, maybe we should just leave it alone.
>
>     As said, I have to look at the code. I'm pretty sure that it
>     currently will not use hundreds of digits internally if you
>     use only a few digits in your schema. So changing it isn't
>     that dangerous.
>
>     But who's going to write and run a regression test, ensuring
>     that the new high limit can really be supported. I didn't
>     even run the numeric_big test lately, which tests with 500
>     digits precision at least ... and therefore takes some time
>     (yawn). Increasing the number of digits used you first have
>     to have some other tool to generate the test data (I
>     originally used bc(1) with some scripts). Based on that we
>     still claim that our system deals correctly with up to 1,000
>     digits precision.
>
>     I don't like the idea of bumping up that number to some
>     higher nonsense, claiming we support 32K digits precision on
>     exact numeric, and no one ever tested if natural log really
>     returns its result in that precision instead of a 30,000
>     digit precise approximation.
>
>     I missed some of the discussion, because I considered the
>     1,000 digits already being complete nonsense and dropped the
>     thread. So could someone please enlighten me what the real
>     reason for increasing our precision is? AFAIR it had
>     something to do with the docs. If it's just because the docs
>     and the code aren't in sync, I'd vote for changing the docs.
Jan, if the numeric code works on 100 or 500 digits, could it break with
10,000 digits?  Is there a reason to believe longer digits could cause
problems not present in shorter tests?
--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
> Jan, regression is not a test of the level a developer would use to make
> sure his code works.  It is merely to make sure the install works on a
> limited number of cases.
News to me!  If anything, I don't think a lot of the current regression
tests are comprehensive enough!  For the SET/DROP NOT NULL patch I
submitted, I included a regression test that tests every one of the
preconditions in my code - that way if anything gets changed or broken,
we'll find out very quickly.
I personally don't have a problem with the time taken to regression test -
and I think that trimming the numeric test _might_ be a false economy.  Who
knows what's going to turn around and bite us one day?
> Having seen zero reports of any numeric
> failures since we installed it, and seeing it takes >10x times longer
> than the other tests, I think it should be paired back.  Do we really
> need 10 tests of each complex function?  I think one would do the trick.
A good point tho, I didn't submit a regression test that tries to ALTER 3
different non-existent tables to check for failures - one test was enough...
Chris
Christopher Kings-Lynne wrote:
> > Having seen zero reports of any numeric
> > failures since we installed it, and seeing it takes >10x times longer
> > than the other tests, I think it should be paired back.  Do we really
> > need 10 tests of each complex function?  I think one would do the trick.
>
> A good point tho, I didn't submit a regression test that tries to ALTER 3
> different non-existent tables to check for failures - one test was enough...
That was my point.  Is there much value in testing each function ten
times?  Anyway, seems only I care so I will drop it.
--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote:
> Christopher Kings-Lynne wrote:
> > > Having seen zero reports of any numeric
> > > failures since we installed it, and seeing it takes >10x times longer
> > > than the other tests, I think it should be paired back.  Do we really
> > > need 10 tests of each complex function?  I think one would do the trick.
> >
> > A good point tho, I didn't submit a regression test that tries to ALTER 3
> > different non-existent tables to check for failures - one test was enough...
>
> That was my point.  Is there much value in testing each function ten
> times.  Anyway, seems only I care so I will drop it.
Yes, there is value in it.  There is conditional code in it
that depends on the values.  I wrote that before (I said
there are possible carry, rounding etc. issues), and it
looked to me that you simply ignored these facts.
Jan
--
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #
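[Editorial sketch, not part of the original message.] The point about value-dependent code paths can be seen even in a single rounding step. The example below uses Python's decimal module with ROUND_HALF_UP as a stand-in; it is not PostgreSQL's numeric code, and the helper name round2 is hypothetical. Four inputs that look alike take four different paths through round-and-carry logic, so a test with one input exercises only one of them.

```python
from decimal import Decimal, ROUND_HALF_UP


def round2(text):
    """Round a decimal literal to two fractional digits, halves rounding up."""
    return Decimal(text).quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


# Each case exercises a different branch of "round, then propagate carry".
CASES = {
    "1.234": "1.23",   # dropped digit < 5: truncate, no carry
    "1.235": "1.24",   # dropped digit = 5: last kept digit is bumped
    "1.995": "2.00",   # carry ripples through both fractional digits
    "9.995": "10.00",  # carry creates a new leading digit
}

if __name__ == "__main__":
    for text, expected in CASES.items():
        result = round2(text)
        status = "ok" if str(result) == expected else "MISMATCH"
        print(f"{text} -> {result} ({status})")
```

A suite that checked only "1.234" would pass even if the carry handling in the last two cases were broken.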