Discussion: Reduce NUMERIC size by 2 bytes, reduce max length to 508 digits
[ Moved to hackers for patch discussion. ]

John D. Burger wrote:
> >> There are practical applications, eg, 1024-bit keys are fairly common
> >> objects in cryptography these days, and that equates to about 10^308.
> >> I don't really foresee anyone trying to run crypto algorithms with SQL
> >> NUMERIC arithmetic, though ...
> >
> > 2048-bit keys are becoming more common. However, math using these keys
> > is usually done modulo a product of two primes, and there are ways of
> > doing the calculations that are going to be much faster than doing them
> > the way Postgres does. So it is unlikely that anyone would be using
> > Postgres' numeric type to do this in any case.
>
> Nonetheless, the fact that people can think of practical applications
> for numbers whose length is easily within a factor of two of the
> proposed limitation makes me squeamish about it being shrunk. Also, I
> would say the same arguments about doing math with NUMERICs suggest
> that saving a few bytes in representation is not a big deal. On the
> few occasions where I have used NUMERICs, I didn't care about stuff
> like that.
>
> For what it's worth.

Good point, but I am not 100% sure on the limitation. Look at this:

    test=> CREATE TABLE test (x NUMERIC);
    CREATE TABLE
    test=> INSERT INTO test SELECT pow(10::numeric, 10000) + 1;
    INSERT 0 1
    test=> SELECT log(x) FROM test;
             log
    ------------------------
     10000.0000000000000000
    (1 row)

    test=> SELECT x % 10 FROM test;
          ?column?
    --------------------
     1.0000000000000000
    (1 row)

And this seems to work too:

    test=> INSERT INTO test SELECT pow(10::numeric, 120000) + 1;
    INSERT 0 1

The limit seems to be around 150k digits:

    test=> INSERT INTO test SELECT pow(10::numeric, 150000) + 1;
    ERROR:  value overflows numeric format

With the current code, you cannot define a NUMERIC column with more
than 1000 digits because we placed an arbitrary limit on the declared
length, but the computational length was obviously much larger than the
storage limit. And I suppose you could exceed 1000 digits if you stored
the result as text and converted it to NUMERIC just for computations.
In fact we have this TODO, but I wonder if it is still an open issue:

    * Change NUMERIC to enforce the maximum precision

We seem to enforce things just fine. Now, with the new patch, I see a
_much_ lower limit:

    test=> SELECT pow(10::NUMERIC, 511) + 1;
    ...
    (1 row)
    test=> SELECT pow(10::NUMERIC, 512) + 1;
    ERROR:  value overflows numeric format
    test=> SELECT pow(10::NUMERIC, 512);
    ERROR:  value overflows numeric format

I thought maybe I could at least do the computations and then convert
the result to text, but seeing the above, higher-precision computation
is just not possible --- it is more than just storage in a table that
has changed.

So, with the patch, the storage length is going from 1000 digits to
508, but the computational length is reduced from around 150k digits to
508. Now, because no one has complained about the 1000-digit limit, it
is unlikely that anyone is doing calculations over 1000 digits or they
would have had problems storing the value, but I felt I should point
out that we are dramatically changing the computational length.

In fact, for the tests we have been running to debug the *printf
problem, none of those queries will work with the patch:

    test=> SELECT factorial(4000);
    ERROR:  value overflows numeric format
    test=> SELECT factorial(400);
    ERROR:  value overflows numeric format

Not only does 4000! not work, but 400! doesn't even work. I just lost
demo "wow" factor points!

--
  Bruce Momjian                   |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us          |  (610) 359-1001
  + If your life is a hard drive, |  13 Roberts Road
  + Christ can be your backup.    |  Newtown Square, Pennsylvania 19073
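[ The digit counts behind Bruce's complaint are easy to verify outside
the database. A Python sketch, using its arbitrary-precision integers
as a stand-in for unconstrained NUMERIC; the function name is just
illustrative: ]

```python
import math

# Number of decimal digits in n! -- a quick way to see which factorials
# would fit in the 508-digit NUMERIC limit introduced by the patch.
def factorial_digits(n: int) -> int:
    return len(str(math.factorial(n)))

for n in (256, 257, 400, 4000):
    print(n, factorial_digits(n))
```

[ 256! is the last factorial that fits within 508 digits, which matches
Tom's "about factorial(256)" estimate below; 400! and 4000! are well
past the limit. ]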
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> The limit seems to be around 150k digits:

It's exactly 10^(128K), as I've mentioned more than once.

> So, with the patch, the storage length is going from 1000 digits to
> 508, but the computational length is reduced from around 150k digits
> to 508. Now, because no one has complained about the 1000-digit limit,
> it is unlikely that anyone is doing calculations over 1000 digits or
> they would have had problems storing the value,

Only if they declared their columns as numeric(N) and not just plain
unconstrained numeric. Not to mention the possibility that they're
doing the same thing you just did, ie computing values and returning
them to the client without ever storing them in a table. So I don't
think the above reasoning is defensible.

> Not only does 4000! not work, but 400! doesn't even work. I just lost
> demo "wow" factor points!

It looks like the limit would be about factorial(256).

The question remains, though, is this computational range good for
anything except demos?

			regards, tom lane
On 12/5/05, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Not only does 4000! not work, but 400! doesn't even work. I just
> > lost demo "wow" factor points!
>
> It looks like the limit would be about factorial(256).
>
> The question remains, though, is this computational range good for
> anything except demos?

I've hesitated commenting, because I think it might be a silly reason,
but perhaps it's one other people share. ... I use PG as a calculator
for big numbers because it's the only user-friendly thing on my system
that can do factorial(300) - factorial(280). I'd rather use something
like octave, but I've found it's pretty easy to escape its range. If
the range for computation is changed, then I'll probably keep an old
copy around just for this, though I'm not quite sure how much I'd be
affected.
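[ For scale: the exact "calculator" query Gregory mentions, done with
Python's arbitrary-precision integers. The result needs over 600
decimal digits, so it would overflow a 508-digit NUMERIC even though it
is well within the pre-patch 10^(128K) range: ]

```python
import math

# The big-calculator query mentioned above; the difference is dominated
# by factorial(300), so its digit count is the same as 300!'s.
diff = math.factorial(300) - math.factorial(280)
print(len(str(diff)))  # digit count; comfortably more than 508
```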
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > The limit seems to be around 150k digits:
>
> It's exactly 10^(128K), as I've mentioned more than once.
>
> > So, with the patch, the storage length is going from 1000 digits to
> > 508, but the computational length is reduced from around 150k
> > digits to 508. Now, because no one has complained about the
> > 1000-digit limit, it is unlikely that anyone is doing calculations
> > over 1000 digits or they would have had problems storing the value,
>
> Only if they declared their columns as numeric(N) and not just plain
> unconstrained numeric. Not to mention the possibility that they're
> doing the same thing you just did, ie computing values and returning
> them to the client without ever storing them in a table. So I don't
> think the above reasoning is defensible.
>
> > Not only does 4000! not work, but 400! doesn't even work. I just
> > lost demo "wow" factor points!
>
> It looks like the limit would be about factorial(256).
>
> The question remains, though, is this computational range good for
> anything except demos?

I can say that the extended range is good for finding *printf
problems. ;-)

Let me also add that as far as saving disk space goes, this is the
_big_ improvement on the TODO list:

    * Merge xmin/xmax/cmin/cmax back into three header fields

      Before subtransactions, there used to be only three fields needed
      to store these four values. This was possible because only the
      current transaction looks at the cmin/cmax values. If the current
      transaction created and expired the row, the fields stored were
      xmin (same as xmax), cmin, and cmax, and if the transaction was
      expiring a row from another transaction, the fields stored were
      xmin (cmin was not needed), xmax, and cmax.

      Such a system worked because a transaction could only see rows
      from another completed transaction. However, subtransactions can
      see rows from outer transactions, and once the subtransaction
      completes, the outer transaction continues, requiring the storage
      of all four fields. With subtransactions, an outer transaction
      can create a row, a subtransaction expire it, and when the
      subtransaction completes, the outer transaction still has to have
      proper visibility of the row's cmin, for example, for cursors.

      One possible solution is to create a phantom cid which represents
      a cmin/cmax pair and is stored in local memory. Another idea is
      to store both cmin and cmax only in local memory.

--
  Bruce Momjian                   |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us          |  (610) 359-1001
  + If your life is a hard drive, |  13 Roberts Road
  + Christ can be your backup.    |  Newtown Square, Pennsylvania 19073
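[ The pre-subtransaction packing trick described in that TODO item can
be sketched as a toy model. This is not the actual HeapTupleHeader
layout; the class and field names are purely illustrative of the
three-slots-for-four-values encoding: ]

```python
from dataclasses import dataclass

# Toy model: four logical values (xmin, xmax, cmin, cmax) packed into
# three physical slots, exploiting the fact that only the creating
# transaction ever needs to see cmin/cmax.
@dataclass
class ThreeFieldHeader:
    f1: int  # always xmin
    f2: int  # cmin or xmax, depending on the case
    f3: int  # always cmax

def pack(xmin: int, xmax: int, cmin: int, cmax: int) -> ThreeFieldHeader:
    if xmin == xmax:
        # Row created and expired by the same transaction:
        # store xmin (== xmax), cmin, cmax.
        return ThreeFieldHeader(xmin, cmin, cmax)
    # Expiring another transaction's row: cmin is no longer needed,
    # so store xmin, xmax, cmax.
    return ThreeFieldHeader(xmin, xmax, cmax)
```

[ Subtransactions break the `xmin == xmax` shortcut, since an outer
transaction and its subtransaction are different transaction IDs, which
is why all four values now need real storage. ]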
On Mon, Dec 05, 2005 at 11:59:10PM -0500, Bruce Momjian wrote:
> Tom Lane wrote:
> > It looks like the limit would be about factorial(256).
> >
> > The question remains, though, is this computational range good for
> > anything except demos?
>
> I can say that the extended range is good for finding *printf
> problems. ;-)

Might anybody be calculating permutations or combinations with the
textbook functions that use factorials? Not a show-stopper since those
calculations can be optimized (at least the basic formulas I know), but
somebody might get bit by the change. Maybe the release notes could
mention the new upper limit of factorial().

--
Michael Fuhr
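[ Michael's point in Python terms: the textbook binomial formula blows
up on its intermediate factorials even when the final answer is tiny,
while an incremental form never does. Both function names here are
illustrative: ]

```python
import math

# Textbook formula: C(n, k) = n! / (k! * (n - k)!).  The intermediate
# n! can exceed a 508-digit limit even when the result is small.
def comb_textbook(n: int, k: int) -> int:
    return math.factorial(n) // (math.factorial(k) * math.factorial(n - k))

# Incremental form: multiply and divide one term at a time, so the
# intermediate value is always an exact binomial coefficient.
def comb_incremental(n: int, k: int) -> int:
    result = 1
    for i in range(1, k + 1):
        result = result * (n - k + i) // i
    return result

# C(400, 2) is only 79800, but the textbook route goes through 400!,
# which the patched NUMERIC would reject.
print(comb_textbook(400, 2), comb_incremental(400, 2))
```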
Michael Fuhr <mike@fuhr.org> writes:
> On Mon, Dec 05, 2005 at 11:59:10PM -0500, Bruce Momjian wrote:
>> Tom Lane wrote:
>>> The question remains, though, is this computational range good for
>>> anything except demos?
>>
>> I can say that the extended range is good for finding *printf
>> problems. ;-)
>
> Might anybody be calculating permutations or combinations with the
> textbook functions that use factorials?

Hm ... between that, the possible crypto connection, and John's
personal testimony that he actually uses PG for calculations in this
range, I'm starting to lean to the idea that we shouldn't cut the
range.

We could get the same 2-byte savings (in fact 3 bytes on average,
considering alignment issues) by implementing a 2-byte length word
format for numeric. I had originally hoped to do both things to save an
average 5 bytes per numeric, which is starting to get to the point of
actually being interesting ;-). But maybe we should just do the part
that we can do without removing any user-visible functionality.

			regards, tom lane
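[ The "3 bytes on average" figure follows from the alignment
arithmetic. A back-of-the-envelope sketch of that arithmetic only --
not PostgreSQL's actual on-disk layout; header sizes and alignments are
taken from Tom's description of a 4-byte length word with 4-byte
alignment versus a 2-byte length word with 2-byte alignment: ]

```python
OLD_HEADER, OLD_ALIGN = 4, 4  # current varlena: 4-byte length word
NEW_HEADER, NEW_ALIGN = 2, 2  # proposed: 2-byte length word

def padded(size: int, align: int) -> int:
    return -(-size // align) * align  # round up to alignment boundary

# Averaged over payload sizes, the smaller header saves 2 bytes and the
# looser alignment saves ~1 more byte, for 3 bytes per value.
savings = [padded(OLD_HEADER + n, OLD_ALIGN) - padded(NEW_HEADER + n, NEW_ALIGN)
           for n in range(100)]
print(sum(savings) / len(savings))  # -> 3.0
```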
Tom Lane wrote:
> Hm ... between that, the possible crypto connection, and John's
> personal testimony that he actually uses PG for calculations in this
> range, I'm starting to lean to the idea that we shouldn't cut the
> range.

Just to be clear, this John has yet to use NUMERIC for any
calculations, let alone in that range. (I've only used NUMERIC for
importing real-valued data where I didn't want to lose precision with a
floating point representation, for instance, decimal latitude-longitude
values.)

There was this post, though:

Gregory Maxwell wrote:
> I've hesitated commenting, because I think it might be a silly reason,
> but perhaps it's one other people share. ... I use PG as a calculator
> for big numbers because it's the only user-friendly thing on my system
> that can do factorial(300) - factorial(280). I'd rather use something
> like octave, but I've found it's pretty easy to escape its range. If
> the range for computation is changed, then I'll probably keep an old
> copy around just for this, though I'm not quite sure how much I'd be
> affected.

- John D. Burger
  MITRE
"John D. Burger" <john@mitre.org> writes:
> Tom Lane wrote:
>> Hm ... between that, the possible crypto connection, and John's
>> personal testimony
>
> Just to be clear, this John has yet to use NUMERIC for any
> calculations, let alone in that range.

My mistake, got confused as to who had said what.

The point remains though: in discussing this proposed patch, we were
assuming that 10^508 would still be far beyond what people actually
needed. Even one or two reports from the list membership of actual use
of larger values casts a pretty big shadow on that assumption.

			regards, tom lane
Tom Lane wrote:
> "John D. Burger" <john@mitre.org> writes:
> > Tom Lane wrote:
> >> Hm ... between that, the possible crypto connection, and John's
> >> personal testimony
>
> > Just to be clear, this John has yet to use NUMERIC for any
> > calculations, let alone in that range.
>
> My mistake, got confused as to who had said what.
>
> The point remains though: in discussing this proposed patch, we were
> assuming that 10^508 would still be far beyond what people actually
> needed. Even one or two reports from the list membership of actual
> use of larger values casts a pretty big shadow on that assumption.

Agreed. I would like to see us hit the big savings first, like merging
cmin/cmax (4 bytes per row) and reducing the varlena header size (2-3
bytes for short values), before we start going after disk savings that
actually limit our capabilities.

--
  Bruce Momjian                   |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us          |  (610) 359-1001
  + If your life is a hard drive, |  13 Roberts Road
  + Christ can be your backup.    |  Newtown Square, Pennsylvania 19073