Discussion: Re: [BUGS] numerics lose scale and precision in views of unions


Re: [BUGS] numerics lose scale and precision in views of unions

From
Tom Lane
Date:
"Brian C. DeRocher" <brian.derocher@mitretek.org> writes:
> It appears that when you create a view of a union, numeric data types
> lose their scale and precision.

I think this is the same issue discussed here:
http://archives.postgresql.org/pgsql-hackers/2004-12/msg00408.php

That was just before 8.0 release, so the change got postponed and then
seems to have slipped through the cracks :-(.  Any objections to
fixing it for 8.2?
        regards, tom lane
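
A minimal sketch reproducing the reported behavior (hypothetical table
and view names; this is the pre-fix behavior under discussion):

    create table t1 (n numeric(7,2));
    create table t2 (n numeric(7,2));
    create view v as select n from t1 union select n from t2;
    -- \d v describes column n as bare "numeric" rather than numeric(7,2)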


Re: [BUGS] numerics lose scale and precision in views of unions

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> "Brian C. DeRocher" <brian.derocher@mitretek.org> writes:
> > It appears that when you create a view of a union, numeric data types
> > lose their scale and precision.
>
> I think this is the same issue discussed here:
> http://archives.postgresql.org/pgsql-hackers/2004-12/msg00408.php
>
> That was just before 8.0 release, so the change got postponed and then
> seems to have slipped through the cracks :-(.  Any objections to
> fixing it for 8.2?

Sounds good to me.  I'd like to talk a bit about the expected behavior
of a numeric hash function.  This is the current behavior:
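
A sketch of the setup behind the session below (assuming a bare
"numeric" column; the table name is taken from the output):

    create table test1 (a1 numeric);
    insert into test1 values (1.00);
    insert into test1 values (1.0000);
    insert into test1 values (1.0);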

abc=# select * from test1;
   a1   
--------
   1.00
 1.0000
    1.0
(3 rows)

abc=# select * from test1 group by a1;
  a1  
------
 1.00
(1 row)

abc=# select distinct a1 from test1;
  a1  
------
 1.00
(1 row)

I just want to double-check that this is the correct/expected behavior
(hopefully from the SQL spec?) and that a hash function need not concern
itself with the scale?

I'm guessing it's too late for 8.2, or would such a relatively
isolated/simple change be acceptable during the beta period (adding hash
functions for numeric types)?  I can't swear I'll actually get to them,
but I'd like to, and so I'd like to hash out (hah!) the correct semantics.
Thanks!
    Stephen

Re: [BUGS] numerics lose scale and precision in views of unions

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> Sounds good to me.  I'd like to talk a bit about the expected behavior
> of a numeric hash function.  This is the current behavior:

You're hijacking the thread, tsk tsk.

> abc=# select * from test1;
>    a1   
> --------
>    1.00
>  1.0000
>     1.0
> (3 rows)

> abc=# select * from test1 group by a1;
>   a1  
> ------
>  1.00
> (1 row)

> abc=# select distinct a1 from test1;
>   a1  
> ------
>  1.00
> (1 row)

Yeah, because numeric_cmp says that 1.0 and 1.00 are equal (what else
could it say?  "less" and "greater" are surely wrong).  So you need to
ensure that dscale is not included in the hash calculation.  The
comments in numeric.h claim that you should not need to worry about
leading or trailing zeroes, but it sounds like you do need to worry
about differing weights for zero.  Might be easiest and safest to use
only nonzero digit words in the hash calculation.
        regards, tom lane
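
Tom's point about the comparison function can be checked directly from
psql (numeric_cmp is the btree support function and is callable by hand):

    select numeric_cmp(1.0, 1.00);   -- 0, i.e. "equal"
    select numeric_cmp(1.0, 1.01);   -- -1, i.e. "less"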


Re: [BUGS] numerics lose scale and precision in views of unions

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > Sounds good to me.  I'd like to talk a bit about the expected behavior
> > of a numeric hash function.  This is the current behavior:
>
> You're hijacking the thread, tsk tsk.

Yeah, but Brian sits across the hall from me at work... :)

> Yeah, because numeric_cmp says that 1.0 and 1.00 are equal (what else
> could it say?  "less" and "greater" are surely wrong).  So you need to

It could say "not equal" pretty reasonably, as the scale is
different.  With all due respect, it seems to me that we track the scale
for the very reason that it may be different for different numbers.
Unless the SQL spec says differently or we get complaints from people,
I'm all for keeping the current semantics, though.

> ensure that dscale is not included in the hash calculation.  The
> comments in numeric.h claim that you should not need to worry about
> leading or trailing zeroes, but it sounds like you do need to worry
> about differing weights for zero.  Might be easiest and safest to use
> only nonzero digit words in the hash calculation.

Makes sense to me.
Thanks!
    Stephen

Re: [BUGS] numerics lose scale and precision in views of unions

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> * Tom Lane (tgl@sss.pgh.pa.us) wrote:
>> Yeah, because numeric_cmp says that 1.0 and 1.00 are equal (what else
>> could it say?  "less" and "greater" are surely wrong).  So you need to

> It could say "not equal" pretty reasonably as the scale is
> different.

Nope, there are exactly three options: equal, less, greater.
btree doesn't understand anything else.

> Unless the SQL spec says differently or we get complaints from people
> I'm all for keeping the current semantics though.

The SQL spec?  Oh, that old thing ... I can't find anything very
specific about it in SQL99, but certainly there is nothing mandating
a different treatment than we are using.  The closest material I can
find is

5.3 <literal>

        3) The numeric value of an <exact numeric literal> is determined
           by the normal mathematical interpretation of positional decimal
           notation.

8.2 <comparison predicate>

        2) Numbers are compared with respect to their algebraic value.

There's certainly not anything in 8.2 contemplating the idea that two
non-nulls could have any other comparison result than less, equal, or
greater.
        regards, tom lane
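
The "algebraic value" rule is easy to confirm:

    select 1.0 = 1.00 as equal;   -- t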


Re: [BUGS] numerics lose scale and precision in views of unions

From
Andrew - Supernews
Date:
On 2006-08-10, Stephen Frost <sfrost@snowman.net> wrote:
>> Yeah, because numeric_cmp says that 1.0 and 1.00 are equal (what else
>> could it say?  "less" and "greater" are surely wrong).  So you need to
>
> It could say "not equal" pretty reasonably as the scale is
> different.

The comparison function must be trichotomous; that is, the possible results
are "equal", "greater", and "less". There is no separate "not equal" result.

-- 
Andrew, Supernews
http://www.supernews.com - individual and corporate NNTP services


Re: [BUGS] numerics lose scale and precision in views of unions

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > It could say "not equal" pretty reasonably as the scale is
> > different.
>
> Nope, there are exactly three options: equal, less, greater.
> btree doesn't understand anything else.

Ah, yeah, I can see how something else would cause some difficulties. :)

> There's certainly not anything in 8.2 contemplating the idea that two
> non-nulls could have any other comparison result than less, equal, or
> greater.

Makes me curious whether it really makes sense to keep trailing zeros...
Having them there but not treating them any differently except for
display comes across as inconsistent to me.  Either 1.0 and 1.00 are
the same thing (and thus should be displayed the same way), or they
aren't (in which case they should be treated distinctly in, e.g., a
'select distinct' clause).

Was there a particular motivation for the current way things are being
done?  I seriously doubt anything I'd propose for the hash functions
would have any impact on it either way but it seems like it might be a
change worth considering in some future release (probably post-8.3).
Thanks,
    Stephen

Re: [BUGS] numerics lose scale and precision in views of unions

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> Makes me curious if it really makes sense to keep trailing zeros...

AFAIR we consider them mainly as a display artifact.  An application
that's declared a column as numeric(7,2) is likely to expect to see
exactly two digits after the decimal point.

> Either 1.0 and 1.00 are
> the same thing (and thus should be displayed the same way), or they
> aren't (in which case they should be treated distinctly in, eg, a
> 'select distinct' clause).

Consistency has never been SQL's strong point ;-)
        regards, tom lane
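
The display-artifact behavior is quick to see with a cast to a
constrained numeric:

    select 1::numeric(7,2);        -- 1.00  (padded to the declared scale)
    select 1.2345::numeric(7,2);   -- 1.23  (rounded to two decimal places)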


Re: [BUGS] numerics lose scale and precision in views of unions

From
Stephen Frost
Date:
* Tom Lane (tgl@sss.pgh.pa.us) wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > Makes me curious if it really makes sense to keep trailing zeros...
>
> AFAIR we consider them mainly as a display artifact.  An application
> that's declared a column as numeric(7,2) is likely to expect to see
> exactly two digits after the decimal point.

Hmm.  I should have mentioned this previously (since I was thinking
about it at the time...) but this display artifact is unfortunately not
without consequences.  I'm about 80% sure that having the scale too
large (as in, larger than about 6 or 7 decimal places) breaks MS Access
using ODBC.  It complains about not being able to represent the number
(even though it's just trailing zeros).  It might be possible to handle
that in the ODBC driver but I don't think it'd be very clean
(considering you would want to fail cases where it's not just trailing
zeros).

This was using just a straight-up 'numeric' data type though.  Perhaps
for that case we could drop the unnecessary zeros?  I can understand
having them there when a specific scale is specified (I suppose...) but
when there isn't a specific scale given any application would have to
deal with the variability in the number of digits after the decimal
point anyway.

> > Either 1.0 and 1.00 are
> > the same thing (and thus should be displayed the same way), or they
> > aren't (in which case they should be treated distinctly in, eg, a
> > 'select distinct' clause).
>
> Consistency has never been SQL's strong point ;-)

Indeed.  I think my suggestion above would be at least a half-step
towards consistency without breaking things.  I *think* this would also
mean that we'd always have either a fixed number of decimal places
(scale specified) or no trailing zeros.

This would, in fact, be awfully nice for me since I wouldn't have to
deal with things like:
                      78.4
                    2.3625
        4.1666675000000000
       16.6666675000000000
    0.83333250000000000000

where I then get to do some *very* ugly magic to find the records with
the extra zeros off on the end and truncate them (think cast to text and
then use a regexp, not fun at all...).  Unfortunately round(a,b) <> a
doesn't work so hot in these cases where you do want the precision just
not the extra zeros off on the end.
Thanks,
    Stephen
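
A sketch of the cast-to-text trick described above (hypothetical helper
name; the LIKE guard keeps values with no fractional part, such as 100,
intact):

    create function trim_zeros(val numeric) returns numeric as $$
      select case
        when val::text like '%.%'
          then rtrim(rtrim(val::text, '0'), '.')::numeric
        else val
      end;
    $$ language sql immutable;

    select trim_zeros(4.1666675000000000);   -- 4.1666675
    select trim_zeros(78.4);                 -- 78.4

(Much later PostgreSQL releases added a built-in trim_scale() for
exactly this.)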

Re: [BUGS] numerics lose scale and precision in views of unions

From
mark@mark.mielke.cc
Date:
On Wed, Aug 09, 2006 at 11:35:48PM -0400, Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
> > * Tom Lane (tgl@sss.pgh.pa.us) wrote:
> >> Yeah, because numeric_cmp says that 1.0 and 1.00 are equal (what else
> >> could it say?  "less" and "greater" are surely wrong).  So you need to
> > It could say "not equal" pretty reasonably as the scale is
> > different.
> Nope, there are exactly three options: equal, less, greater.
> btree doesn't understand anything else.

Mathematically, 1.0 = anything from 0.9500... up to 1.05000...

In theory, B-Tree could be fine with this. As long as the operators
for =, <>, <, and > are made to consistently understand this principle.
For example:
    1.0 = 0.95
    1.0 = 1.05
    1.0 <> 0.94
    1.0 <> 1.06
    0.94 < 1.0
    1.06 > 1.0

I'm not sure that it warrants being changed, though. Storing mixed
precision values in a column and expecting the database to understand
rounding rules does not appeal to me. Too magical.

Cheers,
mark




Re: [BUGS] numerics lose scale and precision in views of unions

From
mark@mark.mielke.cc
Date:
On Thu, Aug 10, 2006 at 03:40:11AM -0000, Andrew - Supernews wrote:
> On 2006-08-10, Stephen Frost <sfrost@snowman.net> wrote:
> >> Yeah, because numeric_cmp says that 1.0 and 1.00 are equal (what else
> >> could it say?  "less" and "greater" are surely wrong).  So you need to
> > It could say "not equal" pretty reasonably as the scale is
> > different.
> The comparison function must be trichotomous, that is the possible results
> are "equal", "greater", and "less". There is no other "not equal" result.

And "not equal" would be wrong here anyway, because they are equal.  But
by that reasoning 1.0 is also equal to 1.01.  We shouldn't go there. :-)

Cheers,
mark




Re: [BUGS] numerics lose scale and precision in views of unions

From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes:
> This was using just a straight-up 'numeric' data type though.  Perhaps
> for that case we could drop the unnecessary zeros?

That would make numeric useless for the common scientific/engineering
usage where you write the number of decimal places you think are
significant in your measurement.  In that usage, "1.0" and "1.000"
do have different meanings.
        regards, tom lane
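
In that usage the stored scale survives round trips, which is the point:

    select 1.0::numeric as a, 1.000::numeric as b;   -- a = 1.0, b = 1.000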


Re: [BUGS] numerics lose scale and precision in views of unions

From
mark@mark.mielke.cc
Date:
On Thu, Aug 10, 2006 at 09:20:09AM -0400, mark@mark.mielke.cc wrote:
> Mathematically, 1.0 = 0.9500... -> 1.05000...
> 
> In theory, B-Tree could be fine with this. As long as the operators
> for =, <>, <, and > are made to consistently understand this principle.
> For example:
> 
>     1.0 = 0.95
>     1.0 = 1.05
>     1.0 <> 0.94
>     1.0 <> 1.06
>     0.94 < 1.0
>     1.06 > 1.0
> 
> I'm not sure that it warrants being changed, though. Storing mixed
> precision values in a column and expecting the database to understand
> rounding rules does not appeal to me. Too magical.

I take this back. It's magical enough to be impossible.

If 1.0 is inserted, and then 1.05, it could consider them 'equal' and
put them together.  If 1.1 and 1.14 were inserted, they could be 'equal'
and put together.  Then an insertion of '1' (which, at scale 0, covers
everything from 0.5 to 1.5) would have to re-group them all together:
'equal' is no longer transitive, making the B-Tree ineffective.

So - taking back what I say above.

Cheers,
mark
