Thread: wrong behavior using to_char() again

wrong behavior using to_char() again

From
Euler Taveira de Oliveira
Date:
Hi,

Looking again at bug report [1], I agree that's a glibc bug. Numbers in
pt_BR have the format 1.234.567,89; sometimes the format 1234567,89 is
acceptable too, i.e., the thousands separator is optional. I guess that
some other locales use the 'optional' thousands separator too (yep, they
are all broken too).

euler@harman:/a/pgsql$ ./a.out pt_BR
decimal_point: ,
thousands_sep:
euler@harman:/a/pgsql$ ./a.out fr_FR
decimal_point: ,
thousands_sep:
euler@harman:/a/pgsql$ ./a.out es_ES
decimal_point: ,
thousands_sep:
euler@harman:/a/pgsql$ ./a.out de_DE
decimal_point: ,
thousands_sep: .
euler@harman:/a/pgsql$ ./a.out C
decimal_point: .
thousands_sep:
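
Because an empty thousands_sep prints as nothing, the output above is easy
to misread. A minimal sketch of the same check that quotes both values
follows; the function name describe_lc_numeric is hypothetical, and which
locale names resolve depends on what is installed on the system.

```c
#include <locale.h>
#include <stdio.h>

/*
 * Hypothetical helper: write the LC_NUMERIC separators of locale `name`
 * into buf, quoted so that an empty string is visible.
 * Returns 0 on success, -1 if the locale is unavailable or buf is too small.
 */
int describe_lc_numeric(const char *name, char *buf, size_t bufsize)
{
    struct lconv *lc;
    int n;

    /* setlocale() returns NULL when the locale cannot be selected. */
    if (setlocale(LC_NUMERIC, name) == NULL)
        return -1;

    lc = localeconv();
    n = snprintf(buf, bufsize,
                 "decimal_point: \"%s\"\nthousands_sep: \"%s\"\n",
                 lc->decimal_point, lc->thousands_sep);

    /* Treat truncation as a failure. */
    return (n < 0 || (size_t) n >= bufsize) ? -1 : 0;
}
```

For the C locale this reports "." and "", which is the case the rest of the
thread argues about.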

The current behavior is to substitute (i) "," when the thousands separator
is "" and (ii) "." when the decimal point is "". That is not what glibc
reports (even in the C locale). I expect PostgreSQL to agree with glibc
(even if that is the wrong behavior). Given this assumption, I propose the
attached patch (the regression tests still need to be adjusted).

Comments?


[1] http://archives.postgresql.org/pgsql-bugs/2006-09/msg00074.php


--
  Euler Taveira de Oliveira
  http://www.timbira.com/
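#include <stdio.h>
#include <locale.h>

/* Print the LC_NUMERIC separators that localeconv() reports for a locale. */
int main(int argc, char *argv[])
{
    struct lconv *x;
    const char *lang;

    /* Use the locale named on the command line, defaulting to "C". */
    if (argc == 2)
        lang = argv[1];
    else
        lang = "C";

    /* setlocale() returns NULL if the locale cannot be selected. */
    if (setlocale(LC_NUMERIC, lang) == NULL)
    {
        fprintf(stderr, "could not set locale \"%s\"\n", lang);
        return 1;
    }
    x = localeconv();

    printf("decimal_point: %s\n", x->decimal_point);
    printf("thousands_sep: %s\n", x->thousands_sep);

    return 0;
}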
*** src/backend/utils/adt/formatting.c.orig    2007-11-16 01:34:41.000000000 -0200
--- src/backend/utils/adt/formatting.c    2007-11-17 17:30:00.000000000 -0200
***************
*** 3917,3931 ****

          /*
           * Number thousands separator
-          *
-          * Some locales (e.g. broken glibc pt_BR), have a comma for
-          * decimal, but "" for thousands_sep, so we might make the
-          * thousands_sep comma too.  2007-02-12
           */
          if (lconv->thousands_sep && *lconv->thousands_sep)
              Np->L_thousands_sep = lconv->thousands_sep;
          else
!             Np->L_thousands_sep = ",";

          /*
           * Currency symbol
--- 3917,3927 ----

          /*
           * Number thousands separator
           */
          if (lconv->thousands_sep && *lconv->thousands_sep)
              Np->L_thousands_sep = lconv->thousands_sep;
          else
!             Np->L_thousands_sep = "";

          /*
           * Currency symbol
***************
*** 3943,3949 ****
          Np->L_negative_sign = "-";
          Np->L_positive_sign = "+";
          Np->decimal = ".";
!         Np->L_thousands_sep = ",";
          Np->L_currency_symbol = " ";
      }
  }
--- 3939,3945 ----
          Np->L_negative_sign = "-";
          Np->L_positive_sign = "+";
          Np->decimal = ".";
!         Np->L_thousands_sep = "";
          Np->L_currency_symbol = " ";
      }
  }

Re: wrong behavior using to_char() again

From
Alvaro Herrera
Date:
Euler Taveira de Oliveira wrote:
> Hi,
> 
> Looking again at bug report [1], I agree that's a glibc bug. Numbers in
> pt_BR have the format 1.234.567,89; sometimes the format 1234567,89 is
> acceptable too, i.e., the thousands separator is optional. I guess that
> some other locales use the 'optional' thousands separator too (yep, they
> are all broken too).

Yeah, formatting.c revs 1.106 and 1.105 contain this (it was already
pointed out in the previous thread):


revision 1.106
date: 2006-02-12 20:48:23 -0300;  author: momjian;  state: Exp;  lines: +3 -4;
Revert because C locale uses "" for thousands_sep, meaning "n/a", while
French uses "" for "don't want".  Seems we have to keep the existing
behavior.
----------------------------
revision 1.105
date: 2006-02-12 16:52:06 -0300;  author: momjian;  state: Exp;  lines: +5 -4;
Support "" for thousands separator and plus sign in to_char(), per
report from French Debian user.  psql already handles "" fine.


I'm not sure that your proposed patch is OK for the C locale.  It was
proposed that the C locale should be handled as an exception, but it
seems nothing got done in that direction.

Are we going to do something for 8.3?

-- 
Alvaro Herrera                 http://www.amazon.com/gp/registry/CTMLCN8V17R4
"La experiencia nos dice que el hombre peló millones de veces las patatas,
pero era forzoso admitir la posibilidad de que en un caso entre millones,
las patatas pelarían al hombre" (Ijon Tichy)


Re: wrong behavior using to_char() again

From
Euler Taveira de Oliveira
Date:
Bruce Momjian wrote:

> OK, I researched this and realized it should have been obvious to me
> when I added this code in 2006 that making the thousands separator
> always "," for a locale of "" was going to cause a problem.
>
I tested your patch and IMHO it breaks the glibc behavior. I'm providing
a SQL script [1] and a diff [2] showing the differences between before
and after applying it. In [2], I see a lot of commonly used locales
(pt_*, es_*, and fr_*) that will be changed. Is this the behavior we want
to support? I think we shouldn't try to fix a glibc bug inside PostgreSQL
(in this case, we should accept "" as a possible value for thousands_sep).


> I don't think there is any change needed for the C locale.  That part
> seems fine, as Alvaro already pointed out.
>
I don't know about C locale, but it's broken too. In PostgreSQL, it's
following the en_US behavior. Comments?

euler@harman:/a/pgsql$ ./a.out C
decimal_point: "."
thousands_sep: ""
euler@harman:/a/pgsql$ ./a.out en_US
decimal_point: "."
thousands_sep: ","

[1] http://timbira.com/tmp/lcn3.sql
[2] http://timbira.com/tmp/lcnumeric.diff


--
  Euler Taveira de Oliveira
  http://www.timbira.com/

Re: wrong behavior using to_char() again

From
Bruce Momjian
Date:
Euler Taveira de Oliveira wrote:
> Bruce Momjian wrote:
> 
> > OK, I researched this and realized it should have been obvious to me
> > when I added this code in 2006 that making the thousands separator
> > always "," for a locale of "" was going to cause a problem.
> > 
> I tested your patch and IMHO it breaks the glibc behavior. I'm providing
> a SQL script [1] and a diff [2] showing the differences between before
> and after applying it. In [2], I see a lot of commonly used locales
> (pt_*, es_*, and fr_*) that will be changed. Is this the behavior we want
> to support? I think we shouldn't try to fix a glibc bug inside PostgreSQL
> (in this case, we should accept "" as a possible value for thousands_sep).

I am confused.  You stated in your earlier email:

> Looking again at bug report [1], I agree that's a glibc bug.  Numbers
> in pt_BR have the format 1.234.567,89; sometimes the format 1234567,89
> is acceptable too, i.e., the thousands separator is optional. I guess

so I assumed that you were OK with having "." be the thousands
separator.  I think we have to try to get a proper fix even if glibc is
incorrect. The problem we had with psql print.c is that when we didn't
provide a "." default we had people complaining about that.  The idea I
think is that if people are asking for a thousands separator in the
to_char() format they certainly want to see a thousands separator.

The backend behavior now matches the psql numericlocale behavior which
was accepted a while back.
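
One way to read the rule described here is sketched below: honor a
non-empty thousands_sep from the locale, and otherwise pick a default that
cannot collide with the decimal point. This is an assumption drawn from the
discussion, not a copy of the formatting.c code, and the function name
choose_thousands_sep is hypothetical.

```c
#include <string.h>

/*
 * Hypothetical helper (not the actual formatting.c code): choose the
 * thousands separator given what localeconv() reported for the locale.
 */
const char *choose_thousands_sep(const char *locale_sep, const char *decimal)
{
    /* Honor the locale when it supplies a non-empty separator. */
    if (locale_sep != NULL && *locale_sep != '\0')
        return locale_sep;

    /* Otherwise guess, but never return the decimal point itself. */
    if (strcmp(decimal, ",") == 0)
        return ".";
    return ",";
}
```

With the pt_BR values reported earlier in the thread ("" and ",") this
yields "."; with the C locale values ("" and ".") it yields ",", which is
the same separator en_US reports.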

> > I don't think there is any change needed for the C locale.  That part
> > seems fine, as Alvaro already pointed out.
> > 
> I don't know about C locale, but it's broken too. In PostgreSQL, it's
> following the en_US behavior. Comments?
> 
> euler@harman:/a/pgsql$ ./a.out C
> decimal_point: "."
> thousands_sep: ""
> euler@harman:/a/pgsql$ ./a.out en_US
> decimal_point: "."
> thousands_sep: ","

Yes, I think that is correct.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +


Re: wrong behavior using to_char() again

From
Euler Taveira de Oliveira
Date:
Bruce Momjian wrote:

> I am confused.  You stated in your earlier email:
> 
>> Looking again at bug report [1], I agree that's a glibc bug.  Numbers
>> in pt_BR have the format 1.234.567,89; sometimes the format 1234567,89
>> is acceptable too, i.e., the thousands separator is optional. I guess
> 
> so I assumed that you were OK with having "." be the thousands
> separator.  I think we have to try to get a proper fix even if glibc is
> incorrect. The problem we had with psql print.c is that when we didn't
> provide a "." default we had people complaining about that.  The idea I
> think is that if people are asking for a thousands separator in the
> to_char() format they certainly want to see a thousands separator.
> 
Maybe I wasn't clear (too little caffeine), but what I tried to suggest
is that we could accept the thousands_sep from glibc instead of guessing
it ("."). I'm fine with the current behavior (at least in pt_BR), but I'm
afraid we have broken some locales (those that are shown in
lcnumeric.diff).


--
  Euler Taveira de Oliveira
  http://www.timbira.com/


Re: [PATCHES] wrong behavior using to_char() again

From
Alvaro Herrera
Date:
Euler Taveira de Oliveira wrote:
> Bruce Momjian wrote:
>
> > OK, I researched this and realized it should have been obvious to me
> > when I added this code in 2006 that making the thousands separator
> > always "," for a locale of "" was going to cause a problem.
> >
> I tested your patch and IMHO it breaks the glibc behavior. I'm providing
> a SQL script [1] and a diff [2] showing the differences between before
> and after applying it. In [2], I see a lot of commonly used locales
> (pt_*, es_*, and fr_*) that will be changed. Is this the behavior we want
> to support?

Well, what I can say is that the behavior you show for es_* that we were
historically doing is quite wrong, and the corrected output looks
better.

   lc_numeric |        to_char
  ------------+------------------------
!  es_CL      |      123,456,789,01230
  (1 registro)

--- 379,397 ----

  SET
   lc_numeric |        to_char
  ------------+------------------------
!  es_CL      |      123.456.789,01230
  (1 registro)


The first output makes no sense whereas the second is correct (ISTM
we've been doing it wrong for a lot of locales and it has just been
fixed).
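
To make the difference between the two outputs concrete, here is a small
sketch that groups digits with a given pair of separators. The function
name format_grouped and its signature are hypothetical, and it only
illustrates the grouping; it does not reproduce the padding that to_char()
adds.

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: join an integer digit string and a fraction digit
 * string, grouping the integer digits by three with thousands_sep.
 * Returns 0 on success, -1 if out is too small.
 */
int format_grouped(const char *int_digits, const char *frac_digits,
                   const char *thousands_sep, const char *decimal,
                   char *out, size_t outsize)
{
    size_t len = strlen(int_digits);
    size_t seplen = strlen(thousands_sep);
    size_t pos = 0;
    size_t i;
    int n;

    for (i = 0; i < len; i++)
    {
        /* Emit a separator before each group of three, except at the start. */
        if (i > 0 && (len - i) % 3 == 0)
        {
            if (pos + seplen >= outsize)
                return -1;
            memcpy(out + pos, thousands_sep, seplen);
            pos += seplen;
        }
        if (pos + 1 >= outsize)
            return -1;
        out[pos++] = int_digits[i];
    }

    /* Append the decimal point and fraction digits; this also terminates out. */
    n = snprintf(out + pos, outsize - pos, "%s%s", decimal, frac_digits);
    if (n < 0 || (size_t) n >= outsize - pos)
        return -1;
    return 0;
}
```

Called with "123456789", "01230", "." and ",", it produces
123.456.789,01230; passing "," for both separators reproduces the ambiguous
123,456,789,01230 from the first output.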

--
Alvaro Herrera                 http://www.amazon.com/gp/registry/CTMLCN8V17R4
"No deja de ser humillante para una persona de ingenio saber
que no hay tonto que no le pueda enseñar algo." (Jean B. Say)

Re: wrong behavior using to_char() again

From
Bruce Momjian
Date:
Euler Taveira de Oliveira wrote:
> Bruce Momjian wrote:
> 
> > I am confused.  You stated in your earlier email:
> > 
> >> Looking again at bug report [1], I agree that's a glibc bug.  Numbers
> >> in pt_BR have the format 1.234.567,89; sometimes the format 1234567,89
> >> is acceptable too, i.e., the thousands separator is optional. I guess
> > 
> > so I assumed that you were OK with having "." be the thousands
> > separator.  I think we have to try to get a proper fix even if glibc is
> > incorrect. The problem we had with psql print.c is that when we didn't
> > provide a "." default we had people complaining about that.  The idea I
> > think is that if people are asking for a thousands separator in the
> > to_char() format they certainly want to see a thousands separator.
> > 
> Maybe I wasn't clear (too little caffeine), but what I tried to suggest
> is that we could accept the thousands_sep from glibc instead of guessing
> it ("."). I'm fine with the current behavior (at least in pt_BR), but I'm
> afraid we have broken some locales (those that are shown in
> lcnumeric.diff).

Yea, I am afraid we will have to wait for feedback during 8.3 to see. 
We did hammer out the psql behavior with quite a bit of discussion so I
am hopeful doing the same in the backend will help.  The new code is
certainly better than what was there before because no one wants the
thousands separator to be the same as the decimal point, so at least
that is a fix, and it seems better for your language.  Basically we have
never treated "" as no thousands separator and I don't remember anyone
asking for that behavior.

If we want to start honoring "" as really no thousands separator we are
going to have to have additional discussion and go back and read from
the many people who complained when we had that behavior.  I know most
people didn't like the C locale having "" for thousands separator so we
had to hard-code that.

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +