PostgreSQL 7.1 and bugs with locale support

From pgsql-bugs@postgresql.org
Subject PostgreSQL 7.1 and bugs with locale support
Msg-id 200104042323.f34NN7i14830@hub.org
List pgsql-bugs
Rob Gaszewski (graszew@poland.com) reports a bug with a severity of 2
The lower the number the more severe it is.

Short Description
PostgreSQL 7.1 and bugs with locale support

Long Description
I've discovered bugs in locale support in PostgreSQL (encoding set to UNICODE, locale set to pl_PL).

I've compiled PostgreSQL 7.1RC2 with --enable-multibyte=UNICODE
--enable-unicode-conversion --enable-locale

locale settings:
LANG=pl_PL  LC_ALL=pl_PL  LC_CTYPE=pl_PL  LC_COLLATE=pl_PL  LC_MONETARY=pl_PL

I have Debian GNU/Linux 2.2 "Potato" - Intel Celeron - kernel 2.2.19
PostgreSQL compiled with gcc 2.95.2  - glibc 2.1


When I try SELECT UPPER('some_text_with_polish_national_chars'); or
SELECT LOWER('some_text_with_polish_national_chars'); I get wrong results.
But when I try upper() and lower() functions with other chars (a...z A...Z)
everything works OK.
Detailed results below.



Tests done with Polish national characters:
    |----------------------------------------------------|
    | char |  Hex   ||           UPPER(char)             |
    |      |        ||-----------------------------------|
 No |      |        ||  result  | should be | conclusion |
----|------|--------||----------|-----------|------------|
   1|   ą  | 0xc485 ||   0xc485 |   0xc484  |    WRONG   |
   2|   ć  | 0xc487 ||   0xc487 |   0xc486  |    WRONG   |
   3|   ę  | 0xc499 ||   0xc499 |   0xc498  |    WRONG   |
   4|   ł  | 0xc582 ||   0xc582 |   0xc581  |    WRONG   |
   5|   ń  | 0xc584 ||   0xc584 |   0xc583  |    WRONG   |
   6|   ó  | 0xc3b3 ||   0xc3a3 |   0xc393  |    WRONG   |
   7|   ś  | 0xc59b ||   0xc59b |   0xc59a  |    WRONG   |
   8|   ź  | 0xc5ba ||   0xc5aa |   0xc5b9  |    WRONG   |
   9|   ż  | 0xc5bc ||   0xc5ac |   0xc5bb  |    WRONG   |
    |      |        ||          |           |            |
  10|   Ą  | 0xc484 ||   0xc484 |   0xc484  |     OK     |
  11|   Ć  | 0xc486 ||   0xc486 |   0xc486  |     OK     |
  12|   Ę  | 0xc498 ||   0xc498 |   0xc498  |     OK     |
  13|   Ł  | 0xc581 ||   0xc581 |   0xc581  |     OK     |
  14|   Ń  | 0xc583 ||   0xc583 |   0xc583  |     OK     |
  15|   Ó  | 0xc393 ||   0xc393 |   0xc393  |     OK     |
  16|   Ś  | 0xc59a ||   0xc59a |   0xc59a  |     OK     |
  17|   Ź  | 0xc5b9 ||   0xc5b9 |   0xc5b9  |     OK     |
  18|   Ż  | 0xc5bb ||   0xc5bb |   0xc5bb  |     OK     |
----------------------------------------------------------
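
The "should be" column can be cross-checked independently of PostgreSQL. The sketch below assumes Python 3's built-in Unicode case mapping as the reference; the names LOWERCASE, utf8_hex and expected_upper are introduced here for illustration only and are not part of the original report.

```python
# Polish lowercase letters, in the order of rows 1-9 of the UPPER table.
LOWERCASE = "ąćęłńóśźż"


def utf8_hex(text: str) -> str:
    """Return the UTF-8 encoding of text as a 0x-prefixed hex string."""
    return "0x" + text.encode("utf-8").hex()


# Map the UTF-8 bytes of each lowercase letter to those of its uppercase form.
expected_upper = {utf8_hex(ch): utf8_hex(ch.upper()) for ch in LOWERCASE}

for source, target in expected_upper.items():
    print(source, "->", target)
```

Each pair printed this way can be compared against the "Hex" and "should be" columns of the corresponding row.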


    |----------------------------------------------------|
    | char |  Hex   ||           LOWER(char)             |
    |      |        ||-----------------------------------|
 No |      |        ||  result  | should be | conclusion |
----|------|--------||----------|-----------|------------|
   1|   ą  | 0xc485 ||   0xe485 |   0xc485  |    WRONG   |
   2|   ć  | 0xc487 ||   0xe487 |   0xc487  |    WRONG   |
   3|   ę  | 0xc499 ||   0xe499 |   0xc499  |    WRONG   |
   4|   ł  | 0xc582 ||   0xe582 |   0xc582  |    WRONG   |
   5|   ń  | 0xc584 ||   0xe584 |   0xc584  |    WRONG   |
   6|   ó  | 0xc3b3 ||   0xe3b3 |   0xc3b3  |    WRONG   |
   7|   ś  | 0xc59b ||   0xe59b |   0xc59b  |    WRONG   |
   8|   ź  | 0xc5ba ||   0xe5ba |   0xc5ba  |    WRONG   |
   9|   ż  | 0xc5bc ||   0xe5bc |   0xc5bc  |    WRONG   |
    |      |        ||          |           |            |
  10|   Ą  | 0xc484 ||   0xe484 |   0xc485  |    WRONG   |
  11|   Ć  | 0xc486 ||   0xe486 |   0xc487  |    WRONG   |
  12|   Ę  | 0xc498 ||   0xe498 |   0xc499  |    WRONG   |
  13|   Ł  | 0xc581 ||   0xe581 |   0xc582  |    WRONG   |
  14|   Ń  | 0xc583 ||   0xe583 |   0xc584  |    WRONG   |
  15|   Ó  | 0xc393 ||   0xe393 |   0xc3b3  |    WRONG   |
  16|   Ś  | 0xc59a ||   0xe59a |   0xc59b  |    WRONG   |
  17|   Ź  | 0xc5b9 ||   0xe5b9 |   0xc5ba  |    WRONG   |
  18|   Ż  | 0xc5bb ||   0xe5bb |   0xc5bc  |    WRONG   |
----------------------------------------------------------
Letters 1 to 9 are lowercase; letters 10 to 18 are the corresponding capitals.
For example, letter 12 is the capital form of letter 3.
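
One pattern in the LOWER results is worth spelling out: in every row the result equals the input with bit 0x20 set in the first byte and the second byte left untouched. Bytes 0xe0 to 0xef introduce three-byte sequences in UTF-8, so each two-byte result is no longer valid UTF-8. This looks like a case conversion applied byte by byte, but the report does not confirm the cause. The sketch below checks both observations on three rows copied from the table; Python serves only as an independent checker, and all names are illustrative.

```python
# (input, observed LOWER result) pairs copied from rows 1, 6 and 18.
OBSERVED = [("c485", "e485"), ("c3b3", "e3b3"), ("c5bb", "e5bb")]


def set_bit_0x20_in_lead_byte(data: bytes) -> bytes:
    """Set bit 0x20 in the first byte and leave the remaining bytes unchanged."""
    return bytes([data[0] | 0x20]) + data[1:]


def is_valid_utf8(data: bytes) -> bool:
    """Report whether data decodes cleanly as UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


for src_hex, result_hex in OBSERVED:
    src = bytes.fromhex(src_hex)
    result = bytes.fromhex(result_hex)
    pattern_matches = set_bit_0x20_in_lead_byte(src) == result
    print(src_hex, "->", result_hex,
          "pattern matches:", pattern_matches,
          "valid UTF-8:", is_valid_utf8(result))
```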



I've also discovered that rows are sorted (ORDER BY) improperly.

The "automatic encoding translation between backend and frontend" also works
improperly. For example, after setting the client encoding with
\encoding LATIN2 and running the test
SELECT upper('acelnoszx'); (these are Polish national chars, not the ASCII ones),
I keep getting the message:

utf_to_latin: could not convert UTF-8 (0xc3a3) ignored
(repeated 3x for different chars).

The letters are not converted to uppercase, either.
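
The byte sequence in the message, 0xc3a3, is the UPPER result reported for row 6 of the UPPER table. It is valid UTF-8 for U+00E3 (ã), a character that ISO 8859-2 (LATIN2) does not contain, which would explain why the conversion gives up. The other two changed UPPER results, 0xc5aa and 0xc5ac, behave the same way; that they account for the other two messages is an assumption, since the report quotes only one. The sketch uses Python's codecs as a stand-in for PostgreSQL's conversion routine, and the names are illustrative.

```python
# UPPER results that differ from their input (rows 6, 8 and 9 of the UPPER table).
CHANGED_RESULTS = ["c3a3", "c5aa", "c5ac"]


def latin2_representable(utf8_hex: str) -> bool:
    """Decode UTF-8 bytes and report whether ISO 8859-2 can encode the text."""
    text = bytes.fromhex(utf8_hex).decode("utf-8")
    try:
        text.encode("iso8859_2")
        return True
    except UnicodeEncodeError:
        return False


for hex_bytes in CHANGED_RESULTS:
    char = bytes.fromhex(hex_bytes).decode("utf-8")
    print(hex_bytes, f"U+{ord(char):04X}",
          "representable in LATIN2:", latin2_representable(hex_bytes))
```

By contrast, a correct result such as 0xc393 (Ó) is representable in LATIN2, so a correct upper() would not be expected to trigger this message.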



When I run all the tests with PostgreSQL compiled only with --enable-locale, everything works correctly.

Unfortunately, unicode support is a must because of the i18n issues with Tcl 8.x.


Greetings,
Robert

------------------
Robert Gaszewski
graszew@poland.com

Sample Code


No file was uploaded with this report
