Re: a strange order by behavior

Поиск
Список
Период
Сортировка
От Samuel Gendler
Тема Re: a strange order by behavior
Дата
Msg-id BANLkTi=7Lqyj4LbRPHp1mLo_0-mFN5XPmA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: a strange order by behavior  (Pavel Stehule <pavel.stehule@gmail.com>)
Ответы Re: a strange order by behavior  (Pavel Stehule <pavel.stehule@gmail.com>)
Re: a strange order by behavior  (Samuel Gendler <sgendler@ideasculptor.com>)
Re: a strange order by behavior  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: a strange order by behavior  (Peter Eisentraut <peter_e@gmx.net>)
Список pgsql-sql
Interesting.  The original thread to which I was referring has a subject of "Sorting Issue" and the original request showed a list of vehicle model names which were sorting as though there were no spaces. The user had collation set to en_US.UTF-8.  However, my database (on OS X) sorts both his example and the example that started this thread correctly, despite my lc_collate being set to the same value.

Then I just ran the exact same test on a Centos 5 linux host (postgresql 8.4 as well) and I am seeing the 'erroneous' sort order that prompted both this thread and the other.  So you can't even assume the same behaviour for the same collation on different platforms.

On OS X:

# \l
       Name        |   Owner    | Encoding |  Collation  |    Ctype    |   Access privileges   
-------------------+------------+----------+-------------+-------------+-----------------------
 col_test          | u1         | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 
# show lc_collate;
 lc_collate  
-------------
 en_US.UTF-8

# select * from t1 order by f1;
 id |        f1         
----+-------------------
  1 | CX Hatchback
  2 | CX Minivan
  2 | CX Plus Minivan
  2 | CX Sedan
  2 | CX Sport Utility
  2 | CXL Minivan
  2 | CXL Premium Sedan
  2 | CXL Sedan
  2 | CXL Sport Utility
  2 | CXL Turbo Sedan
  2 | CXS Sedan

On CentOS 5:

# \l
       Name        |   Owner    | Encoding |  Collation  |    Ctype    |   Access privileges   
-------------------+------------+----------+-------------+-------------+-----------------------
 col_test          | u1         | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 

# show lc_collate
ecorithm-# ;
 lc_collate  
-------------
 en_US.UTF-8

# select * from t1 order by f1;
 id |        f1         
----+-------------------
  1 | CX Hatchback
  2 | CXL Minivan
  2 | CXL Premium Sedan
  2 | CXL Sedan
  2 | CXL Sport Utility
  2 | CXL Turbo Sedan
  2 | CX Minivan
  2 | CX Plus Minivan
  2 | CX Sedan
  2 | CX Sport Utility
  2 | CXS Sedan

Further testing would seem to reveal that OS X is using something resembling C collation order, despite the fact that it says en_US.UTF-8. I say this because it is also case sensitive.  CentOS, on the other hand, is ignoring spaces, but is also case-insensitive.

# select * from t1 order by f1 asc;
 id |        f1         
----+-------------------
  1 | CX Hatchback
  2 | CXL Minivan
  2 | CXL Premium Sedan
  2 | CXL Sedan
  2 | CXL Sport Utility
  2 | CXL Turbo Sedan
  2 | CX Minivan
  2 | CX Plus Minivan
  2 | cx Sedan
  2 | CX Sedan
  2 | CX Sport Utility
  2 | CXS Sedan

Pavel suggested using a collation of ucs_basic, but I get an error when I try that on linux:

$ createdb -U u1 --lc-collate=ucs_basic -E UTF-8 test 
createdb: database creation failed: ERROR:  invalid locale name ucs_basic

I was able to create the db with --lc_collate=C and get case-sensitive sorting that treats spaces 'correctly,' but I have no idea how reliable that is with multibyte characters and it almost certainly doesn't handle accented characters correctly in languages that have a mix of ascii and non-ascii characters, since the non-ascii chars will all sort as greater than the ascii chars, which is probably not how things are alphabetized in those languages.

I happen to have my computer set so I can type cyrillic and it appears to work correctly with non-ascii chars when the entire charset is non-ascii, but it'd be a lot of effort to construct a test that utilizes a mix of 1,2,3, and 4 byte characters, since I'm not familiar with enough alphabets.

test=# select * from t1 order by f1;
 id |        f1         
----+-------------------
  1 | CX Hatchback
  2 | CX Minivan
  2 | CX Plus Minivan
  2 | CX Sedan
  2 | CX Sport Utility
  2 | CXL Minivan
  2 | CXL Premium Sedan
  2 | CXL Sedan
  2 | CXL Sport Utility
  2 | CXL Turbo Sedan
  2 | CXS Sedan
  2 | cx Sedan
  2 | АИВЕТ
  2 | П АИВЕТ
  2 | П РИВЕТ
  2 | П СИВЕТ
  2 | ПРИВЕТ
  2 | РИВЕТ
  2 | СИВЕТ
  2 | привет


В списке pgsql-sql по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: Re: a strange order by behavior
Следующее
От: Pavel Stehule
Дата:
Сообщение: Re: a strange order by behavior