Обсуждение: order by different on mac vs linux

Поиск
Список
Период
Сортировка

order by different on mac vs linux

От
Wes James
Дата:
I have postgresql 9.1.3 on a mac and on linux.

On the mac the results come out:

! *`-=[];',./~@#$%^&()_+{}|:"<>?\

then

\--\

On ubuntu 12.04 x64 it comes out (compiled and installed postgres from
tbz2 from postgresql.org repo):

\--\

then

! *`-=[];',./~@#$%^&()_+{}|:"<>?\


Why is there a different order on the different platforms.  I even
copied the data folder on the mac to the linux box instead of doing
pg_dump and then \i pg.sql file on Linux to move the data and the
order is still different like this.

Any idea why?  the ! line should come before the \ line, lexically.  !
is char 33 and \ is char 92.

This is done with

select * from table order by field;

on both systems and "field" is character varying on both systems.

Thanks,

Wes


Re: order by different on mac vs linux

От
Tom Lane
Дата:
Wes James <comptekki@gmail.com> writes:
> Why is there a different order on the different platforms.

This is not exactly unusual.  You should first check to see if
lc_collate is set differently in the two installations --- but even if
it's the same, there are often platform-specific interpretations of
the sorting rules.  (Not to mention that OS X is flat out broken when
it comes to sorting UTF8 data ...)

If you want consistent cross-platform results, "C" locale will get
that for you, but it's pretty stupid about non-ASCII characters.

For more info read
http://www.postgresql.org/docs/9.1/static/charset.html
        regards, tom lane


Re: order by different on mac vs linux

От
Wes James
Дата:
On Mon, May 14, 2012 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Wes James <comptekki@gmail.com> writes:
>> Why is there a different order on the different platforms.
>
> This is not exactly unusual.  You should first check to see if
> lc_collate is set differently in the two installations --- but even if
> it's the same, there are often platform-specific interpretations of
> the sorting rules.  (Not to mention that OS X is flat out broken when
> it comes to sorting UTF8 data ...)
>
> If you want consistent cross-platform results, "C" locale will get
> that for you, but it's pretty stupid about non-ASCII characters.
>
> For more info read
> http://www.postgresql.org/docs/9.1/static/charset.html
>
>                        regards, tom lane

Thanks,

I'll take a look.

-wes


Re: order by different on mac vs linux

От
Wes James
Дата:
On Mon, May 14, 2012 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Wes James <comptekki@gmail.com> writes:
>> Why is there a different order on the different platforms.
>
> This is not exactly unusual.  You should first check to see if
> lc_collate is set differently in the two installations --- but even if
> it's the same, there are often platform-specific interpretations of
> the sorting rules.  (Not to mention that OS X is flat out broken when
> it comes to sorting UTF8 data ...)
>
> If you want consistent cross-platform results, "C" locale will get
> that for you, but it's pretty stupid about non-ASCII characters.
>
> For more info read
> http://www.postgresql.org/docs/9.1/static/charset.html
>
>                        regards, tom lane

I checked both and they both say this:


show lc_collate;lc_collate
-------------en_US.UTF-8
(1 row)


You mention OS X is broken, but it looks like the linux box is broken
in this case.

With the current database I have, it looks like everything is sorting
correctly on both platforms, except those two records (the two lines
from my first email) which I added later for testing characters and
escaping, etc. on a web app.

-wes


Re: order by different on mac vs linux

От
Wes James
Дата:
On Mon, May 14, 2012 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Wes James <comptekki@gmail.com> writes:
>> Why is there a different order on the different platforms.
>
> This is not exactly unusual.  You should first check to see if
> lc_collate is set differently in the two installations --- but even if
> it's the same, there are often platform-specific interpretations of
> the sorting rules.  (Not to mention that OS X is flat out broken when
> it comes to sorting UTF8 data ...)
>
> If you want consistent cross-platform results, "C" locale will get
> that for you, but it's pretty stupid about non-ASCII characters.
>
> For more info read
> http://www.postgresql.org/docs/9.1/static/charset.html
>
>                        regards, tom lane

I tried using the postgres that comes with ubuntu (sudo apt-get
install postgresql).

With my app I kept getting invalid password.  I went in to the
database sudo -u postgres database and did \password and set a
password, but I still got invalid password error from the app api
trying to make a connection.

I then went back to the source installed version and now the output is
correct.  I'm not sure what changed. hmmm.

-wes


Re: order by different on mac vs linux

От
Scott Marlowe
Дата:
On Tue, May 15, 2012 at 10:06 AM, Wes James <comptekki@gmail.com> wrote:
> On Mon, May 14, 2012 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Wes James <comptekki@gmail.com> writes:
>>> Why is there a different order on the different platforms.
>>
>> This is not exactly unusual.  You should first check to see if
>> lc_collate is set differently in the two installations --- but even if
>> it's the same, there are often platform-specific interpretations of
>> the sorting rules.  (Not to mention that OS X is flat out broken when
>> it comes to sorting UTF8 data ...)
>>
>> If you want consistent cross-platform results, "C" locale will get
>> that for you, but it's pretty stupid about non-ASCII characters.
>>
>> For more info read
>> http://www.postgresql.org/docs/9.1/static/charset.html
>>
>>                        regards, tom lane
>
> I tried using the postgres that comes with ubuntu (sudo apt-get
> install postgresql).
>
> With my app I kept getting invalid password.  I went in to the
> database sudo -u postgres database and did \password and set a
> password, but I still got invalid password error from the app api
> trying to make a connection.
>
> I then went back to the source installed version and now the output is
> correct.  I'm not sure what changed. hmmm.

It's most likely an issue with the settings in your pg_hba.conf file.


Re: order by different on mac vs linux

От
Wes James
Дата:
On Tue, May 15, 2012 at 10:16 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> On Tue, May 15, 2012 at 10:06 AM, Wes James <comptekki@gmail.com> wrote:
>> On Mon, May 14, 2012 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Wes James <comptekki@gmail.com> writes:
>>>> Why is there a different order on the different platforms.
>>>
>>> This is not exactly unusual.  You should first check to see if
>>> lc_collate is set differently in the two installations --- but even if
>>> it's the same, there are often platform-specific interpretations of
>>> the sorting rules.  (Not to mention that OS X is flat out broken when
>>> it comes to sorting UTF8 data ...)
>>>
>>> If you want consistent cross-platform results, "C" locale will get
>>> that for you, but it's pretty stupid about non-ASCII characters.
>>>
>>> For more info read
>>> http://www.postgresql.org/docs/9.1/static/charset.html
>>>
>>>                        regards, tom lane
>>
>> I tried using the postgres that comes with ubuntu (sudo apt-get
>> install postgresql).
>>
>> With my app I kept getting invalid password.  I went in to the
>> database sudo -u postgres database and did \password and set a
>> password, but I still got invalid password error from the app api
>> trying to make a connection.
>>
>> I then went back to the source installed version and now the output is
>> correct.  I'm not sure what changed. hmmm.
>
> It's most likely an issue with the settings in your pg_hba.conf file.

Ok - if I try the apt-get version, I'll look at that.  The apt-get
version is a cluster version.  Looks like that file is in
/etc/postgresql/9.1/dbname

I see

local all postgres peer
local all all peer
host all all 127.0.0.1/32 md5
host all all ::1/128 md5

in there.

-wes


Re: order by different on mac vs linux

От
Wes James
Дата:


On Mon, May 14, 2012 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Wes James <comptekki@gmail.com> writes:
> Why is there a different order on the different platforms.

This is not exactly unusual.  You should first check to see if
lc_collate is set differently in the two installations --- but even if
it's the same, there are often platform-specific interpretations of
the sorting rules.  (Not to mention that OS X is flat out broken when
it comes to sorting UTF8 data ...)


I just ran these:

linux:

on linux

# SELECT CASE WHEN 'apache' > '!yada' THEN 'TRUE' ELSE 'FALSE' END FROM pg_user;
 case
-------
 FALSE
(1 row)

# show lc_collate;
 lc_collate
-------------
 en_US.UTF-8
(1 row)

------------------------

on mac os x:

# SELECT CASE WHEN 'apache' > '!yada' THEN 'TRUE' ELSE 'FALSE' END FROM pg_user;
 case 
------
 TRUE
(1 row)

# show lc_collate;
 lc_collate  
-------------
 en_US.UTF-8
(1 row) 


-----------------------

Why is the linux postgres saying false with the lc_collage set the way it is?

-wes

Re: order by different on mac vs linux

От
Samuel Gendler
Дата:


On Wed, May 16, 2012 at 3:46 PM, Wes James <comptekki@gmail.com> wrote:


On Mon, May 14, 2012 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Wes James <comptekki@gmail.com> writes:
> Why is there a different order on the different platforms.

This is not exactly unusual.  You should first check to see if
lc_collate is set differently in the two installations --- but even if
it's the same, there are often platform-specific interpretations of
the sorting rules.  (Not to mention that OS X is flat out broken when
it comes to sorting UTF8 data ...)


I just ran these:

linux:

on linux

# SELECT CASE WHEN 'apache' > '!yada' THEN 'TRUE' ELSE 'FALSE' END FROM pg_user;
 case
-------
 FALSE
(1 row)

# show lc_collate;

 lc_collate
-------------
 en_US.UTF-8
(1 row)

------------------------

on mac os x:

# SELECT CASE WHEN 'apache' > '!yada' THEN 'TRUE' ELSE 'FALSE' END FROM pg_user;
 case 
------
 TRUE
(1 row)

# show lc_collate;
 lc_collate  
-------------
 en_US.UTF-8
(1 row) 


-----------------------

Why is the linux postgres saying false with the lc_collage set the way it is?

That's the point - UTF-8 collation is just completely broken under OS X.  There's much previous discussion of the topic on this list and elsewhere.  If you're developing on OS X but running linux and you are mostly using an ascii character set in your test dataset, set your development OS X boxes to use C collation, which will basically do what you expect it do do until you start throwing multibyte characters at it.  If you can't constrain your testing/development dataset in such a manner and collation order really matters during development, then you probably shouldn't develop on OS X.  I spent a fair amount of time investigating how to define a new charset in what proved to ultimately be a futile attempt to redefine UTF-8 on OSX to behave just like ti does on Linux.  I just gave it up after wasting a few too many hours on it. It may be possible to do it, but the return on invested time was non-existent for me so I abandoned my effort.


Re: order by different on mac vs linux

От
Wes James
Дата:


On Wed, May 16, 2012 at 5:00 PM, Samuel Gendler <sgendler@ideasculptor.com> wrote:


On Wed, May 16, 2012 at 3:46 PM, Wes James <comptekki@gmail.com> wrote:


On Mon, May 14, 2012 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Wes James <comptekki@gmail.com> writes:
> Why is there a different order on the different platforms.

This is not exactly unusual.  You should first check to see if
lc_collate is set differently in the two installations --- but even if
it's the same, there are often platform-specific interpretations of
the sorting rules.  (Not to mention that OS X is flat out broken when
it comes to sorting UTF8 data ...)


I just ran these:

linux:

on linux

# SELECT CASE WHEN 'apache' > '!yada' THEN 'TRUE' ELSE 'FALSE' END FROM pg_user;
 case
-------
 FALSE
(1 row)

# show lc_collate;

 lc_collate
-------------
 en_US.UTF-8
(1 row)

------------------------

on mac os x:

# SELECT CASE WHEN 'apache' > '!yada' THEN 'TRUE' ELSE 'FALSE' END FROM pg_user;
 case 
------
 TRUE
(1 row)

# show lc_collate;
 lc_collate  
-------------
 en_US.UTF-8
(1 row) 


-----------------------

Why is the linux postgres saying false with the lc_collage set the way it is?

That's the point - UTF-8 collation is just completely broken under OS X.  There's much previous discussion of the topic on this list and elsewhere.  If you're developing on OS X but running linux and you are mostly using an ascii character set in your test dataset, set your development OS X boxes to use C collation, which will basically do what you expect it do do until you start throwing multibyte characters at it.  If you can't constrain your testing/development dataset in such a manner and collation order really matters during development, then you probably shouldn't develop on OS X.  I spent a fair amount of time investigating how to define a new charset in what proved to ultimately be a futile attempt to redefine UTF-8 on OSX to behave just like ti does on Linux.  I just gave it up after wasting a few too many hours on it. It may be possible to do it, but the return on invested time was non-existent for me so I abandoned my effort.


Why are people saying os x is broken in my case?  Looking at  http://www.utf8-chartable.de/  and  http://www.asciitable.com/  for sort order, ! should come before apache.  On os x it is correct, on ubuntu linux it is not.  In the order by output per my previous emails, it is correct on os x, but no on linux.  Why do people keep saying os x is broken, when it is doing the correct thing from what I've seen?

-wes 

Re: order by different on mac vs linux

От
Scott Marlowe
Дата:
On Wed, May 16, 2012 at 7:58 PM, Wes James <comptekki@gmail.com> wrote:
>
>
> On Wed, May 16, 2012 at 5:00 PM, Samuel Gendler <sgendler@ideasculptor.com>
> wrote:
>>
>>
>>
>> On Wed, May 16, 2012 at 3:46 PM, Wes James <comptekki@gmail.com> wrote:
>>>
>>>
>>>
>>> On Mon, May 14, 2012 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>
>>>> Wes James <comptekki@gmail.com> writes:
>>>> > Why is there a different order on the different platforms.
>>>>
>>>> This is not exactly unusual.  You should first check to see if
>>>> lc_collate is set differently in the two installations --- but even if
>>>> it's the same, there are often platform-specific interpretations of
>>>> the sorting rules.  (Not to mention that OS X is flat out broken when
>>>> it comes to sorting UTF8 data ...)
>>>>
>>>
>>> I just ran these:
>>>
>>> linux:
>>>
>>> on linux
>>>
>>> # SELECT CASE WHEN 'apache' > '!yada' THEN 'TRUE' ELSE 'FALSE' END FROM
>>> pg_user;
>>>  case
>>> -------
>>>  FALSE
>>> (1 row)
>>>
>>> # show lc_collate;
>>>
>>>  lc_collate
>>> -------------
>>>  en_US.UTF-8
>>> (1 row)
>>>
>>> ------------------------
>>>
>>> on mac os x:
>>>
>>> # SELECT CASE WHEN 'apache' > '!yada' THEN 'TRUE' ELSE 'FALSE' END FROM
>>> pg_user;
>>>  case
>>> ------
>>>  TRUE
>>> (1 row)
>>>
>>> # show lc_collate;
>>>  lc_collate
>>> -------------
>>>  en_US.UTF-8
>>> (1 row)
>>>
>>>
>>> -----------------------
>>>
>>> Why is the linux postgres saying false with the lc_collage set the way it
>>> is?
>>
>>
>> That's the point - UTF-8 collation is just completely broken under OS X.
>>  There's much previous discussion of the topic on this list and elsewhere.
>>  If you're developing on OS X but running linux and you are mostly using an
>> ascii character set in your test dataset, set your development OS X boxes to
>> use C collation, which will basically do what you expect it do do until you
>> start throwing multibyte characters at it.  If you can't constrain your
>> testing/development dataset in such a manner and collation order really
>> matters during development, then you probably shouldn't develop on OS X.  I
>> spent a fair amount of time investigating how to define a new charset in
>> what proved to ultimately be a futile attempt to redefine UTF-8 on OSX to
>> behave just like ti does on Linux.  I just gave it up after wasting a few
>> too many hours on it. It may be possible to do it, but the return on
>> invested time was non-existent for me so I abandoned my effort.
>>
>
> Why are people saying os x is broken in my case?  Looking
> at  http://www.utf8-chartable.de/  and  http://www.asciitable.com/  for sort
> order, ! should come before apache.  On os x it is correct, on ubuntu linux
> it is not.  In the order by output per my previous emails, it is correct on
> os x, but no on linux.  Why do people keep saying os x is broken, when it is
> doing the correct thing from what I've seen?

You're confusing encoding with locale.  UTF-8 is an encoding.  If
there were no Locale (i.e. it was set to C, simple byte ordering) then
you'd be correct.  HOWEVER, a locale is a different animal altogether.For instance, most locales ignore many characters
whenit comes to 
sort ordering.  Such as spaces:

smarlowe=# create table test (i text);
CREATE TABLE
smarlowe=# insert into test values ('abc'),(' abc'),('def'),(' def');
INSERT 0 4
smarlowe=# select * from test order by i; i
------abc abcdef def
(4 rows)

Note the spaces are ignored for sorting purposes.


Re: order by different on mac vs linux

От
Wes James
Дата:


On Mon, May 14, 2012 at 4:42 PM, Wes James <comptekki@gmail.com> wrote:
I have postgresql 9.1.3 on a mac and on linux.

On the mac the results come out:

! *`-=[];',./~@#$%^&()_+{}|:"<>?\

then

\--\

On ubuntu 12.04 x64 it comes out (compiled and installed postgres from
tbz2 from postgresql.org repo):

\--\

then

! *`-=[];',./~@#$%^&()_+{}|:"<>?\


Why is there a different order on the different platforms.  I even
copied the data folder on the mac to the linux box instead of doing
pg_dump and then \i pg.sql file on Linux to move the data and the
order is still different like this.

Any idea why?  the ! line should come before the \ line, lexically.  !
is char 33 and \ is char 92.

This is done with

select * from table order by field;

on both systems and "field" is character varying on both systems.


I'd like to bring up my question again after testing on windows xp.  Why does xp and mac os x sort properly when linux does not?  I tested this last week with ubuntu 11.10 and it is doing the same thing as 12.04.  UTF8 encoding and lc_collate =  en_US.UTF-8 on mac and linux and american_usa on windows which is the same thing as en_US.UTF-8.  All are using UTF8 encoding.

I'd really like to use linux to host this DB (and have it hosted in vmware ESXi), but the order is not coming out correct.  I've got foreign titles of books in this db, so I need the UTF8 all the way through.  "C" won't cut it.

Here is a dump of a db that you can test on ubuntu.


do:

select * from dbooks order by title;

in psql

and it won't sort correctly.  Add a record with something like "!a" in the title and run the query above and it won't sort properly.  (This is unverified as I don't have linux in front of me right now  - but I did add records and they did not sort properly with "order by title")

-wes

Re: order by different on mac vs linux

От
Wes James
Дата:
Back to the drawing board.  Windows sorts these lines thus (postgresql 9.1.3):

! *`-=[];',./~@#$%^&()_+{}|:"<>?\
!!
!a
!A
!B
!ia
!test
\--\
African agricultural research and technological development

on the ascii table here:


upper case letters should sort before lowercase letters.

-wes

Re: order by different on mac vs linux

От
Jasen Betts
Дата:
On 2012-05-27, Wes James <comptekki@gmail.com> wrote:
>>
>> Why is there a different order on the different platforms. 

postgres uses the text comparison operators provided by the host
platform.  for linux these are defined in the locales package
this may be part of the glibc package.

> I'd like to bring up my question again after testing on windows xp.  Why
> does xp and mac os x sort properly when linux does not? 

osx and windows hare broken sorting,

> I tested this last
> week with ubuntu 11.10 and it is doing the same thing as 12.04.  UTF8
> encoding and lc_collate =  en_US.UTF-8 on mac and linux and american_usa on
> windows which is the same thing as en_US.UTF-8.  All are using UTF8
> encoding.
>
> I'd really like to use linux to host this DB (and have it hosted in vmware
> ESXi), but the order is not coming out correct.  I've got foreign titles of
> books in this db, so I need the UTF8 all the way through.  "C" won't cut it.

I briefly worked for a library software company.  They had defined their own
ordering operator in the form of a function that mangled strings (they were 
using paradox database)

As I understand it postgres allows you to define your own types with
their own ordering operators 

even ignoring the issue of non-alphabetic symbols there are problems
that should be addressed for sorting titles and names. Many of these
are due to abbreviations which shouls be sorted asif expanded.

some examples:
St. Stephen of HungarySaint Stephen the MartyrSt. RitaStreet art.Wall StickersWall st.Wall StreetMcArthurMacArthur,
DouglassMaccaroni
Wye RoadY Road

-- 
⚂⚃ 100% natural



Re: order by different on mac vs linux

От
Wes James
Дата:
On Mon, May 14, 2012 at 5:00 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Wes James <comptekki@gmail.com> writes:
> Why is there a different order on the different platforms.

This is not exactly unusual.  You should first check to see if
lc_collate is set differently in the two installations --- but even if
it's the same, there are often platform-specific interpretations of
the sorting rules.  (Not to mention that OS X is flat out broken when
it comes to sorting UTF8 data ...)

If you want consistent cross-platform results, "C" locale will get
that for you, but it's pretty stupid about non-ASCII characters.

For more info read
http://www.postgresql.org/docs/9.1/static/charset.html

                       regards, tom lane

I did pg_dump of data, set LC_ALL=C, did initdb and createdb, pgsql -f pgdump.sql db, etc, and now it is sorting the way I'd expect and for the needs I have.

Thanks all for your responses.

-wes 

Re: order by different on mac vs linux

От
Peter Eisentraut
Дата:
On lör, 2012-05-26 at 21:47 -0600, Wes James wrote:
> on the ascii table here:
> 
> http://www.ascii-code.com/
> 
> upper case letters should sort before lowercase letters.

ASCII has nothing to do with how letters "should" be sorted.  It is not
a sorting standard, it is a character encoding standard.  If you open a
dictionary, you won't (usually) find all upper case letters before all
lower case letters.

The sort orders provided by glibc are, individual bugs aside, correct,
where "correct" means, they implement various national and international
standards.  (Equally, the sort orders on Mac OS X are, degenerate cases
aside, broken, because they don't implement any recognized sorting
standard.)  Of course, you are not required to like them, and there are
many people who have particular issues with them, but you won't get far
claiming they are wrong, if you don't have a recognized alternative
source to point to that defines correctness.