Обсуждение: sort order
Hi,
context:
one postgres 12 on centos 7
one postgres 12 on windows 10
both on machines with french as default
the centos 7 lc_collate and lc_ctype: fr_FR.UTF_8
the w10 lc_collate and lc_ctype: French_France.1252
create table test (ble text, id serial primary key);
insert into test(ble) values(' ');
insert into test(ble) values('Marc');
insert into test(ble) values(' Marc');
insert into test(ble) values('marc');
insert into test(ble) values(' marc');
insert into test(ble) values('bobo');
insert into test(ble) values(' bobo'):
insert into test(ble) values('élise');
differences include a french é character, and some white spaces at the beginning.
then select * from test order by ble;
centos result:
ble | id
-------+----
| 3
bobo | 2
bobo | 1
élise | 6
marc | 5
marc | 4
Marc | 8
Marc | 7
(8 lignes)
-------+----
| 3
bobo | 2
bobo | 1
élise | 6
marc | 5
marc | 4
Marc | 8
Marc | 7
(8 lignes)
w10 result:
ble | id
-------+----
| 3
bobo | 1
marc | 4
Marc | 7
élise | 8
bobo | 2
marc | 5
Marc | 6
(8 lignes)
-------+----
| 3
bobo | 1
marc | 4
Marc | 7
élise | 8
bobo | 2
marc | 5
Marc | 6
(8 lignes)
so, obviously, both lc_collate knows about the é
but obviously, too, they do behave differently on the impact of the beginning white space.
I didn't see anything about this behaviour on the doc, unless the reference at the libc should be understood as please read and test libc doc on each platform.
So my first question is: why ?
My second question is: how to make the centos postgres behave like the w10 one ??
ie. knowing about french characters AND taking beginning white spaces into account ?
thanks,
regards,
Re-reading my post, I see that even the élise is not sorted correctly on w10...
On Tue, Jul 27, 2021 at 7:07 PM Marc Millas <marc.millas@mokadb.com> wrote:
Hi,context:one postgres 12 on centos 7one postgres 12 on windows 10both on machines with french as defaultthe centos 7 lc_collate and lc_ctype: fr_FR.UTF_8the w10 lc_collate and lc_ctype: French_France.1252create table test (ble text, id serial primary key);insert into test(ble) values(' ');insert into test(ble) values('Marc');insert into test(ble) values(' Marc');insert into test(ble) values('marc');insert into test(ble) values(' marc');insert into test(ble) values('bobo');insert into test(ble) values(' bobo'):insert into test(ble) values('élise');differences include a french é character, and some white spaces at the beginning.then select * from test order by ble;centos result:ble | id
-------+----
| 3
bobo | 2
bobo | 1
élise | 6
marc | 5
marc | 4
Marc | 8
Marc | 7
(8 lignes)w10 result:ble | id
-------+----
| 3
bobo | 1
marc | 4
Marc | 7
élise | 8
bobo | 2
marc | 5
Marc | 6
(8 lignes)so, obviously, both lc_collate knows about the ébut obviously, too, they do behave differently on the impact of the beginning white space.I didn't see anything about this behaviour on the doc, unless the reference at the libc should be understood as please read and test libc doc on each platform.So my first question is: why ?My second question is: how to make the centos postgres behave like the w10 one ??ie. knowing about french characters AND taking beginning white spaces into account ?thanks,regards,
Marc Millas <marc.millas@mokadb.com> writes: > Re-reading my post, I see that even the élise is not sorted correctly on > w10... Sort orders aren't even particularly guaranteed on different releases of the same platform, let alone totally different platforms. glibc made major changes to their collation rules not long ago: https://wiki.postgresql.org/wiki/Locale_data_changes At one time we thought that migrating to ICU sorting might bring a little more stability to this, but I think what we've found so far is that it's just different :-( regards, tom lane
On 27.07.21 19:07, Marc Millas wrote: > so, obviously, both lc_collate knows about the é > but obviously, too, they do behave differently on the impact of the > beginning white space. > > I didn't see anything about this behaviour on the doc, unless the > reference at the libc should be understood as please read and test libc > doc on each platform. > So my first question is: why ? > My second question is: how to make the centos postgres behave like the > w10 one ?? > ie. knowing about french characters AND taking beginning white spaces > into account ? There are multiple standard ways to deal with space and punctuation characters when sorting. See <https://unicode-org.github.io/icu/userguide/collation/customization/ignorepunct.html> for a description. Not all collation providers implement all of them, but the behavior you happen to get is usually one of them. The centos 7 behavior corresponds to "shift-trimmed", the Windows one appears to match "non-ignorable". If you want to get that latter one on Linux as well, you can use the ICU locales, which also default to non-ignorable. For example select * from test order by ble collate "fr-x-icu"; matches your Windows output for me.