Обсуждение: BUG #1082: Order by doesn't sort correctly.
The following bug has been logged online:
Bug reference: 1082
Logged by: Richard Neill
Email address: rn214@cam.ac.uk
PostgreSQL version: 7.3.4
Operating system: Linux
Description: Order by doesn't sort correctly.
Details:
ORDER BY sorts the following in this order:
Cymbal #1
Cymbal - 18 inch
Cymbal #2
It ought to be thus:
Cymbal #1
Cymbal #2
Cymbal - 18 inch
or possibly thus:
Cymbal - 18 inch
Cymbal #1
Cymbal #2
-------------------------------------------------
Here's an example sql script to reproduce the bug.
CREATE TABLE tbl_testinstruments(
instrumentid integer, instrument character varying(300)
);
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (1,
'Antique Cymbals #1');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (2,
'Antique Cymbals #2');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (3, 'Clash
Cymbals, French - 20 inch');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (4,
'Cymbal #1');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (5,
'Cymbal #2');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (6,
'Cymbal - 18 inch');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (7,
'Cymbal, Sizzle');
INSERT INTO tbl_testinstruments (instrumentid, instrument) VALUES (8,
'Cymbal, Splash');
SELECT instrument FROM tbl_testinstruments ORDER BY instrument;
------------------------------------------------
This is the output I get:
instrument
---------------------------------
Antique Cymbals #1
Antique Cymbals #2
Clash Cymbals, French - 20 inch
Cymbal #1
Cymbal - 18 inch
Cymbal #2
Cymbal, Sizzle
Cymbal, Splash
(8 rows)
-------------------------------------------------
I'm using version:
PostgreSQL 7.3.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.1
(Mandrake Linux 9.2 3.3.1-1mdk)
"PostgreSQL Bugs List" <pgsql-bugs@postgresql.org> writes:
> Description: Order by doesn't sort correctly.
It almost certainly is the correct sort order according to the locale
you're using. Use pg_controldata to check the database locale. You'll
probably want to re-initdb in C locale. Most non-C locales have weird
rules that try to approximate dictionary sort order.
regards, tom lane
Richard Neill <rn214@hermes.cam.ac.uk> writes:
> This ordering is perverse!
No kidding.
> No matter what the priority is of the
> different characters, I cannot understand how the above can arise.
You are assuming that it's a byte-at-a-time process. It's not. I
believe the first pass considers only letters and digits.
You can easily prove to yourself that it's not just Postgres. Here's
an example on my Linux laptop:
[tgl@g3 tgl]$ cat zzz
Cymbal #1
Cymbal - 18 inch
Cymbal #2
[tgl@g3 tgl]$ LC_ALL=C sort zzz
Cymbal #1
Cymbal #2
Cymbal - 18 inch
[tgl@g3 tgl]$ LC_ALL=en_GB sort zzz
Cymbal #1
Cymbal - 18 inch
Cymbal #2
[tgl@g3 tgl]$
regards, tom lane
Tom Lane wrote: >Richard Neill <rn214@hermes.cam.ac.uk> writes: > > >>This ordering is perverse! >> >> > >No kidding. > > > >>No matter what the priority is of the >>different characters, I cannot understand how the above can arise. >> >> > >You are assuming that it's a byte-at-a-time process. It's not. I >believe the first pass considers only letters and digits. > >You can easily prove to yourself that it's not just Postgres. Here's >an example on my Linux laptop: > >[tgl@g3 tgl]$ cat zzz >Cymbal #1 >Cymbal - 18 inch >Cymbal #2 >[tgl@g3 tgl]$ LC_ALL=C sort zzz >Cymbal #1 >Cymbal #2 >Cymbal - 18 inch >[tgl@g3 tgl]$ LC_ALL=en_GB sort zzz >Cymbal #1 >Cymbal - 18 inch >Cymbal #2 >[tgl@g3 tgl]$ > > I verified this, and it's not GB specific as one might suggest... Same with en_US, de_DE, fr_FR, af_ZA. Does this behaviour really make sense to anybody? Regards, Andreas
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1
Hi,
On Tue, 24 Feb 2004, Andreas Pflug wrote:
<snip>
> I verified this, and it's not GB specific as one might suggest... Same
> with en_US, de_DE, fr_FR, af_ZA. Does this behaviour really make sense
> to anybody?
Ummm, same with tr_TR (for a long time).
We digged glibc and found out that tr_TR locale is broken there. Trying to
fix it. Maybe in the next release...
Regards,
- --
Devrim GUNDUZ
devrim@gunduz.org devrim.gunduz@linux.org.tr
http://www.TDMSoft.com
http://www.gunduz.org
-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.2.1 (GNU/Linux)
iD8DBQFAO1Dctl86P3SPfQ4RAotmAJwJTUGRtYezYQDByGJ4f04+bJv3wgCgpz08
TxjQCMoYTYogCBSPYkjppLs=
=YyRq
-----END PGP SIGNATURE-----
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> I verified this, and it's not GB specific as one might suggest... Same
> with en_US, de_DE, fr_FR, af_ZA.
Yeah, most of the locales use dictionary ordering rules.
> Does this behaviour really make sense to anybody?
You'd have to argue about it with the people who work on locales.
AFAIK it is a standard of some kind.
regards, tom lane
Dear Tom, Thank you for your explanation. It's very helpful, although I was extremely surprised! I agree, it's not a postgresql bug. Can I suggest it might be worth a mention on the "Order By" part of the documentation. i.e. this page: http://www.postgresql.org/docs/7.3/static/sql-select.html#SQL-ORDERBY could possibly use a little more emphasis of this last paragraph: Data of character types is sorted according to the locale-specific collation order that was established when the database cluster was initialized. or perhaps a link to here: http://www.postgresql.org/docs/7.3/static/charset.html#AEN21582 I did realise that the sort would be locale dependent, but failed to realise it wasn't byte-at-a-time. Best wishes Richard Tom Lane wrote: > Richard Neill <rn214@hermes.cam.ac.uk> writes: > >>This ordering is perverse! > > > No kidding. > > >>No matter what the priority is of the >>different characters, I cannot understand how the above can arise. > > > You are assuming that it's a byte-at-a-time process. It's not. I > believe the first pass considers only letters and digits. > > You can easily prove to yourself that it's not just Postgres. Here's > an example on my Linux laptop: > > [tgl@g3 tgl]$ cat zzz > Cymbal #1 > Cymbal - 18 inch > Cymbal #2 > [tgl@g3 tgl]$ LC_ALL=C sort zzz > Cymbal #1 > Cymbal #2 > Cymbal - 18 inch > [tgl@g3 tgl]$ LC_ALL=en_GB sort zzz > Cymbal #1 > Cymbal - 18 inch > Cymbal #2 > [tgl@g3 tgl]$ > > regards, tom lane > -- rn214@hermes.cam.ac.uk ** http://www.richardneill.org Richard Neill, Trinity College, Cambridge, CB21TQ, U.K.
Tom Lane wrote: > "PostgreSQL Bugs List" <pgsql-bugs@postgresql.org> writes: > >>Description: Order by doesn't sort correctly. > > > It almost certainly is the correct sort order according to the locale > you're using. Use pg_controldata to check the database locale. You'll > probably want to re-initdb in C locale. Most non-C locales have weird > rules that try to approximate dictionary sort order. > > regards, tom lane > Dear Tom, Thanks for your email. I did check pg_controldata and found: LC_COLLATE: en_GB LC_CTYPE: en_GB The bug isn't the particular ascii-betical (or other) order. But what I am getting as a supposedly ordered list includes: Cymbal #1 Cymbal - 18 inch Cymbal #2 This ordering is perverse! No matter what the priority is of the different characters, I cannot understand how the above can arise. Whether '#' comes before or after '-', '#1' and '#2' should be adjacent. Richard