Обсуждение: Weird sorting order

Поиск
Список
Период
Сортировка

Weird sorting order

От
Robert Voinea
Дата:
Hi...

I have the following table:

CREATE TABLE test
(
    id SERIAL PRIMARY KEY,
    val VARCHAR(32) NOT NULL
);

INSERT INTO test VALUES
    (DEFAULT, '##34''),
    (DEFAULT, '##32'),
    (DEFAULT, '##31'),
    (DEFAULT, '2ff'),
    (DEFAULT, '##26'),
    (DEFAULT, '2##33'),
    (DEFAULT, '2##25'),
    (DEFAULT, '2##24'),
    (DEFAULT, '2##23'),
    (DEFAULT, '211'),
    (DEFAULT, '210'),
    (DEFAULT, '203'),
    (DEFAULT, '202'),
    (DEFAULT, '201'),
    (DEFAULT, '200');

Why is it that when running the query:

SELECT * FROM test ORDER BY val;

I get the following result?
 id |  val
----+-------
  1 | 200
  2 | 201
  3 | 202
  4 | 203
  5 | 210
  6 | 211
  7 | 2##23
  8 | 2##24
  9 | 2##25
 10 | 2##33
 11 | ##26
 12 | 2ff
 13 | ##31
 14 | ##32
 15 | ##34
(15 rows)

Shouldn't value '2ff' be placed right after '211' but before '2##23'?

Tested on PostgreSQL 8.2, 8.4, Linux SuSE & Kubuntu.

Thank you!...


--
Robert Voinea <robert (dot) voinea (at) topex (dot) ro>
Software Developer
Phone: +40 21 408 38 00 / ext. 343
Fax: +40 21 408 38 08
Local time: GMT+2
http://www.topex.ro


Re: Weird sorting order

От
"Kevin Grittner"
Дата:
Robert Voinea <robert.voinea@topex.ro> wrote:

> Why is it that when running the query:
>
> SELECT * FROM test ORDER BY val;
>
> I get the following result?
>  id |  val
> ----+-------
>   1 | 200
>   2 | 201
>   3 | 202
>   4 | 203
>   5 | 210
>   6 | 211
>   7 | 2##23
>   8 | 2##24
>   9 | 2##25
>  10 | 2##33
>  11 | ##26
>  12 | 2ff
>  13 | ##31
>  14 | ##32
>  15 | ##34
> (15 rows)
>
> Shouldn't value '2ff' be placed right after '211' but before
> '2##23'?

That depends on your collation configuration.  What do you get
from?:

show lc_collate;

In many collations, special characters such as '#' are ignored.

-Kevin

Re: Weird sorting order

От
Bill MacArthur
Дата:
It's ignoring the hash marks. It's like they are invisible characters.
Tested on 8.1.16

-------- Original Message --------
Subject: [ADMIN] Weird sorting order
Date: Fri, 16 Jul 2010 15:14:09 +0300
From: Robert Voinea <robert.voinea@topex.ro>
Organization: Topex
To: pgsql-admin@postgresql.org

Hi...

I have the following table:

CREATE TABLE test
(
    id SERIAL PRIMARY KEY,
    val VARCHAR(32) NOT NULL
);

INSERT INTO test VALUES
    (DEFAULT, '##34''),
    (DEFAULT, '##32'),
    (DEFAULT, '##31'),
    (DEFAULT, '2ff'),
    (DEFAULT, '##26'),
    (DEFAULT, '2##33'),
    (DEFAULT, '2##25'),
    (DEFAULT, '2##24'),
    (DEFAULT, '2##23'),
    (DEFAULT, '211'),
    (DEFAULT, '210'),
    (DEFAULT, '203'),
    (DEFAULT, '202'),
    (DEFAULT, '201'),
    (DEFAULT, '200');

Why is it that when running the query:

SELECT * FROM test ORDER BY val;

I get the following result?
  id |  val
----+-------
   1 | 200
   2 | 201
   3 | 202
   4 | 203
   5 | 210
   6 | 211
   7 | 2##23
   8 | 2##24
   9 | 2##25
  10 | 2##33
  11 | ##26
  12 | 2ff
  13 | ##31
  14 | ##32
  15 | ##34
(15 rows)

Shouldn't value '2ff' be placed right after '211' but before '2##23'?

Tested on PostgreSQL 8.2, 8.4, Linux SuSE & Kubuntu.

Thank you!...


--
Robert Voinea <robert (dot) voinea (at) topex (dot) ro>
Software Developer
Phone: +40 21 408 38 00 / ext. 343
Fax: +40 21 408 38 08
Local time: GMT+2
http://www.topex.ro


--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin


Вложения

Re: Weird sorting order

От
Robert Voinea
Дата:
On Friday 16 July 2010 18:34:06 Kevin Grittner wrote:
> Robert Voinea <robert.voinea@topex.ro> wrote:
> > Why is it that when running the query:
> >
> > SELECT * FROM test ORDER BY val;
> >
> > I get the following result?
> >
> >  id |  val
> >
> > ----+-------
> >
> >   1 | 200
> >   2 | 201
> >   3 | 202
> >   4 | 203
> >   5 | 210
> >   6 | 211
> >   7 | 2##23
> >   8 | 2##24
> >   9 | 2##25
> >
> >  10 | 2##33
> >  11 | ##26
> >  12 | 2ff
> >  13 | ##31
> >  14 | ##32
> >  15 | ##34
> >
> > (15 rows)
> >
> > Shouldn't value '2ff' be placed right after '211' but before
> > '2##23'?
>
> That depends on your collation configuration.  What do you get
> from?:
>
> show lc_collate;
>
> In many collations, special characters such as '#' are ignored.

=> show lc_collate;
 lc_collate
-------------
 en_US.UTF-8

--
Robert Voinea <robert (dot) voinea (at) topex (dot) ro>
Software Developer
Phone: +40 21 408 38 00 / ext. 343
Fax: +40 21 408 38 08
Local time: GMT+2
http://www.topex.ro


Re: Weird sorting order

От
"Kevin Grittner"
Дата:
Robert Voinea <robert.voinea@topex.ro> wrote:

> => show lc_collate;
>  lc_collate
> -------------
>  en_US.UTF-8

I'm afraid the order you're seeing is what you're supposed to get
for that collation sequence.  In that collation, special characters
(including spaces) are only used as tie-breakers for values which
are tied when the special characters are ignored.  There may be a
few consequences of that which you haven't yet found.  As one
example:

test=# show lc_collate;
 lc_collate
-------------
 en_US.UTF-8
(1 row)

test=# create table t1 (c1 text);
CREATE TABLE
test=# insert into t1 values ('one'),(' one'),('one
'),('##one'),('one##');
INSERT 0 5

test=# select '"' || c1 || '"' from t1 order by c1;
 ?column?
----------
 "one"
 " one"
 "##one"
 "one "
 "one##"
(5 rows)

test=# select '"' || c1 || '"' from t1 order by c1 desc;
 ?column?
----------
 "one##"
 "one "
 "##one"
 " one"
 "one"
(5 rows)

For that reason, we have (so far) used the C locale, which provides
the binary sort you probably expected, and we use special columns,
maintained by triggers, to control selection and sequencing as
needed -- for example we have a "searchName" column in any table
where we have name columns, which is forced into a canonical format.

FWIW, our algorithm for generating a canonical name also excludes
spaces and the '#' character, although it still has significant
differences from the en_US.UTF-8 collation.

-Kevin

Re: Weird sorting order

От
"Burgholzer, Robert (DEQ)"
Дата:
This is really interesting.  Is there a way to set the locale on the fly
for a sort?  Having looked at the docs, it appears as if the initDB step
is the only opportunity to do so.

Regards,
r.b.


Re: Weird sorting order

От
"Kevin Grittner"
Дата:
Burgholzer, Robert (DEQ)" <Robert.Burgholzer@deq.virginia.gov>
wrote:

> This is really interesting.  Is there a way to set the locale on
> the fly for a sort?  Having looked at the docs, it appears as if
> the initDB step is the only opportunity to do so.

Starting with 8.4 you can specify it at the database level:

http://www.postgresql.org/docs/8.4/interactive/sql-createdatabase.html

Peter Eisentraut is working on a way to set the collation for an
individual column or (I think) for a sort; but that won't be in 9.0.
Hopefully in 9.1....

http://archives.postgresql.org/message-id/1279045531.32647.14.camel@vanquo.pezone.net

-Kevin

Re: Weird sorting order

От
Robert Voinea
Дата:
On Monday 19 July 2010 19:15:32 Kevin Grittner wrote:
> Burgholzer, Robert (DEQ)" <Robert.Burgholzer@deq.virginia.gov>
>
> wrote:
> > This is really interesting.  Is there a way to set the locale on
> > the fly for a sort?  Having looked at the docs, it appears as if
> > the initDB step is the only opportunity to do so.
>
> Starting with 8.4 you can specify it at the database level:
>
> http://www.postgresql.org/docs/8.4/interactive/sql-createdatabase.html
>
> Peter Eisentraut is working on a way to set the collation for an
> individual column or (I think) for a sort; but that won't be in 9.0.
> Hopefully in 9.1....
>
> http://archives.postgresql.org/message-id/1279045531.32647.14.camel@vanquo.
> pezone.net
>
> -Kevin


Thank you for clarifying this.

--
Robert Voinea <robert (dot) voinea (at) topex (dot) ro>
Software Developer
Phone: +40 21 408 38 00 / ext. 343
Fax: +40 21 408 38 08
Local time: GMT+2
http://www.topex.ro