Обсуждение: Re: Question on Bizarre Sorting (ORDER BY in 7.1) (fwd)

Поиск
Список
Период
Сортировка

Re: Question on Bizarre Sorting (ORDER BY in 7.1) (fwd)

От
Дата:
On Tue, 24 Apr 2001, Clayton Vernon wrote:
>seems to be treating 'threadid' as numeric, not alpha.
>did you run any macros that needed to guess the format of the field? many of
>these are very poor in terms of looking down the table. sometimes the first
>row can inadvertently define things for you.

Nope, to my knowledge nothing is going on except for normal INSERTs
and SELECTs. What confuses me is that 'threadid' is of type text, and even
if it were being treated as if it were of type numeric, its results are
still inconsistent, and somewhat inscrutable.

I can't see any reasonable way that this sorting:

>>lxp=# SELECT threadid FROM test ORDER BY threadid;
>>       threadid
>>----------------------
>> 000-0987877374-00313
>> ___-0987877410-00316
>> ___-0987877430-00317
>> 100-0987877381-00314
>> 100-0987877395-00315
>> 200-0987877461-00318
>>(6 rows)

...would be occuring, unless it's going out of its way to try to strip
non-numeric characters and treat the remaining assembled numbers as a
whole number to sort by, possibly?

But if that were the case, that still doesn't explain why the substrings
(below) behave the way that they do, in re-ordering the underscores.

>lxp=# SELECT substr(threadid,1,5) FROM test ORDER BY substr(threadid, 1, 5);
> substr
>--------
> ___-0
> ___-0
> 000-0
> 100-0
> 100-0
> 200-0
>(6 rows)
>> Now, the underscores appear to PRECEDE the 0's. This seems at least a
>little more sane, however this is completely the opposite of where the
>underscore would be sorted with 7.0.3. Now consider the next substring, of
>six characters instead of five.
>
>lxp=# SELECT substr(threadid,1,6) FROM test ORDER BY substr(threadid, 1,6);
> substr
>---------
> 000-09
> ___-09
> ___-09
> 100-09
> 100-09
> 200-09
>(6 rows)



Thanks, though still puzzled,
Jw @ Command Prompt.
--
By way of pgsql-general@commandprompt.com



Re: Question on Bizarre Sorting (ORDER BY in 7.1) (fwd)

От
Justin Clift
Дата:
Dunno if this is helpful, but there is a pattern.

After stripping off the leading non-numeric characters, from the results
in the previous email, this leaves :

lxp=# SELECT threadid FROM test ORDER BY threadid;
       threadid
----------------------
000-0987877374-00313
0987877410-00316
0987877430-00317
100-0987877381-00314
100-0987877395-00315
200-0987877461-00318
(6 rows)

lxp=# SELECT substr(threadid,1,5) FROM test ORDER BY substr(threadid, 1,
5);
 substr
--------
0
0
000-0
100-0
100-0
200-0
(6 rows)

lxp=# SELECT substr(threadid,1,6) FROM test ORDER BY substr(threadid,
1,6);
 substr
---------
000-09
09
09
100-09
100-09
200-09
(6 rows)

As for why?  Don't know.

But the sorting is consistent.

Regards and best wishes,

Justin Clift


pgsql-general@commandprompt.com wrote:
>
> On Tue, 24 Apr 2001, Clayton Vernon wrote:
> >seems to be treating 'threadid' as numeric, not alpha.
> >did you run any macros that needed to guess the format of the field? many of
> >these are very poor in terms of looking down the table. sometimes the first
> >row can inadvertently define things for you.
>
> Nope, to my knowledge nothing is going on except for normal INSERTs
> and SELECTs. What confuses me is that 'threadid' is of type text, and even
> if it were being treated as if it were of type numeric, its results are
> still inconsistent, and somewhat inscrutable.
>
> I can't see any reasonable way that this sorting:
>
> >>lxp=# SELECT threadid FROM test ORDER BY threadid;
> >>       threadid
> >>----------------------
> >> 000-0987877374-00313
> >> ___-0987877410-00316
> >> ___-0987877430-00317
> >> 100-0987877381-00314
> >> 100-0987877395-00315
> >> 200-0987877461-00318
> >>(6 rows)
>
> ...would be occuring, unless it's going out of its way to try to strip
> non-numeric characters and treat the remaining assembled numbers as a
> whole number to sort by, possibly?
>
> But if that were the case, that still doesn't explain why the substrings
> (below) behave the way that they do, in re-ordering the underscores.
>
> >lxp=# SELECT substr(threadid,1,5) FROM test ORDER BY substr(threadid, 1, 5);
> > substr
> >--------
> > ___-0
> > ___-0
> > 000-0
> > 100-0
> > 100-0
> > 200-0
> >(6 rows)
> >> Now, the underscores appear to PRECEDE the 0's. This seems at least a
> >little more sane, however this is completely the opposite of where the
> >underscore would be sorted with 7.0.3. Now consider the next substring, of
> >six characters instead of five.
> >
> >lxp=# SELECT substr(threadid,1,6) FROM test ORDER BY substr(threadid, 1,6);
> > substr
> >---------
> > 000-09
> > ___-09
> > ___-09
> > 100-09
> > 100-09
> > 200-09
> >(6 rows)
>
> Thanks, though still puzzled,
> Jw @ Command Prompt.
> --
> By way of pgsql-general@commandprompt.com
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)

--
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
     - Indira Gandhi

Re: Question on Bizarre Sorting (ORDER BY in 7.1) (fwd)

От
Barry Lind
Дата:
The behavior seen here seems correct although not intuitive.  My guess
is that your database executables where created with locale support
enabled and your database was inited with a US locale.  The sorting
logic for the english locales only sort on alpha-numeric characters.
All other characters are ignored for sorting purposes.

So these values:

000-0987877374-00313
___-0987877410-00316
___-0987877430-00317
100-0987877381-00314
100-0987877395-00315
200-0987877461-00318
get sorted as:

000098787737400313
098787741000316
098787743000317
100098787738100314
100098787739500315
200098787746100318
thus you see it is sorted correctly.  If you look at all of the other
examples you will see this explains all of the results you are seeing.

What I think you really want here is a binary sort order, not a locale
specific sort order.  So you should initdb your database with a locale
of C, or rebuild postgresql without locale support enabled.

I think there is a contrib program that you can use to see what locale
your database was created with.  Once your database is created you can't
change the locale, you will need to re-initdb.

thanks,
--Barry

Justin Clift wrote:

> Dunno if this is helpful, but there is a pattern.
>
> After stripping off the leading non-numeric characters, from the results
> in the previous email, this leaves :
>
> lxp=# SELECT threadid FROM test ORDER BY threadid;
>        threadid
> ----------------------
> 000-0987877374-00313
> 0987877410-00316
> 0987877430-00317
> 100-0987877381-00314
> 100-0987877395-00315
> 200-0987877461-00318
> (6 rows)
>
> lxp=# SELECT substr(threadid,1,5) FROM test ORDER BY substr(threadid, 1,
> 5);
>  substr
> --------
> 0
> 0
> 000-0
> 100-0
> 100-0
> 200-0
> (6 rows)
>
> lxp=# SELECT substr(threadid,1,6) FROM test ORDER BY substr(threadid,
> 1,6);
>  substr
> ---------
> 000-09
> 09
> 09
> 100-09
> 100-09
> 200-09
> (6 rows)
>
> As for why?  Don't know.
>
> But the sorting is consistent.
>
> Regards and best wishes,
>
> Justin Clift
>
>
> pgsql-general@commandprompt.com wrote:
>
>> On Tue, 24 Apr 2001, Clayton Vernon wrote:
>>
>>> seems to be treating 'threadid' as numeric, not alpha.
>>> did you run any macros that needed to guess the format of the field? many of
>>> these are very poor in terms of looking down the table. sometimes the first
>>> row can inadvertently define things for you.
>>
>> Nope, to my knowledge nothing is going on except for normal INSERTs
>> and SELECTs. What confuses me is that 'threadid' is of type text, and even
>> if it were being treated as if it were of type numeric, its results are
>> still inconsistent, and somewhat inscrutable.
>>
>> I can't see any reasonable way that this sorting:
>>
>>>> lxp=# SELECT threadid FROM test ORDER BY threadid;
>>>>       threadid
>>>> ----------------------
>>>> 000-0987877374-00313
>>>> ___-0987877410-00316
>>>> ___-0987877430-00317
>>>> 100-0987877381-00314
>>>> 100-0987877395-00315
>>>> 200-0987877461-00318
>>>> (6 rows)
>>>
>> ...would be occuring, unless it's going out of its way to try to strip
>> non-numeric characters and treat the remaining assembled numbers as a
>> whole number to sort by, possibly?
>>
>> But if that were the case, that still doesn't explain why the substrings
>> (below) behave the way that they do, in re-ordering the underscores.
>>
>>> lxp=# SELECT substr(threadid,1,5) FROM test ORDER BY substr(threadid, 1, 5);
>>> substr
>>> --------
>>> ___-0
>>> ___-0
>>> 000-0
>>> 100-0
>>> 100-0
>>> 200-0
>>> (6 rows)
>>>
>>>> Now, the underscores appear to PRECEDE the 0's. This seems at least a
>>>
>>> little more sane, however this is completely the opposite of where the
>>> underscore would be sorted with 7.0.3. Now consider the next substring, of
>>> six characters instead of five.
>>>
>>> lxp=# SELECT substr(threadid,1,6) FROM test ORDER BY substr(threadid, 1,6);
>>> substr
>>> ---------
>>> 000-09
>>> ___-09
>>> ___-09
>>> 100-09
>>> 100-09
>>> 200-09
>>> (6 rows)
>>
>> Thanks, though still puzzled,
>> Jw @ Command Prompt.
>> --
>> By way of pgsql-general@commandprompt.com
>>
>> ---------------------------(end of broadcast)---------------------------
>> TIP 2: you can get off all lists at once with the unregister command
>>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
>