Обсуждение: [BUGS] Problem with PostgreSQL string sorting

Поиск
Список
Период
Сортировка

[BUGS] Problem with PostgreSQL string sorting

От
Hari Sankar A
Дата:
Hello All,

I am a Database administrator and we are using PostgreSQL-9.6.1 version in RHEL-7 linux machine. Current server Encoding setup is UTF8 and LC_COLLATE and LC_CTYPE is en_US.UTF-8

Lets say table person has field name with values likes

name
------
Abc
abc
.dcb
Dcb
$sdf
EDF

en_US.UTF-8 collate sorting this field as:

name
------
abc
Abc
.dcb
Dcb
EDF
$sdf

It ignore's Special characters.

"C" collate sorting this field as:
name
------
$sdf
.dcb
Abc
Dcb
EDF
abc

It does case sensitive sort.

My expected sorting should be: (Case insensitive and special character) 

name
------
$sdf
.dcb
Abc
abc
Dcb
EDF

I can use lower() and collate 'C' to get this result. But I need a default collate to support this.

Is there any collation that support both (Case insensitive and special character) sorting in utf8?


your help would be really appreciated.


Thanks,
Hari

Re: [BUGS] Problem with PostgreSQL string sorting

От
Peter Geoghegan
Дата:
On Wed, Feb 15, 2017 at 7:44 AM, Hari Sankar A <hsshanthamhari@gmail.com> wrote:
> I am a Database administrator and we are using PostgreSQL-9.6.1 version in
> RHEL-7 linux machine. Current server Encoding setup is UTF8 and LC_COLLATE
> and LC_CTYPE is en_US.UTF-8.

The behavior of each collation comes from the operating system's own
libc, except for the C collation, which is based on the ordering
implied by strcmp() comparisons. Generally, most implementations have
the behavior you describe, in that they assign least weight of all to
caseness and whitespace, and somewhat more weight to punctuation. I
don't think that there is much that can be done about it in practice,
though in principal there could be a collation that has all the
properties you want.

Future versions of PostgreSQL may have better support for this kind of thing.

-- 
Peter Geoghegan


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Problem with PostgreSQL string sorting

От
Tom Lane
Дата:
Peter Geoghegan <pg@bowt.ie> writes:
> The behavior of each collation comes from the operating system's own
> libc, except for the C collation, which is based on the ordering
> implied by strcmp() comparisons. Generally, most implementations have
> the behavior you describe, in that they assign least weight of all to
> caseness and whitespace, and somewhat more weight to punctuation. I
> don't think that there is much that can be done about it in practice,
> though in principal there could be a collation that has all the
> properties you want.

I believe most versions of libc use declarative specifications of locale
properties, so if you wanted to spend some time learning the language in
which those are written, you could make and install a locale that acts
like you want.  It's possible there's one out there already, because
a lot of people seem to want that.

In any case, this is not a bug, this is a feature request; and it's not
even for a Postgres feature, it's for a libc feature.

            regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Problem with PostgreSQL string sorting

От
Peter Geoghegan
Дата:
On Wed, Feb 15, 2017 at 9:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I believe most versions of libc use declarative specifications of locale
> properties, so if you wanted to spend some time learning the language in
> which those are written, you could make and install a locale that acts
> like you want.  It's possible there's one out there already, because
> a lot of people seem to want that.

True. And, the declarative language glibc uses doesn't look that
complicated to me. However, you'd really have to vendor your own glibc
to make this work, since they are compiled to some binary
representation when glibc is built. That's probably prohibitively
complicated to manage.

-- 
Peter Geoghegan


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Problem with PostgreSQL string sorting

От
Hari Sankar A
Дата:
Thank you all for your valuable comments.

I will work on glibc to get this feature.

Appreciate your help.

Thanks,
Hari
On Wed, Feb 15, 2017 at 12:14 PM Peter Geoghegan <pg@bowt.ie> wrote:
On Wed, Feb 15, 2017 at 9:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I believe most versions of libc use declarative specifications of locale
> properties, so if you wanted to spend some time learning the language in
> which those are written, you could make and install a locale that acts
> like you want.  It's possible there's one out there already, because
> a lot of people seem to want that.

True. And, the declarative language glibc uses doesn't look that
complicated to me. However, you'd really have to vendor your own glibc
to make this work, since they are compiled to some binary
representation when glibc is built. That's probably prohibitively
complicated to manage.

--
Peter Geoghegan

Re: [BUGS] Problem with PostgreSQL string sorting

От
Tom Lane
Дата:
Peter Geoghegan <pg@bowt.ie> writes:
> On Wed, Feb 15, 2017 at 9:05 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I believe most versions of libc use declarative specifications of locale
>> properties, so if you wanted to spend some time learning the language in
>> which those are written, you could make and install a locale that acts
>> like you want.  It's possible there's one out there already, because
>> a lot of people seem to want that.

> True. And, the declarative language glibc uses doesn't look that
> complicated to me. However, you'd really have to vendor your own glibc
> to make this work, since they are compiled to some binary
> representation when glibc is built. That's probably prohibitively
> complicated to manage.

Really?  "man localedef" certainly gives the impression that you can
just compile some more locales if you want.  Admittedly, your system's
package manager might have a hissy fit about nonstandard files
suddenly appearing in /usr/lib/locale, but that's not glibc's fault.

            regards, tom lane


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs

Re: [BUGS] Problem with PostgreSQL string sorting

От
Peter Geoghegan
Дата:
On Wed, Feb 15, 2017 at 9:58 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Really?  "man localedef" certainly gives the impression that you can
> just compile some more locales if you want.  Admittedly, your system's
> package manager might have a hissy fit about nonstandard files
> suddenly appearing in /usr/lib/locale, but that's not glibc's fault.

I agree, but the apparent lack of any standard, high-level guidance on
creating a custom locale would leave me hesitant to take that course
of action.

-- 
Peter Geoghegan


-- 
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-bugs