Re: Locale agnostic unicode text

From Greg Stark
Subject Re: Locale agnostic unicode text
Msg-id 87y8ei4sh9.fsf@stark.xeocode.com
In reply to Re: Locale agnostic unicode text  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Locale agnostic unicode text  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> >
> > So it's slow but not spectacularly awful.
> 
> glibc is not the world.  

Sorry, I should have said "It's not *necessarily* spectacularly awful".

> I tried Dawid's functions on Mac OS X, being a
> random non-glibc platform that I happen to use.  Using some text data
> I had handy (44500 lines, 1.9MB) I made a single-column text table and
> timed
>     explain analyze select * from foo order by f1;
> The results were
>     In C locale, SQL_ASCII encoding:      820 ms
>     In C locale, UNICODE encoding:        825 ms
>     Using Dawid's functions:            62010 ms
>     Stripped-down functions:            21010 ms

I don't think these are fair comparisons, though. The C locale probably
short-circuits much of the work that strxfrm/strcoll have to do for other
locales. The fair comparison would be between a database initdb'd in a
non-C locale like en_US, using strcoll with no setlocale calls, and one
that calls setlocale twice for every record.
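To make the comparison I have in mind concrete, here is a rough sketch in
Python rather than in the backend. It is not Dawid's code; the function names
are made up for illustration, and I am assuming the setlocale calls happen
around each comparison. Both sorts use strcoll in the same locale, so the only
difference being timed is the locale switching.

```python
import functools
import locale
import time


def sort_plain(values):
    """Sort with strcoll; the caller has already set LC_COLLATE once."""
    return sorted(values, key=functools.cmp_to_key(locale.strcoll))


def sort_switching(values, target, original):
    """Sort with strcoll, calling setlocale twice around every comparison.

    `target` and `original` are locale names; which names are installed
    depends on the platform, so treat them as assumptions.
    """
    def compare(a, b):
        locale.setlocale(locale.LC_COLLATE, target)
        try:
            return locale.strcoll(a, b)
        finally:
            # Switch back, mimicking a per-call locale override.
            locale.setlocale(locale.LC_COLLATE, original)

    return sorted(values, key=functools.cmp_to_key(compare))


def timed(fn, *args):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start
```

To use it, set LC_COLLATE once to a non-C locale such as en_US (assuming it is
installed), then time sort_plain against sort_switching with that same locale
as the target. The gap between the two is the cost of the setlocale calls
alone, which is the number that matters here.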

In any case, it's true that some platforms have poor implementations of these
functions.

But if you have to do this (and I have to do this too) it doesn't really
matter that some platforms don't handle it well. This just means those
platforms aren't feasible and I'm forced to use glibc-based platforms. It
doesn't mean I should dismiss Postgres for the project.

Incidentally, Dawid, if you are on a platform like OS X that has a performance
problem with this, there is a possible optimization. If you store and update
the data rarely but sort it frequently, you can store the output of strxfrm in
a bytea column. Then you can sort on that column without having to call
setlocale repeatedly.
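The idea, sketched in Python instead of SQL (the helper names are hypothetical,
and Python's locale.strxfrm returns a str where the database column would hold
bytea): the transformation runs once when the row is stored, and every later
sort is a plain comparison of the stored keys.

```python
import locale


def add_sort_keys(values):
    """Pair each string with its strxfrm key, computed once at store time.

    Assumes the caller has already set LC_COLLATE to the wanted locale.
    """
    return [(text, locale.strxfrm(text)) for text in values]


def order_by_key(rows):
    """Sort on the stored key only; no collation calls happen here."""
    return [text for text, _key in sorted(rows, key=lambda row: row[1])]
```

The stored keys are only meaningful for the locale that was in effect when
they were computed, so they have to be regenerated if the locale changes.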

If you have only a few queries, and they can be optimized to always use
indexes, you can even keep this information in a functional index instead of
denormalizing the table.
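As a loose analogy only (this is Python, not a Postgres index, and the class
names are invented for illustration): the table keeps just the original text,
while a separate sorted structure holds the strxfrm keys and is maintained on
every insert. Ordered reads walk that structure and never transform anything.

```python
import bisect
import locale


class CollationIndex:
    """Sorted (strxfrm key, row id) pairs kept apart from the table rows."""

    def __init__(self):
        self._entries = []

    def insert(self, row_id, text):
        # The key is computed once, at insert time, like an index on an
        # expression. Assumes LC_COLLATE was set by the caller beforehand.
        bisect.insort(self._entries, (locale.strxfrm(text), row_id))

    def ordered_ids(self):
        return [row_id for _key, row_id in self._entries]


class Table:
    """Rows hold only the original text; the sort keys live in the index."""

    def __init__(self):
        self.rows = {}
        self.index = CollationIndex()

    def insert(self, row_id, text):
        self.rows[row_id] = text
        self.index.insert(row_id, text)

    def select_ordered(self):
        return [self.rows[i] for i in self.index.ordered_ids()]
```

The trade-off is the same as in the database: inserts pay for the
transformation, and the table itself stays free of the extra column.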

-- 
greg


