Re: An idea on faster CHAR field indexing

From: Randall Parker
Subject: Re: An idea on faster CHAR field indexing
Msg-id: 21130920118861@mail.nls.net
In response to: An idea on faster CHAR field indexing  ("Randall Parker" <randall@nls.net>)
Responses: Re: An idea on faster CHAR field indexing  (Giles Lean <giles@nemeton.com.au>)
List: pgsql-hackers

Giles,

I'm curious why multiple passes are needed. Is that true even for
Latin 1 code pages? If not, this optimization could at least be used
for code pages that don't require multiple passes.

As for memory usage, I don't see the issue. The translation to some
collation sequence has to be done anyhow, and writing one's own routine
to do look-ups into a collation sequence table is a fairly trivial
exercise.
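
To make the table look-up idea concrete, here is a rough sketch in C. It
assumes a single-byte code page and a single-level, 256-entry weight
table. The names coll_weight, init_coll_table, make_sort_key and
compare_keys are invented for illustration, and the toy table only folds
ASCII case; a real table would have to come from the locale definition.
It does not address collations that need more than one pass.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical weight table for one single-byte code page.
 * Assumption: one weight per byte, no multi-level rules. */
static unsigned char coll_weight[256];

/* Toy table: identity mapping, except that lowercase ASCII letters
 * get the weight of their uppercase counterparts. */
static void init_coll_table(void)
{
    for (int i = 0; i < 256; i++)
        coll_weight[i] = (unsigned char) i;
    for (int c = 'a'; c <= 'z'; c++)
        coll_weight[c] = (unsigned char) (c - 'a' + 'A');
}

/* Translate len bytes of src into collation weights.
 * The key is exactly as long as the input, so an index entry
 * built this way needs no more space than the raw characters. */
static void make_sort_key(const char *src, size_t len, unsigned char *key)
{
    for (size_t i = 0; i < len; i++)
        key[i] = coll_weight[(unsigned char) src[i]];
}

/* Compare two keys byte-wise; on a common prefix the shorter key
 * sorts first. Returns <0, 0 or >0 like memcmp(). */
static int compare_keys(const unsigned char *a, size_t alen,
                        const unsigned char *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;
    int r = memcmp(a, b, n);

    if (r != 0)
        return r;
    return (alen > blen) - (alen < blen);
}
```

With keys stored in the index, every comparison during a search is a
plain memcmp() and the table is consulted only once per value, when the
key is built.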

With SBCS code pages one would have the option of translating either to
8-bit collation values or to master Unicode collation values. I'm not
sure what the advantage of the latter would be. I only see it as useful
if different rows store text in different code pages, and then only if
the RDBMS can know, for a given field on a per-row basis, what its code
page is.

On Thu, 22 Jun 2000 06:59:06 +1000, Giles Lean wrote:

>
>> So let me cut to the chase: I'm thinking that rather than store the
>> actual character sequence of each field (or some subset of a field)
>> in an index why not translate the characters into their collation
>> sequence values and store _those_ in the index?
>
>This is not an obvious win, since:
>
>1. some collation rules require multiple passes over the data
>
>2. POSIX strxfrm() will convert strings of characters to a form that
>   can be compared by strcmp() [i.e. single pass] but tends to greatly
>   increase memory requirements
>
>   I've only data for one implementation of strxfrm(), but the memory
>   usage startled me.  In my application it was faster to use
>   strcoll() directly for collation than to pre-expand the data with
>   strxfrm().
>
>Regards,
>
>Giles
>
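
For anyone who wants to check the strxfrm() expansion on their own
platform, a minimal sketch follows. The helper name xfrm_key is invented
here. It relies only on the standard behaviour that strxfrm() with a
size of 0 returns the length the transformed string would need. How
large that is depends entirely on the implementation and the locale in
effect, so no particular ratio is implied.

```c
#include <stdlib.h>
#include <string.h>

/* Build a strxfrm() key for s in the current locale.
 * Returns a malloc'd, NUL-terminated key (caller frees), or NULL if
 * allocation fails. *key_len receives the key length without the
 * terminator, which can be compared against strlen(s) to see how
 * much the transformation expands the data. */
static char *xfrm_key(const char *s, size_t *key_len)
{
    /* Size query: with n == 0 the destination may be NULL. */
    size_t need = strxfrm(NULL, s, 0);
    char *key = malloc(need + 1);

    if (key == NULL)
        return NULL;
    strxfrm(key, s, need + 1);
    *key_len = need;
    return key;
}
```

Two keys built this way compare under strcmp() with the same sign that
strcoll() gives for the original strings, so the trade-off is between
the extra bytes per key and the cost of calling strcoll() on every
comparison.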




