An idea on faster CHAR field indexing

From Randall Parker
Subject An idea on faster CHAR field indexing
Date
Msg-id 20123500514726@mail.nls.net
Responses Re: An idea on faster CHAR field indexing  (Giles Lean <giles@nemeton.com.au>)
List pgsql-hackers
Hi folks,

This is my first post to your list. I've been reading it for about a week. I like the quality of the developers here
and think this portends well for the future of Postgres.

Anyway, an idea. I'm not sure whether RDBMSs already implement this technique internally. But in case they don't, and
in case you've never thought of it, here is something I just thought of:

CHAR fields have different sorting (aka collation) rules for each code page. E.g., the very fact that A comes before B
is something that the collation info for a given code page has to specify. Just because a character has a lower value
than another character in its encoding in a given code page doesn't mean it gets sorted first.
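
To illustrate the point above, here is a small sketch of code-value order disagreeing with a collation order. Note that
`str.casefold` is only a stand-in I am assuming for a real collation rule (a case-insensitive dictionary order); it is
not what any particular code page or database actually uses.

```python
words = ["apple", "Zebra"]

# Raw code-value order: 'Z' (90) has a lower value than 'a' (97),
# so "Zebra" sorts first.
by_code_value = sorted(words)

# Stand-in collation (assumption): ignore case, as a dictionary would.
by_collation = sorted(words, key=str.casefold)

print(by_code_value)  # ['Zebra', 'apple']
print(by_collation)   # ['apple', 'Zebra']
```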

So let me cut to the chase: rather than store the actual character sequence of each field (or some subset of a field)
in an index, why not translate the characters into their collation sequence values and store _those_ in the index?

The idea is to reduce the number of times that a string has to be converted to its mathematical sorting order
representation. Don't do it every time two strings get compared. Do it when a record is inserted or that field is
updated.
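
A minimal sketch of the idea, under stated assumptions: the collation table `WEIGHTS`, the helper `collation_key`, and
the `KeyIndex` class are all hypothetical names made up for illustration, and the toy collation (case-insensitive, with
"ä" sorting right after "a") does not correspond to any real code page. The key is computed once at insert time, and
every later comparison is a plain byte comparison.

```python
import bisect

# Hypothetical collation order: case-insensitive, "ä" right after "a".
ALPHABET = "aäbcdefghijklmnopqrstuvwxyz"
WEIGHTS = {ch: i + 1 for i, ch in enumerate(ALPHABET)}


def collation_key(text: str) -> bytes:
    """Translate each character to its collation weight (one byte each).

    Characters outside the toy table raise KeyError in this sketch.
    """
    return bytes(WEIGHTS[ch] for ch in text.lower())


class KeyIndex:
    """Toy index that stores precomputed collation keys, not raw text."""

    def __init__(self):
        self._entries = []  # kept sorted as (key, row_id) pairs

    def insert(self, text: str, row_id: int) -> None:
        # The translation happens here, once per insert or update.
        bisect.insort(self._entries, (collation_key(text), row_id))

    def rows_in_order(self) -> list:
        # Ordering relies only on byte-wise comparison of stored keys.
        return [row_id for _, row_id in self._entries]


index = KeyIndex()
for row_id, name in enumerate(["zebra", "Äpfel", "apple", "Banana"]):
    index.insert(name, row_id)

ordered = index.rows_in_order()
print(ordered)  # [2, 1, 3, 0] -> apple, Äpfel, Banana, zebra
```

The C standard library's `strxfrm` performs a transformation in the same spirit, producing a string whose byte-wise
order matches the locale's collation order; whether an index should store such keys is the question raised here.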

Is this already done? Or is it not such a good idea for some reason? 

I'd consider this idea of greater value for something like Unicode. For 16-bit Unicode the lookup table to find each
character's ordinal value (or sorting value, whatever it's called) is 128k, right? Doing a bunch of lookups into that
can't be good for the L1 and L2 caches in a processor.
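
The 128k figure can be checked with a line of arithmetic, assuming (my assumption, not stated in the post) a flat table
with one 2-byte weight per 16-bit code value:

```python
# Assumption: one fixed-width 2-byte weight per 16-bit code value.
CODE_VALUES = 2 ** 16      # 65,536 possible 16-bit values
BYTES_PER_WEIGHT = 2       # assumed size of each table entry

table_bytes = CODE_VALUES * BYTES_PER_WEIGHT
table_kib = table_bytes // 1024

print(table_bytes, table_kib)  # 131072 128
```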





