Re: An idea on faster CHAR field indexing

From	Giles Lean
Subject	Re: An idea on faster CHAR field indexing
Date	22 June 2000, 00:11:08
Msg-id	11897.961636374@nemeton.com.au
In reply to	Re: An idea on faster CHAR field indexing ("Randall Parker" <randall@nls.net>)
Responses	Re: An idea on faster CHAR field indexing (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers

> I'm curious as to why the need for multiple passes. Is that true
> even in Latin 1 code pages?

Yes.  Some locales want strings to be ordered first by ignoring any
accents on characters, then using a tie-break on equal strings by doing
a comparison that includes the accents.

To take another of your points out of order: this is an obstacle that
Unicode doesn't resolve.  Unicode gives you a character set capable of
representing characters from many different locales, but collation
order will remain locale specific.

> If not, this optimization could at least
> be used for code pages that don't require multiple passes.

... but due to the increased memory/disk space, this is likely not an
optimisation.  Measurements needed, I'd suggest.

My only experience of this was tuning a sort utility, where the extra
time to convert the strings with strxfrm() and the large additional
memory requirement killed any advantage strcmp() had over strcoll().
Whether this would be the case for database indexes in general, or
indeed ever, I don't know.
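
To make the trade-off concrete, here is a minimal sketch of the
strxfrm() approach in C.  The helper name make_sort_key() is
hypothetical; strxfrm(), strcoll() and strcmp() are the standard
library calls.  Every string needs its own separately allocated key,
and in many locales that key can be noticeably longer than the string
it was built from, which is where the extra memory goes.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * make_sort_key (hypothetical helper): return a malloc'd key such that
 * strcmp() on two keys orders them the same way strcoll() orders the
 * original strings in the current locale.  The key length (excluding
 * the terminating NUL) is stored in *key_len when key_len is not NULL.
 * Returns NULL if the allocation fails; the caller frees the key.
 */
char *make_sort_key(const char *s, size_t *key_len)
{
    size_t n;
    char *key;

    assert(s != NULL);

    /* With a size of 0, strxfrm() only reports the length it needs. */
    n = strxfrm(NULL, s, 0);

    key = malloc(n + 1);
    if (key == NULL)
        return NULL;

    /* Second call does the real transformation into the new buffer. */
    strxfrm(key, s, n + 1);

    if (key_len != NULL)
        *key_len = n;
    return key;
}
```

Comparing the reported key length against strlen() of the input, under
the locale of interest, gives a quick estimate of the space overhead
before committing to this scheme.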

> As for memory usage: I don't see the issue here. The translation to
> some collation sequence has to be done anyhow. 

No; you can do the comparisons in multiple passes instead without
extra storage allocation.  Using multiple passes will be efficient if
the comparisons mostly don't need the second pass, which I suspect is
typical.
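
A rough sketch of such an in-place, two-pass comparison follows.  It
is not taken from any real locale implementation: it assumes ISO 8859-1
input, and the base_letter() table and the function names are made up
for illustration and cover only a handful of accented letters.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical toy table: map a few ISO 8859-1 accented letters to
 * their unaccented base letter.  Every other byte maps to itself.
 */
static unsigned char base_letter(unsigned char c)
{
    switch (c) {
    case 0xE0: case 0xE1: case 0xE2: return 'a';   /* a grave, acute, circumflex */
    case 0xE8: case 0xE9: case 0xEA: return 'e';   /* e grave, acute, circumflex */
    case 0xF2: case 0xF3: case 0xF4: return 'o';   /* o grave, acute, circumflex */
    default:                         return c;
    }
}

/* Returns -1, 0 or 1.  No memory is allocated. */
int two_pass_compare(const char *a, const char *b)
{
    const unsigned char *p, *q;

    assert(a != NULL && b != NULL);

    /* Pass 1: compare with accents ignored. */
    p = (const unsigned char *) a;
    q = (const unsigned char *) b;
    for (; *p != '\0' && *q != '\0'; p++, q++) {
        unsigned char bp = base_letter(*p);
        unsigned char bq = base_letter(*q);

        if (bp != bq)
            return bp < bq ? -1 : 1;
    }
    if (*p != '\0')
        return 1;               /* b is a proper prefix of a */
    if (*q != '\0')
        return -1;              /* a is a proper prefix of b */

    /* Pass 2: only reached when the strings tie without accents. */
    p = (const unsigned char *) a;
    q = (const unsigned char *) b;
    for (; *p != '\0'; p++, q++) {
        if (*p != *q)
            return *p < *q ? -1 : 1;
    }
    return 0;
}
```

With this toy table, "c\xf4te" (with o circumflex) sorts before "cotf"
because the first pass already decides it, while "cote" against
"c\xf4te" falls through to the second pass.  Most pairs of distinct
strings should be settled in the first pass.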

> Writing one's own routine to do look-ups into a collation sequence
> table is a fairly trivial exercise.

True.  But if you can't do character-by-character comparisons then
such a simplistic implementation will fail.

I hadn't mentioned this time around (but see the archives for the
recent discussion of LIKE) that there are locales with 2:1 and 1:2
mappings of characters too.
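
To illustrate why a one-weight-per-character table is not enough, here
is a toy collation in C with one 2:1 mapping ("ch" treated as a single
letter sorting between "c" and "d") and one 1:2 mapping (the ISO 8859-1
sharp s, byte 0xDF, treated as "ss").  The weights and all names are
invented for this sketch and do not reproduce any real locale's rules.

```c
#include <assert.h>
#include <stddef.h>

/* Cursor that yields one collation weight at a time. */
struct cursor {
    const unsigned char *s;     /* next unread byte */
    int pending;                /* second weight of a 1:2 mapping, or 0 */
};

/*
 * Hypothetical weights (ASCII assumed): 'a'..'c' get 1..3, the pair
 * "ch" gets 4, 'd'..'z' get 5..27.  Any other byte sorts after the
 * letters, in byte order.  Weight 0 is reserved for end of string.
 */
static int letter_weight(unsigned char c)
{
    if (c < 'a' || c > 'z')
        return 0x100 + c;
    return (c <= 'c') ? c - 'a' + 1 : c - 'a' + 2;
}

static int next_weight(struct cursor *cur)
{
    unsigned char c;

    if (cur->pending != 0) {            /* finish a 1:2 mapping */
        int w = cur->pending;
        cur->pending = 0;
        return w;
    }
    c = *cur->s;
    if (c == '\0')
        return 0;
    if (c == 'c' && cur->s[1] == 'h') { /* 2:1: two bytes, one weight */
        cur->s += 2;
        return 4;
    }
    if (c == 0xDF) {                    /* 1:2: one byte, two weights */
        cur->s += 1;
        cur->pending = letter_weight('s');
        return letter_weight('s');
    }
    cur->s += 1;
    return letter_weight(c);
}

/* Returns -1, 0 or 1 under the toy rules above. */
int toy_collate(const char *a, const char *b)
{
    struct cursor ca, cb;

    assert(a != NULL && b != NULL);
    ca.s = (const unsigned char *) a;
    ca.pending = 0;
    cb.s = (const unsigned char *) b;
    cb.pending = 0;

    for (;;) {
        int wa = next_weight(&ca);
        int wb = next_weight(&cb);

        if (wa != wb)
            return wa < wb ? -1 : 1;
        if (wa == 0)
            return 0;                   /* both strings exhausted */
    }
}
```

Under these rules "cz" sorts before "ch", and "ma\xdf" sorts before
"mast", the opposite of what a byte-by-byte comparison gives.  The two
cursors also advance through their strings at different rates, so a
position-by-position table look-up cannot reproduce the ordering.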

Regards,

Giles
