Re: An idea on faster CHAR field indexing

From: Giles Lean
Subject: Re: An idea on faster CHAR field indexing
Date:
Msg-id: 11897.961636374@nemeton.com.au
In response to: Re: An idea on faster CHAR field indexing  ("Randall Parker" <randall@nls.net>)
Responses: Re: An idea on faster CHAR field indexing  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers

> I'm curious as to why the need for multiple passes. Is that true
> even in Latin 1 code pages?

Yes.  Some locales want strings to be ordered first by ignoring any
accents on characters, then, for strings that compare equal, by a
tie-break comparison that includes the accents.

To take another of your points out of order: this is an obstacle that
Unicode doesn't resolve.  Unicode gives you a character set capable of
representing characters from many different locales, but collation
order will remain locale specific.

> If not, this optimization could at least
> be used for code pages that don't require multiple passes.

... but due to the increased memory/disk space, this is likely not an
optimisation.  Measurements needed, I'd suggest.

My only experience of this was tuning a sort utility, where the extra
time to convert the strings with strxfrm() and the large additional
memory requirement killed any advantage strcmp() had over strcoll().
Whether this would be the case for database indexes in general, or
indeed ever, I don't know.
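
For anyone wanting to measure the trade-off, here is a minimal sketch of
the two approaches in standard C.  The helper names (make_sort_key,
sign_of, compare_by_keys) are made up for illustration; only strxfrm(),
strcoll() and strcmp() are real library calls.  Each key is a separate
allocation, which is the extra memory cost mentioned above; a real index
would store the keys rather than rebuild them for every comparison.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: build a malloc'd strxfrm() sort key for s under
 * the current LC_COLLATE locale.  Returns NULL if allocation fails.
 * The caller frees the result. */
char *make_sort_key(const char *s)
{
    assert(s != NULL);
    /* With n == 0 strxfrm() only reports the key length (NUL excluded). */
    size_t need = strxfrm(NULL, s, 0) + 1;
    char *key = malloc(need);
    if (key != NULL)
        strxfrm(key, s, need);
    return key;
}

/* Reduce any comparison result to -1, 0 or 1. */
int sign_of(int x)
{
    return (x > 0) - (x < 0);
}

/* Compare two strings through their sort keys.  The result should agree
 * in sign with strcoll(a, b); if a key cannot be allocated we fall back
 * to calling strcoll() directly. */
int compare_by_keys(const char *a, const char *b)
{
    char *ka = make_sort_key(a);
    char *kb = make_sort_key(b);
    int r;

    if (ka != NULL && kb != NULL)
        r = sign_of(strcmp(ka, kb));   /* cheap byte comparison on keys */
    else
        r = sign_of(strcoll(a, b));    /* locale-aware comparison */

    free(ka);
    free(kb);
    return r;
}
```

The keys only pay off if each one is built once and compared many times,
and even then the larger keys may cost more in memory and I/O than the
cheaper comparisons save.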

> As for memory usage: I don't see the issue here. The translation to
> some collation sequence has to be done anyhow. 

No; you can do the comparisons in multiple passes instead without
extra storage allocation.  Using multiple passes will be efficient if
the comparisons mostly don't need the second pass, which I suspect is
typical.
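
A rough sketch of what I mean by multiple passes, for single-byte
Latin-1 strings.  The accent table (base_letter) is a tiny made-up
stand-in for real locale data, and using a plain byte comparison as the
second pass is a simplification; real locales have their own rules for
the tie-break.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical primary weight: strip the accent from a few Latin-1
 * letters and leave every other byte unchanged. */
unsigned char base_letter(unsigned char c)
{
    switch (c) {
    case 0xE0: case 0xE2:            return 'a';  /* a-grave, a-circumflex */
    case 0xE8: case 0xE9: case 0xEA: return 'e';  /* e-grave, e-acute, e-circumflex */
    case 0xF4:                       return 'o';  /* o-circumflex */
    default:                         return c;
    }
}

/* Returns -1, 0 or 1.  No transformed copy of either string is made. */
int two_pass_compare(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    const unsigned char *p = (const unsigned char *)a;
    const unsigned char *q = (const unsigned char *)b;

    /* Pass 1: compare with accents ignored. */
    for (; *p != '\0' && *q != '\0'; p++, q++) {
        unsigned char x = base_letter(*p);
        unsigned char y = base_letter(*q);
        if (x != y)
            return x < y ? -1 : 1;
    }
    if (*p != *q)                    /* one string is a prefix of the other */
        return *p == '\0' ? -1 : 1;

    /* Pass 2: only reached when the strings tie with accents ignored. */
    int r = strcmp(a, b);
    return (r > 0) - (r < 0);
}
```

With this table "c\xf4te" sorts before "cotf" because pass 1 already
decides it, whereas "cote" against "c\xf4te" has to go to pass 2.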

> Writing one's own routine to do look-ups into a collation sequence
> table is a fairly trivial exercise.

True.  But if you can't do character-by-character comparisons then
such a simplistic implementation will fail.

I hadn't mentioned this time around (but see the archives for the
recent discussion of LIKE) that there are locales with 2:1 and 1:2
mappings of characters too.
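
To illustrate why a per-character table is not enough, here is a small
sketch that compares streams of collation weights instead of bytes.  The
two rules are my own examples, not taken from any particular locale
definition: "ch" treated as one element sorting between "c" and "d" (as
in traditional Spanish ordering), and Latin-1 sharp s (0xDF) treated as
"ss" (as in German).  The weight values and all names are invented for
the sketch.

```c
#include <assert.h>
#include <stddef.h>

/* Reads collation weights from a string one at a time. */
struct weight_stream {
    const unsigned char *s;   /* unread input */
    int pending[2];           /* weights of the current element */
    int count;                /* number of valid entries in pending */
    int next;                 /* index of the next entry to return */
};

/* Returns the next weight, or 0 once the string is exhausted.
 * Ordinary bytes get weight 2*byte, so 0 never clashes with a real one. */
int next_weight(struct weight_stream *ws)
{
    if (ws->next == ws->count) {
        if (*ws->s == '\0')
            return 0;
        ws->next = 0;
        if (ws->s[0] == 'c' && ws->s[1] == 'h') {
            /* 2:1 mapping: two characters, one weight between c and d */
            ws->pending[0] = 2 * 'c' + 1;
            ws->count = 1;
            ws->s += 2;
        } else if (ws->s[0] == 0xDF) {
            /* 1:2 mapping: one character, the weights of "ss" */
            ws->pending[0] = ws->pending[1] = 2 * 's';
            ws->count = 2;
            ws->s += 1;
        } else {
            ws->pending[0] = 2 * ws->s[0];
            ws->count = 1;
            ws->s += 1;
        }
    }
    return ws->pending[ws->next++];
}

/* Returns -1, 0 or 1 by comparing the two weight streams. */
int element_compare(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    struct weight_stream x = { (const unsigned char *)a, {0, 0}, 0, 0 };
    struct weight_stream y = { (const unsigned char *)b, {0, 0}, 0, 0 };

    for (;;) {
        int wx = next_weight(&x);
        int wy = next_weight(&y);
        if (wx != wy)
            return wx < wy ? -1 : 1;
        if (wx == 0)          /* both streams ended together */
            return 0;
    }
}
```

Under these rules "czar" sorts before "chico", the reverse of what a
byte-by-byte table lookup gives, and "stra\xdf" "e" compares equal to
"strasse", so a further tie-break pass would be needed to separate them.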

Regards,

Giles


