adding collation to a SQL database

Поиск
Список
Период
Сортировка
От Dave Gudeman
Тема adding collation to a SQL database
Дата
Msg-id 7b079fba0810180255w11f78652l540f04d88e85264d@mail.gmail.com
обсуждение исходный текст
Список pgsql-hackers
<div dir="ltr">Normally I would lurk on this list for a month or two before I considered posting, but I saw a note on
collationand thought some of you might be interested in my experience implementing collation for the ANTs Data
Server.<br/><br />In ADS, we treated the collation internally as part of the type for string types. In other words,
char,varchar, and clob expressions and columns had not only a length and a can-be-null flag, but also a collation. This
letus do a lot less work to add collations because we didn't have to munge the expression code very much. We just
adaptedthe code that does type assignment and then the information was carried through the compiler in the type info
withno further work (except where the compiler actually had to know about collation). We didn't need to mess with index
selection,for example, because it was handled correctly just based on types. And it let us add collated indexes very
easilyjust by adding the new type information. We treated those weird collation expressions like type casts, which made
themeasier to implement also.<br /><br />Collated compares are very slow and it's worth quite a bit of effort to to
avoidthem when possible (for example by putting them last in the list of predicates to evaluate). ICU has a facility
forprepocessing search strings so they can be byte compared (which is much faster than normal collated compares). You
canconstruct a collated index by using proprocessed search strings as the keys so that index searches are faster. We
didn'tdo that in ADS for several reasons: first, the preprocessed strings are bigger (about 30% as I recall) so you can
getfewer keys in a node. Second, you don't do really do a lot of comparisons in an index lookup. Third, ADS had an
optimizationwhere it didn't read the data files if the index contained all of the columns needed for the query, and
thatoptimization would not work with these sorts of collation indexes. Still, the parameters of Postgresql are quite a
bitdifferent so it might be worth considering.<br /><br />Partly for backward compatibility and partly because collated
comparesare so slow, ADS had a default collation that was just a normal ascii compare. Since strings were all encoded
inUTF-8, the collation order was incorrect if there were any multi-byte characters, but in English-only columns, or
columnswhere they just wanted a reproducible ordering and didn't care that much about language rules, it was much
fasterthan the ICU compare.<br /><br />regards,<br />David Gudeman<br /><br /></div> 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Simon Riggs
Дата:
Сообщение: Re: Reducing some DDL Locks to ShareLock
Следующее
От: Martijn van Oosterhout
Дата:
Сообщение: Re: PGDay.it collation discussion notes