Ideas needed: How to create and store collation tables

From Peter Eisentraut
Subject Ideas needed: How to create and store collation tables
Date
Msg-id Pine.LNX.4.44.0211181846430.12428-100000@localhost.localdomain
Responses Re: Ideas needed: How to create and store collation tables  (Stephan Szabo <sszabo@megazone23.bigpanda.com>)
Re: Ideas needed: How to create and store collation tables  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
I am trying to figure out which is the best way to store custom collation
tables on a PostgreSQL server system, and what kind of interface to
provide to users to allow them to create their own.

A collation table essentially consists of a mapping 'character code ->
weight' for every character in the set and some additional considerations
for one-to-many and many-to-one mappings, plus a few feature flags.
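
To make the structure concrete, here is a minimal sketch of such a table and a sort-key function built on it. Everything in it is an assumption for illustration: the weights are made-up numbers, the names (WEIGHTS, CONTRACTIONS, EXPANSIONS, sort_key) are hypothetical, and the feature flags are left out entirely.

```python
from typing import Dict, List, Tuple

# Hypothetical sample data: the weights are invented, not taken from a real locale.
WEIGHTS: Dict[str, int] = {"a": 10, "b": 20, "c": 30, "d": 40, "h": 80, "s": 190}

# Many-to-one: a character sequence that sorts as a single unit.
# Here "ch" is given a weight just above "h", so it sorts after "h".
CONTRACTIONS: Dict[str, int] = {"ch": 85}

# One-to-many: a single character that sorts as several units.
# Here "ß" is treated as two "s" weights.
EXPANSIONS: Dict[str, List[int]] = {"ß": [190, 190]}


def sort_key(text: str) -> Tuple[int, ...]:
    """Translate a string into a tuple of weights that compares in collation order."""
    key: List[int] = []
    max_len = max((len(seq) for seq in CONTRACTIONS), default=1)
    i = 0
    while i < len(text):
        # Try the longest contraction first, down to length 2.
        for n in range(max_len, 1, -1):
            chunk = text[i:i + n]
            if len(chunk) == n and chunk in CONTRACTIONS:
                key.append(CONTRACTIONS[chunk])
                i += n
                break
        else:
            ch = text[i]
            if ch in EXPANSIONS:
                key.extend(EXPANSIONS[ch])
            else:
                # Characters missing from the table sort last, by code point.
                key.append(WEIGHTS.get(ch, 0x10000 + ord(ch)))
            i += 1
    return tuple(key)


words = sorted(["da", "cha", "ca", "ha"], key=sort_key)
```

With these sample weights, "cha" ends up after "ha" rather than between "ca" and "da", which is the kind of behavior a plain code-to-weight map cannot express on its own.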

How would a user go about creating such a table?

CREATE COLLATION foo (  ...  <10000 lines of data>  ...
);

or would it be preferable to store the table in some external file and
then have the call simply be, say,

CREATE COLLATION foo SOURCE 'some file';

The latter has the disadvantage that we'd need some smarts so that pg_dump
would not repeat the mistakes that were made with dynamically loadable
modules (such as absolute file paths).  The former has the disadvantage
that it is too unwieldy to be useful.
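
One possible way to keep absolute paths out of dumps, sketched below, is to store only a bare name and resolve it against a server-side directory at load time. This is an assumption, not an agreed design: resolve_source and the directory used in the example are hypothetical.

```python
from pathlib import PurePosixPath


def resolve_source(name: str, collation_dir: str) -> str:
    """Resolve a stored collation source name against a server-side directory.

    Only the relative name would be recorded (and dumped); the directory is
    supplied by the installation, so a dump stays portable across machines.
    """
    path = PurePosixPath(name)
    if path.is_absolute() or ".." in path.parts:
        raise ValueError("collation source must be a plain relative name: %r" % name)
    return str(PurePosixPath(collation_dir) / path)
```

For example, resolve_source("foo.tbl", "/usr/share/collations") yields "/usr/share/collations/foo.tbl", while an absolute name is rejected outright.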

We also need to consider the following two problems:

Firstly, if the collation data -- no matter how it is created -- is stored
within the database (that is, in some table(s)), then it would be
duplicated in every database.  Depending on the storage format, a
collation table takes roughly 100 kB to 800 kB.  Multiply that by
a few dozen languages, for each database.  That would make an external
file seem more attractive.  (The external file would need to be a binary
file that is precomputed for efficient processing, unless we want to
reparse and reprocess it every so often, like for every session.)
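
As a rough sketch of what "precomputed binary file" could mean, the following packs a code-to-weight map into a flat byte string and reads it back. The layout (a 4-byte signature, an entry count, then fixed-size entries sorted by character code) and the names MAGIC, pack_table and unpack_table are assumptions made up for this example, not a proposed format.

```python
import struct
from typing import Dict

MAGIC = b"COLL"                 # hypothetical file signature
HEADER = struct.Struct("<4sI")  # signature, number of entries
ENTRY = struct.Struct("<IH")    # character code (32 bit), weight (16 bit)


def pack_table(weights: Dict[int, int]) -> bytes:
    """Serialize character code -> weight pairs, sorted by character code."""
    items = sorted(weights.items())
    parts = [HEADER.pack(MAGIC, len(items))]
    parts.extend(ENTRY.pack(code, weight) for code, weight in items)
    return b"".join(parts)


def unpack_table(blob: bytes) -> Dict[int, int]:
    """Read back a table written by pack_table, without any text parsing."""
    magic, count = HEADER.unpack_from(blob, 0)
    if magic != MAGIC:
        raise ValueError("not a collation table")
    table: Dict[int, int] = {}
    offset = HEADER.size
    for _ in range(count):
        code, weight = ENTRY.unpack_from(blob, offset)
        table[code] = weight
        offset += ENTRY.size
    return table
```

Because the entries are fixed-size and sorted, a real implementation could search the file contents in place instead of rebuilding a map in every session; the dictionary here only keeps the sketch short.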

Secondly, because each collation table depends on a particular character
encoding (since it is indexed by character code), some sort of magic needs
to happen when someone creates a database with a different encoding than
the template database.  One option is to do some mangling on the
registered external file name (such as appending the encoding name to the
file name).  Another option is to have the notional pg_collate system
catalog contain a column for the encoding, and then simply ignore all
entries pertaining to encodings other than the database encoding.
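
The two options can be sketched side by side. The separator used in the file name, the CollationEntry fields and both function names are hypothetical; the notional pg_collate catalog is modeled here as a plain list.

```python
from dataclasses import dataclass
from typing import List


def mangled_file_name(base_name: str, encoding: str) -> str:
    """Option 1: derive the per-encoding file name by appending the encoding name."""
    return "%s.%s" % (base_name, encoding)


@dataclass(frozen=True)
class CollationEntry:
    """Option 2: one row of a notional pg_collate catalog with an encoding column."""
    name: str
    encoding: str
    source: str


def visible_collations(catalog: List[CollationEntry],
                       database_encoding: str) -> List[CollationEntry]:
    """Ignore every entry that belongs to an encoding other than the database's."""
    return [entry for entry in catalog if entry.encoding == database_encoding]


catalog = [
    CollationEntry("foo", "LATIN1", "foo.LATIN1"),
    CollationEntry("foo", "UTF8", "foo.UTF8"),
    CollationEntry("bar", "LATIN1", "bar.LATIN1"),
]
latin1_only = visible_collations(catalog, "LATIN1")
```

In the second option a database created with a different encoding than its template would simply see a different subset of the same catalog rows.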

Comments or better ideas?

-- 
Peter Eisentraut   peter_e@gmx.net


