Обсуждение: Refined LC_COLLATE or multiple database clusters?

Поиск
Список
Период
Сортировка

Refined LC_COLLATE or multiple database clusters?

От
Grega Bremec
Дата:
Hello, List,

I recently stumbled across a problem that I can't really get across.

We have a database cluster (PG 7.4.2) that was initialized as follows:

    $ pg_controldata /data/dir
    pg_control version number:            72
    Catalog version number:               200310211
    Database cluster state:               in production
    pg_control last modified:             sre 09 jun 2004 03:00:26 CEST
    Current log file ID:                  20
    Next log file segment:                63
    ... <snip irrelevant checkpoint info> ...
    Database block size:                  8192
    Blocks per segment of large relation: 131072
    Maximum length of identifiers:        64
    Maximum number of function arguments: 32
    Date/time type storage:               64-bit integers
    Maximum length of locale name:        128
    LC_COLLATE:                           C
    LC_CTYPE:                             C

Inside that cluster, there are several databases using different encodings:

    template1=# \l
         List of databases
    Name     |  Owner   | Encoding
    -------------+----------+----------
     db1         | ownera   | UNICODE
     db2         | ownera   | UNICODE
     db3         | ownerb   | LATIN2
     db4         | ownerc   | LATIN2
     db5         | ownera   | LATIN2
     db6         | ownera   | LATIN2
     template0   | postgres | UNICODE
     template1   | postgres | UNICODE

Collate order for those databases, however, needs to be different. Obviously,
db3, db4, db5 and db6 will want to use some collate ordering scheme based on
ISO-8859-2, whereas the other two could possibly have table- or even column-
based collate requirements, as they contain unicode data in UTF-8 encoding,
which doesn't give any warranties wrt the contents of these databases.

Producing a list of requirements and/or imposing certain conventions on the
format of data stored in those tables is outside the scope of my authorities,
the only reasonable assumption I can make is that these databases could be
limited to one collating order per database (or several databases, as it is)
without much hassle.

Also, running several postmasters on this same machine is not an option, as
it only has 1.5GB RAM, of which only 1GB is available for cache (~260MB is
swapped after roughly a month's uptime, but that doesn't change much after
settling down in a week or two).

My question to the list would be the following:

Is it possible to do either of these things that could solve this problem
adequately:

    - somehow manage to make one postmaster run on top of two separate
      database clusters that would each have a different collate ordering
      scheme

    - use some other method of initializing one database from a different
      template and taking with it LC_COLLATE setting (I suppose not, as
      the "${PGDATA}/global/" directory is global to the cluster)

    - use a patch that would add such functionality or upgrade to a version
      (even if release-candidate, beta is not really an option, i gather)
      of PostgreSQL that supported it

    - in absence of any other viable solution, change the global setting of
      the database cluster without having to dump/reinitdb/restore it

I thank you in advance for your valuable input,
--
    Grega Bremec
    Senior Administrator
    Noviforum Ltd., Software & Media
    http://www.noviforum.si/

Re: Refined LC_COLLATE or multiple database clusters?

От
Tom Lane
Дата:
Grega Bremec <grega.bremec@noviforum.si> writes:
> Collate order for those databases, however, needs to be different.

If you need multiple LC_COLLATE settings then you have to run multiple
postmasters.  There is no other solution today, nor likely to be one in
the near future.

> Also, running several postmasters on this same machine is not an option,

Sure it is.  Just don't go overboard with the shared_buffers settings.
Let the kernel do the bulk of the buffering and you'll be fine.

            regards, tom lane

Re: [HACKERS] Refined LC_COLLATE or multiple database clusters?

От
Honza Pazdziora
Дата:
On Wed, Jun 09, 2004 at 12:33:03PM +0200, Grega Bremec wrote:
>
> Collate order for those databases, however, needs to be different. Obviously,

[...]

> Is it possible to do either of these things that could solve this problem
> adequately:
>
>     - somehow manage to make one postmaster run on top of two separate
>       database clusters that would each have a different collate ordering
>       scheme
>
>     - use some other method of initializing one database from a different
>       template and taking with it LC_COLLATE setting (I suppose not, as
>       the "${PGDATA}/global/" directory is global to the cluster)
>
>     - use a patch that would add such functionality or upgrade to a version
>       (even if release-candidate, beta is not really an option, i gather)
>       of PostgreSQL that supported it
>
>     - in absence of any other viable solution, change the global setting of
>       the database cluster without having to dump/reinitdb/restore it

If you do not need the collating sequence to affect index operations,
you can use nls_string function to sort using

    order by nls_string(column, 'sl_SI.utf-8')

where any locale can be specified in the runtime. The nls_string
result can also be used to compare strings in collating manner, however,
indexes will not be used in that case, which may or may not be a
problem for your usage.

The source of nls_string with installation instructions can be found
at

    http://www.fi.muni.cz/~adelton/l10n/

--
------------------------------------------------------------------------
 Honza Pazdziora | adelton@fi.muni.cz | http://www.fi.muni.cz/~adelton/
 .project: Perl, mod_perl, DBI, Oracle, large Web systems, XML/XSL, ...
        Only self-confident people can be simple.