Re: advice on schema for multilingual text

Поиск
Список
Период
Сортировка
От Michael Glaesemann
Тема Re: advice on schema for multilingual text
Дата
Msg-id 81044C87-E88F-48D4-9668-42C5485047B6@myrealbox.com
обсуждение исходный текст
Ответ на advice on schema for multilingual text  ("Daniel McBrearty" <danielmcbrearty@gmail.com>)
Список pgsql-general
On Apr 9, 2006, at 0:31 , Daniel McBrearty wrote:

> Hi
>
> I have a website that has multilingual text stored in the database.
> Currently I just have a flat table (lets called it "translations"),
> one row per text item, one column per language. This works OK, for
> now, but I am looking at a redesign. Mostly I want to keep
> information about the languages in teh db as well, so that look
> like an extra table, one row per lang.
>
> The problem that now arises is that there is an expected
> correlation between the "languages" and "translations" tables -
> there should be a row in languages for each column of translations.
> AFAIK (could well be wrong, I am no expert in db theory) there is
> no real way to express in the ddl. Not ideal.
>
> An alternative layout would now be to lose the "translations"
> table, and have two tables in place; one called "base_text"
> containing the text to be translated, and another called, say,
> "tx_text" which contains the translations. Each row of "tx_text"
> references both "base_text" and also "languages".
>
> This looks like a nice layout, as there is an abstract rep of the
> languages, and we lose the "translations" table which can get very
> wide. It's nice that the schema doesn't actually change to add a
> new language.
>
> BUT there are certain invariants that need to be enforced. The main
> one is this:
>
> There must only be one row in "site_text" for any given language
> referencing a given row of "base_text".

You'd want a unique index on (base_text, language), like:

create table tx_text
(
    base_text text not null
        references base_text(base_text)
    , language text not null
        references languages (language)
    , primary key (base_text, language)
    , tx_text text not null
);

> I would also like to have a column in "languages", type boolean,
> called "is_base" - this says what the base language is. Here, only
> ONE row can have a true value. (Obviously it has default value of
> false and is not null).

Here you want a partial unique index on languages where is_base is true
create table languages
(
    language text primary key
    , is_base boolean not null
);

create unique index languages_only_one_true_base_idx
on languages (is_base)
where is_base;


> Another invariant now comes in - the language referenced by every
> row of "site_text" MUST have "is_base" set to FALSE.

I can think of two ways to do this: one is to write a trigger to
enforce this, something like:

create function non_base_language_translations_check
returns trigger()
language plpgsql as $$
begin
    if exists (
        select *
        from tx_text
        natural join languages
        where not is_base
        )
    then
        raise exception
        'Language of translation text must not be a base language.';
    end if;
end;
$$;

Then use this function on triggers that fire on insert and update on
tx_text and on update on languages.

Another is to include the is_base column in tx_text (with both
language and is_base referencing languages) and use a check
constraint to make sure is_base is false:

create table tx_text
(
    base_text text not null
        references base_text(base_text)
    , language text not null
    , is_base boolean not null check not is_base
    , foreign key (language, is_base) references languages (language,
is_base)
        on update cascade
    , primary key (base_text, language)
    , tx_text text not null
);

The latter method is denormalized, which is not something I usually
recommend. I don't know how the constraint checking overhead of using
a trigger compares with using the foreign key and check constraint.

A third idea would be to have two languages tables: one with a single
row for the base language and another for the target languages (You
might even do this using table inheritance, though I haven't thought
this completely through). tx_text would reference the
target_languages table (or child table, as the case may be).

I'm sure others have opinions on this as well.

Hope this helps.

Michael Glaesemann
grzm myrealbox com




В списке pgsql-general по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: pl/perl error
Следующее
От: "Joshua D. Drake"
Дата:
Сообщение: Re: Is difference between PostgreSQL and mySQL licences