Обсуждение: TOAST & performance with lots of big columns in a table

Поиск
Список
Период
Сортировка

TOAST & performance with lots of big columns in a table

От
Frank Joerdens
Дата:
I've got an articles table where I want to store texts, of which several translations
exist. Thanks to TOAST I can now store texts of arbitrary length directly in the table,
which is already a big advantage over stuffing them into the file system and trying to
keep the database and file system in sync. What I am wondering is:

From a conceptual point of view, it appears better to keep all translations in one table.
I forget what exactly the argument is; it has something to do with normalization theory.
Anyway I've already got meta information about articles that applies to all translations -
such as author, position within the overall structure, related articles etc.; so if I were
to have a table for every language, then every article row in any language-table
corresponding to a particular article would have to link with the same row in authors,
index, etc., and the structure would get more complicated than it needs to be. However,
with a long article of maybe several 100 K, and translations in 6 languages (this is
theoretical, actually I have only 2 at the moment), the row size would increase
accordingly. Does this pose a problem for TOAST? Is it a better plan to have a separate
table for each language?

- Frank

Re: TOAST & performance with lots of big columns in a table

От
Frank Joerdens
Дата:
Uh, I think I was wired rather the wrong way up, this question is
confused. What a little fresh air can do. Cycling home from the office
cleared the confusion in my head: It is of course nonsense to store all
translations in a single row, also to have different tables for
different languages. You have one table with a 'language' field that
stores the information as to whether this is English, French, etc.; and
then another table for the meta stuff, that also links to the authors
table etc.. So simple. I am a little embarassed.

Frank Joerdens wrote:
>
> I've got an articles table where I want to store texts, of which several translations
> exist. Thanks to TOAST I can now store texts of arbitrary length directly in the table,
> which is already a big advantage over stuffing them into the file system and trying to
> keep the database and file system in sync. What I am wondering is:
>
> >From a conceptual point of view, it appears better to keep all translations in one table.
> I forget what exactly the argument is; it has something to do with normalization theory.
> Anyway I've already got meta information about articles that applies to all translations -
> such as author, position within the overall structure, related articles etc.; so if I were
> to have a table for every language, then every article row in any language-table
> corresponding to a particular article would have to link with the same row in authors,
> index, etc., and the structure would get more complicated than it needs to be. However,
> with a long article of maybe several 100 K, and translations in 6 languages (this is
> theoretical, actually I have only 2 at the moment), the row size would increase
> accordingly. Does this pose a problem for TOAST? Is it a better plan to have a separate
> table for each language?
>
> - Frank

Re: TOAST & performance with lots of big columns in a table

От
Jan Wieck
Дата:
Frank Joerdens wrote:
> Uh, I think I was wired rather the wrong way up, this question is
> confused. What a little fresh air can do. Cycling home from the office
> cleared the confusion in my head: It is of course nonsense to store all
> translations in a single row, also to have different tables for
> different languages. You have one table with a 'language' field that
> stores the information as to whether this is English, French, etc.; and
> then another table for the meta stuff, that also links to the authors
> table etc.. So simple. I am a little embarassed.

    That'd  be my suggestion too, because it makes it alot easier
    to add a 7th and 8th language later.

    Anyway, having many big columns in one  table  will  make  it
    more  likely  that  the toaster is invoked. Actually, it only
    does some work until the main tuple fits into BLKSZ/4,  which
    is  a  little  less  than  2K in the default setup. This will
    cause more data to get toasted. The application would need to
    only select those columns that are actually required to avoid
    detoasting of all the stuff it doesn't use, so this  approach
    is  a  little  more  complicated  at  the client side. If the
    application selects all translations every time anyway, there
    shouldn't  be  much  of a difference. But if it first selects
    the original and just the  available  translations,  it  only
    needs  to  fetch the original text plus the language codes of
    all others with a second query, avoiding the  detoasting  and
    data transfer for all the available translations, so your new
    schema is definitely better.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #