Re: Selecting across Multiple Tables

From: Antonio Fiol Bonnín
Subject: Re: Selecting across Multiple Tables
Date:
Msg-id 3C0E246B.4050903@w3ping.com
In response to: Selecting across Multiple Tables  (Tielman J de Villiers <tjdevil@bondnet.co.za>)
Responses: Re: Selecting across Multiple Tables  (will trillich <will@serensoft.com>)
List: pgsql-general
>
>
>I am now rewriting the structure to rather insert/update/delete the details
>in 10 little tables, all linked by a unique ID.
>
I see no reason to do so unless all 10 "attributes" (a, b, c, d, ...) are
used very infrequently and a single table would otherwise get far too big.

>The problem is selecting ALL details from all 10 the tables in this kind of
>format:
>
>Select a,b,c,
>       d,e,
>       f,g,
>       ...
>From 1,2,3,4,5,6,7,8,9,10
>Where 1.id = (select last_value from sequence) and
>      2.id = 1.id and
>      3.id = 1.id and
>      4.id = 1.id and
>      5.id = 1.id and
>      ...
>
>When I developed the new application, it was on postgres 7.1.3, and
>initially it caused a heavy load on the postmaster. This load was
>substantially reduced by changing the where statement to:
>
>      ...
>Where 1.id = (select last_value from sequence) and
>      2.id = (select last_value from sequence) and
>      3.id = (select last_value from sequence) and
>      4.id = (select last_value from sequence) and
>      ...
>
>When I now tried the same on the live server, running postgres 6.5.3, and
>any which way I try, I get an extremely heavy load on the postmaster -- with
>the last test I had a (cost=737.78 rows=11 width=40) when selecting only 1
>column from table 1!.
>
>Am I misunderstanding the "relational" data model completely or selecting
>wrongly?
>
I'm afraid that performance should be substantially better if you had a
single table with all the attributes, except for situations like:

Contact_name, Company_name, Phone_number, Billing_address

In this case, the billing address is "company-specific" rather than
"contact-specific", so I'd use something like:

Table1: Contact_name, Phone_number, id_company
Table2: id_company, Company_name, Billing_address

That is worthwhile if and only if you intend to have multiple contacts for
each company. If you only intend to have one contact per company,
performance (and file size) will be better if you stick everything in the
same table.
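
To make that concrete, here is a minimal sketch of the two-table layout.
The table and column names are only illustrative, and I have kept to the
plain comma-join syntax so it also works on older versions:

CREATE TABLE company (
    id_company      serial,
    company_name    text,
    billing_address text
);

CREATE TABLE contact (
    contact_name text,
    phone_number text,
    id_company   integer   -- points at company.id_company
);

-- Re-joining the two when you need everything at once:
SELECT c.contact_name, c.phone_number, co.company_name, co.billing_address
FROM contact c, company co
WHERE c.id_company = co.id_company;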

Another situation that could lead to good results by splitting tables is
the following:

Original Table: Report_number, magnitudeA, magnitudeB, magnitudeC

If reports usually only contain one magnitude, either A, B, or C, you
may think of splitting that into three tables:

Table A: Report_number, magnitudeA
Table B: Report_number, magnitudeB
Table C: Report_number, magnitudeC

You may substantially improve disk usage, though that depends mostly on how
your data is organised. However, it does not seem to me that splitting
tables will improve performance in general, because you will (again, in
general) need to re-join the information to access it.
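
For instance, putting one row per report back together from the three split
tables would look something like this. This is only a sketch, with
hypothetical table names, and it assumes a version with outer-join support
(7.1 or later):

SELECT report_number, magnitudeA, magnitudeB, magnitudeC
  FROM table_a
  FULL OUTER JOIN table_b USING (report_number)
  FULL OUTER JOIN table_c USING (report_number);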

Each case is different, and it depends much on your numbers.

As general advice:

EXPLAIN SELECT ...
Then analyze precisely what is happening. A long (high-cost) sequential
scan is bad, but an index scan can also be bad if it is not on the best
possible index. I recently optimised a query that used to take about 50
seconds and now runs in under a second, just by finding (and creating)
the missing index.
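
For example (the table, column, and index names here are purely
illustrative):

EXPLAIN SELECT some_column FROM some_table WHERE some_id = 42;
-- If the plan shows a high-cost "Seq Scan on some_table",
-- an index on the column used in the WHERE clause may be the fix:
CREATE INDEX some_table_some_id_idx ON some_table (some_id);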

In your case, I suppose I should not need to say that if you keep the
split tables, you should index all of them on the ID you use to access
the data.
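
That is (with made-up names standing in for your ten tables):

CREATE INDEX table1_id_idx ON table1 (id);
CREATE INDEX table2_id_idx ON table2 (id);
-- ... and so on, one index per table, on the id column you join on.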

I hope that helps, but I could be completely wrong.

Good luck!

Antonio


