Re: Multiple selects versus join

Поиск
Список
Период
Сортировка
От scott.marlowe
Тема Re: Multiple selects versus join
Дата
Msg-id Pine.LNX.4.33.0208200215410.2333-100000@css120.ihs.com
обсуждение исходный текст
Ответ на Re: Multiple selects versus join  (Adrian 'Dagurashibanipal' von Bidder <avbidder@fortytwo.ch>)
Список pgsql-general
On 20 Aug 2002, Adrian 'Dagurashibanipal' von Bidder wrote:

> On Tue, 2002-08-20 at 04:00, Jean-Christian Imbeault wrote:
> > Is it more efficient to do multiple selects on multiple tables to get
> > data or do a join of those tables and extract the data from the
> > resulting temp table?
> >
> > For example:
> >
> > select name from a where id=1;
> > select pub_date from b where id=1;
> > pub_id = select publisher_id from c where id=1;
> > select pub_name from d where id = pub_id;
> >
> > or (I don't know the syntax for join so this is just pseudo-sql);
> >
> > select name, pub_date, pub_name from join((join(A, B, C) on id), D) on
> > pub_id where id = 1;
> >
> > All my tables are related to each other so I am wondering which is more
> > efficient, do multile selects or joining the tables. (Joining the table
> > would return just one row).
>
> Yo!
>
> If you *know* that you will get just one row, you know more than the
> database system can know. Generally: a join of tables with n and m rows
> can return at most n*m rows, so joins are not very efficient compared to
> multiple selects. A join is basically a select on both tables with
> additional processing done. If you do multiple selects and can do the
> join 'by hand' because you know much about the data you expect, this
> will almost always be faster.

Not true.  What if you're joining on fields that have a unique constraint
on them?  There can be only one (to paraphrase Highlander) so you know how
many rows you're gonna get back.  It's quite possible you're searching for
a single employee record and all the tables joining that one (even if
there are dozens) are all joined on a unique field that references the
main employee table.  In that case, it WILL be faster to let the database
do it in one query, unless it has a brain dead planner.

The real issue is how many rows are you tossing around versus how many is
the database tossing around.  So, just like you'd pick the table that
would return the fewest rows (most likely anyway) to process first by
hand, so you should build your query using join syntax that forces
postgresql to use that table first.  Indexes make all the difference on
<10% table being returned too.

Basically, if you're gonna return many of the rows in a table, then it may
well be faster to do it in code, but if you know it will be <10% it's much
faster to let the database do it.

Of course, the real reason to let the database do it is to make sure it
gets done right.  It's way likelier that you or I will write code that
makes some fatal mistake we don't see than it is that the database will do
that.  If you make a mistake in your join syntax you'll know right up
front, when you get 0 or 1,000,000,000,000 rows   :-)


В списке pgsql-general по дате отправления:

Предыдущее
От: Adrian 'Dagurashibanipal' von Bidder
Дата:
Сообщение: Re: Explain Output made Colorful
Следующее
От: "Darrin Domoney"
Дата:
Сообщение: Event recurrence - in database or in application code ????