Re: improving a badly optimized query

Поиск
Список
Период
Сортировка
От Brandon Craig Rhodes
Тема Re: improving a badly optimized query
Дата
Msg-id w67kf5eh73.fsf@guinness.ts.gatech.edu
обсуждение исходный текст
Ответ на Re: improving a badly optimized query  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: improving a badly optimized query  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-general
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Brandon Craig Rhodes <brandon@oit.gatech.edu> writes:
> > (a) (slow)
> >         SELECT * FROM role_keys NATURAL LEFT JOIN role_person
> >                 WHERE person = 28389;
>
> > (b) (fast)
> >         SELECT * FROM role_keys NATURAL JOIN role_person
> >                 WHERE person = 28389;
>
> > ... the rows created for unmatched role_keys rows by the LEFT JOIN
> > are guaranteed to be thrown out by the WHERE clause ...  [so] the
> > LEFT JOIN could be reduced to a JOIN
>
> It seems like a useful optimization, but I have an uncomfortable
> feeling that there's something wrong with it.  Can you point to a
> rigorous proof that this is okay in complicated contexts such as
> nested outer joins?

We can optimize the above query simply by observing that the result of
a LEFT JOIN includes both the rows that would have been produced by a
simple JOIN, and those rows of the left table that did not match any
from the right.  That is:

  SELECT * FROM role_keys NATURAL LEFT JOIN role_person WHERE person = 28389;

is the equivalent of:

  SELECT * FROM role_keys NATURAL JOIN role_person WHERE person = 28389
  UNION
  SELECT * FROM
    (SELECT role, CAST(NULL AS integer) AS person FROM role_keys) AS rp
      WHERE person = 28389
    EXCEPT
    SELECT role, CAST(NULL AS integer) AS person FROM role_person;

Since the parenthesized sub-select generates only NULL person fields,
the WHERE condition will yield an empty result, so the entire query
can be reduced to:

  SELECT * FROM role_keys NATURAL JOIN role_person WHERE person = 28389;

Similar technique will yield transforms for RIGHT and OUTER JOINs that
can reveal similar optimizations.

I should state our original intention:

We wanted to gain simplicity and save space by inserting rows into
role_person only for those roles associated with people, leaving all
other roles unmentioned; but obviously a view that contains all roles
and mentions their people requires a LEFT JOIN, which kills query
performance.  There are three alternatives:

  (1) Mention every role in the role_person table, using NULLs for
  those roles without associated people, and then our view can be
  implemented with a simple JOIN rather than an expensive LEFT JOIN.
  (This wastes space and we find it logically unattractive, as well as
  more difficult to maintain.)

  (2) Build queries on top of the actual tables, instead of the view,
  choosing for each query the least sufficient JOIN.  (This destroys
  the usefulness of views, and makes our queries more complex since we
  must hand-craft each JOIN to fit the query.)

  (3) Include the above optimization into at least our own copy of
  PostgreSQL.  (This will be our first experience dealing directly
  with the query optimizer, and we are not sure where to begin.)

Please advise,
--
Brandon Craig Rhodes                         http://www.rhodesmill.org/brandon
Georgia Tech                                            brandon@oit.gatech.edu

В списке pgsql-general по дате отправления:

Предыдущее
От: Stephan Szabo
Дата:
Сообщение: Re: Lack of use of indexes
Следующее
От: "scott.marlowe"
Дата:
Сообщение: Re: Changing the type of a column in an already populated