Re: optimizing queries and indexes...

Поиск
Список
Период
Сортировка
От Josh Berkus
Тема Re: optimizing queries and indexes...
Дата
Msg-id web-115943@davinci.ethosmedia.com
обсуждение исходный текст
Ответ на optimizing queries and indexes...  ("Robert J. Sanford, Jr." <rsanford@nolimitsystems.com>)
Ответы Re: optimizing queries and indexes...  (Stephan Szabo <sszabo@megazone23.bigpanda.com>)
Список pgsql-sql
Robert,

I'd advise you to buy a book, but frankly I don't know a good one on DB
performance optimization.  The DB optimizers I've met tend to guard
their secrets closely.

Suggestions, anyone?

> i have several friends that are DBA's by profession and work on
> oracle
> and/or ms sql server. they have all told me that while there are some
> general rules to follow that each database is different.

Yup.  They told ya right.

> for example, one thing that one of my friends said is:
>    select X
>    from big_table
>        ,little_table
>
>    Generally speaking, Oracle optimizes better
>    when the smaller/reducing tables are on the
>    bottom and the larger tables are on the top.
>    I believe SQLServer likes them in the opposite
>    direction.

Generally speaking, in PG SQL it pays to leave the join order up to the
optimizer as much as possible.  Tom and Stephan have built a phenominal
query optimizer, and you are much more likely to slow it down if you
limit its choices.  Also, for the implicit join style in Postgres the
order in which you give tables is largely ignored by the optimizer.
Order only matters in explicit joins.

That being said, there are a few practices that can help:

1. When joining a limited subset of an exceptionally large table to
several smaller tables, consider using a subselect for the large table.
In some cases this will speed up query execution.  Sometimes it won't.
Example:

2. When doing several Inner (normal) Joins and several Outer Joins, do
the inner joins first and the outer joins second, as the inner joins
should limit the result set that is being matched for the outer joins.

3. All joins and where conditions, ideally, should be executed on
indexed columns.

4. EXISTS and NOT EXISTS are almost always faster than IN and NOT IN for
a sub-select in the WHERE clause.

5. Some Postgres-specific tips:
    a. VACUUM ANALYZE regularly!
    b. If you have indexes on tables with a large number of regular
deletions, you will need to drop and re-create the indexes during
periods of inactivity.  This is a Postgres defect that will be fixed in
ver. 7.3 or later.

6. Make sure to establish Foriegn Key constraints wherever appropriate.
This will speed up joins on the constrained columns considerably, as the
parser does not have to worry about unmatched rows.

7. Set Unique indexes on all unique columns.  This also helps the
parser.

> and also:
>    Generally speaking indexes should be built
>    with column names in the order of higher
>    cardinality.  I frequently screw this up
>    because it runs counter to the way you think
>    about building your joins.

This may be true in Oracle (opinions?) but is is neither true in
Postgres nor true in SQL Server (or Frontbase).  If you build your
indexes according to this rule you will be dissapointed in the results.

The real rule is:
    Indexes should always be built according to the manner and order in
which they will be queried.

Example:  If you have a database in which there is a unique set of
EmployeeIDs for each Region, but not unique between Regions.  Thus all
joins to Employees join on both the regionID and the employeeID.
Further, you force the user to select a region before s/he can look up
an employee.  In this case, you would want to establish your indexes as
CREATE INDEX reg_emp_ix UNIQUE ( regionID, employeeID )
Create all your joins as:
JOIN employees e ON (t.regionID = e.regionID AND t.employeeID =
e.employeeID)
And your WHERE clauses as:
WHERE regionID = $region AND employeeID = $emp
For all of these, order is immensely important.

However, this index, while immensely useful for the above purposes,
would be useless if there are more than 2 regions and you wanted to
select on employeeID only for some reason.  In that case, you would need
to add a second index on employeeID alone, as the regionID, employeeID
index would not be used in a query that filtered only by employeeID and
ignored the regionID.

One of my clients didn't know any of the above and established a number
of indexes on tables > 3,000,000 records using columns in alphabetical
order (!?!).  They then had to call me, and re-ordering the index
columns cut the delay in single-row queries (especially DELETE queries)
by 80%.

> so, i am hopeful that there is some sort of postgresql performance
> faq
> for queries.

Somebody wanna re-organize the above with more examples?

-Josh Berkus

______AGLIO DATABASE SOLUTIONS___________________________
                                       Josh Berkus
  Complete information technology      josh@agliodbs.com
   and data management solutions       (415) 565-7293
  for law firms, small businesses        fax 621-2533
    and non-profit organizations.      San Francisco

Вложения

В списке pgsql-sql по дате отправления:

Предыдущее
От: "postgresql"
Дата:
Сообщение: Re:
Следующее
От: Stephan Szabo
Дата:
Сообщение: Re: optimizing queries and indexes...