Re: optimizing queries and indexes...

Поиск
Список
Период
Сортировка
От Stephan Szabo
Тема Re: optimizing queries and indexes...
Дата
Msg-id Pine.BSF.4.21.0109091646370.11941-100000@megazone23.bigpanda.com
обсуждение исходный текст
Ответ на Re: optimizing queries and indexes...  ("Josh Berkus" <josh@agliodbs.com>)
Ответы Re: optimizing queries and indexes...  ("Josh Berkus" <josh@agliodbs.com>)
Список pgsql-sql
On Sun, 9 Sep 2001, Josh Berkus wrote:

> > i have several friends that are DBA's by profession and work on
> > oracle
> > and/or ms sql server. they have all told me that while there are some
> > general rules to follow that each database is different.
> 
> Yup.  They told ya right.
> 
> > for example, one thing that one of my friends said is:
> >    select X
> >    from big_table
> >        ,little_table
> > 
> >    Generally speaking, Oracle optimizes better
> >    when the smaller/reducing tables are on the
> >    bottom and the larger tables are on the top.
> >    I believe SQLServer likes them in the opposite
> >    direction.
> 
> Generally speaking, in PG SQL it pays to leave the join order up to the
> optimizer as much as possible.  Tom and Stephan have built a phenominal

Just wanted to jump in to say that I've actually had nothing to do with
the optimizer, I just often field questions. :)  

> query optimizer, and you are much more likely to slow it down if you
> limit its choices.  Also, for the implicit join style in Postgres the
> order in which you give tables is largely ignored by the optimizer.
> Order only matters in explicit joins.

This bit above is very important.  If explain (see below) is showing you a
plan you think is suboptimal, using explicit joins to force join order
is sometimes a win.  The downside is that if the situation changes, this
may no longer be an optimization.

> 4. EXISTS and NOT EXISTS are almost always faster than IN and NOT IN for
> a sub-select in the WHERE clause.  

As a mention, a little info is given on this in the FAQ entry 4.23 with a
simple example of a conversion from IN to EXISTS.

> 
> 5. Some Postgres-specific tips:
>     a. VACUUM ANALYZE regularly!
>     b. If you have indexes on tables with a large number of regular
> deletions, you will need to drop and re-create the indexes during
> periods of inactivity.  This is a Postgres defect that will be fixed in
> ver. 7.3 or later.

For b, I think you may also be able to use REINDEX to do that, but I'm not
sure.

> 6. Make sure to establish Foriegn Key constraints wherever appropriate.
> This will speed up joins on the constrained columns considerably, as the
> parser does not have to worry about unmatched rows.

I don't actually think that the optimizer takes that into account
(although I could be wrong).  It's possible that it will in the future
however.  As a note however, foreign keys *do* make insert/update on the
fk table and update/delete on the pk table a small bit more
expensive.  It's a balancing act, just like indexes.

> The real rule is:
>     Indexes should always be built according to the manner and order in
> which they will be queried.
> 
> Example:  If you have a database in which there is a unique set of
> EmployeeIDs for each Region, but not unique between Regions.  Thus all
> joins to Employees join on both the regionID and the employeeID.
> Further, you force the user to select a region before s/he can look up
> an employee.  In this case, you would want to establish your indexes as
> CREATE INDEX reg_emp_ix UNIQUE ( regionID, employeeID )
> Create all your joins as:
> JOIN employees e ON (t.regionID = e.regionID AND t.employeeID =
> e.employeeID)
> And your WHERE clauses as:
> WHERE regionID = $region AND employeeID = $emp
> For all of these, order is immensely important.
> 
> However, this index, while immensely useful for the above purposes,
> would be useless if there are more than 2 regions and you wanted to
> select on employeeID only for some reason.  In that case, you would need
> to add a second index on employeeID alone, as the regionID, employeeID
> index would not be used in a query that filtered only by employeeID and
> ignored the regionID.

The general rule for this is that AFAIK that for multi-column indexes the
index can only be used up to the first missing column in its order of
definition.
Index (a,b,c) can be used for a query on a, a & b, a & b & c, or to a
limited extent on a & c (it'll only look up a in the index however).


Additional things (in no particular order):Explain is your friend.  Whenever you have a query that you expect to do
alot on your dataset (well, except straight inserts) use explain to get
an idea of what the database thinks is necessary to do the query.  It's
simple: EXPLAIN <query>; and is not immediately understandable, but it's
enough info to start getting help from the mailing lists. :)Sometimes a sequence scan across the heap will be faster in
postgresthan
 
an index scan.  If a reasonable % of rows are going to be returned, the
index scan will actually often be slower.  Sometimes postgres gets it
wrong, but you can test that by experimentation.
Indexes are a double-edged sword. Too many indexes can be bad, especially
ones that aren't likely to actually be used in queries since there is a
cost involved on updates and inserts for updating the indexes.
Aggregates like count(*) on a single table are not done via index
information or stored information and will require a sequence scan.  Often
people expect count to be very cheap, but it isn't always.
If you're using UNION but know that there will not be overlapping rows in
the two queries, use UNION ALL instead to prevent needing to do the
duplicate removal checking.
At least on 7.1 and below, if you have a dummy value that is very common
but doesn't really pass any information (like 'N/A' for example), consider
using NULL instead.  The optimizer statistics can often be thrown
off-kilter by values that are much more common than the real data.




В списке pgsql-sql по дате отправления:

Предыдущее
От: "Josh Berkus"
Дата:
Сообщение: Re: optimizing queries and indexes...
Следующее
От: "Josh Berkus"
Дата:
Сообщение: Re: optimizing queries and indexes...