Re: optimizing queries and indexes...
От | Stephan Szabo |
---|---|
Тема | Re: optimizing queries and indexes... |
Дата | |
Msg-id | Pine.BSF.4.21.0109091646370.11941-100000@megazone23.bigpanda.com обсуждение исходный текст |
Ответ на | Re: optimizing queries and indexes... ("Josh Berkus" <josh@agliodbs.com>) |
Ответы |
Re: optimizing queries and indexes...
("Josh Berkus" <josh@agliodbs.com>)
|
Список | pgsql-sql |
On Sun, 9 Sep 2001, Josh Berkus wrote: > > i have several friends that are DBA's by profession and work on > > oracle > > and/or ms sql server. they have all told me that while there are some > > general rules to follow that each database is different. > > Yup. They told ya right. > > > for example, one thing that one of my friends said is: > > select X > > from big_table > > ,little_table > > > > Generally speaking, Oracle optimizes better > > when the smaller/reducing tables are on the > > bottom and the larger tables are on the top. > > I believe SQLServer likes them in the opposite > > direction. > > Generally speaking, in PG SQL it pays to leave the join order up to the > optimizer as much as possible. Tom and Stephan have built a phenominal Just wanted to jump in to say that I've actually had nothing to do with the optimizer, I just often field questions. :) > query optimizer, and you are much more likely to slow it down if you > limit its choices. Also, for the implicit join style in Postgres the > order in which you give tables is largely ignored by the optimizer. > Order only matters in explicit joins. This bit above is very important. If explain (see below) is showing you a plan you think is suboptimal, using explicit joins to force join order is sometimes a win. The downside is that if the situation changes, this may no longer be an optimization. > 4. EXISTS and NOT EXISTS are almost always faster than IN and NOT IN for > a sub-select in the WHERE clause. As a mention, a little info is given on this in the FAQ entry 4.23 with a simple example of a conversion from IN to EXISTS. > > 5. Some Postgres-specific tips: > a. VACUUM ANALYZE regularly! > b. If you have indexes on tables with a large number of regular > deletions, you will need to drop and re-create the indexes during > periods of inactivity. This is a Postgres defect that will be fixed in > ver. 7.3 or later. For b, I think you may also be able to use REINDEX to do that, but I'm not sure. > 6. Make sure to establish Foriegn Key constraints wherever appropriate. > This will speed up joins on the constrained columns considerably, as the > parser does not have to worry about unmatched rows. I don't actually think that the optimizer takes that into account (although I could be wrong). It's possible that it will in the future however. As a note however, foreign keys *do* make insert/update on the fk table and update/delete on the pk table a small bit more expensive. It's a balancing act, just like indexes. > The real rule is: > Indexes should always be built according to the manner and order in > which they will be queried. > > Example: If you have a database in which there is a unique set of > EmployeeIDs for each Region, but not unique between Regions. Thus all > joins to Employees join on both the regionID and the employeeID. > Further, you force the user to select a region before s/he can look up > an employee. In this case, you would want to establish your indexes as > CREATE INDEX reg_emp_ix UNIQUE ( regionID, employeeID ) > Create all your joins as: > JOIN employees e ON (t.regionID = e.regionID AND t.employeeID = > e.employeeID) > And your WHERE clauses as: > WHERE regionID = $region AND employeeID = $emp > For all of these, order is immensely important. > > However, this index, while immensely useful for the above purposes, > would be useless if there are more than 2 regions and you wanted to > select on employeeID only for some reason. In that case, you would need > to add a second index on employeeID alone, as the regionID, employeeID > index would not be used in a query that filtered only by employeeID and > ignored the regionID. The general rule for this is that AFAIK that for multi-column indexes the index can only be used up to the first missing column in its order of definition. Index (a,b,c) can be used for a query on a, a & b, a & b & c, or to a limited extent on a & c (it'll only look up a in the index however). Additional things (in no particular order):Explain is your friend. Whenever you have a query that you expect to do alot on your dataset (well, except straight inserts) use explain to get an idea of what the database thinks is necessary to do the query. It's simple: EXPLAIN <query>; and is not immediately understandable, but it's enough info to start getting help from the mailing lists. :)Sometimes a sequence scan across the heap will be faster in postgresthan an index scan. If a reasonable % of rows are going to be returned, the index scan will actually often be slower. Sometimes postgres gets it wrong, but you can test that by experimentation. Indexes are a double-edged sword. Too many indexes can be bad, especially ones that aren't likely to actually be used in queries since there is a cost involved on updates and inserts for updating the indexes. Aggregates like count(*) on a single table are not done via index information or stored information and will require a sequence scan. Often people expect count to be very cheap, but it isn't always. If you're using UNION but know that there will not be overlapping rows in the two queries, use UNION ALL instead to prevent needing to do the duplicate removal checking. At least on 7.1 and below, if you have a dummy value that is very common but doesn't really pass any information (like 'N/A' for example), consider using NULL instead. The optimizer statistics can often be thrown off-kilter by values that are much more common than the real data.
В списке pgsql-sql по дате отправления: