Re: Performance degradation in PostgreSQL 7.1beta3 vs 6.5.3

Поиск
Список
Период
Сортировка
От bruc@stone.congenomics.com (Robert E. Bruccoleri)
Тема Re: Performance degradation in PostgreSQL 7.1beta3 vs 6.5.3
Дата
Msg-id 200101170148.UAA62114@stone.congenomics.com
обсуждение исходный текст
Ответ на Re: Performance degradation in PostgreSQL 7.1beta3 vs 6.5.3  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Dear Tom,

> You can't afford to run a VACUUM ANALYZE even once in the lifetime of
> the table?

Not very often at best, and certainly not routinely. Some of my
tables exceed 10GB and have multiple indices.

However, to test your suggestion, I modified my performance test
script to "VACUUM ANALYZE" all the tables prior to invoking EXPLAIN,
and it improves all of the searches except this one (EXPLAIN output
also included):

explain select count(*) from comparisons_4 where code = 80003;
NOTICE:  QUERY PLAN:

Aggregate  (cost=15659.29..15659.29 rows=1 width=0) ->  Seq Scan on comparisons_4  (cost=0.00..15640.81 rows=7391
width=0)

EXPLAIN

The choice of sequential scan here takes 30x longer.

> 
> > More importantly, PostgreSQL 6.5.3 works very, very well without
> > VACUUM'ing.
> 
> 6.5 effectively assumes that "foo = constant" will select exactly one
> row, if it has no statistics to prove otherwise.  I don't regard that
> as a well-chosen default, even if it does happen to work OK for your
> application.  Selecting an indexscan when a seqscan is needed is just
> as evil as doing the reverse; what's much worse is that 6.5 will
> pick incredibly bad join plans (ie, nested loops) because it thinks
> that very little data is coming out of the scans.

I've tuned my applications to work well with these defaults (and
demonstrate to my peers that PostgreSQL is comparable to Oracle in
performance for these types of queries). I am willing to make changes
to my applications to make them work as well with 7.1, but the
performance of the query above worries me. I think the current planner
will make the wrong decision more often than the right one.  To test
this further on this table, I went through the comparisons_4 table and
found that code 13 appears the most (73912) out of 591825 rows. In
this case, 6.5.3 takes 8.56 seconds to return its answer, whereas 7.1
takes 12.11 seconds. Even in the worst case for this table, the
indexed scan is faster, but the optimizer decides on the sequential
scan. It appears that the decision point for the switch to sequential
scans isn't set properly.  To me, this is a bug. 

> If you want to revert to the 6.5 behavior without doing a VACUUM, you
> could probably get pretty close with
>     update pg_attribute set attdispersion = -1.0;

Does VACUUM ANALYZE set this column to its calculated value?  What
kinds of queries would not give 6.5 behavior if I set this column as
you suggest?

Alternatively, how hard would it be to add another SET variable like
USE_6_5_PLANNING_RULES? Personally, that would be most helpful from an
application development viewpoint because I could switch to PostgreSQL
7.1 without destroying the performance of my applications, and then
test new versions with the 7.1 planner with less potential for service
disruption.

> Stats-gathering and planning certainly does need a great deal of
> additional work, but I'm afraid that none of that will happen before
> 7.1.

As I said above, I've put a lot of effort into making my applications
work quickly with Postgres, and I'm looking forward to using the new
features that are available with version 7.1. However, I'm very
concerned that I will not be able to achieve the same performance
without detailed knowledge of the internals.  Shouldn't I be assured
that I will improve the performance of a query by creating a index on
the fields used for selecting the row? That is not the case for the
query above.

Finally, I apologize for being a little strident here. I've been
advocating for and using Postgres for four years, and it's frustrating
when a new version results in a serious and noticeable performance
degradation.

Sincerely,
Bob

+----------------------------------+------------------------------------+
| Robert E. Bruccoleri, Ph.D.      | Phone: 609 737 6383                |
| President, Congenomics, Inc.     | Fax:   609 737 7528                |
| 114 W Franklin Ave, Suite K1,4,5 | email: bruc@acm.org                |
| P.O. Box 314                     | URL:   http://www.congen.com/~bruc |
| Pennington, NJ 08534             |                                    |
+----------------------------------+------------------------------------+


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Keith Gray
Дата:
Сообщение: Boolean and Bit
Следующее
От: Bruce Momjian
Дата:
Сообщение: Slashdot and PostgreSQL