Re: query slows down with more accurate stats

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: query slows down with more accurate stats
Дата
Msg-id 25556.1081883922@sss.pgh.pa.us
обсуждение исходный текст
Ответ на query slows down with more accurate stats  (Robert Treat <xzilla@users.sourceforge.net>)
Ответы Re: query slows down with more accurate stats  (Manfred Koizar <mkoi-pg@aon.at>)
Re: query slows down with more accurate stats  (Robert Treat <xzilla@users.sourceforge.net>)
Список pgsql-performance
Robert Treat <xzilla@users.sourceforge.net> writes:
> live=# analyze cl;
> ANALYZE
> live=# select reltuples from pg_class where relname = 'cl';
>  reltuples
> -----------
>      53580
> (1 row)
> live=# vacuum cl;
> VACUUM
> live=# select reltuples from pg_class where relname = 'cl';
>   reltuples
> -------------
>  1.14017e+06
> (1 row)

Well, the first problem is why is ANALYZE's estimate of the total row
count so bad :-( ?  I suspect you are running into the situation where
the initial pages of the table are thinly populated and ANALYZE
mistakenly assumes the rest are too.  Manfred is working on a revised
sampling method for ANALYZE that should fix this problem in 7.5 and
beyond, but for now it seems like a VACUUM FULL might be in order.

> so i guess i am wondering if there is something I should be doing to
> help get the better plan at the more accurate stats levels and/or why it
> doesn't stick with the original plan (I noticed disabling merge joins
> does seem to push it back to the original plan).

With the larger number of estimated rows it's figuring the nestloop will
be too expensive.  The row estimate for the cl scan went up from 1248
to 10546, so the estimated cost for the nestloop plan would go to about
240000 units vs 80000 for the mergejoin plan.  This is obviously off
rather badly when the true runtimes are 1.7 vs 8.1 seconds :-(.

I think this is an example of a case where we really need better
estimation of nestloop costs --- it's drastically overestimating the
relative cost of the nestloop because it's not accounting for the cache
benefits of the repeated index searches.  You could probably force the
nestloop to be chosen by lowering random_page_cost, but that's just a
kluge solution ... the real problem is the model is wrong.

I have a to-do item to work on this, and will try to bump up its
priority a bit.

            regards, tom lane

В списке pgsql-performance по дате отправления:

Предыдущее
От: Qing Zhao
Дата:
Сообщение: configure shmmax on MAC OS X
Следующее
От: Jeff Bohmer
Дата:
Сообщение: Re: configure shmmax on MAC OS X