Re: raising the default default_statistics_target

Поиск
Список
Период
Сортировка
От Josh Berkus
Тема Re: raising the default default_statistics_target
Дата
Msg-id 200403080917.52738.josh@agliodbs.com
обсуждение исходный текст
Ответ на raising the default default_statistics_target  (Neil Conway <neilc@samurai.com>)
Ответы Re: raising the default default_statistics_target  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
Neil,

> In the simple test I performed, raising the default_statistics_target 
> from 10 to 25 resulted in a 40% increase in the time to ANALYZE a 
> large table. (I picked 25 more or less at random -- would 15 or 20 be 
> better?)

I find that very interesting, since I haven't found much higher increases to 
be a proportionate penality.   For example, on an 11-column table raising 3 
columns to statistics=250 merely doubled the ANALYZE time.  I have not done 
exact timing, but would be happy to ....

>It may also be the case that for those people for whom 10 
> is an insufficient stats target, 25 is also insufficient.

It is.   I've found that "problem" queries, especially those caused by real, 
uneven distribution of data, require raising statistics to 150-400 in order 
to fix.  This is much to high a level to assign as a default.

> Any comments on whether increasing the default stats target is a good 
> idea for 7.5? (Details on the test I performed are included below)

No.   I don't think it's a good idea to raise the default for *all* columns; 
for one thing, I'd really hate to think what, say, a default stats of 100 
would do to a TEXT column with an average of 8K of data per row.

Further, in 7.5 we'll be introducing correlated stats for multi-column indexes 
(unless something's gone off with that?) which should help a lot of problem 
queries.  And change our whole emphasis on brute forcing analyze through 
increasing stats into the 100's.   

If you really want to tackle this issue, though, here's what I suggest:

1) add a GUC called default_statistics_indexed, which starts at say 100 or 50.
2) When ever the user indexes a column, automatically increase the stats
to the level in default_statistics_indexed, if they are at the level in 
default_statistics_target.

This will then give indexed columns "automatically" a somewhat higher level of 
stats analysis than other columns.   This should help a lot of "slow query" 
problems, yet effectively leave the selection of "important" columns in the 
hands of the DBA.   Make sense?

Also, another great feature in this department would be to extend the 
multi-column correlation statistics to cover foriegn keys, as a way of 
improving cross-table estimates.

Anyway, keep me in the loop on this, I have a lot of very complex databases I 
can test such issues on.

-- 
-Josh BerkusAglio Database SolutionsSan Francisco



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Rod Taylor
Дата:
Сообщение: Re: question about selecting across multiple dbs
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: Tablespaces