Re: When to bump up statistics?

Поиск
Список
Период
Сортировка
От Josh Berkus
Тема Re: When to bump up statistics?
Дата
Msg-id 200411191232.59600.josh@agliodbs.com
обсуждение исходный текст
Ответ на When to bump up statistics?  (Dawid Kuroczko <qnex42@gmail.com>)
Список pgsql-performance
Dawid,

> I wonder what are the implications of using this statement,
> I know by using, say n=100, ANALYZE will take more time,
> pg_statistics will be bigger, planner will take longer time,
> on the other hand it will make better decisions... Etc, etc.

Yep.   And pg_statistics will need to be vacuumed more often.

> Is bumping up statistics is only useful for indexed columns?

No.   It's potentially useful for any queried column.

> 1) huge table with huge number of distinct values (_almost_
>     unique ;))

Yes.

> 2) huge table with relatively equally distributed values
>     (like each value is in between, say, 30-50 rows).

Not usually.

> 3) huge table with unequally distributed values (some
>     values are in 1-5 rows, some are in 1000-5000 rows).

Yes.

> 4) huge table with small number values (around ~100
>     distinct values, equally or uneqally distributed).

Not usually, especially if they are equally distributed.

> 5) boolean column.

Almost never, just as it is seldom useful to index a boolean column.

> I think SET STATISTICS 100 is very useful for case with
> unequally distributed values, but I wonder what about
> the other cases.  And as a side note -- what are the
> reasonable bounds for statistics (between 10 and 100?)

Oh, no, I've used values up to 500 in production, and we've tested up to the
max on DBT-3.    In my experience, if the default (10) isn't sufficient, you
often have to go up to > 250 to get a different plan.

> What are the runtime implications of setting statistics
> too large -- how much can it affect queries?

It won't affect select queries.   It will affect ANALYZE time (substantially
in the aggregate) and maintenance on the pg_statistics table.

> And finally -- how other RDBMS and RDBM-likes deal
> with this issue? :)

Most don't allow such fine-tuned adjustment.   MSSQL, for example, allows only
setting it per-table or maybe even database-wide, and on that platform it
doesn't seem to have much effect on query plans.    Oracle prefers to use
HINTS, which are a brute-force method to manage query plans.

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco

В списке pgsql-performance по дате отправления:

Предыдущее
От: "Steinar H. Gunderson"
Дата:
Сообщение: Re: index use
Следующее
От: Josh Berkus
Дата:
Сообщение: Re: index use