Re: [PATCH] Never convert n_distinct < 2 values to a ratio when computing stats

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: [PATCH] Never convert n_distinct < 2 values to a ratio when computing stats
Дата
Msg-id CA+TgmoYViEaSck=VjAgtRypoXiePjgQ+Mx4tROZZfYbYKfuA+w@mail.gmail.com
обсуждение исходный текст
Ответ на [PATCH] Never convert n_distinct < 2 values to a ratio when computing stats  (Dan McGee <dan@archlinux.org>)
Ответы Re: [PATCH] Never convert n_distinct < 2 values to a ratio when computing stats
Список pgsql-hackers
On Sat, Mar 24, 2012 at 12:17 AM, Dan McGee <dan@archlinux.org> wrote:
> This is a bit of a corner case in all honesty, but if you have a short
> table (under 20 rows), the 10% heuristic used that decides whether
> distinct values scale with the row count will result in rather odd
> values for stadistinct in pg_statistic, such as '-0.2' or '-0.666667',
> rather than the expected '2'. Additionally, this can cause only one of
> {t, f} to appear in the most common values array.
>
> Does this actually affect query planning in any way? Probably not, but
> it is extremely odd to look at pg_stats for these columns, and the
> solution seems easy.

But the stats aren't there to be looked at, but rather to guide query
planning.  If at execution time there are 100 rows in the table,
should we still assume that there are only 2 distinct values in the
table, or that it's gone up to about 50 distinct values?  It's hard to
say, but there's no apparent reason to think that the number of
distinct values will scale up for a large table but not a small table.

The bit about maybe not getting both t and f as MCVs on a Boolean does
seem a little worrying, but I'm not sure whether it actually affects
query planning in a materially negative way.  Can you demonstrate a
case where it matters?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: PostgreSQL optimisations on Linux machines with more than 24 cores
Следующее
От: Robert Haas
Дата:
Сообщение: Re: Gsoc2012 Idea --- Social Network database schema