Re: serious under-estimation of n_distinct for clustered distributions

Поиск
Список
Период
Сортировка
От Peter Geoghegan
Тема Re: serious under-estimation of n_distinct for clustered distributions
Дата
Msg-id CAEYLb_VDRSye1zieFcuKH2ZnCaht11O5DZX8-cWRs9CKs0M-cg@mail.gmail.com
обсуждение исходный текст
Ответ на serious under-estimation of n_distinct for clustered distributions  (Stefan Andreatta <s.andreatta@synedra.com>)
Ответы Re: serious under-estimation of n_distinct for clustered distributions  (Stefan Andreatta <s.andreatta@synedra.com>)
Re: serious under-estimation of n_distinct for clustered distributions  (Stefan Andreatta <s.andreatta@synedra.com>)
Список pgsql-performance
On 29 December 2012 20:57, Stefan Andreatta <s.andreatta@synedra.com> wrote:
> Now, the 2005 discussion goes into great detail on the advantages and
> disadvantages of this algorithm, particularly when using small sample sizes,
> and several alternatives are discussed. I do not know whether anything has
> been changed after that, but I know that the very distinct problem, which I
> will focus on here, still persists.

It's a really hard problem to solve satisfactorily. It's a problem
that has been studied in much detail. Yes, the algorithm used is still
the same. See the comments within src/backend/commands/analyze.c (IBM
Research Report RJ 10025 is referenced there).

The general advice here is:

1) Increase default_statistics_target for the column.

2) If that doesn't help, consider using the following DDL:

alter table foo alter column bar set ( n_distinct = 5.0);

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


В списке pgsql-performance по дате отправления:

Предыдущее
От: Stefan Andreatta
Дата:
Сообщение: serious under-estimation of n_distinct for clustered distributions
Следующее
От: Stefan Andreatta
Дата:
Сообщение: Re: serious under-estimation of n_distinct for clustered distributions