Re: serious under-estimation of n_distinct for clustered distributions

From	Peter Geoghegan
Subject	Re: serious under-estimation of n_distinct for clustered distributions
Date	14 January 2013, 14:34:21
Msg-id	CAEYLb_Xjr=PKUmPANKemEwcaFwN0AzJXWY2BogcXLDWg7x-atw@mail.gmail.com
In reply to	Re: serious under-estimation of n_distinct for clustered distributions (Stefan Andreatta <s.andreatta@synedra.com>)
List	pgsql-performance

On 14 January 2013 07:35, Stefan Andreatta <s.andreatta@synedra.com> wrote:
> The source of these troubles is the sampling method employed in
> src/backend/commands/analyze.c. Judging from Tom Lane's comment for the
> original implementation in 2004 this has never been thought to be perfect.
> Does anybody see a chance to improve that part? Should this discussion be
> taken elsewhere? Is there any input from my side that could help?

Numerous alternative algorithms exist, as this has been an area of
great interest for researchers for some time. Some alternatives may
even be objectively better than Haas & Stokes. A quick look through
the archives shows that Simon Riggs once attempted to introduce an
algorithm described in the paper "A Block Sampling Approach to
Distinct Value Estimation":

http://www.stat.washington.edu/research/reports/1999/tr355.pdf

However, the word on the street is that it may be worth pursuing some
of the ideas described by the literature in just the last few years.
I've often thought that this would be an interesting problem to work
on. I haven't had time to pursue it, though. You may wish to propose a
patch on the pgsql-hackers mailing list.
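
For reference, the Haas & Stokes estimator mentioned above can be
sketched in a few lines. This is only an illustration, assuming the
formula given in the comments of analyze.c, n*d / (n - f1 + f1*n/N),
where n is the sample size, N the total row count, d the number of
distinct values in the sample and f1 the number of values seen exactly
once. The function name and the clamping to [d, N] are my own choices
for the sketch, not PostgreSQL's code.

```python
from collections import Counter


def haas_stokes_ndistinct(sample, total_rows):
    """Estimate the number of distinct values in a table of total_rows
    rows from a sample, using n*d / (n - f1 + f1*n/N).

    Hypothetical helper for illustration; not PostgreSQL's implementation.
    """
    n = len(sample)
    if n == 0:
        return 0.0
    counts = Counter(sample)
    d = len(counts)                                  # distinct values in sample
    f1 = sum(1 for c in counts.values() if c == 1)   # values seen exactly once
    estimate = n * d / (n - f1 + f1 * n / total_rows)
    # Assumption: keep the result between what was observed and the table size.
    return min(max(estimate, d), total_rows)


# A sample in which every value is unique scales up to the whole table.
all_unique = haas_stokes_ndistinct(list(range(100)), 10000)

# A sample in which every value appears twice (as can happen when equal
# values sit next to each other on disk) yields f1 = 0, so the estimate
# collapses to the number of distinct values seen in the sample.
all_pairs = haas_stokes_ndistinct([i // 2 for i in range(100)], 10000)
```

The second case hints at why clustered data is troublesome: once
duplicates land in the sample together, f1 drops and the estimate stays
close to d, however many distinct values the table really holds.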

--
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
