Re: improving GROUP BY estimation

Поиск
Список
Период
Сортировка
От Alexander Korotkov
Тема Re: improving GROUP BY estimation
Дата
Msg-id CAPpHfdswn19iSnpsukEJiAQSxK63Ewhkae=GDLuWrgvVHRP5jw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: improving GROUP BY estimation  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
Ответы Re: improving GROUP BY estimation  (David Steele <david@pgmasters.net>)
Re: improving GROUP BY estimation  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
Hi, Tomas!

On Mon, Mar 21, 2016 at 2:39 PM, Alexander Korotkov <a.korotkov@postgrespro.ru> wrote:
On Fri, Mar 18, 2016 at 1:20 PM, Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
Probably a better URL to give is
http://www.adellera.it/investigations/distinct_balls/ which has a link
to the PDF version of the paper and also some supporting material.

However, while that paper is in general very clear, I don't think it
gives a very clear explanation of that particular formula, and it
doesn't explain what it represents. It merely says that if "i" can be
ignored "for some reason (e.g. i << Nr)", then that formula is an
approximation to the exact "without replacement" formula, which is the
subject of that paper.

But actually, that formula is the exact formula for the expected
number of distinct values when selecting *with replacement* from a
collection where each of the distinct values occurs an equal number of
times. So I think we should say that.

Perhaps something along the lines of:

            /*
             * Update the estimate based on the restriction selectivity.
             *
             * This uses the formula for the expected number of distinct values
             * when selecting with replacement from a collection where each of
             * the distinct values occurs an equal number of times (a uniform
             * distribution of values). This is a very close approximation to
             * the more correct method of selecting without replacement when
             * the number of distinct values is quite large --- for example,
             * see http://www.adellera.it/investigations/distinct_balls/.
             * Additionally, it also turns out to work very well even when the
             * number of distinct values is small.
             */

+1
Thank you for work on this patch. The formula you propose and explanation look great!

I think you should send a revision of patch including comments proposed by Deam Rasheed.
I'm switching patch status to waiting on author in commitfest.

------
Alexander Korotkov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
 

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Speed up Clog Access by increasing CLOG buffers
Следующее
От: Stephen Frost
Дата:
Сообщение: Re: NOT EXIST for PREPARE