Re: [HACKERS] multivariate statistics (v19)

Поиск
Список
Период
Сортировка
От Dean Rasheed
Тема Re: [HACKERS] multivariate statistics (v19)
Дата
Msg-id CAEZATCXn1jJSAR0RYtki3ByFiDnkCGZTqvp7GT_+oAj8cmWDvA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: [HACKERS] multivariate statistics (v19)  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: [HACKERS] multivariate statistics (v19)  (David Fetter <david@fetter.org>)
Список pgsql-hackers
On 11 February 2017 at 01:17, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
> Thanks for the feedback, I'll fix this. I've allowed myself to be a bit
> sloppy because the number of attributes in the statistics is currently
> limited to 8, so the overflows are currently not an issue. But it doesn't
> hurt to make it future-proof, in case we change that mostly artificial limit
> sometime in the future.
>

Ah right, so it can't overflow at present, but it's neater to have an
overflow-proof algorithm.

Thinking about the exactness of the division steps is quite
interesting. Actually, the order of the multiplying factors doesn't
matter as long as the divisors are in increasing order. So in both my
proposal:
   result = 1   for (i = 1; i <= k; i++)       result = (result * (n-k+i)) / i;

and David's proposal, which is equivalent but has the multiplying
factors in the opposite order, equivalent to:
   result = 1   for (i = 1; i <= k; i++)       result = (result * (n-i+1)) / i;

the divisions are exact at each step. The first time through the loop
it divides by 1 which is trivially exact. The second time it divides
by 2, having multiplied by 2 consecutive factors, one of which is
therefore guaranteed to be divisible by 2. The third time it divides
by 3, having multiplied by 3 consecutive factors, one of which is
therefore guaranteed to be divisible by 3, and so on.

My approach originally seemed more logical to me because of the way it
derives from the recurrence relation binomial(n, k) = binomial(n-1,
k-1) * n / k, but they both work fine as long as they have suitable
overflow checks.

It's also interesting that descriptions of this algorithm tend to talk
about setting k to min(k, n-k) at the start as an optimisation step,
as I did in fact, whereas it's actually more than that -- it helps
prevent unnecessary intermediate overflows when k > n/2. Of course,
that's not a worry for the current use of this function, but it's good
to have a robust algorithm.

Regards,
Dean



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Fabien COELHO
Дата:
Сообщение: [HACKERS] Re: \if, \elseif, \else, \endif (was Re: PSQL commands: \quit_if,\quit_unless)
Следующее
От: David Rowley
Дата:
Сообщение: Re: [HACKERS] Improve OR conditions on joined columns (common starschema problem)