O(N^2) when building multi-column MCV lists

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема O(N^2) when building multi-column MCV lists
Дата
Msg-id 20190618214313.xrrufnkewq3x63lv@development
обсуждение исходный текст
Ответы Re: O(N^2) when building multi-column MCV lists  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers
Hi,

When experimenting with multi-column MCV lists with statistic target set
to high value (e.g. 10k), I've realized there's an O(N^2) issue in
statext_mcv_build() when computing base frequencies.

We do this:

    for (i = 0; i < nitems; i++)
    {
        ...
        item->base_frequency = 1.0;
        for (j = 0; j < numattrs; j++)
        {
            int    count = 0;
            int    k;

            for (k = 0; k < ngroups; k++)
            {
                if (multi_sort_compare_dim(j, &groups[i], &groups[k], mss) == 0)
                    count += groups[k].count;
            }

            ...
        }
    }


That is, for each item on the MCV list, we walk through all the groups
(for each dimension independently) to determine the total frequency of
the value.

With many groups (which can easily happen for high statistics target),
this can easily get very expensive.

I think the best solution here is to pre-compute frequencies for values
in all dimensions, and then just access that instead of looping through
the groups over and over.

IMHO this is something we should fix for PG12, so I'll put that on the
open items list, and produce a fix shortly.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services 



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Multivariate MCV list vs. statistics target
Следующее
От: Tom Lane
Дата:
Сообщение: PL/Python fails on new NetBSD/PPC 8.0 install