O(N^2) when building multi-column MCV lists
От | Tomas Vondra |
---|---|
Тема | O(N^2) when building multi-column MCV lists |
Дата | |
Msg-id | 20190618214313.xrrufnkewq3x63lv@development обсуждение исходный текст |
Ответы |
Re: O(N^2) when building multi-column MCV lists
|
Список | pgsql-hackers |
Hi, When experimenting with multi-column MCV lists with statistic target set to high value (e.g. 10k), I've realized there's an O(N^2) issue in statext_mcv_build() when computing base frequencies. We do this: for (i = 0; i < nitems; i++) { ... item->base_frequency = 1.0; for (j = 0; j < numattrs; j++) { int count = 0; int k; for (k = 0; k < ngroups; k++) { if (multi_sort_compare_dim(j, &groups[i], &groups[k], mss) == 0) count += groups[k].count; } ... } } That is, for each item on the MCV list, we walk through all the groups (for each dimension independently) to determine the total frequency of the value. With many groups (which can easily happen for high statistics target), this can easily get very expensive. I think the best solution here is to pre-compute frequencies for values in all dimensions, and then just access that instead of looping through the groups over and over. IMHO this is something we should fix for PG12, so I'll put that on the open items list, and produce a fix shortly. regards -- Tomas Vondra http://www.2ndQuadrant.com PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
В списке pgsql-hackers по дате отправления: