[MASSMAIL]Bad estimation for NOT IN clause with big null fraction

Поиск
Список
Период
Сортировка
От Donghang Lin
Тема [MASSMAIL]Bad estimation for NOT IN clause with big null fraction
Дата
Msg-id CAA=D8a0BfJZzGBTvwsmBmBZ9_W65j0BhshObCTbera0ppeQUjw@mail.gmail.com
обсуждение исходный текст
Список pgsql-hackers
Hi hackers

Discussion[1] and the relevant commit[2] improved the selectivity calculation for IN/NOT IN.

This is the current logic for NOT IN selectivity calculation and it loops over the array elements.

else
{
    s1 = s1 * s2;
    if (isInequality)
         s1disjoint += s2 - 1.0;
}

By calculating s2 for each array element, it calls neqsel and returns 1 - eqsel - nullfrac.
If I expand the s1disjoint calculation for a NOT IN (2,5,8) clause, 
It eventually becomes 1 - eqsel(2) - eqsel(5) - eqsel(8) - 3*nullfrac. 
If nullfrac is big, s1disjoint will be less than 0 quickly when the array has more elements, 
and the selectivity algorithm falls back to the one prior to commit[2] which had bad estimation for NOT IN as well. 

It seems to me that nullfrac should be subtracted only once. Is it feasible that we have a new variable s1disjoint2 
that add back nullfrac when we get back the result for each s2 and subtract it once at the end of the loop as a 2nd heuristic?
We then maybe prefer s1disjoint2 over s1disjoint and then s1?

Donghang Lin
(ServiceNow)

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Amit Langote
Дата:
Сообщение: Re: remaining sql/json patches
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: LogwrtResult contended spinlock