Re: Index only scan sometimes switches to sequential scan for small amount of rows

Поиск
Список
Период
Сортировка
От Jeff Janes
Тема Re: Index only scan sometimes switches to sequential scan for small amount of rows
Дата
Msg-id CAMkU=1xvQYr1Uw7vDKqO5KODWa_796f6SwzMEnhEYJQRd33Q7w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Index only scan sometimes switches to sequential scan for small amount of rows  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: Index only scan sometimes switches to sequential scan for small amount of rows
Список pgsql-performance
On Thu, Mar 26, 2015 at 5:44 AM, Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On 26.3.2015 08:48, Jeff Janes wrote:
>
> OK, this is starting to look like a long-standing bug to me.
>
> If it only sees 3 distinct values, and all three are present at least
> twice, it throws all of them into the MCV list. But if one of those 3
> were present just once, then it tests them to see if they qualify.
> The test for inclusion is that it has to be present more than once,
> and that it must be "over-represented" by 25%.
>
> Lets say it sampled 30000 rows and found 29,900 of one value, 99 of
> another, and 1 of a third.
>
> But that turns into the second one needing to be present 12,500 times.
> The average value is present 10,000 times (30,000 samples with 3
> distinct values) and 25 more than that is 12,500.  So it excluded.
>
> It seems to me that a more reasonable criteria is that it must be
> over-represented 25% compared to the average of all the remaining values
> not yet accepted into the MCV list.  I.e. all the greater ones should be
> subtracted out before computing the over-representation threshold.

That might work IMO, but maybe we should increase the coefficient a bit
(say, from 1.25 to 2), not to produce needlessly long MCV lists.

That wouldn't work here, because at the point of decision the value present 99 times contributes half the average, so the average is 50, and of course it can't possibly be twice of that.

I have a patch, but is there a way to determine how it affects a wide variety of situations?  I guess run `make installcheck`, then analyze, then dump pg_stats, with the patch and without the patch, and then compare the dumpsj?
 



> It is also grossly inconsistent with the other behavior.  If they are
> "29900; 98; 2" then all three go into the MCV.

Isn't the mincount still 12500? How could all three get into the MCV?

If all observed values are observed at least twice, it takes a different path through the code.  It just keeps them all in the MCV list. That is what is causing the instability for the OP.  If the 3rd most common is seen twice, then all three are kept.  If it is seen once, then only the most common is kept.  See if statements at 2494 and 2585

else if (toowide_cnt == 0 && nmultiple == ndistinct)

        if (track_cnt == ndistinct ....

Cheers,

Jeff
Вложения

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: Index only scan sometimes switches to sequential scan for small amount of rows
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: Index only scan sometimes switches to sequential scan for small amount of rows