Re: strange slow query - lost lot of time somewhere

From: David Rowley
Subject: Re: strange slow query - lost lot of time somewhere
Msg-id: CAApHDvqk+MDmSHhVwfhnsXYDvHoED-f0xO8kqfMRrg0n__EtaQ@mail.gmail.com
In reply to: Re: strange slow query - lost lot of time somewhere ("David G. Johnston" <david.g.johnston@gmail.com>)
List: pgsql-hackers
On Tue, 3 May 2022 at 15:22, David G. Johnston
<david.g.johnston@gmail.com> wrote:
> Plugging in:
> n = 2,000
> e = 500
> c = 10,000
>
> proper = 5%
> incorrect = 25%
>
> But of the 10,000 calls we will receive, the first 2,000 will be misses while 2,000 of the remaining 8,000 will be
> hits, due to sharing 2,000 distinct groups among the available inventory of 500 (25% of 8,000 is 2,000).  2,000 hits in
> 10,000 calls yields 20%.
>
> I believe the correct formula to be:
>
> ((calls - ndistinct) / calls) * (est_entries / ndistinct) = hit_ratio
> .80 * .25 = .20

I think you're correct here.  The formula should be that.  However,
two things: 1) this being incorrect is not the cause of the original
problem reported on this thread, and 2) there's just no way we could
consider fixing this in v15, let alone back-patch it to v14.
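
To sanity-check the arithmetic, here's a minimal standalone sketch
(not the actual cost_memoize_rescan() code; the names est_entries,
ndistinct and calls are just taken from your formula) that evaluates
the proposed hit ratio with the numbers you plugged in:

#include <stdio.h>

/*
 * Proposed hit ratio from the formula above:
 * ((calls - ndistinct) / calls) * (est_entries / ndistinct)
 *
 * Assumes est_entries <= ndistinct; the real cost code would also
 * need to handle the case where the cache can hold every distinct
 * group.
 */
static double
hit_ratio(double est_entries, double ndistinct, double calls)
{
	return ((calls - ndistinct) / calls) * (est_entries / ndistinct);
}

int
main(void)
{
	/* n = 2,000, e = 500, c = 10,000 from the example above */
	printf("%.2f\n", hit_ratio(500.0, 2000.0, 10000.0)); /* 0.20 */
	return 0;
}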

Maybe we should open a new thread about this, put an entry in the
first CF for v16 under bugs, and come back to it after we branch.
Estimating the cache hit ratio as lower than it will actually be
reduces the chances of the planner switching to a Nested Loop /
Memoize plan instead of a Hash or Merge Join plan.

I was already fairly concerned that Memoize could cause performance
regressions when the ndistinct value or expected cache entry size is
underestimated, or when the outer-side row count is overestimated.
What I've got to calculate the cache hit ratio does seem incorrect
given what you're showing; however, it adds an element of pessimism
and reduces the chances of a bad plan being picked when work_mem is
too low to cache all entries.  Swapping it out for your formula would
likely increase the chances of a Memoize plan being picked when the
row, ndistinct and cache entry size estimates are correct; however,
it could also increase the chance of a bad plan being picked in cases
where those estimates are incorrect.
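
To put rough numbers on that risk: with the estimates above, the new
formula predicts a 20% hit ratio; but if ndistinct turned out to be,
say, 8,000 rather than the estimated 2,000, the actual ratio would be
((10,000 - 8,000) / 10,000) * (500 / 8,000) = 0.2 * 0.0625 = 1.25%,
so a Memoize plan picked on the strength of the 20% estimate could
perform far worse than expected, whereas the more pessimistic 5%
figure quoted above offers some protection in exactly that case.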

My problem with changing this now would be that we already often
perform Nested Loop joins when a Hash or Merge join would be a better
option. I'd hate to take us in a direction where we make that problem
even worse.

David


