Re: strange slow query - lost lot of time somewhere

From: David Rowley
Subject: Re: strange slow query - lost lot of time somewhere
Msg-id: CAApHDvp87+kCufDkiq=7n-BD8kYxgYzvQLb_GkDK0Gs9TAQYYQ@mail.gmail.com
In response to: Re: strange slow query - lost lot of time somewhere  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: strange slow query - lost lot of time somewhere  ("David G. Johnston" <david.g.johnston@gmail.com>)
           Re: strange slow query - lost lot of time somewhere  (Justin Pryzby <pryzby@telsasoft.com>)
List: pgsql-hackers
On Tue, 3 May 2022 at 11:02, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > On Mon, 2 May 2022 at 21:00, Pavel Stehule <pavel.stehule@gmail.com> wrote:
> >> I found a query that is significantly slower with more memory
>
> > If it was work_mem you increased, it seems strange that the plan would
> > switch over to using a Nested Loop / Memoize plan.
>
> Yeah, there's something unexplained there.
>
> I think that the most probable explanation for the symptoms is that
> cost_memoize_rescan is computing some insane value for est_entries,
> causing ExecInitMemoize to allocate-and-zero a huge hash table,
> which ExecEndMemoize then frees again.  Neither of those steps
> gets counted into any plan node's runtime, but EXPLAIN's total
> execution time will include them.  An insane value for est_entries
> could perhaps go along with a cost misestimate that convinces the
> planner to include the memoize even though it seems pointless.

That seems pretty unlikely to me. est_entries is based on the minimum
of the expected total number of cache entries and the ndistinct
estimate. ndistinct cannot be insane here, as it is never going to be
higher than the number of calls, which is the row estimate from the
outer side of the join. That's 91 in both cases here. As far as I can
see, that's just going to make a table of 128 buckets.

See estimate_num_groups_incremental() at:

/*
 * We don't ever want to return an estimate of zero groups, as that tends
 * to lead to division-by-zero and other unpleasantness.  The input_rows
 * estimate is usually already at least 1, but clamp it just in case it
 * isn't.
 */
input_rows = clamp_row_est(input_rows);
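
To make that concrete, here is a minimal standalone sketch of the
arithmetic (not the planner code itself; the 0.9 fillfactor and the
power-of-two rounding are assumptions taken from simplehash.h, and the
est_cache_entries value is just a stand-in for whatever hash_mem would
permit):

#include <math.h>
#include <stdint.h>
#include <stdio.h>

int
main(void)
{
    double      calls = 91;         /* row estimate for the outer side */
    double      ndistinct = calls;  /* clamped: never exceeds calls */
    double      est_cache_entries = 65536;  /* stand-in for hash_mem limit */
    uint32_t    est_entries;
    uint32_t    nbuckets = 1;

    /* est_entries is the minimum of the two estimates */
    est_entries = (uint32_t) fmin(ndistinct, est_cache_entries);

    /*
     * simplehash sizes the table to (n / fillfactor) rounded up to the
     * next power of two, so 91 entries ends up as 128 buckets.
     */
    while (nbuckets < est_entries / 0.9)
        nbuckets <<= 1;

    printf("est_entries = %u, buckets = %u\n", est_entries, nbuckets);
    return 0;           /* prints: est_entries = 91, buckets = 128 */
}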


> I spent some time studying cost_memoize_rescan, and the only
> conclusions I arrived at were that the variable names are poorly
> chosen and the comments are unhelpful.  For instance, one would
> think that est_entry_bytes is the expected size of one cache entry,
> but it seems to actually be the total space that would be occupied
> if the whole input relation were loaded into the cache.

I think you've misunderstood. It *is* the estimated size of a single
entry. I think you might be going wrong in assuming "tuples" is the
expected number of tuples from all rescans of the inner side of the
join. It's actually from a single scan.  I can add a comment there to
make that clear.
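
To illustrate, here is a tiny sketch of that calculation (a paraphrase
of relation_byte_size() as cost_memoize_rescan() uses it, not the
actual source; the 8-byte alignment and the 23-byte heap tuple header
are assumptions for a typical build):

#include <stdio.h>

#define MAXALIGN(x)  (((x) + 7) & ~(size_t) 7)  /* assume 8-byte alignment */
#define TUPLE_HDR    23     /* assume SizeofHeapTupleHeader for this build */

/*
 * "tuples" is the row estimate for a SINGLE scan of the inner side,
 * not the total over all rescans, so this is the size of one entry.
 */
static double
est_entry_bytes(double tuples, int width)
{
    return tuples * (MAXALIGN(width) + MAXALIGN(TUPLE_HDR));
}

int
main(void)
{
    /* e.g. an inner side expected to return 10 rows of width 32 per scan */
    printf("est_entry_bytes = %.0f\n", est_entry_bytes(10, 32));
    return 0;       /* prints: est_entry_bytes = 560 */
}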

> And
> the est_cache_entries computation seems nonsensical; if it does
> make sense, the comment sure doesn't illuminate why.  So I am
> quite prepared to buy into the idea that cost_memoize_rescan is
> producing bogus answers, but it's hard to tell what it's coming out
> with in this example.  Too bad EXPLAIN doesn't print est_entries.

I'm now wishing I'd put the initial and final hash table sizes in
EXPLAIN + EXPLAIN ANALYZE output. Perhaps it's not too late to do that
for v15 so that it might help us figure these things out in the
future.
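
Something like the following, say (a hypothetical mock-up rather than
actual output; the "Hash Table" line and the table/column names are
made up for illustration):

    ->  Memoize (actual rows=1 loops=91)
          Cache Key: t.id
          Hash Table: initial buckets=128, final buckets=128
          Hits: 0  Misses: 91  Evictions: 0  Overflows: 0  Memory Usage: 1kB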

I'm open to making improvements to the comments in that area. I do
remember spending quite a bit of time trying to make things as clear
as possible, as what's going on there is fairly complex.

David


