Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Дата
Msg-id CAApHDvo2acQSogMCa3hB7moRntXWHO8G+WSwhyty2+c8vYRq3A@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Hybrid Hash/Nested Loop joins and caching results from subplans  (Andres Freund <andres@anarazel.de>)
Ответы Re: Hybrid Hash/Nested Loop joins and caching results from subplans  (David Rowley <dgrowleyml@gmail.com>)
Re: Hybrid Hash/Nested Loop joins and caching results from subplans  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-hackers
On Wed, 26 Aug 2020 at 03:52, Andres Freund <andres@anarazel.de> wrote:
>
> On 2020-08-25 20:48:37 +1200, David Rowley wrote:
> > However, given the correct planner choice, there will never be
> > a gross slowdown due to having the extra node.
>
> There'll be a significant reduction in increase in performance.

So I did a very rough-cut change to the patch to have the caching be
part of Nested Loop.  It can be applied on top of the other 3 v7
patches.

For the performance, the test I did results in the performance
actually being reduced from having the Result Cache as a separate
node.  The reason for this is mostly because Nested Loop projects.
Each time I fetch a MinimalTuple from the cache, the patch will deform
it in order to store it in the virtual inner tuple slot for the nested
loop. Having the Result Cache as a separate node can skip this step as
it's result tuple slot is a TTSOpsMinimalTuple, so we can just store
the cached MinimalTuple right into the slot without any
deforming/copying.

Here's an example of a query that's now slower:

select count(*) from hundredk hk inner join lookup100 l on hk.one = l.a;

In this case, the original patch does not have to deform the
MinimalTuple from the cache as the count(*) does not require any Vars
from it.   With the rough patch that's attached, the MinimalTuple is
deformed in during the transformation during ExecCopySlot(). The
slowdown exists no matter which column of the hundredk table I join to
(schema in [1]).

Performance comparison is as follows:

v7 (Result Cache as a separate node)
postgres=# explain (analyze, timing off) select count(*) from hundredk
hk inner join lookup l on hk.one = l.a;
 Execution Time: 652.582 ms

v7 + attached rough patch
postgres=# explain (analyze, timing off) select count(*) from hundredk
hk inner join lookup l on hk.one = l.a;
 Execution Time: 843.566 ms

I've not yet thought of any way to get rid of the needless
MinimalTuple deform.  I suppose the cache could just have already
deformed tuples, but that requires more memory and would result in a
worse cache hit ratios for workloads where the cache gets filled.

I'm open to ideas to make the comparison fairer.

(Renamed the patch file to .txt to stop the CFbot getting upset with me)

David

[1] https://www.postgresql.org/message-id/CAApHDvrPcQyQdWERGYWx8J+2DLUNgXu+fOSbQ1UscxrunyXyrQ@mail.gmail.com

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Stephen Frost
Дата:
Сообщение: Re: New default role- 'pg_read_all_data'
Следующее
От: David Rowley
Дата:
Сообщение: Re: Hybrid Hash/Nested Loop joins and caching results from subplans