Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Поиск

Список

Период

Сортировка

От	Simon Riggs
Тема	Re: Hybrid Hash/Nested Loop joins and caching results from subplans
Дата	20 мая 2020 г. 15:56:34
Msg-id	CANP8+jLkh_g7Eu-=WoNP7P-Mi2Ye=Kt-5kceho80Xnd8xWrvfw@mail.gmail.com обсуждение исходный текст
Ответ на	Hybrid Hash/Nested Loop joins and caching results from subplans (David Rowley <dgrowleyml@gmail.com>)
Ответы	Re: Hybrid Hash/Nested Loop joins and caching results from subplans (David Rowley <dgrowleyml@gmail.com>) Re: Hybrid Hash/Nested Loop joins and caching results from subplans (Andy Fan <zhihui.fan1213@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения

On Wed, 20 May 2020 at 12:44, David Rowley <dgrowleyml@gmail.com> wrote:

Hackers,

Over on [1], Heikki mentioned about the usefulness of caching results
from parameterized subplans so that they could be used again for
subsequent scans which have the same parameters as a previous scan.
On [2], I mentioned that parameterized nested loop joins could see
similar gains with such a cache. I suggested there that instead of
adding code that only allows this to work for subplans, that instead,
we add a new node type that can handle the caching for us. We can
then just inject that node type in places where it seems beneficial.

Very cool

I've attached a patch which implements this. The new node type is
called "Result Cache". I'm not particularly wedded to keeping that
name, but if I change it, I only want to do it once. I've got a few
other names I mind, but I don't feel strongly or confident enough in
them to go and do the renaming.

How the caching works:

First off, it's only good for plugging in on top of parameterized
nodes that are rescanned with different parameters. The cache itself
uses a hash table using the simplehash.h implementation. The memory
consumption is limited to work_mem. The code maintains an LRU list and
when we need to add new entries but don't have enough space to do so,
we free off older items starting at the top of the LRU list. When we
get a cache hit, we move that entry to the end of the LRU list so that
it'll be the last to be evicted.

When should we cache:

For nested loop joins, the decision is made purely based on cost.

I thought the main reason to do this was the case when the nested loop subplan was significantly underestimated and we realize during execution that we should have built a hash table. So including this based on cost alone seems to miss a trick.

The patch does rely heavily on good ndistinct estimates.

Exactly. We know we seldom get those with many-way joins.

So +1 for adding this technique. My question is whether it should be added as an optional facility of a parameterised sub plan, rather than an always-needed full-strength node. That way the choice of whether to use it can happen at execution time once we notice that we've been called too many times.

Simon Riggs http://www.2ndQuadrant.com/
Mission Critical Databases

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Atsushi Torikoshi
Дата: 20 мая 2020 г., 15:56:04
Сообщение: Re: Is it useful to record whether plans are generic or custom?

Следующее

От: Vik Fearing
Дата: 20 мая 2020 г., 16:04:20
Сообщение: Re: SEARCH and CYCLE clauses

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Hybrid Hash/Nested Loop joins and caching results from subplans

Предыдущее

Следующее