Re: Subplan result caching

Поиск

Список

Период

Сортировка

От	David Rowley
Тема	Re: Subplan result caching
Дата	9 июля 2018 г. 15:08:07
Msg-id	CAKJS1f-kAk1cGVvzg9TXCLhPsxx_oFVOrTGSR5yTRXKWntTVFA@mail.gmail.com обсуждение исходный текст
Ответ на	Subplan result caching (Heikki Linnakangas <hlinnaka@iki.fi>)
Ответы	Re: Subplan result caching (Robert Haas <robertmhaas@gmail.com>) Re: Subplan result caching (Michael Paquier <michael@paquier.xyz>) Re: Subplan result caching (Andy Fan <zhihui.fan1213@gmail.com>)
Список	pgsql-hackers

Дерево обсуждения

On 23 May 2018 at 21:31, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> I've been working on a patch to add a little cache to SubPlans, to speed up
> queries with correlated subqueries, where the same subquery is currently
> executed multiple times with the same parameters. The idea is to cache the
> result of the subplan, with the correlation vars as the cache key.

Hi,

This seems like an interesting area to make improvements, so I've
signed up to review the patch.

From looking at the code I see that the caching is being done inside
nodeSubplan.c. I don't think this is the right approach to the
problem. The problem exists for any parameterized path, so I think a
more general approach would be much better.

We already have Materialize nodes to cache the results of an entire
subplan, and this seems to have quite a bit in common with that, only
we'd want to cache multiple results with a key to determine which
result set should be returned. Due to the similarities with
Materialize, I think that the cache should be a node itself and not
bury the cache logic in some other node type that's meant for some
other purpose.

"LazyMaterialize" seems like a good option for a name. It seems better
than "LazyHash" since you may not want to restrict it to a hash table
based cache in the future.  A binary search tree may be a good option
for types that cannot be hashed.

Materialize nodes are injected above the inner side node of MergeJoins
based on cost, so I think this node type could just do the same. Maybe
something like estimate_num_groups(<exprs being compared to params>) /
path->rows is below some defined constant, perhaps something like 0.5.
Although experimentation would be required. It might be good to take
into account some other cost factors too.

I imagine we'd want to only allow this optimisation for hashjoinable
types. This seems pretty natural since your cache implementation is a
hash table, so, of course, we're going to need a hash function.

Wondering your thoughts on this idea.

I'll mark as waiting on author in the meantime.

It's great to see someone working on this.

-- 
 David Rowley                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

В списке pgsql-hackers по дате отправления:

Предыдущее

От: Kyotaro HORIGUCHI
Дата: 09 июля 2018 г., 15:07:24
Сообщение: Re: Let's remove DSM_IMPL_NONE.

Следующее

От: Kyotaro HORIGUCHI
Дата: 09 июля 2018 г., 15:14:13
Сообщение: Re: Non-reserved replication slots and slot advancing

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: Subplan result caching

Предыдущее

Следующее