Re: Subplan result caching
От | David Rowley |
---|---|
Тема | Re: Subplan result caching |
Дата | |
Msg-id | CAKJS1f-kAk1cGVvzg9TXCLhPsxx_oFVOrTGSR5yTRXKWntTVFA@mail.gmail.com обсуждение исходный текст |
Ответ на | Subplan result caching (Heikki Linnakangas <hlinnaka@iki.fi>) |
Ответы |
Re: Subplan result caching
(Robert Haas <robertmhaas@gmail.com>)
Re: Subplan result caching (Michael Paquier <michael@paquier.xyz>) Re: Subplan result caching (Andy Fan <zhihui.fan1213@gmail.com>) |
Список | pgsql-hackers |
On 23 May 2018 at 21:31, Heikki Linnakangas <hlinnaka@iki.fi> wrote: > I've been working on a patch to add a little cache to SubPlans, to speed up > queries with correlated subqueries, where the same subquery is currently > executed multiple times with the same parameters. The idea is to cache the > result of the subplan, with the correlation vars as the cache key. Hi, This seems like an interesting area to make improvements, so I've signed up to review the patch. From looking at the code I see that the caching is being done inside nodeSubplan.c. I don't think this is the right approach to the problem. The problem exists for any parameterized path, so I think a more general approach would be much better. We already have Materialize nodes to cache the results of an entire subplan, and this seems to have quite a bit in common with that, only we'd want to cache multiple results with a key to determine which result set should be returned. Due to the similarities with Materialize, I think that the cache should be a node itself and not bury the cache logic in some other node type that's meant for some other purpose. "LazyMaterialize" seems like a good option for a name. It seems better than "LazyHash" since you may not want to restrict it to a hash table based cache in the future. A binary search tree may be a good option for types that cannot be hashed. Materialize nodes are injected above the inner side node of MergeJoins based on cost, so I think this node type could just do the same. Maybe something like estimate_num_groups(<exprs being compared to params>) / path->rows is below some defined constant, perhaps something like 0.5. Although experimentation would be required. It might be good to take into account some other cost factors too. I imagine we'd want to only allow this optimisation for hashjoinable types. This seems pretty natural since your cache implementation is a hash table, so, of course, we're going to need a hash function. Wondering your thoughts on this idea. I'll mark as waiting on author in the meantime. It's great to see someone working on this. -- David Rowley http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services
В списке pgsql-hackers по дате отправления:
Следующее
От: Kyotaro HORIGUCHIДата:
Сообщение: Re: Non-reserved replication slots and slot advancing