Re: BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize
Дата
Msg-id CAApHDvotKW3wt36gqw2M0yjhN0jm9yQsEzLu6BCpUXwurH8Gpg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize  (David Rowley <dgrowleyml@gmail.com>)
Ответы Re: BUG #17213: Wrong result from a query involving Merge Semi Join and Memoize  (David Rowley <dgrowleyml@gmail.com>)
Список pgsql-bugs
On Tue, 5 Oct 2021 at 20:05, David Rowley <dgrowleyml@gmail.com> wrote:
> I'm currently thinking over what the best fix for this should be. My
> current thinking is that all parameters that are from above the
> Memoize that are required below the Memoize must be part of the Cache
> Key. Currently, it seems, that's not the case and in the plan from the
> attached plan, i1.id is being used below the Memoize but originates
> from above it.
>
> I'll need to look into how exactly to determine around when
> get_memoize_path() is called how we determine if there's any subplan
> parameters that will be needed in the Memoize path's sub path that
> originates above the memoize node.

I was looking at fixes for this bug. I started out with seeing what
would be required to make paraminfo_get_equal_hashops() find all the
SubPlan parameters that are on the inner side of the join that come
from a location other than the inner side of the join.   The only way
I could think on how to do that would be to invent some way to
traverse Paths to look for vars from a given set of Relids. Right now
we just have ways to walk expression trees (expression_tree_walker).
That's not going to work here. Let's call this way "option 1".

An alternative way to fix the problem would be just to flush the cache
whenever a Param changes that is not part of the cache key. The
Materalize node works a bit like this, except it'll just flush the
cache whenever a param changes, since the Material cache is not
parameterized.   One fairly hefty drawback with doing it this way is
that the planner does not get a chance to take into account the fact
that the cache could be flushed quite often.  This could result in a
Nested Loop / Memoize join being chosen instead of a Hash or Merge
Join when the hash or merge join would be much more favourable. Let's
call this "option 2".

For option 1, I don't really like the sound of adding ways to traverse
paths in back branches. I'm also not too sure what the implication
would be for Custom scans.  With option 2, I'm a bit worried that this
would cause Memoize to be costed much cheaper than it actually is due
to the planner thinking the cache hit ratio will be much higher than
it actually is.

Can anyone think of a 3rd option?

I've attached a patch for option 2.

David

Вложения

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: conchuela timeouts since 2021-10-09 system upgrade
Следующее
От: Juan José Santamaría Flecha
Дата:
Сообщение: Re: BUG #17247: How to avoid crating multiple Foreign keys on same column on same table.