Re: Hash join explain is broken

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Hash join explain is broken
Дата
Msg-id 20190618070027.a3sh7z7afgsdyzyy@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: Hash join explain is broken  (Andres Freund <andres@anarazel.de>)
Ответы Re: Hash join explain is broken  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers
Hi,

On 2019-06-13 16:23:34 -0700, Andres Freund wrote:
> On June 13, 2019 3:38:47 PM PDT, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >Andres Freund <andres@anarazel.de> writes:
> >> I am too tired to look further into this. I suspect the only reason
> >we
> >> didn't previously run into trouble with the executor stashing
> >hashkeys
> >> manually at a different tree level with:
> >> ((HashState *) innerPlanState(hjstate))->hashkeys
> >> is that hashkeys itself isn't printed...
> >
> >TBH, I think 5f32b29c is just wrong and should be reverted for now.
> >If there's a need to handle those expressions differently, it will
> >require some cooperation from the planner not merely a two-line hack
> >in executor startup.  That commit didn't include any test case or
> >other demonstration that it was solving a live problem, so I think
> >we can leave it for v13 to address the issue.
> 
> I'm pretty sure you'd get an assertion failure if you reverted it
> (that's why it was added). So it's a bit more complicated than that.
> Unfortunately I'll not get back to work until Monday, but I'll spend
> time on this then.

Indeed, there are assertion failures when initializing the expression
with HashJoinState as parent - that's because when computing the
hashvalue for nodeHash input, we expect the slot from the node below to
be of the type that HashState returns (as that's what INNER_VAR for an
expression at the HashJoin level refers to), rather than the type of the
input to HashState.  We could work around that by marking the slots from
underlying nodes as being of an unknown type, but that'd slow down
execution.

I briefly played with the dirty hack of set_deparse_planstate()
setting dpns->inner_planstate = ps for IsA(ps, HashState), but that
seems just too ugly.

I think the most straight-forward fix might just be to just properly
split the expression at plan time. Adding workarounds for things as
dirty as building an expression for a subsidiary node in the parent, and
then modifying the subsidiary node from the parent, doesn't seem like a
better way forward.

The attached *prototype* does so.

If we go that way, we probably need to:
- Add a test for the failure case at hand
- check a few of the comments around inner/outer in nodeHash.c
- consider moving the setrefs.c code into its own function?
- probably clean up the naming scheme in createplan.c

I think there's a few more things we could do, although it's not clear
that that needs to happen in v12:
- Consider not extracting hj_OuterHashKeys, hj_HashOperators,
  hj_Collations out of HashJoin->hashclauses, and instead just directly
  handing them individually in the planner.  create_mergejoin_plan()
  already partially does that.

Greetings,

Andres Freund

PS: If I were to write hashjoin today, it sure wouldn't be as two nodes
- it seems pretty clear that the boundaries are just too fuzzy. To the
point that I wonder if it'd not be worth merging them at some point.

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: PG 12 beta 1 segfault during analyze
Следующее
От: Andres Freund
Дата:
Сообщение: Re: PG 12 beta 1 segfault during analyze