Обсуждение: COALESCE() query yield different result with MJ vs. NLJ/HJ

Поиск
Список
Период
Сортировка

COALESCE() query yield different result with MJ vs. NLJ/HJ

От
Qingqing Zhou
Дата:
The symptom is that the same join query yield different results with
MJ and NLJ/HJ.  Here is a repro:

---
create table t1(a int);create table t2(b int);
insert into t1 values(10); insert into t2 values(2);
analyze t1; analyze t2;
set enable_mergejoin=on; set enable_nestloop=off; set enable_hashjoin=off;
explain analyze select a, b from t1 left join  t2 on coalesce(a, 1) =
coalesce(b,1)  where (coalesce(b,1))>0
set enable_mergejoin=off; set enable_nestloop=on; set enable_hashjoin=off;
explain analyze select a, b from t1 left join  t2 on coalesce(a, 1) =
coalesce(b,1)  where (coalesce(b,1))>0
---

A possible explanation is that in fix_join_expr_mutator(), we optimize
with the case that if child node already compute an expression then
upper node shall reuse it. In MJ, as coalesce() already computed in
sort node, thus the NULL is directly used for ExecQual(>0) for join
filter.

If we take out this optimization the problem is solved but may looks
like an overkill. What's a better fix?

Thanks,
Qingqing



Re: COALESCE() query yield different result with MJ vs. NLJ/HJ

От
Tom Lane
Дата:
Qingqing Zhou <zhouqq.postgres@gmail.com> writes:
> [ this fails: ]
> set enable_mergejoin=on; set enable_nestloop=off; set enable_hashjoin=off;
> explain analyze select a, b from t1 left join  t2 on coalesce(a, 1) =
> coalesce(b,1)  where (coalesce(b,1))>0

Ugh.  The core of the problem is a mistaken assumption that "b" below the
outer join means the same thing as "b" above it.  I've suspected for years
that the planner might someday have to explicitly distinguish the two
meanings, but at least up to now we've not really gotten burnt by failing
to make the distinction.

> A possible explanation is that in fix_join_expr_mutator(), we optimize
> with the case that if child node already compute an expression then
> upper node shall reuse it. In MJ, as coalesce() already computed in
> sort node, thus the NULL is directly used for ExecQual(>0) for join
> filter.
> If we take out this optimization the problem is solved but may looks
> like an overkill. What's a better fix?

Indeed, removing that optimization altogether seems likely to break
things, not to mention being pretty inefficient.  Maybe pending a
proper fix (which I'm afraid will entail major planner redesign)
we could refuse to match anything more complex than a Var or
PlaceHolderVar if it's bubbling up from the nullable side of an
outer join.
        regards, tom lane