Re: Improving worst-case merge join performance with often-null foreign key

Поиск
Список
Период
Сортировка
От Richard Guo
Тема Re: Improving worst-case merge join performance with often-null foreign key
Дата
Msg-id CAMbWs4-d5joRdrrgT_tkGLroTwg2On9B_q-jQmxPpCmbOY-e6w@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Improving worst-case merge join performance with often-null foreign key  (Richard Guo <guofenglinux@gmail.com>)
Список pgsql-hackers

On Sun, Apr 23, 2023 at 5:29 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Sat, Apr 22, 2023 at 11:21 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Steinar Kaldager <steinar.kaldager@oda.com> writes:
> First-time potential contributor here. We recently had an incident due
> to a sudden 1000x slowdown of a Postgres query (from ~10ms to ~10s)
> due to a join with a foreign key that was often null. We found that it
> was caused by a merge join with an index scan on one join path --
> whenever the non-null data happened to be such that the merge join
> couldn't be terminated early, the index would proceed to scan all of
> the null rows and filter each one out individually. Since this was an
> inner join, this was pointless; the nulls would never have matched the
> join clause anyway.

Hmm.  I don't entirely understand why the existing stop-at-nulls logic
in nodeMergejoin.c didn't fix this for you.  Maybe somebody has broken
that?  See the commentary for MJEvalOuterValues/MJEvalInnerValues.

I think it's just because the MergeJoin didn't see a NULL foo_id value
from test_bar tuples because all such tuples are removed by the filter
'test_bar.active', thus it does not have a chance to stop at nulls.

# select count(*) from test_bar where foo_id is null and active;
 count
-------
     0
(1 row)

Instead, the index scan on test_bar will have to scan all the tuples
with NULL foo_id because none of them satisfies the qual clause.

BTW, in Steinar's case the query runs much faster with nestloop with a
parameterized inner path, since test_foo is small and test_bar is very
large, and there is an index on test_bar.foo_id.  You can see that by
"set enable_mergejoin to off":

# EXPLAIN (costs off)
SELECT SUM(test_foo.id) FROM test_bar, test_foo WHERE test_bar.foo_id = test_foo.id AND test_foo.active AND test_bar.active;
                          QUERY PLAN
--------------------------------------------------------------
 Aggregate
   ->  Nested Loop
         ->  Seq Scan on test_foo
               Filter: active
         ->  Index Scan using test_bar_foo_id_idx on test_bar
               Index Cond: (foo_id = test_foo.id)
               Filter: active
(7 rows)

In my box the total cost and execution time of mergejoin vs nestloop
are:

                        mergejoin       nestloop

Cost estimate           1458.40         12355.15
Actual (best of 3)      3644.685 ms     13.114 ms

So it seems we have holes in cost estimate for mergejoin or nestloop
with parameterized inner path, or both.

Thanks
Richard

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Julien Rouhaud
Дата:
Сообщение: Re: pg_upgrade and logical replication
Следующее
От: "Andrey M. Borodin"
Дата:
Сообщение: Re: New committers: Nathan Bossart, Amit Langote, Masahiko Sawada