Re: BUG #18234: Nested Loop joint strategy is ignored for a tiny table joined with UNION ALL of two filtered parts

Поиск
Список
Период
Сортировка
От Dmytro Astapov
Тема Re: BUG #18234: Nested Loop joint strategy is ignored for a tiny table joined with UNION ALL of two filtered parts
Дата
Msg-id CAFQUnFh=+KQAtjtNAWT14UQOJEBE_nSVAR6JEXfTc0wWRoU6sg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #18234: Nested Loop joint strategy is ignored for a tiny table joined with UNION ALL of two filtered parts  (Dmytro Astapov <dastapov@gmail.com>)
Список pgsql-bugs
Hello,  Tom

Thank you for the pointers and keywords! Just to be thorough, I instrumented the body of is_safe_append_member and confirmed that the "bad" query is indeed hitting this case and returning "false", preventing the pull-up:

while (IsA(jtnode, FromExpr))
{
    if (jtnode->quals != NULL)
         return false;

I've also tried to search the mailing list archives for the previous instances of this problem, now that I know less obvious keywords to use in the search. Results, for posterity (in chronological order):
  • Report from Mar 2012, where it is mentioned that "Sooner or later I'd like to relax the restriction that appendrel members can't have extra WHERE clauses, but don't hold your breath waiting..."
  • Report from Aug 2012, with conclusion "this will probably get fixed someday, but I wouldn't recommend holding your breath for it."
  • Report from 2015,which did not include the exact steps to reproduce, but is clearly the same problem
  • Bug report from October 2023 (bug #18252) which I missed in my search before bug submission. It seems like Richard Guo has an initial attempt to fix it, and  asks for "more thoughts", and as of today there was no followup. Perhaps this is the "previous discussions that identified some semantic issues with trying to just attach WHERE clauses to appendrel members" that Tom is referring to? 
However, apart from bug #18252 I was unable to find any other discussions about whether WHERE clauses could be attached to appendrel members or not (though, of course, I can't claim that I did a 100% thorough job, me not being a postgres developer and all).

Given that this limitation seems to have existed since at least 2012, if not earlier, it seems fair to assume that this is unlikely to be fixed - unless the patch mentioned in #18252 would be deemed worth submitting by Richard Guo?

However, if the fix does not seem worthwhile, perhaps it would be prudent to mention this limitation in the documentation for UNION, along the lines of "when using UNION ALL to combine subqueries that include WHERE conditions, the resulting query would always be optimized separately and you are likely to miss many/all optimization opportunities provided by indices on the tables that you UNION". What do you think? 

I think that in the absence of documentation this limitation is very hard to discover. As I mentioned in the original bug report, the optimizer would happily push WHERE conditions into UNION ALL, which makes it look like the equivalent step, done manually, should also work - and yet it does not, and makes the execution plan unexplainably and massively worse.

--
Best regards, Dmytro


On Fri, Dec 8, 2023 at 1:19 AM Dmytro Astapov <dastapov@gmail.com> wrote:
Hi!

To quickly address the valid point about non-self-contained bug reports, here is the export of the schema from the DB Fiddle:

/* Huge inheritance-partitioned table */
create table huge(id bigint, filter_out bool);
create table huge_partition1(id bigint, filter_out bool);
create table huge_partition2(id bigint, filter_out bool);
alter table huge_partition1 inherit huge;
alter table huge_partition2 inherit huge;

insert into huge_partition1(id, filter_out)
  select id, mod(id,7) = 0
  from generate_series(1,100000) id;

insert into huge_partition2(id, filter_out)
  select id, mod(id,7) = 0
  from generate_series(1,100000) id;

create index on huge_partition1(id);
create index on huge_partition2(id);
analyze huge_partition1;
analyze huge_partition2;

/* Medium inheritance-partitioned table (same structure, but 100x smaller) */
create table medium(id bigint, filter_out bool);
create table medium_partition1(id bigint, filter_out bool);
create table medium_partition2(id bigint, filter_out bool);
alter table medium_partition1 inherit medium;
alter table medium_partition2 inherit medium;

insert into medium_partition1(id, filter_out)
  select id, mod(id,7) = 0
  from generate_series(1,1000) id;

insert into medium_partition2(id, filter_out)
  select id, mod(id,7) = 0
  from generate_series(1,1000) id;

create index on medium_partition1(id);
create index on medium_partition2(id);
analyze medium_partition1;
analyze medium_partition2;

/* Tiny table of just 5 values */
create table tiny(id bigint);
insert into tiny(id) values (100),(200),(300),(400),(500);
analyze tiny;

/* Views that UNION ALL all non-filtered rows of HUGE and MEDIUM */
create view vw_broken as
  select id from huge where filter_out
  union all
  select id from medium where filter_out;

create view vw_not_broken as
  select id,filter_out from (
    select id,filter_out from huge
    union all
    select id,filter_out from medium
  ) q
  where filter_out;

/* This query does NOT use nested loops unexpectedly */
explain select * from tiny join vw_broken on tiny.id = vw_broken.id;

QUERY PLAN
---------------------------------------------------------------------------------------------
Hash Join  (cost=1.11..3798.30 rows=712 width=16)
  Hash Cond: (huge.id = tiny.id)
  ->  Append  (cost=0.00..3683.32 rows=28466 width=8)
        ->  Append  (cost=0.00..3222.91 rows=28181 width=8)
              ->  Seq Scan on huge huge_1  (cost=0.00..0.00 rows=1 width=8)
                    Filter: filter_out
              ->  Seq Scan on huge_partition1 huge_2  (cost=0.00..1541.00 rows=13987 width=8)
                    Filter: filter_out
              ->  Seq Scan on huge_partition2 huge_3  (cost=0.00..1541.00 rows=14193 width=8)
                    Filter: filter_out
        ->  Append  (cost=0.00..33.42 rows=285 width=8)
              ->  Seq Scan on medium medium_1  (cost=0.00..0.00 rows=1 width=8)
                    Filter: filter_out
              ->  Seq Scan on medium_partition1 medium_2  (cost=0.00..16.00 rows=142 width=8)
                    Filter: filter_out
              ->  Seq Scan on medium_partition2 medium_3  (cost=0.00..16.00 rows=142 width=8)
                    Filter: filter_out
  ->  Hash  (cost=1.05..1.05 rows=5 width=8)
        ->  Seq Scan on tiny  (cost=0.00..1.05 rows=5 width=8)


/* This query DOES use nested loops as expected */
explain select * from tiny join vw_not_broken on tiny.id = vw_not_broken.id;

QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
Nested Loop  (cost=0.00..151.53 rows=712 width=17)
  ->  Seq Scan on tiny  (cost=0.00..1.05 rows=5 width=8)
  ->  Append  (cost=0.00..30.04 rows=6 width=9)
        ->  Seq Scan on huge  (cost=0.00..0.00 rows=1 width=9)
              Filter: (filter_out AND (tiny.id = id))
        ->  Index Scan using huge_partition1_id_idx on huge_partition1 huge_1  (cost=0.29..8.31 rows=1 width=9)
              Index Cond: (id = tiny.id)
              Filter: filter_out
        ->  Index Scan using huge_partition2_id_idx on huge_partition2 huge_2  (cost=0.29..8.31 rows=1 width=9)
              Index Cond: (id = tiny.id)
              Filter: filter_out
        ->  Seq Scan on medium  (cost=0.00..0.00 rows=1 width=9)
              Filter: (filter_out AND (tiny.id = id))
        ->  Index Scan using medium_partition1_id_idx on medium_partition1 medium_1  (cost=0.28..6.69 rows=1 width=9)
              Index Cond: (id = tiny.id)
              Filter: filter_out
        ->  Index Scan using medium_partition2_id_idx on medium_partition2 medium_2  (cost=0.28..6.69 rows=1 width=9)
              Index Cond: (id = tiny.id)
              Filter: filter_out

--
Best regards, Dmytro


On Thu, Dec 7, 2023 at 9:53 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
PG Bug reporting form <noreply@postgresql.org> writes:
> Summary of the issue: for a (5-row recordset) JOIN (massive partitioned
> recordset indexed by id) USING (id), the (Nested Loop over 5 values)
> strategy is completely ignored, and Hash Join or Merge Join is done instead,
> which does SeqScan over the "massive recordset".

> Reproduction in DB Fiddle:
> https://www.db-fiddle.com/f/sJUUWNgW7pqPWcJwihVoj5/1 (this demonstrates both
> the bad behaviour and a way to work around it)

We are generally not too happy with non-self-contained bug reports.
Once that DB Fiddle entry disappears, this bug report will be useless.
However ...

> 1)The massive recordset on the right side of the JOIN must come from the
> UNION ALL of two parts, both of which have a filter, like this view in my
> reproduction:

> create view vw_broken as
>   select id from huge where filter_out
>   union all
>   select id from medium where filter_out;

I suspect the WHERE clauses trigger the problem because the resulting
sub-selects can't be pulled up to become an "appendrel", per
is_safe_append_member:

     * It's only safe to pull up the child if its jointree contains exactly
     * one RTE, else the AppendRelInfo data structure breaks. The one base RTE
     * could be buried in several levels of FromExpr, however.  Also, if the
     * child's jointree is completely empty, we can pull up because
     * pull_up_simple_subquery will insert a single RTE_RESULT RTE instead.
     *
     * Also, the child can't have any WHERE quals because there's no place to
     * put them in an appendrel.  (This is a bit annoying...)

That means the sub-selects will be planned independently and there's
no chance to consider the nestloop-with-inner-indexscan plan you are
hoping for.

This is a longstanding wart, but improving matters would require some
fairly painstaking work.  The "appendrel" mechanism is core to both
traditional inheritance and partitioning; I don't recommend trying
to blow it up and start over.  I vaguely recall previous discussions
that identified some semantic issues with trying to just attach
WHERE clauses to appendrel members, but it was a long time ago and
the details escape me.

                        regards, tom lane

В списке pgsql-bugs по дате отправления:

Предыдущее
От: PG Bug reporting form
Дата:
Сообщение: BUG #18236: Backend processing a parallel query terminates badly when postmaster killed with SIGKILL
Следующее
От: PG Bug reporting form
Дата:
Сообщение: BUG #18237: C:\Users\админ\AppData\Local\Temp\postgresql_installer_7ddb2d40c9