Обсуждение: Missing constant propagation in planner on hash quals causes joinslowdown
Hi Hackers,
By optimising our application I stumbled over the join quals used very often in our application.
In general this concerns datasets, which are subdivided into chunks, like years, seasons (here half a year), multiple
tenantsin OLTP system etc. 
In these cases many tables are joined only to data of the same chunk, identified always by a separating column in the
table(here: xx_season). 
(I tested it on PG 12.0 on Windows 64 bit, but similar results on older stable releases and other OS)
Here is the test case, also in the attached file:
(I choose to join 2 tables with 2 seasons (2 and 3) of about 1 million of records for every season. I put some
randomnessin the table creation to simulate the normal situation in OLTP systems) 
----------------------------------- Source start
drop table if exists tmaster;
create table tmaster (
id_t1 integer,
t1_season integer,
t1_id_t2 integer,
t1_value integer,
t1_cdescr varchar,
primary key (id_t1)
);
--
select setseed (0.34512);
insert into tmaster
select
 inum
,iseason
,row_number () over () as irow
,irandom
,'TXT: '||irandom::varchar
from (
select
 inum::integer
,((inum>>20)+2)::integer as iseason
,inum::integer + (500000*random())::integer as irandom
from generate_series (1,(1<<21)) as inum
order by irandom
)qg
;
alter table tmaster add constraint uk_master_season_id unique (t1_season,id_t1);
drop table if exists tfact;
create table tfact (
id_t2 integer,
t2_season integer,
t2_value integer,
t2_cdescr varchar,
primary key (id_t2)
);
--
select setseed (-0.76543);
insert into tfact
select
 qg.*
,'FKT: '||irandom::varchar
from (
select
 inum::integer
,((inum>>20)+2)::integer as iseason
,inum::integer + (500000*random())::integer as irandom
from generate_series (1,(1<<21)) as inum
order by irandom
)qg
;
alter table tfact add constraint uk_fact_season_id unique (t2_season,id_t2);
-----------------
-- slower:
explain (analyze, verbose, costs, settings, buffers)
select *
from tmaster
left join tfact on id_t2=t1_id_t2 and t2_season=t1_season
where t1_season=3
;
-- faster by setting a constant in left join on condition:
explain (analyze, verbose, costs, settings, buffers)
select *
from tmaster
left join tfact on id_t2=t1_id_t2 and t2_season=3 --t1_season
where t1_season=3
;
----------------------------------- Source end
The results for the first query:
explain (analyze, verbose, costs, settings, buffers)
select *
from tmaster
left join tfact on id_t2=t1_id_t2 and t2_season=t1_season
where t1_season=3
;
                                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=53436.01..111610.15 rows=1046129 width=52) (actual time=822.784..2476.573 rows=1048576 loops=1)
   Output: tmaster.id_t1, tmaster.t1_season, tmaster.t1_id_t2, tmaster.t1_value, tmaster.t1_cdescr, tfact.id_t2,
tfact.t2_season,tfact.t2_value, tfact.t2_cdescr 
   Inner Unique: true
   Hash Cond: ((tmaster.t1_season = tfact.t2_season) AND (tmaster.t1_id_t2 = tfact.id_t2))
   Buffers: shared hit=2102193, temp read=10442 written=10442
   ->  Index Scan using uk_master_season_id on public.tmaster  (cost=0.43..32263.38 rows=1046129 width=28) (actual
time=0.008..565.222rows=1048576 loops=1) 
         Output: tmaster.id_t1, tmaster.t1_season, tmaster.t1_id_t2, tmaster.t1_value, tmaster.t1_cdescr
         Index Cond: (tmaster.t1_season = 3)
         Buffers: shared hit=1051086
   ->  Hash  (cost=31668.49..31668.49 rows=1043473 width=24) (actual time=820.960..820.961 rows=1048576 loops=1)
         Output: tfact.id_t2, tfact.t2_season, tfact.t2_value, tfact.t2_cdescr
         Buckets: 524288 (originally 524288)  Batches: 4 (originally 2)  Memory Usage: 28673kB
         Buffers: shared hit=1051107, temp written=4316
         ->  Index Scan using uk_fact_season_id on public.tfact  (cost=0.43..31668.49 rows=1043473 width=24) (actual
time=0.024..598.648rows=1048576 loops=1) 
               Output: tfact.id_t2, tfact.t2_season, tfact.t2_value, tfact.t2_cdescr
               Index Cond: (tfact.t2_season = 3)
               Buffers: shared hit=1051107
 Settings: effective_cache_size = '8GB', random_page_cost = '1', temp_buffers = '32MB', work_mem = '32MB'
 Planning Time: 0.627 ms
 Execution Time: 2502.702 ms
(20 rows)
and for the second one:
explain (analyze, verbose, costs, settings, buffers)
select *
from tmaster
left join tfact on id_t2=t1_id_t2 and t2_season=3 --t1_season
where t1_season=3
;
                                                                            QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Hash Left Join  (cost=50827.33..106255.38 rows=1046129 width=52) (actual time=758.086..2313.175 rows=1048576 loops=1)
   Output: tmaster.id_t1, tmaster.t1_season, tmaster.t1_id_t2, tmaster.t1_value, tmaster.t1_cdescr, tfact.id_t2,
tfact.t2_season,tfact.t2_value, tfact.t2_cdescr 
   Inner Unique: true
   Hash Cond: (tmaster.t1_id_t2 = tfact.id_t2)
   Buffers: shared hit=2102193, temp read=9024 written=9024
   ->  Index Scan using uk_master_season_id on public.tmaster  (cost=0.43..32263.38 rows=1046129 width=28) (actual
time=0.009..549.793rows=1048576 loops=1) 
         Output: tmaster.id_t1, tmaster.t1_season, tmaster.t1_id_t2, tmaster.t1_value, tmaster.t1_cdescr
         Index Cond: (tmaster.t1_season = 3)
         Buffers: shared hit=1051086
   ->  Hash  (cost=31668.49..31668.49 rows=1043473 width=24) (actual time=756.125..756.125 rows=1048576 loops=1)
         Output: tfact.id_t2, tfact.t2_season, tfact.t2_value, tfact.t2_cdescr
         Buckets: 524288  Batches: 4  Memory Usage: 18711kB
         Buffers: shared hit=1051107, temp written=4317
         ->  Index Scan using uk_fact_season_id on public.tfact  (cost=0.43..31668.49 rows=1043473 width=24) (actual
time=0.025..584.652rows=1048576 loops=1) 
               Output: tfact.id_t2, tfact.t2_season, tfact.t2_value, tfact.t2_cdescr
               Index Cond: (tfact.t2_season = 3)
               Buffers: shared hit=1051107
 Settings: effective_cache_size = '8GB', random_page_cost = '1', temp_buffers = '32MB', work_mem = '32MB'
 Planning Time: 0.290 ms
 Execution Time: 2339.651 ms
(20 rows)
By replacing the =t1_season with =3 the query took about 160 ms less or about 7 percent.
Both queries are logically equivalent. The planner correctly identifies the Index Cond: (tfact.t2_season = 3) for
selectingfrom the index uk_fact_season_id. 
But in the slower query the outer hash condition still hashes with the column t1_season and t2_season as in
Hash Cond: ((tmaster.t1_season = tfact.t2_season) AND (tmaster.t1_id_t2 = tfact.id_t2)).
This can only be detected with explain analyze verbose when the hash cond are shown.
The first query notation with and t2_season=t1_season is much more natural, as it requires only one numerical constant
toget good query Speed (often many fact tables are joined). 
The inclusion of the xx_season quals reduces the processed dataset and helps also when the seasons columns are used for
listpartitioning of all the involved tables. 
When omitting it, the whole fact table will be joined.
To me it seems that the "constantness" is not propagated to all equivalence columns and not considered in hash joining.
Unfortunely I am not in the position to write a patch, so I would appreciate any help to get this optimization
realized.
Much thanks in advance
Hans Buschmann
			
		Вложения
On Fri, Oct 18, 2019 at 03:40:34PM +0000, Hans Buschmann wrote:
>
> ...
>
>Both queries are logically equivalent. The planner correctly identifies
>the Index Cond: (tfact.t2_season = 3) for selecting from the index
>uk_fact_season_id.
>
Are those queries actually equivalent? I've been repeatedly bitten by
nullability in left join queries, when playing with optimizations like
this, so maybe this is one of such cases?
This seems to be happening because distribute_qual_to_rels() does this:
     ...
     else if (bms_overlap(relids, outerjoin_nonnullable))
     {
         /*
          * The qual is attached to an outer join and mentions (some of the)
          * rels on the nonnullable side, so it's not degenerate.
          *
          * We can't use such a clause to deduce equivalence (the left and
          * right sides might be unequal above the join because one of them has
          * gone to NULL) ... but we might be able to use it for more limited
          * deductions, if it is mergejoinable.  So consider adding it to the
          * lists of set-aside outer-join clauses.
          */
         is_pushed_down = false;
         ...
     }
     ...
and the clause does indeed reference the nullable side of the join,
preventing us from marking the clause as pushed-down.
I haven't managed to construct a query that would break this, though.
I.e. a case where the two queries would give different results. So maybe
those queries actually are redundant. Or maybe the example would need to
be more complicated (requiring more joins, or something like that).
regards
-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
			
		AW: Missing constant propagation in planner on hash quals causes joinslowdown
От
 
		    	Hans Buschmann
		    Дата:
		        
Thanks for looking at it.
I think these two queries are equivalent, as shown by the explain.
In both cases the index scan only selects tuples with xx_season=3 as shown in both explains:
         Index Cond: (tmaster.t1_season = 3)
               Index Cond: (tfact.t2_season = 3)
So no tuple can have a null value for xx_season.
My point is the construction of the hash table, wich includes the t2_season even if it is constant and not null. From
explain:
with overhead:
   Hash Cond: ((tmaster.t1_season = tfact.t2_season) AND (tmaster.t1_id_t2 = tfact.id_t2))
optimized:
   Hash Cond: (tmaster.t1_id_t2 = tfact.id_t2)
The planner correctly sets the index conditions (knows that the xx_season columns are constant), but fails to apply
thisconstantness to the hash conditions by discarding a constant column in a hash table. 
In my real application most of the xx_season columns are declared not null, but this should not change the outcome.
The performance difference is slightly lower when the created tables are previously analyzed (what I forgot).
But the percentual gain is much higher considering only the construction of the hash table, the only part of the query
executionaltered by this optimization. 
In my opinion this scenario could be quite common in multi-tenant cases, in logging, time based data sets etc.
I tried to look at the pg source code but could not yet find the place where the hash conditions are selected and
potentiallytested. 
When optimizing the constants away there my be a special case where all hash conditions are constants, so a hash table
hasnot to be build (or at least one hash cond has to be preserved).  
Hans Buschmann
			
		Hans Buschmann <buschmann@nidsa.net> writes:
> The planner correctly sets the index conditions (knows that the xx_season columns are constant), but fails to apply
thisconstantness to the hash conditions by discarding a constant column in a hash table. 
Yeah.  The reason for this behavior is that, after
reconsider_outer_join_clauses has decided that it can derive
tfact.t2_season = 3 from the given conditions, it still "throws back"
the original join condition tmaster.t1_season = tfact.t2_season into
the set of ordinary join clauses, so that that condition will still get
applied at the time the join is computed.  The comment about it claims
 * If we don't find any match for a set-aside outer join clause, we must
 * throw it back into the regular joinclause processing by passing it to
 * distribute_restrictinfo_to_rels().  If we do generate a derived clause,
 * however, the outer-join clause is redundant.  We still throw it back,
 * because otherwise the join will be seen as a clauseless join and avoided
 * during join order searching; but we mark it as redundant to keep from
 * messing up the joinrel's size estimate.
However, this seems to be a lie, or at least not the whole truth.  If you
try diking out the throw-back logic, which is simple enough to do, you'll
immediately find that some test cases in join.sql give the wrong answers.
The reason is that once we've derived tfact.t2_season = 3 and asserted
that that's an equivalence condition, the eclass logic believes that
tmaster.t1_season = tfact.t2_season must hold everywhere (that's more or
less the definition of an eclass).  But *it doesn't hold* above the left
join, because tfact.t2_season could be null instead.  In particular this
can break any higher joins involving tfact.t2_season.  By treating
tmaster.t1_season = tfact.t2_season as an ordinary join clause we force
the necessary tests to be made anyway, independently of the eclass logic.
(There's no bug in Hans' example because there's only one join; the
problem is not really with this particular clause, but with other columns
that might also be thought equal to tfact.t2_season.  It's those upper
join clauses that can't safely be thrown away.)
I've had a bee in my bonnet for a long time about redesigning all this
to be less klugy.  Fundamentally what we lack is an honest representation
that a given value might be NULL instead of the original value from its
table; this results in assorted compromises both here and elsewhere in
the planner.  The rough sketch that's been lurking in my hindbrain is
1) Early in the planner where we flatten join alias variables, do so
only when they actually are formally equivalent to the input variables,
ie only for the non-nullable side of any outer join.  This gives us
the desired representation distinction between original and
possibly-nulled values: the former are base-table Vars, the latter
are join Vars.
2) tmaster.t1_season = tfact.t2_season could be treated as an honest
equivalence condition *between those variables*, but it would not
imply anything about the join output variable "j.t2_season".
I think reconsider_outer_join_clauses goes away entirely.
3) Places where we're trying to estimate variable values would have
to be taught to look through unflattened join alias variables to see
what they're based on.  For extra credit they could assign some
higher-than-normal probability that the value is null (though that
could be done later, since there's certainly nothing accounting
for that today).
4) The final flattening of these alias variables would probably not
happen till setrefs.c.
5) I have not thought through what this implies for full joins.
The weird COALESCE business might go away entirely, or it might
be something that setrefs.c still has to insert.
6) There are lots of other, related kluges that could stand to be revisited
at the same time.  The business with "outer-join delayed" clauses is
a big example.  Maybe that all just magically goes away once we have
a unique understanding of what value a Var represents, but I've not
thought hard about it.  We'd certainly need some new understanding of
how to schedule outer-join clauses, since their Var membership would
no longer correspond directly to sets of baserels.  There might be
connections to the "PlaceHolderVar" mess as well.
This is a pretty major change and will doubtless break stuff throughout
the planner.  I'm also not clear on the planner performance implications;
both point (3) and the need for two rounds of alias flattening seem like
they'd slow things down.  But maybe we could buy some speed back by
eliminating kluges, and there is a hope that we'd end up with better
plans in a useful number of cases.
Anyway, the large amount of work involved and the rather small benefits
we'd probably get have discouraged me from working on this.  But maybe
someday it'll get to the top of the queue.  Basically this is all
technical debt left over from the way we bolted outer joins onto the
original planner design, so we really oughta fix it sometime.
            regards, tom lane