significant semi join overestimates (with CTEs)

From: Tomas Vondra
Subject: significant semi join overestimates (with CTEs)
Date:
Msg-id: 56CA41EB.8070101@2ndquadrant.com
Responses: Re: significant semi join overestimates (with CTEs)  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
Hi,

while investigating a poor query plan choice, I've noticed suspicious
semi join estimates looking like this:

  ->  Nested Loop  (cost=35.52..1787.31 rows=281653470 width=566)
        Buffers: shared hit=9305
        ->  HashAggregate  (cost=34.81..36.81 rows=200 width=32)
              Group Key: data.column1
              ->  CTE Scan on data  (cost=0.00..30.94 rows=1547 width=32)
        ->  Index Scan using t_pkey on t (cost=0.71..8.74 rows=1 ...)
              Index Cond: ((a = data.column1) AND (...))
              Buffers: shared hit=9305

Notice the cardinality estimates - we expect the outer relation to
produce 200 rows, with a single matching row in the inner relation for
each of them. Yet the join cardinality is estimated at ~280M rows, which
is rather excessive (it even exceeds the cartesian product cardinality).

So, how could this happen? Quite easily, actually ...

Consider a simple table

    CREATE TABLE t (a INT PRIMARY KEY, ...);

and this query:

WITH data AS (VALUES (1), (2), (3), ... list of more IDs ...)
SELECT * FROM t
WHERE a IN (SELECT column1 FROM data);

This is a simple semi join - the planner will do a HashAggregate on the
CTE to find the list of distinct values, and then it will do a lookup on
the 't' table to find the matching rows. There may be additional
conditions, as suggested by the initial plan, but let's ignore those for now.
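
To make this concrete, here's a minimal self-contained sketch of the
setup (this is not one of the attached scripts - the table size and the
generate_series population are made up; the only important bit is that
the VALUES list has more than 200 entries, for reasons explained below):

    -- hypothetical reproduction sketch, sizes are arbitrary
    CREATE TABLE t (a INT PRIMARY KEY, b TEXT);
    INSERT INTO t SELECT i, md5(i::text) FROM generate_series(1, 1000000) i;
    ANALYZE t;

    -- with a VALUES list of more than 200 entries, the semi join gets
    -- estimated at roughly 0.5 * rows(t), not at ~(number of values)
    EXPLAIN
    WITH data AS (VALUES (1), (2), (3) /* ... more than 200 entries ... */)
    SELECT * FROM t
    WHERE a IN (SELECT column1 FROM data);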

Now, the CTE actually has 1547 values in it, but the HashAggregate is
estimated to output just 200 values. Where does that come from?

Turns out, when there are no stats for the relation (e.g. when it's a
CTE), get_variable_numdistinct() behaves differently depending on the
cardinality of the relation. Essentially, this happens:

    if (rows < DEFAULT_NUM_DISTINCT)
    {
        *isdefault = false;
        return rows;
    }
    else
    {
        *isdefault = true;
        return DEFAULT_NUM_DISTINCT;
    }

i.e. we cap the ndistinct estimate at DEFAULT_NUM_DISTINCT, and the
'isdefault' flag ends up with a different value depending on which
branch was taken.
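
The clamp is easy to observe even without a join - for example (a
sketch; the exact output is an assumption, but below 200 rows the
HashAggregate estimate should track the CTE size, and at 200 or more it
should stay at 200):

    -- with a 3-row CTE the aggregate should be estimated at ~3 rows, while
    -- a 1547-row VALUES list (as in the plan above) gets clamped to 200
    EXPLAIN
    WITH data AS (VALUES (1), (2), (3))
    SELECT DISTINCT column1 FROM data;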

Then, eqjoinsel_semi() falls through to this part of the code, as there
is no MCV for the CTE table:

     if (!isdefault1 && !isdefault2)
     {
         if (nd1 <= nd2 || nd2 < 0)
             selec = 1.0 - nullfrac1;
         else
             selec = (nd2 / nd1) * (1.0 - nullfrac1);
     }
     else
         selec = 0.5 * (1.0 - nullfrac1);

Assuming the other side of the join is a regular table with proper stats
(isdefault=false), depending on the CTE cardinality we fall into
different branches of the if condition.

When rows < 200, we end up with isdefault=false and use e.g.

     selec = (nd2 / nd1) * (1.0 - nullfrac1)

which produces a sane estimate. But with rows >= 200, we suddenly switch
to the other estimate

     selec = 0.5 * (1.0 - nullfrac1);

which completely ignores the ndistinct values and instead evaluates to
0.5 (when nullfrac1=0.0). This is the source of overestimates, because
the actual join selectivity is way lower than 0.5 and we simply do

     rows = outer_rows * jselec;

in calc_joinrel_size_estimate() for JOIN_SEMI. It's not difficult to
construct such over-estimates - attached are two scripts demonstrating
the issue, differing only in the scale of the over-estimation.

It's also possible to construct examples for the inner if condition (see
example-3.sql):

     if (nd1 <= nd2 || nd2 < 0)
         selec = 1.0 - nullfrac1;
     else
         selec = (nd2 / nd1) * (1.0 - nullfrac1);

but in this case we only switch between selec=1.0 and 0.5 (again,
assuming nullfrac=0.0). While this sudden jump in estimates is not
great, it can't result in such excessive estimates.
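
A sketch of the kind of jump this branch can produce (this is not the
attached example-3.sql; the table name, table size and list sizes are
arbitrary assumptions):

    -- 't_small' has 100 rows, so nd1 = 100 (thanks to the primary key)
    CREATE TABLE t_small (a INT PRIMARY KEY);
    INSERT INTO t_small SELECT i FROM generate_series(1, 100) i;
    ANALYZE t_small;

    -- with ~150 entries in the list: isdefault2=false, nd1 (100) <= nd2 (150),
    --   so selec = 1.0 and the semi join is estimated at ~100 rows
    -- with ~250 entries: isdefault2=true, so selec = 0.5 and the estimate
    --   drops to ~50 rows, even though more values can only match more rows
    EXPLAIN
    WITH data AS (VALUES (1), (2), (3) /* ... ~150 or ~250 entries ... */)
    SELECT * FROM t_small
    WHERE a IN (SELECT column1 FROM data);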

Clearly, this use of CTEs is a bit unnecessary and it's trivial to
replace it with a simple IN() clause.
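
For example (with the actual IDs elided, just like in the original query):

    -- the same lookup with a literal IN() list; the planner can then estimate
    -- it as a ScalarArrayOpExpr using the stats on 't.a', instead of a semi
    -- join against a stats-less CTE
    SELECT * FROM t
    WHERE a IN (1, 2, 3 /* ... list of more IDs ... */);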

Moreover, we can't really expect good estimates without stats, and I
don't think changing DEFAULT_NUM_DISTINCT from one arbitrary value to
another would help much - we'd just see the same sudden plan changes at
a different number of values in the CTE. But perhaps we can do at least
a bit better.

Firstly, we could clamp the join cardinality estimate to the size of
cartesian product, eliminating at least the most obvious over-estimates.

Secondly, I think it'd be nice to somehow eliminate the sudden jumps in
the estimates - I'm not quite sure why we need the three different
formulas in eqjoinsel_semi, so maybe we can get rid of some of them?


regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Attachments
