Re: Query became very slow after 9.6 -> 10 upgrade

От: Tom Lane
Тема: Re: Query became very slow after 9.6 -> 10 upgrade
Дата: ,
Msg-id: 9755.1511392027@sss.pgh.pa.us
(см: обсуждение, исходный текст)
Ответ на: Re: Query became very slow after 9.6 -> 10 upgrade  (Dmitry Shalashov)
Ответы: Re: Query became very slow after 9.6 -> 10 upgrade  (Dmitry Shalashov)
Re: Query became very slow after 9.6 -> 10 upgrade  (Dmitry Shalashov)
Список: pgsql-performance

Скрыть дерево обсуждения

Query became very slow after 9.6 -> 10 upgrade  (Dmitry Shalashov, )
 RE: Query became very slow after 9.6 -> 10 upgrade  ("Alex Ignatov", )
  Re: Query became very slow after 9.6 -> 10 upgrade  (Dmitry Shalashov, )
   RE: Query became very slow after 9.6 -> 10 upgrade  ("Alex Ignatov", )
    Re: Query became very slow after 9.6 -> 10 upgrade  (Dmitry Shalashov, )
     Re: Query became very slow after 9.6 -> 10 upgrade  (Tomas Vondra, )
 Re: Query became very slow after 9.6 -> 10 upgrade  (Tom Lane, )
  Re: Query became very slow after 9.6 -> 10 upgrade  (Dmitry Shalashov, )
   Re: Query became very slow after 9.6 -> 10 upgrade  (Tom Lane, )
    Re: Query became very slow after 9.6 -> 10 upgrade  (Dmitry Shalashov, )
     Re: Query became very slow after 9.6 -> 10 upgrade  (Tom Lane, )
      Re: Query became very slow after 9.6 -> 10 upgrade  (Dmitry Shalashov, )
       Re: Query became very slow after 9.6 -> 10 upgrade  (Tom Lane, )
        Re: Query became very slow after 9.6 -> 10 upgrade  (Dmitry Shalashov, )
         Re: Query became very slow after 9.6 -> 10 upgrade  (Michael Paquier, )
          Re: Query became very slow after 9.6 -> 10 upgrade  (Dmitry Shalashov, )
          Re: Query became very slow after 9.6 -> 10 upgrade  (Tom Lane, )
           Re: Query became very slow after 9.6 -> 10 upgrade  (Dmitry Shalashov, )
    Re: Query became very slow after 9.6 -> 10 upgrade  (Dmitry Shalashov, )
   Re: Query became very slow after 9.6 -> 10 upgrade  (Tom Lane, )

Dmitry Shalashov <> writes:
> Turns out we had not 9.6 but 9.5.

I'd managed to reproduce the weird planner behavior locally in the
regression database:

regression=# create table foo (f1 int[], f2 int);
CREATE TABLE
regression=# explain select * from tenk1 where unique2 in (select distinct unnest(f1) from foo where f2=1);
                                    QUERY PLAN
-----------------------------------------------------------------------------------
 Nested Loop  (cost=30.85..80.50 rows=6 width=244)
   ->  HashAggregate  (cost=30.57..30.63 rows=6 width=4)
         Group Key: (unnest(foo.f1))
         ->  HashAggregate  (cost=30.42..30.49 rows=6 width=4)
               Group Key: unnest(foo.f1)
               ->  ProjectSet  (cost=0.00..28.92 rows=600 width=4)
                     ->  Seq Scan on foo  (cost=0.00..25.88 rows=6 width=32)
                           Filter: (f2 = 1)
   ->  Index Scan using tenk1_unique2 on tenk1  (cost=0.29..8.30 rows=1 width=244)
         Index Cond: (unique2 = (unnest(foo.f1)))
(10 rows)

Digging into it, the reason for the duplicate HashAggregate step was that
query_supports_distinctness() punted on SRFs-in-the-targetlist, basically
on the argument that it wasn't worth extra work to handle that case.
Thinking a bit harder, it seems to me that the correct analysis is:
1. If we are proving distinctness on the grounds of a DISTINCT clause,
then it doesn't matter whether there are any SRFs, because DISTINCT
removes duplicates after tlist SRF expansion.
2. But tlist SRFs break the ability to prove distinctness on the grounds
of GROUP BY, unless all of them are within grouping columns.
It still seems like detecting the second case is harder than it's worth,
but we can trivially handle the first case, with little more than some
code rearrangement.

The other problem is that the output rowcount of the sub-select (ie, of
the HashAggregate) is being estimated as though the SRF weren't there.
This turns out to be because estimate_num_groups() doesn't consider the
possibility of SRFs in the grouping columns.  It never has, but in 9.6 and
before the problem was masked by the fact that grouping_planner scaled up
the result rowcount by tlist_returns_set_rows() *after* performing
grouping.  Now we're effectively doing that in the other order, which is
more correct, but that means estimate_num_groups() has to apply some sort
of adjustment.  I suggest that it just multiply its old estimate by the
maximum of the SRF expansion counts.  That's likely to be an overestimate,
but it's really hard to do better without specific knowledge of the
individual SRF's behavior.

In short, I propose the attached fixes.  I've checked this and it seems
to fix Dmitry's original problem according to the test case he sent
off-list.

            regards, tom lane

diff --git a/src/backend/optimizer/plan/analyzejoins.c b/src/backend/optimizer/plan/analyzejoins.c
index 5b0da14..5783f90 100644
*** a/src/backend/optimizer/plan/analyzejoins.c
--- b/src/backend/optimizer/plan/analyzejoins.c
*************** rel_is_distinct_for(PlannerInfo *root, R
*** 744,751 ****
  bool
  query_supports_distinctness(Query *query)
  {
!     /* we don't cope with SRFs, see comment below */
!     if (query->hasTargetSRFs)
          return false;

      /* check for features we can prove distinctness with */
--- 744,751 ----
  bool
  query_supports_distinctness(Query *query)
  {
!     /* SRFs break distinctness except with DISTINCT, see below */
!     if (query->hasTargetSRFs && query->distinctClause == NIL)
          return false;

      /* check for features we can prove distinctness with */
*************** query_is_distinct_for(Query *query, List
*** 787,806 ****
      Assert(list_length(colnos) == list_length(opids));

      /*
-      * A set-returning function in the query's targetlist can result in
-      * returning duplicate rows, if the SRF is evaluated after the
-      * de-duplication step; so we play it safe and say "no" if there are any
-      * SRFs.  (We could be certain that it's okay if SRFs appear only in the
-      * specified columns, since those must be evaluated before de-duplication;
-      * but it doesn't presently seem worth the complication to check that.)
-      */
-     if (query->hasTargetSRFs)
-         return false;
-
-     /*
       * DISTINCT (including DISTINCT ON) guarantees uniqueness if all the
       * columns in the DISTINCT clause appear in colnos and operator semantics
!      * match.
       */
      if (query->distinctClause)
      {
--- 787,796 ----
      Assert(list_length(colnos) == list_length(opids));

      /*
       * DISTINCT (including DISTINCT ON) guarantees uniqueness if all the
       * columns in the DISTINCT clause appear in colnos and operator semantics
!      * match.  This is true even if there are SRFs in the DISTINCT columns or
!      * elsewhere in the tlist.
       */
      if (query->distinctClause)
      {
*************** query_is_distinct_for(Query *query, List
*** 820,825 ****
--- 810,825 ----
      }

      /*
+      * Otherwise, a set-returning function in the query's targetlist can
+      * result in returning duplicate rows, despite any grouping that might
+      * occur before tlist evaluation.  (If all tlist SRFs are within GROUP BY
+      * columns, it would be safe because they'd be expanded before grouping.
+      * But it doesn't currently seem worth the effort to check for that.)
+      */
+     if (query->hasTargetSRFs)
+         return false;
+
+     /*
       * Similarly, GROUP BY without GROUPING SETS guarantees uniqueness if all
       * the grouped columns appear in colnos and operator semantics match.
       */
diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index 4bbb4a8..edff6da 100644
*** a/src/backend/utils/adt/selfuncs.c
--- b/src/backend/utils/adt/selfuncs.c
*************** estimate_num_groups(PlannerInfo *root, L
*** 3361,3366 ****
--- 3361,3367 ----
                      List **pgset)
  {
      List       *varinfos = NIL;
+     double        srf_multiplier = 1.0;
      double        numdistinct;
      ListCell   *l;
      int            i;
*************** estimate_num_groups(PlannerInfo *root, L
*** 3394,3399 ****
--- 3395,3401 ----
      foreach(l, groupExprs)
      {
          Node       *groupexpr = (Node *) lfirst(l);
+         double        this_srf_multiplier;
          VariableStatData vardata;
          List       *varshere;
          ListCell   *l2;
*************** estimate_num_groups(PlannerInfo *root, L
*** 3402,3407 ****
--- 3404,3424 ----
          if (pgset && !list_member_int(*pgset, i++))
              continue;

+         /*
+          * Set-returning functions in grouping columns are a bit problematic.
+          * The code below will effectively ignore their SRF nature and come up
+          * with a numdistinct estimate as though they were scalar functions.
+          * We compensate by scaling up the end result by the largest SRF
+          * rowcount estimate.  (This will be an overestimate if the SRF
+          * produces multiple copies of any output value, but it seems best to
+          * assume the SRF's outputs are distinct.  In any case, it's probably
+          * pointless to worry too much about this without much better
+          * estimates for SRF output rowcounts than we have today.)
+          */
+         this_srf_multiplier = expression_returns_set_rows(groupexpr);
+         if (srf_multiplier < this_srf_multiplier)
+             srf_multiplier = this_srf_multiplier;
+
          /* Short-circuit for expressions returning boolean */
          if (exprType(groupexpr) == BOOLOID)
          {
*************** estimate_num_groups(PlannerInfo *root, L
*** 3467,3475 ****
--- 3484,3498 ----
       */
      if (varinfos == NIL)
      {
+         /* Apply SRF multiplier as we would do in the long path */
+         numdistinct *= srf_multiplier;
+         /* Round off */
+         numdistinct = ceil(numdistinct);
          /* Guard against out-of-range answers */
          if (numdistinct > input_rows)
              numdistinct = input_rows;
+         if (numdistinct < 1.0)
+             numdistinct = 1.0;
          return numdistinct;
      }

*************** estimate_num_groups(PlannerInfo *root, L
*** 3638,3643 ****
--- 3661,3670 ----
          varinfos = newvarinfos;
      } while (varinfos != NIL);

+     /* Now we can account for the effects of any SRFs */
+     numdistinct *= srf_multiplier;
+
+     /* Round off */
      numdistinct = ceil(numdistinct);

      /* Guard against out-of-range answers */


В списке pgsql-performance по дате сообщения:

От: Tom Lane
Дата:
Сообщение: Re: Query became very slow after 9.6 -> 10 upgrade
От: "Henrik Cednert (Filmlance)"
Дата:
Сообщение: Re: pg_dump 3 times as slow after 8.4 -> 9.5 upgrade