Re: estimation problems for DISTINCT ON with FDW

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: estimation problems for DISTINCT ON with FDW
Дата
Msg-id 542866.1593555692@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: estimation problems for DISTINCT ON with FDW  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: estimation problems for DISTINCT ON with FDW  (Etsuro Fujita <etsuro.fujita@gmail.com>)
Список pgsql-hackers
I wrote:
> I poked into that and found that the problem is in estimate_num_groups,
> which effectively just disregards any relation that has rel->tuples = 0.
> That is the case for a postgres_fdw foreign table if use_remote_estimate
> is true, because postgres_fdw never bothers to set any other value.
> (On the other hand, if use_remote_estimate is false, it does fill in a
> pretty-bogus value, mainly so it can use set_baserel_size_estimates.
> See postgresGetForeignRelSize.)

> It seems like we could make estimate_num_groups a bit more robust here;
> it could just skip its attempts to clamp based on total size or
> restriction selectivity, but still include the reldistinct value for the
> rel into the total numdistinct.  I wonder though if this is the only
> problem caused by failing to fill in any value for rel->tuples ...
> should we make postgres_fdw install some value for that?

Attached are a couple of quick-hack patches along each of those lines.
Either one resolves the crazy number-of-groups estimate for Jeff's
example; neither changes any existing regression test results.

On the whole I'm not sure I like 0001 (ie, changing estimate_num_groups).
Sure, it makes that function "more robust", but it does so at the cost
of believing what might be a default or otherwise pretty insane
reldistinct estimate.  We put in the clamping behavior for a reason,
and I'm not sure we should disable it just because reltuples = 0.

0002 seems like a better answer on the whole, but it has a pretty
significant issue as well: it's changing the API for FDW
GetForeignRelSize functions, because now we're expecting them to set
both rows and tuples to something sane, contrary to the existing docs.

What I'm sort of inclined to do is neither of these exactly, but
instead put the

    baserel->tuples = Max(baserel->tuples, baserel->rows);

clamping behavior into the core code, immediately after the call to
GetForeignRelSize.  This'd still let the FDW set baserel->tuples if
it has a mind to, while not requiring that; and it prevents the
situation where the rows and tuples estimates are inconsistent.

            regards, tom lane

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index be08eb4814..8eea59d5a3 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3647,14 +3647,14 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
                     (1 - pow((rel->tuples - rel->rows) / rel->tuples,
                              rel->tuples / reldistinct));
             }
-            reldistinct = clamp_row_est(reldistinct);
-
-            /*
-             * Update estimate of total distinct groups.
-             */
-            numdistinct *= reldistinct;
         }

+        /*
+         * Update estimate of total distinct groups.
+         */
+        reldistinct = clamp_row_est(reldistinct);
+        numdistinct *= reldistinct;
+
         varinfos = newvarinfos;
     } while (varinfos != NIL);

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..fc061adedb 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -688,6 +688,15 @@ postgresGetForeignRelSize(PlannerInfo *root,
         /* Report estimated baserel size to planner. */
         baserel->rows = fpinfo->rows;
         baserel->reltarget->width = fpinfo->width;
+
+        /*
+         * plancat.c copied baserel->pages and baserel->tuples from pg_class.
+         * If the foreign table has never been ANALYZEd, or if its stats are
+         * out of date, baserel->tuples might now be less than baserel->rows,
+         * which will confuse assorted logic.  Hack it to appear minimally
+         * sensible.  (Do we need to hack baserel->pages too?)
+         */
+        baserel->tuples = Max(baserel->tuples, baserel->rows);
     }
     else
     {

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Stephen Frost
Дата:
Сообщение: Re: Remove Deprecated Exclusive Backup Mode
Следующее
От: David Steele
Дата:
Сообщение: Re: Remove Deprecated Exclusive Backup Mode