Thread: estimation problems for DISTINCT ON with FDW

estimation problems for DISTINCT ON with FDW

From: Jeff Janes
Date:
If I use the attached sql file to set up the database with loop-back postgres_fdw, and then turn on use_remote_estimate for this query:

select distinct on (id) id, z from fgn.priority order by id, priority desc, z

It issues two queries for the foreign estimate, one with a sort and one without:

EXPLAIN SELECT id, priority, z FROM public.priority

EXPLAIN SELECT id, priority, z FROM public.priority ORDER BY id ASC NULLS LAST, priority DESC NULLS FIRST, z ASC NULLS LAST

It doesn't cost out the plan of pushing the DISTINCT ON down to the foreign side, which is probably the best way to run the query.  I guess it makes sense that FDW machinery in general doesn't want to try to push a PostgreSQL specific construct.

But much worse than that, it horribly misestimates the number of unique rows it will get back, having never asked the remote side for an estimate of that.

 Result  (cost=100.51..88635.90 rows=1 width=16)
   ->  Unique  (cost=100.51..88635.90 rows=1 width=16)
         ->  Foreign Scan on priority  (cost=100.51..86135.90 rows=1000000 width=16)

Where does it come up with the idea that these 1,000,000 rows will DISTINCT/Unique down to just 1 row?   I can't find the place in the code where that happens.  I suspect it is happening somewhere in the core code based on data fed into it by postgres_fdw, not in postgres_fdw itself.

This leads to horrible plans if the DISTINCT ON is actually in a subquery which is joined to other tables, for example.

If you don't use the remote estimates, it at least comes up with a roughly sane estimate of 200 distinct rows, which is enough to inhibit selection of the worst plans. Why does an uninformative remote estimate do so much worse than no remote estimate at all?

Of course I could just disable remote estimates for this table, but then other queries that use the table without DISTINCT ON suffer.  Another solution is to ANALYZE the foreign table, but that opens up a can of worms of its own.

I see this behavior in all supported or in-development versions.

Cheers,

Jeff
Attachments

Re: estimation problems for DISTINCT ON with FDW

From: Bharath Rupireddy
Date:
> It doesn't cost out the plan of pushing the DISTINCT ON down to the foreign side, which is probably the best way to
> run the query.  I guess it makes sense that FDW machinery in general doesn't want to try to push a PostgreSQL specific
> construct.

I think you are right, the DISTINCT operation is not being pushed to
the remote (I may be wrong here; just for info - I looked at the remote
SQL from EXPLAIN (VERBOSE) on the query to find this out), and the same
goes for the estimates.  There might be problems pushing DISTINCTs to
remote servers when the FDW is used for sharding configurations.  But
when the FDW is used for non-sharded configurations, such as just
getting existing data from another remote Postgres server, Oracle,
Hadoop, or some other remote database engine where the DISTINCT
operation is supported, it's good to push it to the remote both for
explains/estimates and for the actual queries themselves, to reduce the
data transferred from the remote database server to the local Postgres
database server.

With Regards,
Bharath Rupireddy.
EnterpriseDB: http://www.enterprisedb.com



Re: estimation problems for DISTINCT ON with FDW

From: Etsuro Fujita
Date:
On Mon, Jun 29, 2020 at 7:02 PM Bharath Rupireddy
<bharath.rupireddyforpostgres@gmail.com> wrote:
> > It doesn't cost out the plan of pushing the DISTINCT ON down to the foreign side, which is probably the best way to
> > run the query.  I guess it makes sense that FDW machinery in general doesn't want to try to push a PostgreSQL specific
> > construct.
>
> I think you are right, the DISTINCT operation is not being pushed to
> the remote (I may be wrong here; just for info - I looked at the remote
> SQL from EXPLAIN (VERBOSE) on the query to find this out), and the same
> goes for the estimates.

I think you are right.

> But when the FDW is used for non-sharded configurations, such as just
> getting existing data from another remote Postgres server, Oracle,
> Hadoop, or some other remote database engine where the DISTINCT
> operation is supported, it's good to push it to the remote both for
> explains/estimates and for the actual queries themselves, to reduce the
> data transferred from the remote database server to the local Postgres
> database server.

I think so too.  And I think we could do so using the upper-planner
pathification (ie, GetForeignUpperPaths() with UPPERREL_DISTINCT in
create_distinct_paths()).  It's on my long-term TODO list to implement
that in postgres_fdw.

Best regards,
Etsuro Fujita



Re: estimation problems for DISTINCT ON with FDW

From: Tom Lane
Date:
Jeff Janes <jeff.janes@gmail.com> writes:
> It doesn't cost out the plan of pushing the DISTINCT ON down to the foreign
> side, which is probably the best way to run the query.  I guess it makes
> sense that FDW machinery in general doesn't want to try to push a
> PostgreSQL specific construct.

Well, that's an unimplemented feature anyway.  But people hared off after
that without addressing your actual bug report:

> But much worse than that, it horribly misestimates the number of unique rows
> it will get back, having never asked the remote side for an estimate of
> that.

I poked into that and found that the problem is in estimate_num_groups,
which effectively just disregards any relation that has rel->tuples = 0.
That is the case for a postgres_fdw foreign table if use_remote_estimate
is true, because postgres_fdw never bothers to set any other value.
(On the other hand, if use_remote_estimate is false, it does fill in a
pretty-bogus value, mainly so it can use set_baserel_size_estimates.
See postgresGetForeignRelSize.)

It seems like we could make estimate_num_groups a bit more robust here;
it could just skip its attempts to clamp based on total size or
restriction selectivity, but still include the reldistinct value for the
rel into the total numdistinct.  I wonder though if this is the only
problem caused by failing to fill in any value for rel->tuples ...
should we make postgres_fdw install some value for that?

(Note that the question of whether we should ask the remote server for
an estimate of ndistinct is kind of orthogonal to any of these points.
Even if we had obtained one that way, estimate_num_groups would not pay
any attention to it without a fix for the point at hand.)

            regards, tom lane



Re: estimation problems for DISTINCT ON with FDW

From: Tom Lane
Date:
I wrote:
> I poked into that and found that the problem is in estimate_num_groups,
> which effectively just disregards any relation that has rel->tuples = 0.
> That is the case for a postgres_fdw foreign table if use_remote_estimate
> is true, because postgres_fdw never bothers to set any other value.
> (On the other hand, if use_remote_estimate is false, it does fill in a
> pretty-bogus value, mainly so it can use set_baserel_size_estimates.
> See postgresGetForeignRelSize.)

> It seems like we could make estimate_num_groups a bit more robust here;
> it could just skip its attempts to clamp based on total size or
> restriction selectivity, but still include the reldistinct value for the
> rel into the total numdistinct.  I wonder though if this is the only
> problem caused by failing to fill in any value for rel->tuples ...
> should we make postgres_fdw install some value for that?

Attached are a couple of quick-hack patches along each of those lines.
Either one resolves the crazy number-of-groups estimate for Jeff's
example; neither changes any existing regression test results.

On the whole I'm not sure I like 0001 (ie, changing estimate_num_groups).
Sure, it makes that function "more robust", but it does so at the cost
of believing what might be a default or otherwise pretty insane
reldistinct estimate.  We put in the clamping behavior for a reason,
and I'm not sure we should disable it just because reltuples = 0.

0002 seems like a better answer on the whole, but it has a pretty
significant issue as well: it's changing the API for FDW
GetForeignRelSize functions, because now we're expecting them to set
both rows and tuples to something sane, contrary to the existing docs.

What I'm sort of inclined to do is neither of these exactly, but
instead put the

    baserel->tuples = Max(baserel->tuples, baserel->rows);

clamping behavior into the core code, immediately after the call to
GetForeignRelSize.  This'd still let the FDW set baserel->tuples if
it has a mind to, while not requiring that; and it prevents the
situation where the rows and tuples estimates are inconsistent.

            regards, tom lane

diff --git a/src/backend/utils/adt/selfuncs.c b/src/backend/utils/adt/selfuncs.c
index be08eb4814..8eea59d5a3 100644
--- a/src/backend/utils/adt/selfuncs.c
+++ b/src/backend/utils/adt/selfuncs.c
@@ -3647,14 +3647,14 @@ estimate_num_groups(PlannerInfo *root, List *groupExprs, double input_rows,
                     (1 - pow((rel->tuples - rel->rows) / rel->tuples,
                              rel->tuples / reldistinct));
             }
-            reldistinct = clamp_row_est(reldistinct);
-
-            /*
-             * Update estimate of total distinct groups.
-             */
-            numdistinct *= reldistinct;
         }

+        /*
+         * Update estimate of total distinct groups.
+         */
+        reldistinct = clamp_row_est(reldistinct);
+        numdistinct *= reldistinct;
+
         varinfos = newvarinfos;
     } while (varinfos != NIL);

diff --git a/contrib/postgres_fdw/postgres_fdw.c b/contrib/postgres_fdw/postgres_fdw.c
index 9fc53cad68..fc061adedb 100644
--- a/contrib/postgres_fdw/postgres_fdw.c
+++ b/contrib/postgres_fdw/postgres_fdw.c
@@ -688,6 +688,15 @@ postgresGetForeignRelSize(PlannerInfo *root,
         /* Report estimated baserel size to planner. */
         baserel->rows = fpinfo->rows;
         baserel->reltarget->width = fpinfo->width;
+
+        /*
+         * plancat.c copied baserel->pages and baserel->tuples from pg_class.
+         * If the foreign table has never been ANALYZEd, or if its stats are
+         * out of date, baserel->tuples might now be less than baserel->rows,
+         * which will confuse assorted logic.  Hack it to appear minimally
+         * sensible.  (Do we need to hack baserel->pages too?)
+         */
+        baserel->tuples = Max(baserel->tuples, baserel->rows);
     }
     else
     {

Re: estimation problems for DISTINCT ON with FDW

From: Etsuro Fujita
Date:
On Wed, Jul 1, 2020 at 7:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Attached are a couple of quick-hack patches along each of those lines.
> Either one resolves the crazy number-of-groups estimate for Jeff's
> example; neither changes any existing regression test results.
>
> On the whole I'm not sure I like 0001 (ie, changing estimate_num_groups).
> Sure, it makes that function "more robust", but it does so at the cost
> of believing what might be a default or otherwise pretty insane
> reldistinct estimate.  We put in the clamping behavior for a reason,
> and I'm not sure we should disable it just because reltuples = 0.
>
> 0002 seems like a better answer on the whole, but it has a pretty
> significant issue as well: it's changing the API for FDW
> GetForeignRelSize functions, because now we're expecting them to set
> both rows and tuples to something sane, contrary to the existing docs.

postgres_fdw already sets both rows and tuples if use_remote_estimate
is false, and we have pages=0 and tuples=0, so the contrary seems OK
to me.

In the 0002 patch:

+ /*
+ * plancat.c copied baserel->pages and baserel->tuples from pg_class.
+ * If the foreign table has never been ANALYZEd, or if its stats are
+ * out of date, baserel->tuples might now be less than baserel->rows,
+ * which will confuse assorted logic.  Hack it to appear minimally
+ * sensible.  (Do we need to hack baserel->pages too?)
+ */
+ baserel->tuples = Max(baserel->tuples, baserel->rows);

for consistency, this should be

  baserel->tuples = clamp_row_est(baserel->rows / sel);

where sel is the selectivity of the baserestrictinfo clauses?

> What I'm sort of inclined to do is neither of these exactly, but
> instead put the
>
>         baserel->tuples = Max(baserel->tuples, baserel->rows);
>
> clamping behavior into the core code, immediately after the call to
> GetForeignRelSize.  This'd still let the FDW set baserel->tuples if
> it has a mind to, while not requiring that; and it prevents the
> situation where the rows and tuples estimates are inconsistent.

I'm not sure this would address the inconsistency.  Consider the
postgres_fdw case where use_remote_estimate is true, and the stats are
out of date, eg, baserel->tuples copied from pg_class is much larger
than the actual tuples and hence baserel->rows (I assume here that
postgres_fdw doesn't do anything about baserel->tuples).  In such a
case the inconsistency would make the estimate_num_groups() estimate
more inaccurate.  I think the consistency is the responsibility of the
FDW rather than the core, so I would vote for the 0002 patch.  Maybe
I'm missing something.

Thanks for working on this!

Best regards,
Etsuro Fujita



Re: estimation problems for DISTINCT ON with FDW

From: Tom Lane
Date:
Etsuro Fujita <etsuro.fujita@gmail.com> writes:
> On Wed, Jul 1, 2020 at 7:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> +    baserel->tuples = Max(baserel->tuples, baserel->rows);

> for consistency, this should be
>   baserel->tuples = clamp_row_est(baserel->rows / sel);
> where sel is the selectivity of the baserestrictinfo clauses?

If we had the selectivity available, maybe so, but we don't.
(And even less so if we put this logic in the core code.)

Short of sending a whole second query to the remote server, it's
not clear to me how we could get the full table size (or equivalently
the target query's selectivity for that table).  The best we realistically
can do is to adopt pg_class.reltuples if there's been an ANALYZE of
the foreign table.  That case already works (and this proposal doesn't
break it).  The problem is what to do when pg_class.reltuples is zero
or otherwise badly out-of-date.

>> What I'm sort of inclined to do is neither of these exactly, but
>> instead put the
>>     baserel->tuples = Max(baserel->tuples, baserel->rows);
>> clamping behavior into the core code, immediately after the call to
>> GetForeignRelSize.  This'd still let the FDW set baserel->tuples if
>> it has a mind to, while not requiring that; and it prevents the
>> situation where the rows and tuples estimates are inconsistent.

> I'm not sure this would address the inconsistency.  Consider the
> postgres_fdw case where use_remote_estimate is true, and the stats are
> out of date, eg, baserel->tuples copied from pg_class is much larger
> than the actual tuples and hence baserel->rows (I assume here that
> postgres_fdw doesn't do anything about baserel->tuples).  In such a
> case the inconsistency would make the estimate_num_groups() estimate
> more inaccurate.  I think the consistency is the responsibility of the
> FDW rather than the core, so I would vote for the 0002 patch.  Maybe
> I'm missing something.

Nothing about this proposal is stopping the FDW from inserting a better
value for rel->tuples if it's got one.  But it's not necessarily easy
or cheap to get that info.  In any case I think that upgrading the
requirements for what GetForeignRelSize must set is a hard sell.
We certainly could not back-patch a fix that required that, and even
going forward, it seems likely that many FDWs would never get the word.
(Well, maybe we could force the issue by throwing an error if
rel->tuples < rel->rows after GetForeignRelSize, but it's not hard
to imagine that routine testing could fail to trigger such a check.)

            regards, tom lane



Re: estimation problems for DISTINCT ON with FDW

From: Etsuro Fujita
Date:
On Wed, Jul 1, 2020 at 11:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Etsuro Fujita <etsuro.fujita@gmail.com> writes:
> > On Wed, Jul 1, 2020 at 7:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> +    baserel->tuples = Max(baserel->tuples, baserel->rows);
>
> > for consistency, this should be
> >   baserel->tuples = clamp_row_est(baserel->rows / sel);
> > where sel is the selectivity of the baserestrictinfo clauses?
>
> If we had the selectivity available, maybe so, but we don't.
> (And even less so if we put this logic in the core code.)
>
> Short of sending a whole second query to the remote server, it's
> not clear to me how we could get the full table size (or equivalently
> the target query's selectivity for that table).  The best we realistically
> can do is to adopt pg_class.reltuples if there's been an ANALYZE of
> the foreign table.  That case already works (and this proposal doesn't
> break it).  The problem is what to do when pg_class.reltuples is zero
> or otherwise badly out-of-date.

In estimate_path_cost_size(), if use_remote_estimate is true, we
adjust the rows estimate returned from the remote server, by factoring
in the selectivity of the locally-checked quals.  I thought what I
proposed above would be more consistent with that.

Best regards,
Etsuro Fujita



Re: estimation problems for DISTINCT ON with FDW

From: Tom Lane
Date:
Etsuro Fujita <etsuro.fujita@gmail.com> writes:
> On Wed, Jul 1, 2020 at 11:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Short of sending a whole second query to the remote server, it's
>> not clear to me how we could get the full table size (or equivalently
>> the target query's selectivity for that table).  The best we realistically
>> can do is to adopt pg_class.reltuples if there's been an ANALYZE of
>> the foreign table.  That case already works (and this proposal doesn't
>> break it).  The problem is what to do when pg_class.reltuples is zero
>> or otherwise badly out-of-date.

> In estimate_path_cost_size(), if use_remote_estimate is true, we
> adjust the rows estimate returned from the remote server, by factoring
> in the selectivity of the locally-checked quals.  I thought what I
> proposed above would be more consistent with that.

No, I don't think that would be very helpful.  There are really three
different numbers of interest here:

1. The actual total rowcount of the remote table.

2. The number of rows returned by the remote query (which is #1 times
the selectivity of the shippable quals).

3. The number of rows returned by the foreign scan (which is #2 times
the selectivity of the non-shippable quals).
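
With hypothetical selectivities, the relationship between the three can be sketched as:

```c
/* Illustrative only: relate the three row counts above using made-up
 * selectivities for the shippable and non-shippable quals. */
typedef struct
{
    double      total;          /* #1: actual total rowcount of the table */
    double      remote_rows;    /* #2: rows returned by the remote query */
    double      scan_rows;      /* #3: rows returned by the foreign scan */
} RowCounts;

static RowCounts
row_counts(double total, double sel_shippable, double sel_nonshippable)
{
    RowCounts   rc;

    rc.total = total;
    rc.remote_rows = total * sel_shippable;
    rc.scan_rows = rc.remote_rows * sel_nonshippable;
    return rc;
}
```

For instance, with total = 1000000, shippable selectivity 0.125, and non-shippable selectivity 0.5, we get #1 = 1000000, #2 = 125000, and #3 = 62500: in a selective query, #2 sits much closer to #3 than to #1.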

Clearly, rel->rows should be set to #3.  However, what we really want
for rel->tuples is #1.  That's because, to the extent that the planner
inspects rel->tuples at all, it's to adjust whole-table stats such as
we might have from ANALYZE.  What you're suggesting is that we use #2,
but I doubt that that's a big improvement.  In a decently tuned query
it's going to be a lot closer to #3 than to #1.

We could perhaps try to make our own estimate of the selectivity of the
shippable quals and then back into #1 from the value we got for #2 from
the remote server.  But that sounds mighty error-prone, so I doubt it'd
make for much of an improvement.  It also doesn't sound like something
I'd want to back-patch.

Another point here is that, to the extent we are relying on whole-table
stats from the last ANALYZE, pg_class.reltuples is actually the right
value to go along with that.  We could spend a lot of cycles doing
what I just suggested and end up with net-worse estimates.

In any case, the proposal I'm making is just to add a sanity-check
clamp to prevent the worst effects of not setting rel->tuples sanely.
It doesn't foreclose future improvements inside the FDW.

            regards, tom lane



Re: estimation problems for DISTINCT ON with FDW

From: Tom Lane
Date:
Concretely, I now propose the attached, which seems entirely
safe to back-patch.

            regards, tom lane

diff --git a/doc/src/sgml/fdwhandler.sgml b/doc/src/sgml/fdwhandler.sgml
index 6587678af2..1e997c218b 100644
--- a/doc/src/sgml/fdwhandler.sgml
+++ b/doc/src/sgml/fdwhandler.sgml
@@ -123,6 +123,14 @@ GetForeignRelSize(PlannerInfo *root,
      should be replaced if at all possible.  The function may also choose to
      update <literal>baserel->width</literal> if it can compute a better estimate
      of the average result row width.
+     (The initial value is based on column data types and on column
+     average-width values measured by the last <command>ANALYZE</command>.)
+     Also, this function may update <literal>baserel->tuples</literal> if
+     it can compute a better estimate of the foreign table's total row count.
+     (The initial value is
+     from <structname>pg_class</structname>.<structfield>reltuples</structfield>
+     which represents the total row count seen by the
+     last <command>ANALYZE</command>.)
     </para>

     <para>
diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index d984da25d7..63761d5593 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -911,6 +911,9 @@ set_foreign_size(PlannerInfo *root, RelOptInfo *rel, RangeTblEntry *rte)

     /* ... but do not let it set the rows estimate to zero */
     rel->rows = clamp_row_est(rel->rows);
+
+    /* also, make sure rel->tuples is not insane relative to rel->rows */
+    rel->tuples = Max(rel->tuples, rel->rows);
 }

 /*

Re: estimation problems for DISTINCT ON with FDW

From: Etsuro Fujita
Date:
On Thu, Jul 2, 2020 at 11:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Etsuro Fujita <etsuro.fujita@gmail.com> writes:
> > On Wed, Jul 1, 2020 at 11:40 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> >> Short of sending a whole second query to the remote server, it's
> >> not clear to me how we could get the full table size (or equivalently
> >> the target query's selectivity for that table).  The best we realistically
> >> can do is to adopt pg_class.reltuples if there's been an ANALYZE of
> >> the foreign table.  That case already works (and this proposal doesn't
> >> break it).  The problem is what to do when pg_class.reltuples is zero
> >> or otherwise badly out-of-date.
>
> > In estimate_path_cost_size(), if use_remote_estimate is true, we
> > adjust the rows estimate returned from the remote server, by factoring
> > in the selectivity of the locally-checked quals.  I thought what I
> > proposed above would be more consistent with that.
>
> No, I don't think that would be very helpful.  There are really three
> different numbers of interest here:
>
> 1. The actual total rowcount of the remote table.
>
> 2. The number of rows returned by the remote query (which is #1 times
> the selectivity of the shippable quals).
>
> 3. The number of rows returned by the foreign scan (which is #2 times
> the selectivity of the non-shippable quals).
>
> Clearly, rel->rows should be set to #3.  However, what we really want
> for rel->tuples is #1.  That's because, to the extent that the planner
> inspects rel->tuples at all, it's to adjust whole-table stats such as
> we might have from ANALYZE.  What you're suggesting is that we use #2,
> but I doubt that that's a big improvement.  In a decently tuned query
> it's going to be a lot closer to #3 than to #1.
>
> We could perhaps try to make our own estimate of the selectivity of the
> shippable quals and then back into #1 from the value we got for #2 from
> the remote server.

Actually, that is what I suggested:

    + /*
    + * plancat.c copied baserel->pages and baserel->tuples from pg_class.
    + * If the foreign table has never been ANALYZEd, or if its stats are
    + * out of date, baserel->tuples might now be less than baserel->rows,
    + * which will confuse assorted logic.  Hack it to appear minimally
    + * sensible.  (Do we need to hack baserel->pages too?)
    + */
    + baserel->tuples = Max(baserel->tuples, baserel->rows);

    for consistency, this should be

      baserel->tuples = clamp_row_est(baserel->rows / sel);

    where sel is the selectivity of the baserestrictinfo clauses?

By "the baserestrictinfo clauses", I mean the shippable clauses as
well as the non-shippable clauses.  Since baserel->rows stores the
rows estimate returned by estimate_path_cost_size(), which is #3, this
estimates #1.

> But that sounds mighty error-prone, so I doubt it'd
> make for much of an improvement.

I have to admit the error-proneness.

> In any case, the proposal I'm making is just to add a sanity-check
> clamp to prevent the worst effects of not setting rel->tuples sanely.
> It doesn't foreclose future improvements inside the FDW.

Agreed.

Best regards,
Etsuro Fujita



Re: estimation problems for DISTINCT ON with FDW

From: Etsuro Fujita
Date:
On Fri, Jul 3, 2020 at 5:19 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Concretely, I now propose the attached, which seems entirely
> safe to back-patch.

The patch looks good to me.  And +1 for back-patching.

Best regards,
Etsuro Fujita



Re: estimation problems for DISTINCT ON with FDW

From: Tom Lane
Date:
Etsuro Fujita <etsuro.fujita@gmail.com> writes:
> On Thu, Jul 2, 2020 at 11:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> We could perhaps try to make our own estimate of the selectivity of the
>> shippable quals and then back into #1 from the value we got for #2 from
>> the remote server.

> Actually, that is what I suggested:
> ... By "the baserestrictinfo clauses", I mean the shippable clauses as
> well as the non-shippable clauses.  Since baserel->rows stores the
> rows estimate returned by estimate_path_cost_size(), which is #3, this
> estimates #1.

Ah.  That isn't a number we compute in this code path at the moment,
but you're right that we could do so.  However ...

>> But that sounds mighty error-prone, so I doubt it'd
>> make for much of an improvement.

> I have to admit the error-proneness.

... that is the crux of the problem.  The entire reason why we're
expending all these cycles to get a remote estimate is that we don't
trust the local estimate of the shippable quals' selectivity to be
any good.  So relying on it anyway doesn't seem very smart, even if
it's for the usually-not-too-important purpose of estimating the
total table size.

I suppose there is one case where this approach could win: if the
local selectivity estimate is just fine, but the remote table size has
changed a lot since we last did an ANALYZE, then this would give us a
decent table size estimate with no additional remote traffic.  But
that doesn't really seem like a great bet --- if the table size has
changed that much, our local stats are probably obsolete too.

I wonder whether someday we ought to invent a new API that's more
suited to postgres_fdw's needs than EXPLAIN is.  It's not like the
remote planner doesn't know the number we want; it just fails to
include it in EXPLAIN.

>> In any case, the proposal I'm making is just to add a sanity-check
>> clamp to prevent the worst effects of not setting rel->tuples sanely.
>> It doesn't foreclose future improvements inside the FDW.

> Agreed.

OK, I'll go ahead and push the patch I proposed yesterday.

            regards, tom lane



Re: estimation problems for DISTINCT ON with FDW

From: Jeff Janes
Date:
On Fri, Jul 3, 2020 at 5:50 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:

> OK, I'll go ahead and push the patch I proposed yesterday.

Thank you.  I tested 12_STABLE with my real queries on the real data set, and the "hard coded" estimate of 200 distinct rows (when use_remote_estimate is turned back on) is enough to get rid of the worst plans I was seeing in 12.3.

Cheers,

Jeff