Re: estimation problems for DISTINCT ON with FDW

Поиск
Список
Период
Сортировка
От Etsuro Fujita
Тема Re: estimation problems for DISTINCT ON with FDW
Дата
Msg-id CAPmGK15_kkoqKaT_va5WX1q41F=6xGOZGQdAivYg1QF7nJ7uhw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: estimation problems for DISTINCT ON with FDW  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: estimation problems for DISTINCT ON with FDW  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
On Wed, Jul 1, 2020 at 7:21 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Attached are a couple of quick-hack patches along each of those lines.
> Either one resolves the crazy number-of-groups estimate for Jeff's
> example; neither changes any existing regression test results.
>
> On the whole I'm not sure I like 0001 (ie, changing estimate_num_groups).
> Sure, it makes that function "more robust", but it does so at the cost
> of believing what might be a default or otherwise pretty insane
> reldistinct estimate.  We put in the clamping behavior for a reason,
> and I'm not sure we should disable it just because reltuples = 0.
>
> 0002 seems like a better answer on the whole, but it has a pretty
> significant issue as well: it's changing the API for FDW
> GetForeignRelSize functions, because now we're expecting them to set
> both rows and tuples to something sane, contrary to the existing docs.

postgres_fdw already sets both rows and tuples if use_remote_estimate
is false, and we have pages=0 and tuples=0, so the contrary seems OK
to me.

In the 0002 patch:

+ /*
+ * plancat.c copied baserel->pages and baserel->tuples from pg_class.
+ * If the foreign table has never been ANALYZEd, or if its stats are
+ * out of date, baserel->tuples might now be less than baserel->rows,
+ * which will confuse assorted logic.  Hack it to appear minimally
+ * sensible.  (Do we need to hack baserel->pages too?)
+ */
+ baserel->tuples = Max(baserel->tuples, baserel->rows);

for consistency, this should be

  baserel->tuples = clamp_row_est(baserel->rows / sel);

where sel is the selectivity of the baserestrictinfo clauses?

> What I'm sort of inclined to do is neither of these exactly, but
> instead put the
>
>         baserel->tuples = Max(baserel->tuples, baserel->rows);
>
> clamping behavior into the core code, immediately after the call to
> GetForeignRelSize.  This'd still let the FDW set baserel->tuples if
> it has a mind to, while not requiring that; and it prevents the
> situation where the rows and tuples estimates are inconsistent.

I'm not sure this would address the inconsistency.  Consider the
postgres_fdw case where use_remote_estimate is true, and the stats are
out of date, eg, baserel->tuples copied from pg_class is much larger
than the actual tuples and hence baserel->rows (I assume here that
postgres_fdw doesn't do anything about baserel->tuples).  In such a
case the inconsistency would make the estimate_num_groups() estimate
more inaccurate.  I think the consistency is the responsibility of the
FDW rather than the core, so I would vote for the 0002 patch.  Maybe
I'm missing something.

Thanks for working on this!

Best regards,
Etsuro Fujita



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Bharath Rupireddy
Дата:
Сообщение: Re: Cleanup - adjust the code crossing 80-column window limit
Следующее
От: Daniel Gustafsson
Дата:
Сообщение: Re: Additional improvements to extended statistics