Re: Import Statistics in postgres_fdw before resorting to sampling.

Поиск
Список
Период
Сортировка
От Corey Huinker
Тема Re: Import Statistics in postgres_fdw before resorting to sampling.
Дата
Msg-id CADkLM=foU7yK1rfMzqTPiQtfEOGANAS9dHE5-Js6hEXmZiKfow@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Import Statistics in postgres_fdw before resorting to sampling.  (Etsuro Fujita <etsuro.fujita@gmail.com>)
Ответы Re: Import Statistics in postgres_fdw before resorting to sampling.
Список pgsql-hackers

Other initial comments:

The commit message says:

    This is managed via two new options, fetch_stats and remote_analyze,
    both are available at the server level and table level. If fetch_stats
    is true, then the ANALYZE command will first attempt to fetch statistics
    from the remote table and import those statistics locally.

    If remote_analyze is true, and if the first attempt to fetch remote
    statistics found no attribute statistics, then an attempt will be made
    to ANALYZE the remote table before a second and final attempt to fetch
    remote statistics.

    If no statistics are found, then ANALYZE will fall back to the normal
    behavior of sampling and local analysis.

I think the first step assumes that the remote stats are up-to-date;
if they aren't, it would cause a regression.  (If the remote relation
is a plain table, they are likely to be up-to-date, but for example,
if it is a foreign table, it's possible that they are stale.)  So how
about making it the user's responsibility to make them up-to-date?  If
doing so, we wouldn't need to do the second and third steps anymore,
making the patch simple.

Obviously there is no way to know the quality/freshness of remote stats if they are found.

The analyze option was borne of feedback from other postgres hackers while brainstorming on what this option might look like. I don't think we *need* this extra option for the feature to be a success, but it's relative simplicity did make me want to put it out there to see who else liked it.
 

On the other hand:

    This operation will only work on remote relations that can have stored
    statistics: tables, partitioned tables, and materialized views. If the
    remote relation is a view then remote fetching/analyzing is just wasted
    effort and the user is better of setting fetch_stats to false for that
    table.

I'm not sure the waste effort is acceptable; IMO, if the remote table
is a view, I think that the system should detect that in some way, and
then just do the normal ANALYZE processing.

The stats fetch query is pretty light, but I can see fetching the relkind along with the relstats, and making decisions on whether to continue from there, only applying the relstats after attrstats have been successfully applied.

 
That's it for now.

I'll see what I can do to make that work.
 
My apologies for the delayed response.

Valuable responses are worth waiting for.

В списке pgsql-hackers по дате отправления: