Re: Import Statistics in postgres_fdw before resorting to sampling.

Поиск
Список
Период
Сортировка
От Corey Huinker
Тема Re: Import Statistics in postgres_fdw before resorting to sampling.
Дата
Msg-id CADkLM=ezV0_oUCxK=A2TmaAuXvWs0+i-H8fSNU3BRY=dBuDhQQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Import Statistics in postgres_fdw before resorting to sampling.  (Etsuro Fujita <etsuro.fujita@gmail.com>)
Ответы Re: Import Statistics in postgres_fdw before resorting to sampling.
Re: Import Statistics in postgres_fdw before resorting to sampling.
Re: Import Statistics in postgres_fdw before resorting to sampling.
Список pgsql-hackers
> I agree, if there is no fallback, then the default should be false. When I was initially brainstorming this patch, Nathan Bossart had suggested making it the default because 1) that would be an automatic benefit to users and 2) the cost for attempting to import stats was small in comparison to a table stample, so it was worth the attempt. I still want users to get that automatic benefit, but if there is no fallback to sampling then the default only makes sense as false.

I think that the FDW API that I proposed could actually allow us to
fall back to sampling, by modifying StatisticsAreImportable so that it
also checks if 1) there are statistics on the remote server and 2) the
data is fresh enough, and if so, returns true; otherwise, returns
false; in the latter case we could fall back to sampling.  And if we
modified it as such, I think we could change the default to true.
(Checking #2 is necessary to avoid importing stale data, which would
degrade plan quality.)

I've given this some more thought.

First, we'd have to add the va_cols param to StatisticsAreImportable, which isn't itself terrible.

Then, we'd have to determine that there are stats available for every mapped column (filtered by va_cols, if any). But the only way to do that is to query the pg_stats view on the remote end, and if we have done that, then we've already fetched the stats. Yes, we could avoid selecting the actual statistical values, and that would save some network bandwidth at the cost of having to do the query again with stats. So I don't really see the savings.

Also, the pg_stats view is our security-barrier black box into statistics, and it gives no insight into how recently those stats were acquired. We could poke pg_stat_all_tables, assuming we can even query it, and then make a value judgement on the value of (CURRENT_TIMESTAMP - GREATEST(last_analyze, last_autoanalyze), but that value is highly subjective.

I suppose we could move all of the statistics fetching into StatisticsAreImportable, And carry those values forward if they are satisfactory. That would leave ImportStatistics() with little to do other than form up the calls to pg_restore_*_stats(), but those could still fail, and at that point we'd have no way to fall back to sampling and analysis.

I really want to make sampling fallback possible.

Anyway, here's v8, incorporating the documentation feedback and Matheus's notes.
Вложения

В списке pgsql-hackers по дате отправления: