Re: Ideas about a better API for postgres_fdw remote estimates

Поиск
Список
Период
Сортировка
От Andrey Lepikhov
Тема Re: Ideas about a better API for postgres_fdw remote estimates
Дата
Msg-id 93c0dcb8-59fd-69e4-edcf-e20a4542ddb3@postgrespro.ru
обсуждение исходный текст
Ответ на Ideas about a better API for postgres_fdw remote estimates  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers

On 8/29/20 9:22 PM, Stephen Frost wrote:
> 
>> I implemented some FDW + pg core machinery to reduce weight of the problem.
>> The ANALYZE command on foreign table executes query on foreign server that
>> extracts statistics tuple, serializes it into json-formatted string and
>> returns to the caller. The caller deserializes this string, generates
>> statistics for this foreign table and update it. The second patch is a
>> proof-of-concept.
> 
> Isn't this going to create a version dependency that we'll need to deal
> with..?  What if a newer major version has some kind of improved ANALYZE
> command, in terms of what it looks at or stores, and it's talking to an
> older server?
> 
> When I was considering the issue with ANALYZE and FDWs, I had been
> thinking it'd make sense to just change the query that's built in
> deparseAnalyzeSql() to have a TABLESAMPLE clause, but otherwise run in
> more-or-less the same manner as today.  If we don't like the available
> TABLESAMPLE methods then we could add a new one that's explicitly the
> 'right' sample for an ANALYZE call and use that when it's available on
> the remote side.  Not sure if it'd make any sense for ANALYZE itself to
> start using that same TABLESAMPLE code, but maybe?  Not that I think
> it'd be much of an issue if it's independent either, with appropriate
> comments to note that we should probably try to make them match up for
> the sake of FDWs.
This approach does not contradict your idea.  This is a lightweight 
opportunity to reduce the cost of analysis if we have a set of servers 
with actual versions of system catalog and fdw.
> 
>> This patch speedup analyze command and provides statistics relevance on a
>> foreign table after autovacuum operation. Its effectiveness depends on
>> relevance of statistics on the remote server, but still.
> 
> If we do decide to go down this route, wouldn't it mean we'd have to
> solve the problem of what to do when it's a 9.6 foreign server being
> queried from a v12 server and dealing with any difference in the
> statistics structures of the two?
> 
> Seems like we would... in which case I would say that we should pull
> that bit out and make it general, and use it for pg_upgrade too, which
> would benefit a great deal from having the ability to upgrade stats
> between major versions also.  That's a much bigger piece to take on, of
> course, but seems to be what's implied with this approach for the FDW.
> 

Thank you for this use case.

We can add field "version" to statistics string (btree uses versioning 
too). As you can see, in this patch we are only trying to get 
statistics. If for some reason this does not work out, then we go along 
a difficult route.

Moreover,  I believe this strategy should only work if we analyze a 
relation implicitly. If the user executes analysis explicitly by the 
command "ANALYZE <relname>", we need to perform an fair analysis of the 
table.

-- 
regards,
Andrey Lepikhov
Postgres Professional



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Ideas about a better API for postgres_fdw remote estimates
Следующее
От: Mark Dilger
Дата:
Сообщение: Re: new heapcheck contrib module