Обсуждение: pgsql: Sample postgres_fdw tables remotely during ANALYZE

Поиск
Список
Период
Сортировка

pgsql: Sample postgres_fdw tables remotely during ANALYZE

От
Tomas Vondra
Дата:
Sample postgres_fdw tables remotely during ANALYZE

When collecting ANALYZE sample on foreign tables, postgres_fdw fetched
all rows and performed the sampling locally. For large tables this means
transferring and immediately discarding large amounts of data.

This commit allows the sampling to be performed on the remote server,
transferring only the much smaller sample. The sampling is performed
using the built-in TABLESAMPLE methods (system, bernoulli) or random()
function, depending on the remote server version.

Remote sampling can be enabled by analyze_sampling on the foreign server
and/or foreign table, with supported values 'off', 'auto', 'system',
'bernoulli' and 'random'. The default value is 'auto' which uses either
'bernoulli' (TABLESAMPLE method) or 'random' (for remote servers without
TABLESAMPLE support).

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/8ad51b5f446b5c19ba2c0033a0f7b3180b3b6d95

Modified Files
--------------
contrib/postgres_fdw/deparse.c                 |  72 ++++++++-
contrib/postgres_fdw/expected/postgres_fdw.out |  25 +++
contrib/postgres_fdw/option.c                  |  21 +++
contrib/postgres_fdw/postgres_fdw.c            | 202 ++++++++++++++++++++++++-
contrib/postgres_fdw/postgres_fdw.h            |  15 ++
contrib/postgres_fdw/sql/postgres_fdw.sql      |  36 +++++
doc/src/sgml/postgres-fdw.sgml                 |  35 +++++
7 files changed, 397 insertions(+), 9 deletions(-)


Re: pgsql: Sample postgres_fdw tables remotely during ANALYZE

От
Alvaro Herrera
Дата:
On 2022-Dec-30, Tomas Vondra wrote:

> Sample postgres_fdw tables remotely during ANALYZE
> 
> When collecting ANALYZE sample on foreign tables, postgres_fdw fetched
> all rows and performed the sampling locally. For large tables this means
> transferring and immediately discarding large amounts of data.
> 
> This commit allows the sampling to be performed on the remote server,
> transferring only the much smaller sample. The sampling is performed
> using the built-in TABLESAMPLE methods (system, bernoulli) or random()
> function, depending on the remote server version.
> 
> Remote sampling can be enabled by analyze_sampling on the foreign server
> and/or foreign table, with supported values 'off', 'auto', 'system',
> 'bernoulli' and 'random'. The default value is 'auto' which uses either
> 'bernoulli' (TABLESAMPLE method) or 'random' (for remote servers without
> TABLESAMPLE support).

You left out the Discussion line, so here's the link:
https://postgr.es/m/151e835f-55d6-ddbc-b5b4-07ee606aba44@enterprisedb.com

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"Nunca se desea ardientemente lo que solo se desea por razón" (F. Alexandre)