Обсуждение: Partitioned table statistics vs autoanalyze

Поиск
Список
Период
Сортировка

Partitioned table statistics vs autoanalyze

От
Kamil Frydel
Дата:
Hi,

we faced a performance issue when joining 2 partitioned tables 
(declarative partitioning). The planner chooses nested loop while we 
expect hash join.

The query and the plan are available here: https://explain.depesz.com/s/23r9

table_1 and table_2 are hash partitioned using volume_id column. Usually 
we make analyze on partitions. We do not make analyze on the partitioned 
table (parent).
However, if we run 'analyze' on the partitioned table then planner 
starts choosing hash join. As a comparison, the execution using nested 
loop takes about 15 minutes and if it is done using hash join then the 
query lasts for about 1 minute. When running 'analyze' for the 
partitioned table, postgres inserts statistics for the partitioned table 
into pg_stats (pg_statistics). Before that, there are only statistics 
for partitions. We suspect that this is the reason for selecting bad 
query plan.

The query is executed with cursor thus, in order to avoid parallel 
query, I set max_parallel_workers_per_gather to 0 during tests.

We found that a similar issue was discussed in the context of 
inheritance: 
https://www.postgresql.org/message-id/Pine.BSO.4.64.0904161836540.11937%40leary.csoft.net 
and the conclusion was to add the following paragraph to the 'analyze' doc:

 > If the table being analyzed has one or more children, ANALYZE will 
gather statistics twice: once on the rows of the parent table only, and 
a second time on the rows of the parent table with all of its children. 
This second set of statistics is needed when planning queries that 
traverse the entire inheritance tree. The autovacuum daemon, however, 
will only consider inserts or updates on the parent table itself when 
deciding whether to trigger an automatic analyze for that table. If that 
table is rarely inserted into or updated, the inheritance statistics 
will not be up to date unless you run ANALYZE manually.
(https://www.postgresql.org/docs/13/sql-analyze.html)

I would appreciate if anyone could shed some light on the following 
questions:
1) Is this above paragraph from docs still valid in PG 13 and does it 
apply to declarative partitioning as well? Is running analyze manually 
on a partitioned table needed to get proper plans for queries on 
partitioned tables? Partitioned table (in the declarative way) is 
”virtual” and does not keep any data so it seems that there are no 
statistics that can be gathered from the table itself and statistics 
from partitions should be sufficient.
2) Why does the planner need these statistics since they seem to be 
unused in the query plan. The query plan uses only partitions, not the 
partitioned table.

PostgreSQL version number:
                                                                    version

---------------------------------------------------------------------------------------------------------------------------------------------
  PostgreSQL 13.3 (Ubuntu 13.3-1.pgdg16.04+1) on x86_64-pc-linux-gnu, 
compiled by gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609, 64-bit
(1 row)


How you installed PostgreSQL: From Ubuntu 16 repositories.

Changes made to the settings in the postgresql.conf file:
                 name                 |             current_setting 
         |        source
-------------------------------------+-----------------------------------------+----------------------
  application_name                    | psql 
         | client
  auto_explain.log_analyze            | on 
         | configuration file
  auto_explain.log_min_duration       | 30s 
         | configuration file
  auto_explain.log_nested_statements  | on 
         | configuration file
  auto_explain.log_timing             | off 
         | configuration file
  autovacuum_freeze_max_age           | 1000000000 
         | configuration file
  autovacuum_max_workers              | 6 
         | configuration file
  autovacuum_vacuum_cost_delay        | 20ms 
         | configuration file
  autovacuum_vacuum_cost_limit        | 2000 
         | configuration file
  checkpoint_completion_target        | 0.9 
         | configuration file
  checkpoint_timeout                  | 15min 
         | configuration file
  cluster_name                        | 13/main 
         | configuration file
  cpu_index_tuple_cost                | 0.001 
         | configuration file
  cpu_operator_cost                   | 0.0005 
         | configuration file
  cursor_tuple_fraction               | 1 
         | configuration file
  DateStyle                           | ISO, MDY 
         | configuration file
  default_statistics_target           | 200 
         | configuration file
  default_text_search_config          | pg_catalog.english 
         | configuration file
  dynamic_shared_memory_type          | posix 
         | configuration file
  effective_cache_size                | 193385MB 
         | configuration file
  effective_io_concurrency            | 1000 
         | configuration file
  external_pid_file                   | /var/run/postgresql/13-main.pid 
         | configuration file
  from_collapse_limit                 | 15 
         | configuration file
  geqo_threshold                      | 15 
         | configuration file
  idle_in_transaction_session_timeout | 1h 
         | configuration file
  jit_above_cost                      | -1 
         | configuration file
  jit_inline_above_cost               | -1 
         | configuration file
  jit_optimize_above_cost             | -1 
         | configuration file
  join_collapse_limit                 | 15 
         | configuration file
  lc_messages                         | en_US.UTF-8 
         | configuration file
  lc_monetary                         | en_US.UTF-8 
         | configuration file
  lc_numeric                          | en_US.UTF-8 
         | configuration file
  lc_time                             | en_US.UTF-8 
         | configuration file
  log_autovacuum_min_duration         | 1min 
         | configuration file
  log_checkpoints                     | on 
         | configuration file
  log_connections                     | on 
         | configuration file
  log_destination                     | stderr 
         | configuration file
  log_directory                       | pg_log 
         | configuration file
  log_disconnections                  | on 
         | configuration file
  log_filename                        | postgresql-%Y-%m-%d_%H%M%S.log 
         | configuration file
  log_line_prefix                     | %t [%p-%l] app=%a %q%u@%d 
         | configuration file
  log_lock_waits                      | on 
         | configuration file
  log_min_duration_statement          | 3s 
         | configuration file
  log_rotation_age                    | 1d 
         | configuration file
  log_rotation_size                   | 1GB 
         | configuration file
  log_temp_files                      | 0 
         | configuration file
  log_timezone                        | America/New_York 
         | configuration file
  logging_collector                   | on 
         | configuration file
  maintenance_work_mem                | 2GB 
         | configuration file
  max_connections                     | 1000 
         | configuration file
  max_locks_per_transaction           | 1280 
         | configuration file
  max_parallel_workers_per_gather     | 6 
         | configuration file
  max_stack_depth                     | 2MB 
         | environment variable
  max_wal_size                        | 10GB 
         | configuration file
  max_worker_processes                | 26 
         | configuration file
  min_wal_size                        | 1GB 
         | configuration file
  pg_stat_statements.max              | 2000 
         | configuration file
  pg_stat_statements.track            | all 
         | configuration file
  pg_stat_statements.track_planning   | off 
         | configuration file
  port                                | 5433 
         | configuration file
  random_page_cost                    | 1.5 
         | configuration file
  shared_buffers                      | 8GB 
         | configuration file
  shared_preload_libraries            | pg_stat_statements,auto_explain 
         | configuration file
  ssl                                 | on 
         | configuration file
  ssl_cert_file                       | 
/etc/ssl/certs/ssl-cert-snakeoil.pem    | configuration file
  ssl_key_file                        | 
/etc/ssl/private/ssl-cert-snakeoil.key  | configuration file
  stats_temp_directory                | 
/var/run/postgresql/13-main.pg_stat_tmp | configuration file
  temp_buffers                        | 2GB 
         | configuration file
  TimeZone                            | America/New_York 
         | configuration file
  track_commit_timestamp              | on 
         | configuration file
  track_io_timing                     | on 
         | configuration file
  unix_socket_directories             | /var/run/postgresql 
         | configuration file
  vacuum_freeze_table_age             | 1000000000 
         | configuration file
  wal_buffers                         | 128MB 
         | configuration file
  work_mem                            | 758MB 
         | configuration file
(75 rows)


Operating system and version: Linux r730server 4.15.0-142-generic 
#146~16.04.1-Ubuntu SMP Tue Apr 13 09:27:15 UTC 2021 x86_64 x86_64 
x86_64 GNU/Linux
What program you're using to connect to PostgreSQL: psql
Is there anything relevant or unusual in the PostgreSQL server logs?: No

-- 
Best Regards
Kamil Frydel



Re: Partitioned table statistics vs autoanalyze

От
Justin Pryzby
Дата:
On Thu, Jul 22, 2021 at 01:32:51PM +0200, Kamil Frydel wrote:
> table_1 and table_2 are hash partitioned using volume_id column. Usually we
> make analyze on partitions. We do not make analyze on the partitioned table
> (parent).
> However, if we run 'analyze' on the partitioned table then planner starts
> choosing hash join. As a comparison, the execution using nested loop takes
> about 15 minutes and if it is done using hash join then the query lasts for
> about 1 minute. When running 'analyze' for the partitioned table, postgres
> inserts statistics for the partitioned table into pg_stats (pg_statistics).
> Before that, there are only statistics for partitions. We suspect that this
> is the reason for selecting bad query plan.

> updated, the inheritance statistics will not be up to date unless you run
> ANALYZE manually.
> (https://www.postgresql.org/docs/13/sql-analyze.html)
> 
> I would appreciate if anyone could shed some light on the following
> questions:
> 1) Is this above paragraph from docs still valid in PG 13 and does it apply
> to declarative partitioning as well? Is running analyze manually on a
> partitioned table needed to get proper plans for queries on partitioned
> tables? Partitioned table (in the declarative way) is ”virtual” and does not
> keep any data so it seems that there are no statistics that can be gathered
> from the table itself and statistics from partitions should be sufficient.

Up through v13, autoanalyze doesn't collect stats on parent tables (neither
declarative nor inheritence).  I agree that this doesn't seem to be well
documented.  I think it should also be mentioned here:
https://www.postgresql.org/docs/current/routine-vacuuming.html#VACUUM-FOR-STATISTICS

In v14 (which is currently in beta), autoanalyze will process the partitioned
table automatically:
https://www.postgresql.org/docs/14/release-14.html
|Autovacuum now analyzes partitioned tables (Yuzuko Hosoya, Álvaro Herrera)
|Insert, update, and delete tuple counts from partitions are now propagated to their parent tables so autovacuum knows
whento process them.
 

> 2) Why does the planner need these statistics since they seem to be unused
> in the query plan. The query plan uses only partitions, not the partitioned
> table.

The "inherited" stats are used when you SELECT FROM table.  The stats for the
individual table would be needed when you SELECT FROM ONLY table (which makes
no sense for a partitioned table).

-- 
Justin



Re: Partitioned table statistics vs autoanalyze

От
Kamil Frydel
Дата:
> In v14 (which is currently in beta), autoanalyze will process the partitioned
> table automatically:
> https://www.postgresql.org/docs/14/release-14.html
> |Autovacuum now analyzes partitioned tables (Yuzuko Hosoya, Álvaro Herrera)
> |Insert, update, and delete tuple counts from partitions are now propagated to their parent tables so autovacuum
knowswhen to process them.
 
> 

Thank you for the prompt reply! Changes in v14 sound promising.


-- 
Best regards
Kamil Frydel