700% faster (was: Re: Why is Postgres only using 8 cores for partitioned count? [Parallel append])

From: Seamus Abshere
Subject: 700% faster (was: Re: Why is Postgres only using 8 cores for partitioned count? [Parallel append])
Date:
Msg-id: 7d6fdc20-857c-4cbe-ae2e-c0ff9520ed55@www.fastmail.com
In reply to: Re: Why is Postgres only using 8 cores for partitioned count? [Parallel Append]  ("Seamus Abshere" <sabshere@alumni.princeton.edu>)
List: pgsql-general
hi,

Apologies for the self-reply, but I made it 700% faster by compiling from source and forcing parallel_workers from 7 to 96.
 

If this weren't a partitioned table, I could just do `alter table test_3pd_cstore_partitioned set (parallel_workers = 96)` - but that isn't currently supported.
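In the meantime, the reloption does exist for plain tables, so it can be set on each leaf partition individually (the partition name below is hypothetical). Note that this only caps the workers considered for that one child's scan - it doesn't raise the Parallel Append's own worker count, which is exactly the gap:

```sql
-- Not currently accepted on the partitioned parent:
-- alter table test_3pd_cstore_partitioned set (parallel_workers = 96);

-- Accepted on a leaf partition, which is an ordinary table
-- (partition name is hypothetical):
alter table test_3pd_cstore_partitioned_p0 set (parallel_workers = 8);
```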
 

diff --git a/src/backend/optimizer/path/allpaths.c b/src/backend/optimizer/path/allpaths.c
index cd3fdd259c..0057a69d4e 100644
--- a/src/backend/optimizer/path/allpaths.c
+++ b/src/backend/optimizer/path/allpaths.c
@@ -1544,6 +1544,9 @@ add_paths_to_append_rel(PlannerInfo *root, RelOptInfo *rel,
                                max_parallel_workers_per_gather);
         Assert(parallel_workers > 0);
 
+        // force a crazy parallelism
+        parallel_workers = 96;
+        
         appendpath = create_append_path(root, rel, pa_nonpartial_subpaths,
                                         pa_partial_subpaths,
                                         NIL, NULL, parallel_workers, true,

BEFORE:

postgres=# explain analyze select count(*) from test_3pd_cstore_partitioned where age > 50;

 Finalize Aggregate  (cost=558860.96..558860.97 rows=1 width=8) (actual time=8528.509..8538.627 rows=1 loops=1)
   ->  Gather  (cost=56918.65..558860.94 rows=7 width=8) (actual time=1863.462..8538.603 rows=64 loops=1)
         Workers Planned: 7
         Workers Launched: 7
         ->  Parallel Append  (cost=55918.65..557860.24 rows=1 width=8) (actual time=1877.875..8417.486 rows=8 loops=8)
[...]
 Execution Time: 8565.734 ms

AFTER:

postgres=# explain analyze select count(*) from test_3pd_cstore_partitioned where age > 50;
 Finalize Aggregate  (cost=57030.20..57030.21 rows=1 width=8) (actual time=1125.828..1129.958 rows=1 loops=1)
   ->  Gather  (cost=56611.92..57029.96 rows=96 width=8) (actual time=994.708..1129.940 rows=64 loops=1)
         Workers Planned: 96
         Workers Launched: 96
         ->  Parallel Append  (cost=55611.92..56020.36 rows=1 width=8) (actual time=656.027..656.051 rows=1 loops=97)
[...]
 Execution Time: 1133.810 ms

Should I try to submit a patch that adds support for "alter table test_3pd_cstore_partitioned set (parallel_workers = 96)" for partitioned tables?
 

Best,
Seamus

PS. The awesome thing about this is that I'm table scanning a 270 million row, 600 column table in 1.2 seconds, which I never thought was possible.
 

PPS. I have no idea why 96 worked better than 64 (the number of cores I have), but it did - 700% vs 400%.

On Sat, Feb 13, 2021, at 7:09 PM, Seamus Abshere wrote:
> hi,
> 
> I've traced this back to the formula for Parallel Append workers - 
> log2(partitions).
> 
> The comment from Robert says: (src/backend/optimizer/path/allpaths.c)
> 
>         /*
>          * If the use of parallel append is permitted, always request at least
>          * log2(# of children) workers. 
> 
> In my case, every partition takes 1 second to scan, I have 64 cores, I 
> have 64 partitions, and the wall time is 8 seconds with 8 workers.
> 
> I assume that if it planned significantly more workers (16? 32? even 
> 64?), it would get significantly faster (even accounting for 
> transaction cost). So why doesn't it ask for more? Note that I've set 
> max_parallel_workers=512, etc. (postgresql.conf in my first message).
> 
> Here are full plans 
> https://gist.github.com/seamusabshere/1c1d869ee24145689f45e4124d8730d3
> 
> Best,
> Seamus
> 
> On Fri, Feb 12, 2021, at 4:50 PM, Seamus Abshere wrote:
> > hi,
> > 
> > How can I convince Postgres to use more than 8 cores?
> > 
> > I've got an r6gd.16xlarge with 64 vCpus and 512gb RAM running Postgres 
> > 13.1 on Ubuntu 20.04.
> > 
> > CREATE TABLE tbl (
> >   [...]
> > ) PARTITION BY HASH (address_key);
> > 
> > It has 64 partitions - e.g. FOR VALUES WITH (modulus 64, remainder 0) etc.
> > 
> > We're running `SELECT COUNT(*) FROM tbl`.
> > 
> > I've watched top and I never see more than 8 cores going 100%.
> > 
> > Here is my (admittedly ridiculous) postgresql.conf:
> > 
> > checkpoint_completion_target = 0.9
> > data_directory='/tank/postgresql/13/main'
> > default_statistics_target = 100
> > effective_cache_size = 381696MB
> > effective_io_concurrency = 200
> > enable_partition_pruning=on
> > enable_partitionwise_aggregate=on
> > enable_partitionwise_join=on
> > listen_addresses='*'
> > maintenance_work_mem = 2GB
> > max_connections = 200
> > max_parallel_maintenance_workers = 4
> > max_parallel_workers = 512
> > max_parallel_workers_per_gather = 512
> > max_wal_size = 4GB
> > max_worker_processes = 512
> > min_wal_size = 1GB
> > random_page_cost = 1.1
> > shared_buffers = 127232MB
> > shared_preload_libraries = 'cstore_fdw'
> > synchronous_commit=off
> > wal_buffers = 16MB
> > work_mem = 1628560kB
> > 
> > Best,
> > Seamus


