Re: optimizing number of workers

From Greg Hennessy
Subject Re: optimizing number of workers
Date
Msg-id CA+mZaOMRBe4_szsyD-JVTQJ25ah6tOUdE_923ZffT-9zVjxqtQ@mail.gmail.com
In reply to Re: optimizing number of workers  ("Weck, Luis" <luis.weck@pismo.io>)
Responses Re: optimizing number of workers
List pgsql-general
Setting those values to zero (not something I'd want to do in production) changes the number of workers
from 10 to 13. At least something, but if anyone knows where discussions about how to use
large numbers of CPUs in PostgreSQL are being held, I'd appreciate learning about it.
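
For the record, that test amounted to something along these lines (session-level SETs, shown here just to illustrate):

SET min_parallel_table_scan_size = 0;
SET min_parallel_index_scan_size = 0;
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM gaiadr3.gaia_source;
-- Workers Planned: 13 instead of 10 with the thresholds zeroed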

Greg


On Fri, Jul 11, 2025 at 2:11 PM Weck, Luis <luis.weck@pismo.io> wrote:
From: Greg Hennessy <greg.hennessy@gmail.com>
Date: Thursday, July 10, 2025 at 4:40 PM
Subject: optimizing number of workers

Having just received a shiny new dual-CPU machine to use as a PostgreSQL
server, I'm trying to make a reasonable effort to configure it correctly. The
hardware has 128 cores, and I am running a VM with Red Hat 9 and PostgreSQL 16.9.

In postgresql.conf I have:
max_worker_processes = 90               # (change requires restart)
max_parallel_workers_per_gather = 72    # gsh 26 oct 2022
max_parallel_maintenance_workers = 72   # gsh 12 jun 2025
max_parallel_workers =  72              # gsh 12 jun 2025
max_logical_replication_workers = 72    # gsh 12 jun 2025
max_sync_workers_per_subscription = 72   # gsh 12 jun 2025
autovacuum_max_workers = 12             # max number of autovacuum subprocesses
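
As a sanity check that the running server actually picked these up, the live values can be read back from pg_settings, e.g.:

SELECT name, setting, unit FROM pg_settings WHERE name LIKE '%parallel%';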

When I do a simple count of a large table (large being 1.8 billion entries), I get
about 10 workers used.

prod_v1_0_0_rc1=# explain (analyze, buffers) select count(*) from gaiadr3.gaia_source;
                                                                                         QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Finalize Aggregate  (cost=14379796.81..14379796.82 rows=1 width=8) (actual time=16702.806..16705.479 rows=1 loops=1)
   Buffers: shared hit=2507481
   ->  Gather  (cost=14379795.78..14379796.79 rows=10 width=8) (actual time=16702.513..16705.470 rows=11 loops=1)
         Workers Planned: 10
         Workers Launched: 10
         Buffers: shared hit=2507481
         ->  Partial Aggregate  (cost=14379785.78..14379785.79 rows=1 width=8) (actual time=16691.820..16691.821 rows=1 loops=11)
               Buffers: shared hit=2507481
               ->  Parallel Index Only Scan using gaia_source_nest128 on gaia_source  (cost=0.58..13926632.85 rows=181261171 width=0) (actual time=0.025..9559.644 rows=164700888 loops=11)
                     Heap Fetches: 0
                     Buffers: shared hit=2507481
 Planning:
   Buffers: shared hit=163
 Planning Time: 14.898 ms
 Execution Time: 16705.592 ms

Postgres has chosen to use only a small fraction of the CPUs I have on
my machine. Given that the query returns an answer in about 8 seconds, it may be
that PostgreSQL has allocated the proper number of workers. But if I wanted
to tweak some config parameters to see whether using more workers
would give me an answer faster, I don't see any obvious knobs
to turn. Are there parameters I can adjust to see if I can increase
throughput? Would adjusting parallel_setup_cost or parallel_tuple_cost
be likely to help?
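For concreteness, the sort of experiment I have in mind is something like:

SET parallel_setup_cost = 0;   -- default is 1000
SET parallel_tuple_cost = 0;   -- default is 0.1
EXPLAIN (ANALYZE, BUFFERS) SELECT count(*) FROM gaiadr3.gaia_source;
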
I believe you can decrease min_parallel_table_scan_size (the default is 8MB) and min_parallel_index_scan_size (the default is 512kB). The number of workers the planner picks also depends on the relation size relative to these settings.
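
If I'm reading the planner's size heuristic correctly (compute_parallel_worker() in allpaths.c; treat this as a sketch rather than gospel), it starts at one worker and adds another each time the relation is another factor of three larger than the scan-size threshold, and a per-table storage parameter bypasses the heuristic entirely:

-- size heuristic, roughly:
--   workers = 1; threshold = min_parallel_index_scan_size (512kB = 64 pages)
--   while relation_pages >= threshold * 3: workers++; threshold *= 3
-- (for an index scan it does this for both heap and index sizes and takes the smaller)
-- with the ~2.5M index pages implied by the plan above: 64 * 3^9 <= 2.5M < 64 * 3^10,
-- i.e. about 10 workers, matching "Workers Planned: 10"

-- per-table override (still capped by max_parallel_workers_per_gather and
-- max_parallel_workers); table name taken from the plan above:
ALTER TABLE gaiadr3.gaia_source SET (parallel_workers = 32);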
