Unexpectedly huge memory usage (over 180x of work_mem) during hash join... confirmed by TopMemoryContext results (postgresql 13.10)

From: Maxim Boguk
Subject: Unexpectedly huge memory usage (over 180x of work_mem) during hash join... confirmed by TopMemoryContext results (postgresql 13.10)
Date:
Msg-id: CAK-MWwQFXw5KgvdYtWLCBsTGaeAn4Xx8_wGs8XuM0KLxLoVzew@mail.gmail.com
List: pgsql-general
Hi,

While investigating an OOM server crash (on a server with more than 30GB of available memory), I found that it was triggered by a single query.

Related settings:
work_mem=32MB
hash_mem_multiplier=1
max_parallel_workers_per_gather=3
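
For reference, under these settings each hash table should be limited to roughly work_mem × hash_mem_multiplier of memory per process before it spills to batches (a back-of-the-envelope check, not part of the original report):

```shell
# Expected in-memory budget per hash table, per process:
# work_mem (32MB) * hash_mem_multiplier (1)
echo "$((32 * 1))MB"   # prints 32MB
```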

The investigation below was performed on an idle server with no other activity on it.

Query and explain (analyze, costs, buffers, timing) results:

explain (analyze, costs, buffers, timing) select * from employer_manager WHERE employer_id IN (
            SELECT employer_id
            FROM
              employer e
            WHERE
            EXISTS (
                SELECT 1
                FROM employer_service es
                     JOIN account_service_package_entry aspe
                       ON (aspe.service_package_id = es.employer_service_id)
                     JOIN account_service_event ase
                       ON (ase.account_service_package_entry_id = aspe.id)
                     JOIN vacancy v
                       ON (ase.object_id = v.vacancy_id)
                WHERE v.employer_id = e.employer_id AND es.price = 0
              )
          );

                                                                                                QUERY PLAN                                                                                                
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Gather  (cost=12483578.55..12699754.67 rows=4951913 width=183) (actual time=402888.266..541201.083 rows=3456146 loops=1)
   Workers Planned: 3
   Workers Launched: 3
   Buffers: shared hit=10227924, temp read=5404033 written=9746488
   ->  Parallel Hash Semi Join  (cost=12483478.55..12650135.54 rows=1597391 width=183) (actual time=402879.754..403927.356 rows=864036 loops=4)
         Hash Cond: (employer_manager.employer_id = v.employer_id)
         Buffers: shared hit=10227924, temp read=5404033 written=9746488
         ->  Parallel Seq Scan on employer_manager  (cost=0.00..61704.37 rows=3902987 width=183) (actual time=0.015..926.040 rows=3030730 loops=4)
               Buffers: shared hit=226745
         ->  Parallel Hash  (cost=12470696.28..12470696.28 rows=1173692 width=8) (actual time=400745.411..400757.615 rows=315704 loops=4)
               Buckets: 1048576  Batches: 8  Memory Usage: 14432kB
               Buffers: shared hit=10000933, temp read=5185403 written=9531604
               ->  Parallel Hash Semi Join  (cost=5419572.63..12470696.28 rows=1173692 width=8) (actual time=381755.186..400673.065 rows=315704 loops=4)
                     Hash Cond: (e.employer_id = v.employer_id)
                     Buffers: shared hit=10000933, temp read=5185403 written=9527768
                     ->  Parallel Seq Scan on employer e  (cost=0.00..65876.37 rows=2986447 width=4) (actual time=0.015..547.631 rows=2313235 loops=4)
                           Buffers: shared hit=360119
                     ->  Parallel Hash  (cost=4389870.76..4389870.76 rows=94549378 width=4) (actual time=300367.731..300370.699 rows=10515574 loops=4)
                           Buckets: 1048576 (originally 1048576)  Batches: 131072 (originally 512)  Memory Usage: 287136kB
                           Buffers: shared hit=9640814, temp read=4068338 written=7441540
                           ->  Parallel Hash Join  (cost=1522656.32..4389870.76 rows=94549378 width=4) (actual time=113906.860..121650.966 rows=10515574 loops=4)
                                 Hash Cond: (aspe.service_package_id = es.employer_service_id)
                                 Buffers: shared hit=9640814, temp read=3570214 written=3573964
                                 ->  Parallel Hash Join  (cost=1139337.57..3828356.61 rows=132999548 width=8) (actual time=87792.230..102392.631 rows=60107033 loops=4)
                                       Hash Cond: (ase.account_service_package_entry_id = aspe.id)
                                       Buffers: shared hit=8740464, temp read=2645196 written=2648224
                                       ->  Parallel Hash Join  (cost=669912.02..3179027.68 rows=132999548 width=8) (actual time=44539.790..74472.198 rows=60107033 loops=4)
                                             Hash Cond: (ase.object_id = v.vacancy_id)
                                             Buffers: shared hit=7347306, temp read=1656073 written=1657576
                                             ->  Parallel Seq Scan on account_service_event ase  (cost=0.00..2275263.13 rows=172006123 width=8) (actual time=0.011..20798.324 rows=133306300 loops=4)
                                                   Buffers: shared hit=5552019
                                             ->  Parallel Hash  (cost=414267.15..414267.15 rows=23473845 width=8) (actual time=7923.194..7923.195 rows=18198520 loops=4)
                                                   Buckets: 1048576  Batches: 128  Memory Usage: 30528kB
                                                   Buffers: shared hit=1795287, temp written=247932
                                                   ->  Parallel Seq Scan on vacancy v  (cost=0.00..414267.15 rows=23473845 width=8) (actual time=0.011..5221.328 rows=18198520 loops=4)
                                                         Buffers: shared hit=1795287
                                       ->  Parallel Hash  (cost=297333.92..297333.92 rows=15801812 width=8) (actual time=4352.140..4352.141 rows=12280833 loops=4)
                                             Buckets: 1048576  Batches: 128  Memory Usage: 23296kB
                                             Buffers: shared hit=1393158, temp written=167644
                                             ->  Parallel Seq Scan on account_service_package_entry aspe  (cost=0.00..297333.92 rows=15801812 width=8) (actual time=0.015..2520.045 rows=12280833 loops=4)
                                                   Buffers: shared hit=1393158
                                 ->  Parallel Hash  (cost=258846.24..258846.24 rows=11429325 width=4) (actual time=3424.467..3424.468 rows=8908616 loops=4)
                                       Buckets: 1048576  Batches: 64  Memory Usage: 30080kB
                                       Buffers: shared hit=900350, temp written=103252
                                       ->  Parallel Seq Scan on employer_service es  (cost=0.00..258846.24 rows=11429325 width=4) (actual time=0.010..2087.361 rows=8908616 loops=4)
                                             Filter: (price = 0)
                                             Rows Removed by Filter: 3624132
                                             Buffers: shared hit=900350
 Planning:
   Buffers: shared hit=236
 Planning Time: 25.494 ms
 Execution Time: 541359.085 ms

There are no big row-count misestimations, but at the moments of peak memory usage each parallel worker used over 5GB of RAM (over 20GB in total for 3 workers plus the leader).
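
One plausible explanation (my assumption, not confirmed by the plan output): the problematic Parallel Hash exploded from 512 to 131072 batches, and each batch keeps its own temporary-file buffer (a BufFile of roughly 8kB) per side, so the batch bookkeeping alone dwarfs work_mem:

```shell
# Rough per-worker BufFile buffer overhead with the exploded batch count:
# 131072 batches * 2 sides (inner + outer) * ~8kB buffer each
echo "$((131072 * 2 * 8 / 1024))MB"   # prints 2048MB
```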

Memory context dump of a single worker:

TopMemoryContext: 3357032 total in 16 blocks; 46064 free (16 chunks); 3310968 used
...
  TopPortalContext: 8192 total in 1 blocks; 7656 free (0 chunks); 536 used
    PortalHoldContext: 24632 total in 2 blocks; 7384 free (0 chunks); 17248 used
    PortalContext: 540672 total in 69 blocks; 8568 free (10 chunks); 532104 used:
      ExecutorState: 1082567568 total in 124072 blocks; 4073608 free (7018 chunks); 1078493960 used
        HashTableContext: 74808 total in 5 blocks; 25008 free (3 chunks); 49800 used
          HashBatchContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
        HashTableContext: 74808 total in 5 blocks; 25056 free (3 chunks); 49752 used
          HashBatchContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
        HashTableContext: 40960 total in 3 blocks; 12328 free (6 chunks); 28632 used
          HashBatchContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
        HashTableContext: 4156883208 total in 124074 blocks; 146264 free (30 chunks); 4156736944 used
          HashBatchContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
        HashTableContext: 8192 total in 1 blocks; 4376 free (7 chunks); 3816 used
          HashBatchContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
        ExprContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
...
        ExprContext: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
  Relcache by OID: 16384 total in 2 blocks; 1424 free (2 chunks); 14960 used
  CacheMemoryContext: 1163704 total in 14 blocks; 197832 free (0 chunks); 965872 used
    relation rules: 24576 total in 6 blocks; 4896 free (1 chunks); 19680 used: pg_settings
    index info: 2048 total in 2 blocks; 528 free (1 chunks); 1520 used: pg_toast_2619_index
....
    48 more child contexts containing 91136 total in 83 blocks; 27632 free (45 chunks); 63504 used
  PrivateRefCount: 8192 total in 1 blocks; 2616 free (0 chunks); 5576 used
  MdSmgr: 8192 total in 1 blocks; 3880 free (2 chunks); 4312 used
  LOCALLOCK hash: 32768 total in 3 blocks; 16824 free (8 chunks); 15944 used
  Timezones: 104120 total in 2 blocks; 2616 free (0 chunks); 101504 used
  ErrorContext: 8192 total in 1 blocks; 7928 free (4 chunks); 264 used
Grand total: 5245755200 bytes in 248596 blocks; 4995280 free (7224 chunks); 5240759920 used
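
(For completeness: on 13.x a dump like the above cannot be requested via SQL — pg_log_backend_memory_contexts() only appeared in v14. One common way to obtain it, assuming superuser shell access, gdb, and debug symbols, is to attach to the worker; the PID below is hypothetical. The output goes to that backend's stderr/server log.)

```shell
gdb -p 12345 -batch -ex 'call MemoryContextStats(TopMemoryContext)'
```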

Also, during query execution there was a huge number of small temp files:
find ~postgres/13/main/base/pgsql_tmp/ -type f | wc -l
1459383


There are two questions:
1) Is this behavior normal/expected?
2) Is there any way to limit the memory usage of such queries to prevent OOM at random times? (Memory usage of almost 200x work_mem very much surprised me.)

PS: After a crash, recovery of a database with a few million files in pgsql_tmp takes ages (however, this is fixed in newer versions, so it's not a critical issue).


--
Maxim Boguk
Senior Postgresql DBA
https://dataegret.com/

Phone UA: +380 99 143 0000
Phone AU: +61  45 218 5678
