Re: Memory leak from ExecutorState context?

From Konstantin Knizhnik
Subject Re: Memory leak from ExecutorState context?
Date
Msg-id 7ea0562c-25d1-32e0-40b5-48127a6f49ce@garret.ru
In response to Re: Memory leak from ExecutorState context?  (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
Responses Re: Memory leak from ExecutorState context?  (Melanie Plageman <melanieplageman@gmail.com>)
List pgsql-hackers

On 11.04.2023 8:14 PM, Jehan-Guillaume de Rorthais wrote:
> On Sat, 8 Apr 2023 02:01:19 +0200
> Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:
>
>> On Fri, 31 Mar 2023 14:06:11 +0200
>> Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:
>>
>>   [...]
>>
>> After rebasing Tomas' memory balancing patch, I did some memory measures
>> to answer some of my questions. Please, find in attachment the resulting
>> charts "HJ-HEAD.png" and "balancing-v3.png" to compare memory consumption
>> between HEAD and Tomas' patch. They show an alternation of numbers
>> before/after calling ExecHashIncreaseNumBatches (see the debug patch). I
>> didn't try to find the exact last total peak of memory consumption during the
>> join phase and before all the BufFiles are destroyed. So the last number
>> might be underestimated.
> I did some more analysis about the total memory consumption in filecxt of HEAD,
> v3 and v4 patches. My previous debug numbers only print memory metrics during
> batch increments or hash table destruction. That means:
>
> * for HEAD: we miss the batches consumed during the outer scan
> * for v3: adds twice nbatch in spaceUsed, which is a rough estimation
> * for v4: batches are tracked in spaceUsed, so they are reflected in spacePeak
>
> Using a breakpoint in ExecHashJoinSaveTuple to print "filecxt->mem_allocated"
> from there, here is the maximum allocated memory for the bufFile context for each
> branch:
>
>          batches   max bufFiles   total  spaceAllowed rise
>    HEAD    16384      199966960  ~194MB
>    v3       4096       65419456   ~78MB
>    v4(*3)   2048       34273280    48MB  nbatch*sizeof(PGAlignedBlock)*3
>    v4(*4)   1024       17170160  60.6MB  nbatch*sizeof(PGAlignedBlock)*4
>    v4(*5)   2048       34273280  42.5MB  nbatch*sizeof(PGAlignedBlock)*5
>
> It seems accounting for bufFile in spaceUsed allows better memory balancing and
> management. The precise factor to raise spaceAllowed is yet to be defined. *3 or
> *4 looks good, but this is based on a single artificial test case.
>
> Also, note that HEAD is currently reporting ~4MB of memory usage. This is far
> from reality. So even if we don't commit the memory balancing patch in v16,
> maybe we could account for filecxt in spaceUsed as a bugfix?
>
> Regards,
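
For what it's worth, the kind of accounting described above could look roughly
like the sketch below. This is purely illustrative, based on my reading of the
thread; the fileCxt and fileSpaceUsed fields are assumed names, not necessarily
what the patches actually use:

#include "postgres.h"
#include "executor/hashjoin.h"
#include "utils/memutils.h"

/*
 * Hypothetical sketch (assumed names, not the actual patch): fold the memory
 * currently allocated in a dedicated BufFile memory context into the hash
 * table's space accounting, so that spacePeak reflects the batch files too.
 */
static void
AccountForBatchFiles(HashJoinTable hashtable)
{
    Size        file_mem = MemoryContextMemAllocated(hashtable->fileCxt, true);

    /* replace the previously accounted figure with the current one */
    hashtable->spaceUsed -= hashtable->fileSpaceUsed;
    hashtable->fileSpaceUsed = file_mem;
    hashtable->spaceUsed += file_mem;

    if (hashtable->spaceUsed > hashtable->spacePeak)
        hashtable->spacePeak = hashtable->spaceUsed;
}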

Thank you for the patch.
I faced the same problem (OOM caused by hash join).
I tried to create the simplest test reproducing the problem:

create table t(pk int, val int);
insert into t values (generate_series(1,100000000),0);
set work_mem='64kB';
explain (analyze,buffers) select count(*) from t t1 join t t2 on (t1.pk=t2.pk);


There are three parallel workers, and the size of each exceeds 1.3GB.

The plan is the following:

 Finalize Aggregate  (cost=355905977972.87..355905977972.88 rows=1 width=8) (actual time=212961.033..226097.513 rows=1 loops=1)
   Buffers: shared hit=32644 read=852474 dirtied=437947 written=426374, temp read=944407 written=1130380
   ->  Gather  (cost=355905977972.65..355905977972.86 rows=2 width=8) (actual time=212943.505..226097.497 rows=3 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         Buffers: shared hit=32644 read=852474 dirtied=437947 written=426374, temp read=944407 written=1130380
         ->  Partial Aggregate  (cost=355905976972.65..355905976972.66 rows=1 width=8) (actual time=212938.410..212940.035 rows=1 loops=3)
               Buffers: shared hit=32644 read=852474 dirtied=437947 written=426374, temp read=944407 written=1130380
               ->  Parallel Hash Join  (cost=1542739.26..303822614472.65 rows=20833345000002 width=0) (actual time=163268.274..207829.524 rows=33333333 loops=3)
                     Hash Cond: (t1.pk = t2.pk)
                     Buffers: shared hit=32644 read=852474 dirtied=437947 written=426374, temp read=944407 written=1130380
                     ->  Parallel Seq Scan on t t1  (cost=0.00..859144.78 rows=41666678 width=4) (actual time=0.045..30828.051 rows=33333333 loops=3)
                           Buffers: shared hit=16389 read=426089 written=87
                     ->  Parallel Hash  (cost=859144.78..859144.78 rows=41666678 width=4) (actual time=82202.445..82202.447 rows=33333333 loops=3)
                           Buckets: 4096 (originally 4096)  Batches: 32768 (originally 8192)  Memory Usage: 192kB
                           Buffers: shared hit=16095 read=426383 dirtied=437947 written=426287, temp read=267898 written=737164
                           ->  Parallel Seq Scan on t t2  (cost=0.00..859144.78 rows=41666678 width=4) (actual time=0.054..12647.534 rows=33333333 loops=3)
                                 Buffers: shared hit=16095 read=426383 dirtied=437947 written=426287
 Planning:
   Buffers: shared hit=69 read=38
 Planning Time: 2.819 ms
 Execution Time: 226113.292 ms
(22 rows)



-----------------------------

So the number of batches was increased to 32k.
I applied your patches 0001-0004, but unfortunately they have not reduced
memory consumption: the size of each backend is still more than 1.3GB.
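
A rough back-of-the-envelope estimate (my own numbers, assuming each open
BufFile keeps one BLCKSZ-sized buffer and each batch needs an inner and an
outer file in every backend) already explains much of that footprint:

#include <stdio.h>

/*
 * Rough lower bound for the BufFile I/O buffers alone in one backend.
 * Assumptions (not measured): one BLCKSZ (8kB) buffer per open BufFile,
 * two files (inner + outer) per batch, all batches kept open at once.
 */
int
main(void)
{
    const long  nbatch = 32768;         /* "Batches: 32768" from the plan */
    const long  blcksz = 8192;          /* BLCKSZ, size of one BufFile buffer */
    const long  files_per_batch = 2;    /* inner + outer side */

    printf("%.0f MB\n",
           (double) nbatch * blcksz * files_per_batch / (1024 * 1024));
    return 0;                           /* prints "512 MB" */
}

That is ~512MB per backend just for the I/O buffers, before counting the
BufFile structs, per-batch arrays and shared tuplestore bookkeeping, so 1.3GB
per worker does not look surprising.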

I wonder what the preferred solution to this problem could be?
We have to limit the size of the hash table which we can hold in memory,
and the number of batches is calculated as the inner relation size divided by
the hash table size.
So for an arbitrarily large inner relation we can get an arbitrarily large
number of batches, which may consume an arbitrarily large amount of memory.
We should either prohibit further increases of the number of batches (it will
not solve the problem completely, but at least in the test above it would
prevent the increase from 8k to 32k batches), or prohibit use of hash join in
this case at all (assign a very high cost to this path).
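
Just to illustrate the first option, here is a minimal sketch (my assumption
of where such a check could hook in, not a real patch; the helper name is made
up) that would stop doubling the number of batches once the BufFile buffers
for the new batches would by themselves exceed the memory budget:

#include "postgres.h"
#include "executor/hashjoin.h"

/*
 * Hypothetical helper, assumed to be consulted before doubling
 * hashtable->nbatch (e.g. in ExecHashIncreaseNumBatches).  Doubling past
 * this point only trades hash table memory for file-buffer memory.
 */
static bool
BatchIncreaseIsWorthwhile(HashJoinTable hashtable)
{
    int         new_nbatch = hashtable->nbatch * 2;

    /* one BLCKSZ buffer per BufFile, one inner and one outer file per batch */
    Size        file_buffers = (Size) new_nbatch * 2 * BLCKSZ;

    return file_buffers < hashtable->spaceAllowed;
}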

Also, I wonder why we create such a large number of files for each batch?
Can it be reduced?




