Re: Optimize planner memory consumption for huge arrays

From: Lepikhov Andrei
Subject: Re: Optimize planner memory consumption for huge arrays
Date:
Msg-id: 1108a71a-e65e-41a5-81ab-beccc80c3628@app.fastmail.com
In reply to: Re: Optimize planner memory consumption for huge arrays  (Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>)
Responses: Re: Optimize planner memory consumption for huge arrays
List: pgsql-hackers

On Wed, Sep 6, 2023, at 8:09 PM, Ashutosh Bapat wrote:
> Hi Lepikhov,
>
> Thanks for using my patch and I am glad that you found it useful.
>
> On Mon, Sep 4, 2023 at 10:56 AM Lepikhov Andrei
> <a.lepikhov@postgrespro.ru> wrote:
>>
>> Hi, hackers,
>>
>> Looking at the planner behaviour with the memory consumption patch [1], I figured out that arrays increase memory consumption by the optimizer significantly. See init.sql in attachment.
>> The point here is that the planner does small memory allocations for each element during estimation. As a result, it looks like the planner consumes about 250 bytes for each integer element.
>
> I guess the numbers you mentioned in init.sql are total memory used by
> the planner (as reported by the patch in the thread) when planning
> that query and not memory consumed by Const nodes themselves. Am I
> right? I think the measurements need to be explained better and also
> the realistic scenario you are trying to optimize.

Yes, it is the total memory consumed by the planner - I used the numbers generated by your patch [1]. I kept increasing the number of elements in the array to exclude the memory the planner consumes for other purposes. As you can see, an array with one element consumes 12 kB of memory, while an array with 1E4 elements consumes 2.6 MB. All of that increment is attributable solely to the enlargement of this array: (2600 kB - 12 kB) / 10000 elements ≈ 260 bytes. So I conclude that each 4-byte element incurs about 260 bytes of memory consumption.
This scenario came from a user complaint - they had strict restrictions on memory usage and were stuck on this unusual memory-usage case.
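For what it's worth, the per-element arithmetic above can be checked mechanically. This is just a sketch of the calculation; the kB figures are the ones reported by the memory patch, and 1 kB = 1024 bytes is assumed:

```python
# Memory reported by the planner-memory patch (assumption: figures in kB,
# taken from the measurements quoted above).
mem_1_elem_kb = 12        # query with a 1-element array
mem_1e4_elems_kb = 2600   # query with a 10000-element array

# The difference is attributed entirely to the extra 9999 array elements.
extra_bytes = (mem_1e4_elems_kb - mem_1_elem_kb) * 1024
per_element = extra_bytes / (10_000 - 1)

print(f"~{per_element:.0f} bytes per 4-byte integer element")
```

Depending on whether kB means 1000 or 1024 bytes, this lands in the 255-270 byte range, consistent with the ~260 bytes quoted above.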

> I guess, the reason you think that partitioning will increase the
> memory consumed is because each partition will have the clause
> translated for it. Selectivity estimation for each partition will
> create those many Const nodes and hence consume memory. Am I right?

Yes.

> Can you please measure the memory consumed with and without your
> patch.

Done. See the test case and results in 'init_parts.sql' in the attachment; a short summary follows. I varied the number of elements from 1 to 10000 and the number of partitions from 1 to 100. As you can see, partitioning adds a lot of memory consumption by itself, but the effect of the patch is visible as well.

master:
parts\elems      1        1E1      1E2      1E3      1E4
1              28kB     50kB     0.3MB    2.5MB    25MB
10             45kB     143kB    0.6MB    4.8MB    47MB
100            208kB    125kB    3.3MB    27MB     274MB

patched:
parts\elems      1        1E1      1E2      1E3      1E4
1              28kB     48kB     0.25MB   2.2MB    22.8MB
10             44kB     100kB    313kB    2.4MB    23.7MB
100            208kB    101kB    0.9MB    3.7MB    32.4MB

Just for comparison, without partitioning:
elems            1        1E1      1E2      1E3      1E4
master         12kB     14kB     37kB     266kB    2.5MB
patched        12kB     11.5kB   13kB     24kB     141kB

>> It is maybe not a problem most of the time. However, in the case of partitions, memory consumption multiplies by each partition. Such a corner case looks weird, but the fix is simple. So, why not?
>
> With vectorized operations becoming a norm these days, it's possible
> to have thousands of elements in the array of an ANY or IN clause. It
> will also be common to have thousands of partitions. But I think what
> we need to do here is write a selectivity estimation function that
> takes a Const array and returns the selectivity without requiring a
> Const node to be created for each element.

Maybe you're right. Could you show any examples of vectorized usage of Postgres so I can understand your idea more clearly?
Here I propose only a quick, simple solution. I don't think it would change the direction of development.
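To make the intent of the quick solution concrete, here is an illustrative Python analogy (not the actual PostgreSQL C code; `Const`, `estimate`, and the names below are hypothetical stand-ins): instead of materializing a fresh Const node for every array element during estimation, a single scratch node can be reused, avoiding one allocation per element.

```python
class Const:
    """Stand-in for a planner Const node; the real struct is much larger."""
    def __init__(self, value):
        self.value = value

def selectivity_per_element(elements, estimate):
    # Current pattern: a throwaway Const is allocated for each element.
    return sum(estimate(Const(v)) for v in elements) / len(elements)

def selectivity_reused(elements, estimate):
    # Patched pattern: one scratch Const is reused for every element.
    scratch = Const(None)
    total = 0.0
    for v in elements:
        scratch.value = v
        total += estimate(scratch)
    return total / len(elements)

# Toy per-value estimator; both patterns produce the same result.
est = lambda c: 0.1 if c.value % 2 == 0 else 0.2
elems = list(range(10))
assert selectivity_per_element(elems, est) == selectivity_reused(elems, est)
```

The result is identical either way; only the number of allocations differs, which is exactly where the per-element overhead discussed above comes from.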

>> The diff in the attachment is proof of concept showing how to reduce wasting of memory. Having benchmarked a bit, I didn't find any overhead.
>>
>
> You might want to include your benchmarking results as well.

There is nothing interesting to report: pgbench TPS and planning time for the cases above do not change with the patch.

[1] Report planning memory in EXPLAIN ANALYZE

--
Regards,
Andrei Lepikhov
Attachments
