Re: BUG #15321: XMLTABLE leaks memory like crazy

Поиск
Список
Период
Сортировка
От Pavel Stehule
Тема Re: BUG #15321: XMLTABLE leaks memory like crazy
Дата
Msg-id CAFj8pRDWORKuuLEvFN-tMp06nuD5g_+yPAWWVW7umanoriEwwg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #15321: XMLTABLE leaks memory like crazy  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Ответы Re: BUG #15321: XMLTABLE leaks memory like crazy  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Список pgsql-bugs


2018-08-12 11:44 GMT+02:00 Andrew Gierth <andrew@tao11.riddles.org.uk>:
>>>>> "Pavel" == Pavel Stehule <pavel.stehule@gmail.com> writes:

 >> We still need a per-result-tuple memory context otherwise we're
 >> leaking whatever memory got allocated in each GetValue call into the
 >> per-input-value context. (We can use our projection econtext's
 >> per-tuple memory for this because we haven't done any evaluation of
 >> output items for the current cycle yet at the point this code is
 >> reached.)
 >>
 >> Again, look at functionscan for how this is supposed to work.

 Pavel> it is done by tuplestore_putvalues

No, tuplestore_putvalues is only responsible for the memory it allocates
itself, which belongs to the tuplestore, it has nothing to do with the
memory allocated by its caller - and XmlTableGetValue does quite a few
allocations.

Maybe I used wrong words. Sorry, my English lang is not good.

In master tfuncLoadRows switch to tstate->perValueCxt. You change it by switch econtext->ecxt_per_tuple_memory what is wrong I thing.

If we don't change memory context, we will stay inside  perValueCxt. And this context will be cleaned outside.



It's true that the only way (I think) to get a lot of result rows from
XMLTABLE is to use a large input value, the memory usage for which will
dwarf that of the output rows (a ~128MB xml value can easily mean using
~4.5GB of backend memory, which seems quite excessively profligate), but
my tests show that using that value, the lack of a per-GetValue context
reset costs perhaps another ~500MB on top for my specific testcase:

select count(*)
  from (select ('<rec xmlns="http://x">'
                || repeat('<o><c>foo</c></o>',8000000+(i%10))
                || '</rec>')::xml as content
          from generate_series(1,10) i offset 0) s,
       xmltable(xmlnamespaces('http://x' as x),
                'x:o'
                passing s.content
                columns
                col1 text path 'x:c');

 >> It's not useless at all; it's needed to avoid excess memory usage
 >> when a single XMLTABLE() call returns many rows.

 Pavel> When this context was not necessary before, then it is not need
 Pavel> be used now. Tuplestore does all work

Before, you were using perValueCxt and resetting it once per GetValue
call. Since my patch takes perValueCxt to use for another purpose
instead, it needs to be replaced, and econtext->ecxt_per_tuple_memory
is suitable for the job (and consistent with functionscan).


I think so using cxt_per_tuple_memory is not necessary - and my patch is working too.

Just I removed MemoryContextReset(tstate->perValueCxt) after tuplestore_putvalues. It is possible, because tstate->perValueCxt is cleaned immediately in caller tfuncFetchRows function.

 
The tuplestore does not do all the work - just look at XmlTableGetValue,
and notice that it has calls to text_to_cstring, pstrdup,
appendStringInfoText, InputFunctionCall, xml_xmlnodetoxmltype, all of
which will allocate memory in the current memory context. All of that
needs to be reset on a per-output-tuple basis.

But it has own  context or it can works under perValueCxt

I think using two memory contexts is not necessary, and with just one context, the code can be simpler.

Regards

Pavel



--
Andrew (irc:RhodiumToad)

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Andrew Gierth
Дата:
Сообщение: Re: BUG #15321: XMLTABLE leaks memory like crazy
Следующее
От: Andrew Gierth
Дата:
Сообщение: Re: BUG #15321: XMLTABLE leaks memory like crazy