Обсуждение: Memory exhaustion during bulk insert

Поиск
Список
Период
Сортировка

Memory exhaustion during bulk insert

От
Xin Wang
Дата:
Hi all,

I'm doing an experimental project using Postgres as the prototype.
I want to store attribute values of xml type in an internal XML table
which is created for every XML column. One XML node (element,
attribute or text) is stored as a tuple in the XML table. While
a 127MB XML document 'dblp.xml' (that has about 4 million XML nodes
thus 4 million tuples) is being stored, 2GB memory is exhausted rapidly
and then my computer hangs up. I guess the reason is that the memory
runs out before the transaction is committed because the number of
tuples being inserted is too large.

The flow of tuple insertion and functions called are as follows:

while (get the next XML node != NULL)
{
/* fill in values and isnull array */
...
tup = heap_form_tuple(tupleDesc, values, isnull);
simple_heap_insert(xmlTable, tup);
...
heap_freetuple(tup);
}

I searched the mailinglist archive and noticed that a patch to improve
bulk insert performance is committed in Nov 2008. The log message said

"(the patch) keeps the current target buffer pinned and make it work
in a small ring of buffers to avoid having bulk inserts trash the whole
buffer arena."

However, I do not know much about the code below the heapam layer. Can that
patch solve my problem (the version I use is 8.3.5)? Or could you give me
some suggestion about how to avoid memory exhaustion during bulk insert
(in the meanwhile it must clean up nicely after a transaction abort)?

Thanks in advance.
Regards,


Re: Memory exhaustion during bulk insert

От
Tom Lane
Дата:
Xin Wang <andywx@gmail.com> writes:
> I searched the mailinglist archive and noticed that a patch to improve
> bulk insert performance is committed in Nov 2008. The log message said
> "(the patch) keeps the current target buffer pinned and make it work
> in a small ring of buffers to avoid having bulk inserts trash the whole
> buffer arena."
> However, I do not know much about the code below the heapam layer. Can that
> patch solve my problem (the version I use is 8.3.5)?

No.  You have a memory leak to fix.  I suspect you need to be paying
attention to evaluating the successive tuple values in a short-term
memory context that you can reset on each cycle.  There are other
possibilities though --- looking at the memory map produced on an
out-of-memory error would help narrow down the problem.  (If the thing
"hangs up" without producing such an error, that's the *first* problem
to solve.  It could be that it's not so much hanging up as going into
swap hell; in which case I'd suggest running the postmaster under a more
restrictive ulimit, so that it fails before starting to swap.)
        regards, tom lane


Re: Memory exhaustion during bulk insert

От
Alvaro Herrera
Дата:
Xin Wang escribió:

> However, I do not know much about the code below the heapam layer. Can that
> patch solve my problem (the version I use is 8.3.5)? Or could you give me
> some suggestion about how to avoid memory exhaustion during bulk insert
> (in the meanwhile it must clean up nicely after a transaction abort)?

Try 8.3.7.  There have been memory leak fixes in XML code lately, which
maybe solve your problem.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.