pg_dump: largeobject behavior issues (possible bug)

From:
"Joshua D. Drake"
Date:
Hello,

I have been working on a problem with Andrew Gierth in regard to 
pg_dump. Here is the basic breakdown:

FreeBSD 10.1
PostgreSQL 9.3.6
~64GB memory
500GB database
228GB of large objects (106M objects)

The database dumps fine as long as we don't dump large objects. However, 
if we try to dump the large objects, FreeBSD will kill pg_dump as it 
will consume all free memory and swap. With Andrew's help we were able 
to determine the following:

There is a memory cost of about 160 bytes per largeobject. Based on the 
number of largeobjects we have, that would be about 16GB of memory. Also, 
when pg_dump is reading in the large object list there is a point where 
pg_dump holds a PGresult containing the entire contents of 
pg_largeobject_metadata plus a malloc'd array into which it is going to 
copy that data. That could easily get above 40GB, thus causing FreeBSD 
to kill the process.
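
[Editorial note: a quick back-of-the-envelope check of those figures, as a
small C sketch. The ~160 bytes/object value is the rough estimate from this
thread, not a measured constant.]

#include <stdio.h>

int
main(void)
{
    /* Rough estimate only: ~106 million large objects at ~160 bytes of
     * per-object bookkeeping (TOC entry plus strdup'd strings). */
    unsigned long long n_objects = 106ULL * 1000 * 1000;
    unsigned long long per_object = 160;
    unsigned long long toc_bytes = n_objects * per_object;

    printf("TOC bookkeeping alone: ~%.1f GB\n", toc_bytes / 1e9);
    /* The full PGresult for pg_largeobject_metadata is held in memory at
     * the same time, so peak usage is well beyond this figure. */
    return 0;
}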

tl;dr

The memory issue comes down to the fact that in the prep stage, pg_dump 
creates a TOC entry for every individual large object.

It seems that pg_dump should be much more efficient about dumping these
objects.

Sincerely,

JD


-- 
Command Prompt, Inc. - http://www.commandprompt.com/  503-667-4564
PostgreSQL Centered full stack support, consulting and development.
Announcing "I'm offended" is basically telling the world you can't
control your own emotions, so everyone else should do it for you.



Re: pg_dump: largeobject behavior issues (possible bug)

From:
Andrew Gierth
Date:
>>>>> "Joshua" == Joshua D Drake <jd@commandprompt.com> writes:
Joshua> The database dumps fine as long as we don't dump large
Joshua> objects. However, if we try to dump the large objects, FreeBSD
Joshua> will kill pg_dump as it will consume all free memory and
Joshua> swap. With Andrew's help we were able to determine the
Joshua> following:
 
Joshua> There is a memory cost of about 160 bytes per largeobject.

I may have the exact number here wrong, it was just a quick eyeball of
the data structures (and depends on malloc overheads anyway).

The relevant code is getBlobs in pg_dump.c, which queries the whole of
pg_largeobject_metadata without using a cursor (so the PGresult is
already huge thanks to having >100 million rows), and then mallocs a
BlobInfo array and populates it from the PGresult, also using pg_strdup
for the oid string, owner name, and ACL if any.

-- 
Andrew (irc:RhodiumToad)
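
[Editorial note: for illustration, a minimal libpq sketch of the pattern just
described: fetch all of pg_largeobject_metadata into one PGresult, then copy
every row into a per-blob array, so both copies are live at once. This is not
the actual getBlobs code; the query, struct, and error handling are
simplified, and plain strdup stands in for pg_strdup.]

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <libpq-fe.h>

typedef struct
{
    char *oid;      /* strdup'd OID string */
    char *rolname;  /* strdup'd owner name */
} BlobInfoSketch;

int
main(void)
{
    PGconn         *conn = PQconnectdb("");   /* connection params from environment */
    PGresult       *res;
    BlobInfoSketch *binfo;
    int             ntups, i;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* One query, no cursor: the PGresult holds all >100M rows at once. */
    res = PQexec(conn,
                 "SELECT oid, pg_catalog.pg_get_userbyid(lomowner) AS rolname "
                 "FROM pg_catalog.pg_largeobject_metadata");
    if (PQresultStatus(res) != PGRES_TUPLES_OK)
    {
        fprintf(stderr, "query failed: %s", PQerrorMessage(conn));
        return 1;
    }

    ntups = PQntuples(res);

    /* Second copy: one array entry plus strdup'd strings per large object. */
    binfo = malloc(ntups * sizeof(BlobInfoSketch));
    for (i = 0; i < ntups; i++)
    {
        binfo[i].oid = strdup(PQgetvalue(res, i, 0));
        binfo[i].rolname = strdup(PQgetvalue(res, i, 1));
    }

    printf("copied %d large object entries\n", ntups);

    PQclear(res);       /* only now is the big PGresult freed */
    PQfinish(conn);
    return 0;
}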



Re: pg_dump: largeobject behavior issues (possible bug)

From:
Andrew Dunstan
Date:
On 04/23/2015 04:04 PM, Andrew Gierth wrote:
>>>>>> "Joshua" == Joshua D Drake <jd@commandprompt.com> writes:
>   Joshua> The database dumps fine as long as we don't dump large
>   Joshua> objects. However, if we try to dump the large objects, FreeBSD
>   Joshua> will kill pg_dump as it will consume all free memory and
>   Joshua> swap. With Andrew's help we were able to determine the
>   Joshua> following:
>
>   Joshua> There is a memory cost of about 160 bytes per largeobject.
>
> I may have the exact number here wrong, it was just a quick eyeball of
> the data structures (and depends on malloc overheads anyway).
>
> The relevant code is getBlobs in pg_dump.c, which queries the whole of
> pg_largeobject_metadata without using a cursor (so the PGresult is
> already huge thanks to having >100 million rows), and then mallocs a
> BlobInfo array and populates it from the PGresult, also using pg_strdup
> for the oid string, owner name, and ACL if any.
>


I'm surprised this hasn't come up before. I have a client that I 
persuaded to convert all their LOs to bytea fields because of problems 
with pg_dump handling millions of LOs, and kept them on an older 
postgres version until they made that change.

cheers

andrew






Re: pg_dump: largeobject behavior issues (possible bug)

From:
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 04/23/2015 04:04 PM, Andrew Gierth wrote:
>> The relevant code is getBlobs in pg_dump.c, which queries the whole of
>> pg_largeobject_metadata without using a cursor (so the PGresult is
>> already huge thanks to having >100 million rows), and then mallocs a
>> BlobInfo array and populates it from the PGresult, also using pg_strdup
>> for the oid string, owner name, and ACL if any.

> I'm surprised this hasn't come up before. I have a client that I 
> persuaded to convert all their LOs to bytea fields because of problems 
> with pg_dump handling millions of LOs, and kept them on an older 
> postgres version until they made that change.

Yeah, this was brought up when we added per-large-object metadata; it was
obvious that that patch would cause pg_dump to choke on large numbers of
large objects.  The (perhaps rather lame) argument was that you wouldn't
have that many of them.

Given that large objects don't have any individual dependencies,
one could envision fixing this by replacing the individual large-object
DumpableObjects by a single placeholder to participate in the sort phase,
and then when it's time to dump that, scan the large objects using a
cursor and create/print/delete the information separately for each one.
This would likely involve some rather painful refactoring in pg_dump
however.
        regards, tom lane
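
[Editorial note: for illustration only, a rough libpq sketch of the kind of
cursor-based scan described above. This is hypothetical, not pg_dump code;
the cursor name, the batch size of 1000, and the query are made up, and most
error checks are omitted. Peak memory is then bounded by the batch size
rather than by the total number of large objects.]

#include <stdio.h>
#include <stdlib.h>
#include <libpq-fe.h>

int
main(void)
{
    PGconn   *conn = PQconnectdb("");
    PGresult *res;

    if (PQstatus(conn) != CONNECTION_OK)
    {
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));
        return 1;
    }

    /* Cursors must live inside a transaction. */
    PQclear(PQexec(conn, "BEGIN"));
    PQclear(PQexec(conn,
                   "DECLARE blobcur CURSOR FOR "
                   "SELECT oid, pg_catalog.pg_get_userbyid(lomowner) "
                   "FROM pg_catalog.pg_largeobject_metadata"));

    for (;;)
    {
        int ntups, i;

        res = PQexec(conn, "FETCH 1000 FROM blobcur");   /* bounded batch */
        if (PQresultStatus(res) != PGRES_TUPLES_OK)
        {
            fprintf(stderr, "fetch failed: %s", PQerrorMessage(conn));
            PQclear(res);
            break;
        }
        ntups = PQntuples(res);
        if (ntups == 0)
        {
            PQclear(res);
            break;              /* cursor exhausted */
        }

        for (i = 0; i < ntups; i++)
        {
            /* create/print the per-blob information here, then forget it */
            printf("blob %s owned by %s\n",
                   PQgetvalue(res, i, 0), PQgetvalue(res, i, 1));
        }
        PQclear(res);           /* each batch is freed before the next FETCH */
    }

    PQclear(PQexec(conn, "CLOSE blobcur"));
    PQclear(PQexec(conn, "COMMIT"));
    PQfinish(conn);
    return 0;
}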



Re: pg_dump: largeobject behavior issues (possible bug)

From:
"Joshua D. Drake"
Date:
On 04/24/2015 03:41 PM, Tom Lane wrote:

> Given that large objects don't have any individual dependencies,
> one could envision fixing this by replacing the individual large-object
> DumpableObjects by a single placeholder to participate in the sort phase,
> and then when it's time to dump that, scan the large objects using a
> cursor and create/print/delete the information separately for each one.
> This would likely involve some rather painful refactoring in pg_dump
> however.

Andrew G mentioned something about using a cursor for the main query 
that pulls the info. He said that it wasn't a solution but might be a 
band-aid (my words). Is that something we may want to look into as a 
stopgap?

>
>             regards, tom lane
>


-- 
The most kicking donkey PostgreSQL Infrastructure company in existence.
The oldest, the most experienced, the consulting company to the stars.
Command Prompt, Inc. http://www.commandprompt.com/ +1 -503-667-4564 -
24x7 - 365 - Proactive and Managed Professional Services!



Re: pg_dump: largeobject behavior issues (possible bug)

From:
Andrew Dunstan
Date:
On 04/24/2015 06:41 PM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 04/23/2015 04:04 PM, Andrew Gierth wrote:
>>> The relevant code is getBlobs in pg_dump.c, which queries the whole of
>>> pg_largeobject_metadata without using a cursor (so the PGresult is
>>> already huge thanks to having >100 million rows), and then mallocs a
>>> BlobInfo array and populates it from the PGresult, also using pg_strdup
>>> for the oid string, owner name, and ACL if any.
>> I'm surprised this hasn't come up before. I have a client that I
>> persuaded to convert all their LOs to bytea fields because of problems
>> with pg_dump handling millions of LOs, and kept them on an older
>> postgres version until they made that change.
> Yeah, this was brought up when we added per-large-object metadata; it was
> obvious that that patch would cause pg_dump to choke on large numbers of
> large objects.  The (perhaps rather lame) argument was that you wouldn't
> have that many of them.
>
> Given that large objects don't have any individual dependencies,
> one could envision fixing this by replacing the individual large-object
> DumpableObjects by a single placeholder to participate in the sort phase,
> and then when it's time to dump that, scan the large objects using a
> cursor and create/print/delete the information separately for each one.
> This would likely involve some rather painful refactoring in pg_dump
> however.


I think we need to think about this some more. TBH, I'm not convinced 
that the changes made back in 9.0 were well conceived. Having separate 
TOC entries for each LO seems wrong in principle, although I understand 
why it was done. For now, my advice would be to avoid use of 
pg_dump/pg_restore if you have large numbers of LOs. The good news is 
that these days there are alternative methods of doing backup/restore, 
albeit not 100% equivalent to pg_dump/pg_restore.

One useful thing might be to provide pg_dump with 
--no-blobs/--blobs-only switches so you could at least easily segregate 
the blobs into their own dump file. That would be in addition to dealing 
with the memory problems pg_dump has with millions of LOs, of course.


cheers

andrew



Re: pg_dump: largeobject behavior issues (possible bug)

From:
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> On 04/24/2015 06:41 PM, Tom Lane wrote:
>> Yeah, this was brought up when we added per-large-object metadata; it was
>> obvious that that patch would cause pg_dump to choke on large numbers of
>> large objects.  The (perhaps rather lame) argument was that you wouldn't
>> have that many of them.

>> Given that large objects don't have any individual dependencies,
>> one could envision fixing this by replacing the individual large-object
>> DumpableObjects by a single placeholder to participate in the sort phase,
>> and then when it's time to dump that, scan the large objects using a
>> cursor and create/print/delete the information separately for each one.
>> This would likely involve some rather painful refactoring in pg_dump
>> however.

> I think we need to think about this some more, TBH, I'm not convinced 
> that the changes made back in 9.0 were well conceived. Having separate 
> TOC entries for each LO seems wrong in principle, although I understand 
> why it was done.

Perhaps.  One advantage of doing it this way is that you can get
pg_restore to extract a single LO from an archive file; though it's
debatable whether that's worth the potential resource-consumption hazards.
Another issue is that restore options such as --no-owner and
--no-privileges would not work for LOs (at least not without messy hacks)
if we go back to a scheme where all the LO information is just SQL
commands inside a single TOC object.

After further thought I realized that if we simply hack pg_dump to emit
the LOs in a streaming fashion, but keep the archive-file representation
the same as it is now, then we haven't really fixed the problem because
pg_restore is still likely to choke when it tries to read the archive's
TOC.  So my proposal above isn't enough either.

Perhaps what we need is some sort of "second-level TOC" which is only ever
processed in a streaming fashion, by both pg_dump and pg_restore.  This
would not support dependency resolution or re-ordering, but we don't need
those abilities for LOs.
        regards, tom lane
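
[Editorial note: purely as an illustration of that streaming idea, a
hypothetical sketch in which fixed-size per-LO records are written and then
read back strictly sequentially, so neither side ever materializes the whole
list. The record layout, file name, and field sizes are invented for the
example and are not an actual archive format.]

#include <stdio.h>
#include <stdint.h>
#include <string.h>

typedef struct
{
    uint32_t oid;           /* large object OID                */
    char     owner[64];     /* owner name, truncated if longer */
} LoTocRecord;

int
main(void)
{
    FILE       *f = fopen("lo_toc.dat", "w+b");
    LoTocRecord rec;
    uint32_t    i;

    if (f == NULL)
        return 1;

    /* Write phase: one bounded record at a time, nothing accumulated. */
    for (i = 1; i <= 5; i++)
    {
        memset(&rec, 0, sizeof(rec));
        rec.oid = i;
        snprintf(rec.owner, sizeof(rec.owner), "owner_%u", i);
        fwrite(&rec, sizeof(rec), 1, f);
    }

    /* Read phase: again strictly sequential, constant memory. */
    rewind(f);
    while (fread(&rec, sizeof(rec), 1, f) == 1)
        printf("restore blob %u owned by %s\n", rec.oid, rec.owner);

    fclose(f);
    return 0;
}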



Re: pg_dump: largeobject behavior issues (possible bug)

From:
Andrew Dunstan
Date:
On 04/25/2015 12:32 PM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 04/24/2015 06:41 PM, Tom Lane wrote:
>>> Yeah, this was brought up when we added per-large-object metadata; it was
>>> obvious that that patch would cause pg_dump to choke on large numbers of
>>> large objects.  The (perhaps rather lame) argument was that you wouldn't
>>> have that many of them.
>>> Given that large objects don't have any individual dependencies,
>>> one could envision fixing this by replacing the individual large-object
>>> DumpableObjects by a single placeholder to participate in the sort phase,
>>> and then when it's time to dump that, scan the large objects using a
>>> cursor and create/print/delete the information separately for each one.
>>> This would likely involve some rather painful refactoring in pg_dump
>>> however.
>> I think we need to think about this some more, TBH, I'm not convinced
>> that the changes made back in 9.0 were well conceived. Having separate
>> TOC entries for each LO seems wrong in principle, although I understand
>> why it was done.
> Perhaps.  One advantage of doing it this way is that you can get
> pg_restore to extract a single LO from an archive file; though it's
> debatable whether that's worth the potential resource-consumption hazards.


In my view it isn't worth it.


> Another issue is that restore options such as --no-owner and
> --no-privileges would not work for LOs (at least not without messy hacks)
> if we go back to a scheme where all the LO information is just SQL
> commands inside a single TOC object.
>
> After further thought I realized that if we simply hack pg_dump to emit
> the LOs in a streaming fashion, but keep the archive-file representation
> the same as it is now, then we haven't really fixed the problem because
> pg_restore is still likely to choke when it tries to read the archive's
> TOC.  So my proposal above isn't enough either.


Yep, that's certainly true.

>
> Perhaps what we need is some sort of "second-level TOC" which is only ever
> processed in a streaming fashion, by both pg_dump and pg_restore.  This
> would not support dependency resolution or re-ordering, but we don't need
> those abilities for LOs.
>
>             


+1, I had a similar thought, half-formed, but you've expressed it better 
than I could have.

cheers

andrew