Re: pg_dump: largeobject behavior issues (possible bug)

From: Andrew Dunstan
Subject: Re: pg_dump: largeobject behavior issues (possible bug)
Date:
Msg-id: 553BAB5A.9010707@dunslane.net
In reply to: Re: pg_dump: largeobject behavior issues (possible bug)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: pg_dump: largeobject behavior issues (possible bug)  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
On 04/24/2015 06:41 PM, Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> On 04/23/2015 04:04 PM, Andrew Gierth wrote:
>>> The relevant code is getBlobs in pg_dump.c, which queries the whole of
>>> pg_largeobject_metadata without using a cursor (so the PGresult is
>>> already huge thanks to having >100 million rows), and then mallocs a
>>> BlobInfo array and populates it from the PGresult, also using pg_strdup
>>> for the oid string, owner name, and ACL if any.
>> I'm surprised this hasn't come up before. I have a client that I
>> persuaded to convert all their LOs to bytea fields because of problems
>> with pg_dump handling millions of LOs, and kept them on an older
>> postgres version until they made that change.
> Yeah, this was brought up when we added per-large-object metadata; it was
> obvious that that patch would cause pg_dump to choke on large numbers of
> large objects.  The (perhaps rather lame) argument was that you wouldn't
> have that many of them.
>
> Given that large objects don't have any individual dependencies,
> one could envision fixing this by replacing the individual large-object
> DumpableObjects by a single placeholder to participate in the sort phase,
> and then when it's time to dump that, scan the large objects using a
> cursor and create/print/delete the information separately for each one.
> This would likely involve some rather painful refactoring in pg_dump
> however.


I think we need to think about this some more. TBH, I'm not convinced 
that the changes made back in 9.0 were well conceived. Having separate 
TOC entries for each LO seems wrong in principle, although I understand 
why it was done. For now, my advice would be to avoid using 
pg_dump/pg_restore if you have large numbers of LOs. The good news is 
that these days there are alternative methods of doing backup/restore, 
albeit not 100% equivalent to pg_dump/pg_restore.

One useful thing might be to provide pg_dump with 
--no-blobs/--blobs-only switches so you could at least easily segregate 
the blobs into their own dump file. That would be in addition to dealing 
with the memory problems pg_dump has with millions of LOs, of course.
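As an aside, the cursor-based approach Tom describes upthread can be sketched in miniature. This is only an illustration, not pg_dump code: the row tuples and function names below are invented, and a real implementation would DECLARE a cursor over pg_largeobject_metadata via libpq and FETCH in fixed-size batches. The point is just that peak memory is then bounded by the batch size rather than by the total number of large objects.

```python
# Sketch (invented names, simulated rows): contrast materializing every
# row at once -- roughly what getBlobs() does today -- with fetching
# fixed-size batches and discarding each one after it is processed.

def fetch_all(n_blobs):
    """Current behavior: one huge result set held in memory at once."""
    return [(oid, "owner", None) for oid in range(n_blobs)]

def fetch_in_batches(n_blobs, batch_size=1000):
    """Cursor-style behavior: yield batches of at most batch_size rows,
    so peak memory is bounded by batch_size, not the total row count."""
    batch = []
    for oid in range(n_blobs):
        batch.append((oid, "owner", None))
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:
        yield batch

def dump_blobs(n_blobs, batch_size=1000):
    """Create/print/delete the per-blob information batch by batch;
    returns the total number of rows handled."""
    total = 0
    for batch in fetch_in_batches(n_blobs, batch_size):
        # per-blob work would happen here; the batch is then
        # eligible for garbage collection before the next FETCH
        total += len(batch)
    return total

print(dump_blobs(2500, batch_size=1000))  # 2500, in batches of 1000/1000/500
```

With >100 million rows, the difference between `fetch_all` and the batched variant is the difference between a multi-gigabyte PGresult plus a matching BlobInfo array and a small, constant working set.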


cheers

andrew


