Thread: Hanging with pg_restore and large objects


Hanging with pg_restore and large objects

From:
"Reuven M. Lerner"
Date:

Hi, everyone.  I'm working on a project that uses PostgreSQL 8.3.0; among other things, I'm helping them move to 9.0.  The project runs on Windows XP.


Someone from this project asked me earlier today why a particular database restore was taking a long time.  How long?  Well, it has been running for 1.5 days (yes, that's 36 hours).  The restore is running under Windows XP, and the backup was done using pg_dump into the custom/binary format.  The dumpfile was about 140 MB in size. 


Using the Windows process monitor, we saw that pg_restore was using about 50 percent of the CPU, performing an enormous number of reads from the disk (about 60 billion by this point) but zero writes.  The dumpfile does contain a number of large (binary) objects, as well as a number of regular tables with integer and textual content.  The restore was run with the -a (data only) flag, on an empty database schema.


We tried to replicate this problem on another, similarly equipped machine, adding the -c (clean before restoring), -e (exit upon error), and -v (verbose) flags.  We saw that the restore hung (for about 30 minutes, as of this writing) while loading one of the large objects from the dumpfile.


We tried to use pg_restore on the dumpfile, but found that it hung when restoring the same large object.  It's not even close to the first large object, and I don't believe that it's the last one, either.


My guess is that pg_dump in 8.3 somehow caused a problem in the dumpfile on or around that large object.


So:

  • Is this a known problem on PostgreSQL 8.3, Windows, or the combination?
  • Is there an easy way to identify problems, corruption, and the like in our pg_dump file?
  • Should we be using a different type of dumpfile, such as text, to get around this problem for now?
  • Is there any obvious way to diagnose or work around this problem?
  • I don't believe that there's a way to tell either pg_dump or pg_restore to ignore objects with particular OIDs.  Am I right?

Thanks in advance for any help you can offer,

Reuven

-- 
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner

Re: Hanging with pg_restore and large objects

From:
Alban Hertroys
Date:
On 7 Dec 2010, at 14:51, Reuven M. Lerner wrote:
> My guess is pg_dump in 8.3 is somehow causing a problem in the dumpfile on or around that large object.

Which version of pg_dump did you use? The one that came with the 9.0 install, or the one from the old 8.3 install? It
should have been the first of these two.

Regardless, IMHO it shouldn't behave like it does for you now. It doesn't look like it's waiting on a lock elsewhere
(not with a CPU load like that), so it should either refuse or bail out, not enter some seemingly endless loop.

Alban Hertroys

--
If you can't see the forest for the trees,
cut the trees and you'll see there is no forest.



Re: Hanging with pg_restore and large objects

From:
Tom Lane
Date:
"Reuven M. Lerner" <reuven@lerner.co.il> writes:
> Using the Windows process monitor, we saw that pg_restore was using
> about 50 percent of the CPU, doing an enormous (about 60 billion, by
> this point) reads from the disk, but zero writes.  The dumpfile does
> contain a number of large (binary) objects, as well as a number of
> regular tables with integer and textual content.  The restore was run
> with the -a (data only) flag, on an empty database schema.

That's pretty curious.  Can you take the dump file to a non-Windows
machine, or at least one with a different build of pg_restore, and
see what happens there?  I'm wondering about possible corrupted
executable, buggy zlib, etc.

>     * Is this a known problem on PostgreSQL 8.3, Windows, or the
>       combination?

No, not that I've heard of.  The most likely theory seems to be that the
dump file is corrupt somehow.

>     * Is there any obvious way to diagnose or work around this problem?

Well, it'd be interesting to trace through it with a debugger.  Ideally
you shouldn't get an infinite loop (as this seems to be) even with
corrupt input.  Is the data sufficiently non-proprietary that you'd be
willing to show the dump file to someone else?

>     * I don't believe that there's a way to tell either pg_dump or
>       pg_restore to ignore objects with particular OIDs.  Am I right?

Try using pg_restore -l, edit the list file, pg_restore -L.  That would
let you get the remaining data out, and would also give evidence of
whether there is corruption in the structure of the dump file or just
in the data of the one BLOB.
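
That workflow might look roughly like the sketch below.  The OIDs, filenames, and TOC entry format are invented for illustration, and the pg_restore calls themselves need the real dump file and a live server, so they appear as comments around a simulated list-editing step:

```shell
# Real commands (need the dump file and a server):
#   pg_restore -l dump.backup > dump.list                  # write the TOC
#   (edit dump.list, e.g. with grep as below)
#   pg_restore -L dump-filtered.list -d mydb dump.backup   # restore the rest
#
# Simulated TOC (entry format approximated; OIDs are invented):
cat > dump.list <<'EOF'
2; 1262 16384 DATABASE - mydb postgres
10; 0 0 BLOBS - BLOBS
4; 1259 16390 TABLE DATA public widgets postgres
EOF
# Drop the large-object entry; pg_restore skips anything not in the list.
grep -v ' BLOBS ' dump.list > dump-filtered.list
cat dump-filtered.list
```

If the filtered restore then runs to completion, that points at corruption in the blob data rather than in the archive's structure.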

            regards, tom lane

Re: Hanging with pg_restore and large objects

From:
"Reuven M. Lerner"
Date:
Hi, everyone.  Alban wrote:
> Which version of pg_dump did you use? The one that came with the 9.0 install or the one from the old 8.3 one? It
> should have been the first of these two.
The dump was done by someone using the old, existing system, which runs
under 8.3.  She complained that she was having problems (i.e., the
36-hour restore), and after we looked at it, we decided to take the file
and try to import it into our 9.0 database with pg_restore 9.0, just to
see what was going on.

It's quite possible that the pg_dump from 9.0 would have done a better
or more intelligent job, but we can't switch it into widespread use
right now.  (We have 8.3 installed in a closed-box product that's
physically distributed to customers.)  We can mix and match 8.3 and 9.0
in the development lab, but not on a widespread scale.

Reuven

--
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner


Re: Hanging with pg_restore and large objects

From:
Tom Lane
Date:
"Reuven M. Lerner" <reuven@lerner.co.il> writes:
> Hi, everyone.  Alban wrote:
>> Which version of pg_dump did you use? The one that came with the 9.0 install or the one from the old 8.3 one? It
>> should have been the first of these two.

> The dump was done by someone using the old, existing system, which runs
> under 8.3.

Hmmm ... I wonder whether this is related to the known problem that
8.3's pg_dump doesn't correctly detect file seekability under Windows:
http://archives.postgresql.org/pgsql-hackers/2010-06/msg01227.php
That means the dump file will not contain any data offset pointers.
Up to now that was only known to cause issues for parallel pg_restore,
but maybe you found another case.  But that's just a hypothesis, and a
quick test here doesn't seem to support it: I can still do pg_restore -a
from a blob-containing dump that I forced to not have data offsets.
OTOH I'm not using Windows.

Does it work any better if you use 9.0's pg_dump to dump from the 8.3
server?

            regards, tom lane

Re: Hanging with pg_restore and large objects

From:
"Reuven M. Lerner"
Date:
Hi, Tom.  You wrote:

> That's pretty curious.  Can you take the dump file to a non-Windows
> machine, or at least one with a different build of pg_restore, and
> see what happens there?  I'm wondering about possible corrupted
> executable, buggy zlib, etc.
I'll try to get a copy of the problematic data file to my Unix box in
the coming days, and will report back on what happens.
> No, not that I've heard of.  The most likely theory seems to be that the
> dump file is corrupt somehow.
This raises a question that came up during our discussion of this
problem: Is there a way to verify that a dumpfile is not corrupt?
That is, without having to run pg_restore on the entire file, only to
discover that the end is missing data.  I haven't encountered
data-recovery problems of this sort before, and it does surprise me that
PostgreSQL doesn't check the integrity of the file before trying to read
and apply it.
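
One cheap, server-free sanity check exists: a custom-format dump begins with the magic bytes "PGDMP", and pg_restore -l parses the entire table of contents without connecting to any server.  A rough sketch follows; the file written here is a stand-in for a real dump, not thread data:

```shell
# Stand-in for a real custom-format dump (real ones also start "PGDMP"):
printf 'PGDMP...rest of archive...' > dump.backup
# The header check only catches gross truncation/corruption at the front:
if [ "$(head -c 5 dump.backup)" = "PGDMP" ]; then
    echo "header looks like a custom-format dump"
else
    echo "bad or missing PGDMP header" >&2
fi
# A deeper check that still needs no server is reading the whole TOC:
#   pg_restore -l dump.backup > /dev/null && echo "TOC readable"
```

Neither check validates the compressed data payloads themselves, so a clean TOC listing does not rule out corruption inside an individual blob.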
>>      * Is there any obvious way to diagnose or work around this problem?
> Well, it'd be interesting to trace through it with a debugger.  Ideally
> you shouldn't get an infinite loop (as this seems to be) even with
> corrupt input.  Is the data sufficiently non-proprietary that you'd be
> willing to show the dump file to someone else?
I'm guessing that if we have dummy data in there, then we can share it.
I'll get back to you about this in the coming day or two.  Thanks for
the offer!

Reuven

--
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner