Обсуждение: Hanging with pg_restore and large objects
Hi, everyone. I'm working on a project that uses PostgreSQL 8.3.0; among other things, I'm helping them move to 9.0. The project runs on Windows XP.
Someone from this project asked me earlier today why a particular database restore was taking a long time. How long? Well, it has been running for 1.5 days (yes, that's 36 hours). The restore is running under Windows XP, and the backup was done using pg_dump into the custom/binary format. The dumpfile was about 140 MB in size.
Using the Windows process monitor, we saw that pg_restore was using about 50 percent of the CPU, doing an enormous number of reads from the disk (about 60 billion, by this point), but zero writes. The dumpfile does contain a number of large (binary) objects, as well as a number of regular tables with integer and textual content. The restore was run with the -a (data only) flag, on an empty database schema.
We tried to replicate this problem on another, similarly equipped machine, adding the -c (clean before restoring), -e (exit upon error), and -v (verbose) flags. We saw that the restore hung (for about 30 minutes, as of this writing) while loading one of the large objects from the dumpfile.
Every attempt hangs on the same large object. It's not even close to the first large object, and I don't believe that it's the last one, either.
My guess is pg_dump in 8.3 is somehow causing a problem in the dumpfile on or around that large object.
So:
- Is this a known problem on PostgreSQL 8.3, Windows, or the combination?
- Is there an easy way to identify problems, corruption, and the like in our pg_dump file?
- Should we be using a different type of dumpfile, such as text, to get around this problem for now?
- Is there any obvious way to diagnose or work around this problem?
- I don't believe that there's a way to tell either pg_dump or pg_restore to ignore objects with particular OIDs. Am I right?
Reuven
-- Reuven M. Lerner -- Web development, consulting, and training Mobile: +972-54-496-8405 * US phone: 847-230-9795 Skype/AIM: reuvenlerner
On 7 Dec 2010, at 14:51, Reuven M. Lerner wrote:

> My guess is pg_dump in 8.3 is somehow causing a problem in the dumpfile on or around that large object.

Which version of pg_dump did you use? The one that came with the 9.0 install or the one from the old 8.3 install? It should have been the first of these two.

Regardless, IMHO it shouldn't behave like it does for you now. It doesn't look like it's waiting on a lock elsewhere (not with a CPU load like that), so it should either refuse or bail out, not enter some seemingly endless loop.

Alban Hertroys

--
If you can't see the forest for the trees, cut the trees and you'll see there is no forest.
"Reuven M. Lerner" <reuven@lerner.co.il> writes:

> Using the Windows process monitor, we saw that pg_restore was using about 50 percent of the CPU, doing an enormous number of reads from the disk (about 60 billion, by this point), but zero writes. The dumpfile does contain a number of large (binary) objects, as well as a number of regular tables with integer and textual content. The restore was run with the -a (data only) flag, on an empty database schema.

That's pretty curious. Can you take the dump file to a non-Windows machine, or at least one with a different build of pg_restore, and see what happens there? I'm wondering about possible corrupted executable, buggy zlib, etc.

> * Is this a known problem on PostgreSQL 8.3, Windows, or the combination?

No, not that I've heard of. The most likely theory seems to be that the dump file is corrupt somehow.

> * Is there any obvious way to diagnose or work around this problem?

Well, it'd be interesting to trace through it with a debugger. Ideally you shouldn't get an infinite loop (as this seems to be) even with corrupt input. Is the data sufficiently non-proprietary that you'd be willing to show the dump file to someone else?

> * I don't believe that there's a way to tell either pg_dump or pg_restore to ignore objects with particular OIDs. Am I right?

Try using pg_restore -l, edit the list file, pg_restore -L. That would let you get the remaining data out, and would also give evidence of whether there is corruption in the structure of the dump file or just in the data of the one BLOB.

regards, tom lane
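The -l/-L workflow Tom describes can be sketched as follows. The archive name (backup.dump), the target database (targetdb), and the BLOB OID 12345 are all placeholders, and the TOC line below is a fabricated sample for illustration only:

```shell
# Extract the archive's table of contents (commented out here because it
# needs a real archive; backup.dump is a placeholder name):
#
#   pg_restore -l backup.dump > backup.list
#
# A BLOB entry in the resulting list file looks roughly like this
# fabricated sample:
printf '3635; 2613 12345 BLOB 12345 postgres\n' > backup.list

# A leading ';' tells pg_restore -L to skip a TOC entry, so comment out
# the problematic blob (OID 12345 is a placeholder):
sed -i 's/^\([0-9]\{1,\}; .*BLOB 12345.*\)$/;\1/' backup.list

cat backup.list    # the entry is now prefixed with ';'

# Then restore everything except the skipped entries:
#
#   pg_restore -L backup.list -d targetdb backup.dump
```

Entries left uncommented are restored in the order given by the list file, so this also lets you reorder or restore a subset without touching the archive itself.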
Hi, everyone. Alban wrote:

> Which version of pg_dump did you use? The one that came with the 9.0 install or the one from the old 8.3 one? It should have been the first of these two.

The dump was done by someone using the old, existing system, which runs under 8.3. She complained that she was having problems (i.e., the 36-hour restore), and after we looked at it, we decided to take the file and try to import it into our 9.0 database with pg_restore 9.0, just to see what was going on.

It's quite possible that the pg_dump from 9.0 would have done a better or more intelligent job, but we can't switch it into widespread use right now. (We have 8.3 installed in a closed-box product that's physically distributed to customers.) We can mix and match 8.3 and 9.0 in the development lab, but not on a widespread scale.

Reuven

--
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner
"Reuven M. Lerner" <reuven@lerner.co.il> writes:

> Hi, everyone. Alban wrote:
>> Which version of pg_dump did you use? The one that came with the 9.0 install or the one from the old 8.3 one? It should have been the first of these two.
> The dump was done by someone using the old, existing system, which runs under 8.3.

Hmmm ... I wonder whether this is related to the known problem that 8.3's pg_dump doesn't correctly detect file seekability under Windows:

http://archives.postgresql.org/pgsql-hackers/2010-06/msg01227.php

That means the dump file will not contain any data offset pointers. Up to now that was only known to cause issues for parallel pg_restore, but maybe you found another case.

But that's just a hypothesis, and a quick test here doesn't seem to support it: I can still do pg_restore -a from a blob-containing dump that I forced to not have data offsets. OTOH I'm not using Windows.

Does it work any better if you use 9.0's pg_dump to dump from the 8.3 server?

regards, tom lane
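When both versions are installed side by side, it's easy to invoke the wrong binary; checking `pg_dump --version` first removes the doubt. Here is a minimal demonstration of the PATH mechanics using a stub pg_dump (the scratch directory and version string are fabricated for the demo; with real installs, the same check against your actual PATH applies):

```shell
# Demo only: build a stub pg_dump in a scratch directory so the PATH
# behavior can be shown without a real PostgreSQL install.
mkdir -p /tmp/pg90demo/bin
printf '#!/bin/sh\necho "pg_dump (PostgreSQL) 9.0.1"\n' > /tmp/pg90demo/bin/pg_dump
chmod +x /tmp/pg90demo/bin/pg_dump

# Whichever directory comes first on PATH wins; always confirm which
# binary you are about to run before dumping:
PATH="/tmp/pg90demo/bin:$PATH" pg_dump --version

# With a real install, a newer pg_dump can connect to and dump an older
# server directly (path and host are placeholders):
#
#   /opt/PostgreSQL/9.0/bin/pg_dump -Fc -h old83host dbname > dbname.dump
```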
Hi, Tom. You wrote:

> That's pretty curious. Can you take the dump file to a non-Windows machine, or at least one with a different build of pg_restore, and see what happens there? I'm wondering about possible corrupted executable, buggy zlib, etc.

I'll try to get a copy of the problematic data file to my Unix box in the coming days, and will report back on what happens.

> No, not that I've heard of. The most likely theory seems to be that the dump file is corrupt somehow.

This raises a question that came up during our discussion of this problem: Is there a way to verify that a dumpfile is not corrupt? That is, without having to run pg_restore on the entire file, only to discover that the end is missing data. I haven't encountered data-recovery problems of this sort before, but it does surprise me that PostgreSQL doesn't check the integrity of the file before trying to read and then apply it.

>> * Is there any obvious way to diagnose or work around this problem?
> Well, it'd be interesting to trace through it with a debugger. Ideally you shouldn't get an infinite loop (as this seems to be) even with corrupt input. Is the data sufficiently non-proprietary that you'd be willing to show the dump file to someone else?

I'm guessing that if we have dummy data in there, then we can share it. I'll get back to you about this in the coming day or two. Thanks for the offer!

Reuven

--
Reuven M. Lerner -- Web development, consulting, and training
Mobile: +972-54-496-8405 * US phone: 847-230-9795
Skype/AIM: reuvenlerner
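On the question of verifying a dumpfile: the custom format carries no overall checksum, but two cheap sanity checks are available. A custom-format archive begins with the magic string "PGDMP", and listing the TOC with pg_restore -l forces the header and table of contents to be parsed without touching any database (it does not validate the data sections themselves). A sketch, where the 5-byte header is fabricated for illustration and backup.dump is a placeholder name:

```shell
# Fabricated header for illustration; a real archive written by
# pg_dump -Fc starts with the same magic string.
printf 'PGDMP' > backup.dump

# Cheap check: does the file even claim to be a custom-format archive?
if head -c 5 backup.dump | grep -q 'PGDMP'; then
    echo "header looks like a custom-format archive"
else
    echo "not a custom-format archive"
fi

# Stronger check (needs a real archive): listing the TOC makes
# pg_restore parse the header and table of contents, which catches
# gross structural corruption, though not damage inside the
# compressed data segments:
#
#   pg_restore -l backup.dump > /dev/null && echo "TOC reads cleanly"
```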