Обсуждение: Problem loading pg_dump file

Поиск
Список
Период
Сортировка

Problem loading pg_dump file

От
"Mason Hale"
Дата:
Hello --

I'm having a problem loading a recent pg_dump of our production database.

In our environment we take a monthly snapshot of our production server and copy that to our development server so that we have a recent batch of data to work with.

However, when trying to load the file for this month's snapshot, we are (for the first time) seeing a slew of errors, such as:

invalid command \N
invalid command \N
ERROR:  syntax error at or near ""/>\n  <img alt="" style="" at character 1
LINE 1: "/>\n  <img alt="" style="border: 0;
        ^
ERROR:  syntax error at or near "padding" at character 1
LINE 1: padding: 8px 0 0 0;
        ^
ERROR:  syntax error at or near "height" at character 1
LINE 1: height: 2px;
        ^
ERROR:  syntax error at or near "font" at character 1
LINE 1: font-size: 1px;
        ^
ERROR:  syntax error at or near "border" at character 1
LINE 1: border: 0;
        ^
ERROR:  syntax error at or near "margin" at character 1
LINE 1: margin: 0;
        ^
ERROR:  syntax error at or near "padding" at character 1
LINE 1: padding: 0;
        ^
invalid command \N
invalid command \N

The commands I'm using to create and load the dump are:

on production:>
pg_dump bduprod_2 | gzip > bdu_01_21_07.gz

(transfer .gz file to development server)

on dev:>
createdb -T template0 -D disk1 -e -O lss -U postgres bdu_01_21_07
gunzip -c bduprod_2-01-21-07.gz | psql bdu_01_21_07

I'm also unzipped the .gz file and tried to load it directly via
psql psql bdu_01_21_07 < psql bduprod_2-01-21-07
with the same results.

I think I might be running into the UTF8 encoding issue mentioned in this message:
http://archives.postgresql.org/pgsql-bugs/2006-10/msg00246.php

Both the production and dev servers are UTF8.

Obviously, beyond our monthly dev snapshots, I'm concerned about the reliability of our production server backups.

Anyone have any ideas what the problem is? Is there a way for me to tell if it is the UTF8 encoding problem mentioned above?
Any work arounds?

thanks in advance,
Mason




Re: Problem loading pg_dump file

От
Tom Lane
Дата:
"Mason Hale" <masonhale@gmail.com> writes:
> I'm having a problem loading a recent pg_dump of our production database.

> However, when trying to load the file for this month's snapshot, we are (for
> the first time) seeing a slew of errors, such as:

> invalid command \N
> invalid command \N
> ERROR:  syntax error at or near ""/>\n  <img alt="" style="" at character 1
> LINE 1: "/>\n  <img alt="" style="border: 0;
>         ^

You need to look at the very first error, and ignore the slew following
it.  What seems to have happened here is that an error in the COPY command
caused psql to fall out of copy mode (or perhaps never enter it in the
first place) and start trying to treat lines of COPY data as SQL
commands.  So, tons of noise.  What was the first error?

            regards, tom lane

Re: Problem loading pg_dump file

От
"Mason Hale"
Дата:
I've done a bit more digging into this, here's what I've found --

The text db dump file is much too big to edit by hand (~37GB), so I ran the import in single-step mode:

psql -U bdu -s bdu_01_21_07 < bduprod_2-01-21-07

Here's the first error I run across:

***(Single step mode: verify command)*******************************************
COPY blocked_info (id, created_at, reason_code, note, do_count_links) FROM stdin;
***(press return to proceed or enter x and return to cancel)********************
ERROR:  invalid input syntax for integer: "2006-10-09 22:55:58"
CONTEXT:  COPY blocked_info, line 1, column id: "2006-10-09 22:55:58"

The part of the script that is responsible for this error is:

COPY blocked_info (id, created_at, reason_code, note, do_count_links) FROM stdin;
1       2006-10-09 22:55:58     0       \N      \N
2       2006-10-09 22:55:58     0       \N      \N
3       2006-10-09 22:55:58     0       \N      \N
4       2006-10-09 22:55:58     0       \N      \N
5       2006-10-09 22:55:58     0       \N      \N
6       2006-10-10 13:03:27     \N      \N      \N
7       2006-10-10 13:06:28     \N      \N      \N
8       2006-10-10 13:09:37     \N      \N      \N
9       2006-10-27 22:39:49     \N      \N      \N
10      2006-10-27 22:39:58     \N      \N      \N
11      2006-11-09 04:48:18     \N      \N      \N
12      2006-11-30 16:03:58     \N      \N      f
13      2006-12-02 15:11:42     \N      \N      f
14      2006-12-04 12:31:20     \N      \N      f
15      2006-12-05 00:11:30     \N      \N      f
16      2006-12-05 00:15:45     \N      \N      f
17      2006-12-05 03:02:29     \N      \N      f
18      2006-12-05 12:03:10     \N      \N      f
20      2006-12-05 16:20:15     \N      \N      f
19      2006-12-05 16:20:15     \N      \N      f
21      2006-12-16 16:13:24     \N      \N      f
22      2006-12-19 16:06:43     \N      \N      f
23      2006-12-20 01:33:51     \N      \N      f
24      2006-12-21 18:38:56     \N      \N      f
25      2006-12-22 15:06:15     \N      \N      f
26      2006-12-23 09:43:17     \N      \N      f
27      2007-01-04 12:37:50     \N      \N      f
28      2007-01-08 17:33:26     \N      \N      f
29      2007-01-08 17:38:47     \N      \N      f
30      2007-01-13 15:32:34     \N      \N      f
\.

Now, I'm not too familiar with the copy command, but the above looks correct to me.

When I let subsequent statements run, I get similar errors such as:

ERROR:  invalid input syntax for integer: "0.261191951289869"
ERROR:  invalid input syntax for type real: "2006-08-23 22:54: 11.24"
ERROR:  invalid input syntax for integer: "0.99655325708605502"
ERROR:  invalid input syntax for type boolean: "1401353"

Any ideas what is causing this, or how to track this issue down?
------------------

In addition to the above, I did a new pg_dump using the -Fc option to generate an archive in binary/compressed format.

I also verified that the versions of pg_dump and pg_restore on both machines where the same ( 8.1.5).

I then copy that file over to our dev server an load it doing a pg_restore -v (verbose mode). This time, I get an error when loading data into a different table in the database (not the blocked_info table mentioned above).

The error message is:
pg_restore: restoring data for table "rawfeed"
pg_restore: [custom archiver] could not read data block -- expected 4096, got 3448
pg_restore: *** aborted because of error

I've tried running both with and without the -e (stop on errors) option, and either way the pg_restore stops when it hits this error.

When I list the database contents using pg_restore -l  -- the table where error occurs is table listed as #14 out of 23. This particular table includes a bytea column that contains gzipped data.

If I try to selectively restore any of the individual tables 1-13 (as listed by pg_restore -l), using the -t option, I do not encounter any errors. But if I try to restore any individual tables 14-23, I get the same error as above (but with a different table name).

In addition, if I try to generate a sql script from pg_restore using the -f option, for any of the individual tables 14-23, I get the same error: "could not read data block -- expected 4096, got 3448".

> pg_restore -t rawfeed -f rawfeed.sql bduprod_2-01-25-07
pg_restore: [custom archiver] could not read data block -- expected 4096, got 3448

In the case of table 14 (rawfeed), an output file is generated, it is mid-way through a copy command. I can run that partially generated script against the db without error. I can also verify that the last record in the script is successfully added to the db.

Note: this generated file to restore this one table is huge: 9.2G by itself. Is there an upper limit to the amount of data copy can load at one time?

If I generate a sql script for any of tables 15-23, it takes a long time for the command to finally return the same error (about the same amount of time to run the 'pg_restore -t rawfeed ...' variation above), and when it does an output file is generated, but the file only contains the schema creation commands and the first line of the copy command. It does not contain any of the table data. This is true even for a table that contains only 1 row of data.

----

At this point I'm not sure how to proceed.

My suspicion is still that this has something to do with the encoding of data in the database. So I'm trying to narrow down the location of the problem so that I can try to clean it up.

From trying to load the data from the pg_dump text format export, it looks like something is wrong with the copy command when loading the blocked_info table. However when trying to load the pg_dump -Fc binary format export, it appears there's some problem with the rawfeed table.

Any ideas on what to try next will be greatly appreciated.

thanks in advance,
Mason



On 1/25/07, Tom Lane < tgl@sss.pgh.pa.us> wrote:
"Mason Hale" < masonhale@gmail.com> writes:
> I'm having a problem loading a recent pg_dump of our production database.

> However, when trying to load the file for this month's snapshot, we are (for
> the first time) seeing a slew of errors, such as:

> invalid command \N
> invalid command \N
> ERROR:  syntax error at or near ""/>\n  <img alt="" style="" at character 1
> LINE 1: "/>\n  <img alt="" style="border: 0;
>         ^

You need to look at the very first error, and ignore the slew following
it.  What seems to have happened here is that an error in the COPY command
caused psql to fall out of copy mode (or perhaps never enter it in the
first place) and start trying to treat lines of COPY data as SQL
commands.  So, tons of noise.  What was the first error?

                        regards, tom lane

Re: Problem loading pg_dump file

От
Mark Walker
Дата:
What is the delimiter between id and created_at?  I believe they're
supposed to be tabs with \t used for tabs inside a field.  The data you
give here is all whitespaces.


Mason Hale wrote:
> I've done a bit more digging into this, here's what I've found --
>
> The text db dump file is much too big to edit by hand (~37GB), so I
> ran the import in single-step mode:
>
> psql -U bdu -s bdu_01_21_07 < bduprod_2-01-21-07
>
> Here's the first error I run across:
>
> ***(Single step mode: verify
> command)*******************************************
> COPY blocked_info (id, created_at, reason_code, note, do_count_links)
> FROM stdin;
> ***(press return to proceed or enter x and return to
> cancel)********************
> ERROR:  invalid input syntax for integer: "2006-10-09 22:55:58"
> CONTEXT:  COPY blocked_info, line 1, column id: "2006-10-09 22:55:58"
>
> The part of the script that is responsible for this error is:
>
> COPY blocked_info (id, created_at, reason_code, note, do_count_links)
> FROM stdin;
> 1       2006-10-09 22:55:58     0       \N      \N
> 2       2006-10-09 22:55:58     0       \N      \N
> 3       2006-10-09 22:55:58     0       \N      \N
> 4       2006-10-09 22:55:58     0       \N      \N
> 5       2006-10-09 22:55:58     0       \N      \N
> 6       2006-10-10 13:03:27     \N      \N      \N
> 7       2006-10-10 13:06:28     \N      \N      \N
> 8       2006-10-10 13:09:37     \N      \N      \N
> 9       2006-10-27 22:39:49     \N      \N      \N
> 10      2006-10-27 22:39:58     \N      \N      \N
> 11      2006-11-09 04:48:18     \N      \N      \N
> 12      2006-11-30 16:03:58     \N      \N      f
> 13      2006-12-02 15:11:42     \N      \N      f
> 14      2006-12-04 12:31:20     \N      \N      f
> 15      2006-12-05 00:11:30     \N      \N      f
> 16      2006-12-05 00:15:45     \N      \N      f
> 17      2006-12-05 03:02:29     \N      \N      f
> 18      2006-12-05 12:03:10     \N      \N      f
> 20      2006-12-05 16:20:15     \N      \N      f
> 19      2006-12-05 16:20:15     \N      \N      f
> 21      2006-12-16 16:13:24     \N      \N      f
> 22      2006-12-19 16:06:43     \N      \N      f
> 23      2006-12-20 01:33:51     \N      \N      f
> 24      2006-12-21 18:38:56     \N      \N      f
> 25      2006-12-22 15:06:15     \N      \N      f
> 26      2006-12-23 09:43:17     \N      \N      f
> 27      2007-01-04 12:37:50     \N      \N      f
> 28      2007-01-08 17:33:26     \N      \N      f
> 29      2007-01-08 17:38:47     \N      \N      f
> 30      2007-01-13 15:32:34     \N      \N      f
> \.
>
> Now, I'm not too familiar with the copy command, but the above looks
> correct to me.
>
> When I let subsequent statements run, I get similar errors such as:
>
> ERROR:  invalid input syntax for integer: "0.261191951289869"
> ERROR:  invalid input syntax for type real: "2006-08-23 22:54: 11.24"
> ERROR:  invalid input syntax for integer: "0.99655325708605502"
> ERROR:  invalid input syntax for type boolean: "1401353"
>
> Any ideas what is causing this, or how to track this issue down?
> ------------------
>
> In addition to the above, I did a new pg_dump using the -Fc option to
> generate an archive in binary/compressed format.
>
> I also verified that the versions of pg_dump and pg_restore on both
> machines where the same ( 8.1.5).
>
> I then copy that file over to our dev server an load it doing a
> pg_restore -v (verbose mode). This time, I get an error when loading
> data into a different table in the database (not the blocked_info
> table mentioned above).
>
> The error message is:
> pg_restore: restoring data for table "rawfeed"
> pg_restore: [custom archiver] could not read data block -- expected
> 4096, got 3448
> pg_restore: *** aborted because of error
>
> I've tried running both with and without the -e (stop on errors)
> option, and either way the pg_restore stops when it hits this error.
>
> When I list the database contents using pg_restore -l  -- the table
> where error occurs is table listed as #14 out of 23. This particular
> table includes a bytea column that contains gzipped data.
>
> If I try to selectively restore any of the individual tables 1-13 (as
> listed by pg_restore -l), using the -t option, I do not encounter any
> errors. But if I try to restore any individual tables 14-23, I get the
> same error as above (but with a different table name).
>
> In addition, if I try to generate a sql script from pg_restore using
> the -f option, for any of the individual tables 14-23, I get the same
> error: "could not read data block -- expected 4096, got 3448".
>
> > pg_restore -t rawfeed -f rawfeed.sql bduprod_2-01-25-07
> pg_restore: [custom archiver] could not read data block -- expected
> 4096, got 3448
>
> In the case of table 14 (rawfeed), an output file is generated, it is
> mid-way through a copy command. I can run that partially generated
> script against the db without error. I can also verify that the last
> record in the script is successfully added to the db.
>
> Note: this generated file to restore this one table is huge: 9.2G by
> itself. Is there an upper limit to the amount of data copy can load at
> one time?
>
> If I generate a sql script for any of tables 15-23, it takes a long
> time for the command to finally return the same error (about the same
> amount of time to run the 'pg_restore -t rawfeed ...' variation
> above), and when it does an output file is generated, but the file
> only contains the schema creation commands and the first line of the
> copy command. It does not contain any of the table data. This is true
> even for a table that contains only 1 row of data.
>
> ----
>
> At this point I'm not sure how to proceed.
>
> My suspicion is still that this has something to do with the encoding
> of data in the database. So I'm trying to narrow down the location of
> the problem so that I can try to clean it up.
>
> From trying to load the data from the pg_dump text format export, it
> looks like something is wrong with the copy command when loading the
> blocked_info table. However when trying to load the pg_dump -Fc binary
> format export, it appears there's some problem with the rawfeed table.
>
> Any ideas on what to try next will be greatly appreciated.
>
> thanks in advance,
> Mason
>
>
>
> On 1/25/07, *Tom Lane* < tgl@sss.pgh.pa.us <mailto:tgl@sss.pgh.pa.us>>
> wrote:
>
>     "Mason Hale" < masonhale@gmail.com <mailto:masonhale@gmail.com>>
>     writes:
>     > I'm having a problem loading a recent pg_dump of our production
>     database.
>
>     > However, when trying to load the file for this month's snapshot,
>     we are (for
>     > the first time) seeing a slew of errors, such as:
>
>     > invalid command \N
>     > invalid command \N
>     > ERROR:  syntax error at or near ""/>\n  <img alt="" style="" at
>     character 1
>     > LINE 1: "/>\n  <img alt="" style="border: 0;
>     >         ^
>
>     You need to look at the very first error, and ignore the slew
>     following
>     it.  What seems to have happened here is that an error in the COPY
>     command
>     caused psql to fall out of copy mode (or perhaps never enter it in
>     the
>     first place) and start trying to treat lines of COPY data as SQL
>     commands.  So, tons of noise.  What was the first error?
>
>                             regards, tom lane
>
>


Re: Problem loading pg_dump file

От
"Mason Hale"
Дата:
I was able to successfully able to dump and restore my database this morning.

Here's what I did:

After doing single table restore to a text file of the rawfeed table (the one the triggered the error),
I was able to get the id of the last row that was successfully exported. As mentioned earlier I was
able to import all the data up to that point without error.

To see how many rows were loaded in my development database (where the partial dataset was loaded) I ran:

select count(*) from rawfeed;
>> 287,478

The same table in the production database had 749,723 rows.

To get the id of the first 'bad' row, (the next one after the last successfully imported row), I ran this in production:

select id from rawfeed offset 287477 limit 10;

I verified that the first id returned matched the id of that last row successfully loaded on the dev server.
Then I took the id of the next row in the list (37126091) and did:

select * into rawfeed_backup where id = 37126091;
delete from rawfeed where id = 37126091;

After doing this I was able to do another dump of the entire database with the -Fc option and was able to restore that on our dev server successfully. BTW -- this includes dumping/restoring the rawfeed_backup table containing the one 'bad row'.

Then I noticed was that our original dump file from 1 week ago was 7GB, and the one today was 14GB.
We've had a lot of db activity, but I doubt our database has doubled in size in just one week.

Now I'm thinking that the 7GB dump file was somehow truncated or aborted before it was finished. (I ran the pg_dump as a background job, and didn't capture the output... so there may have been an error I didn't see).

Does this file truncation theory sound consistent with the symptoms I reported? Or does the 'single corrupted row of data' theory ring truer?

Mason


> ***(Single step mode: verify
> command)*******************************************
> COPY blocked_info (id, created_at, reason_code, note, do_count_links) FROM
> stdin;
> ***(press return to proceed or enter x and return to
> cancel)********************
> ERROR:  invalid input syntax for integer: "2006-10-09 22:55:58"
> CONTEXT:  COPY blocked_info, line 1, column id: "2006-10-09 22:55:58"
>
> The part of the script that is responsible for this error is:
>
> COPY blocked_info (id, created_at, reason_code, note, do_count_links) FROM
> stdin;
> 1       2006-10-09 22:55:58     0       \N      \N
> 2       2006-10-09 22:55:58     0       \N      \N
> 3       2006-10-09 22:55:58     0       \N      \N
> 4       2006-10-09 22:55:58     0       \N      \N

That's completely bizarre ... it seems like it's just lost the first
field of the COPY data, which is not a failure I've ever heard of
before.  I have no theory about that at the moment.

I'm mystified too. Once I catch my breath, I circle back to take a closer look at this. But for now, I can backup and restore the database. If I find something more, I'll be sure to send a follow-up message.

Mason


Re: Problem loading pg_dump file

От
Tom Lane
Дата:
"Mason Hale" <masonhale@gmail.com> writes:
> Here's the first error I run across:
>
> ***(Single step mode: verify
> command)*******************************************
> COPY blocked_info (id, created_at, reason_code, note, do_count_links) FROM
> stdin;
> ***(press return to proceed or enter x and return to
> cancel)********************
> ERROR:  invalid input syntax for integer: "2006-10-09 22:55:58"
> CONTEXT:  COPY blocked_info, line 1, column id: "2006-10-09 22:55:58"
>
> The part of the script that is responsible for this error is:
>
> COPY blocked_info (id, created_at, reason_code, note, do_count_links) FROM
> stdin;
> 1       2006-10-09 22:55:58     0       \N      \N
> 2       2006-10-09 22:55:58     0       \N      \N
> 3       2006-10-09 22:55:58     0       \N      \N
> 4       2006-10-09 22:55:58     0       \N      \N

That's completely bizarre ... it seems like it's just lost the first
field of the COPY data, which is not a failure I've ever heard of
before.  I have no theory about that at the moment.

> In addition to the above, I did a new pg_dump using the -Fc option to
> generate an archive in binary/compressed format.
> ...
> The error message is:
> pg_restore: restoring data for table "rawfeed"
> pg_restore: [custom archiver] could not read data block -- expected 4096,
> got 3448
> pg_restore: *** aborted because of error
>
> Note: this generated file to restore this one table is huge: 9.2G by itself.
> Is there an upper limit to the amount of data copy can load at one time?

There is not supposed to be.  However, on some platforms programs need
special build flags to deal with files exceeding 4GB.  I'm wondering
if your build of pg_restore is missing largefile support for some
reason.  What is the platform here, exactly, and how did you build
or come by your PG installation?

            regards, tom lane

Re: Problem loading pg_dump file

От
Tom Lane
Дата:
"Mason Hale" <masonhale@gmail.com> writes:
> Then I noticed was that our original dump file from 1 week ago was 7GB, and
> the one today was 14GB.
> We've had a lot of db activity, but I doubt our database has doubled in size
> in just one week.

> Now I'm thinking that the 7GB dump file was somehow truncated or aborted
> before it was finished. (I ran the pg_dump as a background job, and didn't
> capture the output... so there may have been an error I didn't see).

> Does this file truncation theory sound consistent with the symptoms I
> reported? Or does the 'single corrupted row of data' theory ring truer?

Hmm ... truncation would explain the weird pg_restore error, I think,
but not the problem where it seemed to be misinterpreting the first
row of COPY data.

            regards, tom lane