Thread: Emergency - Need assistance

Emergency - Need assistance

From
warren little
Date:
I received the following error message when trying to copy a table from
one database to another on the same cluster:

pg_dump: The command was: FETCH 100 FROM _pg_dump_cursor
pg_restore: [custom archiver] could not read data block -- expected 1, got 0
pg_restore: *** aborted because of error

The table contains a bytea column which houses pdf documents.
Is this a sign of corrupted data and if so would setting
"zero_damaged_pages = true" allow the copy to proceed?

The table is about 25GB in size and takes a long time to dump/restore
and I'm running out of time to get the cluster back into production.

Note that I am running:
PostgreSQL 8.1beta4 on x86_64-unknown-linux-gnu, compiled by GCC gcc
(GCC) 3.4.4 20050721 (Red Hat 3.4.4-2)


--
Warren Little
CTO
Meridias Capital Inc
ph 866.369.7763

Re: Emergency - Need assistance

From
Tom Lane
Date:
warren little <warren.little@meridiascapital.com> writes:
> I received the following error message when trying to copy a table from
> one database to another on the same cluster:

> pg_dump: The command was: FETCH 100 FROM _pg_dump_cursor
> pg_restore: [custom archiver] could not read data block -- expected 1,
> got 0
> pg_restore: *** aborted because of error

You seem to have omitted the messages that would indicate what's
actually wrong; the above is all just subsidiary damage after whatever
caused the FETCH to fail.
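
When two programs are piped together, one way to keep the original failure apart from the follow-on errors is to run them as separate stages. The sketch below is only an illustration of that idea: `two_stage` is a hypothetical helper, not part of PostgreSQL, and it assumes there is enough temporary disk space to hold the intermediate dump.

```shell
# two_stage PRODUCER CONSUMER
# Hypothetical helper: run PRODUCER into a temporary file and hand that file
# to CONSUMER only if PRODUCER exited successfully, so that a failure in the
# first program is not buried under errors from the second.
two_stage() {
    producer=$1
    consumer=$2
    # GNU mktemp uses $TMPDIR when set; point it at a volume with room for the dump.
    tmp=$(mktemp) || return 1
    if ! sh -c "$producer" > "$tmp"; then
        echo "first stage failed; second stage not started" >&2
        rm -f "$tmp"
        return 1
    fi
    sh -c "$consumer" < "$tmp"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

With placeholder database and table names, a call might look like `two_stage 'pg_dump -Fc --table=mytable sourcedb' 'pg_restore -d targetdb'`. If the dump stage dies, only its own messages appear.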

> The table is about 25GB in size and takes a long time to dump/restore
> and I'm running out of time to get the cluster back into production.

> note running:
> PostgreSQL 8.1beta4 on x86_64-unknown-linux-gnu, compiled by GCC gcc
> (GCC) 3.4.4 20050721 (Red Hat 3.4.4-2)"

You're running a production database on a beta release??

            regards, tom lane

Re: Emergency - Need assistance

From
warren little
Date:
Tom,
The complete output I received from the command

pg_dump -Fc --table=casedocument -d tigrissave | pg_restore --verbose -d tigris

is listed below:

pg_dump: SQL command failed
pg_dump: Error message from server: server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
pg_dump: The command was: FETCH 100 FROM _pg_dump_cursor
pg_restore: [custom archiver] could not read data block -- expected 1, got 0
pg_restore: *** aborted because of error


I had removed all the files in pg_log prior to getting this error and no
new logfile was created.  I'm guessing I screwed up the logger when
removing all the files, but I assumed that when writing to the error
logs the backend would create a file if one did not exist.

I am currently attempting to run the dump/restore with zero_damaged_pages
turned on to see if the results yield something more useful.

About the beta version: this is temporary; we hadn't really planned on
running production on our development box.  We haven't had any issues with
8.1beta for a few months and will be moving to 8.1.x as soon as some new
hardware arrives (about a week).

thanks

--
Warren Little
CTO
Meridias Capital Inc
ph 866.369.7763

Re: Emergency - Need assistance

From
Tom Lane
Date:
warren little <warren.little@meridiascapital.com> writes:
>  pg_dump: SQL command failed
> pg_dump: Error message from server: server closed the connection
> unexpectedly
>         This probably means the server terminated abnormally
>         before or while processing the request.
> pg_dump: The command was: FETCH 100 FROM _pg_dump_cursor

Hmm.  This could mean corrupted data files, but it's hard to be sure
without more info.

> I had removed all the files in pg_log prior to getting this error and no
> new logfile was created.  I'm guessing I screwed up the logger when
> removing all the files, but I assumed that when writing to the error
> logs the backend would create a file if one did not exist.

The file *does* exist, there's just no directory link to it anymore :-(
You need to force a logfile rotation, which might be most easily done by
stopping and restarting the postmaster.
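
The behaviour described above (a file that has been removed from its directory but is still held open by a process) can be reproduced in a few lines of shell. This is only a sketch of the general Unix mechanism, not of the PostgreSQL logger itself; `demo_unlinked_log` and the file name `server.log` are made up for the illustration.

```shell
# Demonstrate that a file removed from its directory stays alive while a
# process still holds it open (the logger's situation after `rm pg_log/*`).
demo_unlinked_log() {
    dir=$(mktemp -d) || return 1
    exec 3>> "$dir/server.log"      # open the "logfile", as a logger would
    rm -f "$dir/server.log"         # remove its directory entry
    echo "still logging" >&3        # the write succeeds: the file is still open
    if [ -e "$dir/server.log" ]; then
        echo "visible"
    else
        echo "invisible"            # nothing in the directory points at it now
    fi
    exec 3>&-                       # closing the descriptor finally frees the file
    rmdir "$dir"
}
```

Calling `demo_unlinked_log` prints `invisible`: the write went through, but no file shows up in the directory. A running logger keeps writing into such a file until it opens a new one, which is why a restart (for example `pg_ctl restart -D "$PGDATA"`) makes a visible logfile appear again.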

What you need to do is see the postmaster log entry about the backend
crash.  If it's dying on a signal (likely sig11 = SEGV) then inspecting
the core file might yield useful information.
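
A small sketch for locating that log entry once a visible logfile exists again. It assumes the server's usual wording for such crashes ("... was terminated by signal N"); the function names are hypothetical and the logfile path is whatever the installation uses.

```shell
# crash_lines LOGFILE: show log entries reporting a backend killed by a signal.
# Assumes the server's usual wording, "... was terminated by signal N".
crash_lines() {
    grep -n 'was terminated by signal' "$1"
}

# crash_signals LOGFILE: list the distinct signal numbers seen (11 = SIGSEGV).
crash_signals() {
    sed -n 's/.*was terminated by signal \([0-9][0-9]*\).*/\1/p' "$1" | sort -un
}
```

If a core file turns up, a backtrace can usually be pulled from it by starting `gdb` with the postgres executable and the core file as arguments and typing `bt` at the prompt; the exact paths depend on the installation.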

> I currently attempt to run the dump/restore with the zero_damaged_pages
> turned on to see if the results yield something more useful.

That really ought to be the last resort not the first one, because it
will destroy not only data but most of the evidence about what went
wrong...

            regards, tom lane

Re: Emergency - Need assistance

From
warren little
Date:
The dump/restore failed even with zero_damaged_pages=true.
The logfile (postgresql-2006-01-02_130023.log)
did not have much in the way of useful info. I've attached the section
of the logfile around the time of the crash.  I cannot find any sign of
a core file.  Where might the core dump have landed?

Regarding your comments about losing the evidence, the data I'm trying
to load is in another database in the same cluster which I have no
intention of purging until I can get the table moved to the new
database.

thanks




Re: Emergency - Need assistance

From
warren little
Date:
Sorry,
I forgot the attachment.

Attachments

Re: Emergency - Need assistance

From
Tom Lane
Date:
warren little <warren.little@meridiascapital.com> writes:
> The dump/restore failed even with the zero_damaged_pages=true.
> The the logfile (postgresql-2006-01-02_130023.log)
> did not have much in the way of useful info. I've attached the section
> of the logfile around the time of the crash.  I cannot find any sign of
> a core file.  Where might the core dump have landed?

It would typically go into $PGDATA (if you're using 8.1) or some
subdirectory thereof (if you're using an older release).  There are
some platforms such as OS X that put core files in a special directory
/core so check for that too.
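
A quick way to search those locations, as a sketch. `find_core_files` is a hypothetical helper, and it assumes core files are named `core` or `core.<something>`, which depends on kernel settings.

```shell
# find_core_files DIR...: list regular files whose names start with "core"
# under the given directories (for example "$PGDATA" and, on OS X, /core).
find_core_files() {
    for dir in "$@"; do
        # Skip directories that do not exist on this platform.
        [ -d "$dir" ] && find "$dir" -type f -name 'core*' 2>/dev/null
    done
    return 0
}
```

For example: `find_core_files "$PGDATA" /core`.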

If you're not finding any corefile then the most likely bet is that the
postmaster has been launched under "ulimit -c 0" which forbids dropping
a corefile.  (This seems to be the default environment under many
Linuxen.)  I'd suggest adding "ulimit -c unlimited" to the postmaster
start script you're using, restarting the postmaster, and repeating the
dump to cause the crash again.
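
The following sketch shows how the limit could be checked and where the `ulimit` line would go. `core_limit_status` is a hypothetical helper, and the `pg_ctl` line in the comments stands in for whatever command the start script already uses to launch the postmaster.

```shell
# core_limit_status [LIMIT]: classify a core-size limit as reported by
# `ulimit -c`; with no argument, inspect the current shell's soft limit.
core_limit_status() {
    limit=${1:-$(ulimit -c)}
    case "$limit" in
        0)         echo "disabled" ;;
        unlimited) echo "unlimited" ;;
        *)         echo "capped" ;;
    esac
}

# In the postmaster start script, before the server is launched
# (the pg_ctl invocation is illustrative; use whatever the script already runs):
#   ulimit -c unlimited
#   pg_ctl start -D "$PGDATA"
```

The limit is inherited by child processes, so it has to be raised in the shell that starts the postmaster; changing it in an unrelated login session has no effect on a server that is already running.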

            regards, tom lane