Обсуждение: PANIC: heap_update_redo: no block

Поиск
Список
Период
Сортировка

PANIC: heap_update_redo: no block

От
"Alex bahdushka"
Дата:
Hi.

Upon rebooting one of our main production database servers, we were
greeted with this:

(@)<2006-03-18 23:30:32.687 MST>[3791]LOG:  database system was
interrupted while in recovery at 2006-03-18 23:30:26 MST
(@)<2006-03-18 23:30:32.687 MST>[3791]HINT:  This probably means that
some data is corrupted and you will have to use the last backup for
recovery.
(@)<2006-03-18 23:30:32.688 MST>[3791]LOG:  checkpoint record is at D/1919D5F0
(@)<2006-03-18 23:30:32.688 MST>[3791]LOG:  redo record is at
D/191722C8; undo record is at 0/0; shutdown FALSE
(@)<2006-03-18 23:30:32.688 MST>[3791]LOG:  next transaction ID:
81148900; next OID: 16566476
(@)<2006-03-18 23:30:32.688 MST>[3791]LOG:  next MultiXactId: 1; next
MultiXactOffset: 0
(@)<2006-03-18 23:30:32.689 MST>[3791]LOG:  database system was not
properly shut down; automatic recovery in progress
(@)<2006-03-18 23:30:33.032 MST>[3791]LOG:  redo starts at D/191722C8
(@)<2006-03-18 23:30:33.035 MST>[3791]PANIC:  heap_update_redo: no block
(@)<2006-03-18 23:30:33.036 MST>[3790]LOG:  startup process (PID 3791)
was terminated by signal 6
(@)<2006-03-18 23:30:33.036 MST>[3790]LOG:  aborting startup due to
startup process failure

As far as i know the postgresql was shutdown properly before the
reboot (pg_ctl stop -m fast).  Though im not positive, I was just
brought in when they couldn't figure out why postgresql would not
start.  Any ideas as to how this happened or how to fix it?

Right now im copying over the database, and then going to try
pg_resetxlog.  Just to make sure, the only possible lost data are
things that are/would be in the xlog right?  So i dont need to go
looking at all the tables, just ones I know were modified then.

Are there any other solutions that dont involve possibly loosing data?
(Yes I know backups, unfortunately the last back up was about 2 hours
ago and is not as up to date as i would like)

Just curious, ive also been investigating pitr for instead of doing
backups every 2 hours.  If this problem were to surface when i was
using pitr as a backup solution, would all my data then be hosed (or
at least what pg_resetlog can not restore)?

Re: PANIC: heap_update_redo: no block

От
Martijn van Oosterhout
Дата:
On Sat, Mar 18, 2006 at 11:55:35PM -0700, Alex bahdushka wrote:
> Hi.
>
> Upon rebooting one of our main production database servers, we were
> greeted with this:

To help you at all, we *really* need to know what version and platform
you're running. In particular, are you running the most recent release
of your branch. There have been bug fixes related to WAL recovery in
some versions...

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.

Вложения

Re: PANIC: heap_update_redo: no block

От
"Alex bahdushka"
Дата:
On 3/19/06, Martijn van Oosterhout <kleptog@svana.org> wrote:
> On Sat, Mar 18, 2006 at 11:55:35PM -0700, Alex bahdushka wrote:
> > Hi.
> >
> > Upon rebooting one of our main production database servers, we were
> > greeted with this:
>
> To help you at all, we *really* need to know what version and platform
> you're running. In particular, are you running the most recent release
> of your branch. There have been bug fixes related to WAL recovery in
> some versions...

Ahh of course! sorry!

select version();
                                                      version
--------------------------------------------------------------------------------------------------------------------
 PostgreSQL 8.1.3 on i686-pc-linux-gnu, compiled by GCC gcc-3.4 (GCC)
3.4.4 20050314 (prerelease) (Debian 3.4.3-13)

> Have a nice day,
> --
> Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> > Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> > tool for doing 5% of the work and then sitting around waiting for someone
> > else to do the other 95% so you can sue them.
>
>
> -----BEGIN PGP SIGNATURE-----
> Version: GnuPG v1.4.1 (GNU/Linux)
>
> iD8DBQFEHSreIB7bNG8LQkwRAo5fAJ9TGnk9HFyx7DKsL5C5bSvtmMQIPgCdEVhr
> 7nTMsnzvyk1TI5az+GDCx9I=
> =zZR+
> -----END PGP SIGNATURE-----
>
>
>

Re: PANIC: heap_update_redo: no block

От
"Alex bahdushka"
Дата:
On 3/19/06, Alex bahdushka <bahdushka@gmail.com> wrote:
> On 3/19/06, Martijn van Oosterhout <kleptog@svana.org> wrote:
> > On Sat, Mar 18, 2006 at 11:55:35PM -0700, Alex bahdushka wrote:
> > > Hi.
> > >
> > > Upon rebooting one of our main production database servers, we were
> > > greeted with this:
> >
> > To help you at all, we *really* need to know what version and platform
> > you're running. In particular, are you running the most recent release
> > of your branch. There have been bug fixes related to WAL recovery in
> > some versions...
>
> Ahh of course! sorry!
>
> select version();
>                                                       version
> --------------------------------------------------------------------------------------------------------------------
>  PostgreSQL 8.1.3 on i686-pc-linux-gnu, compiled by GCC gcc-3.4 (GCC)
> 3.4.4 20050314 (prerelease) (Debian 3.4.3-13)
>

After doing some more digging, it looks like that server was missing
the appropriate Kpostgresql symlink in /etc/rc0.d/.  So upon shutdown
(shutdown -h now)...  my guess is it got a sigterm (you know where it
says Sending all processes a TERM signal or whatever), then it (init)
waited 5 seconds or whatever the timeout is and sent a sigkill.

If postgresql took longer to shutdown than that timeout and so was
then given a sigkill and then server turned off.... Could that do it?
(not to mention i don't exactly remember where file system get
unmounted, before or after it sends out those signals, i think its
before though so it might have mounted it read only (couldn't of
unmounted it because it was in use by postgresql)).

Im mainly asking because i would love for this to be user error.  It
scares the hell out of me (and my boss obviously).  Though i must say
for the 2+ years we have been using postgresql its proven to be very
stable, robust and fast.

Thanks!


<snipped>

Re: PANIC: heap_update_redo: no block

От
"Qingqing Zhou"
Дата:
""Alex bahdushka"" <bahdushka@gmail.com> wrote
>
> After doing some more digging, it looks like that server was missing
> the appropriate Kpostgresql symlink in /etc/rc0.d/.  So upon shutdown
> (shutdown -h now)...  my guess is it got a sigterm (you know where it
> says Sending all processes a TERM signal or whatever), then it (init)
> waited 5 seconds or whatever the timeout is and sent a sigkill.
>
> If postgresql took longer to shutdown than that timeout and so was
> then given a sigkill and then server turned off.... Could that do it?
>

I don't believe in this explaination actually. According the startup
message, the error "heap_update_redo: no block" could most possibly happen
when PostgreSQL tried to read an existing block but found that the file
length is not long enough to have it. How could a SIGKILL truncate a data
file like that?

Regards,
Qingqing



Re: PANIC: heap_update_redo: no block

От
"Alex bahdushka"
Дата:
On 3/20/06, Qingqing Zhou <zhouqq@cs.toronto.edu> wrote:
>
> ""Alex bahdushka"" <bahdushka@gmail.com> wrote
> >
> > After doing some more digging, it looks like that server was missing
> > the appropriate Kpostgresql symlink in /etc/rc0.d/.  So upon shutdown
> > (shutdown -h now)...  my guess is it got a sigterm (you know where it
> > says Sending all processes a TERM signal or whatever), then it (init)
> > waited 5 seconds or whatever the timeout is and sent a sigkill.
> >
> > If postgresql took longer to shutdown than that timeout and so was
> > then given a sigkill and then server turned off.... Could that do it?
> >
>
> I don't believe in this explaination actually. According the startup
> message, the error "heap_update_redo: no block" could most possibly happen
> when PostgreSQL tried to read an existing block but found that the file
> length is not long enough to have it. How could a SIGKILL truncate a data
> file like that?
>

Hrm... well i obviously have restored the database by now (using
pg_resetxlog; pg_dump; initdb; pg_restore).  However i did make a
backup of the broken directory before I created the new database.  If
anyone has any thing they would like me to try to possibly help track
down this possible bug.  I would be more than glad to do it.

Since it sounds like its something wrong with the xlog here is the
contents of the dir... Im not sure how useful this is but here it is
anyways.

pg_xlog# du -ak
16404   ./000000010000000D00000022
16404   ./000000010000000D0000001E
16404   ./000000010000000D00000019
16404   ./000000010000000D0000001A
16404   ./000000010000000D0000001D
16404   ./000000010000000D0000001C
16404   ./000000010000000D00000020
16404   ./000000010000000D00000021
16404   ./000000010000000D0000001B
4       ./archive_status
16404   ./000000010000000D00000023
16404   ./000000010000000D0000001F
16404   ./000000010000000D00000018
196856  .

They are all the same size, so it does not look like a truncated
file... Or am i just misinterpreting the error message and its one of
the files elsewhere?

The file system is ext3 and it fscked fine, and nothing is the the
lost+found dir/.  As far as i know the computer was allowed to sync
the buffers to disk before the reboot (the plug was not pulled or
anything).

Any Ideas?

Otherwise it sounds like ill just have to chalk this one up to the
gods, and hope its fixed in 8.1.4....

Re: PANIC: heap_update_redo: no block

От
"Qingqing Zhou"
Дата:
""Alex bahdushka"" <bahdushka@gmail.com> wrote
>
> Hrm... well i obviously have restored the database by now (using
> pg_resetxlog; pg_dump; initdb; pg_restore).  However i did make a
> backup of the broken directory before I created the new database.  If
> anyone has any thing they would like me to try to possibly help track
> down this possible bug.  I would be more than glad to do it.
>

pg_resetxlog is the last resort to "avoid" the real problem. Once you reset
it, then the xlog after that some offset will not get replayed (so the
problem disappered) and possibly some data get lost :-(.

Can you patch the heap/heapam.c/heap_xlog_update() like this:

- elog(PANIC, "heap_update_redo: no block");
+ elog(PANIC, "heap_update_redo: no block: target block: %u, relation
length: %u",
+    ItemPointerGetBlockNumber(&(xlrec->target.tid)),
+    RelationGetNumberOfBlocks(reln));

And restart your database to see what's the output?

Regards,
Qingqing



Re: PANIC: heap_update_redo: no block

От
Tom Lane
Дата:
"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
> Can you patch the heap/heapam.c/heap_xlog_update() like this:

> - elog(PANIC, "heap_update_redo: no block");
> + elog(PANIC, "heap_update_redo: no block: target block: %u, relation
> length: %u",
> +    ItemPointerGetBlockNumber(&(xlrec->target.tid)),
> +    RelationGetNumberOfBlocks(reln));

> And restart your database to see what's the output?

While at it, you should extend the error message to include the relation
ID, so you have some idea which table is affected ... this is certainly
not a very informative message ...

            regards, tom lane

Re: PANIC: heap_update_redo: no block

От
"Qingqing Zhou"
Дата:
"Tom Lane" <tgl@sss.pgh.pa.us> wrote
>
> While at it, you should extend the error message to include the relation
> ID, so you have some idea which table is affected ... this is certainly
> not a very informative message ...
>

Exactly. Please use the following version:

- elog(PANIC, "heap_update_redo: no block");
+ elog(PANIC, "heap_update_redo: no block: target blcknum: %u,
relation(%u/%u/%u) length: %u",
+    ItemPointerGetBlockNumber(&(xlrec->target.tid)),
+    reln->rd_node.spcNode,
+    reln->rd_node.dbNode,
+    reln->rd_node.relNode,
+    RelationGetNumberOfBlocks(reln));

BTW: I just realized that there is another (better) way to do so is to
enable WAL_DEBUG in xlog.h and SET XLOG_DEBUG=true. And that's why we don't
have much error message in xlog redo.

Regards,
Qingqing



Re: PANIC: heap_update_redo: no block

От
Tom Lane
Дата:
"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
> BTW: I just realized that there is another (better) way to do so is to
> enable WAL_DEBUG in xlog.h and SET XLOG_DEBUG=true. And that's why we don't
> have much error message in xlog redo.

That was probably Vadim's original reasoning for not being very verbose
in the redo routines' various PANIC messages.  But for failures in the
field it'd be awfully nice to be able to see this info from a standard
build, so I'm thinking we should improve the elog messages.  If you feel
like creating a patch I'll be glad to apply it ...

            regards, tom lane

Re: PANIC: heap_update_redo: no block

От
"Qingqing Zhou"
Дата:
"Tom Lane" <tgl@sss.pgh.pa.us> wrote
> "Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
> > BTW: I just realized that there is another (better) way to do so is to
> > enable WAL_DEBUG in xlog.h and SET XLOG_DEBUG=true. And that's why we
don't
> > have much error message in xlog redo.
>
> That was probably Vadim's original reasoning for not being very verbose
> in the redo routines' various PANIC messages.  But for failures in the
> field it'd be awfully nice to be able to see this info from a standard
> build, so I'm thinking we should improve the elog messages.  If you feel
> like creating a patch I'll be glad to apply it ...
>

So there are three ways to do it:
(1) enable WALD_DEBUG by default
So it is user's reponsibility to enable XLOG_DEBUG to print verbose
information while at error. This adds no cost during normal running, but the
problem is that too much information (only the last is useful) may pollute
the log file.

(2) print verbose information after errror
We can change StartupXLOG like this:

PG_TRY();
{
    RmgrTable[record->xl_rmid].rm_redo(EndRecPtr, record);
}
PG_CATCH();
{
    RmgrTable[record->xl_rmid].rm_desc(buf,
         record->xl_info, XLogRecGetData(record));
    abort();
}
PG_END_CATCH();

Also, channge err_finish() so that if PANIC if InRecovery, do a
PG_RE_THROW(). The problem is this looks like a little bit hack.

(3) Replace elog in every xlog_ABC_redo() to some xlog_elog(), so the
description information can be automatically appended.

I vote for method 2.

Regards,
Qingqing



Re: PANIC: heap_update_redo: no block

От
Tom Lane
Дата:
"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
> (2) print verbose information after errror

If you want to do it that way, the correct thing is for the WAL replay
logic to add an error context hook to print the current WAL record,
not invent weird hacks in the error processing logic.
Compare the way bufmgr.c reports errors during smgrwrite() operations
(buffer_write_error_callback()).

One problem with the approach is that I'm not sure how robust the WAL
record description routines really are.  They seem at the least
vulnerable to buffer-overflow issues.  A crash while trying to describe
the current record wouldn't be a net improvement :-(.  I'd kind of
want to modify them to have a more robust API, eg write into a
StringInfo instead of an unspecified-size buffer.

            regards, tom lane

Re: PANIC: heap_update_redo: no block

От
"Alex bahdushka"
Дата:
On 3/21/06, Qingqing Zhou <zhouqq@cs.toronto.edu> wrote:
>
> "Tom Lane" <tgl@sss.pgh.pa.us> wrote
> >
> > While at it, you should extend the error message to include the relation
> > ID, so you have some idea which table is affected ... this is certainly
> > not a very informative message ...
> >
>
> Exactly. Please use the following version:
>
> - elog(PANIC, "heap_update_redo: no block");
> + elog(PANIC, "heap_update_redo: no block: target blcknum: %u,
> relation(%u/%u/%u) length: %u",
> +    ItemPointerGetBlockNumber(&(xlrec->target.tid)),
> +    reln->rd_node.spcNode,
> +    reln->rd_node.dbNode,
> +    reln->rd_node.relNode,
> +    RelationGetNumberOfBlocks(reln));
>
> BTW: I just realized that there is another (better) way to do so is to
> enable WAL_DEBUG in xlog.h and SET XLOG_DEBUG=true. And that's why we don't
> have much error message in xlog redo.
>

Hrm ok ill see about doing either the patch or setting wal_debug to
true (or both).  However im currently on vacation till Saturday so ill
do this first thing then and report the results back.  Thank you very
much!

> Regards,
> Qingqing
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: don't forget to increase your free space map settings
>

Re: PANIC: heap_update_redo: no block

От
"Alex bahdushka"
Дата:
<snip>

> Hrm ok ill see about doing either the patch or setting wal_debug to
> true (or both).  However im currently on vacation till Saturday so ill
> do this first thing then and report the results back.  Thank you very
> much!

Ok here is the output with wal_debug set to 1 in the config and the
final supplied patch by Qingqing.

(@)<2006-03-25 20:54:17.509 MST>[26571]LOG:  database system was
interrupted while in recovery at 2006-03-25 20:51:58 MST
(@)<2006-03-25 20:54:17.509 MST>[26571]HINT:  This probably means that
some data is corrupted and you will have to use the last backup for
recovery.
(@)<2006-03-25 20:54:17.509 MST>[26571]LOG:  checkpoint record is at D/1919D5F0
(@)<2006-03-25 20:54:17.509 MST>[26571]LOG:  redo record is at
D/191722C8; undo record is at 0/0; shutdown FALSE
(@)<2006-03-25 20:54:17.509 MST>[26571]LOG:  next transaction ID:
81148900; next OID: 16566476
(@)<2006-03-25 20:54:17.509 MST>[26571]LOG:  next MultiXactId: 1; next
MultiXactOffset: 0
(@)<2006-03-25 20:54:17.509 MST>[26571]LOG:  database system was not
properly shut down; automatic recovery in progress
(@)<2006-03-25 20:54:17.524 MST>[26571]LOG:  redo starts at D/191722C8
(@)<2006-03-25 20:54:17.524 MST>[26571]LOG:  REDO @ D/191722C8; LSN
D/19172678: prev D/191722A0; xid 81148908: Heap - update: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.524 MST>[26571]LOG:  REDO @ D/19172678; LSN
D/191726B8: prev D/191722C8; xid 81148908: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.524 MST>[26571]LOG:  REDO @ D/191726B8; LSN
D/19172788: prev D/19172678; xid 81148908: Heap - update: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.524 MST>[26571]LOG:  REDO @ D/19172788; LSN
D/191727C8: prev D/191726B8; xid 81148908: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.524 MST>[26571]LOG:  REDO @ D/191727C8; LSN
D/19172A30: prev D/19172788; xid 81148908: Heap - update: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.524 MST>[26571]LOG:  REDO @ D/19172A30; LSN
D/19172A70: prev D/191727C8; xid 81148908: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.524 MST>[26571]LOG:  REDO @ D/19172A70; LSN
D/19172F68: prev D/19172A30; xid 81148908: Heap - update: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.524 MST>[26571]LOG:  REDO @ D/19172F68; LSN
D/19172FA8: prev D/19172A70; xid 81148908: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.524 MST>[26571]LOG:  REDO @ D/19172FA8; LSN
D/19172FD0: prev D/19172F68; xid 81148908: Transaction - commit:
2006-03-18 22:51:5
(@)<2006-03-25 20:54:17.525 MST>[26571]LOG:  REDO @ D/19172FD0; LSN
D/19173088: prev D/19172FA8; xid 81148942: Heap - insert: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.525 MST>[26571]LOG:  REDO @ D/19173088; LSN
D/191730C8: prev D/19172FD0; xid 81148942: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.525 MST>[26571]LOG:  REDO @ D/191730C8; LSN
D/191731E8: prev D/19173088; xid 81148942: Heap - insert: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.525 MST>[26571]LOG:  REDO @ D/191731E8; LSN
D/19173228: prev D/191730C8; xid 81148942: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.525 MST>[26571]LOG:  REDO @ D/19173228; LSN
D/19173250: prev D/191731E8; xid 81148942: Transaction - commit:
2006-03-18 22:51:5
(@)<2006-03-25 20:54:17.525 MST>[26571]LOG:  REDO @ D/19173250; LSN
D/19173414: prev D/19173228; xid 81148944: Heap - update: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.525 MST>[26571]LOG:  REDO @ D/19173414; LSN
D/19173454: prev D/19173250; xid 81148944: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.525 MST>[26571]LOG:  REDO @ D/19173454; LSN
D/19173760: prev D/19173414; xid 81148944: Heap - update: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.525 MST>[26571]LOG:  REDO @ D/19173760; LSN
D/191737A0: prev D/19173454; xid 81148944: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.525 MST>[26571]LOG:  REDO @ D/191737A0; LSN
D/191737C8: prev D/19173760; xid 81148944: Transaction - commit:
2006-03-18 22:51:5
(@)<2006-03-25 20:54:17.525 MST>[26571]LOG:  REDO @ D/191737C8; LSN
D/1917380C: prev D/191737A0; xid 81148946: Heap - update: rel
1663/16386/16865; tid
(@)<2006-03-25 20:54:17.525 MST>[26571]LOG:  REDO @ D/1917380C; LSN
D/19173834: prev D/191737C8; xid 81148946: Transaction - commit:
2006-03-18 22:51:5
(@)<2006-03-25 20:54:17.525 MST>[26571]LOG:  REDO @ D/19173834; LSN
D/191739B8: prev D/1917380C; xid 81148948: Heap - insert: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.525 MST>[26571]LOG:  REDO @ D/191739B8; LSN
D/191739F8: prev D/19173834; xid 81148948: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.525 MST>[26571]LOG:  REDO @ D/191739F8; LSN
D/19173AE4: prev D/191739B8; xid 81148948: Heap - insert: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.525 MST>[26571]LOG:  REDO @ D/19173AE4; LSN
D/19173B24: prev D/191739F8; xid 81148948: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.525 MST>[26571]LOG:  REDO @ D/19173B24; LSN
D/19173B4C: prev D/19173AE4; xid 81148948: Transaction - commit:
2006-03-18 22:51:5
(@)<2006-03-25 20:54:17.525 MST>[26571]LOG:  REDO @ D/19173B4C; LSN
D/19173C38: prev D/19173B24; xid 81148960: Heap - insert: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.525 MST>[26571]LOG:  REDO @ D/19173C38; LSN
D/19173C78: prev D/19173B4C; xid 81148960: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.525 MST>[26571]LOG:  REDO @ D/19173C78; LSN
D/19173EFC: prev D/19173C38; xid 81148960: Heap - insert: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.526 MST>[26571]LOG:  REDO @ D/19173EFC; LSN
D/19173F3C: prev D/19173C78; xid 81148960: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.526 MST>[26571]LOG:  REDO @ D/19173F3C; LSN
D/19173F64: prev D/19173EFC; xid 81148960: Transaction - commit:
2006-03-18 22:51:5
(@)<2006-03-25 20:54:17.526 MST>[26571]LOG:  REDO @ D/19173F64; LSN
D/19174064: prev D/19173F3C; xid 81148962: Heap - insert: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.526 MST>[26571]LOG:  REDO @ D/19174064; LSN
D/191740A4: prev D/19173F64; xid 81148962: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.526 MST>[26571]LOG:  REDO @ D/191740A4; LSN
D/19174294: prev D/19174064; xid 81148962: Heap - insert: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.526 MST>[26571]LOG:  REDO @ D/19174294; LSN
D/191742D4: prev D/191740A4; xid 81148962: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.526 MST>[26571]LOG:  REDO @ D/191742D4; LSN
D/191742FC: prev D/19174294; xid 81148962: Transaction - commit:
2006-03-18 22:51:5
(@)<2006-03-25 20:54:17.526 MST>[26571]LOG:  REDO @ D/191742FC; LSN
D/191743E8: prev D/191742D4; xid 81148964: Heap - insert: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.526 MST>[26571]LOG:  REDO @ D/191743E8; LSN
D/19174428: prev D/191742FC; xid 81148964: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.526 MST>[26571]LOG:  REDO @ D/19174428; LSN
D/191746FC: prev D/191743E8; xid 81148964: Heap - insert: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.526 MST>[26571]LOG:  REDO @ D/191746FC; LSN
D/1917473C: prev D/19174428; xid 81148964: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.526 MST>[26571]LOG:  REDO @ D/1917473C; LSN
D/19174764: prev D/191746FC; xid 81148964: Transaction - commit:
2006-03-18 22:51:5
(@)<2006-03-25 20:54:17.526 MST>[26571]LOG:  REDO @ D/19174764; LSN
D/19174824: prev D/1917473C; xid 81148966: Heap - insert: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.526 MST>[26571]LOG:  REDO @ D/19174824; LSN
D/19174864: prev D/19174764; xid 81148966: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.526 MST>[26571]LOG:  REDO @ D/19174864; LSN
D/191749A4: prev D/19174824; xid 81148966: Heap - insert: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.526 MST>[26571]LOG:  REDO @ D/191749A4; LSN
D/191749E4: prev D/19174864; xid 81148966: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.526 MST>[26571]LOG:  REDO @ D/191749E4; LSN
D/19174A0C: prev D/191749A4; xid 81148966: Transaction - commit:
2006-03-18 22:51:5
(@)<2006-03-25 20:54:17.526 MST>[26571]LOG:  REDO @ D/19174A0C; LSN
D/19174D18: prev D/191749E4; xid 81148968: Heap - update: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.526 MST>[26571]LOG:  REDO @ D/19174D18; LSN
D/19174D58: prev D/19174A0C; xid 81148968: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.526 MST>[26571]LOG:  REDO @ D/19174D58; LSN
D/19174E60: prev D/19174D18; xid 81148968: Heap - update: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.527 MST>[26571]LOG:  REDO @ D/19174E60; LSN
D/19174EA0: prev D/19174D58; xid 81148968: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.527 MST>[26571]LOG:  REDO @ D/19174EA0; LSN
D/19174EC8: prev D/19174E60; xid 81148968: Transaction - commit:
2006-03-18 22:52:0
(@)<2006-03-25 20:54:17.527 MST>[26571]LOG:  REDO @ D/19174EC8; LSN
D/1917508C: prev D/19174EA0; xid 81148970: Heap - update: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.527 MST>[26571]LOG:  REDO @ D/1917508C; LSN
D/191750CC: prev D/19174EC8; xid 81148970: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.527 MST>[26571]LOG:  REDO @ D/191750CC; LSN
D/19175194: prev D/1917508C; xid 81148970: Heap - update: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.527 MST>[26571]LOG:  REDO @ D/19175194; LSN
D/191751D4: prev D/191750CC; xid 81148970: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.527 MST>[26571]LOG:  REDO @ D/191751D4; LSN
D/191752C8: prev D/19175194; xid 81148970: Heap - update: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.527 MST>[26571]LOG:  REDO @ D/191752C8; LSN
D/19175308: prev D/191751D4; xid 81148970: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.527 MST>[26571]LOG:  REDO @ D/19175308; LSN
D/191753D0: prev D/191752C8; xid 81148970: Heap - update: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.527 MST>[26571]LOG:  REDO @ D/191753D0; LSN
D/19175410: prev D/19175308; xid 81148970: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.527 MST>[26571]LOG:  REDO @ D/19175410; LSN
D/19175438: prev D/191753D0; xid 81148970: Transaction - commit:
2006-03-18 22:52:0
(@)<2006-03-25 20:54:17.527 MST>[26571]LOG:  REDO @ D/19175438; LSN
D/19175524: prev D/19175410; xid 81148972: Heap - insert: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.527 MST>[26571]LOG:  REDO @ D/19175524; LSN
D/19175564: prev D/19175438; xid 81148972: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.527 MST>[26571]LOG:  REDO @ D/19175564; LSN
D/19175844: prev D/19175524; xid 81148972: Heap - insert: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.527 MST>[26571]LOG:  REDO @ D/19175844; LSN
D/19175884: prev D/19175564; xid 81148972: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.527 MST>[26571]LOG:  REDO @ D/19175884; LSN
D/1917596C: prev D/19175844; xid 81148972: Heap - insert: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.527 MST>[26571]LOG:  REDO @ D/1917596C; LSN
D/191759AC: prev D/19175884; xid 81148972: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.527 MST>[26571]LOG:  REDO @ D/191759AC; LSN
D/191759D4: prev D/1917596C; xid 81148972: Transaction - commit:
2006-03-18 22:52:0
(@)<2006-03-25 20:54:17.527 MST>[26571]LOG:  REDO @ D/191759D4; LSN
D/19175B98: prev D/191759AC; xid 81148974: Heap - update: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.527 MST>[26571]LOG:  REDO @ D/19175B98; LSN
D/19175BD8: prev D/191759D4; xid 81148974: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.527 MST>[26571]LOG:  REDO @ D/19175BD8; LSN
D/19175EE4: prev D/19175B98; xid 81148974: Heap - update: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.528 MST>[26571]LOG:  REDO @ D/19175EE4; LSN
D/19175F24: prev D/19175BD8; xid 81148974: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.528 MST>[26571]LOG:  REDO @ D/19175F24; LSN
D/19175F4C: prev D/19175EE4; xid 81148974: Transaction - commit:
2006-03-18 22:52:0
(@)<2006-03-25 20:54:17.528 MST>[26571]LOG:  REDO @ D/19175F4C; LSN
D/19176104: prev D/19175F24; xid 81148976: Heap - insert: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.528 MST>[26571]LOG:  REDO @ D/19176104; LSN
D/19176144: prev D/19175F4C; xid 81148976: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.528 MST>[26571]LOG:  REDO @ D/19176144; LSN
D/19176230: prev D/19176104; xid 81148976: Heap - insert: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.528 MST>[26571]LOG:  REDO @ D/19176230; LSN
D/19176270: prev D/19176144; xid 81148976: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.528 MST>[26571]LOG:  REDO @ D/19176270; LSN
D/19176298: prev D/19176230; xid 81148976: Transaction - commit:
2006-03-18 22:52:0
(@)<2006-03-25 20:54:17.528 MST>[26571]LOG:  REDO @ D/19176298; LSN
D/19176384: prev D/19176270; xid 81148978: Heap - insert: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.528 MST>[26571]LOG:  REDO @ D/19176384; LSN
D/191763C4: prev D/19176298; xid 81148978: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.528 MST>[26571]LOG:  REDO @ D/191763C4; LSN
D/191765A8: prev D/19176384; xid 81148978: Heap - insert: rel
1663/16386/2619; tid
(@)<2006-03-25 20:54:17.528 MST>[26571]LOG:  REDO @ D/191765A8; LSN
D/191765E8: prev D/191763C4; xid 81148978: Btree - insert: rel
1663/16386/2696; tid
(@)<2006-03-25 20:54:17.528 MST>[26571]LOG:  REDO @ D/191765E8; LSN
D/19176610: prev D/191765A8; xid 81148978: Transaction - commit:
2006-03-18 22:52:0
(@)<2006-03-25 20:54:17.528 MST>[26571]LOG:  REDO @ D/19176610; LSN
D/19176644: prev D/191765E8; xid 81148979: Heap - clean: rel
1663/16386/16559898; b
(@)<2006-03-25 20:54:17.528 MST>[26571]LOG:  REDO @ D/19176644; LSN
D/191766A4: prev D/19176610; xid 81148979: Heap - move: rel
1663/16386/16559898; ti
(@)<2006-03-25 20:54:17.528 MST>[26571]PANIC:  heap_update_redo: no
block: target blcknum: 1, relation(1663/16386/16559898) length: 1
(@)<2006-03-25 20:54:17.529 MST>[26570]LOG:  startup process (PID
26571) was terminated by signal 6
(@)<2006-03-25 20:54:17.530 MST>[26570]LOG:  aborting startup due to
startup process failure

Ok so whats next?

Re: PANIC: heap_update_redo: no block

От
"Qingqing Zhou"
Дата:
""Alex bahdushka"" <bahdushka@gmail.com> wrote
>
> (@)<2006-03-25 20:54:17.528 MST>[26571]LOG:  REDO @ D/19176610; LSN
> D/19176644: prev D/191765E8; xid 81148979: Heap - clean: rel
> 1663/16386/16559898; b
> (@)<2006-03-25 20:54:17.528 MST>[26571]LOG:  REDO @ D/19176644; LSN
> D/191766A4: prev D/19176610; xid 81148979: Heap - move: rel
> 1663/16386/16559898; ti
> (@)<2006-03-25 20:54:17.528 MST>[26571]PANIC:  heap_update_redo: no
> block: target blcknum: 1, relation(1663/16386/16559898) length: 1
> (@)<2006-03-25 20:54:17.529 MST>[26570]LOG:  startup process (PID
> 26571) was terminated by signal 6
>

It is like the problem due to the confliction of vacuum & update. The update
is on the page that vacuum was just removed.

Before we try to understand/attack the bug exactly, first I'd like to see
the complete xlog output. Your xlog output is imcomplete -- for example, the
first line after "b" should be "lk %u", the second line after "ti" should be
"d %u%u".  Can you check out why the output looks like that? If you want to
examine the source code, look at heapam.c/heap_desc().

Regards,
Qingqing




Re: PANIC: heap_update_redo: no block

От
"Alex bahdushka"
Дата:
> It is like the problem due to the confliction of vacuum & update. The update
> is on the page that vacuum was just removed.
>
> Before we try to understand/attack the bug exactly, first I'd like to see
> the complete xlog output. Your xlog output is imcomplete -- for example, the
> first line after "b" should be "lk %u", the second line after "ti" should be
> "d %u%u".  Can you check out why the output looks like that? If you want to
> examine the source code, look at heapam.c/heap_desc().


Gah, yeah it looks like a stupid copy and paste error, im attaching it
this time... hope thats ok

Вложения

Re: PANIC: heap_update_redo: no block

От
"Qingqing Zhou"
Дата:
""Alex bahdushka"" <bahdushka@gmail.com> wrote
>
> LOG:  REDO @ D/19176610; LSN D/19176644: prev D/191765E8; xid 81148979:
> Heap - clean: rel 1663/16386/16559898; blk 0
> LOG:  REDO @ D/19176644; LSN D/191766A4: prev D/19176610; xid 81148979:
> Heap - move: rel 1663/16386/16559898; tid 1/1; new 0/10
> PANIC:  heap_update_redo: no block: target blcknum: 1,
> relation(1663/16386/16559898) length: 1
>
What happened that time is like this: heap 16559898 has two pages {0, 1}. A
vacuum full first examines page 0, do some cleaning; then it comes to the
second page, move the item 1/1 to page 0 -- as a byproduct, the heap is
truncated to only 1 page since 1/1 is the only item on the second page. At
this time, system crashed, and heap 16559898 has only 1 page left.  At the
xlog startup time, for some unknown reason( or I am totally wrong),
PostgreSQL didn't extend the heap to 2 blocks, thus the
heap_update_redo(more exactly, should be heap_move_redo) failed. But what's
the unknown reason, I really don't have a clue :-(.

Actually I tried to simulate your situation, but everytime I got a neat
recovery -- the only carvet is that depends on the XLOG_SMGR_TRUNCATE
written down or not, there may one extra useless page at the end of heap,
which is not a problem at all.

Regards,
Qingqing




Re: PANIC: heap_update_redo: no block

От
Tom Lane
Дата:
"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
> Actually I tried to simulate your situation, but everytime I got a neat
> recovery --

You probably filled the test table and then vacuumed within a single
checkpoint cycle, so that the replay sequence included loading data into
page 1 in the first place.  The risk case is fill table, checkpoint,
vacuum, crash; because then the replay starts from the checkpoint and
won't re-create page 1.

            regards, tom lane

Re: PANIC: heap_update_redo: no block

От
"Qingqing Zhou"
Дата:
""Alex bahdushka"" <bahdushka@gmail.com> wrote
>
> (@)<2006-03-18 23:30:33.035 MST>[3791]PANIC:  heap_update_redo: no block
>

According to the discussion in pgsql-hackers, to finish this case, did you
turn off the full_page_writes parameter? I hope the answer is "yes" ...

Regards,
Qingqing




Re: PANIC: heap_update_redo: no block

От
"Alex bahdushka"
Дата:
On 3/31/06, Qingqing Zhou <zhouqq@cs.toronto.edu> wrote:
>
> ""Alex bahdushka"" <bahdushka@gmail.com> wrote
> >
> > (@)<2006-03-18 23:30:33.035 MST>[3791]PANIC:  heap_update_redo: no block
> >
>
> According to the discussion in pgsql-hackers, to finish this case, did you
> turn off the full_page_writes parameter? I hope the answer is "yes" ...
>

If by off you mean full_page_writes = on then yes.

Thanks for all your help!