Обсуждение: Drop and reload table to clear corruption?

Поиск
Список
Период
Сортировка

Drop and reload table to clear corruption?

От
jboes@nexcerpt.com (Jeff Boes)
Дата:
We have a few tables that seem to have suffered some kind of
corruption in their TOAST files.  Symptoms include:

 + Runaway CPU usage during VACUUM ANALYZE (load average
   spikes of 20x normal for 10-15 minutes at a time) and
   more rarely during other operations

 + Recurring messages of "ERROR:  Parent tuple was not found"
   during VACUUM FULL

We have upgraded from 7.2.1 to 7.2.3 without apparent effect.  Now
what I would like to try is:

1. pg_dump the table

2. truncate the table

3. VACUUM FULL  (is this necessary?)

4. reload from pg_dump file

However, due to the very large number of foreign key constraints into
this table, I really don't want to drop the table, nor do I want to
trigger cascading deletes or updates.  Can I use the same snippet of
code generated by pg_dump to disable the triggers?  Are there any
other precautions I should take?

Please CC me on any responses, as I read this list via NNTP, and
news.postgresql.org seems to be on the blink this week.

Re: Drop and reload table to clear corruption?

От
Tom Lane
Дата:
jboes@nexcerpt.com (Jeff Boes) writes:
> We have a few tables that seem to have suffered some kind of
> corruption in their TOAST files.  Symptoms include:

>  + Runaway CPU usage during VACUUM ANALYZE (load average
>    spikes of 20x normal for 10-15 minutes at a time) and
>    more rarely during other operations

This seems odd.  Can you attach to one of the runaway backends with a
debugger and get a stack trace?  That might give some clue what the
problem is.

>  + Recurring messages of "ERROR:  Parent tuple was not found"
>    during VACUUM FULL

The cause of this (or a cause of this, anyway) is known and fixed for
7.3.  If you like you could try the attached back-patch for 7.2.3,
which fixes the most common case.  (Plug: this patch is straight out of
the RHDB sources; if you are on a Red Hat platform I'd suggest trying
7.2.3-RH as soon as it's out.)

If that doesn't seem to help then you can go on with plan B:

> what I would like to try is:
> 1. pg_dump the table
> 2. truncate the table
> 3. VACUUM FULL  (is this necessary?)

No, not if you truncated the table.

> 4. reload from pg_dump file

This should work to remove any data corruption in the table, assuming
you are able to get a clean dump.  (You want a data-only dump of
course.)

> Can I use the same snippet of code generated by pg_dump to disable the
> triggers?

Should work.  I think pg_dump may provide that for free in a data-only
dump, anyway.

BTW, there is an oversight in 7.2's TRUNCATE code: it only truncates the
given relation and not the TOAST table for same.  But fortunately it
also fails to check just what you're truncating, so you can manually
issue a TRUNCATE against the TOAST table after truncating the main
table.  (But I think you might have to run a standalone postgres with
-O -P to be allowed to do the latter.)

            regards, tom lane


*** src/backend/commands/vacuum.c.orig    Mon Sep 30 15:45:57 2002
--- src/backend/commands/vacuum.c    Sat Oct 12 14:21:29 2002
***************
*** 187,192 ****
--- 187,196 ----
      if (IsTransactionBlock())
          elog(ERROR, "%s cannot run inside a BEGIN/END block", stmttype);

+     /* Running VACUUM from a function would free the function context */
+     if (!MemoryContextContains(QueryContext, vacstmt))
+         elog(ERROR, "%s cannot be executed from a function", stmttype);
+
      /*
       * Send info about dead objects to the statistics collector
       */
***************
*** 1320,1326 ****
          usable_free_size = 0;
      }

!     if (usable_free_size > 0 && num_vtlinks > 0)
      {
          qsort((char *) vtlinks, num_vtlinks, sizeof(VTupleLinkData),
                vac_cmp_vtlinks);
--- 1324,1331 ----
          usable_free_size = 0;
      }

!     /* don't bother to save vtlinks if we will not call repair_frag */
!     if (fraged_pages->num_pages > 0 && num_vtlinks > 0)
      {
          qsort((char *) vtlinks, num_vtlinks, sizeof(VTupleLinkData),
                vac_cmp_vtlinks);
***************
*** 1602,1608 ****
               */
              if ((tuple.t_data->t_infomask & HEAP_UPDATED &&
               !TransactionIdPrecedes(tuple.t_data->t_xmin, OldestXmin)) ||
!                 (!(tuple.t_data->t_infomask & HEAP_XMAX_INVALID) &&
                   !(ItemPointerEquals(&(tuple.t_self),
                                       &(tuple.t_data->t_ctid)))))
              {
--- 1607,1614 ----
               */
              if ((tuple.t_data->t_infomask & HEAP_UPDATED &&
               !TransactionIdPrecedes(tuple.t_data->t_xmin, OldestXmin)) ||
!                 (!(tuple.t_data->t_infomask & (HEAP_XMAX_INVALID |
!                                                HEAP_MARKED_FOR_UPDATE)) &&
                   !(ItemPointerEquals(&(tuple.t_self),
                                       &(tuple.t_data->t_ctid)))))
              {
***************
*** 1633,1639 ****
                   * If this tuple is in the begin/middle of the chain then
                   * we have to move to the end of chain.
                   */
!                 while (!(tp.t_data->t_infomask & HEAP_XMAX_INVALID) &&
                         !(ItemPointerEquals(&(tp.t_self),
                                             &(tp.t_data->t_ctid))))
                  {
--- 1639,1646 ----
                   * If this tuple is in the begin/middle of the chain then
                   * we have to move to the end of chain.
                   */
!                 while (!(tp.t_data->t_infomask & (HEAP_XMAX_INVALID |
!                                                   HEAP_MARKED_FOR_UPDATE)) &&
                         !(ItemPointerEquals(&(tp.t_self),
                                             &(tp.t_data->t_ctid))))
                  {

Re: Drop and reload table to clear corruption?

От
Jeff Boes
Дата:
On Tue, 2002-10-22 at 17:02, Tom Lane wrote:
> jboes@nexcerpt.com (Jeff Boes) writes:
> > We have a few tables that seem to have suffered some kind of
> > corruption in their TOAST files.  Symptoms include:
>
> >  + Runaway CPU usage during VACUUM ANALYZE (load average
> >    spikes of 20x normal for 10-15 minutes at a time) and
> >    more rarely during other operations
>
> This seems odd.  Can you attach to one of the runaway backends with a
> debugger and get a stack trace?  That might give some clue what the
> problem is.

No; we can't do *anything* while this is happening.  The load average
(on a Linux system) runs up to 30+ within a few minutes.  By the time we
notice it happening, no command prompts are responsive.


--
Jeff Boes                                      vox 616.226.9550 ext 24
Database Engineer                                     fax 616.349.9076
Nexcerpt, Inc.                                 http://www.nexcerpt.com
           ...Nexcerpt... Extend your Expertise