Re: error: could not find pg_class tuple for index 2662

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: error: could not find pg_class tuple for index 2662
Дата
Msg-id 27557.1312219429@sss.pgh.pa.us
обсуждение исходный текст
Ответ на Re: error: could not find pg_class tuple for index 2662  (daveg <daveg@sonic.net>)
Ответы Re: error: could not find pg_class tuple for index 2662  (daveg <daveg@sonic.net>)
Список pgsql-hackers
daveg <daveg@sonic.net> writes:
> On Sun, Jul 31, 2011 at 11:44:39AM -0400, Tom Lane wrote:
>> I think we need to start adding some instrumentation so we can get a
>> better handle on what's going on in your database.  If I were to send
>> you a source-code patch for the server that adds some more logging
>> printout when this happens, would you be willing/able to run a patched
>> build on your machine?

> Yes we can run an instrumented server so long as the instrumentation does
> not interfere with normal operation. However, scheduling downtime to switch
> binaries is difficult, and generally needs to be happen on a weekend, but
> sometimes can be expedited. I'll look into that.

OK, attached is a patch against 9.0 branch that will re-scan pg_class
after a failure of this sort occurs, and log what it sees in the tuple
header fields for each tuple for the target index.  This should give us
some useful information.  It might be worthwhile for you to also log the
results of

select relname,pg_relation_filenode(oid) from pg_class
where relname like 'pg_class%';

in your script that does VACUUM FULL, just before and after each time it
vacuums pg_class.  That will help in interpreting the relfilenodes in
the log output.

> My observations so far are:

>  - the error occurs at commit of vacuum full of pg_class
>  - in these cases error hits autovacuum after it waited for a lock on pg_class
>  - in these two cases there was a new process startup while the vacuum was
>    running. Don't know if this is relevant.

Interesting.  We'll want to know whether that happens every time.

>  - while these hit autovacuum, the error does hit other processs (just not in
>    these sessions).  Unknown if autovacuum is a required component.

Good question.  Please consider setting log_autovacuum_min_duration = 0
so that the log also traces all autovacuum activity.

            regards, tom lane

diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c
index 81cea8b60406dffa7b5d278697ab5ad6cef6a3d8..d7957ba4e0055a4f9f7721e06b72cbeb0ddb350e 100644
*** a/src/backend/utils/cache/relcache.c
--- b/src/backend/utils/cache/relcache.c
***************
*** 32,37 ****
--- 32,38 ----

  #include "access/genam.h"
  #include "access/reloptions.h"
+ #include "access/relscan.h"
  #include "access/sysattr.h"
  #include "access/transam.h"
  #include "access/xact.h"
***************
*** 64,69 ****
--- 65,71 ----
  #include "optimizer/var.h"
  #include "rewrite/rewriteDefine.h"
  #include "storage/fd.h"
+ #include "storage/bufmgr.h"
  #include "storage/lmgr.h"
  #include "storage/smgr.h"
  #include "utils/array.h"
*************** ScanPgRelation(Oid targetRelId, bool ind
*** 310,315 ****
--- 312,379 ----
  }

  /*
+  *        ScanPgRelationDetailed
+  *
+  *        Try to figure out why we failed to locate row for relation.
+  */
+ static HeapTuple
+ ScanPgRelationDetailed(Oid targetRelId)
+ {
+     HeapTuple    pg_class_tuple;
+     Relation    pg_class_desc;
+     SysScanDesc pg_class_scan;
+     ScanKeyData key[1];
+
+     /*
+      * form a scan key
+      */
+     ScanKeyInit(&key[0],
+                 ObjectIdAttributeNumber,
+                 BTEqualStrategyNumber, F_OIDEQ,
+                 ObjectIdGetDatum(targetRelId));
+
+     /*
+      * Open pg_class and fetch tuples, forcing heap scan and disabling
+      * visibility checks.
+      */
+     pg_class_desc = heap_open(RelationRelationId, AccessShareLock);
+     pg_class_scan = systable_beginscan(pg_class_desc, ClassOidIndexId,
+                                        false,
+                                        SnapshotAny,
+                                        1, key);
+
+     while (HeapTupleIsValid((pg_class_tuple = systable_getnext(pg_class_scan))))
+     {
+         Buffer        buf = pg_class_scan->scan->rs_cbuf;
+         bool        valid;
+
+         /* need buffer lock to call HeapTupleSatisfiesVisibility */
+         LockBuffer(buf, BUFFER_LOCK_SHARE);
+         valid = HeapTupleSatisfiesVisibility(pg_class_tuple,
+                                              SnapshotNow,
+                                              buf);
+         LockBuffer(buf, BUFFER_LOCK_UNLOCK);
+
+         elog(LOG, "searching %u for pg_class tuple for index %u: found ctid (%u,%u), xmin %u, xmax %u, flags 0x%4x
0x%4x,valid %d", 
+              pg_class_desc->rd_node.relNode,
+              targetRelId,
+              ItemPointerGetBlockNumber(&(pg_class_tuple->t_self)),
+              ItemPointerGetOffsetNumber(&(pg_class_tuple->t_self)),
+              HeapTupleHeaderGetXmin(pg_class_tuple->t_data),
+              HeapTupleHeaderGetXmax(pg_class_tuple->t_data),
+              pg_class_tuple->t_data->t_infomask,
+              pg_class_tuple->t_data->t_infomask2,
+              valid);
+     }
+
+     /* all done */
+     systable_endscan(pg_class_scan);
+     heap_close(pg_class_desc, AccessShareLock);
+
+     return NULL;
+ }
+
+ /*
   *        AllocateRelationDesc
   *
   *        This is used to allocate memory for a new relation descriptor
*************** RelationReloadIndexInfo(Relation relatio
*** 1737,1744 ****
      indexOK = (RelationGetRelid(relation) != ClassOidIndexId);
      pg_class_tuple = ScanPgRelation(RelationGetRelid(relation), indexOK);
      if (!HeapTupleIsValid(pg_class_tuple))
!         elog(ERROR, "could not find pg_class tuple for index %u",
!              RelationGetRelid(relation));
      relp = (Form_pg_class) GETSTRUCT(pg_class_tuple);
      memcpy(relation->rd_rel, relp, CLASS_TUPLE_SIZE);
      /* Reload reloptions in case they changed */
--- 1801,1815 ----
      indexOK = (RelationGetRelid(relation) != ClassOidIndexId);
      pg_class_tuple = ScanPgRelation(RelationGetRelid(relation), indexOK);
      if (!HeapTupleIsValid(pg_class_tuple))
!     {
!         pg_class_tuple = ScanPgRelationDetailed(RelationGetRelid(relation));
!         if (!HeapTupleIsValid(pg_class_tuple))
!             elog(ERROR, "could not find pg_class tuple for index %u",
!                  RelationGetRelid(relation));
!         else
!             elog(LOG, "could not find pg_class tuple for index %u, but succeeded on second try",
!                  RelationGetRelid(relation));
!     }
      relp = (Form_pg_class) GETSTRUCT(pg_class_tuple);
      memcpy(relation->rd_rel, relp, CLASS_TUPLE_SIZE);
      /* Reload reloptions in case they changed */

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Simon Riggs
Дата:
Сообщение: Re: One-Shot Plans
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Compressing the AFTER TRIGGER queue