On Thu, Aug 04, 2011 at 04:16:08PM -0400, Tom Lane wrote:
> daveg <daveg@sonic.net> writes:
> > We are seeing 'cannot read' and 'cannot open' errors too that would be
> > consistent with trying to use a vanished file.
>
> Yeah, these all seem consistent with the idea that the failing backend
> somehow missed an update for the relation mapping file. You would get
> the "could not find pg_class tuple" syndrome if the process was holding
> an open file descriptor for the now-deleted file, and otherwise cannot
> open/cannot read type errors. And unless it later received another
> sinval message for the relation mapping file, the errors would persist.
>
> If this theory is correct then all of the file-related errors ought to
> match up to recently-vacuumed mapped catalogs or indexes (those are the
> ones with relfilenode = 0 in pg_class). Do you want to expand your
> logging of the VACUUM FULL actions and see if you can confirm that idea?

At your service. What would you like to see?

> Since the machine is running RHEL, I think we can use glibc's
> backtrace() function to get simple stack traces without too much effort.
> I'll write and test a patch and send it along in a bit.

Great.

Is there any point in trying to capture SI events somehow?

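
For reference, the glibc backtrace() idea quoted above could look roughly like the sketch below. This is only an illustration and not the patch Tom has in mind: the helper name dump_stack_trace, the MAX_FRAMES limit, and the header text are all made up here. It uses only backtrace() and backtrace_symbols_fd() from <execinfo.h>, the latter because it writes straight to a file descriptor without calling malloc().

```c
#include <execinfo.h>   /* backtrace(), backtrace_symbols_fd(): glibc-specific */
#include <unistd.h>     /* write(), STDERR_FILENO */

#define MAX_FRAMES 32   /* arbitrary depth limit chosen for this sketch */

/*
 * Hypothetical helper: write the current call stack to the file
 * descriptor fd and return the number of frames captured.
 */
static int
dump_stack_trace(int fd)
{
    void       *frames[MAX_FRAMES];
    int         nframes;
    static const char header[] = "stack trace:\n";

    /* Collect up to MAX_FRAMES return addresses from the current stack. */
    nframes = backtrace(frames, MAX_FRAMES);

    /* If the descriptor is not writable, there is nothing more to do. */
    if (write(fd, header, sizeof(header) - 1) < 0)
        return nframes;

    /* Emit one line per frame directly to fd, without allocating memory. */
    backtrace_symbols_fd(frames, nframes, fd);

    return nframes;
}
```

Calling dump_stack_trace(STDERR_FILENO) from the error path would put the trace in whatever stderr is redirected to. Function names only appear for symbols visible in the dynamic symbol table, so the binary generally needs to be linked with -rdynamic; otherwise the output is mostly raw addresses.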
-dg
--
David Gould daveg@sonic.net 510 536 1443 510 282 0869
If simplicity worked, the world would be overrun with insects.