> Yeah, this is proof that what it was doing is the same as what we saw in
> Jeff's backtrace, i.e. loading up the system catalog relcache entries the
> hard way via seqscans on the core catalogs. So the question to be
> answered is why that's suddenly a big performance bottleneck. It's not
> a cheap operation of course (that's why we cache the results ;-)) but
> it shouldn't take minutes either. And, because they are seqscans, it
> doesn't seem like messed-up indexes should matter.

FWIW, this appeared to be an all-or-nothing event: either every new backend
was suffering through this, or none were. They all seemed to clear up
at the same time as well.

> The theory I have in mind about Jeff's case is that it was basically an
> I/O storm, but it's not clear whether the same explanation works for
> your case. There may be some other contributing factor that we haven't
> identified yet.

Let me know if you think of anything in particular I can test the next
time it happens. I'll also try to arrange a (NetApp) snapshot when it
does (this system is too busy and too large to do anything else).

--
Greg Sabino Mullane greg@endpoint.com
End Point Corporation
PGP Key: 0x14964AC8