Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Apr 1, 2015 at 7:05 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I've not fully tracked it down, but I think that the blame falls on the
>> MVCC-snapshots-for-catalog-scans patch; it appears that it's trying to
>> read pg_am's pg_class entry with a snapshot that's too old, possibly
>> because it assumes that sinval signaling is alive which I think ain't so.
> Hmm, sorry for the bug. It looks to me like sinval signaling gets
> started up at the beginning of InitPostgres(). Perhaps something like
> this?
Yeah, it does, so you'd think this would work. I've now tracked down
exactly what's happening, and it's this: while we're reading pg_authid
during PerformAuthentication, we establish a catalog snapshot. Later on,
we read pg_database to find out what DB we're connecting to. If any
sinval messages for unshared system catalogs arrive at that point, they'll
effectively be ignored, *because our MyDatabaseId is still zero and so
doesn't match the incoming message*. That's okay as far as the catcaches
are concerned, cause they're empty, and the relcache only knows about
shared catalogs so far. But InvalidateCatalogSnapshot doesn't get called
by LocalExecuteInvalidationMessage, and so we think we can plow ahead with
the old snapshot.
It looks to me like an appropriate fix would be as attached; thoughts?
(I've tested this with a delay during relation lock acquisition and
concurrent whole-database VACUUM FULLs, and so far been unable to break
it.)
regards, tom lane
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 1e646a1..6aa6868 100644
*** a/src/backend/utils/init/postinit.c
--- b/src/backend/utils/init/postinit.c
*************** InitPostgres(const char *in_dbname, Oid
*** 853,858 ****
--- 853,866 ----
MyProc->databaseId = MyDatabaseId;
/*
+ * We may have already established a catalog snapshot while trying to read
+ * pg_authid; but until we have set up MyDatabaseId, we won't react to
+ * incoming sinval messages properly, so we won't realize it if the
+ * snapshot has been invalidated. Best to assume it's no good anymore.
+ */
+ InvalidateCatalogSnapshot();
+
+ /*
* Now, take a writer's lock on the database we are trying to connect to.
* If there is a concurrently running DROP DATABASE on that database, this
* will block us until it finishes (and has committed its update of