Обсуждение: Useless code in RelationCacheInitializePhase3
While looking at the pending patch to clean up management of rd_partcheck, I noticed that RelationCacheInitializePhase3 has code that purports to reload rd_partkey and rd_partdesc, but none for rd_partcheck. However, that reload code is dead code, as is easily confirmed by checking the code coverage report, because we have no partitioned system catalogs. Moreover, if somebody tried to add such a catalog, I'd bet a good deal of money that this code would not work. It seems highly unlikely that we could run RelationBuildPartitionKey or RelationBuildPartitionDesc successfully when we haven't even finished bootstrapping the relcache. I don't think that this foolishness is entirely the fault of the partitioning work; it's evidently modeled on the adjacent code to reload rules, triggers, and row security code. But that code is all equally dead, equally unlikely to work if somebody tried to invoke it, and equally likely to be forever unused because there are many other problems you'd have to surmount to support something like triggers or row security on system catalogs. I'm inclined to remove almost everything below the comment "Fix data that isn't saved in relcache cache file", and replace it with either assertions or test-and-elogs that say that a relation appearing in the cache file can't have triggers or rules or row security or be partitioned. I am less sure about whether the table-access-method stanza is as silly as the rest, but I do see that it's unreached in current testing. So I wonder whether there is any thought that we'd realistically support catalogs with nondefault AMs, and if there is, does anyone think that this code would work? regards, tom lane
Hi, On 2019-04-12 14:17:11 -0400, Tom Lane wrote: > While looking at the pending patch to clean up management of > rd_partcheck, I noticed that RelationCacheInitializePhase3 has code that > purports to reload rd_partkey and rd_partdesc, but none for rd_partcheck. > However, that reload code is dead code, as is easily confirmed by > checking the code coverage report, because we have no partitioned system > catalogs. > > Moreover, if somebody tried to add such a catalog, I'd bet a good deal > of money that this code would not work. It seems highly unlikely that > we could run RelationBuildPartitionKey or RelationBuildPartitionDesc > successfully when we haven't even finished bootstrapping the relcache. But it sure would be nice if we made it work at some point. Having e.g. global, permanent + unlogged, and temporary tables attributes in a separate pg_attribute would be quite an advantage (and much easier than a separate pg_class). Obviously even that is *far* from trivial. > I don't think that this foolishness is entirely the fault of the > partitioning work; it's evidently modeled on the adjacent code to reload > rules, triggers, and row security code. But that code is all equally > dead, equally unlikely to work if somebody tried to invoke it, and > equally likely to be forever unused because there are many other > problems you'd have to surmount to support something like triggers or > row security on system catalogs. I don't see us wanting to go to supporting triggers, but I could see us desiring RLS at some point. To hide rows a user doesn't have access to. > I am less sure about whether the table-access-method stanza is as silly > as the rest, but I do see that it's unreached in current testing. > So I wonder whether there is any thought that we'd realistically support > catalogs with nondefault AMs, and if there is, does anyone think that > this code would work? Right now it definitely won't work, most importantly because there's a fair bit of catalog related code that triggers direct heap_insert/update/delete, and expects systable_getnext() to not need memory to allocate the result in the current context (hence the !shouldFree assert) and just generally because a lot of places just straight up assume the catalog is heap. Most of that would be fairly easy to fix however. A lot of rote work, but technically not hard. The hardest is probably a bunch of code that uses xmin for cache validation and such, but that seems solvable. I don't quite know however how we'd use the ability to technically be able to have a different AM for catalog tables. One possible thing would be using different builtin AMs for different catalog tables, that seems like it'd not be too hard. But after that it gets harder - e.g. doing an initdb with a different default AM sounds not impossible, but also far from easy (we can't do pg_proc lookups before having initialized it, which is why formrdesc hardcodes GetHeapamTableAmRoutine()). And having different AMs per database seems even harder. I think it probably would work for catalog tables, as it's coded right now. There's no catalog lookups RelationInitTableAccessMethod() for tables that return true for IsCatalogTable(). In fact, I think we should apply something like: diff --git a/src/backend/utils/cache/relcache.c b/src/backend/utils/cache/relcache.c index 64f3c2e8870..7ff64b108c4 100644 --- a/src/backend/utils/cache/relcache.c +++ b/src/backend/utils/cache/relcache.c @@ -1746,6 +1746,7 @@ RelationInitTableAccessMethod(Relation relation) * seem prudent to show that in the catalog. So just overwrite it * here. */ + Assert(relation->rd_rel->relam == InvalidOid); relation->rd_amhandler = HEAP_TABLE_AM_HANDLER_OID; } else if (IsCatalogRelation(relation)) @@ -1935,8 +1936,7 @@ formrdesc(const char *relationName, Oid relationReltype, /* * initialize the table am handler */ - relation->rd_rel->relam = HEAP_TABLE_AM_OID; - relation->rd_tableam = GetHeapamTableAmRoutine(); + RelationInitTableAccessMethod(relation); /* * initialize the rel-has-index flag, using hardwired knowledge To a) ensure that that is and stays the case b) avoid having the necessary information in multiple places. Not sure why we not ended up doing the thing in the second hunk earlier. Just using RelationInitTableAccessMethod() seems cleaner to me. Greetings, Andres Freund
Andres Freund <andres@anarazel.de> writes: > On 2019-04-12 14:17:11 -0400, Tom Lane wrote: >> While looking at the pending patch to clean up management of >> rd_partcheck, I noticed that RelationCacheInitializePhase3 has code that >> purports to reload rd_partkey and rd_partdesc, but none for rd_partcheck. >> However, that reload code is dead code, as is easily confirmed by >> checking the code coverage report, because we have no partitioned system >> catalogs. >> >> Moreover, if somebody tried to add such a catalog, I'd bet a good deal >> of money that this code would not work. It seems highly unlikely that >> we could run RelationBuildPartitionKey or RelationBuildPartitionDesc >> successfully when we haven't even finished bootstrapping the relcache. > But it sure would be nice if we made it work at some point. Whether it would be nice or not is irrelevant to my point: this code doesn't work, and it's unlikely that it would ever be part of a working solution. I don't think there's any way that it'd be sane to attempt catalog accesses during RelationCacheInitializePhase3. If we want any of these features for system catalogs, I think the route to a real fix would be to make them load-on-demand data so that they can be fetched later on. Or, possibly, the easiest way is to include these data structures in the dumped cache file. But what's here is a dead end. I'd even call it an attractive nuisance, because it encourages people to add yet more nonfunctional code, rather than pointing them in the direction of doing something useful. >> I am less sure about whether the table-access-method stanza is as silly >> as the rest, but I do see that it's unreached in current testing. >> So I wonder whether there is any thought that we'd realistically support >> catalogs with nondefault AMs, and if there is, does anyone think that >> this code would work? > Right now it definitely won't work, Sure, I wasn't expecting that. The question is the same as above: is it plausible that this code would appear in this form in a complete working implementation? If not, I think we should rip it out rather than leave the impression that we think it does something useful. > I think it probably would work for catalog tables, as it's coded right > now. There's no catalog lookups RelationInitTableAccessMethod() for > tables that return true for IsCatalogTable(). In fact, I think we should > apply something like: Makes sense, and I'd also add some comments pointing out that there had better not be any catalog lookups when this is called for a system catalog. regards, tom lane
I wrote: > Whether it would be nice or not is irrelevant to my point: this code > doesn't work, and it's unlikely that it would ever be part of a working > solution. I don't think there's any way that it'd be sane to attempt > catalog accesses during RelationCacheInitializePhase3. BTW, to clarify that: obviously, this loop *does* access pg_class, and pg_class's indexes too. The issue here is that if any of these other stanzas ever really executed, we would be doing accesses to a bunch of other catalogs as well, meaning that their relcache entries would have to already exist in a state valid enough to permit access. That would mean that they'd have to be treated as bootstrap catalogs so that we could create hardwired entries with formrdesc. That's not a direction I want to go in. Bootstrap catalogs are a huge pain to maintain; we don't want any more than the absolute minimum of them. regards, tom lane
Hello, On 2019-Apr-13, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > I think it probably would work for catalog tables, as it's coded right > > now. There's no catalog lookups RelationInitTableAccessMethod() for > > tables that return true for IsCatalogTable(). In fact, I think we should > > apply something like: > > Makes sense, and I'd also add some comments pointing out that there had > better not be any catalog lookups when this is called for a system > catalog. I think this was forgotten ... -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services