Обсуждение: Ye olde "relation doesn't quite exist" problem
I just gave an incorrect answer on pgsql-interfaces concerning the following bug, which has been around for quite a while: regression=> create table bug1 (f1 int28 primary key); NOTICE: CREATE TABLE/PRIMARY KEY will create implicit index 'bug1_pkey' for table 'bug1' ERROR: Can't find a default operator class for type 22. -- That's fine, since we have no index support for int28. But: regression=> create table bug1 (f1 int28); ERROR: Relation 'bug1' already exists -- Oops. regression=> drop table bug1; ERROR: Relation 'bug1' does not exist -- Double oops. I'm pretty sure I recall a discussion to the effect that CREATE TABLE was failing in this case because pgsql/data/base/dbname/bug1 had already been created and wasn't deleted at transaction abort. That may have been the case in older versions of Postgres, but we seem to have fixed that problem: with current sources the database file *is* removed at transaction abort. Unfortunately the bug still persists :-(. Some quick tracing indicates that the reason the second CREATE TABLE fails is that there's still an entry for bug1 in the system cache: the search in RelnameFindRelid(), tuple = SearchSysCacheTuple(RELNAME, PointerGetDatum(relname), 0, 0, 0); is finding an entry! (If you quit the backend and start a new one, things go back to normal, since the new backend's relcache doesn't have the bogus entry.) So, apparently the real problem is that the relname cache is not cleaned of bogus entries inserted during a failed transaction. This strikes me as a fairly serious problem, especially if it applies to all the SysCache tables. That could lead to all kinds of erroneous behavior after an aborted transaction. I think this is a "must fix" issue... regards, tom lane
> I'm pretty sure I recall a discussion to the effect that CREATE TABLE > was failing in this case because pgsql/data/base/dbname/bug1 had already > been created and wasn't deleted at transaction abort. That may have > been the case in older versions of Postgres, but we seem to have fixed > that problem: with current sources the database file *is* removed at > transaction abort. Unfortunately the bug still persists :-(. > > Some quick tracing indicates that the reason the second CREATE TABLE > fails is that there's still an entry for bug1 in the system cache: the > search in RelnameFindRelid(), > tuple = SearchSysCacheTuple(RELNAME, > PointerGetDatum(relname), > 0, 0, 0); > is finding an entry! (If you quit the backend and start a new one, > things go back to normal, since the new backend's relcache doesn't > have the bogus entry.) > > So, apparently the real problem is that the relname cache is not cleaned > of bogus entries inserted during a failed transaction. This strikes me > as a fairly serious problem, especially if it applies to all the > SysCache tables. That could lead to all kinds of erroneous behavior > after an aborted transaction. I think this is a "must fix" issue... OK, let me give two ideas here. First, we could create a linked list of all cache additions that happen inside a transaction. If the transaction aborts, we can invalidate all the cache entries in the list. Second, we could just invalidate the entire cache on a transaction abort. Third, we could somehow invalidate the cache on transaction abort "only" if there was some system table modification in the transaction. The third one seems a little harder. Because the linked list could get large, we could do a linked list, and if it gets too large, do an full invalidation. Also, there may be a way to spin through the cache and remove all entries marked as part of the aborted transaction. Seems like this not something for 6.5. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian <maillist@candle.pha.pa.us> writes: > OK, let me give two ideas here. First, we could create a linked list of > all cache additions that happen inside a transaction. If the > transaction aborts, we can invalidate all the cache entries in the list. > Second, we could just invalidate the entire cache on a transaction > abort. Third, we could somehow invalidate the cache on transaction > abort "only" if there was some system table modification in the > transaction. The third one seems a little harder. Yes, the second one was the quick-and-dirty answer that occurred to me. That would favor apps that seldom incur errors (no extra overhead to keep track of cache changes), but would be bad news for those that often incur errors (unnecessary cache reloads). Is there room in the SysCaches for the transaction ID of the last transaction to modify each entry? That would provide an easy and inexpensive way of finding the ones to zap when the current xact is aborted, I would think: abort would just scan all the caches looking for entries with the current xact ID, and invalidate only those entries. The cost in the no-error case would just be storing an additional field whenever an entry is modified; seems cheap enough. However, if there are a lot of different places in the code that can create/ modify a cache entry, this could be a fair amount of work (and it'd carry the risk of missing some places...). > Seems like this not something for 6.5. I think we really ought to do *something*. I'd settle for the brute-force blow-away-all-the-caches answer for now, though. regards, tom lane
> Is there room in the SysCaches for the transaction ID of the last > transaction to modify each entry? That would provide an easy and > inexpensive way of finding the ones to zap when the current xact is > aborted, I would think: abort would just scan all the caches looking > for entries with the current xact ID, and invalidate only those entries. > The cost in the no-error case would just be storing an additional > field whenever an entry is modified; seems cheap enough. However, > if there are a lot of different places in the code that can create/ > modify a cache entry, this could be a fair amount of work (and it'd > carry the risk of missing some places...). Yes, I think we could put it in, though it may have to sequential scan to remove the entries. > > > Seems like this not something for 6.5. > > I think we really ought to do *something*. I'd settle for the > brute-force blow-away-all-the-caches answer for now, though. OK. I wonder if there are any problems with that. I do that in heap.c: /* * This is heavy-handed, but appears necessary bjm 1999/02/01 * SystemCacheRelationFlushed(relid)is not enough either. */ RelationForgetRelation(relid); ResetSystemCache(); as part of a temp table creation to remove any non-temp table entry in the cache. I could not find another way, and because the temp table creation doesn't cause problems, this could probably be used in transaction abort too. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
I have dealt with this bug: > test=> create table bug1 (f1 int28 primary key); > ERROR: Can't find a default operator class for type 22. > -- That's expected, since we have no index support for int28. But now: > test=> create table bug1 (f1 int28); > ERROR: Relation 'bug1' already exists It is not real clear to me why the existing syscache invalidation mechanism (CatalogCacheIdInvalidate() etc) fails to handle this case, but it doesn't. Perhaps it is because the underlying pg_class tuple never actually makes it to "confirmed good" status, so the SI code figures it can ignore it. I think the correct place to handle the problem is in SystemCacheRelationFlushed() in catcache.c. That routine is called by RelationFlushRelation() (which does the same task for the relcache). Unfortunately, it was only handling one aspect of the cache-update problem: it was cleaning out the cache associated with a system table when the *system table's* relcache entry was flushed. It didn't scan the cache contents to see if any of the records are associated with a non-system table that's being flushed. For the moment, I have made it call ResetSystemCache() --- that is, just flush *all* the cache entries. Scanning the individual entries to find the ones referencing the given relID would require knowing exactly which column to look in for each kind of system cache, which is more knowledge than catcache.c actually has. Eventually we could improve it. This means it is no longer necessary for heap.c or index.c to call ResetSystemCache() when handling a temp table --- their calls to RelationForgetRelation are sufficient. I have applied those changes as well. regards, tom lane
Tom Lane wrote: > > I think the correct place to handle the problem is in > SystemCacheRelationFlushed() in catcache.c. That routine is called by > RelationFlushRelation() (which does the same task for the relcache). > Unfortunately, it was only handling one aspect of the cache-update > problem: it was cleaning out the cache associated with a system table > when the *system table's* relcache entry was flushed. It didn't scan > the cache contents to see if any of the records are associated with a > non-system table that's being flushed. > > For the moment, I have made it call ResetSystemCache() --- that is, just > flush *all* the cache entries. Scanning the individual entries to find ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ Isn't is tooooo bad for performance ?! > the ones referencing the given relID would require knowing exactly which > column to look in for each kind of system cache, which is more knowledge > than catcache.c actually has. Eventually we could improve it. > > This means it is no longer necessary for heap.c or index.c to call > ResetSystemCache() when handling a temp table --- their calls to > RelationForgetRelation are sufficient. I have applied those changes > as well. Vadim
> For the moment, I have made it call ResetSystemCache() --- that is, just > flush *all* the cache entries. Scanning the individual entries to find > the ones referencing the given relID would require knowing exactly which > column to look in for each kind of system cache, which is more knowledge > than catcache.c actually has. Eventually we could improve it. > > This means it is no longer necessary for heap.c or index.c to call > ResetSystemCache() when handling a temp table --- their calls to > RelationForgetRelation are sufficient. I have applied those changes > as well. Thanks. I am a little confused. I thought you just flushed only on elog()/abort. How does the new code work. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Bruce Momjian wrote: > > > For the moment, I have made it call ResetSystemCache() --- that is, just > > flush *all* the cache entries. Scanning the individual entries to find > > the ones referencing the given relID would require knowing exactly which > > column to look in for each kind of system cache, which is more knowledge > > than catcache.c actually has. Eventually we could improve it. > > > > This means it is no longer necessary for heap.c or index.c to call > > ResetSystemCache() when handling a temp table --- their calls to > > RelationForgetRelation are sufficient. I have applied those changes > > as well. > > Thanks. I am a little confused. I thought you just flushed only on ^^^^^^^^^^^^^^^^^^^^ > elog()/abort. How does the new code work. ^^^^^^^^^^^^ It seems as more right thing to do. Vadim
Vadim Mikheev <vadim@krs.ru> writes: > Bruce Momjian wrote: >> Thanks. I am a little confused. I thought you just flushed only on > ^^^^^^^^^^^^^^^^^^^^ >> elog()/abort. How does the new code work. > ^^^^^^^^^^^^ > It seems as more right thing to do. What I just committed does the cache flush whenever RelationFlushRelation is called --- in particular, elog/abort will cause it to happen if there are any created-in-current-transaction relations to be disposed of. But otherwise, no flush. The obvious question about that is "what about modifications to cacheable tuples that are not triggered by a relation creation?" I think that those cases are OK because they are covered by the shared-invalidation code. At least, we have no bug reports to prove the contrary... regards, tom lane
Vadim Mikheev <vadim@krs.ru> writes: > Tom Lane wrote: >> For the moment, I have made it call ResetSystemCache() --- that is, just >> flush *all* the cache entries. Scanning the individual entries to find > ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ > Isn't is tooooo bad for performance ?! It's not ideal, by any means. But I don't know how to fix it better right now, and we've got a release to ship ... regards, tom lane
Added to TODO: * elog() flushes cache, try invalidating just entries from current xact > Bruce Momjian <maillist@candle.pha.pa.us> writes: > > OK, let me give two ideas here. First, we could create a linked list of > > all cache additions that happen inside a transaction. If the > > transaction aborts, we can invalidate all the cache entries in the list. > > Second, we could just invalidate the entire cache on a transaction > > abort. Third, we could somehow invalidate the cache on transaction > > abort "only" if there was some system table modification in the > > transaction. The third one seems a little harder. > > Yes, the second one was the quick-and-dirty answer that occurred to me. > That would favor apps that seldom incur errors (no extra overhead to > keep track of cache changes), but would be bad news for those that > often incur errors (unnecessary cache reloads). > > Is there room in the SysCaches for the transaction ID of the last > transaction to modify each entry? That would provide an easy and > inexpensive way of finding the ones to zap when the current xact is > aborted, I would think: abort would just scan all the caches looking > for entries with the current xact ID, and invalidate only those entries. > The cost in the no-error case would just be storing an additional > field whenever an entry is modified; seems cheap enough. However, > if there are a lot of different places in the code that can create/ > modify a cache entry, this could be a fair amount of work (and it'd > carry the risk of missing some places...). > > > Seems like this not something for 6.5. > > I think we really ought to do *something*. I'd settle for the > brute-force blow-away-all-the-caches answer for now, though. > > regards, tom lane > > -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026