Discussion: md.c is feeling much better now, thank you
Hiroshi spotted the fundamental problem we were having: RelationFlushRelation would happily flush a relation-cache entry that still had an open file entry at the md.c and fd.c levels. This resulted in a permanent leak of md and vfd file descriptors, which was most easily observable as a leakage of kernel file descriptors (though fd.c would eventually recycle same). smgrclose() in RelationFlushRelation fixes it.

While I was poking at this I found a number of other problems in md.c having to do with multiple-segment relations. I believe they're all fixed now. I have been able to run initdb and the regression tests with a 64Kb segment size, which never worked before.

I stuck my neck out to the extent of committing these changes into 6.5.* as well as current. I'd recommend a few more days of beta-testing before we release 6.5.2 ;-). Marc, can you make a new 6.5.2 candidate tarball?

regards, tom lane
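For readers following the multiple-segment discussion: md.c stores a large relation as a chain of fixed-size segment files, so truncation and descriptor handling have to agree on which segment file holds which block. A minimal sketch of that arithmetic is below; the constant and helper names are illustrative only, not the actual md.c symbols (md.c derives the real value from RELSEG_SIZE, and Tom's stress test shrank it to 64Kb).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Illustrative constant: blocks per segment file (not the real RELSEG_SIZE). */
#define BLOCKS_PER_SEG 8192

/* Which segment file holds a given block, and where inside that segment. */
static int seg_no(unsigned blkno)     { return blkno / BLOCKS_PER_SEG; }
static int seg_offset(unsigned blkno) { return blkno % BLOCKS_PER_SEG; }

/* Segment file naming convention: "rel" for segment 0, "rel.N" afterwards. */
static void seg_path(char *buf, size_t len, const char *rel, int segno)
{
    if (segno == 0)
        snprintf(buf, len, "%s", rel);
    else
        snprintf(buf, len, "%s.%d", rel, segno);
}
```

With this mapping, truncating a relation means shortening the last surviving segment and dealing with (unlinking or zeroing) every segment past it, which is exactly the part of mdtruncate() debated below.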
> -----Original Message-----
> From: owner-pgsql-hackers@postgreSQL.org
> [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Tom Lane
> Sent: Thursday, September 02, 1999 1:36 PM
> To: pgsql-hackers@postgreSQL.org
> Subject: [HACKERS] md.c is feeling much better now, thank you
>
> Hiroshi spotted the fundamental problem we were having:
> RelationFlushRelation would happily flush a relation-cache
> entry that still had an open file entry at the md.c and fd.c
> levels. This resulted in a permanent leak of md and vfd
> file descriptors, which was most easily observable as a leakage
> of kernel file descriptors (though fd.c would eventually
> recycle same). smgrclose() in RelationFlushRelation fixes it.

Thanks. But I'm unhappy with your change for mdtruncate(). It's still dangerous to unlink unwanted segments in mdtruncate().

StartTransaction() and CommandCounterIncrement() trigger relation cache invalidation. Unfortunately those are insufficient to prevent backends from inserting into invalid relations.

For example: if a backend is blocked by vacuum, it would insert into the target relation without relation cache invalidation after vacuum.

It seems that other triggers are necessary around LockRelation().

Regards,

Hiroshi Inoue
Inoue@tpf.co.jp
I think that, based on this, the changes should be backed out of v6.5.2 until further testing and analysis can be done. If we have to, we can do a v6.5.3 at a later date, if you want to get this in then...

On Thu, 2 Sep 1999, Hiroshi Inoue wrote:

> > -----Original Message-----
> > From: owner-pgsql-hackers@postgreSQL.org
> > [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Tom Lane
> > Sent: Thursday, September 02, 1999 1:36 PM
> > To: pgsql-hackers@postgreSQL.org
> > Subject: [HACKERS] md.c is feeling much better now, thank you
> >
> > Hiroshi spotted the fundamental problem we were having:
> > RelationFlushRelation would happily flush a relation-cache
> > entry that still had an open file entry at the md.c and fd.c
> > levels. This resulted in a permanent leak of md and vfd
> > file descriptors, which was most easily observable as a leakage
> > of kernel file descriptors (though fd.c would eventually
> > recycle same). smgrclose() in RelationFlushRelation fixes it.
>
> Thanks.
>
> But I'm unhappy with your change for mdtruncate().
> It's still dangerous to unlink unwanted segments in mdtruncate().
>
> StartTransaction() and CommandCounterIncrement() trigger
> relation cache invalidation. Unfortunately those are insufficient
> to prevent backends from inserting into invalid relations.
>
> For example: if a backend is blocked by vacuum, it would insert
> into the target relation without relation cache invalidation
> after vacuum.
>
> It seems that other triggers are necessary around LockRelation().
>
> Regards,
>
> Hiroshi Inoue
> Inoue@tpf.co.jp
>
> ************

Marc G. Fournier                  ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org          secondary: scrappy@{freebsd|postgresql}.org
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > StartTransaction() and CommandCounterIncrement() trigger > relation cache invalidation. Unfortunately those are insufficient > to prevent backends from inserting into invalid relations. > > If a backend is blocked by vacuum,it would insert into the target > relation without relation cache invalidation after vacuum. If that's true, then we have problems far worse than whether mdtruncate has tried to unlink the segment. What you are saying is that another backend can attempt to do an insert/update on a relation being vacuumed, and have gotten as far as deciding which block it's going to insert the tuple into before it gets blocked by vacuum's AccessExclusiveLock. If so, that is *broken* and it's not mdtruncate's fault. What happens if vacuum fills up the chosen block with moved tuples? I did indeed wonder whether relation cache inval will do the right thing when another backend is already waiting to access the relation being invalidated --- but if it does not, we have to fix the inval mechanism. mdtruncate is the least of our worries. Vadim, any comments here? regards, tom lane
I wrote:
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>> StartTransaction() and CommandCounterIncrement() trigger
>> relation cache invalidation. Unfortunately those are insufficient
>> to prevent backends from inserting into invalid relations.

> If that's true, then we have problems far worse than whether mdtruncate
> has tried to unlink the segment.

I poked at this a little bit and found that for the VACUUM case, RelationFlushRelation in the other backend (the one waiting to insert/update) occurs here:

#0  RelationFlushRelation (relationPtr=0x7b034824,
    onlyFlushReferenceCountZero=1 '\001') at relcache.c:1259
#1  0x158a60 in RelationIdInvalidateRelationCacheByRelationId (
    relationId=272146) at relcache.c:1368
#2  0x156ba0 in CacheIdInvalidate (cacheId=1259, hashIndex=272146,
    pointer=0x0) at inval.c:323
#3  0x11673c in SIReadEntryData (segP=0x80da1000, backendId=-2133183664,
    invalFunction=0x4000c692 <SSNAN+9066>,
    resetFunction=0x4000c69a <SSNAN+9074>) at sinvaladt.c:649
#4  0x115e6c in InvalidateSharedInvalid (invalFunction=0x80da1000,
    resetFunction=0x4000c69a <SSNAN+9074>) at sinval.c:164
#5  0x156e54 in DiscardInvalid () at inval.c:518
#6  0x94354 in AtStart_Cache () at xact.c:548
#7  0x94314 in CommandCounterIncrement () at xact.c:514
#8  0x121218 in pg_exec_query_dest (
    query_string=0x40079580 "insert into tenk1 values(19999,1234);",
    dest=Remote, aclOverride=0 '\000') at postgres.c:726

which looks good ... except that the CommandCounterIncrement() occurs *after* the insert has executed. So we've got a problem here.

In the DROP TABLE scenario, things seem to be broken independently of md.c.
I tried this:

BACKEND #1: begin; lock table tenk1;
BACKEND #2: insert into tenk1 values(29999,1234);
            -- backend #2 hangs waiting for lock
BACKEND #1: drop table tenk1; end;

Backend #2 now suffers an assert failure:

#6  0x15b8c4 in ExceptionalCondition (
    conditionName=0x28898 "!((((PageHeader) ((PageHeader) pageHeader))->pd_upper == 0))",
    exceptionP=0x40009a58, detail=0x0, fileName=0x7ae4 "\003",
    lineNumber=136) at assert.c:72
#7  0x7c470 in RelationPutHeapTupleAtEnd (relation=0x400e8a40,
    tuple=0x401127a0) at hio.c:136
#8  0x7aa48 in heap_insert (relation=0x400e8a40, tup=0x401127a0)
    at heapam.c:1086
#9  0xb87e4 in ExecAppend (slot=0x4010a078, tupleid=0x200,
    estate=0x40109e98) at execMain.c:1190
#10 0xb8630 in ExecutePlan (estate=0x40109e98, plan=0x40109860,
    operation=CMD_INSERT, offsetTuples=0, numberTuples=0,
    direction=ForwardScanDirection, destfunc=0x40112730) at execMain.c:1064
#11 0xb7b6c in ExecutorRun (queryDesc=0x40109e80, estate=0x40109e98,
    feature=3, limoffset=0x0, limcount=0x0) at execMain.c:329
#12 0x12294c in ProcessQueryDesc (queryDesc=0x40109e80, limoffset=0x0,
    limcount=0x0) at pquery.c:315
#13 0x1229f4 in ProcessQuery (parsetree=0x400e42d0, plan=0x40109860,
    dest=Local) at pquery.c:358
#14 0x1211dc in pg_exec_query_dest (
    query_string=0x40079580 "insert into tenk1 values(29999,1234);",
    dest=Remote, aclOverride=2 '\002') at postgres.c:710

which hardly looks like it can be blamed on md.c either.

My guess is that we ought to be checking for relcache invalidation immediately after gaining any lock on the relation. I don't know where that should be done, however.

Perhaps we also ought to make RelationFlushRelation do smgrclose() unconditionally, regardless of the reference-count test. If the relation is still in use, that should be OK --- md.c will reopen the files automatically on the next access.

BTW, it appears that DROP TABLE physically deletes the relation *immediately*, which means that aborting a transaction that contains a DROP TABLE does not work.
But we knew that, didn't we?

regards, tom lane
The Hermit Hacker <scrappy@hub.org> writes:
> I think that, based on this, the changes should be backed out of v6.5.2
> until further testing and analysis can be done.

If we can't find a solution to the inval-too-late problem pronto, what we can do is comment out the FileUnlink call in mdtruncate. I don't see a need to back out the other fixes in md.c.

But I think we ought to fix the underlying problem, not this symptom. What we now see is that after one backend has done something that requires invalidating a relcache entry, another backend is able to complete an entire query using the *old* relcache info before it notices the shared-inval signal. That's got to have bad consequences for more than just md.c.

regards, tom lane
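The ordering bug described above can be sketched with a toy model (all names are hypothetical, not PostgreSQL's): a backend that drains pending shared-invalidation messages only at the next command boundary runs its whole query against the stale cache entry, while draining immediately after lock acquisition sees the fresh one.

```c
#include <assert.h>

/* Toy model of the inval-too-late problem. One "cache entry", one counter
 * of unread sinval messages, and the shared truth published by others. */

typedef struct { int version; } CacheEntry;

static CacheEntry cache = { 1 };
static int pending_inval = 0;     /* sinval messages not yet read */
static int shared_version = 1;    /* truth, as other backends see it */

static void other_backend_alters(void) { shared_version++; pending_inval++; }

static void drain_inval(void)     /* stands in for DiscardInvalid() */
{
    if (pending_inval) { cache.version = shared_version; pending_inval = 0; }
}

/* Broken order: use the cache first, notice invalidations afterwards
 * (the CommandCounterIncrement-at-end-of-query behavior). */
static int run_query_inval_too_late(void)
{
    int seen = cache.version;     /* query executes with the stale entry */
    drain_inval();                /* too late: the query already ran */
    return seen;
}

/* Proposed order: drain right after acquiring the lock, before use. */
static int run_query_inval_after_lock(void)
{
    drain_inval();                /* check just after LockAcquire() */
    return cache.version;
}
```

The first function returns the old version even though an invalidation is already queued; the second always sees the current one, which is the essence of the LockRelation() fix discussed below.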
> BTW, it appears that DROP TABLE physically deletes the relation
> *immediately*, which means that aborting a transaction that contains
> a DROP TABLE does not work. But we knew that, didn't we?

Yes, on TODO.

--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup.    | Drexel Hill, Pennsylvania 19026
Tom Lane wrote:
>
> My guess is that we ought to be checking for relcache invalidation
> immediately after gaining any lock on the relation. I don't know where
> that should be done, however.

Seems as GOOD solution! We could do inval check in LockRelation() just after LockAcquire().

Vadim
Vadim Mikheev <vadim@krs.ru> writes:
> Tom Lane wrote:
>> My guess is that we ought to be checking for relcache invalidation
>> immediately after gaining any lock on the relation. I don't know where
>> that should be done, however.

> Seems as GOOD solution!
> We could do inval check in LockRelation() just after LockAcquire().

I tried inserting code like this in LockRelation:

--- 163,176 ----
  	tag.objId.blkno = InvalidBlockNumber;

  	LockAcquire(LockTableId, &tag, lockmode);
!
! 	/* Check to make sure the relcache entry hasn't been invalidated
! 	 * while we were waiting to lock it.
! 	 */
! 	DiscardInvalid();
! 	if (relation != RelationIdGetRelation(tag.relId))
! 		elog(ERROR, "LockRelation: relation %u deleted while waiting to lock it",
! 			 tag.relId);
  }

  /*

and moving the smgrclose() call in RelationFlushRelation so that it is called unconditionally.

Doesn't work though: the ALTER TABLE tests in regress/misc fail. Apparently, this change causes the sinval report from update of the relation's pg_class heap entry to be read while the relation has refcnt > 0, so RelationFlushRelation doesn't flush it, so we have an obsolete relation cache entry. Ooops.

Did you have a different approach in mind? Or do we need to redesign RelationFlushRelation so that it rebuilds the relcache entry, rather than dropping it, if refcnt > 0?

regards, tom lane
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Thursday, September 02, 1999 11:44 PM
> To: The Hermit Hacker
> Cc: Hiroshi Inoue; pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] md.c is feeling much better now, thank you
>
> The Hermit Hacker <scrappy@hub.org> writes:
> > I think that, based on this, the changes should be backed out of v6.5.2
> > until further testing and analysis can be done.
>
> If we can't find a solution to the inval-too-late problem pronto,
> what we can do is comment out the FileUnlink call in mdtruncate.
> I don't see a need to back out the other fixes in md.c.

I think so too.

Regards,

Hiroshi Inoue
Inoue@tpf.co.jp
> -----Original Message-----
> From: Bruce Momjian [mailto:maillist@candle.pha.pa.us]
> Sent: Friday, September 03, 1999 12:22 AM
> To: Tom Lane
> Cc: Hiroshi Inoue; pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] md.c is feeling much better now, thank you
>
> > BTW, it appears that DROP TABLE physically deletes the relation
> > *immediately*, which means that aborting a transaction that contains
> > a DROP TABLE does not work. But we knew that, didn't we?
>
> Yes, on TODO.

Hmm, data-definition commands are unrecoverable in many DBMSs. Is it necessary to allow PostgreSQL to execute data-definition commands inside transactions? If so, is it possible?

For example, should the following be possible?

[A table t exists.]
begin;
drop table t;
create table t (dt int4);
insert into t values (1);
drop table t;
abort;

Regards,

Hiroshi Inoue
Inoue@tpf.co.jp
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Friday, September 03, 1999 8:03 AM
> To: Vadim Mikheev
> Cc: Hiroshi Inoue; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] md.c is feeling much better now, thank you
>
> Vadim Mikheev <vadim@krs.ru> writes:
> > Tom Lane wrote:
> >> My guess is that we ought to be checking for relcache invalidation
> >> immediately after gaining any lock on the relation. I don't know where
> >> that should be done, however.
>
> > Seems as GOOD solution!
> > We could do inval check in LockRelation() just after LockAcquire().
>
> I tried inserting code like this in LockRelation:
>
> --- 163,176 ----
>   	tag.objId.blkno = InvalidBlockNumber;
>
>   	LockAcquire(LockTableId, &tag, lockmode);
> !
> ! 	/* Check to make sure the relcache entry hasn't been invalidated
> ! 	 * while we were waiting to lock it.
> ! 	 */
> ! 	DiscardInvalid();
> ! 	if (relation != RelationIdGetRelation(tag.relId))
> ! 		elog(ERROR, "LockRelation: relation %u deleted while waiting to lock it",
> ! 			 tag.relId);
>   }
>
> and moving the smgrclose() call in RelationFlushRelation so that it is
> called unconditionally.
>
> Doesn't work though: the ALTER TABLE tests in regress/misc fail.
> Apparently, this change causes the sinval report from update of the
> relation's pg_class heap entry to be read while the relation has refcnt > 0,
> so RelationFlushRelation doesn't flush it, so we have an obsolete
> relation cache entry. Ooops.

How about inserting "RelationDecrementReferenceCount(relation);" between LockAcquire() and DiscardInvalid()?

And isn't it preferable that LockRelation() returns the relation which RelationIdGetRelation(tag.relId) returns?

Regards,

Hiroshi Inoue
Inoue@tpf.co.jp
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: >> Doesn't work though: the ALTER TABLE tests in regress/misc fail. >> Apparently, this change causes the sinval report from update of the >> relation's pg_class heap entry to be read while the relation has refcnt>0, >> so RelationFlushRelation doesn't flush it, so we have an obsolete >> relation cache entry. Ooops. > How about inserting "RelationDecrementReferenceCount(relation);" > between LockAcquire() and DiscardInvalid() ? Would only help if the relation had been opened exactly once before the lock; not if its refcnt is greater than 1. Worse, it would only help for the particular relation being locked, but we might receive an sinval report for a different already-locked relation. > And isn't it preferable that LockRelation() returns the relation > which RelationIdGetRelation(tag.relId) returns ? No, because that would only inform the immediate caller of LockRelation of a change. This is insufficient for both the reasons mentioned above. For that matter, my first-cut patch is insufficient, because it won't detect the case that a relcache entry other than the one currently being locked has been flushed. I think what we need to do is revise RelationFlushRelation so that it (a) deletes the relcache entry if its refcnt is zero; otherwise (b) leaves the relcache entry in existence, but recomputes all its contents and subsidiary data structures, and (c) if, while trying to recompute the contents, it finds that the relation has actually been deleted, then it's elog(ERROR) time. In this way, existing pointers to the relcache entry --- which might belong to routines very far down the call stack --- remain valid, or else we elog() if they aren't. We might still have a few bugs with routines that copy data out of the relcache entry before locking it, but I don't recall having seen any code like that. Most of the code seems to do heap_open immediately followed by LockRelation, and that should be safe. 
I'd like to hear Vadim's opinion before wading in, but this seems like it ought to be workable. regards, tom lane
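Tom's three-case plan for RelationFlushRelation can be modeled in miniature. The sketch below is illustrative only (the struct and function names are invented, not relcache.c's): the key invariant is that when the entry is still referenced, it is rebuilt *at the same address*, so pointers held further down the call stack remain valid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Toy model of a reference-counted cache entry. */
typedef struct RelCacheEnt {
    int refcnt;
    int natts;        /* "contents" that may need recomputing */
} RelCacheEnt;

/* Recompute contents from the (simulated) catalogs; returns false if the
 * relation turns out to have been deleted (case (c): elog(ERROR) time). */
static bool rebuild_from_catalog(RelCacheEnt *ent, int catalog_natts)
{
    if (catalog_natts < 0)
        return false;
    ent->natts = catalog_natts;
    return true;
}

/* Returns the surviving entry (same pointer if rebuilt), NULL if freed. */
static RelCacheEnt *flush_relation(RelCacheEnt *ent, int catalog_natts)
{
    if (ent->refcnt == 0) {       /* (a) unreferenced: just drop it */
        free(ent);
        return NULL;
    }
    if (!rebuild_from_catalog(ent, catalog_natts))
        abort();                  /* stand-in for elog(ERROR) */
    return ent;                   /* (b) same address: old pointers OK */
}
```

The point of returning the same pointer in the refcnt > 0 case is exactly the "existing pointers remain valid" property Tom describes; dropping and re-allocating the entry would leave the callers down the stack holding dangling pointers.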
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > Hmm,Data Define commands are unrecoverable in many DBMS-s. True. > For example,should the following be possible ? > [A table t exists.] > begin; > drop table t; > create table t (dt int4); > insert into t values (1); > drop table t; > abort; I don't mind if that is rejected --- but it ought to be rejected cleanly, rather than leaving a broken table behind. IIRC, it is fairly easy to tell from the xact.c state whether we are inside a BEGIN block. Maybe DROP TABLE and anything else that has nonreversible side effects ought to simply elog(ERROR) if called inside a BEGIN block. We'd need to be a little careful though, since I think DROP TABLE on a temp table created in the same transaction ought to work. regards, tom lane
I have just committed changes that address the problem of relcache entries not being updated promptly after another backend issues a shared-invalidation report. LockRelation() now checks for sinval reports just after acquiring a lock, and the relcache entry will be updated if necessary --- or, if the relation has actually disappeared entirely, an elog(ERROR) will occur.

As a side effect of the relcache update, the underlying md.c/fd.c files will be closed, and then reopened if necessary. This should solve our concerns about vacuum.c not being able to truncate relations safely.

There is still some potential for misbehavior as a result of the fact that the parser looks at relcache entries without bothering to obtain any kind of lock on the relation. For example:

-- In backend #1:
regression=> create table z1 (f1 int4);
CREATE
regression=> select * from z1;
f1
--
(0 rows)

regression=> begin;
BEGIN

-- In backend #2:
regression=> alter table z1 add column f2 int4;
ADD
regression=>

-- In backend #1:
regression=> select * from z1;
f1
--
(0 rows)

-- parser uses un-updated relcache entry and sees only one column in z1.
-- However, the relcache *will* get updated as soon as we either lock a
-- table or do the CommandCounterIncrement() at end of query, so a second
-- try sees the new info:
regression=> select * from z1;
f1|f2
--+--
(0 rows)

regression=> end;
END

The window for problems is pretty small: you have to be within a transaction (otherwise the StartTransaction will notice the sinval report), and your very first query after the other backend does ALTER TABLE has to reference the altered table. So I'm not sure this is worth worrying about. But perhaps the parser ought to obtain the weakest possible lock on each table referenced in a query before it does any looking at the attributes of the table. Comments?

I believe these changes ought to be committed into REL6_5 as well, but it might be wise to test them a little more in current first.
Or would people find it easier to test them against 6.5 databases? In that case maybe I should just commit them now... regards, tom lane
> The window for problems is pretty small: you have to be within a
> transaction (otherwise the StartTransaction will notice the sinval
> report), and your very first query after the other backend does
> ALTER TABLE has to reference the altered table. So I'm not sure
> this is worth worrying about. But perhaps the parser ought to obtain
> the weakest possible lock on each table referenced in a query before
> it does any looking at the attributes of the table. Comments?

Good question. How do other db's handle such a case? I hesitate to do locking for parser lookups. Seems like more lock overhead.

> I believe these changes ought to be committed into REL6_5 as well,
> but it might be wise to test them a little more in current first.
> Or would people find it easier to test them against 6.5 databases?
> In that case maybe I should just commit them now...

Seems it should be 6.6 only. Too obscure a bug. Could introduce a bug.

--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup.    | Drexel Hill, Pennsylvania 19026
> I have just committed changes that address the problem of relcache
> entries not being updated promptly after another backend issues
> a shared-invalidation report. LockRelation() now checks for sinval
> reports just after acquiring a lock, and the relcache entry will be
> updated if necessary --- or, if the relation has actually disappeared
> entirely, an elog(ERROR) will occur.
>
> As a side effect of the relcache update, the underlying md.c/fd.c files
> will be closed, and then reopened if necessary. This should solve our
> concerns about vacuum.c not being able to truncate relations safely.
>
> There is still some potential for misbehavior as a result of the fact
> that the parser looks at relcache entries without bothering to obtain
> any kind of lock on the relation. For example:
>
> -- In backend #1:
> regression=> create table z1 (f1 int4);
> CREATE
> regression=> select * from z1;
> f1
> --
> (0 rows)
>
> regression=> begin;
> BEGIN
>
> -- In backend #2:
> regression=> alter table z1 add column f2 int4;
> ADD
> regression=>
>
> -- In backend #1:
> regression=> select * from z1;
> f1
> --
> (0 rows)
>
> -- parser uses un-updated relcache entry and sees only one column in z1.
> -- However, the relcache *will* get updated as soon as we either lock a
> -- table or do the CommandCounterIncrement() at end of query, so a second
> -- try sees the new info:
> regression=> select * from z1;
> f1|f2
> --+--
> (0 rows)
>
> regression=> end;
> END
>
> The window for problems is pretty small: you have to be within a
> transaction (otherwise the StartTransaction will notice the sinval
> report), and your very first query after the other backend does
> ALTER TABLE has to reference the altered table. So I'm not sure
> this is worth worrying about. But perhaps the parser ought to obtain
> the weakest possible lock on each table referenced in a query before
> it does any looking at the attributes of the table. Comments?

Ok.
I will give another example that seems more serious.

test=> begin;
BEGIN
test=> create table t1(i int);
CREATE
-- a table file named "t1" is created.
test=> aaa;
ERROR:  parser: parse error at or near "aaa"
-- transaction is aborted and the table file t1 is unlinked.
test=> select * from t1;
-- but parser doesn't know t1 does not exist any more.
-- it tries to open t1 using mdopen(). (see included backtrace)
-- mdopen() tries to open t1 and fails. In this case mdopen()
-- creates t1!
NOTICE:  (transaction aborted): queries ignored until END
*ABORT STATE*
test=> end;
END
test=> create table t1(i int);
ERROR:  cannot create t1
-- since relation file t1 already exists.
test=> EOF

[t-ishii@ext16 src]$ !!
psql test
Welcome to the POSTGRESQL interactive sql monitor:
  Please read the file COPYRIGHT for copyright terms of POSTGRESQL

[PostgreSQL 6.6.0 on powerpc-unknown-linux-gnu, compiled by gcc egcs-2.90.25 980302 (egcs-1.0.2 prerelease)]

type \? for help on slash commands
type \q to quit
type \g or terminate with semicolon to execute query
You are currently connected to the database: test

test=> select * from t1;
ERROR:  t1: Table does not exist.
test=> create table t1(i int);
ERROR:  cannot create t1
-- again, since relation file t1 already exists.
-- user would never be able to create t1!

I think the long-range solution would be to let the parser obtain certain locks, as Tom said. Until then I propose the following solution. It looks simple, safe, and would be necessary anyway (I don't know why that check had not been implemented). See included patches.
---------------------------- backtrace -----------------------------
#0  mdopen (reln=0x1a9af18) at md.c:279
#1  0x18cb784 in smgropen (which=425, reln=0xbfffdef0) at smgr.c:185
#2  0x18cb784 in smgropen (which=0, reln=0x1a9af18) at smgr.c:185
#3  0x1904c1c in RelationBuildDesc ()
#4  0x1905360 in RelationNameGetRelation ()
#5  0x18259a4 in heap_openr ()
#6  0x187f59c in addRangeTableEntry ()
#7  0x1879cb0 in transformTableEntry ()
#8  0x1879d40 in parseFromClause ()
#9  0x1879a90 in makeRangeTable ()
#10 0x1871fd8 in transformSelectStmt ()
#11 0x1870d14 in transformStmt ()
#12 0x18709e0 in parse_analyze ()
#13 0x18792d4 in parser ()
#14 0x18cd158 in pg_parse_and_plan ()
#15 0x18cd5c0 in pg_exec_query_dest ()
#16 0x18cd524 in pg_exec_query ()
#17 0x18ce9ac in PostgresMain ()
#18 0x18a5994 in DoBackend ()
#19 0x18a53c8 in BackendStartup ()
#20 0x18a46d0 in ServerLoop ()
#21 0x18a4108 in PostmasterMain ()
#22 0x1870928 in main ()
---------------------------- backtrace -----------------------------

---------------------------- patches -----------------------------
*** md.c~	Sun Sep  5 08:41:28 1999
--- md.c	Sun Sep  5 11:01:57 1999
***************
*** 286,296 ****
--- 286,303 ----

  	/* this should only happen during bootstrap processing */
  	if (fd < 0)
+ 	{
+ 		if (!IsBootstrapProcessingMode())
+ 		{
+ 			elog(ERROR, "Couldn't open %s\n", path);
+ 			return -1;
+ 		}
  #ifndef __CYGWIN32__
  		fd = FileNameOpenFile(path, O_RDWR | O_CREAT | O_EXCL, 0600);
  #else
  		fd = FileNameOpenFile(path, O_RDWR | O_CREAT | O_EXCL | O_BINARY, 0600);
  #endif
+ 	}

  	vfd = _fdvec_alloc();
  	if (vfd < 0)
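The behavior change in Tatsuo's patch can be summarized as a decision table. This is a toy model, not md.c itself (the enum and function names are invented): before the patch, a failed open silently fell through to the O_CREAT path and resurrected the unlinked file; after it, creation is permitted only in bootstrap mode and anything else is an error.

```c
#include <assert.h>
#include <stdbool.h>

typedef enum { MD_OPENED, MD_CREATED, MD_ERROR } MdOpenResult;

/* Patched mdopen() decision logic, in miniature:
 * file_exists: does the relation file exist on disk?
 * bootstrap:   are we in bootstrap processing mode (initdb)? */
static MdOpenResult md_open_model(bool file_exists, bool bootstrap)
{
    if (file_exists)
        return MD_OPENED;
    if (bootstrap)
        return MD_CREATED;  /* the O_CREAT | O_EXCL path, bootstrap only */
    return MD_ERROR;        /* patched: elog(ERROR, "Couldn't open ...") */
}
```

Under the pre-patch behavior the third case would have returned MD_CREATED too, which is exactly how the aborted-transaction scenario above ended up with an orphan "t1" file that blocked all future CREATE TABLE t1 attempts.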
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> Ok. I will give another example that seems more serious.
> test=> aaa;
> ERROR: parser: parse error at or near "aaa"
> -- transaction is aborted and the table file t1 is unlinked.
> test=> select * from t1;
> -- but parser doesn't know t1 does not exist any more.
> -- it tries to open t1 using mdopen(). (see included backtrace)
> -- mdopen() tries to open t1 and fails. In this case mdopen()
> -- creates t1!
> NOTICE: (transaction aborted): queries ignored until END
> *ABORT STATE*

Hmm. It seems a more straightforward solution would be to alter pg_parse_and_plan so that the parser isn't even called if we have already failed the current transaction; that is, the "queries ignored" test should occur sooner. I'm rather surprised to realize that we do run the parser in this situation...

> I think the long-range solution would be to let the parser obtain
> certain locks, as Tom said.

That would not solve this particular problem, since the lock manager wouldn't know any better than the parser that the table doesn't really exist any more.

> Until then I propose the following solution. It looks
> simple, safe, and would be necessary anyway (I don't know why that
> check had not been implemented). See included patches.

This looks like it might be a good change, but I'm not quite as sure as you are that it won't have any bad effects. Have you tested it?

regards, tom lane
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > Ok. I will give another example that seems more serious.
> > test=> aaa;
> > ERROR: parser: parse error at or near "aaa"
> > -- transaction is aborted and the table file t1 is unlinked.
> > test=> select * from t1;
> > -- but parser doesn't know t1 does not exist any more.
> > -- it tries to open t1 using mdopen(). (see included backtrace)
> > -- mdopen() tries to open t1 and fails. In this case mdopen()
> > -- creates t1!
> > NOTICE: (transaction aborted): queries ignored until END
> > *ABORT STATE*
>
> Hmm. It seems a more straightforward solution would be to alter
> pg_parse_and_plan so that the parser isn't even called if we have
> already failed the current transaction; that is, the "queries ignored"
> test should occur sooner. I'm rather surprised to realize that
> we do run the parser in this situation...

No, we have to run the parser so that we could accept "end".

> > I think the long-range solution would be to let the parser obtain
> > certain locks, as Tom said.
>
> That would not solve this particular problem, since the lock manager
> wouldn't know any better than the parser that the table doesn't really
> exist any more.

I see.

> > Until then I propose the following solution. It looks
> > simple, safe, and would be necessary anyway (I don't know why that
> > check had not been implemented). See included patches.
>
> This looks like it might be a good change, but I'm not quite as sure
> as you are that it won't have any bad effects. Have you tested it?

At least initdb and the regression test run fine for me...
---
Tatsuo Ishii
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> Hmm. It seems a more straightforward solution would be to alter
>> pg_parse_and_plan so that the parser isn't even called if we have
>> already failed the current transaction; that is, the "queries ignored"
>> test should occur sooner. I'm rather surprised to realize that
>> we do run the parser in this situation...

> No, we have to run the parser so that we could accept "end".

Ah, very good point. I stand corrected.

>>>> Until then I propose the following solution. It looks
>>>> simple, safe, and would be necessary anyway (I don't know why that
>>>> check had not been implemented). See included patches.
>>
>> This looks like it might be a good change, but I'm not quite as sure
>> as you are that it won't have any bad effects. Have you tested it?

> At least initdb and the regression test run fine for me...

Same here. I have committed it into current, but not REL6_5.

regards, tom lane
Any resolution on this?

> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > Ok. I will give another example that seems more serious.
> > test=> aaa;
> > ERROR: parser: parse error at or near "aaa"
> > -- transaction is aborted and the table file t1 is unlinked.
> > test=> select * from t1;
> > -- but parser doesn't know t1 does not exist any more.
> > -- it tries to open t1 using mdopen(). (see included backtrace)
> > -- mdopen() tries to open t1 and fails. In this case mdopen()
> > -- creates t1!
> > NOTICE: (transaction aborted): queries ignored until END
> > *ABORT STATE*
>
> Hmm. It seems a more straightforward solution would be to alter
> pg_parse_and_plan so that the parser isn't even called if we have
> already failed the current transaction; that is, the "queries ignored"
> test should occur sooner. I'm rather surprised to realize that
> we do run the parser in this situation...
>
> > I think the long-range solution would be to let the parser obtain
> > certain locks, as Tom said.
>
> That would not solve this particular problem, since the lock manager
> wouldn't know any better than the parser that the table doesn't really
> exist any more.
>
> > Until then I propose the following solution. It looks
> > simple, safe, and would be necessary anyway (I don't know why that
> > check had not been implemented). See included patches.
>
> This looks like it might be a good change, but I'm not quite as sure
> as you are that it won't have any bad effects. Have you tested it?
>
> regards, tom lane

--
Bruce Momjian | http://www.op.net/~candle
maillist@candle.pha.pa.us | (610) 853-3000
+ If your life is a hard drive, | 830 Blythe Avenue
+ Christ can be your backup.    | Drexel Hill, Pennsylvania 19026
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Any resolution on this?

I believe I committed Tatsuo's change.

There is still the issue that the parser doesn't obtain any lock on a relation during parsing, so it's possible to use a slightly stale relcache entry for parsing purposes. (It can't be *really* stale, since presumably we just read the SI queue during StartTransaction, but still it could be wrong if someone commits an ALTER TABLE while we are parsing our query.)

After thinking about it for a while, I am not sure if we should try to fix this or not. The obvious fix would be to have the parser grab AccessShareLock on any relation as soon as it is seen in the query, and then keep this lock till end of transaction; that would guarantee that no one else could alter the table structure and thereby invalidate the parser's information about the relation. But that does not work because it guarantees deadlock if two processes both try to get AccessExclusiveLock, as in plain old "BEGIN; LOCK TABLE table; ...". They'll both be holding AccessShareLock so neither can get exclusive. There might be another way, but we need to be careful not to choose a cure that's worse than the disease.

regards, tom lane

>> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>>>> Ok. I will give another example that seems more serious.
>>>> test=> aaa;
>>>> ERROR: parser: parse error at or near "aaa"
>>>> -- transaction is aborted and the table file t1 is unlinked.
>>>> test=> select * from t1;
>>>> -- but parser doesn't know t1 does not exist any more.
>>>> -- it tries to open t1 using mdopen(). (see included backtrace)
>>>> -- mdopen() tries to open t1 and fails. In this case mdopen()
>>>> -- creates t1!
>>>> NOTICE: (transaction aborted): queries ignored until END
>>>> *ABORT STATE*
>>
>> Hmm. It seems a more straightforward solution would be to alter
>> pg_parse_and_plan so that the parser isn't even called if we have
>> already failed the current transaction; that is, the "queries ignored"
>> test should occur sooner. I'm rather surprised to realize that
>> we do run the parser in this situation...
>>
>>>> I think the long-range solution would be to let the parser obtain
>>>> certain locks, as Tom said.
>>
>> That would not solve this particular problem, since the lock manager
>> wouldn't know any better than the parser that the table doesn't really
>> exist any more.
>>
>>>> Until then I propose the following solution. It looks
>>>> simple, safe, and would be necessary anyway (I don't know why that
>>>> check had not been implemented). See included patches.
>>
>> This looks like it might be a good change, but I'm not quite as sure
>> as you are that it won't have any bad effects. Have you tested it?