
md.c is feeling much better now, thank you

From: Tom Lane
Hiroshi spotted the fundamental problem we were having:
RelationFlushRelation would happily flush a relation-cache
entry that still had an open file entry at the md.c and fd.c
levels.  This resulted in a permanent leak of md and vfd
file descriptors, which was most easily observable as a leakage
of kernel file descriptors (though fd.c would eventually
recycle same).  smgrclose() in RelationFlushRelation fixes it.

While I was poking at this I found a number of other problems
in md.c having to do with multiple-segment relations.  I believe
they're all fixed now.  I have been able to run initdb and the
regression tests with a 64Kb segment size, which never worked
before.

I stuck my neck out to the extent of committing these changes
into 6.5.* as well as current.  I'd recommend a few more days
of beta-testing before we release 6.5.2 ;-).  Marc, can you
make a new 6.5.2 candidate tarball?
        regards, tom lane


RE: [HACKERS] md.c is feeling much better now, thank you

From: "Hiroshi Inoue"
> -----Original Message-----
> From: owner-pgsql-hackers@postgreSQL.org
> [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Tom Lane
> Sent: Thursday, September 02, 1999 1:36 PM
> To: pgsql-hackers@postgreSQL.org
> Subject: [HACKERS] md.c is feeling much better now, thank you
> 
> 
> Hiroshi spotted the fundamental problem we were having:
> RelationFlushRelation would happily flush a relation-cache
> entry that still had an open file entry at the md.c and fd.c
> levels.  This resulted in a permanent leak of md and vfd
> file descriptors, which was most easily observable as a leakage
> of kernel file descriptors (though fd.c would eventually
> recycle same).  smgrclose() in RelationFlushRelation fixes it.
>

Thanks.

But I'm unhappy with your change for mdtruncate().
It's still dangerous to unlink unwanted segments in mdtruncate().

StartTransaction() and CommandCounterIncrement() trigger
relation cache invalidation. Unfortunately those are insufficient 
to prevent backends from inserting into invalid relations.

For example:
If a backend is blocked by vacuum, it would insert into the target
relation without relation cache invalidation after vacuum.

It seems that other triggers are necessary around LockRelation().  

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp


RE: [HACKERS] md.c is feeling much better now, thank you

From: The Hermit Hacker
I think that, based on this, the changes should be back'd out of v6.5.2
until further testing and analysis can be done.  If we have to, we can do
a v6.5.3 at a later date, if you want to get this in then...

On Thu, 2 Sep 1999, Hiroshi Inoue wrote:

> 
> > -----Original Message-----
> > From: owner-pgsql-hackers@postgreSQL.org
> > [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Tom Lane
> > Sent: Thursday, September 02, 1999 1:36 PM
> > To: pgsql-hackers@postgreSQL.org
> > Subject: [HACKERS] md.c is feeling much better now, thank you
> > 
> > 
> > Hiroshi spotted the fundamental problem we were having:
> > RelationFlushRelation would happily flush a relation-cache
> > entry that still had an open file entry at the md.c and fd.c
> > levels.  This resulted in a permanent leak of md and vfd
> > file descriptors, which was most easily observable as a leakage
> > of kernel file descriptors (though fd.c would eventually
> > recycle same).  smgrclose() in RelationFlushRelation fixes it.
> >
> 
> Thanks.
> 
> But I'm unhappy with your change for mdtruncate().
> It's still dangerous to unlink unwanted segments in mdtruncate().
> 
> StartTransaction() and CommandCounterIncrement() trigger
> relation cache invalidation. Unfortunately those are insufficient 
> to prevent backends from inserting into invalid relations.
> 
> For example:
> If a backend is blocked by vacuum, it would insert into the target
> relation without relation cache invalidation after vacuum.
> 
> It seems that other triggers are necessary around LockRelation().  
> 
> Regards.
> 
> Hiroshi Inoue
> Inoue@tpf.co.jp
> 
> ************
> 

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org 
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org 



Re: [HACKERS] md.c is feeling much better now, thank you

From: Tom Lane
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> StartTransaction() and CommandCounterIncrement() trigger
> relation cache invalidation. Unfortunately those are insufficient 
> to prevent backends from inserting into invalid relations.
>
> If a backend is blocked by vacuum,it would insert into the target 
> relation without relation cache invalidation after vacuum.

If that's true, then we have problems far worse than whether mdtruncate
has tried to unlink the segment.  What you are saying is that another
backend can attempt to do an insert/update on a relation being vacuumed,
and have gotten as far as deciding which block it's going to insert the
tuple into before it gets blocked by vacuum's AccessExclusiveLock.
If so, that is *broken* and it's not mdtruncate's fault.  What happens
if vacuum fills up the chosen block with moved tuples?

I did indeed wonder whether relation cache inval will do the right
thing when another backend is already waiting to access the relation
being invalidated --- but if it does not, we have to fix the inval
mechanism.  mdtruncate is the least of our worries.

Vadim, any comments here?
        regards, tom lane


Re: [HACKERS] md.c is feeling much better now, thank you

From: Tom Lane
I wrote:
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>> StartTransaction() and CommandCounterIncrement() trigger
>> relation cache invalidation. Unfortunately those are insufficient 
>> to prevent backends from inserting into invalid relations.

> If that's true, then we have problems far worse than whether mdtruncate
> has tried to unlink the segment.

I poked at this a little bit and found that for the VACUUM case,
RelationFlushRelation in the other backend (the one waiting to
insert/update) occurs here:

#0  RelationFlushRelation (relationPtr=0x7b034824, onlyFlushReferenceCountZero=1 '\001') at relcache.c:1259
#1  0x158a60 in RelationIdInvalidateRelationCacheByRelationId (relationId=272146) at relcache.c:1368
#2  0x156ba0 in CacheIdInvalidate (cacheId=1259, hashIndex=272146, pointer=0x0) at inval.c:323
#3  0x11673c in SIReadEntryData (segP=0x80da1000, backendId=-2133183664, invalFunction=0x4000c692 <SSNAN+9066>, resetFunction=0x4000c69a <SSNAN+9074>) at sinvaladt.c:649
#4  0x115e6c in InvalidateSharedInvalid (invalFunction=0x80da1000, resetFunction=0x4000c69a <SSNAN+9074>) at sinval.c:164
#5  0x156e54 in DiscardInvalid () at inval.c:518
#6  0x94354 in AtStart_Cache () at xact.c:548
#7  0x94314 in CommandCounterIncrement () at xact.c:514
#8  0x121218 in pg_exec_query_dest (query_string=0x40079580 "insert into tenk1 values(19999,1234);", dest=Remote, aclOverride=0 '\000') at postgres.c:726
 

which looks good ... except that the CommandCounterIncrement()
occurs *after* the insert has executed.  So we've got a problem
here.

In the DROP TABLE scenario, things seem to be broken independently
of md.c.  I tried this:

BACKEND #1: begin; lock table tenk1;
BACKEND #2: insert into tenk1 values(29999,1234);
            -- backend #2 hangs waiting for lock
BACKEND #1: drop table tenk1; end;

Backend #2 now suffers an assert failure:

#6  0x15b8c4 in ExceptionalCondition (conditionName=0x28898 "!((((PageHeader) ((PageHeader) pageHeader))->pd_upper == 0))", exceptionP=0x40009a58, detail=0x0, fileName=0x7ae4 "\003", lineNumber=136) at assert.c:72
#7  0x7c470 in RelationPutHeapTupleAtEnd (relation=0x400e8a40, tuple=0x401127a0) at hio.c:136
#8  0x7aa48 in heap_insert (relation=0x400e8a40, tup=0x401127a0) at heapam.c:1086
#9  0xb87e4 in ExecAppend (slot=0x4010a078, tupleid=0x200, estate=0x40109e98) at execMain.c:1190
#10 0xb8630 in ExecutePlan (estate=0x40109e98, plan=0x40109860, operation=CMD_INSERT, offsetTuples=0, numberTuples=0, direction=ForwardScanDirection, destfunc=0x40112730) at execMain.c:1064
#11 0xb7b6c in ExecutorRun (queryDesc=0x40109e80, estate=0x40109e98, feature=3, limoffset=0x0, limcount=0x0) at execMain.c:329
#12 0x12294c in ProcessQueryDesc (queryDesc=0x40109e80, limoffset=0x0, limcount=0x0) at pquery.c:315
#13 0x1229f4 in ProcessQuery (parsetree=0x400e42d0, plan=0x40109860, dest=Local) at pquery.c:358
#14 0x1211dc in pg_exec_query_dest (query_string=0x40079580 "insert into tenk1 values(29999,1234);", dest=Remote, aclOverride=2 '\002') at postgres.c:710
 

which hardly looks like it can be blamed on md.c either.

My guess is that we ought to be checking for relcache invalidation
immediately after gaining any lock on the relation.  I don't know where
that should be done, however.

Perhaps we also ought to make RelationFlushRelation do smgrclose()
unconditionally, regardless of the reference-count test.  If the
relation is still in use, that should be OK --- md.c will reopen
the files automatically on the next access.


BTW, it appears that DROP TABLE physically deletes the relation
*immediately*, which means that aborting a transaction that contains
a DROP TABLE does not work.  But we knew that, didn't we?
        regards, tom lane


Re: [HACKERS] md.c is feeling much better now, thank you

From: Tom Lane
The Hermit Hacker <scrappy@hub.org> writes:
> I think that, based on this, the changes should be back'd out of v6.5.2
> until further testing and analysis can be done.

If we can't find a solution to the inval-too-late problem pronto,
what we can do is comment out the FileUnlink call in mdtruncate.
I don't see a need to back out the other fixes in md.c.

But I think we ought to fix the underlying problem, not this symptom.
What we now see is that after one backend has done something that
requires invalidating a relcache entry, another backend is able
to complete an entire query using the *old* relcache info before it
notices the shared-inval signal.  That's got to have bad consequences
for more than just md.c.
        regards, tom lane


Re: [HACKERS] md.c is feeling much better now, thank you

From: Bruce Momjian
> BTW, it appears that DROP TABLE physically deletes the relation
> *immediately*, which means that aborting a transaction that contains
> a DROP TABLE does not work.  But we knew that, didn't we?

Yes, on TODO.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: [HACKERS] md.c is feeling much better now, thank you

From: Vadim Mikheev
Tom Lane wrote:
> 
> My guess is that we ought to be checking for relcache invalidation
> immediately after gaining any lock on the relation.  I don't know where
> that should be done, however.

Seems like a GOOD solution!
We could do inval check in LockRelation() just after LockAcquire().

Vadim


Re: [HACKERS] md.c is feeling much better now, thank you

From: Tom Lane
Vadim Mikheev <vadim@krs.ru> writes:
> Tom Lane wrote:
>> My guess is that we ought to be checking for relcache invalidation
>> immediately after gaining any lock on the relation.  I don't know where
>> that should be done, however.

> Seems like a GOOD solution!
> We could do inval check in LockRelation() just after LockAcquire().

I tried inserting code like this in LockRelation:

--- 163,176 ----
        tag.objId.blkno = InvalidBlockNumber;

        LockAcquire(LockTableId, &tag, lockmode);
!
!       /* Check to make sure the relcache entry hasn't been invalidated
!        * while we were waiting to lock it.
!        */
!       DiscardInvalid();
!       if (relation != RelationIdGetRelation(tag.relId))
!               elog(ERROR, "LockRelation: relation %u deleted while waiting to lock it",
!                        tag.relId);
  }

  /*

and moving the smgrclose() call in RelationFlushRelation so that it is
called unconditionally.

Doesn't work though: the ALTER TABLE tests in regress/misc fail.
Apparently, this change causes the sinval report from update of the
relation's pg_class heap entry to be read while the relation has refcnt>0,
so RelationFlushRelation doesn't flush it, so we have an obsolete
relation cache entry.  Ooops.

Did you have a different approach in mind?  Or do we need to redesign
RelationFlushRelation so that it rebuilds the relcache entry, rather
than dropping it, if refcnt > 0?
        regards, tom lane


RE: [HACKERS] md.c is feeling much better now, thank you

From: "Hiroshi Inoue"

> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Thursday, September 02, 1999 11:44 PM
> To: The Hermit Hacker
> Cc: Hiroshi Inoue; pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] md.c is feeling much better now, thank you 
> 
> 
> The Hermit Hacker <scrappy@hub.org> writes:
> > I think that, based on this, the changes should be back'd out of v6.5.2
> > until further testing and analysis can be done.
> 
> If we can't find a solution to the inval-too-late problem pronto,
> what we can do is comment out the FileUnlink call in mdtruncate.
> I don't see a need to back out the other fixes in md.c.
>

I think so too.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp 


RE: [HACKERS] md.c is feeling much better now, thank you

From: "Hiroshi Inoue"
> -----Original Message-----
> From: Bruce Momjian [mailto:maillist@candle.pha.pa.us]
> Sent: Friday, September 03, 1999 12:22 AM
> To: Tom Lane
> Cc: Hiroshi Inoue; pgsql-hackers@postgreSQL.org
> Subject: Re: [HACKERS] md.c is feeling much better now, thank you
> 
> 
> > BTW, it appears that DROP TABLE physically deletes the relation
> > *immediately*, which means that aborting a transaction that contains
> > a DROP TABLE does not work.  But we knew that, didn't we?
> 
> Yes, on TODO.
>

Hmm, data definition commands are unrecoverable in many DBMSs.
Is it necessary to allow PostgreSQL to execute data definition
commands inside transactions?
If so, is it possible?

For example, should the following be possible?

[A table t exists.]

begin;
drop table t;
create table t (dt int4);
insert into t values (1);
drop table t;
abort;

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp 



RE: [HACKERS] md.c is feeling much better now, thank you

From: "Hiroshi Inoue"
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Friday, September 03, 1999 8:03 AM
> To: Vadim Mikheev
> Cc: Hiroshi Inoue; pgsql-hackers@postgresql.org
> Subject: Re: [HACKERS] md.c is feeling much better now, thank you
>
>
> Vadim Mikheev <vadim@krs.ru> writes:
> > Tom Lane wrote:
> >> My guess is that we ought to be checking for relcache invalidation
> >> immediately after gaining any lock on the relation.  I don't know where
> >> that should be done, however.
>
> > Seems like a GOOD solution!
> > We could do inval check in LockRelation() just after LockAcquire().
>
> I tried inserting code like this in LockRelation:
>
> --- 163,176 ----
>         tag.objId.blkno = InvalidBlockNumber;
>
>         LockAcquire(LockTableId, &tag, lockmode);
> !
> !       /* Check to make sure the relcache entry hasn't been invalidated
> !        * while we were waiting to lock it.
> !        */
> !       DiscardInvalid();
> !       if (relation != RelationIdGetRelation(tag.relId))
> !               elog(ERROR, "LockRelation: relation %u deleted while waiting to lock it",
> !                        tag.relId);
>   }
>
>   /*
>
> and moving the smgrclose() call in RelationFlushRelation so that it is
> called unconditionally.
>
> Doesn't work though: the ALTER TABLE tests in regress/misc fail.
> Apparently, this change causes the sinval report from update of the
> relation's pg_class heap entry to be read while the relation has refcnt>0,
> so RelationFlushRelation doesn't flush it, so we have an obsolete
> relation cache entry.  Ooops.
>

How about inserting "RelationDecrementReferenceCount(relation);"
between LockAcquire() and DiscardInvalid() ?
And isn't it preferable that LockRelation() returns the relation
which RelationIdGetRelation(tag.relId) returns ?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp



Re: [HACKERS] md.c is feeling much better now, thank you

From: Tom Lane
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>> Doesn't work though: the ALTER TABLE tests in regress/misc fail.
>> Apparently, this change causes the sinval report from update of the
>> relation's pg_class heap entry to be read while the relation has refcnt>0,
>> so RelationFlushRelation doesn't flush it, so we have an obsolete
>> relation cache entry.  Ooops.

> How about inserting "RelationDecrementReferenceCount(relation);"
> between LockAcquire() and DiscardInvalid() ?

Would only help if the relation had been opened exactly once before
the lock; not if its refcnt is greater than 1.  Worse, it would only
help for the particular relation being locked, but we might receive
an sinval report for a different already-locked relation.

> And isn't it preferable that LockRelation() returns the relation
> which RelationIdGetRelation(tag.relId) returns ?

No, because that would only inform the immediate caller of LockRelation
of a change.  This is insufficient for both the reasons mentioned above.
For that matter, my first-cut patch is insufficient, because it
won't detect the case that a relcache entry other than the one
currently being locked has been flushed.

I think what we need to do is revise RelationFlushRelation so that
it (a) deletes the relcache entry if its refcnt is zero; otherwise
(b) leaves the relcache entry in existence, but recomputes all
its contents and subsidiary data structures, and (c) if, while
trying to recompute the contents, it finds that the relation has
actually been deleted, then it's elog(ERROR) time.  In this way,
existing pointers to the relcache entry --- which might belong to
routines very far down the call stack --- remain valid, or else
we elog() if they aren't.

We might still have a few bugs with routines that copy data out of the
relcache entry before locking it, but I don't recall having seen any
code like that.  Most of the code seems to do heap_open immediately
followed by LockRelation, and that should be safe.

I'd like to hear Vadim's opinion before wading in, but this seems
like it ought to be workable.
        regards, tom lane


Re: [HACKERS] md.c is feeling much better now, thank you

From: Tom Lane
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> Hmm, data definition commands are unrecoverable in many DBMSs.

True.

> For example, should the following be possible?

> [A table t exists.]

> begin;
> drop table t;
> create table t (dt int4);
> insert into t values (1);
> drop table t;
> abort;

I don't mind if that is rejected --- but it ought to be rejected
cleanly, rather than leaving a broken table behind.

IIRC, it is fairly easy to tell from the xact.c state whether we are
inside a BEGIN block.  Maybe DROP TABLE and anything else that has
nonreversible side effects ought to simply elog(ERROR) if called inside
a BEGIN block.  We'd need to be a little careful though, since I think
DROP TABLE on a temp table created in the same transaction ought to
work.
        regards, tom lane


Re: [HACKERS] md.c is feeling much better now, thank you

From: Tom Lane
I have just committed changes that address the problem of relcache
entries not being updated promptly after another backend issues
a shared-invalidation report.  LockRelation() now checks for sinval
reports just after acquiring a lock, and the relcache entry will be
updated if necessary --- or, if the relation has actually disappeared
entirely, an elog(ERROR) will occur.

As a side effect of the relcache update, the underlying md.c/fd.c files
will be closed, and then reopened if necessary.  This should solve our
concerns about vacuum.c not being able to truncate relations safely.

There is still some potential for misbehavior as a result of the fact
that the parser looks at relcache entries without bothering to obtain
any kind of lock on the relation.  For example:

-- In backend #1:
regression=> create table z1 (f1 int4);
CREATE
regression=> select * from z1;
f1
--
(0 rows)

regression=> begin;
BEGIN

-- In backend #2:
regression=> alter table z1 add column f2 int4;
ADD
regression=>

-- In backend #1:
regression=> select * from z1;
f1
--
(0 rows)

-- parser uses un-updated relcache entry and sees only one column in z1.
-- However, the relcache *will* get updated as soon as we either lock a
-- table or do the CommandCounterIncrement() at end of query, so a second
-- try sees the new info:
regression=> select * from z1;
f1|f2
--+--
(0 rows)

regression=> end;
END

The window for problems is pretty small: you have to be within a
transaction (otherwise the StartTransaction will notice the sinval
report), and your very first query after the other backend does
ALTER TABLE has to reference the altered table.  So I'm not sure
this is worth worrying about.  But perhaps the parser ought to obtain
the weakest possible lock on each table referenced in a query before
it does any looking at the attributes of the table.  Comments?


I believe these changes ought to be committed into REL6_5 as well,
but it might be wise to test them a little more in current first.
Or would people find it easier to test them against 6.5 databases?
In that case maybe I should just commit them now...
        regards, tom lane


Re: [HACKERS] md.c is feeling much better now, thank you

From: Bruce Momjian
> The window for problems is pretty small: you have to be within a
> transaction (otherwise the StartTransaction will notice the sinval
> report), and your very first query after the other backend does
> ALTER TABLE has to reference the altered table.  So I'm not sure
> this is worth worrying about.  But perhaps the parser ought to obtain
> the weakest possible lock on each table referenced in a query before
> it does any looking at the attributes of the table.  Comments?

Good question.  How do other DBs handle such a case?  I hesitate to do
locking for parser lookups.  Seems like more lock overhead.


> I believe these changes ought to be committed into REL6_5 as well,
> but it might be wise to test them a little more in current first.
> Or would people find it easier to test them against 6.5 databases?
> In that case maybe I should just commit them now...

Seems it should be 6.6 only.  Too obscure a bug.  Could introduce a bug.

--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
 


Re: [HACKERS] md.c is feeling much better now, thank you

From: Tatsuo Ishii
> I have just committed changes that address the problem of relcache
> entries not being updated promptly after another backend issues
> a shared-invalidation report.  LockRelation() now checks for sinval
> reports just after acquiring a lock, and the relcache entry will be
> updated if necessary --- or, if the relation has actually disappeared
> entirely, an elog(ERROR) will occur.
> 
> As a side effect of the relcache update, the underlying md.c/fd.c files
> will be closed, and then reopened if necessary.  This should solve our
> concerns about vacuum.c not being able to truncate relations safely.
> 
> There is still some potential for misbehavior as a result of the fact
> that the parser looks at relcache entries without bothering to obtain
> any kind of lock on the relation.  For example:
> 
> -- In backend #1:
> regression=> create table z1 (f1 int4);
> CREATE
> regression=> select * from z1;
> f1
> --
> (0 rows)
> 
> regression=> begin;
> BEGIN
> 
> -- In backend #2:
> regression=> alter table z1 add column f2 int4;
> ADD
> regression=>
> 
> -- In backend #1:
> regression=> select * from z1;
> f1
> --
> (0 rows)
> 
> -- parser uses un-updated relcache entry and sees only one column in z1.
> -- However, the relcache *will* get updated as soon as we either lock a
> -- table or do the CommandCounterIncrement() at end of query, so a second
> -- try sees the new info:
> regression=> select * from z1;
> f1|f2
> --+--
> (0 rows)
> 
> regression=> end;
> END
> 
> The window for problems is pretty small: you have to be within a
> transaction (otherwise the StartTransaction will notice the sinval
> report), and your very first query after the other backend does
> ALTER TABLE has to reference the altered table.  So I'm not sure
> this is worth worrying about.  But perhaps the parser ought to obtain
> the weakest possible lock on each table referenced in a query before
> it does any looking at the attributes of the table.  Comments?

Ok. I will give another example that seems more serious.

test=> begin;
BEGIN
test=> create table t1(i int);
CREATE
-- a table file named "t1" is created.
test=> aaa;
ERROR:  parser: parse error at or near "aaa"
-- transaction is aborted and the table file t1 is unlinked.
test=> select * from t1;
-- but parser doesn't know t1 does not exist any more.
-- it tries to open t1 using mdopen(). (see including backtrace)
-- mdopen() tries to open t1 and fails. In this case mdopen()
-- creates t1!
NOTICE:  (transaction aborted): queries ignored until END
*ABORT STATE*
test=> end;
END
test=> create table t1(i int); 
ERROR:  cannot create t1
-- since relation file t1 already exists.
test=> 
EOF
[t-ishii@ext16 src]$ !!
psql test
Welcome to the POSTGRESQL interactive sql monitor:
  Please read the file COPYRIGHT for copyright terms of POSTGRESQL
[PostgreSQL 6.6.0 on powerpc-unknown-linux-gnu, compiled by gcc egcs-2.90.25 980302 (egcs-1.0.2 prerelease)]

  type \? for help on slash commands
  type \q to quit
  type \g or terminate with semicolon to execute query

You are currently connected to the database: test

test=> select * from t1;

ERROR:  t1: Table does not exist.
test=> create table t1(i int);

ERROR:  cannot create t1
-- again, since relation file t1 already exists.
-- user would never be able to create t1!

I think the long-range solution would be to let the parser obtain certain
locks, as Tom said. Until then I propose the following solution. It looks
simple and safe, and would be necessary anyway (I don't know why that
check had not been implemented). See the included patch.

---------------------------- backtrace -----------------------------
#0  mdopen (reln=0x1a9af18) at md.c:279
#1  0x18cb784 in smgropen (which=425, reln=0xbfffdef0) at smgr.c:185
#2  0x18cb784 in smgropen (which=0, reln=0x1a9af18) at smgr.c:185
#3  0x1904c1c in RelationBuildDesc ()
#4  0x1905360 in RelationNameGetRelation ()
#5  0x18259a4 in heap_openr ()
#6  0x187f59c in addRangeTableEntry ()
#7  0x1879cb0 in transformTableEntry ()
#8  0x1879d40 in parseFromClause ()
#9  0x1879a90 in makeRangeTable ()
#10 0x1871fd8 in transformSelectStmt ()
#11 0x1870d14 in transformStmt ()
#12 0x18709e0 in parse_analyze ()
#13 0x18792d4 in parser ()
#14 0x18cd158 in pg_parse_and_plan ()
#15 0x18cd5c0 in pg_exec_query_dest ()
#16 0x18cd524 in pg_exec_query ()
#17 0x18ce9ac in PostgresMain ()
#18 0x18a5994 in DoBackend ()
#19 0x18a53c8 in BackendStartup ()
#20 0x18a46d0 in ServerLoop ()
#21 0x18a4108 in PostmasterMain ()
#22 0x1870928 in main ()
---------------------------- backtrace -----------------------------

---------------------------- patches  -----------------------------
*** md.c~	Sun Sep  5 08:41:28 1999
--- md.c	Sun Sep  5 11:01:57 1999
***************
*** 286,296 ****
--- 286,303 ----
      /* this should only happen during bootstrap processing */
      if (fd < 0)
+     {
+         if (!IsBootstrapProcessingMode())
+         {
+             elog(ERROR, "Couldn't open %s\n", path);
+             return -1;
+         }
  #ifndef __CYGWIN32__
          fd = FileNameOpenFile(path, O_RDWR | O_CREAT | O_EXCL, 0600);
  #else
          fd = FileNameOpenFile(path, O_RDWR | O_CREAT | O_EXCL | O_BINARY, 0600);
  #endif
+     }
      vfd = _fdvec_alloc();
      if (vfd < 0)


Re: [HACKERS] md.c is feeling much better now, thank you

From: Tom Lane
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> Ok. I will give another example that seems more serious.
> test=> aaa;
> ERROR:  parser: parse error at or near "aaa"
> -- transaction is aborted and the table file t1 is unlinked.
> test=> select * from t1;
> -- but parser doesn't know t1 does not exist any more.
> -- it tries to open t1 using mdopen(). (see including backtrace)
> -- mdopen() tries to open t1 and fails. In this case mdopen()
> -- creates t1!
> NOTICE:  (transaction aborted): queries ignored until END
> *ABORT STATE*

Hmm.  It seems a more straightforward solution would be to alter
pg_parse_and_plan so that the parser isn't even called if we have
already failed the current transaction; that is, the "queries ignored"
test should occur sooner.  I'm rather surprised to realize that
we do run the parser in this situation...

> I think the long-range solution would be to let the parser obtain certain
> locks, as Tom said.

That would not solve this particular problem, since the lock manager
wouldn't know any better than the parser that the table doesn't really
exist any more.

> Until then I propose the following solution. It looks
> simple and safe, and would be necessary anyway (I don't know why that
> check had not been implemented). See the included patch.

This looks like it might be a good change, but I'm not quite as sure
as you are that it won't have any bad effects.  Have you tested it?
        regards, tom lane


Re: [HACKERS] md.c is feeling much better now, thank you

From: Tatsuo Ishii
> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > Ok. I will give another example that seems more serious.
> > test=> aaa;
> > ERROR:  parser: parse error at or near "aaa"
> > -- transaction is aborted and the table file t1 is unlinked.
> > test=> select * from t1;
> > -- but parser doesn't know t1 does not exist any more.
> > -- it tries to open t1 using mdopen(). (see including backtrace)
> > -- mdopen() tries to open t1 and fails. In this case mdopen()
> > -- creates t1!
> > NOTICE:  (transaction aborted): queries ignored until END
> > *ABORT STATE*
> 
> Hmm.  It seems a more straightforward solution would be to alter
> pg_parse_and_plan so that the parser isn't even called if we have
> already failed the current transaction; that is, the "queries ignored"
> test should occur sooner.  I'm rather surprised to realize that
> we do run the parser in this situation...

No, we have to run the parser so that we can accept "end".

> > I think the long-range solution would be to let the parser obtain certain
> > locks, as Tom said.
> 
> That would not solve this particular problem, since the lock manager
> wouldn't know any better than the parser that the table doesn't really
> exist any more.

I see.

> > Until then I propose the following solution. It looks
> > simple and safe, and would be necessary anyway (I don't know why that
> > check had not been implemented). See the included patch.
> 
> This looks like it might be a good change, but I'm not quite as sure
> as you are that it won't have any bad effects.  Have you tested it?

At least initdb and the regression tests run fine for me...
---
Tatsuo Ishii


Re: [HACKERS] md.c is feeling much better now, thank you

From: Tom Lane
Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>> Hmm.  It seems a more straightforward solution would be to alter
>> pg_parse_and_plan so that the parser isn't even called if we have
>> already failed the current transaction; that is, the "queries ignored"
>> test should occur sooner.  I'm rather surprised to realize that
>> we do run the parser in this situation...

> No, we have to run the parser so that we can accept "end".

Ah, very good point.  I stand corrected.

>>>> Until then I propose the following solution. It looks
>>>> simple and safe, and would be necessary anyway (I don't know why that
>>>> check had not been implemented). See the included patch.
>> 
>> This looks like it might be a good change, but I'm not quite as sure
>> as you are that it won't have any bad effects.  Have you tested it?
>
> At least initdb and the regression tests run fine for me...

Same here.  I have committed it into current, but not REL6_5.
        regards, tom lane


Re: [HACKERS] md.c is feeling much better now, thank you

From
Bruce Momjian
Date:
Any resolution on this?

> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
> > Ok. I will give another example that seems more serious.
> > test=> aaa;
> > ERROR:  parser: parse error at or near "aaa"
> > -- transaction is aborted and the table file t1 is unlinked.
> > test=> select * from t1;
> > -- but parser doesn't know t1 does not exist any more.
> > -- it tries to open t1 using mdopen(). (see the included backtrace)
> > -- mdopen() tries to open t1 and fails. In this case mdopen()
> > -- creates t1!
> > NOTICE:  (transaction aborted): queries ignored until END
> > *ABORT STATE*
> 
> Hmm.  It seems a more straightforward solution would be to alter
> pg_parse_and_plan so that the parser isn't even called if we have
> already failed the current transaction; that is, the "queries ignored"
> test should occur sooner.  I'm rather surprised to realize that
> we do run the parser in this situation...
> 
> > I think the long-range solution would be to let the parser obtain
> > certain locks, as Tom said.
> 
> That would not solve this particular problem, since the lock manager
> wouldn't know any better than the parser that the table doesn't really
> exist any more.
> 
> > Until then I propose the following solution. It looks
> > simple, safe, and would be necessary anyway (I don't know why that
> > check had not been implemented). See the included patches.
> 
> This looks like it might be a good change, but I'm not quite as sure
> as you are that it won't have any bad effects.  Have you tested it?
> 
>             regards, tom lane
> 


--
  Bruce Momjian                        |  http://www.op.net/~candle
  maillist@candle.pha.pa.us            |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026


Re: [HACKERS] md.c is feeling much better now, thank you

From
Tom Lane
Date:
Bruce Momjian <maillist@candle.pha.pa.us> writes:
> Any resolution on this?

I believe I committed Tatsuo's change.

There is still the issue that the parser doesn't obtain any lock on
a relation during parsing, so it's possible to use a slightly stale
relcache entry for parsing purposes.  (It can't be *really* stale,
since presumably we just read the SI queue during StartTransaction,
but still it could be wrong if someone commits an ALTER TABLE while
we are parsing our query.)

After thinking about it for a while, I am not sure if we should try to
fix this or not.  The obvious fix would be to have the parser grab
AccessShareLock on any relation as soon as it is seen in the query,
and then keep this lock till end of transaction; that would guarantee
that no one else could alter the table structure and thereby invalidate
the parser's information about the relation.  But that does not work
because it guarantees deadlock if two processes both try to get
AccessExclusiveLock, as in plain old "BEGIN; LOCK TABLE table; ...".
They'll both be holding AccessShareLock so neither can get exclusive.
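The deadlock can be sketched as a two-session timeline (table name `t` is hypothetical):

```sql
-- Session A                          -- Session B
BEGIN;                                BEGIN;
LOCK TABLE t;                         LOCK TABLE t;
-- parser already took               -- parser already took
-- AccessShareLock on t;             -- AccessShareLock on t;
-- now waits for B's share lock      -- now waits for A's share lock
-- => deadlock: neither can upgrade to AccessExclusiveLock
```

Each session's own share lock blocks the other's exclusive request, so both wait forever unless the deadlock detector aborts one of them.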

There might be another way, but we need to be careful not to choose
a cure that's worse than the disease.

            regards, tom lane


>> Tatsuo Ishii <t-ishii@sra.co.jp> writes:
>>>> Ok. I will give another example that seems more serious.
>>>> test=> aaa;
>>>> ERROR:  parser: parse error at or near "aaa"
>>>> -- transaction is aborted and the table file t1 is unlinked.
>>>> test=> select * from t1;
>>>> -- but parser doesn't know t1 does not exist any more.
>>>> -- it tries to open t1 using mdopen(). (see the included backtrace)
>>>> -- mdopen() tries to open t1 and fails. In this case mdopen()
>>>> -- creates t1!
>>>> NOTICE:  (transaction aborted): queries ignored until END
>>>> *ABORT STATE*
>>
>> Hmm.  It seems a more straightforward solution would be to alter
>> pg_parse_and_plan so that the parser isn't even called if we have
>> already failed the current transaction; that is, the "queries ignored"
>> test should occur sooner.  I'm rather surprised to realize that
>> we do run the parser in this situation...
>>
>>>> I think the long-range solution would be to let the parser obtain
>>>> certain locks, as Tom said.
>>
>> That would not solve this particular problem, since the lock manager
>> wouldn't know any better than the parser that the table doesn't really
>> exist any more.
>>
>>>> Until then I propose the following solution. It looks
>>>> simple, safe, and would be necessary anyway (I don't know why that
>>>> check had not been implemented). See the included patches.
>>
>> This looks like it might be a good change, but I'm not quite as sure
>> as you are that it won't have any bad effects.  Have you tested it?