Обсуждение: Re: [HACKERS] Re: [GENERAL] drop/rename table and transactions
> From: Lamar Owen <lamar.owen@wgcr.org> > On Fri, 26 Nov 1999, Mike Mascari wrote: > > > Tom Lane <tgl@sss.pgh.pa.us> writes: > > > What does ORACLE do here? > > > > > Since ORACLE has 70% of the RDBMS market, it is the de facto standard > > > > > > Yes, and Windows is the de facto standard operating system. I don't use > > > Windows, and I'm not willing to follow Oracle's lead when they make a > > > bad decision... > > > So I guess I should file away my other suggestion to use DCOM as > > the object technology of choice instead of CORBA? ;-) > > This is a Free Software project -- PostgreSQL is not bound by the decisions of > the 'market leader' any more than Linux is bound by the standards of Microsoft. The DCOM remark was just a joke ;-). My remark concerning ORACLE was in response to Andreas' comment that implicit COMMITs of DDL statements was absurd. I wanted to simply point out that, since ORACLE has 70% market share, most corporate database developers EXPECT their DDL statements to commit their transactions (if they've RTFM). I also pointed out that it would be GREAT if PostgreSQL could successfully rollback DDL statements sanely (and thus diverge from ORACLE). I guess I don't expect that to happen successfully until something the equivalent of TABLESPACES is implemented and there is a disassociation between table names, index names and their filesystem counterparts and to be able to "undo" filesystem operations. That, it seems to me, will be a major undertaking and not going to happen any time soon... I'll stop swinging at windmills now... Mike Mascari
On Fri, 26 Nov 1999, Mike Mascari wrote: > The DCOM remark was just a joke ;-). My remark concerning ORACLE was in > response to Andreas' comment that implicit COMMITs of DDL statements was > absurd. I wanted to simply point out that, since ORACLE has 70% market > share, I did not see the response to Andreas, nor did I see Andreas' assertion that it was absurd. My apologies. > counterparts and to be able to "undo" filesystem operations. That, it seems > to > me, will be a major undertaking and not going to happen any time soon... Yes, that is true. As long as the storage manager relies on the filesystem for table names, this will be a problem, unless filesystem deletions are delayed until COMMIT, and filesystem creates are undone at a ROLLBACK. -- Lamar Owen WGCR Internet Radio 1 Peter 4:11
> if PostgreSQL could successfully rollback DDL statements sanely (and thus > diverge from ORACLE). I guess I don't expect that to happen successfully > until > something the equivalent of TABLESPACES is implemented and there is a > disassociation between table names, index names and their filesystem > counterparts and to be able to "undo" filesystem operations. That, it seems > to > me, will be a major undertaking and not going to happen any time soon... Ingres has table names that don't match on-disk file names, and it is a pain to administer because you can't figure out what is going on at the file system level. Table files have names like AAAHFGE. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote: > > > if PostgreSQL could successfully rollback DDL statements sanely (and thus > > diverge from ORACLE). I guess I don't expect that to happen successfully > > until > > something the equivalent of TABLESPACES is implemented and there is a > > disassociation between table names, index names and their filesystem > > counterparts and to be able to "undo" filesystem operations. That, it seems > > to > > me, will be a major undertaking and not going to happen any time soon... > > Ingres has table names that don't match on-disk file names, and it is a > pain to administer because you can't figure out what is going on at the > file system level. Table files have names like AAAHFGE. I have to say that I'm going to change on-disk database/table/index file names to _OID_! This is required by WAL because of inside of log records there will be just database/table/index oids, not names, and after crash recovery will not be able to read pg_class to get database/table/index name using oid ... Vadim
> I have to say that I'm going to change on-disk database/table/index > file names to _OID_! This is required by WAL because of inside of > log records there will be just database/table/index oids, not names, > and after crash recovery will not be able to read pg_class to get > database/table/index name using oid ... Wow, that is a major pain. Anyone else think so? Using oid's instead of names may give us some ability to fix some other bugs, though. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian wrote: > > > I have to say that I'm going to change on-disk database/table/index > > file names to _OID_! This is required by WAL because of inside of > > log records there will be just database/table/index oids, not names, > > and after crash recovery will not be able to read pg_class to get > > database/table/index name using oid ... > > Wow, that is a major pain. Anyone else think so? Why it's so painful? We can write utility to construct database dir with table names symlinked to real table files -:) Actually, I don't understand for what would you need to know what is what, (c) -:) > Using oid's instead of names may give us some ability to fix some other > bugs, though. Yes. Vadim
> Bruce Momjian wrote: > > > > > I have to say that I'm going to change on-disk database/table/index > > > file names to _OID_! This is required by WAL because of inside of > > > log records there will be just database/table/index oids, not names, > > > and after crash recovery will not be able to read pg_class to get > > > database/table/index name using oid ... > > > > Wow, that is a major pain. Anyone else think so? > > Why it's so painful? > We can write utility to construct database dir with table names > symlinked to real table files -:) > Actually, I don't understand > for what would you need to know what is what, (c) -:) With Ingres, you can't just look at a file and know the table name, and if you need to reload just one file from a tape, it is a royal pain to know which file to bring back. I have said Ingres make things 100 times harder for adminstrators by doing this. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Bruce Momjian <pgman@candle.pha.pa.us> writes: >> I have to say that I'm going to change on-disk database/table/index >> file names to _OID_! This is required by WAL because of inside of >> log records there will be just database/table/index oids, not names, >> and after crash recovery will not be able to read pg_class to get >> database/table/index name using oid ... > Wow, that is a major pain. Anyone else think so? > Using oid's instead of names may give us some ability to fix some other > bugs, though. Yes, and yes. I've been trying to nerve myself to propose that, because it seems the only reasonable way to make rollback of RENAME TABLE and DROP TABLE work safely. It'll be a pain in the neck for debugging and admin purposes though. Can we make some sort of usually-correct-but-not-guaranteed-correct dump that shows which corresponds to what? Maybe something similar to the textfile dump of pg_shadow that the postmaster uses for password authentication? Then at least you'd have some shot at figuring out which file was what in extremis... regards, tom lane
Bruce Momjian wrote: > > > > > > > Wow, that is a major pain. Anyone else think so? > > > > Why it's so painful? > > We can write utility to construct database dir with table names > > symlinked to real table files -:) > > Actually, I don't understand > > for what would you need to know what is what, (c) -:) > > With Ingres, you can't just look at a file and know the table name, and > if you need to reload just one file from a tape, it is a royal pain to > know which file to bring back. I have said Ingres make things 100 times > harder for adminstrators by doing this. Moving table file to/off database dir separately is not right way for backup/restore... On-line/off-line full backup utility will copy _all_ database files to _somewhere_ (tape etc) as well as on-line transaction logs and pg_control (to know when was the last checkpoint made). And to restore things after disk failure administrator will have to copy _all_ files + logs (+logs made as incremental backup) + pg_control back and start postmaster. Vadim
Tom Lane wrote: > > Bruce Momjian <pgman@candle.pha.pa.us> writes: > >> I have to say that I'm going to change on-disk database/table/index > >> file names to _OID_! This is required by WAL because of inside of > >> log records there will be just database/table/index oids, not names, > >> and after crash recovery will not be able to read pg_class to get > >> database/table/index name using oid ... > > > Wow, that is a major pain. Anyone else think so? > > Using oid's instead of names may give us some ability to fix some other > > bugs, though. > > Yes, and yes. I've been trying to nerve myself to propose that, because > it seems the only reasonable way to make rollback of RENAME TABLE and > DROP TABLE work safely. It'll be a pain in the neck for debugging and > admin purposes though. So, no more nerves needed, Tom, yeh? -:) It would be nice if someone else, not me, implement this... > Can we make some sort of usually-correct-but-not-guaranteed-correct > dump that shows which corresponds to what? Maybe something similar > to the textfile dump of pg_shadow that the postmaster uses for password > authentication? Then at least you'd have some shot at figuring out > which file was what in extremis... As it was proposed - utility to create dir with database name (in addition to dir with database oid where data really live) and symlinks there: table_name --> ../db_oid/table_oid Vadim
Vadim Mikheev <vadim@krs.ru> writes: > So, no more nerves needed, Tom, yeh? -:) > It would be nice if someone else, not me, implement this... Um, I've got more than enough on my plate already... >> Can we make some sort of usually-correct-but-not-guaranteed-correct >> dump that shows which corresponds to what? Maybe something similar >> to the textfile dump of pg_shadow that the postmaster uses for password >> authentication? Then at least you'd have some shot at figuring out >> which file was what in extremis... > As it was proposed - utility to create dir with database name > (in addition to dir with database oid where data really live) > and symlinks there: table_name --> ../db_oid/table_oid I saw your message about that after sending mine. Yes, that'd be a cool way of displaying the relationship. But the main thing to remember is that it'd only be correct at steady-state when nothing is being changed. If we tried to guarantee the mapping was correct 100% of the time, we'd be back to square one. Of course, that makes the whole thing somewhat less useful for debugging purposes, since Murphy's Law says that the times you really need to know what's what are just when the system crashed in the middle of a table rename ;-) regards, tom lane
> > Wow, that is a major pain. Anyone else think so? > > Using oid's instead of names may give us some ability to fix some other > > bugs, though. > > Yes, and yes. I've been trying to nerve myself to propose that, because > it seems the only reasonable way to make rollback of RENAME TABLE and > DROP TABLE work safely. It'll be a pain in the neck for debugging and > admin purposes though. I look at this and question the value of allowing such fancy things vs. the ability to look at the directory and know exactly what table is which file. Maybe we can use file names like 23423_mytable where 24323 is the table oid and mytable is the table name. That way, we can know the table, and they are unique too to allow RENAME TABLE to work. This doesn't solve Vadim's problem. His additional work would be to write a line to the log file for each table create/delete saying I deleted this table with this oid, and when reading back the log, he has to record the oid_username combination and use that to translate his log oids into actual filenames. In fact, doesn't that information already appear in the WAL log as part of pg_class changes? Or is the problem that the table changes happen before the pg_class is committed? > Can we make some sort of usually-correct-but-not-guaranteed-correct > dump that shows which corresponds to what? Maybe something similar > to the textfile dump of pg_shadow that the postmaster uses for password > authentication? Then at least you'd have some shot at figuring out > which file was what in extremis... That is OK, and a possible workaround if the above idea is not good. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> Moving table file to/off database dir separately is not right way for > backup/restore... > > On-line/off-line full backup utility will copy _all_ database files to > _somewhere_ (tape etc) as well as on-line transaction logs > and pg_control (to know when was the last checkpoint made). > And to restore things after disk failure administrator will > have to copy _all_ files + logs (+logs made as incremental backup) > + pg_control back and start postmaster. No, I am talking about restoring a single table without doing the entire database. If you recreate the table empty with the same structure, shutdown db, mv table restored file to data directory and restart, table not has old contents. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> As it was proposed - utility to create dir with database name > (in addition to dir with database oid where data really live) > and symlinks there: table_name --> ../db_oid/table_oid In fact, let me change what I suggested. Instead of 3434_mytable, I suggest mytable_3434 so that the tables even appear in alphabetical order in the directory. The _ may be the wrong character to separate tablename from oid. Not sure, but we may need to use something that can't be used in sql like mytable+234. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
> > Can we make some sort of usually-correct-but-not-guaranteed-correct > > dump that shows which corresponds to what? Maybe something similar > > to the textfile dump of pg_shadow that the postmaster uses for password > > authentication? Then at least you'd have some shot at figuring out > > which file was what in extremis... > > As it was proposed - utility to create dir with database name > (in addition to dir with database oid where data really live) > and symlinks there: table_name --> ../db_oid/table_oid That's interesting, but I am concerned about the extra overhead of creating two links for every file. The other issue is if the table is accidentally dropped, how do you use that utility to know the oid of the table that was removed? I guess I like the OID_tablename idea. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000 + If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania 19026
Hi all, I propose here that we stop the release of lock before end of transaction. I have been suffering from the early release of lock. Comments ? If we don't allow DDL command inside transaction block,we won't need the release before end of transaction. If we allow DDL command inside transaction block,it may be a problem. But are there any other principles which could guarantee consistency ? Regards. Hiroshi Inoue Inoue@tpf.co.jp
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > I propose here that we stop the release of lock before end of transaction. > I have been suffering from the early release of lock. Now that read and write locks don't interfere with each other, this may not be as big a performance loss as it sounds. But, do you intend to apply this principle to system tables as well as user tables? I am concerned that we will have deadlock problems if we try to do it for system tables, because practically all transactions will start out with system-table accesses, which implies grabbing a read lock on those system tables if you want to take a hard line about it. If you later need to upgrade to a higher-grade lock on any of those system tables, you've got trouble. There is another issue I've been thinking about that seems to require some amount of lock-releasing within a transaction, too. Specifically, I'd like to see the parser grab a minimal lock (AccessShareLock) on each table referenced in a query as soon as it recognizes the table name. The rewriter would also have to lock each table that it adds into the query due to rules. This would prevent problems that we have now with ALTER TABLE running in parallel with parsing/planning of a query. But, many queries require more than AccessShareLock on their tables. If we simply try to grab the higher-grade locks without releasing AccessShareLock, we will certainly suffer deadlock. If anyone's having a hard time seeing why lock upgrade is dangerous, consider two backends trying at about the same time to doBEGIN; LOCK TABLE foo; etc etc since this can happen:Backend A's parser recognizes 'foo', grabs AccessShareLock on fooBackend B's parser recognizes 'foo',grabs AccessShareLock on fooBackend A's executor tries to get AccessExclusiveLock on foo, must wait for BBackendB's executor tries to get AccessExclusiveLock on foo, must wait for A So I think the real solution must go something like this: * Parser and rewriter grab AccessShareLock on each table as it is added to the query. * At start of planner, all tables and required access rights are known. Release AccessShareLocks, then grab required lock levels on each table. We probably want to error out if any DDL alteration has actually occurred to any of the tables by the time we re-acquire its lock. An easy improvement on this is to avoid the drop/grab if AccessShareLock is the only thing needed on each table (as in a SELECT). We could further try to extend the parser so that it grabs a sufficient lock on each table initially --- that's probably easy enough for INSERT target tables and so forth, but we cannot guarantee that it will be possible in every case. (Consider rule rewrites that add actions not present in the initial query.) Can you see a way to solve this problem without dropping/grabbing locks? > If we don't allow DDL command inside transaction block,we won't need > the release before end of transaction. > If we allow DDL command inside transaction block,it may be a problem. > But are there any other principles which could guarantee consistency ? I certainly do not wish to give up the goal of supporting DDL statements inside transactions. What problems do you foresee? regards, tom lane
> -----Original Message----- > From: owner-pgsql-hackers@postgreSQL.org > [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Tom Lane > > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > > I propose here that we stop the release of lock before end of > transaction. > > I have been suffering from the early release of lock. > > Now that read and write locks don't interfere with each other, this may > not be as big a performance loss as it sounds. > > But, do you intend to apply this principle to system tables as well as > user tables? Yes I said about only system tables. Isn't an early release of lock for user tables is already a bug ?(except AccessShareLock). > I am concerned that we will have deadlock problems if we > try to do it for system tables, because practically all transactions > will start out with system-table accesses, which implies grabbing a read > lock on those system tables if you want to take a hard line about it. > If you later need to upgrade to a higher-grade lock on any of those > system tables, you've got trouble. > Sorry,my target is only executor stage this time and AccessShareLock is an exception. As for parser/planner stage,it needs further consideration and I don't have any solution yet. SPI already has an ability to prepare plans and executor could use them in other transactions. We have to draw a clear line between executor and parse/planner. Probably plan invalidation mecahnism will be needed and we may have to put plans on shared memory to realize it. > > > If we don't allow DDL command inside transaction block,we won't need > > the release before end of transaction. > > If we allow DDL command inside transaction block,it may be a problem. > > But are there any other principles which could guarantee consistency ? > > I certainly do not wish to give up the goal of supporting DDL statements > inside transactions. What problems do you foresee? > I may be the first man that threw a question about DDL commands inside transactions. I'm stiil suspicious about the possibility. I have thought about the following. I think they should be considered even in case of DDL commands *outside* transactions. 1) The biggest obstacle for me is this early release of lock(including parser/planner handling). Without a solution forthis I couldn't see any consistency for system tuples. 2) The implementation of row level share locking. 3) The naming of relation files which has been discussed in this thread. 4) The lack of read consistency in DDL statement though I couldn't tell concretely what's wrong with it. Regards. Hiroshi Inoue Inoue@tpf.co.jp
> I propose here that we stop the release of lock before end of transaction. > I have been suffering from the early release of lock. > > Comments ? If there's no objection,I would change UnlockRelation() to not release the specified lock except AccessShareLock. Regards. Hiroshi Inoue Inoue@tpf.co.jp
Hiroshi Inoue wrote: > > > I propose here that we stop the release of lock before end of transaction. > > I have been suffering from the early release of lock. > > > > Comments ? > > If there's no objection,I would change UnlockRelation() to not release > the specified lock except AccessShareLock. Why don't remove this call from improper places? I would try to find all calls and understand why they made... Vadim
Vadim Mikheev wrote: > Hiroshi Inoue wrote: > > > > > I propose here that we stop the release of lock before end of transaction. > > > I have been suffering from the early release of lock. > > > > > > Comments ? > > > > If there's no objection,I would change UnlockRelation() to not release > > the specified lock except AccessShareLock. > > Why don't remove this call from improper places? > I would try to find all calls and understand why > they made... > I think UnlockRelation() is unnecessary Oracle doesn't have > > Vadim
Vadim Mikheev wrote: > Hiroshi Inoue wrote: > > > > > > > > > > If there's no objection,I would change UnlockRelation() to not release > > > > the specified lock except AccessShareLock. > > > > > > Why don't remove this call from improper places? > > > I would try to find all calls and understand why > > > they made... > > > > > > > I was surprized that few people really want DDL commands inside transactions. > > Are there any reasons to releasing lock before end of transaction except > > that long term lock for system tuples is not preferable ? > > > > I think that UnlockRelation() is unnecessary fundamentally. > > Mine is the simplest way to achieve this. > > If there's no problem,I am glad to remove UnlockRelation() calls. > > There are! I finally found where I used UnlockRelation() - > in execUtils.c:ExecCloseIndices(). Please read comments in > ExecOpenIndices() where LockRelation() is called... I see it now. Hmm,index itself doesn't have its time qualification and is out of transaction control(at least now). OK,I would examine it one by one. Regards. Hiroshi Inoue Inoue@tpf.co.jp
Hiroshi Inoue wrote: > > > > > > > If there's no objection,I would change UnlockRelation() to not release > > > the specified lock except AccessShareLock. > > > > Why don't remove this call from improper places? > > I would try to find all calls and understand why > > they made... > > > > I was surprized that few people really want DDL commands inside transactions. > Are there any reasons to releasing lock before end of transaction except > that long term lock for system tuples is not preferable ? > > I think that UnlockRelation() is unnecessary fundamentally. > Mine is the simplest way to achieve this. > If there's no problem,I am glad to remove UnlockRelation() calls. There are! I finally found where I used UnlockRelation() - in execUtils.c:ExecCloseIndices(). Please read comments in ExecOpenIndices() where LockRelation() is called... Vadim
Hiroshi Inoue wrote: > > > Why don't remove this call from improper places? > > I would try to find all calls and understand why > > they made... > > > > I think UnlockRelation() is unnecessary > > Oracle doesn't have And we havn't UNLOCK TABLE _command_ as well -:) But func call is internal thing and I don't know Oracle internals. If this call is unnecessary - get rid of it at all... Vadim
Oops sorry,I sent a draft by mistake. Vadim Mikheev wrote: > Hiroshi Inoue wrote: > > > > > I propose here that we stop the release of lock before end of transaction. > > > I have been suffering from the early release of lock. > > > > > > Comments ? > > > > If there's no objection,I would change UnlockRelation() to not release > > the specified lock except AccessShareLock. > > Why don't remove this call from improper places? > I would try to find all calls and understand why > they made... > I was surprized that few people really want DDL commands inside transactions. Are there any reasons to releasing lock before end of transaction except that long term lock for system tuples is not preferable ? I think that UnlockRelation() is unnecessary fundamentally. Mine is the simplest way to achieve this. If there's no problem,I am glad to remove UnlockRelation() calls. Regards. Hiroshi Inoue Inoue@tpf.co.jp
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: >> I propose here that we stop the release of lock before end of transaction. >> I have been suffering from the early release of lock. > If there's no objection,I would change UnlockRelation() to not release > the specified lock except AccessShareLock. After thinking about this some more, I am not convinced that it will buy anything --- and it might possibly break things that work now. The reason I'm not convinced about it is that we cannot apply the "don't release locks till end of transaction" rule perfectly uniformly. You already propose not to treat AccessShareLock that way, and Vadim seems to think there will be other cases where we need to break the rule. So we won't have a theoretically-clean situation anyway, and will have to look at things case by case. Can you give specific examples of cases that will be fixed? For the most part I believe that we effectively protect updates to system-table rows by holding AccessExclusiveLock on the associated user relation. Locking the system table is just a means of preventing VACUUM from running concurrently on the system table (and possibly moving the tuple we want to update/delete). So I think releasing the system-table lock is OK as long as we hold the user table lock till end of transaction. VACUUM works fine with uncommitted tuples --- maybe we should turn off its warning about them, at least in system relations? There might be places where we are failing to hold user table locks long enough, but those are just localized bugs and ought to be treated that way. In any case, I do not think it's a good idea to put such a quick hack in UnlockRelation(). UnlockRelation() should do what it's told. If we want to do this, we should go around and change the heap_close() calls to specify NoLock instead of whatever locks they specify now. regards, tom lane
> -----Original Message----- > From: owner-pgsql-hackers@postgresql.org > [mailto:owner-pgsql-hackers@postgresql.org]On Behalf Of Tom Lane > > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > >> I propose here that we stop the release of lock before end of > transaction. > >> I have been suffering from the early release of lock. > > > If there's no objection,I would change UnlockRelation() to not release > > the specified lock except AccessShareLock. > > After thinking about this some more, I am not convinced that it will buy > anything --- and it might possibly break things that work now. The > reason I'm not convinced about it is that we cannot apply the "don't > release locks till end of transaction" rule perfectly uniformly. You Why are we allowed to break 2 phase locking for system tables ? Isn't 2 phase locking a fundamental principle to preserve consistency ? If there's another principle to rely on,please teach me. > already propose not to treat AccessShareLock that way, and Vadim seems It seems to me that AccessShare(Exclusive)Lock is essentailly for VACUUM. There's a possibilty to remove AccessExclusiveLock except for VACUUM. Oracle has neither. If AccessExclusiveLock is limited to VACUUM,AccessShareLock could be hold until end of transaction. > to think there will be other cases where we need to break the rule. > So we won't have a theoretically-clean situation anyway, and will have > to look at things case by case. > OK case by case. I will be glad to check them one by one. In fact,I have already excluded LockPage() because it is not used for transaction control. But I have thought that the main purpose of early release of lock is to avoid long term lock for system tables. Have I misunderstood until now ? > Can you give specific examples of cases that will be fixed? > Unfortunately no example now. > For the most part I believe that we effectively protect updates to > system-table rows by holding AccessExclusiveLock on the associated user There are system-table rows which don't have the associated user relations and there are many DDL commands except VACUUM. We have to preserve consistency for system tuples among themselves, don't we ? > relation. Locking the system table is just a means of preventing VACUUM > from running concurrently on the system table (and possibly moving the > tuple we want to update/delete). So I think releasing the system-table > lock is OK as long as we hold the user table lock till end of This is needed for DDL command inside transactions,isn't it ? But isn't walking on such a tightrope wasteful in order to realize DDL command inside transactions either ? Anyway,I want a decision here. I have already done a wasteful work in current spec about "can neither drop nor create" bug. Regards. Hiroshi Inoue Inoue@tpf.co.jp
> > -----Original Message----- > > From: owner-pgsql-hackers@postgresql.org > > [mailto:owner-pgsql-hackers@postgresql.org]On Behalf Of Tom Lane > > > > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > > >> I propose here that we stop the release of lock before end of > > transaction. > > >> I have been suffering from the early release of lock. > > > > > If there's no objection,I would change UnlockRelation() to > not release > > > the specified lock except AccessShareLock. > > > > After thinking about this some more, I am not convinced that it will buy > > anything --- and it might possibly break things that work now. The > > reason I'm not convinced about it is that we cannot apply the "don't > > release locks till end of transaction" rule perfectly uniformly. You > > Why are we allowed to break 2 phase locking for system tables ? > Isn't 2 phase locking a fundamental principle to preserve consistency ? > If there's another principle to rely on,please teach me. > > > already propose not to treat AccessShareLock that way, and Vadim seems > > to think there will be other cases where we need to break the rule. > > So we won't have a theoretically-clean situation anyway, and will have > > to look at things case by case. > > > > OK case by case. I will be glad to check them one by one. I'm checking them for AccessExclusiveLock now. As for RowExclusiveLock,it would be much effective to remove the release of lock unconditionally. RowExclusiveLock isn't so intensive a lock. For example it doesn't conflict relatively. Therefore it would be hard to find and resolve the bugs which are caused by the early release of RowExclusiveLock,deadlock etc ... Holding the lock a little longer won't be so harmful inversely. Comments ? As for AccessExclusiveLock I found followings now. 1) commands/user.c(CREATE/DROP/ALTER USER) I could create same 2 users easily. The lock should be held till end of trancaction. 2) commands/cluster.c(CLUSTER) It isn't properly implemented. It seems almost impossible to implement CLUSTER commandproperly in current spec. 3) commands/dbcommands.c(DROP DATABASE) elog(ERROR) follows immediately. The release should be removed. 4) commands/sequence.c(CREATE SEQUNECE) The lock is for the being created (sequence) relation. Holding the lock tillend of transaction has no problem. 5) commands/vacuum.c(VACUUM) The release is caused by new security check. Probably the check could be done before acquiringAccessExcl- usiveLock. 6) commands/commands.c(ALTER TABLE) ALTER TABLE doesn't release AccessExclusiveLock till end of transaction. I couldn'tfind any reason to allow the release for inheritors of a relation class. The release should be removed. 7) commands/async.c(LISTEN/UNLISTEN) This case seems dangerous too but unfortunately I couldn't point out a concreteflaw now. Regards. Hiroshi Inoue Inoue@tpf.co.jp
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: >> OK case by case. I will be glad to check them one by one. > I'm checking them for AccessExclusiveLock now. > As for RowExclusiveLock,it would be much effective to remove > the release of lock unconditionally. > RowExclusiveLock isn't so intensive a lock. For example it > doesn't conflict relatively. Therefore it would be hard to find > and resolve the bugs which are caused by the early release > of RowExclusiveLock,deadlock etc ... > Holding the lock a little longer won't be so harmful inversely. We could try it and see, certainly. You are probably right that it would not be harmful to hold it. > As for AccessExclusiveLock I found followings now. > 3) commands/dbcommands.c(DROP DATABASE) > elog(ERROR) follows immediately. The release should > be removed. > 5) commands/vacuum.c(VACUUM) > The release is caused by new security check. Probably > the check could be done before acquiring AccessExcl- > usiveLock. In both of these cases, we are closing the system table unmodified, and AFAICT the point is just to release the lock a tad sooner than we otherwise would. (I coded the VACUUM code just like code I'd seen elsewhere.) It's probably harmless either way. > 6) commands/commands.c(ALTER TABLE) > ALTER TABLE doesn't release AccessExclusiveLock till > end of transaction. I couldn't find any reason to allow > the release for inheritors of a relation class. The release > should be removed. Yes, the recursive subroutine will grab the same lock and not release it. I'm not sure why it's coded the way it is, but certainly there's no benefit to releasing the lock earlier at the outer level. > 7) commands/async.c(LISTEN/UNLISTEN) > This case seems dangerous too but unfortunately I couldn't > point out a concrete flaw now. Holding the lock on pg_listener longer than absolutely necessary strikes me as very risky, since any other backend might need to grab the lock before it can complete its own transaction (in order to send or receive notifies). Deadlock could ensue depending on what other locks are held. I think the locking logic for pg_listener was last revised for 6.4 (by me). It seems to work fine as-is, but it doesn't know anything about MVCC. It is possible that we could downgrade the locks to RowExclusiveLock and rely on MVCC semantics instead of a hard lock. This would require careful study however, in particular to be sure that cross-backend notifies couldn't be missed because of not-yet-committed tuples. I haven't had time to think about it... regards, tom lane
> -----Original Message----- > From: owner-pgsql-hackers@postgreSQL.org > [mailto:owner-pgsql-hackers@postgreSQL.org]On Behalf Of Tom Lane > > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > >> OK case by case. I will be glad to check them one by one. > > > I'm checking them for AccessExclusiveLock now. > > > 7) commands/async.c(LISTEN/UNLISTEN) > > This case seems dangerous too but unfortunately I couldn't > > point out a concrete flaw now. > > Holding the lock on pg_listener longer than absolutely necessary strikes > me as very risky, since any other backend might need to grab the lock > before it can complete its own transaction (in order to send or receive > notifies). Deadlock could ensue depending on what other locks are held. > It's difficult for me to find a flaw for this case. There aren't so many conflicts. For example,LISTEN/UNLISTEN never conflict relatively because they could be issued only for its own backend. And as you say,it's very bad to hold the lock till end of transaction in LISTEN/UNLISTEN. Row level locking in MVCC may allow another(RowExclusiveLock?) lock instead of AccessExclusiveLock. I'm not sure now. Regards. Hiroshi Inoue Inoue@tpf.co.jp
is there a PostgreSQL in win98 or win95?
Chris Ian Capon Fiel wrote: > > is there a PostgreSQL in win98 or win95? http://www.postgresql.org/docs/pgsql/doc/README.NT The NT port will, AFAIK, run on any Win32 implementation, as long as you have the Cygwin stuff loaded (talked about in the README.NT file....). -- Lamar Owen WGCR Internet Radio 1 Peter 4:11