Обсуждение: a modest improvement to get_object_address()
I'd like to propose the attached patch, which changes get_object_address() in a manner similar to what we did in RangeVarGetRelid() in commit 4240e429d0c2d889d0cda23c618f94e12c13ade7. The basic idea is that, if we look up an object name, acquire the corresponding lock, and then find that the object was dropped during the lock wait, we retry the whole operation instead of emitting a baffling error message. Example: rhaas=# create schema x; CREATE SCHEMA rhaas=# begin; BEGIN rhaas=# drop schema x; DROP SCHEMA Then, in another session: rhaas=# comment on schema x is 'doodle'; Then, in the first session: rhaas=# commit; COMMIT At this point, the first session must error out. The current code produces this: ERROR: cache lookup failed for class 2615 object 16386 subobj 0 With the attached patch, you instead get: ERROR: schema "x" does not exist ...which is obviously quite a bit nicer. Also, if the concurrent transaction drops and creates the schema instead of just dropping it, the new code will allow the operation to succeed (with the expected results) rather than failing. Objections? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Вложения
2011/11/9 Robert Haas <robertmhaas@gmail.com>: > I'd like to propose the attached patch, which changes > get_object_address() in a manner similar to what we did in > RangeVarGetRelid() in commit 4240e429d0c2d889d0cda23c618f94e12c13ade7. > The basic idea is that, if we look up an object name, acquire the > corresponding lock, and then find that the object was dropped during > the lock wait, we retry the whole operation instead of emitting a > baffling error message. Example: > > rhaas=# create schema x; > CREATE SCHEMA > rhaas=# begin; > BEGIN > rhaas=# drop schema x; > DROP SCHEMA > > Then, in another session: > > rhaas=# comment on schema x is 'doodle'; > > Then, in the first session: > > rhaas=# commit; > COMMIT > > At this point, the first session must error out. The current code > produces this: > > ERROR: cache lookup failed for class 2615 object 16386 subobj 0 > > With the attached patch, you instead get: > > ERROR: schema "x" does not exist > > ...which is obviously quite a bit nicer. > > Also, if the concurrent transaction drops and creates the schema > instead of just dropping it, the new code will allow the operation to > succeed (with the expected results) rather than failing. > > Objections? Maybe I miss something but: The ERROR message is misleading: the schema 'x' does exist. And also why a drop schema would fail and a drop+create would success ?! > > -- > Robert Haas > EnterpriseDB: http://www.enterprisedb.com > The Enterprise PostgreSQL Company > > > -- > Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) > To make changes to your subscription: > http://www.postgresql.org/mailpref/pgsql-hackers > > -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation
On Wed, Nov 9, 2011 at 8:37 AM, Cédric Villemain <cedric.villemain.debian@gmail.com> wrote: > Maybe I miss something but: > The ERROR message is misleading: the schema 'x' does exist. No, it doesn't. The concurrent transaction has dropped it. > And also > why a drop schema would fail and a drop+create would success ?! Because you can't comment on a schema that doesn't exist any more, but you can comment on one that does. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > I'd like to propose the attached patch, which changes > get_object_address() in a manner similar to what we did in > RangeVarGetRelid() in commit 4240e429d0c2d889d0cda23c618f94e12c13ade7. I would think you need to drop the now-useless lock, and I sure hope that RangeVarGetRelid does likewise. regards, tom lane
On Wed, Nov 9, 2011 at 9:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> I'd like to propose the attached patch, which changes >> get_object_address() in a manner similar to what we did in >> RangeVarGetRelid() in commit 4240e429d0c2d889d0cda23c618f94e12c13ade7. > > I would think you need to drop the now-useless lock, and I sure hope > that RangeVarGetRelid does likewise. It doesn't currently. The now-useless lock doesn't really hurt anything, aside from taking up space in the lock table. But we can certainly make it (and this) do that, if you think it's worth the extra lines of code. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > On Wed, Nov 9, 2011 at 9:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I would think you need to drop the now-useless lock, and I sure hope >> that RangeVarGetRelid does likewise. > It doesn't currently. The now-useless lock doesn't really hurt > anything, aside from taking up space in the lock table. Well, there are corner cases where the object OID gets reused during the lifetime of the transaction, and then the lock *does* do something (and what it does would be bad). But taking up extra space in the finite-size lock table is sufficient reason IMO to drop the lock. It's not like these are performance-critical code paths. regards, tom lane
2011/11/9 Robert Haas <robertmhaas@gmail.com>: > On Wed, Nov 9, 2011 at 8:37 AM, Cédric Villemain > <cedric.villemain.debian@gmail.com> wrote: >> Maybe I miss something but: I read that the error was produced by first session and didn't check carefuly (it fails silently in 9.0! and 'works' as expected in 9.1) No objection, but I would like to still be able to diagnose the same things as in the past, can you make it clear that the schema/object just disappear ? (else we don't know if the relation just never exists or was drop while we were waiting) -- Cédric Villemain +33 (0)6 20 30 22 52 http://2ndQuadrant.fr/ PostgreSQL: Support 24x7 - Développement, Expertise et Formation
On Wed, Nov 9, 2011 at 10:33 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: >> On Wed, Nov 9, 2011 at 9:54 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> I would think you need to drop the now-useless lock, and I sure hope >>> that RangeVarGetRelid does likewise. > >> It doesn't currently. The now-useless lock doesn't really hurt >> anything, aside from taking up space in the lock table. > > Well, there are corner cases where the object OID gets reused during > the lifetime of the transaction, and then the lock *does* do something > (and what it does would be bad). But taking up extra space in the > finite-size lock table is sufficient reason IMO to drop the lock. > It's not like these are performance-critical code paths. OK. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On Wed, Nov 9, 2011 at 10:50 AM, Cédric Villemain <cedric.villemain.debian@gmail.com> wrote: > 2011/11/9 Robert Haas <robertmhaas@gmail.com>: >> On Wed, Nov 9, 2011 at 8:37 AM, Cédric Villemain >> <cedric.villemain.debian@gmail.com> wrote: >>> Maybe I miss something but: > > I read that the error was produced by first session and didn't check > carefuly (it fails silently in 9.0! and 'works' as expected in 9.1) > > No objection, but I would like to still be able to diagnose the same > things as in the past, can you make it clear that the schema/object > just disappear ? (else we don't know if the relation just never exists > or was drop while we were waiting) I don't see a clean way to do that, and I'm not convinced it's a good idea anyway. I think that if we start generating different error messages based on whether or not a lock wait was involved at some point in the operation, we're going to drive ourselves nuts. There are a lot of places where that can happen. e.g. Suppose that you have a table with a unique index on column a. Transaction A deletes the tuple where a = 1. Transaction B attempts to insert a new tuple with a = 1, and blocks. Now if A commits, B will succeed, but if A rolls back, B will abort. Had transaction A not existed, B would simply abort at once. But the error message will not indicate which of the two it was, and I don't thinkit needs to. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes: > e.g. Suppose that you have a table with a unique index on column a. > Transaction A deletes the tuple where a = 1. Transaction B attempts to That's DML, I agree with you there, no need. In DML we have MVCC. Back to the problem you raised, it's DDL and we're sitting in between SnapshotNow and catalog cache entries. Not so comfy. I would guess that the problem (I confess didn't read carefully enough) happens after having done a cache lookup when trying to use its result? Could we check the object still exists as part of the cache lookup, or would that mean we don't have a cache anymore? Or is the answer related to consuming invalidation messages before returning a stale entry from the cache? Regards, -- Dimitri Fontaine http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On Wed, Nov 9, 2011 at 3:40 PM, Dimitri Fontaine <dimitri@2ndquadrant.fr> wrote: > Back to the problem you raised, it's DDL and we're sitting in between > SnapshotNow and catalog cache entries. Not so comfy. I would guess > that the problem (I confess didn't read carefully enough) happens after > having done a cache lookup when trying to use its result? There's a test case in the original post, but yes, the problem happens when something changes between the time you do the catcache lookup and the time you acquire the lock. This is not a new problem; I'm just trying to give a more intelligible error message - and avoid unnecessary failures, as in the case where two concurrent DROP IF EXISTS operations target the same object and one of them unnecessarily rolls back. > Could we check the object still exists as part of the cache lookup, or > would that mean we don't have a cache anymore? Or is the answer related > to consuming invalidation messages before returning a stale entry from > the cache? All of that is way beyond the scope of what I'm doing here. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company