Обсуждение: lock level for DETACH PARTITION looks sketchy
While working on the problem of lowering lock levels for ATTACH PARTITION and DETACH PARTITION, I discovered that DETACH PARTITION takes only AccessShareLock on the table being detached, which I think is not good. It seems to me that at a minimum it needs to take a self-exclusive lock, because otherwise you can get things like this, which we ordinarily try to avoid: S1: BEGIN; S1: ALTER TABLE tab DETACH PARTITION tab2; S2: alter table tab2 set (fillfactor = 90); S1: COMMIT; S2: ERROR: tuple concurrently updated I'm not 100% certain that ShareUpdateExclusiveLock is actually a strong enough lock level, but I think that the right answer can't be any weaker than that. So I propose to increase the level to that value and back-patch. This looks to be a bug in the original table partitioning commit (f0e44751d7175fa3394da2c8f85e3ceb3cdbfe63). I noticed another thing that looks odd as well. ATExecDetachPartition does this: idx = index_open(idxid, AccessExclusiveLock); IndexSetParentIndex(idx, InvalidOid); update_relispartition(classRel, idxid, false); relation_close(idx, AccessExclusiveLock); That last line is unlike what we do nearly everywhere else in that it releases a lock on a user relation before commit time. I don't know whether that has any bad consequences, but it means that some other backend could obtain a lock on that relation before we've queued any invalidation messages associated with the change, which seems possibly dangerous. A minor nitpick is that the last like there should probably use index_close() to match the index_open() a few lines above. That code happens to be identical to the code for relation_open(), but it's probably better to use the matching function. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2018-Dec-19, Robert Haas wrote: > While working on the problem of lowering lock levels for ATTACH > PARTITION and DETACH PARTITION, I discovered that DETACH PARTITION > takes only AccessShareLock on the table being detached, which I think > is not good. Oh, I remember eyeing that suspiciously, but was too distracted on making the other thing work to realize it was actually wrong :-( I agree that it's wrong. > I noticed another thing that looks odd as well. ATExecDetachPartition > does this: > > idx = index_open(idxid, AccessExclusiveLock); > IndexSetParentIndex(idx, InvalidOid); > update_relispartition(classRel, idxid, false); > relation_close(idx, AccessExclusiveLock); > > That last line is unlike what we do nearly everywhere else in that it > releases a lock on a user relation before commit time. I don't know > whether that has any bad consequences, but it means that some other > backend could obtain a lock on that relation before we've queued any > invalidation messages associated with the change, which seems possibly > dangerous. Oh yeah, my bug -- at some point I had an XXX comment or several about fixing lock levels at release time everywhere, but evidently this one slipped through. It definitely should be NoLock. Will fix. > A minor nitpick is that the last like there should probably use > index_close() to match the index_open() a few lines above. That code > happens to be identical to the code for relation_open(), but it's > probably better to use the matching function. No objection ... will change too. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Wed, Dec 19, 2018 at 2:44 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > Oh, I remember eyeing that suspiciously, but was too distracted on > making the other thing work to realize it was actually wrong :-( > I agree that it's wrong. OK, cool. If you're going to push a fix for the other changes, do you want to do this one too, or should I fix it separately? -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2018-Dec-19, Robert Haas wrote: > On Wed, Dec 19, 2018 at 2:44 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > Oh, I remember eyeing that suspiciously, but was too distracted on > > making the other thing work to realize it was actually wrong :-( > > I agree that it's wrong. > > OK, cool. If you're going to push a fix for the other changes, do you > want to do this one too, or should I fix it separately? I can patch that one too, but it's separate -- it goes back to pg10 I think (the other to pg11) -- and let's think about the lock mode for a bit: as far as I can see, ShareUpdateExclusive is enough; the difference with AccessExclusive is that it lets through modes AccessShare, RowShare and RowExclusive. According to mvcc.sgml, that means we're letting SELECT, SELECT FOR SHARE/UPDATE and INS/UPD/DEL in the partition being detached, respectively. All those seem acceptable to me. All DDL is blocked, of course. So right now if you insert via the parent table, detaching the partition would be blocked because we also acquire lock on the parent (and detach acquires AccessExclusive on the parent). But after DETACH CONCURRENTLY, inserting via the parent should let rows to the partition through, because there's no conflicting lock to stop them. Inserting rows to the partition directly would be let through both before and after DETACH CONCURRENTLY. One note about DETACH CONCURRENTLY: detaching the index from parent now uses AccessExclusiveLock (the block I just patched). For concurrent detach that won't work, so we'll need to reduce that lock level too. Not sure offhand if there are any problems with that. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Dec 20, 2018 at 9:29 AM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > I can patch that one too, but it's separate -- it goes back to pg10 I > think (the other to pg11) -- and let's think about the lock mode for a > bit: as far as I can see, ShareUpdateExclusive is enough; the difference > with AccessExclusive is that it lets through modes AccessShare, RowShare > and RowExclusive. According to mvcc.sgml, that means we're letting > SELECT, SELECT FOR SHARE/UPDATE and INS/UPD/DEL in the partition being > detached, respectively. All those seem acceptable to me. All DDL is > blocked, of course. One problem about which I thought is the partition check constraint. When we attach, we need to validate that the check constraint holds of every row in the partition, which means that we need to keep new rows from being added while we're attaching. But when we detach, there's no corresponding problem. If we detach a leaf partition, the check constraint just disappears; if we detach an intermediate partitioned table, the check constraint loses some clauses. Either way, there's no way for a row that was previously acceptable to become unacceptable after the detach operation. There is, however, the possibility that the invalidation messages generated by the detach operation might not get processed immediately by other backends, so those backends might continue to enforce the partition constraint for some period of time after it changes. That seems OK. There would be trouble, though, if we freed the old copy of the partition constraint during a relcache rebuild while somebody still had a pointer to it. I'm not sure that can happen, though. > So right now if you insert via the parent table, detaching the partition > would be blocked because we also acquire lock on the parent (and detach > acquires AccessExclusive on the parent). But after DETACH CONCURRENTLY, > inserting via the parent should let rows to the partition through, > because there's no conflicting lock to stop them. Inserting rows to the > partition directly would be let through both before and after DETACH > CONCURRENTLY. Yeah, that's worrisome. Initially, I thought it was flat-out insane, because if somebody modifies that table in some way that makes it unsuitable as an insertion target for the existing stream of tuples -- adding constraints, changing column definitions, attaching to a new parent, dropping -- then we'd be in big trouble. Then I thought that maybe it's OK, because all of those operations take AccessExclusiveLock and therefore even if the detach goes through without a lock, the subsequent change that is needed to create a problem can't. I can't say that I'm 100% convinced that's bulletproof, though. Suppose for example that a single session opens a cursor, then does the detach, then does something else, then tries to read more from the cursor. Now the locks don't conflict, but we're still OK (I think) because those operations should all CheckTableNotInUse(). But if we try to lower the lock level for some other things in the future, we might open up other problems here, and it's unclear to me what guiding principles we can rely on here to keep us out of trouble. > One note about DETACH CONCURRENTLY: detaching the index from parent now > uses AccessExclusiveLock (the block I just patched). For concurrent > detach that won't work, so we'll need to reduce that lock level too. Not > sure offhand if there are any problems with that. Me neither. I'm pretty sure that you need at least ShareUpdateExclusiveLock, but I don't know whether you need AccessExclusiveLock. Here again, the biggest risk I can see is the sinval race - nothing forces the invalidation messages to be read before opening the relation for read or write. I think it would be nice if it ended up that the lock level is the same for the table and the indexes, because otherwise the legal sequences of operations vary depending on whether you've got a table with inherited indexes or one without, and that multiplies the difficulty of testing. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2018-Dec-20, Robert Haas wrote: > One problem about which I thought is the partition check constraint. > When we attach, we need to validate that the check constraint holds of > every row in the partition, which means that we need to keep new rows > from being added while we're attaching. But when we detach, there's > no corresponding problem. If we detach a leaf partition, the check > constraint just disappears; if we detach an intermediate partitioned > table, the check constraint loses some clauses. Either way, there's > no way for a row that was previously acceptable to become unacceptable > after the detach operation. There is, however, the possibility that > the invalidation messages generated by the detach operation might not > get processed immediately by other backends, so those backends might > continue to enforce the partition constraint for some period of time > after it changes. That seems OK. Check. > There would be trouble, though, if > we freed the old copy of the partition constraint during a relcache > rebuild while somebody still had a pointer to it. I'm not sure that > can happen, though. (Reading the pg10 source) AFAICS the partition constraint is stored in resultRelInfo->ri_PartitionCheck, not affected by relcache invals, so it seems fine. We also read a copy of it at plan time for constraint exclusion purposes, similarly not affected by relcache inval. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018-Dec-19, Robert Haas wrote: > On Wed, Dec 19, 2018 at 2:44 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > Oh, I remember eyeing that suspiciously, but was too distracted on > > making the other thing work to realize it was actually wrong :-( > > I agree that it's wrong. > > OK, cool. If you're going to push a fix for the other changes, do you > want to do this one too, or should I fix it separately? Pushed now. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Dec 20, 2018 at 2:45 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > OK, cool. If you're going to push a fix for the other changes, do you > > want to do this one too, or should I fix it separately? > > Pushed now. Thanks. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
I think what prompted the lock to be AccessShareLock for the child rel in the first place is the fact that ATExecDropInherit() (ALTER TABLE NO INHERIT) uses AccessShare for the *parent* relation. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On Thu, Dec 20, 2018 at 3:58 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > I think what prompted the lock to be AccessShareLock for the child rel > in the first place is the fact that ATExecDropInherit() (ALTER TABLE NO > INHERIT) uses AccessShare for the *parent* relation. Seems like apples and oranges, and also maybe not that safe. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
On 2018-Dec-20, Robert Haas wrote: > On Thu, Dec 20, 2018 at 3:58 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: > > I think what prompted the lock to be AccessShareLock for the child rel > > in the first place is the fact that ATExecDropInherit() (ALTER TABLE NO > > INHERIT) uses AccessShare for the *parent* relation. > > Seems like apples and oranges, Yes, but I can understand someone looking at ATExecDropInherit while writing ATExecDetachRelation and thinking "ah, I have to grab AccessShareLock on the other relation" without stopping to think in what direction the parenthood between the rels goes. > and also maybe not that safe. I think it's strange, but I'm not interested in analyzing that at this time. Its comment do say that DROP TABLE (of the child, I surmise) does not acquire *any* lock on the parent, which is also strange. -- Álvaro Herrera https://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
On 2018/12/21 6:07, Alvaro Herrera wrote: > On 2018-Dec-20, Robert Haas wrote: > >> On Thu, Dec 20, 2018 at 3:58 PM Alvaro Herrera <alvherre@2ndquadrant.com> wrote: >>> I think what prompted the lock to be AccessShareLock for the child rel >>> in the first place is the fact that ATExecDropInherit() (ALTER TABLE NO >>> INHERIT) uses AccessShare for the *parent* relation. >> >> Seems like apples and oranges, > > Yes, but I can understand someone looking at ATExecDropInherit while > writing ATExecDetachRelation and thinking "ah, I have to grab > AccessShareLock on the other relation" without stopping to think in what > direction the parenthood between the rels goes. That was definitely wrong. Partition's schema changes when it's detached, whereas a (regular) inheritance parent's doesn't when one of its children is removed. >> and also maybe not that safe. > > I think it's strange, but I'm not interested in analyzing that at this > time. Its comment do say that DROP TABLE (of the child, I surmise) does > not acquire *any* lock on the parent, which is also strange. Per comments in ATExecDropInherit, the reason we lock parent with AccessShareLock in the DROP INHERIT case is that RemoveInheritance inspects parent's schema to see which of the child's columns to mark as having one less parent. DROP TABLE child doesn't need to do that as you can obviously imagine why. Thank you both for working on this. Thanks, Amit