Discussion: improving foreign key locks
Hi,

So I've been working on improving locks for foreign key checks, as discussed in a thread started by Joel Jacobson a while ago. I've posted about this:

http://www.commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks/
http://www.commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks_part_2/

(Note [1] below.)

There's a question that arose in internal CMD discussion, which is: is there a use case for keeping SELECT FOR SHARE locks, with the semantics that they currently have, if we replace the FK checks with some other lock implementation?

I've been assuming that we would keep FOR SHARE, because it's a feature that's been exposed to users directly as a SQL command for five releases, and so presumably someone may be depending on it. So the new code would be triggered by a different SQL option, and it would need to work in conjunction with FOR SHARE. Now, if the majority opinion here is that we don't need to keep the current FOR SHARE semantics, a few things would be different.

Thoughts on keeping vs. removing FOR SHARE?

I will be posting more extensively on the implementation of this on this list, later.

[1] The blog posts say that FOR SHARE would conflict with FOR KEY LOCK, but I'm having second thoughts about this for various reasons; so they will not conflict (in other words, transaction A can take a FOR SHARE lock on a tuple, transaction B can take FOR KEY LOCK on the same tuple, and both can continue). Please consider this if you want to comment on the design presented in those articles.

-- 
Álvaro Herrera <alvherre@alvh.no-ip.org>
On Nov25, 2010, at 23:01 , Alvaro Herrera wrote:
> So I've been working on improving locks for foreign key checks, as
> discussed in a thread started by Joel Jacobson a while ago. I've posted
> about this:
> http://www.commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks/
> http://www.commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks_part_2/

To me, the whole thing seems to be a special case of allowing not only whole tuples to be locked FOR UPDATE or FOR SHARE, but also individual fields or sets of fields. Except that for simplicity, only two sets are supported, which are

A) All fields
B) All fields which are included in some unique constraint, including primary keys.

I'd therefore suggest extending the FOR SHARE / FOR UPDATE syntax to

SELECT FOR { SHARE | UPDATE } [ OF <table1>[.<field1>], ... ]

and obtaining what you call a "KEY LOCK" if (for a particular table) the set of fields is a subset of (B). Otherwise, we'd obtain a full SHARE lock. Thus we'd always lock at least the fields the user told us to, but sometimes more than those, for the sake of a more efficient implementation.

This leads quite naturally to the following behaviour regarding lock conflicts:

.) An UPDATE conflicts with a SHARE or UPDATE lock if the update touches fields locked by the SHARE or UPDATE lock. Thus, in case (A) they always conflict, while in case (B) they only conflict if the UPDATE touches a field contained in some unique index.
.) A DELETE always conflicts with a SHARE or UPDATE lock since it, in a way, touches all fields.
.) An UPDATE lock always conflicts with a SHARE lock, since there are no two non-overlapping sets of fields that we could lock.
.) SHARE locks never conflict with one another.

best regards,
Florian Pflug
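Florian's conflict rules can be sketched as a toy Python model, treating each lock as the set of fields it covers (illustrative only; the function names are mine, and this is not PostgreSQL code):

```python
# Toy model of the proposed conflict rules. A lock covers a set of
# fields: a full SHARE lock covers every column (case A), a key lock
# covers only columns appearing in some unique index (case B).

def update_conflicts(touched_fields, locked_fields):
    """An UPDATE conflicts with a SHARE/UPDATE lock iff it touches a
    field that the lock covers."""
    return bool(set(touched_fields) & set(locked_fields))

def delete_conflicts(locked_fields):
    """A DELETE touches, in a way, all fields, so it always conflicts."""
    return True

# Suppose the only unique index is on (id); the table also has
# non-key columns name and balance.
key_lock = {"id"}
share_lock = {"id", "name", "balance"}

assert not update_conflicts({"balance"}, key_lock)  # non-key update allowed
assert update_conflicts({"id"}, key_lock)           # key update blocked
assert update_conflicts({"balance"}, share_lock)    # any update blocked
assert delete_conflicts(key_lock)                   # DELETE always blocked
```

Under these rules a FK check, which only cares about key columns, never blocks updates to non-key columns, which is the whole point of the proposal.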
Excerpts from Florian Pflug's message of vie nov 26 10:48:39 -0300 2010:
> On Nov25, 2010, at 23:01 , Alvaro Herrera wrote:
> > So I've been working on improving locks for foreign key checks, as
> > discussed in a thread started by Joel Jacobson a while ago. I've posted
> > about this:
> > http://www.commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks/
> > http://www.commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks_part_2/
>
> To me, the whole thing seems to be a special case of allowing not only whole tuples to be locked FOR UPDATE or FOR SHARE, but also individual fields or sets of fields. Except that for simplicity, only two sets are supported, which are
> A) All fields
> B) All fields which are included in some unique constraint, including primary keys.
>
> I'd therefore suggest extending the FOR SHARE / FOR UPDATE syntax to
> SELECT FOR { SHARE | UPDATE } [ OF <table1>[.<field1>], ... ]
> and obtaining what you call a "KEY LOCK" if (for a particular table) the set of fields is a subset of (B). Otherwise, we'd obtain a full SHARE lock. Thus we'd always lock at least the fields the user told us to, but sometimes more than those, for the sake of a more efficient implementation.

The problem with this idea is that it's not possible to implement it.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Nov26, 2010, at 21:06 , Alvaro Herrera wrote:
> Excerpts from Florian Pflug's message of vie nov 26 10:48:39 -0300 2010:
>> On Nov25, 2010, at 23:01 , Alvaro Herrera wrote:
>>> So I've been working on improving locks for foreign key checks, as
>>> discussed in a thread started by Joel Jacobson a while ago. I've posted
>>> about this:
>>> http://www.commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks/
>>> http://www.commandprompt.com/blogs/alvaro_herrera/2010/11/fixing_foreign_key_deadlocks_part_2/
>>
>> To me, the whole thing seems to be a special case of allowing not only whole tuples to be locked FOR UPDATE or FOR SHARE, but also individual fields or sets of fields. Except that for simplicity, only two sets are supported, which are
>> A) All fields
>> B) All fields which are included in some unique constraint, including primary keys.
>>
>> I'd therefore suggest extending the FOR SHARE / FOR UPDATE syntax to
>> SELECT FOR { SHARE | UPDATE } [ OF <table1>[.<field1>], ... ]
>> and obtaining what you call a "KEY LOCK" if (for a particular table) the set of fields is a subset of (B). Otherwise, we'd obtain a full SHARE lock. Thus we'd always lock at least the fields the user told us to, but sometimes more than those, for the sake of a more efficient implementation.
>
> The problem with this idea is that it's not possible to implement it.

How so? The implementation you proposed in your blog should work fine for this. XMAX_KEY_LOCK would signal that only fields from set (B) are locked, while XMAX_SHARE_LOCK (or however that's called, didn't check the code) would signal that all fields are locked. These are the only two field sets that we'd support, and for any set of columns the user specified we'd pick the smallest superset of the set we *do* support and use that. (Thus, we obtain a key lock if only fields from a unique index were specified, and a share lock otherwise.)
The only difference is that instead of presenting this to the user as an entirely new lock type, we instead present it as a generalization of SHARE locks. The advantage being that *if* we ever figure out a way to support more fine-grained locking of fields (say, locking only the fields contained in some *specific* index, maybe by locking the index tuple), we can do so completely transparently to the user.

greetings,
Florian Pflug
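The "smallest supported superset" rule Florian describes can be sketched in a few lines of toy Python (the function name is mine; in PostgreSQL the decision would presumably happen during parse analysis or execution, not in user code):

```python
# Toy sketch of mapping a user-specified column set to one of the two
# supported lock levels: a key lock if every requested column appears
# in some unique index, a full SHARE lock otherwise. Either way we
# lock at least the columns the user asked for, possibly more.

def choose_lock(requested_cols, unique_index_cols):
    """unique_index_cols: union of the columns covered by any unique
    index on the table."""
    if set(requested_cols) <= set(unique_index_cols):
        return "KEY LOCK"
    return "SHARE"

# Unique indexes exist on (id) and (email):
idx_cols = {"id", "email"}
assert choose_lock({"id"}, idx_cols) == "KEY LOCK"
assert choose_lock({"id", "email"}, idx_cols) == "KEY LOCK"
assert choose_lock({"name"}, idx_cols) == "SHARE"   # name not indexed
```

The design choice here is that only two lock granularities exist on disk, so the user-visible column list is a request, not a guarantee of minimality.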
Excerpts from Florian Pflug's message of sáb nov 27 01:29:39 -0300 2010:
> On Nov26, 2010, at 21:06 , Alvaro Herrera wrote:
> > The problem with this idea is that it's not possible to implement it.
>
> How so? The implementation you proposed in your blog should work fine for this. XMAX_KEY_LOCK would signal that only fields from set (B) are locked, while XMAX_SHARE_LOCK (or however that's called, didn't check the code) would signal that all fields are locked. These are the only two field sets that we'd support, and for any set of columns the user specified we'd pick the smallest superset of the set we *do* support and use that. (Thus, we obtain a key lock if only fields from a unique index were specified, and a share lock otherwise.)
>
> The only difference is that instead of presenting this to the user as an entirely new lock type, we instead present it as a generalization of SHARE locks. The advantage being that *if* we ever figure out a way to support more fine-grained locking of fields (say, locking only the fields contained in some *specific* index, maybe by locking the index tuple), we can do so completely transparently to the user.

Oh, I see. Yeah, perhaps this could work. I'll have a look at both ends.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Excerpts from Florian Pflug's message of vie nov 26 10:48:39 -0300 2010:
> To me, the whole thing seems to be a special case of allowing not only whole tuples to be locked FOR UPDATE or FOR SHARE, but also individual fields or sets of fields. Except that for simplicity, only two sets are supported, which are
> A) All fields
> B) All fields which are included in some unique constraint, including primary keys.
>
> I'd therefore suggest extending the FOR SHARE / FOR UPDATE syntax to
> SELECT FOR { SHARE | UPDATE } [ OF <table1>[.<field1>], ... ]
> and obtaining what you call a "KEY LOCK" if (for a particular table) the set of fields is a subset of (B). Otherwise, we'd obtain a full SHARE lock. Thus we'd always lock at least the fields the user told us to, but sometimes more than those, for the sake of a more efficient implementation.

This would require some additions in ri_FetchConstraintInfo(). Right now it does a single syscache lookup and then extracts a bunch of attributes from there. If we're going to implement this as you suggest, we'd have to:

1. add a relcache lookup in there, and extract the column names involved in the FK.

2. store those column names in RI_ConstraintInfo; currently it's about 68 bytes, and it'd grow to ~2116 bytes (current size plus RI_MAX_NUMKEYS * NAMEDATALEN).

Additionally, we'd have to expend some more cycles at the parse analysis phase (of the "FOR SHARE OF x.col1, x.col2" query) to verify that those columns belong to some non-partial unique index.

Is the performance gain sufficient to pay these costs?

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
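For reference, the ~2116-byte figure follows directly from the default values of the constants involved (RI_MAX_NUMKEYS is 32 and NAMEDATALEN is 64 in a stock build; the 68-byte current size is Alvaro's estimate, not something I've measured):

```python
# Back-of-envelope check of the RI_ConstraintInfo growth estimate.
RI_MAX_NUMKEYS = 32   # max columns in a foreign key (INDEX_MAX_KEYS)
NAMEDATALEN = 64      # max identifier length, default build
current_size = 68     # approximate current struct size, per the mail

new_size = current_size + RI_MAX_NUMKEYS * NAMEDATALEN
assert new_size == 2116
```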
Excerpts from Alvaro Herrera's message of lun nov 29 18:00:55 -0300 2010:
> This would require some additions in ri_FetchConstraintInfo(). Right
> now it does a single syscache lookup and then extracts a bunch of
> attributes from there. If we're going to implement this as you suggest,
> we'd have to:
>
> 1. add a relcache lookup in there, and extract the column names involved
> in the FK.
>
> 2. store those column names in RI_ConstraintInfo; currently it's about
> 68 bytes, and it'd grow to ~2116 bytes (current size plus RI_MAX_NUMKEYS * NAMEDATALEN).
>
> Additionally, we'd have to expend some more cycles at the parse analysis
> phase (of the "FOR SHARE OF x.col1, x.col2" query) to verify that those
> columns belong to some non-partial unique index.

Hmm, actually there's already a relcache lookup when we execute whatever action the FK needs to execute, so maybe we don't need to do any of this.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Alvaro Herrera's message of lun nov 29 18:00:55 -0300 2010:
>> Additionally, we'd have to expend some more cycles at the parse analysis
>> phase (of the "FOR SHARE OF x.col1, x.col2" query) to verify that those
>> columns belong to some non-partial unique index.
>
> Hmm, actually there's already a relcache lookup when we execute whatever
> action the FK needs to execute, so maybe we don't need to do any of
> this.

Checking for existence of a unique index at parse analysis time is quite horrid anyway, because it means the validity of the query can change from parse time to execution time. We got stuck with some of that in relation to GROUP BY dependencies, but I don't want to buy into it anywhere that we're not forced to by the letter of the SQL spec.

regards, tom lane
Excerpts from Tom Lane's message of lun nov 29 18:33:19 -0300 2010:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
> > Excerpts from Alvaro Herrera's message of lun nov 29 18:00:55 -0300 2010:
> >> Additionally, we'd have to expend some more cycles at the parse analysis
> >> phase (of the "FOR SHARE OF x.col1, x.col2" query) to verify that those
> >> columns belong to some non-partial unique index.
>
> Checking for existence of a unique index at parse analysis time is quite
> horrid anyway, because it means the validity of the query can change
> from parse time to execution time. We got stuck with some of that in
> relation to GROUP BY dependencies, but I don't want to buy into it
> anywhere that we're not forced to by the letter of the SQL spec.

Hmm. Are you less opposed to the idea of some new nonstandard syntax in the locking clause, then? I propose "SELECT FOR KEY LOCK", though anything that the RI code could use would be the same to me, of course.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
On Nov29, 2010, at 22:33 , Tom Lane wrote:
> Alvaro Herrera <alvherre@commandprompt.com> writes:
>> Excerpts from Alvaro Herrera's message of lun nov 29 18:00:55 -0300 2010:
>>> Additionally, we'd have to expend some more cycles at the parse analysis
>>> phase (of the "FOR SHARE OF x.col1, x.col2" query) to verify that those
>>> columns belong to some non-partial unique index.
>
>> Hmm, actually there's already a relcache lookup when we execute whatever
>> action the FK needs to execute, so maybe we don't need to do any of
>> this.
>
> Checking for existence of a unique index at parse analysis time is quite
> horrid anyway, because it means the validity of the query can change
> from parse time to execution time. We got stuck with some of that in
> relation to GROUP BY dependencies, but I don't want to buy into it
> anywhere that we're not forced to by the letter of the SQL spec.

The validity wouldn't change, only the kind of lock taken. If all columns to be locked are part of some unique index, we'd record that fact in the locked tuple's infomask, and thus know that only a certain subset of columns are to be prevented from being updated.

So I figured we could do the check while executing the query, probably when we're about to lock the first tuple. The result could be cached then, so we'd only incur the overhead once. I'd also expect the overhead to be pretty small compared to the subsequent update of the tuple's xmax and the IO that this entails.

best regards,
Florian Pflug
Florian Pflug <fgp@phlo.org> writes:
> The validity wouldn't change, only the kind of lock taken. If all columns to be locked are part of some unique index, we'd record that fact in the locked tuple's infomask, and thus know that only a certain subset of columns are to be prevented from being updated.

There's not enough space in the infomask to record which columns (or which unique index) are involved. And if you're talking about data that could remain on disk long after the unique index is gone, that's not going to be good enough.

regards, tom lane
On Dec1, 2010, at 17:17 , Tom Lane wrote:
> Florian Pflug <fgp@phlo.org> writes:
>> The validity wouldn't change, only the kind of lock taken. If all columns to be locked are part of some unique index, we'd record that fact in the locked tuple's infomask, and thus know that only a certain subset of columns are to be prevented from being updated.
>
> There's not enough space in the infomask to record which columns (or
> which unique index) are involved. And if you're talking about data that
> could remain on disk long after the unique index is gone, that's not
> going to be good enough.

We'd distinguish two cases:

A) The set of locked columns is a subset of the set of columns mentioned in *any* unique index. (In other words, for every locked column there is a unique index which includes that column, though not necessarily one index which includes them all.)
B) The set of locked columns does not satisfy (A).

We'd mark case (A) by a bit in the infomask (say, HEAP_XMAX_SHARED_LOCK_KEYONLY) when SHARE locking the row. An UPDATE on such a SHARE locked row would be allowed despite the lock if it only changed columns not mentioned by any unique index.

Creating indices shouldn't pose a problem, since it would only enlarge the set of locked columns for rows with HEAP_XMAX_SHARED_LOCK_KEYONLY set. Dropping an index requires some care, since it retroactively reduces the set of locked columns for pre-existing HEAP_XMAX_SHARED_LOCK_KEYONLY locks. A possible solution for that might be to disregard HEAP_XMAX_SHARED_LOCK_KEYONLY, thus treating the row as fully share-locked, if a unique index was recently dropped. Assuming one can find a suitable definition of "recently" in this context, of course. Alvaro's initial proposal suffers from essentially the same problem, I believe. It is somewhat mitigated, though, by the fact that he expects KEY locks to only be used by foreign keys.
Thus, if a unique index is dropped, either it wasn't used by any FK constraint, in which case the retroactive reduction of lock strength doesn't matter, or the FK constraint must have been dropped first, in which case the importance of enforcing the FK is somewhat diminished.

best regards,
Florian Pflug
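The semantics of the proposed infomask bit can be modeled in toy Python (names are illustrative; real infomask handling lives in C in the heapam code):

```python
# Toy model of the proposed infomask semantics: with the key-only bit
# set, only columns mentioned in some unique index count as locked;
# without it, every column counts as locked.

def locked_columns(all_cols, unique_index_cols, keyonly_bit_set):
    """Return the set of columns an UPDATE must not touch."""
    return set(unique_index_cols) if keyonly_bit_set else set(all_cols)

cols = {"a", "b", "c", "d"}
idx_cols = {"a", "b"}   # columns covered by some unique index

# UPDATE of non-indexed column "c" against a key-only share lock: allowed.
assert "c" not in locked_columns(cols, idx_cols, True)
# Same UPDATE against a full share lock: blocked.
assert "c" in locked_columns(cols, idx_cols, False)
```

This also makes the index-dropping hazard visible: if idx_cols shrinks after the bit was set, the key-only lock silently covers fewer columns than it did when taken, which is exactly the case Florian's "recently dropped" workaround tries to handle.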
Florian Pflug <fgp@phlo.org> writes:
> On Dec1, 2010, at 17:17 , Tom Lane wrote:
>> There's not enough space in the infomask to record which columns (or
>> which unique index) are involved. And if you're talking about data that
>> could remain on disk long after the unique index is gone, that's not
>> going to be good enough.

> We'd distinguish two cases
> A) The set of locked columns is a subset of the set of columns mentioned in
> *any* unique index. (In other words, for every locked column there is a
> unique index which includes that column, though not necessarily one index
> which includes them all)
> B) The set of locked columns does not satisfy (A)

How's that fix it? The on-disk flags are still falsifiable by subsequent index changes.

> Creating indices shouldn't pose a problem, since it would only enlarge the set of locked columns for rows with HEAP_XMAX_SHARED_LOCK_KEYONLY set.

Not with that definition. I could create a unique index that doesn't contain some column that every previous unique index contained.

regards, tom lane
On Dec1, 2010, at 18:44 , Tom Lane wrote:
> Florian Pflug <fgp@phlo.org> writes:
>> On Dec1, 2010, at 17:17 , Tom Lane wrote:
>>> There's not enough space in the infomask to record which columns (or
>>> which unique index) are involved. And if you're talking about data that
>>> could remain on disk long after the unique index is gone, that's not
>>> going to be good enough.
>
>> We'd distinguish two cases
>> A) The set of locked columns is a subset of the set of columns mentioned in
>> *any* unique index. (In other words, for every locked column there is a
>> unique index which includes that column, though not necessarily one index
>> which includes them all)
>> B) The set of locked columns does not satisfy (A)
>
>> Creating indices shouldn't pose a problem, since it would only enlarge the set of locked columns for rows with HEAP_XMAX_SHARED_LOCK_KEYONLY set.
>
> Not with that definition. I could create a unique index that doesn't
> contain some column that every previous unique index contained.

Hm, I seem to be unable to put the definition into unambiguous words. Let me try again.

Set A contains all columns for which a unique index exists which includes said column. More formally, if C_i is the set of columns included in index I, then A is the union of the C_i, with I ranging over the unique indices.

If, for example, unique index I1 contains the columns (a,b) and unique index I2 contains the columns (b,c) (and no other unique indices exist), then the set contains the columns a, b, c. If there is an additional unique index I3 on (d), then A = {a,b,c,d}. If a row is SHARE locked and HEAP_XMAX_SHARED_LOCK_KEYONLY is set, only these columns are considered to be locked. If the flag is not set, all columns are considered locked.

best regards,
Florian Pflug
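Florian's definition of set A is just a union over the column sets of the unique indices, and his own example values follow directly from it (a toy sketch; the function name is mine):

```python
# Set A = union over all unique indices I of C_i, the columns of I.
def key_columns(unique_indices):
    """unique_indices: iterable of column-name tuples, one per unique
    index on the table."""
    cols = set()
    for index_cols in unique_indices:
        cols |= set(index_cols)
    return cols

# Unique index I1 on (a, b), unique index I2 on (b, c):
assert key_columns([("a", "b"), ("b", "c")]) == {"a", "b", "c"}
# Adding unique index I3 on (d):
assert key_columns([("a", "b"), ("b", "c"), ("d",)]) == {"a", "b", "c", "d"}
```

Because A is a union, creating a new unique index can only grow it, which is why index creation is safe for existing key-only locks while index dropping is not.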
On Dec 1, 2010, at 11:09 AM, Florian Pflug wrote:
> An UPDATE on such a SHARE locked row would be allowed despite the lock if it only changed columns not mentioned by any unique index.

On a side-note, by "changed columns" do you mean the column appeared in the UPDATE statement, or the data actually changed? I suspect the former might be easier to implement, but it's really going to fsck with some applications (Rails is one example that comes to mind).

-- 
Jim C. Nasby, Database Architect jim@nasby.net
512.569.9461 (cell) http://jim.nasby.net
On Dec2, 2010, at 00:59 , Jim Nasby wrote:
> On Dec 1, 2010, at 11:09 AM, Florian Pflug wrote:
>> An UPDATE on such a SHARE locked row would be allowed despite the lock if it only changed columns not mentioned by any unique index.
>
> On a side-note, by "changed columns" do you mean the column appeared in the UPDATE statement, or the data actually changed? I suspect the former might be easier to implement, but it's really going to fsck with some applications (Rails is one example that comes to mind).

The most sensible thing to do is probably to make it mean "columns whose new value's binary representation differs from the old value's binary representation". That is also what is checked for HOT updates, I believe, though I didn't recheck...

best regards,
Florian Pflug
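Under that reading, "changed" means the stored representation differs, regardless of which columns the UPDATE statement mentioned. In a toy Python model, that's a byte-wise comparison (illustrative only; the real check would compare on-disk datum representations in C):

```python
# Toy illustration of "changed" meaning "new binary representation
# differs from the old one", not "column appeared in the UPDATE".
def changed_columns(old_row, new_row):
    """Rows are dicts mapping column name to binary representation."""
    return {col for col, val in new_row.items() if old_row[col] != val}

old = {"id": b"\x01", "name": b"joe"}
new = {"id": b"\x01", "name": b"bob"}   # UPDATE mentioned both columns
assert changed_columns(old, new) == {"name"}   # but only name changed
```

This is the Rails-friendly interpretation: an UPDATE that rewrites a key column with the identical value would not conflict with a key lock.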
Excerpts from Tom Lane's message of mié dic 01 14:44:03 -0300 2010:
> Florian Pflug <fgp@phlo.org> writes:
> > On Dec1, 2010, at 17:17 , Tom Lane wrote:
> >> There's not enough space in the infomask to record which columns (or
> >> which unique index) are involved. And if you're talking about data that
> >> could remain on disk long after the unique index is gone, that's not
> >> going to be good enough.
>
> > We'd distinguish two cases
> > A) The set of locked columns is a subset of the set of columns mentioned in
> > *any* unique index. (In other words, for every locked column there is a
> > unique index which includes that column, though not necessarily one index
> > which includes them all)
> > B) The set of locked columns does not satisfy (A)
>
> How's that fix it? The on-disk flags are still falsifiable by
> subsequent index changes.
>
> > Creating indices shouldn't pose a problem, since it would only enlarge the set of locked columns for rows with HEAP_XMAX_SHARED_LOCK_KEYONLY set.
>
> Not with that definition. I could create a unique index that doesn't
> contain some column that every previous unique index contained.

This is not a problem, because a later UPDATE that tried to modify a locked tuple on a column only indexed by the new index would consider it locked. Which is OK. There would only be a problem if we were allowed to drop an index (which would reduce the set of locked columns), but we don't allow this because DROP INDEX requires an exclusive lock on the table, and thus there cannot be anyone with key-locked tuples in it. AFAICT Florian's definition is good.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
Of course, there is a much more serious problem with the whole idea. I have worked through most of the necessary changes and I'm now down to changing heap_update to comply with the new locking protocol.

The problem I'm now facing is that we need to know the set of updated columns pretty early -- just after calling HeapTupleSatisfiesUpdate, in fact, so that we can change a BeingUpdated result into MayBeUpdated if the set of columns is right. However, we don't know the set of updated columns until much later, when the function has already undergone a lot of other work and checked other possible error conditions.

Short of restructuring the function so that we can obtain the set of updated columns early (which I'm not fond of doing and which may not even be possible), I'm thinking that we should be "optimistic" about the set of updated columns, and recheck them later. The problem with this idea, of course, is that if the test turns out to fail, we could have done a lot of work (such as TOAST changes) that now needs to be discarded. In other words, the optimization is pessimizing some cases :-(

I'm not seeing any other way around this ATM.

-- 
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support