Обсуждение: CHECK_FOR_INTERRUPTS in spgdoinsert() isn't helpful
One of the annoying things about the SPGiST bug Teodor pointed out is that once you're in the infinite loop, query cancel doesn't work to get you out of it. There are a couple of CHECK_FOR_INTERRUPTS() calls in spgdoinsert() that are meant to get you out of exactly this sort of buggy-opclass situation --- but they don't help. The reason is that when we reach that point we're holding one or more buffer locks, which means the HOLD_INTERRUPTS count is positive, so ProcessInterrupts is afraid to do anything. In point of fact, we'd be happy to give up the locks and then throw the error. So I was thinking about inventing some macro or out-of-line function that went about like this: if (InterruptPending && (QueryCancelPending || ProcDiePending)){ LWLockReleaseAll(); ProcessInterrupts(); elog(ERROR,"ProcessInterrupts failed to throw an error");} where we don't expect to reach the final elog; it's just there to make sure we don't return control after yanking locks out from under the caller. You might object that this is risky since it might release locks other than the buffer locks spgdoinsert() is specifically concerned with. However, if spgdoinsert() were to throw an error for a reason other than query cancel, any such locks would get released anyway as the first step during error cleanup, so I think this is not a real concern. There may be other places in the index AMs or other low-level code where this'd be appropriate; I've not really looked. Thoughts? regards, tom lane
On 05/29/2014 11:25 PM, Tom Lane wrote: > One of the annoying things about the SPGiST bug Teodor pointed out is > that once you're in the infinite loop, query cancel doesn't work to get > you out of it. There are a couple of CHECK_FOR_INTERRUPTS() calls in > spgdoinsert() that are meant to get you out of exactly this sort of > buggy-opclass situation --- but they don't help. The reason is that > when we reach that point we're holding one or more buffer locks, which > means the HOLD_INTERRUPTS count is positive, so ProcessInterrupts is > afraid to do anything. > > In point of fact, we'd be happy to give up the locks and then throw > the error. So I was thinking about inventing some macro or out-of-line > function that went about like this: > > if (InterruptPending && (QueryCancelPending || ProcDiePending)) > { > LWLockReleaseAll(); > ProcessInterrupts(); > elog(ERROR, "ProcessInterrupts failed to throw an error"); > } > > where we don't expect to reach the final elog; it's just there to make > sure we don't return control after yanking locks out from under the > caller. Also checking that CritSectionCount == 0 seems like a good idea... - Heikki
On Thu, May 29, 2014 at 1:34 PM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote: > Also checking that CritSectionCount == 0 seems like a good idea... What do you do if it isn't? -- Peter Geoghegan
Heikki Linnakangas <hlinnakangas@vmware.com> writes: > On 05/29/2014 11:25 PM, Tom Lane wrote: >> In point of fact, we'd be happy to give up the locks and then throw >> the error. So I was thinking about inventing some macro or out-of-line >> function that went about like this: >> >> if (InterruptPending && (QueryCancelPending || ProcDiePending)) >> { >> LWLockReleaseAll(); >> ProcessInterrupts(); >> elog(ERROR, "ProcessInterrupts failed to throw an error"); >> } > Also checking that CritSectionCount == 0 seems like a good idea... Yeah, and there may be a couple other details like that. Right now I'm just thinking about not allowing LWLocks to block the cancel. I guess something else to consider is whether InterruptHoldoffCount could be larger than the number of held LWLocks; if so, that would prevent this from working as desired. regards, tom lane
Peter Geoghegan <pg@heroku.com> writes: > On Thu, May 29, 2014 at 1:34 PM, Heikki Linnakangas > <hlinnakangas@vmware.com> wrote: >> Also checking that CritSectionCount == 0 seems like a good idea... > What do you do if it isn't? Not throw the error. (If we did, it'd turn into a PANIC, which doesn't seem like what we want.) regards, tom lane
On Thu, May 29, 2014 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Not throw the error. (If we did, it'd turn into a PANIC, which doesn't > seem like what we want.) Of course, but it isn't obvious to me that that isn't what we'd want, if the alternative is definitely that we'd be stuck in an infinite uninterruptible loop (I guess we couldn't just avoid throwing the error, thereby risking proceeding without locks -- and so I guess we can basically do nothing like this in critical sections). Now, that probably isn't the actual choice that we must face, but it also isn't obvious to me how you can do better than not throwing the error (and doing nothing extra) when in a critical section. That was the intent of my original question. -- Peter Geoghegan
Peter Geoghegan <pg@heroku.com> writes: > On Thu, May 29, 2014 at 1:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Not throw the error. (If we did, it'd turn into a PANIC, which doesn't >> seem like what we want.) > Of course, but it isn't obvious to me that that isn't what we'd want, > if the alternative is definitely that we'd be stuck in an infinite > uninterruptible loop (I guess we couldn't just avoid throwing the > error, thereby risking proceeding without locks -- and so I guess we > can basically do nothing like this in critical sections). Now, that > probably isn't the actual choice that we must face, but it also isn't > obvious to me how you can do better than not throwing the error (and > doing nothing extra) when in a critical section. That was the intent > of my original question. I don't feel a need for special behavior for this inside critical sections; a critical section that lasts long enough for query cancel response to be problematic is probably already broken by design. The other case of InterruptHoldoffCount being too high might be more relevant. I have not gone around and looked at retail uses of HOLD_INTERRUPTS to see whether there are any that might be holding counts for long periods. But I have checked that for the case in spgdoinsert, the only thing staying ProcessInterrupts' hand is the holdoff count associated with the buffer lock on the current index page. regards, tom lane