Re: Hot Standy introduced problem with query cancel behavior

From Andres Freund
Subject Re: Hot Standy introduced problem with query cancel behavior
Date
Msg-id 201001071614.55679.andres@anarazel.de
In response to Re: Hot Standy introduced problem with query cancel behavior  (Joachim Wieland <joe@mcknight.de>)
List pgsql-hackers
On Thursday 07 January 2010 14:45:55 Joachim Wieland wrote:
> On Thu, Dec 31, 2009 at 6:40 PM, Simon Riggs <simon@2ndquadrant.com> wrote:
> >> Building racy infrastructure when it can be avoided with a little care
> >> still seems not to be the best path to me.
> >
> > Doing that will add more complexity in an area that is hard to test
> > effectively. I think the risk of introducing further bugs while trying
> > to fix this rare condition is high. Right now the conflict processing
> > needs more work and is often much less precise than this, so improving
> > this aspect of it would not be a priority. I've added it to the TODO
> > though. Thank you for your research.
> >
> > Patch implements recovery conflict signalling using SIGUSR1
> > multiplexing, then uses a SessionCancelPending mode similar to Joachim's
> > TransactionCancelPending.
> 
> I have reworked Simon's patch a bit and attach the result.
> 
> Quick facts:
> 
> - Hot Standby only uses SIGUSR1
> - SIGINT behaves as it did before: it only cancels running statements
> - pg_cancel_backend() continues to use SIGINT
> - I added pg_cancel_idle_transaction() to cancel an idle transaction via
>   SIGUSR1
> - One central function HandleCancelAction() sets the flags before calling
>   ProcessInterrupts(), it is called from the different signal handlers and
>   receives parameters about what it should do
> - If a SIGUSR1 reason is used that will cancel something, ProcArrayLock is
>   acquired until the signal has been sent to make sure that we won't signal
>   the wrong backend. Does this sufficiently cover the concerns of Andres
>   Freund discussed upthread?
I think it solves the major concern (which, by the way, I could easily
reproduce using software that is in production), but unfortunately not
completely.
The situation that is now avoided is:

C(Client): BEGIN; SELECT WHATEVER;
S(Standby): conflict with C
S: Starting to cancel C
C: COMMIT
S: Sending Signal to C
C: Wrong transaction is aborted

The situation that is not avoided is:
C: BEGIN; SELECT ...
S: conflict with C, lock procarray, sending signal (that's asynchronous), unlock procarray
C: COMMIT; BEGIN
C: Signal arrives
C: Wrong txn is killed

It should be easy to fix this by adding a cancel_localTransactionId field to
the procarray which gets reset upon transaction/backend start and gets checked
in the signal handler (it should be cast to sig_atomic_t).
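
To make the idea concrete, here is a rough, standalone sketch of the check, not
the actual patch. Everything in it is an assumption for illustration: the real
field would live in shared memory (the procarray) and be set under
ProcArrayLock, whereas this sketch uses process-local variables, and the helper
names (start_transaction, conflict_signal_handler, simulate_*) are made up.
Blocking SIGUSR1 around the "COMMIT; BEGIN" step stands in for asynchronous
signal delivery.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <string.h>

/* Hypothetical stand-ins for backend state; not PostgreSQL's real variables. */
/* Target of a pending cancel request; 0 means "no request". */
volatile sig_atomic_t cancel_localTransactionId = 0;
/* Local id of the transaction the backend is currently running. */
volatile sig_atomic_t current_localTransactionId = 0;
/* Set by the handler only if the request matches the running transaction. */
volatile sig_atomic_t cancel_pending = 0;

/* Signal handler: honour the cancel only if it targets the current txn. */
void conflict_signal_handler(int signo)
{
    (void) signo;
    if (cancel_localTransactionId != 0 &&
        cancel_localTransactionId == current_localTransactionId)
        cancel_pending = 1;
}

/* Transaction start: take a new local id and clear any stale request. */
void start_transaction(void)
{
    current_localTransactionId++;
    cancel_localTransactionId = 0;
    cancel_pending = 0;
}

int install_handler(void)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_handler = conflict_signal_handler;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGUSR1, &sa, NULL);
}

/* Signal arrives while the conflicting transaction is still running. */
int simulate_timely_signal(void)
{
    install_handler();
    start_transaction();                      /* C: BEGIN; SELECT ... */
    cancel_localTransactionId = current_localTransactionId; /* S: conflict */
    raise(SIGUSR1);                           /* handler runs before return */
    return cancel_pending;
}

/* Signal arrives only after C has already begun the next transaction. */
int simulate_late_signal(void)
{
    sigset_t block, old;

    install_handler();
    start_transaction();                      /* C: BEGIN; SELECT ... */

    sigemptyset(&block);
    sigaddset(&block, SIGUSR1);
    sigprocmask(SIG_BLOCK, &block, &old);     /* model delayed delivery */

    cancel_localTransactionId = current_localTransactionId; /* S: conflict */
    raise(SIGUSR1);                           /* S: signal sent, still pending */

    start_transaction();                      /* C: COMMIT; BEGIN */
    sigprocmask(SIG_SETMASK, &old, NULL);     /* C: signal arrives now */
    return cancel_pending;
}
```

Calling simulate_timely_signal() from a main() leaves cancel_pending set,
while simulate_late_signal() leaves it clear, because the request was reset
when the new transaction started. In a real backend the comparison would have
to be made against whatever the procarray entry holds, so treat this only as
an outline of the ordering.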

Will cook up a patch if nobody speaks against something like that.

Andres

