Re: Postgres stucks in deadlock detection

From: Юрий Соколов
Subject: Re: Postgres stucks in deadlock detection
Msg-id: CAL-rCA1CVze9Y8uqJTH2vCffCvggcWQO6UQqaJnV9Q60NnJiyQ@mail.gmail.com
In reply to: Re: Postgres stucks in deadlock detection  (Andres Freund <andres@anarazel.de>)
Responses: Re: Postgres stucks in deadlock detection  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List: pgsql-hackers
On Fri, 13 Apr 2018 at 21:10, Andres Freund <andres@anarazel.de> wrote:
Hi,

On 2018-04-13 19:13:07 +0300, Konstantin Knizhnik wrote:
> On 13.04.2018 18:41, Andres Freund wrote:
> > On 2018-04-13 16:43:09 +0300, Konstantin Knizhnik wrote:
> > > Updated patch is attached.
> > > + /*
> > > +  * Ensure that only one backend is checking for deadlock.
> > > +  * Otherwise under high load cascade of deadlock timeout expirations can cause stuck of Postgres.
> > > +  */
> > > + if (!pg_atomic_test_set_flag(&ProcGlobal->activeDeadlockCheck))
> > > + {
> > > +         enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
> > > +         return;
> > > + }
> > > + inside_deadlock_check = true;
> > I can't see that ever being accepted.  This means there's absolutely no
> > bound for deadlock checks happening even under light concurrency, even
> > if there's no contention for a large fraction of the time.
>
> It may cause problems only if
> 1. There is large number of active sessions
> 2. They perform deadlock-prone queries (so no attempts to avoid deadlocks at
> application level)
> 3. Deadlock timeout is set to be very small (10 msec?)

That's just not true.


> Otherwise either probability that all backends  once and once again are
> trying to check deadlocks concurrently is very small (and can be even more
> reduced by using random timeout for subsequent deadlock checks), either
> system can not normally function in any case because large number of clients
> fall into deadlock.

Operating systems batch wakeups.


> I completely agree that there are plenty of different approaches, but IMHO
> the currently used strategy is the worst one, because it can stall system
> even if there are not deadlocks at all.


> I always think that deadlock is a programmer's error rather than normal
> situation. May be it is wrong assumption

It is.


> So before implementing some complicated solution of the problem (too slow
> deadlock detection), I think that first it is necessary to understand
> whether there is such problem at all and under which workload it can happen.

Sure. I'm not saying that you shouldn't experiment with a patch like the
one you sent. What I am saying is that that can't be the actual solution
that will be integrated.

What about my version?
It still performs deadlock detection every time, but it first tries to detect a deadlock under a shared lock,
and rechecks under an exclusive lock only if a real deadlock looks probable.

Even a shared lock stalls active transactions to some degree, but in the catastrophic situation where many backends check for a nonexistent deadlock at the same time, it greatly reduces the pause.

Regards, 
Yura. 
