Re: Postgres stucks in deadlock detection

Поиск
Список
Период
Сортировка
От Konstantin Knizhnik
Тема Re: Postgres stucks in deadlock detection
Дата
Msg-id 2e13a34b-0fb4-72f6-0b1b-a4458b5ca33b@postgrespro.ru
обсуждение исходный текст
Ответ на Re: Postgres stucks in deadlock detection  (Andres Freund <andres@anarazel.de>)
Ответы Re: Postgres stucks in deadlock detection  (Andres Freund <andres@anarazel.de>)
Список pgsql-hackers

On 13.04.2018 18:41, Andres Freund wrote:
> Hi,
>
> On 2018-04-13 16:43:09 +0300, Konstantin Knizhnik wrote:
>> Updated patch is attached.
>> +    /*
>> +     * Ensure that only one backend is checking for deadlock.
>> +     * Otherwise under high load cascade of deadlock timeout expirations can cause stuck of Postgres.
>> +     */
>> +    if (!pg_atomic_test_set_flag(&ProcGlobal->activeDeadlockCheck))
>> +    {
>> +        enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
>> +        return;
>> +    }
>> +    inside_deadlock_check = true;
> I can't see that ever being accepted.  This means there's absolutely no
> bound for deadlock checks happening even under light concurrency, even
> if there's no contention for a large fraction of the time.

It may cause problems only if
1. There is large number of active sessions
2. They perform deadlock-prone queries (so no attempts to avoid 
deadlocks at application level)
3. Deadlock timeout is set to be very small (10 msec?)

Otherwise either probability that all backends  once and once again are 
trying to check deadlocks concurrently is very small (and can be even 
more reduced by using random timeout for subsequent deadlock checks), 
either system can not normally function in any case because large number 
of clients fall into deadlock.


>
> If you want to improve this, improve the efficiency of the
> implementation, check multiple lockers at the same time (set a bit
> afterwards that they don't recheck themselves). There's plenty
> approaches that don't result in a significantly worse user experience.

It seems to me that the best best solution will be to perform deadlock 
by some dedicated process (deadlock checker) with specified frequency,
or do it in backends themselves but ensure that interval between 
deadlock checks exceeds deadlock_timeout.
On both cases it is necessary to check for all deadlock loops, not only 
deadlocks involving self transaction.
And definitely we should be able to abort "foreign" transaction, not 
only transaction performed by this backend.
I do not know how difficult it will be to implement. I expect that it 
will be not so easy, otherwise why deadlock detection was not 
implemented in this way in Postgres from the very beginning...

I completely agree that there are plenty of different approaches, but 
IMHO the currently used strategy is the worst one, because it can stall 
system even if there are not deadlocks at all.
I always think that deadlock is a programmer's error rather than normal 
situation. May be it is wrong assumption but I can not believe that some 
system can normally work when deadlocks happen quite frequently.  
Consider classical example: transfer money from one account to another:

start transaction;
update accounts set balance=balance+? where aid=?;
update accounts set balance=balance-? where aid=?;
commit transactions;

With thousands accounts and random choice of account IDs concurrent 
execution of this transaction by multiple clients (~100) cause deadlocks 
in less than few percents of transactions.
My patch doesn't somehow affect performance and latency in this case.
Obviously that if number of accounts is changed to 10 and number of 
clients to 1000, then deadlocks will happen permanently. But in this 
case system will not be able to work normally either with my patch, 
either without it.

So before implementing some complicated solution of the problem9too slow 
deadlock detection), I think that first it is necessary to understand 
whether there is such problem at al and under which workload it can happen.

-- 
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Nikolay Samokhvalov
Дата:
Сообщение: Re: Built-in connection pooling
Следующее
От: Amit Langote
Дата:
Сообщение: Re: crash with sql language partition support function