Excessive CPU usage in StandbyReleaseLocks()

From Thomas Munro
Subject Excessive CPU usage in StandbyReleaseLocks()
Msg-id CAEepm=1mL0KiQ2KJ4yuPpLGX94a4Ns_W6TL4EGRouxWibu56pA@mail.gmail.com
Replies Re: Excessive CPU usage in StandbyReleaseLocks()  (Andres Freund <andres@anarazel.de>)
Re: Excessive CPU usage in StandbyReleaseLocks()  (David Rowley <david.rowley@2ndquadrant.com>)
List pgsql-hackers

Hello hackers,

Andres Freund diagnosed a case of $SUBJECT in a customer's 9.6 system.
I've written a minimal reproducer and a prototype patch to address the
root cause.

The problem is that StandbyReleaseLocks() does a linear search of all
known AccessExclusiveLocks when a transaction ends.  Luckily, since
v10 (commit 9b013dc2) that is skipped for transactions that haven't
taken any AELs and aren't using 2PC, but that doesn't help all users.

It's fine if the AEL list is short, but if you do something that takes
a lot of AELs such as restoring a database with many tables or
truncating a lot of partitions while other transactions are in flight
then we start doing O(txrate * nlocks * nsubxacts) work and that can
hurt.
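
To make the cost concrete, here is a toy model of a flat lock list in C. It is only a sketch and not the PostgreSQL code: ToyLock, FlatLockList, flat_release() and flat_demo_visits() are hypothetical names invented for illustration, and it leaves out the nsubxacts factor. The point it shows is that every transaction end walks the whole list, even when the ending transaction holds no locks at all.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical, simplified stand-ins; not PostgreSQL's real definitions. */
typedef uint32_t Xid;

typedef struct
{
    Xid      xid;    /* transaction holding the lock */
    uint32_t relid;  /* locked relation (illustrative only) */
} ToyLock;

typedef struct
{
    ToyLock *locks;    /* one flat array for all transactions */
    size_t   nlocks;
    size_t   visited;  /* entries examined so far, for measurement */
} FlatLockList;

/*
 * Release every lock held by xid.  The whole list is always scanned,
 * so the cost is proportional to nlocks regardless of how many locks
 * xid actually holds.  Returns the number of locks released.
 */
static size_t
flat_release(FlatLockList *list, Xid xid)
{
    size_t kept = 0;
    size_t released = 0;

    for (size_t i = 0; i < list->nlocks; i++)
    {
        list->visited++;
        if (list->locks[i].xid == xid)
            released++;
        else
            list->locks[kept++] = list->locks[i];   /* compact in place */
    }
    list->nlocks = kept;
    return released;
}

/*
 * One long-lived transaction (xid 1) holds nlocks locks while ntx
 * unrelated transactions end.  Returns the total entries examined.
 */
static size_t
flat_demo_visits(size_t nlocks, size_t ntx)
{
    FlatLockList list = {NULL, 0, 0};
    size_t       visited;

    assert(nlocks > 0);
    list.locks = malloc(nlocks * sizeof(ToyLock));
    if (list.locks == NULL)
        abort();                /* sketch: allocation failure is fatal */
    for (size_t i = 0; i < nlocks; i++)
    {
        list.locks[i].xid = 1;
        list.locks[i].relid = (uint32_t) i;
    }
    list.nlocks = nlocks;

    for (size_t t = 0; t < ntx; t++)
        (void) flat_release(&list, (Xid) (2 + t));  /* releases nothing */

    visited = list.visited;
    free(list.locks);
    return visited;
}
```

In this model flat_demo_visits(6000, 1000) examines 6,000,000 entries without releasing a single lock, which is the nlocks * txrate term above.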

The reproducer script I've attached creates one long-lived transaction
that acquires 6,000 AELs and takes a nap, while 48 connections run
trivial 2PC transactions (I was also able to reproduce the effect
without 2PC by creating a throw-away temporary table in every
transaction, but it was unreliable due to contention slowing
everything down).  For me, the standby's startup process becomes 100%
pegged, replay_lag begins to climb and perf says something like:

+   97.88%    96.96%  postgres  postgres            [.] StandbyReleaseLocks

The attached patch splits the AEL list into one list per xid and
sticks them in a hash table.  That makes perf say something like:

+    0.60%     0.00%  postgres  postgres            [.] StandbyReleaseLocks
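
For readers who want to picture the data structure, here is a minimal C sketch of "one list per xid in a hash table". It is not the attached patch: XidLockTable, table_add_lock(), table_release_xid() and the fixed bucket count are all hypothetical, and a real implementation inside PostgreSQL would presumably use the server's own hash table and list facilities rather than malloc.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define NBUCKETS 64             /* arbitrary size for the sketch */

typedef uint32_t Xid;

typedef struct LockNode
{
    uint32_t         relid;     /* locked relation (illustrative only) */
    struct LockNode *next;
} LockNode;

typedef struct XidEntry
{
    Xid              xid;
    LockNode        *locks;     /* locks held by this xid only */
    struct XidEntry *next;      /* next entry in the same bucket */
} XidEntry;

typedef struct
{
    XidEntry *buckets[NBUCKETS];
} XidLockTable;

/* Return the link that points at xid's entry, or at NULL if absent. */
static XidEntry **
find_slot(XidLockTable *t, Xid xid)
{
    XidEntry **slot = &t->buckets[xid % NBUCKETS];

    while (*slot != NULL && (*slot)->xid != xid)
        slot = &(*slot)->next;
    return slot;
}

/* Record that xid holds a lock on relid. */
static void
table_add_lock(XidLockTable *t, Xid xid, uint32_t relid)
{
    XidEntry **slot;
    LockNode  *node;

    assert(t != NULL);
    slot = find_slot(t, xid);
    if (*slot == NULL)
    {
        XidEntry *e = calloc(1, sizeof(XidEntry));

        if (e == NULL)
            abort();            /* sketch: allocation failure is fatal */
        e->xid = xid;
        *slot = e;
    }
    node = malloc(sizeof(LockNode));
    if (node == NULL)
        abort();
    node->relid = relid;
    node->next = (*slot)->locks;
    (*slot)->locks = node;
}

/*
 * Release all locks held by xid and return how many there were.
 * Only xid's own entry is touched; locks held by other transactions
 * are never examined.
 */
static size_t
table_release_xid(XidLockTable *t, Xid xid)
{
    XidEntry **slot;
    XidEntry  *e;
    size_t     released = 0;

    assert(t != NULL);
    slot = find_slot(t, xid);
    e = *slot;
    if (e == NULL)
        return 0;               /* the common case: no locks for this xid */
    *slot = e->next;
    while (e->locks != NULL)
    {
        LockNode *n = e->locks;

        e->locks = n->next;
        free(n);
        released++;
    }
    free(e);
    return released;
}
```

With this layout, ending a transaction that holds no locks costs one bucket lookup, however many locks the long-lived transaction has accumulated.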

This seems like something we'd want to back-patch because the problem
affects all branches (the older releases more severely because they
lack the above-mentioned optimisation).

Thoughts?

-- 
Thomas Munro
http://www.enterprisedb.com
