Excessive PostmasterIsAlive calls slow down WAL redo

From: Heikki Linnakangas
Subject: Excessive PostmasterIsAlive calls slow down WAL redo
Msg-id: 7261eb39-0369-f2f4-1bb5-62f3b6083b5e@iki.fi
Replies: Re: Excessive PostmasterIsAlive calls slow down WAL redo  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Re: Excessive PostmasterIsAlive calls slow down WAL redo  (Simon Riggs <simon@2ndquadrant.com>)
Re: Excessive PostmasterIsAlive calls slow down WAL redo  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers

I started looking at the "Improve compactify_tuples and 
PageRepairFragmentation" patch, and set up a little performance test of 
WAL replay. I ran pgbench, scale 5, to generate about 1 GB of WAL, and 
timed how long it takes to replay that WAL. To focus purely on CPU 
overhead, I kept the data directory in /dev/shm/.

Profiling that, without any patches applied, I noticed that a lot of 
time was spent in read()s on the postmaster-death pipe, i.e. in 
PostmasterIsAlive(). We call that between *every* WAL record.
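
To illustrate why each check costs a system call, here is a minimal sketch of a pipe-based liveness test. It is a simplified stand-in, not the PostgreSQL source: the names make_nonblocking() and parent_is_alive() are hypothetical, and the error handling is reduced to "anything unexpected counts as dead". The idea is that the parent holds the write end of a pipe, so a non-blocking read() on the other end fails with EAGAIN while the parent lives and returns EOF once every write end has been closed.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>

/* Hypothetical helper: switch a file descriptor to non-blocking mode. */
static int
make_nonblocking(int fd)
{
    int flags = fcntl(fd, F_GETFL);

    assert(fd >= 0);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFL, flags | O_NONBLOCK);
}

/*
 * Hypothetical liveness check on the read end of a "death pipe".
 * Every call issues one read() system call, which is the overhead
 * that shows up when the check runs between every WAL record.
 */
static bool
parent_is_alive(int watch_fd)
{
    char    c;
    ssize_t rc = read(watch_fd, &c, 1);

    /* No data, but a writer still holds the pipe open: parent is alive. */
    if (rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK))
        return true;

    /* rc == 0 means EOF (all write ends closed); treat other cases as dead. */
    return false;
}
```

In a real server the write end would be held by the parent process and inherited read ends would be polled by the children; the sketch only shows the read() that each poll performs.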

As a quick test to see how much that matters, I commented out the 
PostmasterIsAlive() call from HandleStartupProcInterrupts(). On 
unpatched master, replaying that 1 GB of WAL takes about 20 seconds on 
my laptop. Without the PostmasterIsAlive() call, 17 seconds.

That seems like an utter waste of time. I'm almost inclined to call that 
a performance bug. As a straightforward fix, I'd suggest that we call 
HandleStartupProcInterrupts() in the WAL redo loop, not on every record, 
but only e.g. every 32 records. That would make the main redo loop less 
responsive to shutdown, SIGHUP, or postmaster death, but that seems OK. 
There are also calls to HandleStartupProcInterrupts() in the various 
other loops that wait for new WAL to arrive or for the recovery delay to 
elapse, so this would only affect the case where we're actively 
replaying records.
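
A rough sketch of the throttling idea, using a simulated redo loop rather than the real one. Everything here is an assumption for illustration: INTERRUPT_CHECK_INTERVAL, handle_interrupts_stub() and replay_records() are hypothetical names, and the stub merely counts calls where the real loop would call HandleStartupProcInterrupts().

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical constant; 32 is the example interval from the proposal. */
#define INTERRUPT_CHECK_INTERVAL 32

static uint64_t interrupt_checks = 0;

/* Stand-in for HandleStartupProcInterrupts(); it only counts invocations. */
static void
handle_interrupts_stub(void)
{
    interrupt_checks++;
}

/*
 * Simulated redo loop: "replays" nrecords records and runs the interrupt
 * check only on every interval-th record instead of on each one.
 * Returns how many times the check was performed.
 */
static uint64_t
replay_records(uint64_t nrecords, uint32_t interval)
{
    assert(interval > 0);
    interrupt_checks = 0;

    for (uint64_t i = 0; i < nrecords; i++)
    {
        if (i % interval == 0)
            handle_interrupts_stub();

        /* ... apply WAL record i here ... */
    }
    return interrupt_checks;
}
```

With an interval of 32, replaying 1000 simulated records performs 32 checks instead of 1000. The trade-off is the one described above: a shutdown request, SIGHUP, or postmaster death can go unnoticed for up to interval - 1 records.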

- Heikki
