Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)

From: Heikki Linnakangas
Subject: Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)
Msg-id: 4F5E4F8A.40201@enterprisedb.com
In response to: Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)  (Heikki Linnakangas <heikki.linnakangas@enterprisedb.com>)
Responses: Re: Scaling XLog insertion (was Re: Moving more work outside WALInsertLock)
List: pgsql-hackers

On 09.03.2012 12:04, Heikki Linnakangas wrote:
> I've been doing some performance testing with this, using a simple C
> function that just inserts a dummy WAL record of given size. I'm not
> totally satisfied. Although the patch helps with scalability at 3-4
> concurrent backends doing WAL insertions, it seems to slow down the
> single-client case with small WAL records by about 5-10%. This is what
> Robert also saw with an earlier version of the patch
> (http://archives.postgresql.org/pgsql-hackers/2011-12/msg01223.php). I
> tested this with the data directory on a RAM drive, unfortunately I
> don't have a server with a hard drive that can sustain the high
> insertion rate. I'll post more detailed results, once I've refined the
> tests a bit.

So, here are more detailed test results, using Greg Smith's excellent 
pgbench-tools test suite:

http://community.enterprisedb.com/xloginsert-scale-tests/

The workload in all of these tests was a simple C function that writes a 
lot of very small WAL records, with 16 bytes of payload each. I ran the 
tests with the data directory on a regular hard drive, on an SSD, and on 
a RAM drive (/dev/shm). With the HDD, I also tried fsync=off and 
synchronous_commit=off. For each of those, I ran the tests with 1-16 
concurrent backends.

Summary: The patch hurts single-backend performance by about 10%, except 
for the synchronous_commit=off test. Between 2-6 clients, it either 
helps, doesn't make any difference, or hurts. With > 6 clients, it hurts.

So, that's quite disappointing. The patch has two problems: the 10% 
slowdown in single-client case, and the slowdown with > 6 clients. I 
don't know where exactly the single-client slowdown comes from, although 
I'm not surprised that the bookkeeping with slots etc. has some 
overhead. Hopefully that overhead can be made smaller, if not eliminated 
completely.

The slowdown with > 6 clients seems to be spinlock contention. I ran 
"perf record" for a short duration during one of the ramdrive tests, and 
saw the spinlock acquisition in ReserveXLogInsertLocation() consuming 
about 80% of all CPU time.
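
To illustrate why a single spinlock in the reservation step becomes the 
hot spot, here is a minimal sketch of reserving log space under a 
spinlock. It is not the code from the patch: ToyInsertState, 
toy_reserve() and the field names are hypothetical, and a C11 atomic_flag 
stands in for the real spinlock. Every inserter has to pass through the 
same lock word, so with many backends most of the time can go into the 
busy-wait loop.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical shared state: one spinlock guarding the insert position. */
typedef struct
{
    atomic_flag lock;       /* toy spinlock, stand-in for the real one */
    uint64_t    next_pos;   /* next free byte position in the log */
} ToyInsertState;

/*
 * Reserve 'size' bytes and return the start position of the reservation.
 * Only the position bookkeeping happens under the lock; copying the
 * record data would happen after the lock is released.
 */
static uint64_t
toy_reserve(ToyInsertState *st, uint32_t size)
{
    uint64_t start;

    /* Busy-wait until the flag was previously clear (lock acquired). */
    while (atomic_flag_test_and_set_explicit(&st->lock, memory_order_acquire))
        ;

    start = st->next_pos;
    st->next_pos = start + size;

    atomic_flag_clear_explicit(&st->lock, memory_order_release);
    return start;
}
```

The critical section is only two assignments, but each acquisition still 
has to pull the lock's cache line over to the acquiring CPU, which is 
what gets expensive as the number of concurrent inserters grows.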

I then hacked the patch a little bit, removing the check in XLogInsert 
for fullPageWrites and forcePageWrites, as well as the check for "did a 
checkpoint just happen" (see 
http://community.enterprisedb.com/xloginsert-scale-tests/disable-fpwcheck.patch). 
My hunch was that accessing those fields causes cache line stealing, 
making the cache line containing the spinlock even more busy. That hunch 
seems to be correct; when I reran the tests with that patch, the 
performance with a high number of clients became much better. See the 
results with "xloginsert-scale-13.patch". With that change, the 
single-client case is still about 10% slower than current code, but the 
performance with > 8 clients is almost as good as with current code. 
Between 2-6 clients, the patch is a win.
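
The cache line effect can be made concrete with a small layout sketch. 
This is not the patch's data structure and not a proposed fix: the struct 
and field names are hypothetical, and the 64-byte line size is an 
assumption (typical for x86-64). In the packed layout the lock and the 
read-mostly flags sit within the same 64 bytes, so every read of a flag 
competes with lock traffic for one cache line. In the padded layout the 
flags start on a line of their own.

```c
#include <assert.h>
#include <stdalign.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define CACHE_LINE_SIZE 64      /* assumption: typical x86-64 line size */

/* Packed layout: lock and read-mostly flags share one cache line. */
typedef struct
{
    volatile int insert_lock;           /* stand-in for the spinlock */
    uint64_t     curr_pos;
    bool         full_page_writes;
    bool         force_page_writes;
    uint64_t     redo_rec_ptr;
} SharedStatePacked;

/* Padded layout: the read-mostly fields start on their own cache line. */
typedef struct
{
    alignas(CACHE_LINE_SIZE) volatile int insert_lock;
    uint64_t     curr_pos;
    alignas(CACHE_LINE_SIZE) bool full_page_writes;
    bool         force_page_writes;
    uint64_t     redo_rec_ptr;
} SharedStatePadded;

static_assert(sizeof(SharedStatePadded) % CACHE_LINE_SIZE == 0,
              "padded struct should cover whole cache lines");

/* Cache line index of a field, relative to the start of the struct. */
static inline size_t
cache_line_of(size_t offset)
{
    return offset / CACHE_LINE_SIZE;
}

/* True if the lock and the flag fall on the same line in the packed layout. */
static bool
packed_shares_line(void)
{
    return cache_line_of(offsetof(SharedStatePacked, insert_lock)) ==
           cache_line_of(offsetof(SharedStatePacked, full_page_writes));
}

/* Same check for the padded layout. */
static bool
padded_shares_line(void)
{
    return cache_line_of(offsetof(SharedStatePadded, insert_lock)) ==
           cache_line_of(offsetof(SharedStatePadded, full_page_writes));
}
```

Separating the fields only addresses the sharing of the line. It says 
nothing about how to read those fields safely without holding the lock, 
which is the part the hack above skips.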

The hack that restored the > 6 clients performance to current level is 
not safe, of course, so I'll have to figure out a safe way to get that 
effect. Also, even when the performance is as good as current code, it's 
not good to spend all the CPU time spinning on the spinlock. I didn't 
measure the CPU usage with current code, but I would expect the backends 
to be sleeping, not spinning, when not doing useful work.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

