Re: WAL format and API changes (9.5)

From: Andres Freund
Subject: Re: WAL format and API changes (9.5)
Msg-id: 20140916102138.GI23806@awork2.anarazel.de
In reply to: Re: WAL format and API changes (9.5)  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses: Re: WAL format and API changes (9.5)  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List: pgsql-hackers

On 2014-09-15 15:41:22 +0300, Heikki Linnakangas wrote:
> Here we go. I've split this again into two patches. The first patch is just
> refactoring the current code. It moves XLogInsert into a new file,
> xloginsert.c, and the definition of XLogRecord to new xlogrecord.h header
> file. As a result, there is a lot of churn in the #includes in C files
> that generate WAL records, or contain redo routines.  The number of files
> that pull in xlog.h - directly or indirectly through other headers - is
> greatly reduced.
> 
> The second patch contains the interesting changes.
> 
> I wrote a little benchmark kit to performance test this. I'm trying to find
> out two things:
> 
> 1) How much CPU overhead do the new XLogBeginInsert and XLogRegister*
> functions add, compared to the current approach with XLogRecDatas.
> 
> 2) How much extra WAL is generated with the patch. This affects the CPU time
> spent in the tests, but it's also interesting to measure directly, because
> WAL size affects many things like WAL archiving, streaming replication etc.
> 
> Attached is the test kit I'm using. To run the battery of tests, use "psql
> -f run.sql". To answer the question of WAL volume, it runs a bunch of tests
> that exercise heap insert, update and delete, as well as b-tree and GIN
> insertions. To measure the CPU overhead, it runs a heap insertion test, with a
> tiny record size that's chosen so that it generates exactly the same amount
> of WAL after alignment with and without the patch. The test is repeated many
> times, and the median of runtimes is printed out.
> 
> Here are the results, comparing unpatched and patched versions. First, the
> WAL sizes:
> 
> Heap insertion records are 2 bytes larger with the patch. Due to
> alignment, that makes for a 0 or 8 byte difference in the record sizes.
> Other WAL records have a similar story; a few extra bytes but no big
> regressions. There are a few outliers above where it appears that the
> patched version takes less space. Not sure why that would be, probably just
> a glitch in the test, autovacuum kicked in or something.

I have to admit, that's already not a painless amount of overhead.
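
To spell out the "0 or 8 byte" arithmetic: if record lengths are rounded up to an 8-byte boundary (an assumption here, implied by the figures quoted above and typical for 64-bit builds), then 2 extra bytes either fit into the existing padding or push the record onto the next boundary. A minimal sketch; the function names and the sample lengths are made up for illustration and are not taken from the patch:

```python
MAXIMUM_ALIGNOF = 8  # assumption: 8-byte maximum alignment


def maxalign(length: int, align: int = MAXIMUM_ALIGNOF) -> int:
    """Round length up to the next multiple of align (align must be a power of two)."""
    return (length + align - 1) & ~(align - 1)


def aligned_growth(old_len: int, extra: int = 2) -> int:
    """How much the aligned size grows when the raw record grows by `extra` bytes."""
    return maxalign(old_len + extra) - maxalign(old_len)


if __name__ == "__main__":
    # Hypothetical raw record lengths, chosen to hit both outcomes.
    for old_len in (56, 60, 62, 63, 64):
        print(f"raw {old_len:3d} -> +{aligned_growth(old_len)} bytes after alignment")
```

With these assumptions, only records whose raw length is within 2 bytes of a boundary grow at all, which is also why a tiny record size can be picked that generates the same amount of WAL with and without the patch.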

> Now, for the CPU overhead:
> 
>   description   | dur_us (orig) | dur_us (patched) |   %
> ----------------+---------------+------------------+--------
>  heap insert 30 |     0.7752835 |         0.831883 | 107.30
> (1 row)
> 
> So, the patched version runs 7.3 % slower. That's disappointing :-(.
> 
> These are the results I got on my laptop today. Previously, the typical result
> I've gotten has been about 5%, so that's a bit high. Nevertheless, even a 5%
> slowdown is probably not acceptable.

Yes, I definitely think it's not.
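
For reference, the "%" column above is just the patched duration divided by the original one, and each duration is described as the median over many repetitions. A small sketch of that reduction; the helper names and the list of sample timings are hypothetical, only the two dur_us values come from the table quoted above:

```python
from statistics import median


def median_runtime(runtimes_us):
    """Collapse repeated measurements into one figure that is robust to outliers."""
    return median(runtimes_us)


def relative_percent(orig_us, patched_us):
    """Patched duration as a percentage of the original duration."""
    return round(patched_us / orig_us * 100.0, 2)


if __name__ == "__main__":
    # Made-up timings; one slow outlier does not move the median.
    sample = [0.80, 0.78, 0.95, 0.77, 0.79]
    print("median of sample runs:", median_runtime(sample))

    # Values from the quoted table.
    print("patched vs. orig: %.2f %%" % relative_percent(0.7752835, 0.831883))
```

Using the median rather than the mean keeps a single run disturbed by, say, a checkpoint from skewing the comparison, but it does nothing about systematic effects such as code layout changes.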

> While I've been trying to nail down where that difference comes from, I've seen a
> lot of strange phenomena. At one point, the patched version was 10% slower,
> but I was able to bring the difference down to 5% if I added a certain
> function in xloginsert.c, but never called it. It was very repeatable at the
> time, I tried adding and removing it many times and always got the same
> result, but I don't see it with the current HEAD and patch version anymore.
> So I think 5% is pretty close to the margin of error that arises from
> different compiler optimizations, data/instruction cache effects etc.
> 
> Looking at the 'perf' profile, the new function calls only amount to about
> 2% of overhead, so I'm not sure where the slowdown is coming from. Here are
> explanations I've considered, but I haven't been able to prove any of them:

I'd suggest doing:
a) perf stat -vvv of both workloads. That will often tell you a lot already.
b) Looking at other events, particularly stalled-cycles-frontend,
   stalled-cycles-backend and cache-misses.


Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


