Re: Optimize Arm64 crc32c implementation in Postgresql

From Heikki Linnakangas
Subject Re: Optimize Arm64 crc32c implementation in Postgresql
Msg-id 37204430-76fb-0eaa-06d9-dbf4f6473c99@iki.fi
In reply to Re: Optimize Arm64 crc32c implementation in Postgresql  (Andres Freund <andres@anarazel.de>)
Responses Re: Optimize Arm64 crc32c implementation in Postgresql  (Andres Freund <andres@anarazel.de>)
Re: Optimize Arm64 crc32c implementation in Postgresql  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Optimize Arm64 crc32c implementation in Postgresql  (Daniel Gustafsson <daniel@yesql.se>)
Re: Optimize Arm64 crc32c implementation in Postgresql  (Thomas Munro <thomas.munro@enterprisedb.com>)
List pgsql-hackers
On 03/04/18 19:09, Andres Freund wrote:
> Hi,
> 
> On 2018-04-03 19:05:19 +0300, Heikki Linnakangas wrote:
>> On 01/04/18 20:32, Andres Freund wrote:
>>> On 2018-03-06 02:44:35 +0800, Heikki Linnakangas wrote:
>>>> * I tested this on Linux, with gcc and clang, on an ARM64 virtual machine
>>>> that I had available (not an emulator, but a VM on a shared ARM64 server).
>>>
>>> Have you seen actual postgres performance benefits with the patch?
>>
>> I just ran a small test with pg_waldump, similar to what Abhijit Menon-Sen
>> ran with the Slicing-by-8 and Intel SSE patches, when we added those
>> (https://www.postgresql.org/message-id/20141119155811.GA32492%40toroid.org).
>> I ran pgbench, with scale factor 5, until it had generated about 1 GB of
>> WAL, and then I ran pg_waldump -z on that WAL. With slicing-by-8, it took
>> about 7 s, and with the special CPU instructions, about 5 s. 'perf' showed
>> that the CRC computation took about 30% of the CPU time before, and about
>> 12% after, which sounds about right. That's not as big a speedup as we saw
>> with the corresponding Intel SSE instructions back in 2014, but still quite
>> worthwhile.
> 
> Cool.  Based on a skim the patch looks reasonable.

Thanks.

I bikeshedded with myself on the naming of things, and decided to use
"ARMv8" in the variable and file names, instead of ARM64 or ARMCE or
ARM64CE. The CRC instructions were introduced in ARMv8 (as an optional
feature); they are not really related to the 64-bitness, even though the
64-bit instruction set was also introduced in ARMv8. Other than that,
and some comment fixes, this is the same as the previous patch version.

I was just about to commit this, when I started to wonder: Do we need to 
worry about alignment? As the patch stands, it will merrily do unaligned 
8-byte loads. Is that OK on ARM? It seems to work on the system I've 
been testing on, but I don't know. And even if it's OK, would it perform 
better if we did 1-byte loads in the beginning, until we reach the 
8-byte boundary?
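
To make the alignment question concrete, here is a minimal sketch of the
"1-byte loads until the 8-byte boundary" loop structure. It is not the
code from the patch: crc32c_byte() and crc32c_u64() are hypothetical
software stand-ins for the one-byte and eight-byte CRC32C instructions,
so that the sketch runs on any machine, and crc32c_aligned_sketch() is a
made-up name. Whether the prologue actually helps on ARM would have to
be measured.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical software stand-in for the one-byte CRC32C instruction
 * (bitwise update, reflected Castagnoli polynomial 0x82F63B78).
 */
static uint32_t
crc32c_byte(uint32_t crc, uint8_t b)
{
    crc ^= b;
    for (int i = 0; i < 8; i++)
        crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    return crc;
}

/*
 * Hypothetical stand-in for the eight-byte CRC32C instruction.
 * The low-order byte of v is consumed first.
 */
static uint32_t
crc32c_u64(uint32_t crc, uint64_t v)
{
    for (int i = 0; i < 8; i++)
        crc = crc32c_byte(crc, (uint8_t) (v >> (8 * i)));
    return crc;
}

uint32_t
crc32c_aligned_sketch(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    /* Prologue: one byte at a time until p sits on an 8-byte boundary. */
    while (len > 0 && ((uintptr_t) p & 7) != 0)
    {
        crc = crc32c_byte(crc, *p++);
        len--;
    }

    /*
     * Main loop: p is now 8-byte aligned. Real code would do a single
     * aligned 8-byte load here; the value is assembled byte by byte
     * only to keep this sketch independent of host endianness.
     */
    while (len >= 8)
    {
        uint64_t v = 0;

        for (int i = 0; i < 8; i++)
            v |= (uint64_t) p[i] << (8 * i);
        crc = crc32c_u64(crc, v);
        p += 8;
        len -= 8;
    }

    /* Epilogue: remaining 0..7 tail bytes. */
    while (len > 0)
    {
        crc = crc32c_byte(crc, *p++);
        len--;
    }
    return crc;
}
```

The caller is assumed to apply the usual CRC-32C convention of starting
from 0xFFFFFFFF and inverting the final value. The result does not
depend on where the buffer starts, so the prologue changes only the
access pattern, never the checksum.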

- Heikki
