add AVX2 support to simd.h
From | Nathan Bossart |
---|---|
Subject | add AVX2 support to simd.h |
Date | |
Msg-id | 20231129171526.GA857928@nathanxps13 |
Responses | Re: add AVX2 support to simd.h; Re: add AVX2 support to simd.h; Re: add AVX2 support to simd.h |
List | pgsql-hackers |
On Wed, Nov 22, 2023 at 12:49:35PM -0600, Nathan Bossart wrote:
> On Wed, Nov 22, 2023 at 02:54:13PM +0200, Ants Aasma wrote:
>> For reference, executing the page checksum 10M times on a AMD 3900X CPU:
>>
>> clang-14 -O2                 4.292s (17.8 GiB/s)
>> clang-14 -O2 -msse4.1        2.859s (26.7 GiB/s)
>> clang-14 -O2 -msse4.1 -mavx2 1.378s (55.4 GiB/s)
>
> Nice.  I've noticed similar improvements with AVX2 intrinsics in simd.h.

I've alluded to this a few times now, so I figured I'd park the patch and
preliminary benchmarks in a new thread while we iron out how to support
newer instructions (see discussion here [0]).

Using the same benchmark as we did for the SSE2 linear searches in
XidInMVCCSnapshot() (commit 37a6e5d) [1] [2], I see the following:

    writers    sse2    avx2     %
        256    1195    1188    -1
        512     928    1054   +14
       1024     633     716   +13
       2048     332     420   +27
       4096     162     203   +25
       8192     162     182   +12

It's been a while since I ran these benchmarks, but I vaguely recall also
seeing something like a 50% improvement for a dedicated pg_lfind32()
benchmark on long arrays.

As is, the patch likely won't do anything unless you add -mavx2 or
-march=native to your CFLAGS.  I don't intend for this patch to be
seriously considered until we have better support for detecting/compiling
AVX2 instructions and a buildfarm machine that uses them.

I plan to start another thread for AVX2 support for the page checksums.

[0] https://postgr.es/m/20231107024734.GB729644%40nathanxps13
[1] https://postgr.es/m/057a9a95-19d2-05f0-17e2-f46ff20e9b3e@2ndquadrant.com
[2] https://postgr.es/m/20220713170950.GA3116318%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
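
For readers who have not opened the patch, the following is a rough sketch of what an AVX2 linear search over 32-bit values can look like, and of why nothing changes unless the compiler is told to target AVX2. It is not the code from the patch or from simd.h: the function name `lfind32_sketch` is hypothetical, and the choice to guard the vector path with the compiler-defined `__AVX2__` macro (set by -mavx2 or a suitable -march=native) is an assumption made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#ifdef __AVX2__
#include <immintrin.h>
#endif

/*
 * Hypothetical helper (not PostgreSQL's pg_lfind32): returns true if "key"
 * appears anywhere in base[0 .. nelem - 1].
 */
static bool
lfind32_sketch(uint32_t key, const uint32_t *base, size_t nelem)
{
	size_t		i = 0;

#ifdef __AVX2__
	/* Broadcast the key into all eight 32-bit lanes of a 256-bit register. */
	const __m256i keys = _mm256_set1_epi32((int) key);

	/* Compare eight elements per iteration using unaligned loads. */
	for (; i + 8 <= nelem; i += 8)
	{
		__m256i		vals = _mm256_loadu_si256((const __m256i *) &base[i]);
		__m256i		eq = _mm256_cmpeq_epi32(keys, vals);

		/* Any set bit in the mask means at least one lane matched. */
		if (_mm256_movemask_epi8(eq) != 0)
			return true;
	}
#endif

	/* Scalar loop: handles the tail, or everything when AVX2 is not enabled. */
	for (; i < nelem; i++)
	{
		if (base[i] == key)
			return true;
	}

	return false;
}
```

Built without -mavx2, the preprocessor drops the vector loop and only the scalar loop remains, which mirrors the remark above about CFLAGS. A call such as `lfind32_sketch(xid, xip, xcnt)` (with hypothetical variable names) behaves the same either way; only the speed differs.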