Re: Popcount optimization using AVX512

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: Popcount optimization using AVX512
Дата
Msg-id 20240209183450.snllkk7erharc73y@awork3.anarazel.de
обсуждение исходный текст
Ответ на RE: Popcount optimization using AVX512  ("Amonson, Paul D" <paul.d.amonson@intel.com>)
Ответы RE: Popcount optimization using AVX512  ("Amonson, Paul D" <paul.d.amonson@intel.com>)
Список pgsql-hackers
Hi,

On 2024-02-09 17:39:46 +0000, Amonson, Paul D wrote:

> diff --git a/meson.build b/meson.build
> index 8ed51b6aae..1e7a4dc942 100644
> --- a/meson.build
> +++ b/meson.build
> @@ -1773,6 +1773,45 @@ elif cc.links('''
>  endif
>  
>  
> +# XXX: The configure.ac check for __cpuidex() is broken, we don't copy that
> +# here. To prevent problems due to two detection methods working, stop
> +# checking after one.

This seems like a bogus copy-paste.


> +if cc.links('''
> +    #include <cpuid.h>
> +    int main(int arg, char **argv)
> +    {
> +        unsigned int exx[4] = {0, 0, 0, 0};
> +        __get_cpuid_count(7, 0, &exx[0], &exx[1], &exx[2], &exx[3]);
> +    }
> +    ''', name: '__get_cpuid_count',
> +    args: test_c_args)
> +  cdata.set('HAVE__GET_CPUID_COUNT', 1)
> +elif cc.links('''
> +    #include <intrin.h>
> +    int main(int arg, char **argv)
> +    {
> +        unsigned int exx[4] = {0, 0, 0, 0};
> +        __cpuidex(exx, 7, 0);
> +    }
> +    ''', name: '__cpuidex',
> +    args: test_c_args)
> +  cdata.set('HAVE__CPUIDEX', 1)
> +endif
> +
> +
> +# Check for header immintrin.h
> +if cc.links('''
> +    #include <immintrin.h>
> +    int main(int arg, char **argv)
> +    {
> +      return 1701;
> +    }
> +    ''', name: '__immintrin',
> +    args: test_c_args)
> +  cdata.set('HAVE__IMMINTRIN', 1)
> +endif

Do these all actually have to link?  Invoking the linker is slow.

I think you might be able to just use cc.has_header_symbol().



> +###############################################################
> +# AVX 512 POPCNT Intrinsic check
> +###############################################################
> +have_avx512_popcnt = false
> +cflags_avx512_popcnt = []
> +if host_cpu == 'x86_64'
> +  prog = '''
> +      #include <immintrin.h>
> +      #include <stdint.h>
> +      void main(void)
> +      {
> +        __m512i tmp __attribute__((aligned(64)));
> +        __m512i input = _mm512_setzero_si512();
> +        __m512i output = _mm512_popcnt_epi64(input);
> +        uint64_t cnt = 999;
> +        _mm512_store_si512(&tmp, output);
> +        cnt = _mm512_reduce_add_epi64(tmp);
> +        /* return computed value, to prevent the above being optimized away */
> +        return cnt == 0;
> +      }'''

Does this work with msvc?


> +    if cc.links(prog, name: '_mm512_setzero_si512, _mm512_popcnt_epi64, _mm512_store_si512, and
_mm512_reduce_add_epi64with -mavx512vpopcntdq -mavx512f',
 

That's a very long line in the output, how about using the avx feature name or
something?



> diff --git a/src/port/Makefile b/src/port/Makefile
> index dcc8737e68..6a01a7d89a 100644
> --- a/src/port/Makefile
> +++ b/src/port/Makefile
> @@ -87,6 +87,11 @@ pg_crc32c_sse42.o: CFLAGS+=$(CFLAGS_CRC)
>  pg_crc32c_sse42_shlib.o: CFLAGS+=$(CFLAGS_CRC)
>  pg_crc32c_sse42_srv.o: CFLAGS+=$(CFLAGS_CRC)
>  
> +# Newer Intel processors can use AVX-512 POPCNT Capabilities (01/30/2024)
> +pg_bitutils.o: CFLAGS+=$(CFLAGS_AVX512_POPCNT)
> +pg_bitutils_shlib.o: CFLAGS+=$(CFLAGS_AVX512_POPCNT)
> +pg_bitutils_srv.o:CFLAGS+=$(CFLAGS_AVX512_POPCNT)
> +
>  # all versions of pg_crc32c_armv8.o need CFLAGS_CRC
>  pg_crc32c_armv8.o: CFLAGS+=$(CFLAGS_CRC)
>  pg_crc32c_armv8_shlib.o: CFLAGS+=$(CFLAGS_CRC)
> diff --git a/src/port/meson.build b/src/port/meson.build
> index 69b30ab21b..1c48a3b07e 100644
> --- a/src/port/meson.build
> +++ b/src/port/meson.build
> @@ -184,6 +184,7 @@ foreach name, opts : pgport_variants
>        link_with: cflag_libs,
>        c_pch: pch_c_h,
>        kwargs: opts + {
> +        'c_args': opts.get('c_args', []) + cflags_avx512_popcnt,
>          'dependencies': opts['dependencies'] + [ssl],
>        }
>      )

This will build all of pgport with the avx flags, which wouldn't be correct, I
think? The compiler might inject automatic uses of avx512 in places, which
would cause problems, no?

While you don't do the same for make, isn't even just using the avx512 for all
of pg_bitutils.c broken for exactly that reson? That's why the existing code
builds the files for various crc variants as their own file.


Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Andres Freund
Дата:
Сообщение: Re: Popcount optimization using AVX512
Следующее
От: Andres Freund
Дата:
Сообщение: Re: glibc qsort() vulnerability