Re: Popcount optimization using AVX512

Поиск
Список
Период
Сортировка
От David Rowley
Тема Re: Popcount optimization using AVX512
Дата
Msg-id CAApHDvqyMNGVgwpaOPtENdq5uEMR=vSkRJEgG1S+X7Vtk1-EnA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Popcount optimization using AVX512  (Nathan Bossart <nathandbossart@gmail.com>)
Ответы Re: Popcount optimization using AVX512
Список pgsql-hackers
On Tue, 19 Mar 2024 at 06:30, Nathan Bossart <nathandbossart@gmail.com> wrote:
> Here is a more fleshed-out version of what I believe David is proposing.
> On my machine, the gains aren't quite as impressive (~8.8s to ~5.2s for the
> test_popcount benchmark).  I assume this is because this patch turns
> pg_popcount() into a function pointer, which is what the AVX512 patches do,
> too.  I left out the 32-bit section from pg_popcount_fast(), but I'll admit
> that I'm not yet 100% sure that we can assume we're on a 64-bit system
> there.

I looked at your latest patch and tried out the performance on a Zen4
running windows and a Zen2 running on Linux. As follows:

AMD 3990x:

master:
postgres=# select drive_popcount(10000000, 1024);
Time: 11904.078 ms (00:11.904)
Time: 11907.176 ms (00:11.907)
Time: 11927.983 ms (00:11.928)

patched:
postgres=# select drive_popcount(10000000, 1024);
Time: 3641.271 ms (00:03.641)
Time: 3610.934 ms (00:03.611)
Time: 3663.423 ms (00:03.663)


AMD 7945HX Windows

master:
postgres=# select drive_popcount(10000000, 1024);
Time: 9832.845 ms (00:09.833)
Time: 9844.460 ms (00:09.844)
Time: 9858.608 ms (00:09.859)

patched:
postgres=# select drive_popcount(10000000, 1024);
Time: 3427.942 ms (00:03.428)
Time: 3364.262 ms (00:03.364)
Time: 3413.407 ms (00:03.413)

The only thing I'd question in the patch is in pg_popcount_fast(). It
looks like you've opted to not do the 32-bit processing on 32-bit
machines. I think that's likely still worth coding in a similar way to
how pg_popcount_slow() works. i.e. use "#if SIZEOF_VOID_P >= 8".
Probably one day we'll remove that code, but it seems strange to have
pg_popcount_slow() do it and not pg_popcount_fast().

> IMHO this work is arguably a prerequisite for the AVX512 work, as turning
> pg_popcount() into a function pointer will likely regress performance for
> folks on systems without AVX512 otherwise.

I think so too.

David



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Possibility to disable `ALTER SYSTEM`
Следующее
От: Nathan Bossart
Дата:
Сообщение: Re: Popcount optimization using AVX512