Discussion: Changing default -march landscape

Changing default -march landscape

From

Thomas Munro

Date:

June 13, 02:11:56

Hi,

David R and I were discussing vectorisation and microarchitectures and
what you can expect the target microarchitecture to be these days, and
it seemed like some of our choices are not really very
forward-looking.

Distros targeting x86-64 traditionally assumed the original AMD64 K8
instruction set, so if we want to use newer instructions we use
various configure or runtime checks to see if that's safe.

Recent GCC and Clang versions understand -march=x86-64-v{2,3,4}[1].
RHEL9 and similar and SUSE Tumbleweed now require x86-64-v2, and IIUC
they changed the -march default to -v2 in their build of GCC, and I
think Ubuntu has something in the works perhaps for -v3[2].

Some of our current tricks won't take proper advantage of that:
we'll still access POPCNT through a function pointer!  I was wondering
how to do it.  One idea that seems kinda weird is to try $(CC) -S
test_builtin_popcount.c, and then grepping for POPCNT in
test_builtin_popcount.s!  I assume we don't want to use
__builtin_popcount() if it doesn't generate the instruction (using the
compiler flags provided or default otherwise), because on a more
conservative distro we'll use GCC/Clang's fallback code, instead of
our own runtime-checked POPCNT-instruction-through-a-function-pointer.
(Presumably we believed that to be better.)  Being able to use
__builtin_popcount() directly without any function pointer nonsense is
obviously faster, but also automatically vectorisable.
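
As a rough illustration of the vectorisation point (an editorial sketch, not
code from PostgreSQL): when the builtin is called directly inside a loop, the
compiler sees the whole loop body and may be able to vectorise it under a
sufficiently new -march, which it cannot do through an opaque function
pointer. The name demo_popcount_words is hypothetical, and the code assumes
GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Count the set bits in an array of 64-bit words.
 * Hypothetical helper; assumes a compiler that provides
 * __builtin_popcountll (GCC or Clang).
 */
uint64_t
demo_popcount_words(const uint64_t *words, size_t nwords)
{
    uint64_t total = 0;

    /* A plain loop over a builtin: nothing hides the operation from the optimiser. */
    for (size_t i = 0; i < nwords; i++)
        total += (uint64_t) __builtin_popcountll(words[i]);

    return total;
}
```

Whether the loop is actually vectorised depends on the compiler version and
the flags in effect, so the generated code still has to be inspected.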

That's not like the CRC32 instruction checks we have, because those
either work or don't work with default compiler flags, but for POPCNT
it always works but might generate fallback code instead of the desired
instruction, so you have to inspect what it generates.
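
A different way to ask roughly the same question, sketched here as an
editorial assumption rather than as what the message proposes: GCC and Clang
predefine __POPCNT__ when the active flags enable the instruction (for
example -mpopcnt or -march=x86-64-v2), so the preprocessor can choose between
a direct call and the function-pointer path. All demo_* names are
hypothetical, and the bit-clearing loop is only a stand-in for a real
fallback.

```c
#include <stdint.h>

/* Portable fallback: clear the lowest set bit until none remain. */
int
demo_popcount64_slow(uint64_t x)
{
    int n = 0;

    for (; x != 0; x &= x - 1)
        n++;
    return n;
}

#if defined(__POPCNT__) || defined(__aarch64__)

/* The target guarantees a popcount instruction: call the builtin directly. */
static inline int
demo_popcount64(uint64_t x)
{
    return __builtin_popcountll(x);
}

#elif defined(__x86_64__) && defined(__GNUC__)

/* Compiled with POPCNT enabled for this one function only. */
__attribute__((target("popcnt")))
static int
demo_popcount64_hw(uint64_t x)
{
    return __builtin_popcountll(x);
}

static int demo_popcount64_choose(uint64_t x);
static int (*demo_popcount64_ptr)(uint64_t) = demo_popcount64_choose;

/* First call: probe the CPU once, then repoint the function pointer. */
static int
demo_popcount64_choose(uint64_t x)
{
    __builtin_cpu_init();
    demo_popcount64_ptr = __builtin_cpu_supports("popcnt")
        ? demo_popcount64_hw
        : demo_popcount64_slow;
    return demo_popcount64_ptr(x);
}

static inline int
demo_popcount64(uint64_t x)
{
    return demo_popcount64_ptr(x);
}

#else

/* Unknown target or compiler: use the portable fallback. */
static inline int
demo_popcount64(uint64_t x)
{
    return demo_popcount64_slow(x);
}

#endif
```

With a baseline of x86-64-v2 or newer, only the first branch is compiled and
the function pointer disappears entirely; on an older baseline the same source
keeps the runtime check.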

FWIW Windows 11 on x86 requires the POPCNT instruction to boot.
Windows 10 EOL is October next year so we can use MSVC's intrinsic
without a function pointer if we just wait :-)

All 64-bit ARM systems have CNT, but we don't use it!  Likewise for all
modern POWER (8+) and SPARC chips that any OS can actually run on
these days.  For RISCV it's part of the bit manipulation option, but
we're already relying on that by detecting and using other
pg_bitutils.h builtins.

So I think we should probably just use the builtin directly
everywhere, except on x86 where we should either check if it generates
the instruction we want, OR, if we can determine that the modern
GCC/Clang fallback code is actually faster than our function pointer
hop, then maybe we should just always use it even there, after
checking that it exists.

[1] https://en.wikipedia.org/wiki/X86-64#Microarchitecture_levels
[2] https://ubuntu.com/blog/optimising-ubuntu-performance-on-amd64-architecture

Re: Changing default -march landscape

From

Nathan Bossart

Date:

June 13, 04:09:45

On Thu, Jun 13, 2024 at 11:11:56AM +1200, Thomas Munro wrote:
> David R and I were discussing vectorisation and microarchitectures and
> what you can expect the target microarchitecture to be these days, and
> it seemed like some of our choices are not really very
> forward-looking.
> 
> Distros targeting x86-64 traditionally assumed the original AMD64 K8
> instruction set, so if we want to use newer instructions we use
> various configure or runtime checks to see if that's safe.
> 
> Recent GCC and Clang versions understand -march=x86-64-v{2,3,4}[1].
> RHEL9 and similar and SUSE Tumbleweed now require x86-64-v2, and IIUC
> they changed the -march default to -v2 in their build of GCC, and I
> think Ubuntu has something in the works perhaps for -v3[2].
> 
> Some of our current tricks won't take proper advantage of that:
> we'll still access POPCNT through a function pointer!

This is perhaps only tangentially related, but I've found it really
difficult to avoid painting ourselves into a corner with this stuff.  Let's
use the SSE 4.2 CRC32C code as an example.  Right now, if your default
compiler flags indicate support for SSE 4.2 (which I'll note can be assumed
with x86-64-v2), we'll use it unconditionally, no function pointer
required.  If additional compiler flags happen to convince the compiler to
generate SSE 4.2 code, we'll instead build both a fallback version and the
SSE version, and then we'll use a function pointer to direct to whatever we
detect is available on the CPU when the server starts.
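
The two cases described above can be sketched roughly as follows. This is an
editorial illustration, not the PostgreSQL source: the demo_* names and
HAVE_X86_CRC32C are hypothetical, the fallback is a simple bitwise loop, and
the CPU check here happens on first use rather than at server start, purely
to keep the example short. It assumes GCC or Clang.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__x86_64__) && defined(__GNUC__)
#include <nmmintrin.h>          /* _mm_crc32_u8 */
#define HAVE_X86_CRC32C 1
#endif

/* Portable bitwise CRC-32C update (reflected polynomial 0x82F63B78). */
uint32_t
demo_crc32c_sw(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len--)
    {
        crc ^= *p++;
        for (int i = 0; i < 8; i++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

#ifdef HAVE_X86_CRC32C
/* Same update using the SSE 4.2 CRC32 instruction, one byte at a time. */
__attribute__((target("sse4.2")))
static uint32_t
demo_crc32c_hw(uint32_t crc, const void *data, size_t len)
{
    const unsigned char *p = data;

    while (len--)
        crc = _mm_crc32_u8(crc, *p++);
    return crc;
}
#endif

#if defined(HAVE_X86_CRC32C) && defined(__SSE4_2__)

/* Case 1: default flags already imply SSE 4.2, so call it unconditionally. */
#define demo_crc32c_update(crc, data, len) demo_crc32c_hw((crc), (data), (len))

#else

/* Case 2: build both versions and pick one through a function pointer. */
static uint32_t demo_crc32c_choose(uint32_t crc, const void *data, size_t len);
static uint32_t (*demo_crc32c_ptr)(uint32_t, const void *, size_t) = demo_crc32c_choose;

static uint32_t
demo_crc32c_choose(uint32_t crc, const void *data, size_t len)
{
#ifdef HAVE_X86_CRC32C
    __builtin_cpu_init();
    demo_crc32c_ptr = __builtin_cpu_supports("sse4.2")
        ? demo_crc32c_hw
        : demo_crc32c_sw;
#else
    demo_crc32c_ptr = demo_crc32c_sw;
#endif
    return demo_crc32c_ptr(crc, data, len);
}

#define demo_crc32c_update(crc, data, len) demo_crc32c_ptr((crc), (data), (len))

#endif

/* Full CRC-32C of a buffer: invert on the way in and on the way out. */
uint32_t
demo_crc32c(const void *data, size_t len)
{
    return ~demo_crc32c_update(~(uint32_t) 0, data, len);
}
```

Compiling the same file with and without -march=x86-64-v2 switches between
the two cases without touching the callers.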

Now, let's say we require x86-64-v2.  Once we have that, we can avoid the
function pointer on many more x86 machines.  While that sounds great, now
we have a different problem.  If someone wants to add, say, AVX-512 support
[0], which is a much newer instruction set, we'll need to use the function
pointer again.  And we're back where we started.  We could instead just ask
folks to compile with -march=native, but then these optimizations are only
available for a subset of users until we decide the instructions are
standard enough 20 years from now.

The idea that's been floating around recently is to build a bunch of
different versions of Postgres and to choose one on startup based on what
the CPU supports.  That seems like quite a lot of work, and it'll increase
the size of the builds quite a bit, but it at least doesn't have the
aforementioned problem.

Sorry if I just rambled on about something unrelated, but your message had
enough keywords to get me thinking about this again.

[0] https://postgr.es/m/BL1PR11MB530401FA7E9B1CA432CF9DC3DC192%40BL1PR11MB5304.namprd11.prod.outlook.com

-- 
nathan

Re: Changing default -march landscape

From

Thomas Munro

Date:

June 13, 04:20:17

On Thu, Jun 13, 2024 at 1:09 PM Nathan Bossart <nathandbossart@gmail.com> wrote:
> Now, let's say we require x86-64-v2.  Once we have that, we can avoid the
> function pointer on many more x86 machines.  While that sounds great, now
> we have a different problem.  If someone wants to add, say, AVX-512 support
> [0], which is a much newer instruction set, we'll need to use the function
> pointer again.  And we're back where we started.  We could instead just ask
> folks to compile with -march=native, but then these optimizations are only
> available for a subset of users until we decide the instructions are
> standard enough 20 years from now.

The way I think about it, it's not our place to require anything (I
mean, we can't literally put -march=XXX into our build files, or if we
do the Debian et al maintainers will have to remove them by local
patch because they are in charge of what the baseline is for their
distro), but we should do the best thing possible when people DO build
with modern -march.  I would assume for example that Amazon Linux is
set up to use a default -march that targets the actual minimum
microarch level on AWS hosts.  I guess what I'm pointing out here is
that the baseline is (finally!) moving on common distributions, and
yet we've coded things in a way that doesn't benefit...

> The idea that's been floating around recently is to build a bunch of
> different versions of Postgres and to choose one on startup based on what
> the CPU supports.  That seems like quite a lot of work, and it'll increase
> the size of the builds quite a bit, but it at least doesn't have the
> aforementioned problem.

I guess another idea would be for the PGDG packagers or someone else
interested in performance to create repos with binaries built for
these microarch levels and users can research what they need.  The new
-v2 etc levels are a lot more practical than the microarch names and
individual features...
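
For a sense of how a user might "research what they need", here is a minimal
editorial sketch that estimates the x86-64 level of the running CPU with the
GCC/Clang __builtin_cpu_supports() builtin. The demo_* names are
hypothetical, and the checks are a deliberate subset of each level's feature
list (CMPXCHG16B, LAHF/SAHF, F16C, LZCNT, MOVBE and OSXSAVE are not tested),
so the result is an approximation.

```c
#include <stddef.h>

/*
 * Return an approximate x86-64 microarchitecture level (1..4),
 * or 0 when not built for x86-64 with GCC or Clang.
 */
int
demo_x86_64_level(void)
{
#if defined(__x86_64__) && defined(__GNUC__)
    __builtin_cpu_init();

    /* x86-64-v2: SSE3 through SSE4.2 plus POPCNT. */
    if (!(__builtin_cpu_supports("sse3") &&
          __builtin_cpu_supports("ssse3") &&
          __builtin_cpu_supports("sse4.1") &&
          __builtin_cpu_supports("sse4.2") &&
          __builtin_cpu_supports("popcnt")))
        return 1;

    /* x86-64-v3: AVX, AVX2, BMI1, BMI2 and FMA. */
    if (!(__builtin_cpu_supports("avx") &&
          __builtin_cpu_supports("avx2") &&
          __builtin_cpu_supports("bmi") &&
          __builtin_cpu_supports("bmi2") &&
          __builtin_cpu_supports("fma")))
        return 2;

    /* x86-64-v4: the AVX-512 F, BW, CD, DQ and VL subsets. */
    if (!(__builtin_cpu_supports("avx512f") &&
          __builtin_cpu_supports("avx512bw") &&
          __builtin_cpu_supports("avx512cd") &&
          __builtin_cpu_supports("avx512dq") &&
          __builtin_cpu_supports("avx512vl")))
        return 3;

    return 4;
#else
    return 0;
#endif
}

/* Map a level number to the name used with -march. */
const char *
demo_x86_64_level_name(int level)
{
    static const char *const names[] = {
        "unknown", "x86-64", "x86-64-v2", "x86-64-v3", "x86-64-v4"
    };

    return (level >= 0 && level <= 4) ? names[level] : names[0];
}
```

A repository per level would then only need the user to match the reported
name against the repository name.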

Re: Changing default -march landscape

From

Nathan Bossart

Date:

June 13, 05:00:41

On Thu, Jun 13, 2024 at 01:20:17PM +1200, Thomas Munro wrote:
> The way I think about it, it's not our place to require anything (I
> mean, we can't literally put -march=XXX into our build files, or if we
> do the Debian et al maintainers will have to remove them by local
> patch because they are in charge of what the baseline is for their
> distro), but we should do the best thing possible when people DO build
> with modern -march.  I would assume for example that Amazon Linux is
> set up to use a default -march that targets the actual minimum
> microarch level on AWS hosts.  I guess what I'm pointing out here is
> that the baseline is (finally!) moving on common distributions, and
> yet we've coded things in a way that doesn't benefit...

That's true, but my point is that as soon as we start avoiding function
pointers more commonly, it becomes difficult to justify adding them back in
order to support new instruction sets.  Should we just compile in the SSE
4.2 version, or should we take a chance on AVX-512 with the function
pointer?

>> The idea that's been floating around recently is to build a bunch of
>> different versions of Postgres and to choose one on startup based on what
>> the CPU supports.  That seems like quite a lot of work, and it'll increase
>> the size of the builds quite a bit, but it at least doesn't have the
>> aforementioned problem.
> 
> I guess another idea would be for the PGDG packagers or someone else
> interested in performance to create repos with binaries built for
> these microarch levels and users can research what they need.  The new
> -v2 etc levels are a lot more practical than the microarch names and
> individual features...

Heartily agreed.

-- 
nathan

Re: Changing default -march landscape

From

Peter Eisentraut

Date:

June 13, 10:41:33

On 13.06.24 04:00, Nathan Bossart wrote:
> That's true, but my point is that as soon as we start avoiding function
> pointers more commonly, it becomes difficult to justify adding them back in
> order to support new instruction sets.  Should we just compile in the SSE
> 4.2 version, or should we take a chance on AVX-512 with the function
> pointer?
> 
>>> The idea that's been floating around recently is to build a bunch of
>>> different versions of Postgres and to choose one on startup based on what
>>> the CPU supports.  That seems like quite a lot of work, and it'll increase
>>> the size of the builds quite a bit, but it at least doesn't have the
>>> aforementioned problem.
>>
>> I guess another idea would be for the PGDG packagers or someone else
>> interested in performance to create repos with binaries built for
>> these microarch levels and users can research what they need.  The new
>> -v2 etc levels are a lot more practical than the microarch names and
>> individual features...
> 
> Heartily agreed.

One thing that is perhaps not clear (to me?) is how much this matters 
and how much of it matters.  Obviously, we know that it matters some, 
otherwise we'd not be working on it.  But does it, like, matter only 
with checksums, or with thousands of partitions, or with many CPUs, or 
certain types of indexes, etc.?

If we're going to, say, create some recommendations for packagers around 
this, how are they supposed to determine the tradeoffs?  It might be 
easy for a packager to set some slightly-higher -march flag that is in 
line with their distro's policies, but it would probably be a lot more 
work to create separate binaries or a separate repository for, say, 
moving from SSE-something to AVX-something.  And how are they supposed 
to decide that, and how are they supposed to communicate that to their 
users?  (And how can we get different packagers to make somewhat 
consistent decisions around this?)

We have in a somewhat similar case quite clearly documented that without 
native spinlock support everything will be terrible.  And there is 
probably some information out there that without certain CPU support 
checksum performance will be terrible.  But beyond that we probably 
don't have much.

Re: Changing default -march landscape

From

Magnus Hagander

Date:

June 13, 11:14:52

On Thu, Jun 13, 2024 at 9:41 AM Peter Eisentraut <peter@eisentraut.org> wrote:

> On 13.06.24 04:00, Nathan Bossart wrote:
>> That's true, but my point is that as soon as we start avoiding function
>> pointers more commonly, it becomes difficult to justify adding them back in
>> order to support new instruction sets. Should we just compile in the SSE
>> 4.2 version, or should we take a chance on AVX-512 with the function
>> pointer?
>>
>>>> The idea that's been floating around recently is to build a bunch of
>>>> different versions of Postgres and to choose one on startup based on what
>>>> the CPU supports. That seems like quite a lot of work, and it'll increase
>>>> the size of the builds quite a bit, but it at least doesn't have the
>>>> aforementioned problem.
>>>
>>> I guess another idea would be for the PGDG packagers or someone else
>>> interested in performance to create repos with binaries built for
>>> these microarch levels and users can research what they need. The new
>>> -v2 etc levels are a lot more practical than the microarch names and
>>> individual features...
>>
>> Heartily agreed.
>
> One thing that is perhaps not clear (to me?) is how much this matters
> and how much of it matters. Obviously, we know that it matters some,
> otherwise we'd not be working on it. But does it, like, matter only
> with checksums, or with thousands of partitions, or with many CPUs, or
> certain types of indexes, etc.?
>
> If we're going to, say, create some recommendations for packagers around
> this, how are they supposed to determine the tradeoffs? It might be
> easy for a packager to set some slightly-higher -march flag that is in
> line with their distro's policies, but it would probably be a lot more
> work to create separate binaries or a separate repository for, say,
> moving from SSE-something to AVX-something. And how are they supposed
> to decide that, and how are they supposed to communicate that to their
> users? (And how can we get different packagers to make somewhat
> consistent decisions around this?)
>
> We have in a somewhat similar case quite clearly documented that without
> native spinlock support everything will be terrible. And there is
> probably some information out there that without certain CPU support
> checksum performance will be terrible. But beyond that we probably
> don't have much.

Yeah, I think it's completely unreasonable to push this on packagers and just say "this is your problem now". If we do that, we can assume the only people to get any benefit from these optimizations are those that use a fully managed cloud service like Azure or RDS.

They can do it, but we need to tell them how and when. And if we intend for packagers to be part of the solution we need to explicitly bring them into the discussion of how to do it, at a fairly early stage (and no, we can't expect them to follow every thread on hackers).

Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Re: Changing default -march landscape

From

Thomas Munro

Date:

June 14, 03:49:43

On Thu, Jun 13, 2024 at 8:15 PM Magnus Hagander <magnus@hagander.net> wrote:
> Yeah, I think it's completely unreasonable to push this on packagers and just say "this is your problem now". If we
> do that, we can assume the only people to get any benefit from these optimizations are those that use a fully managed
> cloud service like Azure or RDS.

It would also benefit distros that have decided to move their baseline
micro-arch level right now, probably without any additional action
from the maintainers assuming that gcc defaults to -march=*-v2 etc.
The cloud vendors' internal distros aren't really special in that
regard, are they?

Hmm, among Linux distros, maybe it's really only Debian that isn't
moving or talking about moving the baseline yet? (Are they?)

> They can do it, but we need to tell them how and when. And if we intend for packagers to be part of the solution we
> need to explicitly bring them into the discussion of how to do it, at a fairly early stage (and no, we can't expect them
> to follow every thread on hackers).

OK let me CC Christoph and ask the question this way: hypothetically,
if RHEL users' PostgreSQL packages became automatically faster than
Debian users' packages because of default -march choice in the
toolchain, what would the Debian project think about that, and what
should we do about it? The options discussed so far were:

1. Consider a community apt repo that is identical to the existing one except
targeting the higher microarch levels, that users can point a machine
to if they want to.
2. Various ideas for shipping multiple versions of the code at a
higher granularity than we're doing today (a callback for a single
instruction! the opposite extreme being the whole executable +
libraries), probably using some of the techniques mentioned at
https://wiki.debian.org/InstructionSelection.

I would guess that 1 is about 100x easier but I haven't looked into it.

Re: Changing default -march landscape

From

Christoph Berg

Date:

June 24, 15:16:28

Hi,

sorry for the delayed reply, I suck at prioritizing things.

Re: Thomas Munro
> OK let me CC Christoph and ask the question this way: hypothetically,
> if RHEL users' PostgreSQL packages became automatically faster than
> Debian users' packages because of default -march choice in the
> toolchain, what would the Debian project think about that, and what
> should we do about it?  The options discussed so far were:
> 
> 1.  Consider a community apt repo that is identical to the existing one except
> targeting the higher microarch levels, that users can point a machine
> to if they want to.

There are sub-variations of that: Don't use -march in Debian for
strict baseline compatibility, but use -march=something in
apt.postgresql.org; bump to -march=x86-64-v2 for the server build (but
not libpq and psql) saying that PG servers need better hardware; ...

I'd rather avoid adding yet another axis to the matrix we
target on apt.postgresql.org, it's already complex enough. So ideally
there should be only one server-build. Or if we decide it's worth
having several, extension builds should still be compatible with either.

> 2.  Various ideas for shipping multiple versions of the code at a
> higher granularity than we're doing today (a callback for a single
> instruction!  the opposite extreme being the whole executable +
> libraries), probably using some of the techniques mentioned at
> https://wiki.debian.org/InstructionSelection.

I don't have enough experience with that to say anything about the
trade-offs, or if the online instruction selectors themselves add too
much overhead.

Not to mention that testing things against all instruction variants is
probably next to impossible in practice.

> I would guess that 1 is about 100x easier but I haven't looked into it.

Are there any numbers about what kind of speedup we could expect?

If the online selection isn't feasible/worthwhile, I'd be willing to
bump the baseline for the server packages. There are already packages
doing that, and there's even infrastructure in the "isa-support"
package that lets packages declare a dependency on CPU features.

Or Debian might just bump the baseline. PostgreSQL asking for it might
just be the reason we wanted to hear to make it happen.

Christoph

Re: Changing default -march landscape

From

Christoph Berg

Date:

June 24, 15:24:25

Re: To Thomas Munro
> Or Debian might just bump the baseline. PostgreSQL asking for it might
> just be the reason we wanted to hear to make it happen.

Which level would PostgreSQL specifically want? x86-64-v2 or even
x86-64-v3?

Christoph
