Re: Popcount optimization using AVX512
From | Nathan Bossart |
---|---|
Subject | Re: Popcount optimization using AVX512 |
Date | |
Msg-id | 20240212205507.GB1815383@nathanxps13 |
In response to | Re: Popcount optimization using AVX512 (Noah Misch <noah@leadboat.com>) |
List | pgsql-hackers |
On Sat, Feb 10, 2024 at 03:52:38PM -0800, Noah Misch wrote:
> On Fri, Feb 09, 2024 at 08:33:23PM -0800, Andres Freund wrote:
>> My understanding is that the ifunc mechanism just avoids the need for repeated
>> indirect calls/jumps to implement a single function call, not the use of
>> indirect function calls at all. Calls into shared libraries, like libc, are
>> indirected via the GOT / PLT, i.e. an indirect function call/jump. Without
>> ifuncs, the target of the function call would then have to dispatch to the
>> resolved function. Ifuncs allow to avoid this repeated dispatch by moving the
>> dispatch to the dynamic linker stage, modifying the contents of the GOT/PLT to
>> point to the right function. Thus ifuncs are an optimization when calling a
>> function in a shared library that's then dispatched depending on the cpu
>> capabilities.
>>
>> However, in our case, where the code is in the same binary, function calls
>> implemented in the main binary directly (possibly via a static library) don't
>> go through GOT/PLT. In such a case, use of ifuncs turns a normal direct
>> function call into one going through the GOT/PLT, i.e. makes it indirect. The
>> same is true for calls within a shared library if either explicit symbol
>> visibility is used, or -symbolic, -Wl,-Bsymbolic or such is used. Therefore
>> there's no efficiency gain of ifuncs over a call via function pointer.
>>
>>
>> This isn't because ifunc is implemented badly or something - the reason for
>> this is that dynamic relocations aren't typically implemented by patching all
>> callsites (".text relocations"), which is what you would need to avoid the
>> need for an indirect call to something that fundamentally cannot be a constant
>> address at link time. The reason text relocations are disfavored is that
>> they can make program startup quite slow, that they require allowing
>> modifications to executable pages which are disliked due to the security
>> implications, and that they make the code non-shareable, as the in-memory
>> executable code has to differ from the on-disk code.
>>
>>
>> I actually think ifuncs within the same binary are a tad *slower* than plain
>> function pointer calls, unless -fno-plt is used. Without -fno-plt, an ifunc is
>> called by 1) a direct call into the PLT, 2) loading the target address from
>> the GOT, 3) making an indirect jump to that address. Whereas a "plain
>> indirect function call" is just 1) load target address from variable 2) making
>> an indirect jump to that address. With -fno-plt the callsites themselves load
>> the address from the GOT.
>
> That sounds more accurate than what I wrote. Thanks.

+1, thanks for the detailed explanation, Andres.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
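
For readers following the thread, the "plain indirect function call" Andres describes can be sketched roughly as below. This is only an illustration: every name (popcount64, popcount64_ptr, popcount64_choose, popcount64_hw, popcount64_portable) is hypothetical and not taken from the patch. The CPU check assumes GCC or Clang on x86 and falls back to the portable loop everywhere else.

```c
#include <stdint.h>

/* Hypothetical names throughout; this is a sketch, not the patch's code. */

/* Portable fallback: clear the lowest set bit until none remain. */
static int popcount64_portable(uint64_t x)
{
    int n = 0;

    while (x != 0)
    {
        x &= x - 1;
        n++;
    }
    return n;
}

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
/* Built with POPCNT enabled for this one function only (assumes GCC/Clang). */
static int popcount64_hw(uint64_t x) __attribute__((target("popcnt")));

static int popcount64_hw(uint64_t x)
{
    return __builtin_popcountll(x);
}
#endif

static int popcount64_choose(uint64_t x);

/* The pointer starts at the chooser; the first call overwrites it. */
static int (*popcount64_ptr)(uint64_t) = popcount64_choose;

static int popcount64_choose(uint64_t x)
{
#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    if (__builtin_cpu_supports("popcnt"))
        popcount64_ptr = popcount64_hw;
    else
#endif
        popcount64_ptr = popcount64_portable;

    return popcount64_ptr(x);
}

/* Call site: 1) load the target address from a variable, 2) indirect call. */
int popcount64(uint64_t x)
{
    return popcount64_ptr(x);
}
```

An ifunc-based variant would move the choice made in popcount64_choose into a resolver run by the dynamic linker, but, as explained above, calls within the same binary would then go through the PLT/GOT and still be indirect.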