always use runtime checks for CRC-32C instructions
From | Nathan Bossart |
---|---|
Subject | always use runtime checks for CRC-32C instructions |
Date | |
Msg-id | 20231030161706.GA3011@nathanxps13 |
Responses |
Re: always use runtime checks for CRC-32C instructions
|
List | pgsql-hackers |
This is an offshoot of the "CRC32C Parallel Computation Optimization on ARM" thread [0]. I intend for this to be a prerequisite patch set.

Presently, for the SSE 4.2 and ARMv8 CRC instructions used in the CRC32C code for WAL records, etc., we first check whether the intrinsics are available with the default compiler flags. If so, we compile only the implementation that uses those intrinsics. If not, we also check whether the intrinsics are available with some extra CFLAGS, and if they are, we compile both the implementation that uses the intrinsics and a fallback implementation that doesn't require any special instructions. Then, at runtime, we check what's available in the hardware and choose the appropriate CRC32C implementation.

The aforementioned other thread [0] aims to further optimize this code by using another instruction that requires additional configure and/or runtime checks. $SUBJECT has been in the back of my mind for a while, but given proposals to add further complexity to this code, I figured it might be a good time to propose this simplification. Specifically, I think we shouldn't worry about trying to compile only the special intrinsics versions, and instead always try to build both and choose the appropriate one at runtime.

AFAICT the trade-offs aren't too bad. With some simple testing, I see that the runtime check occurs once at startup, so I don't anticipate any noticeable performance impact. I suppose each process might need to do the check in EXEC_BACKEND builds, but even so, I suspect the difference is negligible.

I also see that the SSE 4.2 runtime check requires the CPUID instruction, so we wouldn't use the intrinsics for hardware that supports SSE 4.2 but not CPUID. However, I'm not sure such hardware even exists. Wikipedia says that CPUID was introduced in 1993 [1], and meson.build appears to omit the CPUID check when determining which CRC32C implementation to use.
Furthermore, meson.build alludes to problems with some of the CPUID-related checks:

    # XXX: The configure.ac check for __cpuid() is broken, we don't copy that
    # here. To prevent problems due to two detection methods working, stop
    # checking after one.

Are there any other reasons that we should try to avoid the runtime check when possible?

I've attached two patches. 0001 adds a debug message to the SSE 4.2 runtime check that matches the one already present for the ARMv8 check. This message just notes whether the runtime check found that the special CRC instructions are available. 0002 is a first attempt at $SUBJECT. I've tested it on both x86 and ARM, and it seems to work as intended. You'll notice that I'm still checking for the intrinsics with the default compiler flags first. I didn't see any strong reason to change this, and doing so allows us to avoid sending extra CFLAGS when possible.

Thoughts?

[0] https://postgr.es/m/DB9PR08MB6991329A73923BF8ED4B3422F5DBA%40DB9PR08MB6991.eurprd08.prod.outlook.com
[1] https://en.wikipedia.org/wiki/CPUID

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com
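
For readers unfamiliar with the "build both, choose at runtime" scheme described in the message, here is a minimal sketch in C. It is not the code from the attached patches: all identifiers (crc32c, crc32c_sw, crc32c_hw, crc32c_choose, HAVE_SSE42_CRC) are hypothetical, the fallback is a simple bit-at-a-time loop, and the hardware check assumes the GCC/Clang builtin __builtin_cpu_supports() on x86-64 rather than a hand-written CPUID test. On other platforms the sketch simply uses the fallback.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: GCC or Clang on x86-64; elsewhere only the fallback is built. */
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
#include <nmmintrin.h>
#define HAVE_SSE42_CRC 1
#endif

typedef uint32_t (*crc32c_fn) (uint32_t crc, const void *buf, size_t len);

/* Portable fallback: bit-at-a-time CRC-32C (reflected polynomial). */
static uint32_t
crc32c_sw(uint32_t crc, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    while (len--)
    {
        crc ^= *p++;
        for (int k = 0; k < 8; k++)
            crc = (crc >> 1) ^ (0x82F63B78u & (0u - (crc & 1u)));
    }
    return crc;
}

#ifdef HAVE_SSE42_CRC
/* Built with SSE 4.2 enabled for this one function only. */
static __attribute__((target("sse4.2"))) uint32_t
crc32c_hw(uint32_t crc, const void *buf, size_t len)
{
    const unsigned char *p = buf;

    while (len--)
        crc = _mm_crc32_u8(crc, *p++);
    return crc;
}
#endif

static uint32_t crc32c_choose(uint32_t crc, const void *buf, size_t len);

/* Points at the chooser until the first call replaces it. */
static crc32c_fn crc32c_impl = crc32c_choose;

/* Runs once: probe the CPU, remember the result, then do the real work. */
static uint32_t
crc32c_choose(uint32_t crc, const void *buf, size_t len)
{
    crc32c_impl = crc32c_sw;
#ifdef HAVE_SSE42_CRC
    __builtin_cpu_init();
    if (__builtin_cpu_supports("sse4.2"))
        crc32c_impl = crc32c_hw;
#endif
    return crc32c_impl(crc, buf, len);
}

/* Public entry point: standard CRC-32C with initial and final inversion. */
uint32_t
crc32c(const void *buf, size_t len)
{
    assert(buf != NULL || len == 0);
    return crc32c_impl(0xFFFFFFFFu, buf, len) ^ 0xFFFFFFFFu;
}
```

Whichever path is selected, crc32c("123456789", 9) returns the standard CRC-32C check value 0xE3069283. The runtime cost of the choice is one probe on the first call and an indirect call afterwards, which is consistent with the observation above that the check happens once.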