Re: CRC32C Parallel Computation Optimization on ARM

Поиск
Список
Период
Сортировка
От Nathan Bossart
Тема Re: CRC32C Parallel Computation Optimization on ARM
Дата
Msg-id 20231110163608.GA1225566@nathanxps13
обсуждение исходный текст
Ответ на RE: CRC32C Parallel Computation Optimization on ARM  (Xiang Gao <Xiang.Gao@arm.com>)
Ответы RE: CRC32C Parallel Computation Optimization on ARM  (Xiang Gao <Xiang.Gao@arm.com>)
Список pgsql-hackers
On Tue, Nov 07, 2023 at 08:05:45AM +0000, Xiang Gao wrote:
> I think I understand what you mean, this is the latest patch. Thank you!

Thanks for the new patch.

+# PGAC_ARMV8_VMULL_INTRINSICS
+# ----------------------------
+# Check if the compiler supports the vmull_p64
+# intrinsic functions. These instructions
+# were first introduced in ARMv8 crypto Extension.

I wonder if it'd be better to call this PGAC_ARMV8_CRYPTO_INTRINSICS since
this check seems to indicate the presence of +crypto.  Presumably there are
other instructions in this extension that could be used elsewhere, in which
case we could reuse this.

+# Use ARM VMULL if available and ARM CRC32C intrinsic is avaliable too.
+if test x"$USE_ARMV8_VMULL" = x"" && test x"$USE_ARMV8_VMULL_WITH_RUNTIME_CHECK" = x"" && (test x"$USE_ARMV8_CRC32C" =
x"1"|| test x"$USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK" = x"1"); then
 
+  if test x"$pgac_armv8_vmull_intrinsics" = x"yes" && test x"$CFLAGS_VMULL" = x""; then
+    USE_ARMV8_VMULL=1
+  else
+    if test x"$pgac_armv8_vmull_intrinsics" = x"yes"; then
+      USE_ARMV8_VMULL_WITH_RUNTIME_CHECK=1
+    fi
+  fi
+fi

I'm not sure I see the need to check USE_ARMV8_CRC32C* when setting these.
Couldn't we set them solely on the results of our
PGAC_ARMV8_VMULL_INTRINSICS check?  It looks like this is what you are
doing in meson.build already.

+extern pg_crc32c pg_comp_crc32c_with_vmull_armv8(pg_crc32c crc, const void *data, size_t len);

nitpick: Maybe pg_comp_crc32_armv8_parallel()?

-# all versions of pg_crc32c_armv8.o need CFLAGS_CRC
-pg_crc32c_armv8.o: CFLAGS+=$(CFLAGS_CRC)
-pg_crc32c_armv8_shlib.o: CFLAGS+=$(CFLAGS_CRC)
-pg_crc32c_armv8_srv.o: CFLAGS+=$(CFLAGS_CRC)

Why are these lines deleted?

-  ['pg_crc32c_armv8', 'USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK', 'crc'],
+  ['pg_crc32c_armv8', 'USE_ARMV8_CRC32C_WITH_RUNTIME_CHECK'],

What is the purpose of this change?

+__attribute__((target("+crc+crypto")))

I'm not sure we can assume that all compilers will understand this, and I'm
not sure we need it.

+    if (use_vmull)
+    {
+/*
+ * Crc32c parallel computation Input data is divided into three
+ * equal-sized blocks. Block length : 42 words(42 * 8 bytes).
+ * CRC0: 0 ~ 41 * 8,
+ * CRC1: 42 * 8 ~ (42 * 2 - 1) * 8,
+ * CRC2: 42 * 2 * 8 ~ (42 * 3 - 1) * 8.
+ */

Shouldn't we surround this with #ifdefs for USE_ARMV8_VMULL*?

     if (pg_crc32c_armv8_available())
+    {
+#if defined(USE_ARMV8_VMULL)
+        pg_comp_crc32c = pg_comp_crc32c_with_vmull_armv8;
+#elif defined(USE_ARMV8_VMULL_WITH_RUNTIME_CHECK)
+        if (pg_vmull_armv8_available())
+        {
+            pg_comp_crc32c = pg_comp_crc32c_with_vmull_armv8;
+        }
+        else
+        {
+            pg_comp_crc32c = pg_comp_crc32c_armv8;
+        }
+#else
         pg_comp_crc32c = pg_comp_crc32c_armv8;
+#endif
+    }

IMO it'd be better to move the #ifdefs into the functions so that we can
simplify this to something like

    if (pg_crc32c_armv8_available())
    {
        if (pg_crc32c_armv8_crypto_available())
            pg_comp_crc32c = pg_comp_crc32c_armv8_parallel;
        else
            pg_comp_crc32c = pg_comp_crc32c_armv8;
    else
        pc_comp_crc32c = pg_comp_crc32c_sb8;

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Heikki Linnakangas
Дата:
Сообщение: Re: AdvanceXLInsertBuffers() vs wal_sync_method=open_datasync
Следующее
От: Nathan Bossart
Дата:
Сообщение: Re: SLRU optimization - configurable buffer pool and partitioning the SLRU lock