Making type Datum be 8 bytes everywhere

From Tom Lane
Subject Making type Datum be 8 bytes everywhere
Date
Msg-id 1749799.1752797397@sss.pgh.pa.us
List pgsql-hackers
In a discussion on Discord (in the PG #core-hacking channel,
which unfortunately is inaccessible to non-members), Andres
and Robert complained about the development/maintenance costs
of continuing to support 32-bit platforms.  Here is a modest
proposal to reduce those costs without going so far as to
entirely desupport such platforms: let's require them to use
8-byte Datums even though that's probably not a native data
type for them.  That lets us get rid of logic to support the
!USE_FLOAT8_BYVAL case, and allows a few other simplifications.
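
For readers less familiar with the by-value versus by-reference distinction, here is a minimal standalone sketch of the two ways an 8-byte float can travel in a Datum. It is not PostgreSQL code: Datum8 and the four helper functions are hypothetical names made up for illustration, and the by-reference variant takes caller-provided storage where the real backend would allocate memory.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for the proposed 8-byte Datum; not the real typedef. */
typedef uint64_t Datum8;

_Static_assert(sizeof(double) == sizeof(Datum8),
               "this sketch assumes an 8-byte double");

/* By value: the double's bit pattern is stored directly in the Datum. */
static Datum8
float8_to_datum_byval(double x)
{
    Datum8 d;

    memcpy(&d, &x, sizeof(d));
    return d;
}

static double
datum_to_float8_byval(Datum8 d)
{
    double x;

    memcpy(&x, &d, sizeof(x));
    return x;
}

/*
 * By reference: the Datum carries only a pointer to the value, so the
 * value needs separate storage that outlives the Datum's use.
 */
static Datum8
float8_to_datum_byref(double *storage, double x)
{
    assert(storage != NULL);
    *storage = x;
    return (Datum8) (uintptr_t) storage;
}

static double
datum_to_float8_byref(Datum8 d)
{
    return *(const double *) (uintptr_t) d;
}
```

With a 4-byte Datum only the second pair is possible for float8 and int8, which is why 32-bit builds currently need the separate code paths; once Datum is 8 bytes wide, the first pair works on every platform.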

The attached patch switches to 8-byte Datums everywhere, but
doesn't make any effort to remove the now-dead code.  I made
it just as a proof-of-concept that this can work.  It compiled
cleanly and passed check-world for me on a 32-bit FreeBSD
image.

I've not looked into the performance consequences.  We probably
should at least try to measure that, though I'm not sure what
our threshold of pain would be for deciding not to do this.

Thoughts?

            regards, tom lane

From 175803f26dd76be274c469f958aef9c391bf88ff Mon Sep 17 00:00:00 2001
From: Tom Lane <tgl@sss.pgh.pa.us>
Date: Thu, 17 Jul 2025 19:55:31 -0400
Subject: [PATCH v0] Make type Datum be 8 bytes wide everywhere.

This very WIP patch aims to make sizeof(Datum) be 8 on all
platforms including 32-bit ones.  The objective is to allow
USE_FLOAT8_BYVAL to be true everywhere, and in consequence to
remove a lot of code that is specific to pass-by-reference
handling of float8, int8, etc.  In this way we can reduce the
maintenance effort involved in supporting 32-bit platforms,
without going so far as to actually desupport them.  Since Datum
is strictly an in-memory concept, this has no impact on on-disk
storage, though an initdb or pg_upgrade will be needed to fix
affected pg_type.typbyval entries.

We can expect that this change will make 32-bit builds a bit slower
and more memory-hungry, although being able to use pass-by-value
handling of 8-byte types may buy back some of that.  But we stopped
optimizing for 32-bit cases a long time ago, and this seems like
just another step on that path.

This initial patch simply forces the correct type definition and
USE_FLOAT8_BYVAL setting, and cleans up a couple of minor compiler
complaints that ensued.  This is sufficient for testing purposes.
It does compile cleanly and pass check-world for me on a 32-bit
platform.

If we pursue this path, we would of course want to remove all
the now-dead code (search for mentions of SIZEOF_DATUM and
USE_FLOAT8_BYVAL to find likely places to clean up).  I have
not bothered with that yet.  A more interesting question is
whether there are comments or calculations that need adjustment.
There is probably an effect on the largest array that we can
support in 32-bit mode, for example.
---
 src/backend/storage/ipc/ipc.c  |  2 +-
 src/include/nodes/nodes.h      |  8 ++++++--
 src/include/pg_config_manual.h | 13 ++++---------
 src/include/postgres.h         | 13 +++++++------
 4 files changed, 18 insertions(+), 18 deletions(-)

diff --git a/src/backend/storage/ipc/ipc.c b/src/backend/storage/ipc/ipc.c
index 567739b5be9..2704e80b3a7 100644
--- a/src/backend/storage/ipc/ipc.c
+++ b/src/backend/storage/ipc/ipc.c
@@ -399,7 +399,7 @@ cancel_before_shmem_exit(pg_on_exit_callback function, Datum arg)
         before_shmem_exit_list[before_shmem_exit_index - 1].arg == arg)
         --before_shmem_exit_index;
     else
-        elog(ERROR, "before_shmem_exit callback (%p,0x%" PRIxPTR ") is not the latest entry",
+        elog(ERROR, "before_shmem_exit callback (%p,0x%" PRIx64 ") is not the latest entry",
              function, arg);
 }

diff --git a/src/include/nodes/nodes.h b/src/include/nodes/nodes.h
index fbe333d88fa..b2dc380b57b 100644
--- a/src/include/nodes/nodes.h
+++ b/src/include/nodes/nodes.h
@@ -188,6 +188,8 @@ castNodeImpl(NodeTag type, void *ptr)
  * ----------------------------------------------------------------
  */

+#ifndef FRONTEND
+
 /*
  * nodes/{outfuncs.c,print.c}
  */
@@ -198,7 +200,7 @@ extern void outNode(struct StringInfoData *str, const void *obj);
 extern void outToken(struct StringInfoData *str, const char *s);
 extern void outBitmapset(struct StringInfoData *str,
                          const struct Bitmapset *bms);
-extern void outDatum(struct StringInfoData *str, uintptr_t value,
+extern void outDatum(struct StringInfoData *str, Datum value,
                      int typlen, bool typbyval);
 extern char *nodeToString(const void *obj);
 extern char *nodeToStringWithLocations(const void *obj);
@@ -212,7 +214,7 @@ extern void *stringToNode(const char *str);
 extern void *stringToNodeWithLocations(const char *str);
 #endif
 extern struct Bitmapset *readBitmapset(void);
-extern uintptr_t readDatum(bool typbyval);
+extern Datum readDatum(bool typbyval);
 extern bool *readBoolCols(int numCols);
 extern int *readIntCols(int numCols);
 extern Oid *readOidCols(int numCols);
@@ -235,6 +237,8 @@ extern void *copyObjectImpl(const void *from);
  */
 extern bool equal(const void *a, const void *b);

+#endif                            /* !FRONTEND */
+

 /*
  * Typedef for parse location.  This is just an int, but this way
diff --git a/src/include/pg_config_manual.h b/src/include/pg_config_manual.h
index 125d3eb5fff..7e1aa422332 100644
--- a/src/include/pg_config_manual.h
+++ b/src/include/pg_config_manual.h
@@ -74,17 +74,12 @@
 #define PARTITION_MAX_KEYS    32

 /*
- * Decide whether built-in 8-byte types, including float8, int8, and
- * timestamp, are passed by value.  This is on by default if sizeof(Datum) >=
- * 8 (that is, on 64-bit platforms).  If sizeof(Datum) < 8 (32-bit platforms),
- * this must be off.  We keep this here as an option so that it is easy to
- * test the pass-by-reference code paths on 64-bit platforms.
- *
- * Changing this requires an initdb.
+ * This symbol is now vestigial: built-in 8-byte types, including float8,
+ * int8, and timestamp, are always passed by value since we require Datum
+ * to be wide enough to permit that.  We continue to define the symbol here
+ * so as not to unnecessarily break extension code.
  */
-#if SIZEOF_VOID_P >= 8
 #define USE_FLOAT8_BYVAL 1
-#endif


 /*
diff --git a/src/include/postgres.h b/src/include/postgres.h
index 8a41a668687..f2d3ee5af3b 100644
--- a/src/include/postgres.h
+++ b/src/include/postgres.h
@@ -58,15 +58,18 @@

 /*
  * A Datum contains either a value of a pass-by-value type or a pointer to a
- * value of a pass-by-reference type.  Therefore, we require:
- *
- * sizeof(Datum) == sizeof(void *) == 4 or 8
+ * value of a pass-by-reference type.  Therefore, we must have
+ * sizeof(Datum) >= sizeof(void *).  No current or foreseeable Postgres
+ * platform has pointers wider than 8 bytes, and standardizing on Datum being
+ * exactly 8 bytes has advantages in reducing cross-platform differences.
  *
  * The functions below and the analogous functions for other types should be used to
  * convert between a Datum and the appropriate C type.
  */

-typedef uintptr_t Datum;
+typedef uint64_t Datum;
+
+#define SIZEOF_DATUM 8

 /*
  * A NullableDatum is used in places where both a Datum and its nullness needs
@@ -83,8 +86,6 @@ typedef struct NullableDatum
     /* due to alignment padding this could be used for flags for free */
 } NullableDatum;

-#define SIZEOF_DATUM SIZEOF_VOID_P
-
 /*
  * DatumGetBool
  *        Returns boolean value of a datum.
--
2.43.5
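
A note on the ipc.c hunk above: once Datum is uint64_t, printing it with PRIxPTR no longer matches the argument type on a platform where uintptr_t is 4 bytes, which is presumably one of the compiler complaints the commit message mentions. The following standalone sketch shows the matching specifier; Datum8 and format_datum_hex are hypothetical names for illustration, not part of the patch.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the proposed 8-byte Datum; not the real typedef. */
typedef uint64_t Datum8;

/*
 * Format a Datum as hex.  PRIx64 matches uint64_t on every platform,
 * whereas PRIxPTR matches uintptr_t, which is narrower on 32-bit builds.
 * Returns the number of characters snprintf would have written.
 */
static int
format_datum_hex(char *buf, size_t buflen, Datum8 d)
{
    assert(buf != NULL);
    return snprintf(buf, buflen, "0x%" PRIx64, d);
}
```

For example, formatting the value 0x1234abcd into a 32-byte buffer yields the string "0x1234abcd" regardless of pointer width.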

