Обсуждение: table AM option passing
Hello, In the table AM world, we provide table_tuple_insert() that gets some behavior-changing options via a bitmask; and then we have table_tuple_update and table_tuple_delete which take a couple of booleans for the same purpose. To me it seems that it would make sense to make these things consistent. Now I don't want consistency just for the sake of it. The issue is that the REPACK CONCURRENTLY patch wants to add one more option to those two routines. (In reality, the repack patch as posted doesn't do that because it deals with heapam.c routines directly instead of going through table-AM, so there was no need to touch tableam.h; but of course that's not a good way to implement it, because then you can't repack tables that aren't heapam-based. So that patch would get more invasive as we add those bool options everywhere.) We could solve this easily by adding one more boolean to each, but I think this is a good time to instead consolidate the API by using a bitmask; this also allows us not to have changingPart in the wrong place of the heap_delete API. So here's a patch to do that, which shouldn't change any behavior. (This change is vaguely similar to b7271aa1d71a, except I used 'int' instead of 'bits32', to keep the interface consistent with the existing heap_insert() one. Maybe I should make all three take bits64 instead? We don't actually have that type at present, so I'd have to add that too.) While at it, I noticed that the table_insert() and heap_insert() uses one set of value definitions for each half of the interface; that is, in tableam.h we have /* "options" flag bits for table_tuple_insert */ /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */ #define TABLE_INSERT_SKIP_FSM 0x0002 #define TABLE_INSERT_FROZEN 0x0004 #define TABLE_INSERT_NO_LOGICAL 0x0008 and in heapam.h we have /* "options" flag bits for heap_insert */ #define HEAP_INSERT_SKIP_FSM TABLE_INSERT_SKIP_FSM #define HEAP_INSERT_FROZEN TABLE_INSERT_FROZEN #define HEAP_INSERT_NO_LOGICAL TABLE_INSERT_NO_LOGICAL #define HEAP_INSERT_SPECULATIVE 0x0010 This seems rather odd to me -- how could heapam.c have a different set of behaviors than what table AM uses? I find it even more weird that HEAP_INSERT_SPECULATIVE is defined so that as soon as some future patch defines the next "free" tableam.h flag value to do something new, we'll have a conflict. I think this would be cleaner if we removed from heapam.h the flags that correspond to anything in tableam.h, and use heapam.c and all its direct callers use the tableam.h flag definitions instead; and perhaps move HEAP_INSERT_SPECULATIVE to be at the other end of the bitmask (0x1000) -- maybe simply say in tableam.h that the first byte of the options int is reserved for internal use. Anyway, this is the reason I only defined these flags in tableam.h and nothing appears in heapam.h about it. They're just something heapam.c is forced to know about. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ Al principio era UNIX, y UNIX habló y dijo: "Hello world\n". No dijo "Hello New Jersey\n", ni "Hello USA\n".
Вложения
On Tue, Mar 17, 2026 at 05:50:41PM +0100, Álvaro Herrera wrote: > We could solve this easily by adding one more boolean to each, but I > think this is a good time to instead consolidate the API by using a > bitmask; this also allows us not to have changingPart in the wrong place > of the heap_delete API. > > So here's a patch to do that, which shouldn't change any behavior. Seems entirely reasonable to me. I read through the patch and nothing stood out. > (This change is vaguely similar to b7271aa1d71a, except I used 'int' > instead of 'bits32', to keep the interface consistent with the existing > heap_insert() one. Maybe I should make all three take bits64 instead? > We don't actually have that type at present, so I'd have to add that > too.) Why bits64 and not bits32? I must be missing something. > While at it, I noticed that the table_insert() and heap_insert() uses > one set of value definitions for each half of the interface; that is, in > tableam.h we have > > /* "options" flag bits for table_tuple_insert */ > /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */ > #define TABLE_INSERT_SKIP_FSM 0x0002 > #define TABLE_INSERT_FROZEN 0x0004 > #define TABLE_INSERT_NO_LOGICAL 0x0008 > > and in heapam.h we have > /* "options" flag bits for heap_insert */ > #define HEAP_INSERT_SKIP_FSM TABLE_INSERT_SKIP_FSM > #define HEAP_INSERT_FROZEN TABLE_INSERT_FROZEN > #define HEAP_INSERT_NO_LOGICAL TABLE_INSERT_NO_LOGICAL > #define HEAP_INSERT_SPECULATIVE 0x0010 > > This seems rather odd to me -- how could heapam.c have a different set > of behaviors than what table AM uses? I find it even more weird that > HEAP_INSERT_SPECULATIVE is defined so that as soon as some future patch > defines the next "free" tableam.h flag value to do something new, we'll > have a conflict. I think this would be cleaner if we removed from > heapam.h the flags that correspond to anything in tableam.h, and use > heapam.c and all its direct callers use the tableam.h flag definitions > instead; and perhaps move HEAP_INSERT_SPECULATIVE to be at the other end > of the bitmask (0x1000) -- maybe simply say in tableam.h that the first > byte of the options int is reserved for internal use. Probably a good idea. -- nathan
Hi Nathan, thanks for looking, On 2026-Mar-17, Nathan Bossart wrote: > On Tue, Mar 17, 2026 at 05:50:41PM +0100, Álvaro Herrera wrote: > > (This change is vaguely similar to b7271aa1d71a, except I used 'int' > > instead of 'bits32', to keep the interface consistent with the existing > > heap_insert() one. Maybe I should make all three take bits64 instead? > > We don't actually have that type at present, so I'd have to add that > > too.) > > Why bits64 and not bits32? I must be missing something. augh, that's just a thinko -- yeah, we could use bits32 here and that wouldn't represent a reduction in number of possible flags. Does anybody oppose changing table_tuple_insert() to use bits32 instead of integer for the 'options' argument? -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "I dream about dreams about dreams", sang the nightingale under the pale moon (Sandman)
On Tue, Mar 17, 2026 at 08:47:22PM +0100, Álvaro Herrera wrote: > Does anybody oppose changing table_tuple_insert() to use bits32 instead > of integer for the 'options' argument? No objections here. -- nathan
Hi, On 2026-03-17 17:50:41 +0100, Álvaro Herrera wrote: > In the table AM world, we provide table_tuple_insert() that gets some > behavior-changing options via a bitmask; and then we have > table_tuple_update and table_tuple_delete which take a couple of > booleans for the same purpose. To me it seems that it would make sense > to make these things consistent. I'm not sure these cases are *entirely* comparable. The boolean argument for table_tuple_delete()/table_tuple_update() is whether to wait, which doesn't influence the on-disk layout, whereas TABLE_INSERT_* do influence the disk layout. > While at it, I noticed that the table_insert() and heap_insert() uses > one set of value definitions for each half of the interface; that is, in > tableam.h we have > > /* "options" flag bits for table_tuple_insert */ > /* TABLE_INSERT_SKIP_WAL was 0x0001; RelationNeedsWAL() now governs */ > #define TABLE_INSERT_SKIP_FSM 0x0002 > #define TABLE_INSERT_FROZEN 0x0004 > #define TABLE_INSERT_NO_LOGICAL 0x0008 > > and in heapam.h we have > /* "options" flag bits for heap_insert */ > #define HEAP_INSERT_SKIP_FSM TABLE_INSERT_SKIP_FSM > #define HEAP_INSERT_FROZEN TABLE_INSERT_FROZEN > #define HEAP_INSERT_NO_LOGICAL TABLE_INSERT_NO_LOGICAL > #define HEAP_INSERT_SPECULATIVE 0x0010 > > This seems rather odd to me -- how could heapam.c have a different set > of behaviors than what table AM uses? I am not sure I understand what you mean by that. Just that the flags better always have the same values? I think the background for the HEAP_* ones to exist is just that there were (probably are) direct callers to heap_insert() and it seemed a bit odd to refer to the generic flags and that there was a need for a heap specific private flag. > I find it even more weird that HEAP_INSERT_SPECULATIVE is defined so that as > soon as some future patch defines the next "free" tableam.h flag value to do > something new, we'll have a conflict. I think this would be cleaner if we > removed from heapam.h the flags that correspond to anything in tableam.h, > and use heapam.c and all its direct callers use the tableam.h flag > definitions instead; and perhaps move HEAP_INSERT_SPECULATIVE to be at the > other end of the bitmask (0x1000) -- maybe simply say in tableam.h that the > first byte of the options int is reserved for internal use. I'm ok with both of those. > From 794f655f33bb89d979a9948810fdd4800a8241ff Mon Sep 17 00:00:00 2001 > From: =?UTF-8?q?=C3=81lvaro=20Herrera?= <alvherre@kurilemu.de> > Date: Tue, 17 Mar 2026 17:21:14 +0100 > Subject: [PATCH v1] table_tuple_update/_delete(): use an options bitmask > > This replaces a couple of booleans, making the interface more similar to > what table_tuple_insert() uses, and better suited for future expansion. > > Discussion: https://postgr.es/m/202603171606.kf6pmhscqbqz@alvherre.pgsql > --- > src/backend/access/heap/heapam.c | 15 +++++---- > src/backend/access/heap/heapam_handler.c | 10 +++--- > src/backend/access/table/tableam.c | 3 +- > src/backend/executor/nodeModifyTable.c | 11 ++++--- > src/include/access/heapam.h | 6 ++-- > src/include/access/tableam.h | 40 ++++++++++++++++-------- > 6 files changed, 52 insertions(+), 33 deletions(-) > > diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c > index e5bd062de77..1cf74ed8c46 100644 > --- a/src/backend/access/heap/heapam.c > +++ b/src/backend/access/heap/heapam.c > @@ -2852,8 +2852,8 @@ xmax_infomask_changed(uint16 new_infomask, uint16 old_infomask) > */ > TM_Result > heap_delete(Relation relation, const ItemPointerData *tid, > - CommandId cid, Snapshot crosscheck, bool wait, > - TM_FailureData *tmfd, bool changingPart) > + CommandId cid, Snapshot crosscheck, int options, > + TM_FailureData *tmfd) If we introduce new flag things, we should make them unsigned imo. It's a bad habit that we don't do that everywhere. I've spent a fair bit of time finding bugs due to that in the past (e.g. 2a2e1b470b9). Greetings, Andres Freund
Hi, On 2026-03-17 20:47:22 +0100, Álvaro Herrera wrote: > On 2026-Mar-17, Nathan Bossart wrote: > > > On Tue, Mar 17, 2026 at 05:50:41PM +0100, Álvaro Herrera wrote: > > > > (This change is vaguely similar to b7271aa1d71a, except I used 'int' > > > instead of 'bits32', to keep the interface consistent with the existing > > > heap_insert() one. Maybe I should make all three take bits64 instead? > > > We don't actually have that type at present, so I'd have to add that > > > too.) > > > > Why bits64 and not bits32? I must be missing something. > > augh, that's just a thinko -- yeah, we could use bits32 here and that > wouldn't represent a reduction in number of possible flags. > > Does anybody oppose changing table_tuple_insert() to use bits32 instead > of integer for the 'options' argument? Personally I object to the existence of the bits* types, to me they're just noise over using the corresponding unsigned integer types. One more thing that one has to just know what it means without there being any actual improved type checking or such. It's not like using bits* would make it any easier to make the underlying type a struct or such (which is different to e.g. TransactionId, we could probably replace that with a struct without crazy amounts of trouble). I think we should just rip the bits* types out and replace them with the underlying types. ... I do however think we should make the table_tuple_insert options argument unsigned though. So I guess I might have to swallow the bitter pill of the bits* type. Greetings, Andres Freund
Hello! I think there's a change missing in simple_table_tuple_update that works by accident, as true == 1 == TABLE_UPDATE_WAIT. Maybe the values could use a different starting value instead of 1 to surface possible issues? + * TABLE_DELETE_WAIT -- set if should wait for any conflicting + * update/delete to commit/abort + * TABLE_DELETE_CHANGING_PART -- set iff the tuple is being moved to + * another partition table due to an update of the partition key. + * Otherwise, false. "Otherwise, false" seems like a leftover from the previous comment version? tableam.h also have two leftover "wait == false" comments.
On Tue, Mar 17, 2026 at 05:09:49PM -0400, Andres Freund wrote: > Personally I object to the existence of the bits* types, to me they're just > noise over using the corresponding unsigned integer types. One more thing that > one has to just know what it means without there being any actual improved > type checking or such. It's not like using bits* would make it any easier to > make the underlying type a struct or such (which is different to > e.g. TransactionId, we could probably replace that with a struct without crazy > amounts of trouble). Yeah, I don't see why you'd prefer bits32 over uint32. If anything, uint32 is probably less confusing because most hackers will have used it before. AFAICT the bits* types are a relic of the 80s, and there used to be other types like bool8 and word32, all of which were just uint* behind the scenes. Those were removed in 2004 by commit ca7a1f0c86. I assume bits* was left behind because it was still in use. > I think we should just rip the bits* types out and replace them with the > underlying types. +1. If there seems to be consensus, I'm happy to write the patch. -- nathan
Hello, Thanks Zsolt for the notes -- I have fixed these. On 2026-Mar-17, Andres Freund wrote: > On 2026-03-17 17:50:41 +0100, Álvaro Herrera wrote: > > In the table AM world, we provide table_tuple_insert() that gets some > > behavior-changing options via a bitmask; and then we have > > table_tuple_update and table_tuple_delete which take a couple of > > booleans for the same purpose. To me it seems that it would make sense > > to make these things consistent. > > I'm not sure these cases are *entirely* comparable. The boolean argument for > table_tuple_delete()/table_tuple_update() is whether to wait, which doesn't > influence the on-disk layout, whereas TABLE_INSERT_* do influence the disk > layout. True -- I agree that 'wait' is not "the same kind" of option, makes sense that it would be separate from options that affect how the tuple is stored. tuple_delete's changingPart however looks like it should be part of a bitmask of options. The repack patch wants to add further options to both tuple_update and tuple_delete, namely NO_LOGICAL which pretty much mirrors what TABLE_INSERT_NO_LOGICAL does. So the attached 0001 adds a 'bits32 options' argument to both table_update and table_delete, and makes TABLE_DELETE_CHANGING_PARTITION the first option for table_delete to replace the 'bool changingPart'. There are no options for table_update() at present, but it seems inconsistent to not have an argument to it, so I added one that for now remains unused. The first such option is going to be added by repack. > > While at it, I noticed that the table_insert() and heap_insert() uses > > one set of value definitions for each half of the interface; [...] > I am not sure I understand what you mean by that. Just that the flags better > always have the same values? > > I think the background for the HEAP_* ones to exist is just that there were > (probably are) direct callers to heap_insert() and it seemed a bit odd to > refer to the generic flags and that there was a need for a heap specific > private flag. Yeah, it makes sense -- I mean, it's not wrong. But I think it's a bit awkward. In a way, it's as if heapam.c continues to live in a world where tableam.h could one day be removed, so it (heapam.h) needs to redefine its interface in an agnostic way. But I think that's somewhat delusional, and there's no value in keeping this separation. My proposal is simply that heapam.c can be made to work under the tableam.h names of those flag bits, and no abstraction is being broken. 0002 does this, as well as move HEAP_INSERT_SPECULATIVE be the highest bit rather than immediately follow the TABLE_INSERT_* ones. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ Essentially, you're proposing Kevlar shoes as a solution for the problem that you want to walk around carrying a loaded gun aimed at your foot. (Tom Lane)
Вложения
Hello, I just pushed the part of 0001 that modifies the API of table_insert and its sibling functions. So here 0001 adds the options bitmask to table_update and table_delete, which is important for REPACK; and 0002 is identical as before and makes the interface (IMO) a bit more future-proof. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "Every machine is a smoke machine if you operate it wrong enough." https://twitter.com/libseybieda/status/1541673325781196801
Вложения
Hi, On 2026-Mar-30, Álvaro Herrera wrote: > I just pushed the part of 0001 that modifies the API of table_insert and > its sibling functions. So here 0001 adds the options bitmask to > table_update and table_delete, which is important for REPACK; and 0002 > is identical as before and makes the interface (IMO) a bit more > future-proof. FWIW I just posted 0001 as part of the repack series here [1]; 0002 is inessential, so I didn't; but here's both of them. I have drafted commit messages also. [1] https://postgr.es/m/202603311523.iqhng5ljkzpq@alvherre.pgsql -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Pensar que el espectro que vemos es ilusorio no lo despoja de espanto, sólo le suma el nuevo terror de la locura" (Perelandra, C.S. Lewis)
Вложения
static inline TM_Result table_tuple_delete(Relation rel, ItemPointer tid, CommandId cid, - Snapshot snapshot, Snapshot crosscheck, bool wait, - TM_FailureData *tmfd, bool changingPart) + uint32 options, Snapshot snapshot, Snapshot crosscheck, + bool wait, TM_FailureData *tmfd) The doc comment still referneces changingPart Similarly table_tuple_update doesn't document the new options parameter. @@ -339,7 +341,8 @@ heapam_tuple_update(Relation relation, ItemPointer otid, TupleTableSlot *slot, slot->tts_tableOid = RelationGetRelid(relation); tuple->t_tableOid = slot->tts_tableOid; - result = heap_update(relation, otid, tuple, cid, crosscheck, wait, + result = heap_update(relation, otid, tuple, cid, options, + crosscheck, wait, tmfd, lockmode, update_indexes); ItemPointerCopy(&tuple->t_self, &slot->tts_tid); options is marked pg_attribute_unused above, that seems misleading. Should the annotation be part of the heap_update signature instead?
> On Apr 1, 2026, at 00:10, Álvaro Herrera <alvherre@kurilemu.de> wrote: > > Hi, > > On 2026-Mar-30, Álvaro Herrera wrote: > >> I just pushed the part of 0001 that modifies the API of table_insert and >> its sibling functions. So here 0001 adds the options bitmask to >> table_update and table_delete, which is important for REPACK; and 0002 >> is identical as before and makes the interface (IMO) a bit more >> future-proof. > > FWIW I just posted 0001 as part of the repack series here [1]; 0002 is > inessential, so I didn't; but here's both of them. I have drafted > commit messages also. > > [1] https://postgr.es/m/202603311523.iqhng5ljkzpq@alvherre.pgsql > > -- > Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ > "Pensar que el espectro que vemos es ilusorio no lo despoja de espanto, > sólo le suma el nuevo terror de la locura" (Perelandra, C.S. Lewis) > <v4-0001-Give-options-parameter-to-table_delete-table_upda.patch><v4-0002-Change-heapam.c-to-obey-tableam.h-option-bits-dir.patch> Looks like v4-0001/0002 are pure refactoring. A few small comments: 1 - 0001 For table_tuple_delete(), options is added and changingPart is removed, but the header comment should be updated to reflectthe change. 2 - 0002 For table_tuple_update(), options is added, the header comment should be updated as well. 3 - 0002 Now TABLE_INSERT_SKIP_FSM, TABLE_INSERT_FROZEN, TABLE_INSERT_NO_LOGICAL belong to the same options word as HEAP_INSERT_SPECULATIVE,but they are still defined as: ``` #define TABLE_INSERT_SKIP_FSM 0x0002 #define TABLE_INSERT_FROZEN 0x0004 #define TABLE_INSERT_NO_LOGICAL 0x0008 ``` Could it make sense to change them to the left-shift form? 4 - 0002 In heap_multi_insert(), heap_prepare_insert(), and heap_insert(), options is defined as uint32, but in RelationGetBufferForTuple()and raw_heap_insert() it is still defined as int. Would it make sense to take this opportunityto change all “options" to uint32 for consistency? Otherwise, if this is left for later, a trivial follow-up patchjust to change int to uint32 may be harder to get processed. Best regards, -- Chao Li (Evan) HighGo Software Co., Ltd. https://www.highgo.com/
On 2026-Apr-01, Chao Li wrote: > 1 - 0001 > For table_tuple_delete(), options is added and changingPart is > removed, but the header comment should be updated to reflect the > change. True, fixed. > 2 - 0002 > For table_tuple_update(), options is added, the header comment should > be updated as well. Done. > 3 - 0002 > Now TABLE_INSERT_SKIP_FSM, TABLE_INSERT_FROZEN, TABLE_INSERT_NO_LOGICAL belong to the same options word as HEAP_INSERT_SPECULATIVE,but they are still defined as: > ``` > #define TABLE_INSERT_SKIP_FSM 0x0002 > #define TABLE_INSERT_FROZEN 0x0004 > #define TABLE_INSERT_NO_LOGICAL 0x0008 > ``` > > Could it make sense to change them to the left-shift form? Yeah, I don't know. I don't like this style, but some people like it, and I don't want to get into an argument about this kind of thing. > 4 - 0002 > In heap_multi_insert(), heap_prepare_insert(), and heap_insert(), > options is defined as uint32, but in RelationGetBufferForTuple() and > raw_heap_insert() it is still defined as int. Would it make sense to > take this opportunity to change all “options" to uint32 for > consistency? Otherwise, if this is left for later, a trivial follow-up > patch just to change int to uint32 may be harder to get processed. Hmm, this is actually a problem in 1bd6f22f43ac. I added a preliminary patch that should fix this. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "No renuncies a nada. No te aferres a nada."
Вложения
Álvaro Herrera <alvherre@kurilemu.de> wrote: > Hmm, this is actually a problem in 1bd6f22f43ac. I added a preliminary > patch that should fix this. I've reviewed this patch set too. The only question that occurs to me is whether INSERT_SKIP_FSM and INSERT_FROZEN shouldn't actually be prefixed with HEAP_, as these are rather low-level and thus might be considered AM-specific. (I like the idea to use the high bits to avoid collisions.) -- Antonin Houska Web: https://www.cybertec-postgresql.com
On 2026-Apr-01, Antonin Houska wrote: > I've reviewed this patch set too. The only question that occurs to me is > whether INSERT_SKIP_FSM and INSERT_FROZEN shouldn't actually be prefixed with > HEAP_, as these are rather low-level and thus might be considered > AM-specific. Thanks! I pushed 0001 and 0002 from this patchset, with minimal cosmetic corrections. I realized that patch 0003 is doing two different things, and they should each be their own patch which can be rejected if we don't like them; so I split it in two. One moves the heapam.h-private bit to the 32th bit. The other removes the duplicity of definitions, so that heapam.h doesn't have to feel it needs to redefine the tableam.h interface. At this point these patches could be considered WIP in the sense that I wouldn't commit exactly as is (minor corrections might be appropriate, for example to the nearby comments), but I would like to know opinions in case we decide to just throw them away. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/ "Las navajas y los monos deben estar siempre distantes" (Germán Poo)
Вложения
Álvaro Herrera <alvherre@kurilemu.de> wrote: > On 2026-Apr-01, Antonin Houska wrote: > I realized that patch 0003 is doing two different things, and they > should each be their own patch which can be rejected if we don't like > them; so I split it in two. One moves the heapam.h-private bit to the > 32th bit. I'm sorry I haven't recalled yesterday, but this technique resembles the DSM keys in parallel.c: /* * Magic numbers for per-context parallel state sharing. Higher-level code * should use smaller values, leaving these very large ones for use by this * module. */ #define PARALLEL_KEY_FIXED UINT64CONST(0xFFFFFFFFFFFF0001) ... What I found inspiring here is that the "core" uses the high bits while users of the API use the lower ones. Perhaps it'd be appropriate in v6-0001 to reserve the high bits for the TABLE_ options and leave the lower ones for the HEAP_ options. If someone implements a new AM (possibly as an extension), it should be more comfortable for him. -- Antonin Houska Web: https://www.cybertec-postgresql.com