Обсуждение: Refactor how we form HeapTuples for CatalogTuple(Insert|Update)
Hello,
I've been working on expanding HOT updates [1] and have recently had
some success (will be posted soon to the thread) moving the work done
within HeapDetermineColumnsInfo() into the executor. I found that
heap_update() is used in two paths: heap_tuple_update() and
simple_tuple_update(), both call into heap_update() which then uses
HeapDetermineColumnsInfo() to find the set of indexed attributes that
have been modified during the update. This makes sense on the normal
path from the executor, but the simple_tuple_update() path is only
called from CatalogTupleUpdate() and when I looked at how catalog tuples
were formed/mutated on that path I discovered something that I had to fix.
Catalog tuples have knowledge of what attributes are mutated implicitly,
but they don't preserve that information and so
HeapDetermineColumnsInfo() has to re-discover that later on (while the
page is locked). While on the surface this isn't such a big deal, it's
been that way and worked well for years I have other motivations (see
[1] again) for improving this.
So, I set about to create a way to retain the knowledge of what was
modified while mutating catalog tuples and pass that information on to
CatalogTupleUpdate() as a Bitmapset. That is the underlying goal of
this patch set. To make no behavioral changes, only syntactic ones, and
end up with something better overall.
That turned out to be a bit of a challenge as we have a lot of places
where catalog information is updated. We also have two different
methods for updating said information. And we intermix them. At times
we hack our way to a solution and ignore convention. I went down the
rabbit hole on this one, but I'm back up for a cup of tea because what
I've done seems materially better to me and I've accomplished the goal
(and can eventually make use of the result in [1]). Is this patch
useful even without that other work? I think so, just on the basis of
cleanup. Let me explain.
Historically there have been two methods for modifying heap tuples
destined for the catalog; Form/GETSTRUCT, and values/nulls/replaces.
This new method provides a more intuitive and less error-prone approach
without changing the fundamentals of the process. In addition, it is
now possible to retain knowledge of the set of mutated attributes when
working with catalog tuples. This isn't used in practice (yet [1]), but
could be a way to avoid the overhead of HeapDetermineColumnsInfo() in
heap_update() where (while holding a lock) we re-discover that set by
comparing old/new HeapTuple datums when trying to find what indexed
attributes have new values and prevent HOT updates.
Method 1: Form/GETSTRUCT
The "Form/GETSTRUCT" model allows for direct access to the tuple data
that is then modified, copied, and then updated via CatalogTupleUpdate().
Old:
Form_pg_index relform = (Form_pg_index) GETSTRUCT(tuple);
relform->inisclustered = false;
CatalogTupleUpdate(pg_index, &tuple->t_self, tuple);
New:
Bitmapset *updated = NULL;
Form_pg_index relform = (Form_pg_index) GETSTRUCT(tuple);
HeapTupleUpdateField(pg_index, indisclustered, false, indexForm, updated);
CatalogTupleUpdate(pg_index, &indexTuple->t_self, indexTuple, updated, NULL);
bms_free(updated);
Method 2: values/nulls/replaces
The "values/nulls/replaces" model collects the necessary information to
either update or create a heap tuple using heap_modify_tuple() or
heap_form_tuple() or sometimes heap_modify_tuple_by_cols(). While all
those functions remain unchanged now the prefered function for catalog
tuple modification is heap_update_tuple().
Old:
bool nulls[Natts_pg_type];
bool replaces[Natts_pg_type];
Datum values[Natts_pg_type];
values[Anum_pg_type_typtype - 1] = CharGetDatum(typeType);
nulls[Anum_pg_type_typdefaultbin - 1] = true;
replaces[Anum_pg_type_oid - 1] = false;
tuple = heap_modify_tuple(tuple, desc, values, nulls, replaces);
CatalogTupleUpdate(pg_type_desc, &tuple->t_self, tuple);
New:
Datum values[Natts_pg_type] = {0};
bool nulls[Natts_pg_type] = {false};
Bitmapset *updated = NULL;
HeapTupleUpdateValue(pg_type, typtype, CharGetDatum(typeType), values,
nulls, updated);
HeapTupleUpdateValueNull(pg_type, typdefaultbin, values, nulls, updated);
HeapTupleSetColumnNotUpdated(pg_type, oid, updated);
tuple = heap_update_tuple(tuple, desc, values, nulls, updated);
CatalogTupleUpdate(pg_type_desc, &tuple->t_self, tuple, updated, NULL);
In addition CatalogTupleUpdate and CatalogTupleInsert now have an
optional argument for the CatalogIndexState, when not provided it is
created and then released for the caller. This eliminates the need for
their "WithInfo" companion functions.
heap_update_tuple() is functionally equivalent to heap_modify_tuple(),
but takes a Bitmapset called "updated" rather than an array of bool
(generally) called "replaces" as a method for indicating what was
modified. Additionally, this new function tries to balance the
trade-offs of calling heap_getattr() versus heap_deform_tuple() based on
the ratio of attributes updated and their known runtime complexities.
Both paths are functionally equivalent.
The changes also include initialization of the values/nulls arrays
rather than loops or memset(). Some effort was also given to unify
naming of these variables and their order so as to lower cognitive
overhead.
So, let me dive into the attached patches. On advice from my mentor
Noah Misch I've divided the changes in two. The first patch contains
all the infrastructure and the set of files where applying the changes
got "interesting" and should be highlighted. The second is the
mechanical changes to the remaining files, and it's HUGE, apologies for
the blast radius here but it couldn't be helped.
0001 - Refactor how we form HeapTuples for CatalogTuple(Insert|Update)
htup.h: is where you'll find the new macros. There is a "getter" called
HeapTupleValue() to find the field in the values array and not worry
about "-1". After that, they break down into a two cases, insert and
update, with forms addressing method 1 (Form/GETSTRUCT) and method 2 (values/nulls/updated).
Insert macros take the form "HeapTupleSet...()" as in
"HeapTupleSetField(...)" or "HeapTupleSetValue(...)". There is the
assignment to NULL which is "HeapTupleSetFieldNull(...)" and
"HeapTupleSetValueNull(...)" as well.
Arguments to these macros are:
table_name, field, value, (nulls), (form_ptr)
These macros simply translate back to the forms that we use now and in
some cases combine work or ensure correctness.
Update macros follow a similar pattern "HeapTupleUpdate...()" and then
have "HeapTupleUpdateField..." and "HeapTupleUpdateValue..." cases to
cover the two methods we use.
Arguments are again uniform:
table_name, field, value, values, nulls, updated
There are additional macros for a few update-related special cases as well:
* HeapTupleMarkColumnUpdated() - not something you'll normally use as
the update methods all capture this information already, but there are
occasions where this is useful.
* HeapTupleUpdateSetAllColumnsUpdated() - 99.9% of the time we have
historically started off with a "replaces" array where every value in
that array is "false", but there are a few places where we had occasion
to do the opposite, start with the assumption that all attributes were
modified. This covers that case setting all the bits in the "updated"
Bitmapset correctly.
heaptuple.c: you'll find here my new heap_update_tuple() function. This
one is interesting as it tries to balance out the cost of heap_getattr()
vs heap_deform_tuple(). Why did I bother? Read the comment on
heap_modify_tuple(), and I was there... so why not? You may disagree
with my approach, here's a portion of the comment that explains it:
* Performance strategy:
* - If updating many attributes (> 2*natts/3), use heap_getattr() to extract
* only the few non-updated attributes. This is O(k*n) where k is the number
* of non-updated attributes, which is small when updating many.
* - If updating few attributes (<= 2*natts/3), use heap_deform_tuple() to
* extract all attributes at once (O(n)), then replace the updated ones.
* This avoids the O(n^2) cost of many heap_getattr() calls.
*
* The threshold of 2*natts/3 balances the fixed O(n) cost of heap_deform_tuple
* against the variable O(k*n) cost of heap_getattr, where k = natts - num_updated.
indexing.c: has the change to CatalogTupleUpdate/Insert and removal of
their "WithInfo" companions. If that second part is controversial then
I don't mind reverting it, but IMO this is cleaner and might either
inspire more call sites to create and reuse CatalogIndexState or for
someone (me?) to address the comment that pre-dates my work:
* XXX: At some point we might cache the CatalogIndexState data
somewhere (perhaps
* in the relcache) so that callers needn't trouble over this.
The remaining files in this patch (we're still on 0001) are in the
category of *interesting* or non-obvious changes I ran into that were
"fun" to fix...
pg_aggregate.c: in AggregateCreate() we run into a "compromise" pattern
where the code code either update or insert. In such cases hackers
should use the macros for update, not insert, everywhere. This records
the additional information for the update path and doesn't burden the
insert path with much more than a Bitmapset to free.
This reminds me, there is also the case where a function is looping and
updating a number of catalog tuples. Care must be taken to scope the
Bitmapset and free it every iteration as in:
while( doing all the things )
{
Bitmapset updated = NULL;
... update a tuple
CatalogTupleUpdate();
bms_free(updated);
}
This also comes up when a function has multiple updates not in a loop.
Don't reuse the Bitmapset, just scope it. If you do reuse it then:
bms_free(updated);
updated = NULL;
and later... when you start using it again:
Assert(bms_is_empty(updated));
pg_constraint.c: brings us to namestrcpy(), take a look at
RenameConstraintById(). When you run into namestrcpy() you've likely
just done the mutation to the form directly but you'll want to mark that
field as updated, as in:
namestrcpy(&(con->conname), newname);
HeapTupleMarkColumnUpdated(pg_constraint, conname, updated);
This pattern is also needed after functions like pg_add_s32_overflow(),
just HeapTupleMarkColumnUpdated() and you're fine. I used the regex:
"namestrcpy|pg_.*_.*flow" to find these cases, YMMV.
alter.c: take a look at AlterObjectRename_internal() which uses
get_object_attnum_name() to get the AttrNumber that is to be altered.
This is fine, but it's not something we can just plug into a macro and
get the proper name expansion. So, we do it the manual way and add a comment:
/*
* NOTE: We can't use the HeapTupleMarkColumnUpdated() macro here because
* 'Anum_name' isn't a table/column name, it's a index for the relation
* passed into the function as an argument.
*/
namestrcpy(&nameattrdata, new_name);
values[Anum_name - 1] = NameGetDatum(&nameattrdata);
updated = bms_add_member(updated, Anum_name - FirstLowInvalidHeapAttributeNumber);
analyze.c: uses the HeapTupleSetAllColumnsUpdated() at the start of the
update_attstats() function to match the replaced code. Later there's
another interesting deviation from the normal pattern in that function
where we want to update the nth stakind/opt/coll this was frustrating at
first until I found that this worked:
HeapTupleSetValue(pg_statistic, stakind1 + k,
Int16GetDatum(stats->stakind[k]), values);
I'll be "cpp", here's what that turns into:
(values)[Anum_pg_statistic_stakind1 + k - 1] = (Int16GetDatum(stats->stakind[k]));
So, we're taking the attribute number adding "k" then subtracting 1, and
voila "Bob's your uncle." While a bit unconventional, I think the
resulting form is understandable, much more than before which was:
i = Anum_pg_statistic_stakind1 - 1;
for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
values[i++] = Int16GetDatum(stats->stakind[k]); /* stakindN */
attribute_stats.c: take a look at set_stats_slot() where I have a
comment explaining this "field + n" method adding a "Don't Panic" might
help too. Here's that comment:
/*
* NOTE: This might seem odd, to be adding 'i' to the name of the field on
* these macros, but that's what we need to do here to find the proper
* slot offset and record the proper value into the updated bitmap.
*
* Example: HeapTupleValue(pg_statistic, stakind1 + i, values);
*
* Becomes: values[Anum_pg_statistic_stakind1 + i - 1];
*
* The result is you're indexing the i'th value, exactly what we needed.
*/
The really amazing and valuable piece here is how that plays out in the following:
HeapTupleUpdateValue(pg_statistic, stakind1 + i, Int16GetDatum(stakind),
values, nulls, updated);
Playing the role of "cpp" again we get:
do { \
(values)[Anum_statistic_stakind1 + i - 1] = (Datum)
(Int16GetDatum(stakind)); \
(nulls)[Anum_statistic_stakind1 + i - 1] = false; \
Anum_statistic_stakind1 - (-7))
} while(0);
The result is that we've set the value/isnull at the correct position in
both the values and nulls arrays and then added the correct attribute
number into the Bitmapset (drops the mic).
Okay, and that about does it for "0001" with the foundation and then the
odd cases.
0002 - Update the remainder of catalog updates using the new APIs
This is a lot of applying the pattern over and over until it all works.
That's it. Nothing much more interesting here.
This is a lot of change, I get that, I think there's value in change
when the result is cleaner and more maintainable. The one place I'll
agree up front where this might case problems is back-patching fixes.
That'll have to have the new approach in some branches and the old
approach in others. It's a 5-year commitment, I don't take that lightly.
If you made it this far, I owe you a beer/coffee/token-of-appreciation.
Without this patch my [1] will have an ugly wart
(HeapDetermineColumnsInfo()) for the CatalogTupleUpdate path only. I
can live with that, I'd rather not.
best,
-greg
[1] https://www.postgresql.org/message-id/flat/78574B24-BE0A-42C5-8075-3FA9FA63B8FC%40amazon.com
Вложения
On Fri, Nov 14, 2025 at 10:31:28AM -0500, Greg Burd wrote:
> Catalog tuples have knowledge of what attributes are mutated implicitly,
> but they don't preserve that information and so
> HeapDetermineColumnsInfo() has to re-discover that later on (while the
> page is locked). While on the surface this isn't such a big deal, it's
> been that way and worked well for years I have other motivations (see
> [1] again) for improving this.
Yeah, I've heard that this has been also discussed at the New York
pgconf, where designs were mentioned so as we do not execute
expressions while locking pages. One issue is that this could easily
deadlock if an expression has the idea to re-read the same page we are
locking. So I have a summary of what you have been doing in mind, a
very rough one I suspect.
> That turned out to be a bit of a challenge as we have a lot of places
> where catalog information is updated. We also have two different
> methods for updating said information. And we intermix them. At times
> we hack our way to a solution and ignore convention. I went down the
> rabbit hole on this one, but I'm back up for a cup of tea because what
> I've done seems materially better to me and I've accomplished the goal
> (and can eventually make use of the result in [1]). Is this patch
> useful even without that other work? I think so, just on the basis of
> cleanup. Let me explain.
It's a mix of historical and individual commit style, so having things
slightly different depending on the catalog code path is not a
surprise, at least here.
> [... Two methods for heap tuple updates ... ]
>
> tuple = heap_modify_tuple(tuple, desc, values, nulls, replaces);
> CatalogTupleUpdate(pg_type_desc, &tuple->t_self, tuple);
As far as I can see after applying 0002, 90%-ish of the existing
callers of heap_modify_tuple() are updated to use heap_update_tuple().
That's interesting to see.
> heap_update_tuple() is functionally equivalent to heap_modify_tuple(),
> but takes a Bitmapset called "updated" rather than an array of bool
> (generally) called "replaces" as a method for indicating what was
> modified. Additionally, this new function tries to balance the
> trade-offs of calling heap_getattr() versus heap_deform_tuple() based on
> the ratio of attributes updated and their known runtime complexities.
> Both paths are functionally equivalent.
I don't think that we can completely remove heap_update_tuple() as
code path used for the catalogs updates, but we can get very close to
that. Perhaps we should document at the top of heap_update_tuple()
that developers should not use this API for catalog changes, switching
to the bitmap interface instead in priority?
> * Performance strategy:
> * - If updating many attributes (> 2*natts/3), use heap_getattr() to extract
> * only the few non-updated attributes. This is O(k*n) where k is the number
> * of non-updated attributes, which is small when updating many.
> * - If updating few attributes (<= 2*natts/3), use heap_deform_tuple() to
> * extract all attributes at once (O(n)), then replace the updated ones.
> * This avoids the O(n^2) cost of many heap_getattr() calls.
> *
> * The threshold of 2*natts/3 balances the fixed O(n) cost of heap_deform_tuple
> * against the variable O(k*n) cost of heap_getattr, where k = natts - num_updated.
This change has been puzzling me quite a bit. Why this choice of
formula with 2/3 of arguments? The maximum number of arguments is
1400 by design, so would one choice matter more than the other in most
cases? Catalogs don't have that many arguments either, so I am not
sure if we need to care with this amount of tuning, TBH, but I am OK
to be wrong.
> indexing.c: has the change to CatalogTupleUpdate/Insert and removal of
> their "WithInfo" companions. If that second part is controversial then
> I don't mind reverting it, but IMO this is cleaner and might either
> inspire more call sites to create and reuse CatalogIndexState or for
> someone (me?) to address the comment that pre-dates my work:
I'm OK with the simplification on this one, but let's make that a
separate patch extracted from 0001. The changes with
CatalogTupleUpdate() and CatalogTupleInsert() don't have to be
included with the changes that introduce a bitmap integration to track
the attributes that are updated. This is pointing to the fact that
there are not that many callers of the WithInfo() flavors in the tree,
compared to the non-info ones (24 AFAIK, vs 160-ish for the Insert
path it seems, for example).
> pg_aggregate.c: in AggregateCreate() we run into a "compromise" pattern
> where the code code either update or insert. In such cases hackers
> should use the macros for update, not insert, everywhere. This records
> the additional information for the update path and doesn't burden the
> insert path with much more than a Bitmapset to free.
Hmm. Fine by me. The bitmap is not used for an INSERT path, so it's
a bit of a waste, but I can see why you have done so to satisfy the
"replace" case, from CREATE OR REPLACE (never liked this kind of
grammar as it can be impredictible implementation-wise with syscache
lookups required, just a personal rant).
Another thing that you could do here is enforce a check in a couple of
places where CatalogTupleInsert() is called for the "updated" bitmap:
all the bits in the map should be set, like in OperatorCreate(). That
may catch bugs at runtime perhaps when adding fields when dealing with
an insert instead of an update?
> pg_constraint.c: brings us to namestrcpy(), take a look at
> RenameConstraintById(). When you run into namestrcpy() you've likely
> just done the mutation to the form directly but you'll want to mark that
> field as updated, as in:
>
> alter.c: take a look at AlterObjectRename_internal() which uses
> get_object_attnum_name() to get the AttrNumber that is to be altered.
> This is fine, but it's not something we can just plug into a macro and
> get the proper name expansion. So, we do it the manual way and add a comment:
Not surprising to have exceptions like that. This is a global path
used for the renames of a bunch of objects. This makes me wonder if
we really have to absolutely change all the code paths that currently
use heap_modify_tuple(). Like this one in alter.c for renames, the
change is not an improvement in readability. Same for
AlterObjectNamespace_internal(). These would live better if still
based on heap_modify_tuple(), because of the fact that they can be fed
several object types.
> analyze.c: uses the HeapTupleSetAllColumnsUpdated() at the start of the
> update_attstats() function to match the replaced code. Later there's
> another interesting deviation from the normal pattern in that function
> where we want to update the nth stakind/opt/coll this was frustrating at
> first until I found that this worked:
>
> HeapTupleSetValue(pg_statistic, stakind1 + k,
> Int16GetDatum(stats->stakind[k]), values);
I'm getting concerned about HeapTupleSetAllColumnsUpdated() while
reaching this part of your message, FWIW.
> So, we're taking the attribute number adding "k" then subtracting 1, and
> voila "Bob's your uncle." While a bit unconventional, I think the
> resulting form is understandable, much more than before which was:
>
> i = Anum_pg_statistic_stakind1 - 1;
> for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
> values[i++] = Int16GetDatum(stats->stakind[k]); /* stakindN */
And more concerns. This change makes me question how wise it is to
have HeapTupleUpdateSetAllColumnsUpdated(), actually. Couldn't this
encourage bugs if somebody forgets to update some of the fields in the
new tuple, still the bitmap has been updated with a range to mark all
the attributes as updated. Having HeapTupleUpdateField() do an update
of the bitmap makes sure that the bitmap and the fields updated are
never out-of-sync. I'm OK with marking the individual attributes as
updated. Having a more aggressive API that marks all of them as
updated sounds dangerous to me.
> 0002 - Update the remainder of catalog updates using the new APIs
>
> This is a lot of applying the pattern over and over until it all works.
> That's it. Nothing much more interesting here.
>
> This is a lot of change, I get that, I think there's value in change
> when the result is cleaner and more maintainable. The one place I'll
> agree up front where this might case problems is back-patching fixes.
> That'll have to have the new approach in some branches and the old
> approach in others. It's a 5-year commitment, I don't take that lightly.
Backpatches imply catalog tuple manipulations from time to time, so
that would create noise, yes.
> If you made it this far, I owe you a beer/coffee/token-of-appreciation.
Sentence noted.
> Without this patch my [1] will have an ugly wart
> (HeapDetermineColumnsInfo()) for the CatalogTupleUpdate path only. I
> can live with that, I'd rather not.
It seems to me that this answer is better answered as "rather not",
especially if this improves the code maintenance in the long-term.
Patch 0001 could be split into more pieces, particularly regarding the
argument it makes about the fact that there are few callers of
CatalogTuplesMultiInsertWithInfo(), CatalogTupleInsertWithInfo() and
CatalogTupleUpdateWithInfo(), where we can just plug NULL for the
index state. I'd suggest to make these routines leaner in a first
patch.
/* No, insert new tuple */
stup = heap_form_tuple(RelationGetDescr(sd), values, nulls);
- CatalogTupleInsertWithInfo(sd, stup, indstate);
+ CatalogTupleInsert(sd, stup, NULL);
}
This one in update_attstats() looks wrong. Why are you passing NULL
instead of an opened index state? We keep track of indstate for all
the pg_statistic tuples updated.
HeapTupleSetField() is used nowhere. Do we really need it?
--
Michael
Вложения
On Dec 2 2025, at 2:09 am, Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Nov 14, 2025 at 10:31:28AM -0500, Greg Burd wrote:
>> Catalog tuples have knowledge of what attributes are mutated implicitly,
>> but they don't preserve that information and so
>> HeapDetermineColumnsInfo() has to re-discover that later on (while the
>> page is locked). While on the surface this isn't such a big deal, it's
>> been that way and worked well for years I have other motivations (see
>> [1] again) for improving this.
>
> Yeah, I've heard that this has been also discussed at the New York
> pgconf, where designs were mentioned so as we do not execute
> expressions while locking pages. One issue is that this could easily
> deadlock if an expression has the idea to re-read the same page we are
> locking. So I have a summary of what you have been doing in mind, a
> very rough one I suspect.
Hi Michael,
Thanks for taking the time to review this chunky patch. I've been
quietly reworking it a bit, I'll go over that in a bit because I'd
appreciate opinions before I'm too far into the changes.
Yes, the motivation for this patch started off as a secondary effect of
the other work I've mentioned. It's morphing a bit given some feedback
off-list. There are a few reasons to standardize catalog modification
code and move it away from direct Form/GETSTRUCT or
values/nulls/replaces-style access. First, these current methods can be
error prone and require attention to a variety of details that can
easily be overlooked. Second, at present while we know what fields are
modified we don't preserve that information and pass it on to the
heap_update() code. Third, I understand that at some point we'll need
in-memory representations completely decoupled from the disk to support
upgradeable catalogs. I started off just trying to make
HeapDetermineColumnsInfo() go away, these seem like good goals as well
and touch the same area of code. I think, if possible, it would make
sense to ensure that a large-ish change like this doesn't need to be
undone/redone to accomplish these other goals.
>> That turned out to be a bit of a challenge as we have a lot of places
>> where catalog information is updated. We also have two different
>> methods for updating said information. And we intermix them. At times
>> we hack our way to a solution and ignore convention. I went down the
>> rabbit hole on this one, but I'm back up for a cup of tea because what
>> I've done seems materially better to me and I've accomplished the goal
>> (and can eventually make use of the result in [1]). Is this patch
>> useful even without that other work? I think so, just on the basis of
>> cleanup. Let me explain.
>
> It's a mix of historical and individual commit style, so having things
> slightly different depending on the catalog code path is not a
> surprise, at least here.
Got it, I don't begrudge anyone's style or how it's done now. I just
want to address the goals I've laid out and hopefully make things a bit
better along the way.
>> [... Two methods for heap tuple updates ... ]
>>
>> tuple = heap_modify_tuple(tuple, desc, values, nulls, replaces);
>> CatalogTupleUpdate(pg_type_desc, &tuple->t_self, tuple);
>
> As far as I can see after applying 0002, 90%-ish of the existing
> callers of heap_modify_tuple() are updated to use heap_update_tuple().
> That's interesting to see.
Yes, that was intentional.
>> heap_update_tuple() is functionally equivalent to heap_modify_tuple(),
>> but takes a Bitmapset called "updated" rather than an array of bool
>> (generally) called "replaces" as a method for indicating what was
>> modified. Additionally, this new function tries to balance the
>> trade-offs of calling heap_getattr() versus heap_deform_tuple() based on
>> the ratio of attributes updated and their known runtime complexities.
>> Both paths are functionally equivalent.
[NOTE: below I think you meant to say "heap_modify_tuple()", so I
correct it]
> I don't think that we can completely remove heap_modify_tuple() as
> code path used for the catalogs updates, but we can get very close to
> that. Perhaps we should document at the top of heap_modify_tuple()
> that developers should not use this API for catalog changes, switching
> to the bitmap interface instead in priority?
I agree that it's hard to remove heap_modify_tuple() function straight
away which is why I kept it in and created a new function
heap_update_tuple(). I agree, comments are needed to explain why and
point out the preferred way.
>> * Performance strategy:
>> * - If updating many attributes (> 2*natts/3), use heap_getattr() to extract
>> * only the few non-updated attributes. This is O(k*n) where k is
>> the number
>> * of non-updated attributes, which is small when updating many.
>> * - If updating few attributes (<= 2*natts/3), use
>> heap_deform_tuple() to
>> * extract all attributes at once (O(n)), then replace the updated ones.
>> * This avoids the O(n^2) cost of many heap_getattr() calls.
>> *
>> * The threshold of 2*natts/3 balances the fixed O(n) cost of heap_deform_tuple
>> * against the variable O(k*n) cost of heap_getattr, where k = natts
>> - num_updated.
>
> This change has been puzzling me quite a bit. Why this choice of
> formula with 2/3 of arguments? The maximum number of arguments is
> 1400 by design, so would one choice matter more than the other in most
> cases? Catalogs don't have that many arguments either, so I am not
> sure if we need to care with this amount of tuning, TBH, but I am OK
> to be wrong.
Good points, I saw the comment about optimizing it and I took a swing.
There are two categories for catalog updates, changing a few or setting
them all. My hope was to choose the less computationally onerous path
based on the ratio of the amount updated vs the number in the relation.
>> indexing.c: has the change to CatalogTupleUpdate/Insert and removal of
>> their "WithInfo" companions. If that second part is controversial then
>> I don't mind reverting it, but IMO this is cleaner and might either
>> inspire more call sites to create and reuse CatalogIndexState or for
>> someone (me?) to address the comment that pre-dates my work:
>
> I'm OK with the simplification on this one, but let's make that a
> separate patch extracted from 0001. The changes with
> CatalogTupleUpdate() and CatalogTupleInsert() don't have to be
> included with the changes that introduce a bitmap integration to track
> the attributes that are updated. This is pointing to the fact that
> there are not that many callers of the WithInfo() flavors in the tree,
> compared to the non-info ones (24 AFAIK, vs 160-ish for the Insert
> path it seems, for example).
That makes sense, but another approach is to do the work mentioned in
the comment and somehow cache the CatalogIndexState (which is a somewhat
redacted ResultRelInfo by another name) somewhere.
>> pg_aggregate.c: in AggregateCreate() we run into a "compromise" pattern
>> where the code code either update or insert. In such cases hackers
>> should use the macros for update, not insert, everywhere. This records
>> the additional information for the update path and doesn't burden the
>> insert path with much more than a Bitmapset to free.
>
> Hmm. Fine by me. The bitmap is not used for an INSERT path, so it's
> a bit of a waste, but I can see why you have done so to satisfy the
> "replace" case, from CREATE OR REPLACE (never liked this kind of
> grammar as it can be impredictible implementation-wise with syscache
> lookups required, just a personal rant).
Right, this isn't ideal. The "waste" of alloc/free of a Bitmapset is
real, and different people will have different views on that, but IMO
it's worth it. That said, I did rethink this a bit more.
> Another thing that you could do here is enforce a check in a couple of
> places where CatalogTupleInsert() is called for the "updated" bitmap:
> all the bits in the map should be set, like in OperatorCreate(). That
> may catch bugs at runtime perhaps when adding fields when dealing with
> an insert instead of an update?
That's not a bad idea, catching bugs early is a good outcome.
>> pg_constraint.c: brings us to namestrcpy(), take a look at
>> RenameConstraintById(). When you run into namestrcpy() you've likely
>> just done the mutation to the form directly but you'll want to mark that
>> field as updated, as in:
>>
>> alter.c: take a look at AlterObjectRename_internal() which uses
>> get_object_attnum_name() to get the AttrNumber that is to be altered.
>> This is fine, but it's not something we can just plug into a macro and
>> get the proper name expansion. So, we do it the manual way and add a comment:
>
> Not surprising to have exceptions like that. This is a global path
> used for the renames of a bunch of objects. This makes me wonder if
> we really have to absolutely change all the code paths that currently
> use heap_modify_tuple(). Like this one in alter.c for renames, the
> change is not an improvement in readability. Same for
> AlterObjectNamespace_internal(). These would live better if still
> based on heap_modify_tuple(), because of the fact that they can be fed
> several object types.
I'd agree except that the paths using heap_update_tuple() won't benefit
from HOT updates.
I think readability is debatable, but I am playing games with macros
which at times makes me feel a bit dirty.
>> analyze.c: uses the HeapTupleSetAllColumnsUpdated() at the start of the
>> update_attstats() function to match the replaced code. Later there's
>> another interesting deviation from the normal pattern in that function
>> where we want to update the nth stakind/opt/coll this was frustrating at
>> first until I found that this worked:
>>
>> HeapTupleSetValue(pg_statistic, stakind1 + k,
>> Int16GetDatum(stats->stakind[k]), values);
>
> I'm getting concerned about HeapTupleSetAllColumnsUpdated() while
> reaching this part of your message, FWIW.
The reason for HeapTupleSetAllColumnsUpdated() was to mimic existing
logic and change as little as possible by simply adding a layer of
friendly macros into the mix.
>> So, we're taking the attribute number adding "k" then subtracting 1, and
>> voila "Bob's your uncle." While a bit unconventional, I think the
>> resulting form is understandable, much more than before which was:
>>
>> i = Anum_pg_statistic_stakind1 - 1;
>> for (k = 0; k < STATISTIC_NUM_SLOTS; k++)
>> values[i++] = Int16GetDatum(stats->stakind[k]); /* stakindN */
>
> And more concerns. This change makes me question how wise it is to
> have HeapTupleUpdateSetAllColumnsUpdated(), actually. Couldn't this
> encourage bugs if somebody forgets to update some of the fields in the
> new tuple, still the bitmap has been updated with a range to mark all
> the attributes as updated. Having HeapTupleUpdateField() do an update
> of the bitmap makes sure that the bitmap and the fields updated are
> never out-of-sync. I'm OK with marking the individual attributes as
> updated. Having a more aggressive API that marks all of them as
> updated sounds dangerous to me.
Well, sure I suppose so but again this was simply a replacement for the
existing code which initialized the "replaces" array to all true rather
than the normal case of all false.
>> 0002 - Update the remainder of catalog updates using the new APIs
>>
>> This is a lot of applying the pattern over and over until it all
>> works.
>> That's it. Nothing much more interesting here.
>>
>> This is a lot of change, I get that, I think there's value in change
>> when the result is cleaner and more maintainable. The one place I'll
>> agree up front where this might case problems is back-patching fixes.
>> That'll have to have the new approach in some branches and the old
>> approach in others. It's a 5-year commitment, I don't take that lightly.
>
> Backpatches imply catalog tuple manipulations from time to time, so
> that would create noise, yes.
>
>> If you made it this far, I owe you a
>> beer/coffee/token-of-appreciation.
>
> Sentence noted.
Excellent!
>> Without this patch my [1] will have an ugly wart
>> (HeapDetermineColumnsInfo()) for the CatalogTupleUpdate path only. I
>> can live with that, I'd rather not.
>
> It seems to me that this answer is better answered as "rather not",
> especially if this improves the code maintenance in the long-term.
>
> Patch 0001 could be split into more pieces, particularly regarding the
> argument it makes about the fact that there are few callers of
> CatalogTuplesMultiInsertWithInfo(), CatalogTupleInsertWithInfo() and
> CatalogTupleUpdateWithInfo(), where we can just plug NULL for the
> index state. I'd suggest to make these routines leaner in a first
> patch.
>
> /* No, insert new tuple */
> stup = heap_form_tuple(RelationGetDescr(sd), values, nulls);
> - CatalogTupleInsertWithInfo(sd, stup, indstate);
> + CatalogTupleInsert(sd, stup, NULL);
> }
>
> This one in update_attstats() looks wrong. Why are you passing NULL
> instead of an opened index state? We keep track of indstate for all
> the pg_statistic tuples updated.
>
> HeapTupleSetField() is used nowhere. Do we really need it?
> --
> Michael
As I hinted at earlier on, I've continue to develop this patch [1].
It's not ready to post as patches just yet. I've paused making all the
changes as the time required is non-trivial. Here's some of what I
thought through and a taste so that I can (hopefully) get feedback (pos
or neg) before investing the rest of a week into updating every case.
bool replaces[Natts...] was static, Bitmapset * is a heap alloc/free
====================================================================
Yes, and I tried a few ideas out to avoid this. If you recall my
earlier work to test Bitmapset[2] was in part to see if I could extend
that datatype to allow for a static version. Closest I could come was this
+#define CatalogUpdateContext(table_name, var, tuple) \
+ do { \
+ struct { \
+ bool nulls[Natts_##table_name]; \
+ Datum values[Natts_##table_name]; \
+ Bitmapset updated; \
+ bitmapword bits[BITMAPSET_SIZE(Natts_##table_name)]; \
+ } _##table_name##_##var##_ = {.nulls = {false}, .values = {0}, \
+ .updated.nwords = BITMAPSET_SIZE(Natts_##table_name), \
+ .updated.type = T_Bitmapset, .bits = {0} }; \
+ CatalogTupleContext *(var##_ctx) = \
+ (CatalogTupleContext *)&_##table_name##_##var##_; \
+ Form_##table_name (var##_form) = \
+ (Form_##table_name) GETSTRUCT(tuple); \
+ } while(0)
Which would create the right size statically allocated Bitmapset, but it
wouldn't be "valid" because the last word in the set is all zeros and so
calling into any function that calls bms_is_valid() would assert. I
could simply embed the logic for bms_add_member() etc into the macros,
but then I'd be creating a mess for the future. The only advantage is
that you'd be avoiding a palloc()/free() possibly a realloc(). There's
something to be said for that, but this felt a bit over the line.
When I thought about adding a patch that extended/changed Bitmapset to
allow for static allocation I felt that too would become the objection
that sinks this whole idea, so for now I'm not going to try that. Maybe
some other time.
In the end, after trying a few things (including pre-sizing the
Bitmapset by creating it as a singleton setting the Natts+1 value at
initialization to avoid realloc) I went back to accepting the
palloc/free model as simple enough.
E_TOO_MANY_MACROS
=======================================================================
I'm not thrilled with the multitude of macros for somewhat similar
cases. So I tried using _Generic(), but that's not supported on MSVC
and only allowed to be generic at compile time over a single type and
dispatch to a function, not another macro, so you couldn't simply
de-reference in the Form case.
+static inline void
+_CatalogTupleValueSet(CatalogTupleContext *ctx, int attnum, Datum
value, size_t off, size_t sz)
+{
+ ctx->values[attnum] = value;
+ ctx->nulls[attnum] = false;
+}
+
+static inline void
+_CatalogTupleFormSet(void *form, int attnum, Datum value,
+ size_t off, size_t sz)
+{
+ memcpy((char *)form + off, &value, sz);
+}
+
+#define CatalogTupleSet(ctx_or_form, table_name, field, value) \
+ _Generic((ctx_or_form), \
+ CatalogTupleContext *: _CatalogTupleValueSet, \
+ Form_##table_name * : _CatalogTupleFormSet \
+ )(ctx_or_form, \
+ Anum_##table_name##_##field - 1, \
+ value, \
+ offsetof(FormData_##table_name, field), \
+ sizeof(((FormData_##table_name *)0)->field))
Another method for macros was to depend on sizeof() at compile time to
adjust what was built.
+#define _CAT_CTX_SET(ctx, attnum, value) \
+ ((ctx)->values[(attnum)] = (value), \
+ (ctx)->nulls[(attnum)] = false, \
+ (ctx)->updated = bms_add_member((ctx)->updated, (attnum)))
+
+#define _CAT_FORM_SET(form, table_name, field, value) \
+ (form)[Anum_##table_name##_##field - 1] = (value)
+
+#define CatalogTupleSet_turnary(ctx_or_form, table_name, field, value) \
+ (sizeof(*(ctx_or_form)) == sizeof(CatalogTupleContext) \
+ ? (_CAT_CTX_SET((CatalogTupleContext *)(ctx_or_form), \
+ Anum_##table_name##_##field - 1, value), \
+ (void)0) /* force statement context */ \
+ : (_CAT_FORM_SET(((Anum_##table_name *)(ctx_or_form)), table_name,
field, value), \
+ (void)0))
This feels wrong as who knows if today or someday the size of a form
just happens to be the same as the CatalogTupleContext? So I put that aside.
I looked into some wild ideas that included tuple arity to encode the
action type as Thomas mentioned would be innovative and cool something like:
HeapTupleSet((pg_type, values, nulls),
(typname, NameGetDatum(&name)),
(typnamespace, ObjectIdGetDatum(typeNamespace)),
(typowner, ObjectIdGetDatum(ownerId)),
(typlen, Int16GetDatum(sizeof(int32),
(typdefaultbi), /* list of 1 means null! */
(typdefault),
(some_attr, FooGetDatum(...), foo > bar), /* list of 3 =
conditional null! */
...);
But a) I couldn't ever get that to work as I'd hoped it would and b)
inventing a DSL in the C pre-processor isn't a goal of mine.
Everything needs you to pass in "values, nulls, updated"!
=====================================================================
I thought about this a bit too, the only way to avoid this was to own
the declaration as well as the manipulation of these things. I don't
want to change the entirety of the way we manipulate catalog tuples for
insert/update, I just wanted to capture what attributes changed on
update so I could use that later to avoid HeapDetermineColumnsInfo().
But, I've taken a swing at it and here's where I've landed:
Old "Form/GETSTRUCT" model:
Form_pg_index form = (Form_pg_index) GETSTRUCT(tuple);
form->inisclustered = false;
CatalogTupleUpdate(relation, &tuple->t_self, tuple);
New:
CatalogUpdateFormContext(pg_index, ctx);
CatalogSetForm(pg_index, ctx, tuple);
CatalogTupleUpdateField(ctx, pg_index, indisclustered, false);
ModifyCatalogTupleField(relation, tuple, ctx);
Old "values/nulls/replaces" model:
bool nulls[Natts_pg_type];
bool replaces[Natts_pg_type];
Datum values[Natts_pg_type];
values[Anum_pg_type_typtype - 1] = CharGetDatum(typeType);
nulls[Anum_pg_type_typdefaultbin - 1] = true;
replaces[Anum_pg_type_oid - 1] = false;
tup = heap_modify_tuple(tuple, desc, values, nulls, replaces);
CatalogTupleUpdate(relation, &tuple->t_self, tuple);
New:
CatalogUpdateValuesContext(pg_type, ctx);
CatalogTupleUpdateValue(ctx, pg_type, typtype, CharGetDatum(typeType));
ModifyCatalogTupleValues(relation, tuple, ctx);
I'm about a third of the way done converting to this new pattern and
it's better. As I said earlier I'm concerned that after a lot of effort
the pattern would need to change again, so I'd rather shake out those
concerns/ideas now early on. You can see the changes in [1], take a
look at htup.h, aclchk.c, pg_aggregate.c, and a few other places.
Now there are macros for: 1) declaration, 2) setting/mutating, 3)
modifying/inserting. I guess I was starting to feel like I was digging
a hole no one would appreciate or agree was necessary so I'm asking for
early feedback because rule #1 when you find yourself digging a hole is
to stop digging.
best.
-greg
[1] https://github.com/gburd/postgres/pull/18
[2] https://www.postgresql.org/message-id/flat/7BD1ABDB-B03A-464A-9BA9-A73B55AD8A1F%40getmailspring.com
On Tue, Dec 02, 2025 at 10:21:36AM -0500, Greg Burd wrote:
> On Dec 2 2025, at 2:09 am, Michael Paquier <michael@paquier.xyz> wrote:
> Thanks for taking the time to review this chunky patch. I've been
> quietly reworking it a bit, I'll go over that in a bit because I'd
> appreciate opinions before I'm too far into the changes.
Noted.
> I started off just trying to make
> HeapDetermineColumnsInfo() go away, these seem like good goals as well
> and touch the same area of code.
This one is an interesting piece of argument.
>>> indexing.c: has the change to CatalogTupleUpdate/Insert and removal of
>>> their "WithInfo" companions. If that second part is controversial then
>>> I don't mind reverting it, but IMO this is cleaner and might either
>>> inspire more call sites to create and reuse CatalogIndexState or for
>>> someone (me?) to address the comment that pre-dates my work:
>>
>> I'm OK with the simplification on this one, but let's make that a
>> separate patch extracted from 0001. The changes with
>> CatalogTupleUpdate() and CatalogTupleInsert() don't have to be
>> included with the changes that introduce a bitmap integration to track
>> the attributes that are updated. This is pointing to the fact that
>> there are not that many callers of the WithInfo() flavors in the tree,
>> compared to the non-info ones (24 AFAIK, vs 160-ish for the Insert
>> path it seems, for example).
>
> That makes sense, but another approach is to do the work mentioned in
> the comment and somehow cache the CatalogIndexState (which is a somewhat
> redacted ResultRelInfo by another name) somewhere.
Yes, perhaps we should just do that. That may simplify some of the
existing catalog paths, where we decide to open directly the indexes,
as you have already noticed while working on 0002.
> As I hinted at earlier on, I've continue to develop this patch [1].
> It's not ready to post as patches just yet. I've paused making all the
> changes as the time required is non-trivial. Here's some of what I
> thought through and a taste so that I can (hopefully) get feedback (pos
> or neg) before investing the rest of a week into updating every case.
Okay, noted.
> +#define CatalogUpdateContext(table_name, var, tuple) \
> + do { \
> + struct { \
> + bool nulls[Natts_##table_name]; \
> + Datum values[Natts_##table_name]; \
> + Bitmapset updated; \
> + bitmapword bits[BITMAPSET_SIZE(Natts_##table_name)]; \
> + } _##table_name##_##var##_ = {.nulls = {false}, .values = {0}, \
> + .updated.nwords = BITMAPSET_SIZE(Natts_##table_name), \
> + .updated.type = T_Bitmapset, .bits = {0} }; \
> + CatalogTupleContext *(var##_ctx) = \
> + (CatalogTupleContext *)&_##table_name##_##var##_; \
> + Form_##table_name (var##_form) = \
> + (Form_##table_name) GETSTRUCT(tuple); \
> + } while(0)
>
> Which would create the right size statically allocated Bitmapset, but it
> wouldn't be "valid" because the last word in the set is all zeros and so
> calling into any function that calls bms_is_valid() would assert. I
> could simply embed the logic for bms_add_member() etc into the macros,
> but then I'd be creating a mess for the future. The only advantage is
> that you'd be avoiding a palloc()/free() possibly a realloc(). There's
> something to be said for that, but this felt a bit over the line.
Hmm. That feels like too much magic to me. We don't have such
complex macros in the tree, currently. And do we really need to
sacrifice readability over some extra palloc()/free() in what are
non-hot code paths?
> In the end, after trying a few things (including pre-sizing the
> Bitmapset by creating it as a singleton setting the Natts+1 value at
> initialization to avoid realloc) I went back to accepting the
> palloc/free model as simple enough.
Yeah, I'd agree that this is better.
> +#define CatalogTupleSet_turnary(ctx_or_form, table_name, field, value) \
> + (sizeof(*(ctx_or_form)) == sizeof(CatalogTupleContext) \
> + ? (_CAT_CTX_SET((CatalogTupleContext *)(ctx_or_form), \
> + Anum_##table_name##_##field - 1, value), \
> + (void)0) /* force statement context */ \
> + : (_CAT_FORM_SET(((Anum_##table_name *)(ctx_or_form)), table_name,
> field, value), \
> + (void)0))
>
> This feels wrong as who knows if today or someday the size of a form
> just happens to be the same as the CatalogTupleContext? So I put that aside.
I wouldn't do that either, yes.
> New:
> CatalogUpdateValuesContext(pg_type, ctx);
>
> CatalogTupleUpdateValue(ctx, pg_type, typtype, CharGetDatum(typeType));
> ModifyCatalogTupleValues(relation, tuple, ctx);
>
> I'm about a third of the way done converting to this new pattern and
> it's better. As I said earlier I'm concerned that after a lot of effort
> the pattern would need to change again, so I'd rather shake out those
> concerns/ideas now early on. You can see the changes in [1], take a
> look at htup.h, aclchk.c, pg_aggregate.c, and a few other places.
One thing not mentioned here is CatalogUpdateValuesDecl(), macro that
hides the bitmap with the updated values, so you'd need one to ensure
that the rest works. Perhaps all that should not be in htup.h. We
may want to header reordering to show that there is a split with heap
tuples, also one of your goals.
> Now there are macros for: 1) declaration, 2) setting/mutating, 3)
> modifying/inserting. I guess I was starting to feel like I was digging
> a hole no one would appreciate or agree was necessary so I'm asking for
> early feedback because rule #1 when you find yourself digging a hole is
> to stop digging.
I'm not completely sure about this one. Some of the directions and
decisions that need to be taken here also depend on your other work
posted at [1]. Looking at v23 from this other thread, the code paths
touched there are different from what's touched here (heaptuple.c vs
heapam.c). You have mentioned that both share some ground. Is there
a specific part in the patch of the other thread I should look at to
get an idea of the bigger picture you have in mind?
[1]: https://www.postgresql.org/message-id/78574B24-BE0A-42C5-8075-3FA9FA63B8FC@amazon.com
--
Michael
Вложения
On Dec 2 2025, at 11:16 pm, Michael Paquier <michael@paquier.xyz> wrote: > On Tue, Dec 02, 2025 at 10:21:36AM -0500, Greg Burd wrote: >> On Dec 2 2025, at 2:09 am, Michael Paquier <michael@paquier.xyz> wrote: >> Thanks for taking the time to review this chunky patch. I've been >> quietly reworking it a bit, I'll go over that in a bit because I'd >> appreciate opinions before I'm too far into the changes. Michael, Thanks for continuing to help me along with this patch, I appreciate your attention and time. > Noted. > >> I started off just trying to make >> HeapDetermineColumnsInfo() go away, these seem like good goals as well >> and touch the same area of code. > > This one is an interesting piece of argument. I've looked in the email archives a bit and didn't come up with much related to catalog upgrades. I don't know if there is much on the record information on this idea, but from what I've been told the idea of decoupling heap from catalogs with the goal of getting closer to making online upgrades possible has been referenced a few times in hallway tracks at various conferences. >>>> indexing.c: has the change to CatalogTupleUpdate/Insert and removal of >>>> their "WithInfo" companions. If that second part is controversial then >>>> I don't mind reverting it, but IMO this is cleaner and might either >>>> inspire more call sites to create and reuse CatalogIndexState or for >>>> someone (me?) to address the comment that pre-dates my work: >>> >>> I'm OK with the simplification on this one, but let's make that a >>> separate patch extracted from 0001. The changes with >>> CatalogTupleUpdate() and CatalogTupleInsert() don't have to be >>> included with the changes that introduce a bitmap integration to track >>> the attributes that are updated. This is pointing to the fact that >>> there are not that many callers of the WithInfo() flavors in the tree, >>> compared to the non-info ones (24 AFAIK, vs 160-ish for the Insert >>> path it seems, for example). >> >> That makes sense, but another approach is to do the work mentioned in >> the comment and somehow cache the CatalogIndexState (which is a somewhat >> redacted ResultRelInfo by another name) somewhere. > > Yes, perhaps we should just do that. That may simplify some of the > existing catalog paths, where we decide to open directly the indexes, > as you have already noticed while working on 0002. I'll see what I can do to extract this piece into a separate patch. >> New: >> CatalogUpdateValuesContext(pg_type, ctx); >> >> CatalogTupleUpdateValue(ctx, pg_type, typtype, CharGetDatum(typeType)); >> ModifyCatalogTupleValues(relation, tuple, ctx); >> >> I'm about a third of the way done converting to this new pattern and >> it's better. As I said earlier I'm concerned that after a lot of effort >> the pattern would need to change again, so I'd rather shake out those >> concerns/ideas now early on. You can see the changes in [1], take a >> look at htup.h, aclchk.c, pg_aggregate.c, and a few other places. > > One thing not mentioned here is CatalogUpdateValuesDecl(), macro that > hides the bitmap with the updated values, so you'd need one to ensure > that the rest works. Perhaps all that should not be in htup.h. We > may want to header reordering to show that there is a split with heap > tuples, also one of your goals. Yes, my current thinking for htup.h is here [1] L90-L277 in a GitHub branch on my fork of the Postgres repo. Places I've converted so as to see the new pattern: * aclchk.c [2] shows a insert or modify case * heap.c [3] modify a catalog tuple * attribute_stats.c [4] a bit more tricky use case There are more. >> Now there are macros for: 1) declaration, 2) setting/mutating, 3) >> modifying/inserting. I guess I was starting to feel like I was digging >> a hole no one would appreciate or agree was necessary so I'm asking for >> early feedback because rule #1 when you find yourself digging a hole is >> to stop digging. > > I'm not completely sure about this one. Some of the directions and > decisions that need to be taken here also depend on your other work > posted at [1]. Looking at v23 from this other thread, the code paths > touched there are different from what's touched here (heaptuple.c vs > heapam.c). You have mentioned that both share some ground. Is there > a specific part in the patch of the other thread I should look at to > get an idea of the bigger picture you have in mind? Yes, the links referenced above on that branch. I'll post an updated/rebased patch set here again for reference, but understand it's only partially done, as a way to capture the idea (and not depend on links into GitHub). > [1]: https://www.postgresql.org/message-id/78574B24-BE0A-42C5-8075-3FA9FA63B8FC@amazon.com Right, within that thread there is this message [5] that notes: > Ideally, my patch to restructure how catalog tuples are updated [<this > thread>] is committed and we can fully remove > HeapDetermineColumnsInfo() and likely speed up all catalog updates in > the process. That's what motivated [<this thread>], please take a > look, it required a huge number of changes so I thought it deserved a > life/thread of its own. There are two major paths into heap_update(), one from heapam_tuple_update() and one from simple_heap_update(). The first is from the executor side via the table AM API and my other thread [6] that you referenced is trying to address that. This thread is focused on the catalog updates that flow into simple_heap_update(). Really that function should be renamed to catalog_heap_update() as that's the only purpose AFAICT. Does that connect the dots a bit more clearly? > -- > Michael Included in the attached patch set is that first patch from the other thread as the 0003 patch here and shows the ultimate goal, but it is a work-in-progress. This patch set removes the need to call HeapDetermineColumnsInfo() on the simple_heap_update() path because the Bitmapset of modified columns is passed in and intersected against the indexed attributes accomplishing the same outcome. I haven't yet split out the "WithInfo" functions or worked out how to cache the CatalogIndexState. I thought I'd finalize the approach, then work that out as it might be impacted by refinements to proposed approach. * 0001 Reorganize heap update logic and update simple_heap_update() Focus on the macros in htup.h and the changes to the files that use these macros in this commit, my ask of reviewers is that we agree on the pattern established in this commit before I fully implement it across all files in the 0002 patch. * 0002 Update the remainder of catalog updates using the new APIs Most files in this patch are still using the first version of the macros. Only aclchk.c, heap.c, and index.c have been converted to show how this approach plays out. * 0003 Refactor how we form HeapTuples for CatalogTuple(Insert|Update) This commit is borrowed from the other thread to demonstrate how these changes impact heap updates. Thanks for your consideration, I appreciate your time. best. -greg [1] https://github.com/gburd/postgres/blob/cf-6221/src/include/access/htup.h#L90-L277 [2] https://github.com/gburd/postgres/blob/cf-6221/src/backend/catalog/aclchk.c#L1319-L1341 [3] https://github.com/gburd/postgres/blob/cf-6221/src/backend/catalog/heap.c#L1686C2-L1746 [4] https://github.com/gburd/postgres/blob/cf-6221/src/backend/statistics/attribute_stats.c#L750-L806 [5] https://www.postgresql.org/message-id/BCF1BF66-7A1B-4E01-87DC-0BE45EDF2F98%40greg.burd.me [6] https://www.postgresql.org/message-id/78574B24-BE0A-42C5-8075-3FA9FA63B8FC@amazon.com
Вложения
On Wed, Dec 03, 2025 at 09:11:03AM -0500, Greg Burd wrote: > On Dec 2 2025, at 11:16 pm, Michael Paquier <michael@paquier.xyz> wrote: > I've looked in the email archives a bit and didn't come up with much > related to catalog upgrades. I don't know if there is much on the > record information on this idea, but from what I've been told the idea > of decoupling heap from catalogs with the goal of getting closer to > making online upgrades possible has been referenced a few times in > hallway tracks at various conferences. John Naylor has been looking at this problem at some point, if I recall correctly. Adding him in CC here for comments and opinions, or perhaps I am just wrong in assuming that he has looked at this area. > I'll see what I can do to extract this piece into a separate patch. Thanks. I suspect that it would help with the clarity of the changes. At least I am getting the same impression after reading v2, which is close enough to v1 except for the various renames. > There are two major paths into heap_update(), one from > heapam_tuple_update() and one from simple_heap_update(). The first is > from the executor side via the table AM API and my other thread [6] that > you referenced is trying to address that. This thread is focused on the > catalog updates that flow into simple_heap_update(). Really that > function should be renamed to catalog_heap_update() as that's the only > purpose AFAICT. You have a point based on how things are presented in this patch set, where there is only one caller of simple_heap_update(), versus two on HEAD. > Does that connect the dots a bit more clearly? > > * 0003 Refactor how we form HeapTuples for CatalogTuple(Insert|Update) > > This commit is borrowed from the other thread to demonstrate how these > changes impact heap updates. Thanks for your consideration, I > appreciate your time. Now I see the connection, thanks to 0003 where simple_heap_update() gains its Bitmapset that tracks the set of updated tuples, for the business that happens in CatalogTupleUpdate(). All that is going to need a careful lookup. -- Michael
Вложения
On Thu, Dec 4, 2025 at 12:50 PM Michael Paquier <michael@paquier.xyz> wrote: > > On Wed, Dec 03, 2025 at 09:11:03AM -0500, Greg Burd wrote: > > On Dec 2 2025, at 11:16 pm, Michael Paquier <michael@paquier.xyz> wrote: > > I've looked in the email archives a bit and didn't come up with much > > related to catalog upgrades. I don't know if there is much on the > > record information on this idea, but from what I've been told the idea > > of decoupling heap from catalogs with the goal of getting closer to > > making online upgrades possible has been referenced a few times in > > hallway tracks at various conferences. > > John Naylor has been looking at this problem at some point, if I > recall correctly. Adding him in CC here for comments and opinions, or > perhaps I am just wrong in assuming that he has looked at this area. Hmm, I think the decoupling on-disk format from in-memory format was most directly relevant for the idea of changing the "name" type from a fixed length type, to a domain over text. Catalog manipulation as speculated for online upgrades could probably operate on the values/nulls array. -- John Naylor Amazon Web Services