Обсуждение: get rid of RM_HEAP2_ID
Making the XLogRecord header nicer has been proposed several times [1][2][3]. In [2], Robert wondered if with 2 bytes of xl_info, we could get rid of the separate xl_xact_info. There could be other possibilities as well, but I haven't done anything in that direction. The attached just gets as far as the modest goal mentioned in the subject. 0001: Split xl_info into xl_info for rmgr-specific info and xl_geninfo for generic flags. I used the XLogInsertExtended() idea from one of Matthias's patches in [3], but wrote the rest a different way to keep churn small. 0002 and 0003: To simplify the rmgrs that have an opmask and separate flag (I saw two obvious ones but I didn't make sure they were the only ones), reserve the high 4 bits of xl_info for the "record type" (I see xlogstats.c calls it a "recid", so that might be a better name) and the lower 4 bits for the flag. Using the same scheme everywhere simplifies things. I've put the mask into these macros to reduce the places that know about them: #define XLogRecGetRecType(decoder) ((decoder)->record->header.xl_info & XLR_REC_TYPE_MASK) #define XLogRecGetRecFlags(decoder) ((decoder)->record->header.xl_info & XLR_REC_FLAGS_MASK) The former is for callers that explicitly want to know about the record type only, without flags. The latter is unused, since checking a mask doesn't need to isolate all the masks first, so it may not end up in the final version. There are new XLR_* masks whose names reflect their new purpose better, but they are the same values as the old ones. XLR_INFO_MASK is kept around for compatibility, but it doesn't do anything useful since the rmgr has the full byte available. In order for ~XLR_INFO_MASK to become a no-op on a full byte, XLR_INFO_MASK has to be zero, which looks very odd. Removing it would be clearer, at the cost of more churn. There is a small wrinkle in that heap_identify does different things depending on presence of the XLOG_HEAP_INIT_PAGE flag, but xact_identify() doesn't do that with XLOG_XACT_HAS_INFO, so the latter still masks out the flags. 0004: get rid of RM_HEAP2_ID. This was simple once the prerequisites were in place. All of the HEAP2_* macros keep the same name and only differ in value. Heap rmgr is completely full so it might be good to increase to 5 bits for the record type to give it some breathing room. I wasn't sure about this comment in heapam_xlog.c -- it seems strange that this property would occur exactly at the point where the first heap rmgr id ran out of bits, but check world passes without doing anything additional: /* * These operations don't overwrite MVCC data so no conflict processing is * required. The ones in heap2 rmgr do. */ [1] https://www.postgresql.org/message-id/20220715173731.6t3km5cww3f5ztfq%40awork3.anarazel.de [2] https://www.postgresql.org/message-id/CA%2BTgmoa7pNxxe_K%3D3mTHHZGSmnrc_YgApArx3OFHN2g57nzLNw%40mail.gmail.com [3] https://www.postgresql.org/message-id/CAEze2Wjd3jY_UhhOGdGGnC6NO%3D%2BNmtNOmd%3DJaYv-v-nwBAiXXA%40mail.gmail.com -- John Naylor Amazon Web Services
Вложения
On Mon, Oct 06, 2025 at 04:04:00PM +0700, John Naylor wrote: > 0004: get rid of RM_HEAP2_ID. This was simple once the prerequisites > were in place. All of the HEAP2_* macros keep the same name and only > differ in value. Heap rmgr is completely full so it might be good to > increase to 5 bits for the record type to give it some breathing room. I haven't looked at the patch set in depth yet, but +1 to adding some breathing room if possible. In my early attempts at partial HOT, I had to add a third heap rmgr because the two existing ones were already full. It'd be nice to avoid going back to multiple heap rmgrs. -- nathan
On Mon, Oct 06, 2025 at 04:04:00PM +0700, John Naylor wrote: > Making the XLogRecord header nicer has been proposed several times > [1][2][3]. In [2], Robert wondered if with 2 bytes of xl_info, we > could get rid of the separate xl_xact_info. There could be other > possibilities as well, but I haven't done anything in that direction. > The attached just gets as far as the modest goal mentioned in the > subject. Splitting the generic flags by stealing one of the padding bytes in XLogRecord sounds like an improvement on clarity grounds, at least. > 0001: Split xl_info into xl_info for rmgr-specific info and xl_geninfo > for generic flags. I used the XLogInsertExtended() idea from one of > Matthias's patches in [3], but wrote the rest a different way to keep > churn small. Removing XLR_RMGR_INFO_MASK becomes natural here, causing XLR_INFO_MASK to also become unnecessary. One benefit of this proposal is that we would not need a heap3 once the opcodes are completely full for the two existing ones, which should be fine enough for many years to come. I have noticed after writing this sentence that it is what Matthias has done in his patch to separate these fields. More code churn, but I'd be in favor of fully biting the bullet and do as Matthias is proposing, getting rid of XLR_INFO_MASK entirely if we let the full byte at disposal of the RMGRs. > 0002 and 0003: To simplify the rmgrs that have an opmask and separate > flag (I saw two obvious ones but I didn't make sure they were the only > ones), reserve the high 4 bits of xl_info for the "record type" (I see > xlogstats.c calls it a "recid", so that might be a better name) and > the lower 4 bits for the flag. Using the same scheme everywhere > simplifies things. XLR_REC_TYPE_MASK and XLR_REC_FLAGS_MASK seem a bit confusing to me. This is an attempt to generalize a rule that two RMGRs have been using locally. In short I am not sure what we are gaining here, coming back to the previous point that we should get rid of XLR_INFO_MASK, IMO. That's the kind of rules I'd leave up to the RMGRs, so this does not like a necessary abstraction, at least IMO. > There are new XLR_* masks whose names reflect their new purpose > better, but they are the same values as the old ones. XLR_INFO_MASK is > kept around for compatibility, but it doesn't do anything useful since > the rmgr has the full byte available. In order for ~XLR_INFO_MASK to > become a no-op on a full byte, XLR_INFO_MASK has to be zero, which > looks very odd. Removing it would be clearer, at the cost of more > churn. I think that it would be weird to keep XLR_INFO_MASK for compatibility reasons, with it set at 0, so I would bite the bullet and just get rid of it entirely. There was not so long ago a proposal to get entirely rid of most the footprint of XLR_INFO_MASK in the code, switching to an equivalent macro: https://www.postgresql.org/message-id/CAGjhLkNYhNfbhsZhR_sEJ=1VqJmFCzx1BVfWWZ0+it2ucqG-pw@mail.gmail.com That's also what you are doing here.. > 0004: get rid of RM_HEAP2_ID. This was simple once the prerequisites > were in place. All of the HEAP2_* macros keep the same name and only > differ in value. Heap rmgr is completely full so it might be good to > increase to 5 bits for the record type to give it some breathing room. This is making an increase in size a prerequisite for anybody who would want to add a new heap record, which is a problem because this is pushing this responsibility to the one who would like to add a new record. In short, I am not sure that we will ever get rid of heap2, but I see a pretty good set of arguments to do things so as we'll never need an equivalent for heap3, or an extra RMGR for any of the existing RMGRs that use already all of their opcodes (Gin seems to be using 8 opcodes currently, as one example, so it's full). -- Michael
Вложения
On Thu, Oct 9, 2025 at 1:43 PM Michael Paquier <michael@paquier.xyz> wrote: > > 0001: Split xl_info into xl_info for rmgr-specific info and xl_geninfo > > for generic flags. I used the XLogInsertExtended() idea from one of > > Matthias's patches in [3], but wrote the rest a different way to keep > > churn small. > > Removing XLR_RMGR_INFO_MASK becomes natural here, causing > XLR_INFO_MASK to also become unnecessary. One benefit of this > proposal is that we would not need a heap3 once the opcodes are > completely full for the two existing ones, which should be fine enough > for many years to come. I have noticed after writing this sentence > that it is what Matthias has done in his patch to separate these > fields. More code churn, but I'd be in favor of fully biting the > bullet and do as Matthias is proposing, getting rid of XLR_INFO_MASK > entirely if we let the full byte at disposal of the RMGRs. Okay. > > 0002 and 0003: To simplify the rmgrs that have an opmask and separate > > flag (I saw two obvious ones but I didn't make sure they were the only > > ones), reserve the high 4 bits of xl_info for the "record type" (I see > > xlogstats.c calls it a "recid", so that might be a better name) and > > the lower 4 bits for the flag. Using the same scheme everywhere > > simplifies things. > > XLR_REC_TYPE_MASK and XLR_REC_FLAGS_MASK seem a bit confusing to me. > This is an attempt to generalize a rule that two RMGRs have been using > locally. In short I am not sure what we are gaining here, coming back > to the previous point that we should get rid of XLR_INFO_MASK, IMO. > That's the kind of rules I'd leave up to the RMGRs, so this does not > like a necessary abstraction, at least IMO. Leaving that up to the rmgr makes sense. One consideration I didn't mention was for xlogstats.c -- "record_stats[rmid][recid]" would get 16x larger if recid could take up the full byte. Maybe that's harmless, but I didn't want to assume. Any thoughts on that? -- John Naylor Amazon Web Services
On Thu, Oct 09, 2025 at 03:15:11PM +0700, John Naylor wrote: > Leaving that up to the rmgr makes sense. One consideration I didn't > mention was for xlogstats.c -- "record_stats[rmid][recid]" would get > 16x larger if recid could take up the full byte. Maybe that's > harmless, but I didn't want to assume. Any thoughts on that? I've missed this interaction, thanks for mentioning it. XLogStats is a local state that's only used by pg_walinspect and pg_waldump, so this extra memory consumed does worry me much; this stuff interacts with no critical paths. -- Michael
Вложения
On Fri, Oct 10, 2025 at 9:47 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Thu, Oct 09, 2025 at 03:15:11PM +0700, John Naylor wrote: > > Leaving that up to the rmgr makes sense. One consideration I didn't > > mention was for xlogstats.c -- "record_stats[rmid][recid]" would get > > 16x larger if recid could take up the full byte. Maybe that's > > harmless, but I didn't want to assume. Any thoughts on that? > > I've missed this interaction, thanks for mentioning it. XLogStats is > a local state that's only used by pg_walinspect and pg_waldump, so > this extra memory consumed does worry me much; this stuff interacts > with no critical paths. Okay, v2 gets rid of the general info mask (split out into 0002 for readability) and leaves alone the RMGR-specific masks (i.e. leaves out v1 0002/3). It runs fine with installcheck while streaming to a standby with wal_consistency_checking. I've also left out the removal of HEAP2 for now. Giving existing RMGRs more breathing room seems like a better motivator, going by Nathan's comment and yours. It's worth thinking about backward compatibility if we did go as far as a 2-byte xl_info (upthread: to allow more RMGR-specific flags, so e.g. XACT wouldn't need xl_xact_info) In that case, we'd probably still want a convention that only the lowest byte can contain the record type. XLogStats could simply assume that in most cases. For XACT 8 flags in the upper byte still won't be enough, and it will still need to have its own opcode mask, but that's no worse than the situation we have already. -- John Naylor Amazon Web Services
Вложения
On 14/10/2025 11:13, John Naylor wrote: > Okay, v2 gets rid of the general info mask (split out into 0002 for > readability) and leaves alone the RMGR-specific masks (i.e. leaves out > v1 0002/3). It runs fine with installcheck while streaming to a > standby with wal_consistency_checking. I've also left out the removal > of HEAP2 for now. Giving existing RMGRs more breathing room seems like > a better motivator, going by Nathan's comment and yours. > > It's worth thinking about backward compatibility if we did go as far > as a 2-byte xl_info (upthread: to allow more RMGR-specific flags, so > e.g. XACT wouldn't need xl_xact_info) In that case, we'd probably > still want a convention that only the lowest byte can contain the > record type. XLogStats could simply assume that in most cases. For > XACT 8 flags in the upper byte still won't be enough, and it will > still need to have its own opcode mask, but that's no worse than the > situation we have already. First let me say that I'm not objecting to this patch. It makes the code a little bit more clear, which is good, so I think I'm +0.5 on this overall. With that said: I'm not sure I agree with the premise that we should try to get rid of RM_HEAP2_ID. There's nothing wrong with that scheme as such. As an alternative, we could easily teach e.g pg_waldump to treat RM_HEAP_ID and RM_HEAP2_ID the same for statistics purposes. This patch consumes one of the padding bytes. That's not entirely free, as there is an opportunity cost: we could squeeze out the padding bytes and save 2 bytes on every WAL record instead. > typedef struct XLogRecord > { > uint32 xl_tot_len; /* total len of entire record */ > TransactionId xl_xid; /* xact id */ > XLogRecPtr xl_prev; /* ptr to previous record in log */ > uint8 xl_info; /* RMGR-specific info */ > RmgrId xl_rmid; /* resource manager for this record */ > uint8 xl_geninfo; /* flag bits, see below */ > /* 1 byte of padding here, initialize to zero */ > pg_crc32c xl_crc; /* CRC for this record */ > > /* XLogRecordBlockHeaders and XLogRecordDataHeader follow, no padding */ > > } XLogRecord; I'd suggest some minor reordering and renaming: typedef struct XLogRecord { uint32 xl_tot_len; /* total len of entire record */ TransactionId xl_xid; /* xact id */ XLogRecPtr xl_prev; /* ptr to previous record in log */ uint8 xl_flags; /* flag bits, see below */ RmgrId xl_rmid; /* resource manager for this record */ uint8 xl_rminfo; /* RMGR-specific info */ /* 1 byte of padding here, initialize to zero */ pg_crc32c xl_crc; /* CRC for this record */ /* XLogRecordBlockHeaders and XLogRecordDataHeader follow, no padding */ } XLogRecord; In summary: - Rename 'xl_info' to 'xl_rminfo' to make it more clear that it's RMGR-specific. - Rename 'xl_geninfo' to 'xl_flags'. I guess the 'gen' meant 'generic' or 'general' to distinguish from the rmgr-specific field. But we don't use a 'gen' prefix like that for any of the other non-rmgr-specific fields. We could keep it under the old 'xl_info' name instead, but it's nice to rename it to avoid confusion with the old xl_info field. It now only contains flags, so 'xl_flags' seems appropriate. - Reorder the fields so that 'xl_rmid' comes before 'xl_rminfo'. I find that more intuitive, because the contents of 'xl_rminfo' depends on 'xl_rmid'. While we're at it, I wonder if it would be more clear to have explicit 'xl_padding' field for the unused byte. - Heikki
On Tue, Oct 14, 2025 at 03:20:16PM +0300, Heikki Linnakangas wrote: > I'm not sure I agree with the premise that we should try to get rid of > RM_HEAP2_ID. There's nothing wrong with that scheme as such. As an > alternative, we could easily teach e.g pg_waldump to treat RM_HEAP_ID and > RM_HEAP2_ID the same for statistics purposes. Yeah, I'd rather keep heap2 as well. As long as there is more room for the record IDs, we're still going to need it in the long run. > This patch consumes one of the padding bytes. That's not entirely free, as > there is an opportunity cost: we could squeeze out the padding bytes and > save 2 bytes on every WAL record instead. Do you recall an alternative where it would have been possible to save 2 bytes for each record by removing the padding, and still have the full byte of xl_info be usable freely by each RMGR? I cannot recall any magic based on how XLogRecord is designed now, but perhaps I have missed an argument. We could move out xl_xid, which should not be required for all records, shaving 4 bytes from the base XLogRecord. I'm afraid of the duplication this would create if we push this data to each RMGR, which would, I guess, require a new RMGR callback to retrieve this field on a per-record basis. But perhaps it would not be that bad. > In summary: > > - Rename 'xl_info' to 'xl_rminfo' to make it more clear that it's > RMGR-specific. > > - Rename 'xl_geninfo' to 'xl_flags'. I guess the 'gen' meant 'generic' or > 'general' to distinguish from the rmgr-specific field. But we don't use a > 'gen' prefix like that for any of the other non-rmgr-specific fields. We > could keep it under the old 'xl_info' name instead, but it's nice to rename > it to avoid confusion with the old xl_info field. It now only contains > flags, so 'xl_flags' seems appropriate. > > - Reorder the fields so that 'xl_rmid' comes before 'xl_rminfo'. I find that > more intuitive, because the contents of 'xl_rminfo' depends on 'xl_rmid'. Okay. > While we're at it, I wonder if it would be more clear to have explicit > 'xl_padding' field for the unused byte. Adding an extra field to document the padding sounds like a good idea here. It should always be zero, so perhaps it could even be used as a sanity check? -- Michael
Вложения
On Tue, Oct 14, 2025 at 7:20 PM Heikki Linnakangas <hlinnaka@iki.fi> wrote: > This patch consumes one of the padding bytes. That's not entirely free, > as there is an opportunity cost: we could squeeze out the padding bytes > and save 2 bytes on every WAL record instead. I must be misunderstanding, because I'm not sure how that would work with the alignment requirement of later parts. If you mean put some other parts of certain records into header (like the xl_xact_info idea mentioned earlier, or something else), it's worth thinking about that -- maybe some things would be helped by this patch, and others would be closed off by it. > I'd suggest some minor reordering and renaming: > > typedef struct XLogRecord > { > uint32 xl_tot_len; /* total len of entire record */ > TransactionId xl_xid; /* xact id */ > XLogRecPtr xl_prev; /* ptr to previous record in log */ > uint8 xl_flags; /* flag bits, see below */ > RmgrId xl_rmid; /* resource manager for this record */ > uint8 xl_rminfo; /* RMGR-specific info */ > /* 1 byte of padding here, initialize to zero */ > pg_crc32c xl_crc; /* CRC for this record */ Works for me. > - Rename 'xl_info' to 'xl_rminfo' to make it more clear that it's > RMGR-specific. That makes sense here in the definition. There are also lots of local variables named "info", and it would be less invasive to keep those as they are. Likewise for the macro XLogRecGetInfo. -- John Naylor Amazon Web Services
On Wed, Oct 15, 2025 at 6:55 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Tue, Oct 14, 2025 at 03:20:16PM +0300, Heikki Linnakangas wrote: > > I'm not sure I agree with the premise that we should try to get rid of > > RM_HEAP2_ID. There's nothing wrong with that scheme as such. As an > > alternative, we could easily teach e.g pg_waldump to treat RM_HEAP_ID and > > RM_HEAP2_ID the same for statistics purposes. > > Yeah, I'd rather keep heap2 as well. As long as there is more room > for the record IDs, we're still going to need it in the long run. Okay, I'll drop that aspect. > We could move out xl_xid, which should not be required for all > records, shaving 4 bytes from the base XLogRecord. I'm afraid of the > duplication this would create if we push this data to each RMGR, which > would, I guess, require a new RMGR callback to retrieve this field on > a per-record basis. But perhaps it would not be that bad. I've wondered if it would be possible to make xl_tot_len a varint that starts in the last byte of the header, with the next bytes being like XLogRecordDataHeader[Short|Long], but likewise using a varint. -- John Naylor Amazon Web Services
On Wed, Oct 15, 2025 at 12:01:44PM +0700, John Naylor wrote: > On Wed, Oct 15, 2025 at 6:55 AM Michael Paquier <michael@paquier.xyz> wrote: >> We could move out xl_xid, which should not be required for all >> records, shaving 4 bytes from the base XLogRecord. I'm afraid of the >> duplication this would create if we push this data to each RMGR, which >> would, I guess, require a new RMGR callback to retrieve this field on >> a per-record basis. But perhaps it would not be that bad. > > I've wondered if it would be possible to make xl_tot_len a varint that > starts in the last byte of the header, with the next bytes being like > XLogRecordDataHeader[Short|Long], but likewise using a varint. This suggestion gives me some shivers, TBH. We've had a couple of issues with the read of records that spawn across multiple pages, relying now very strongly on xl_tot_len being the first field in XLogRecord. Making this field variable may be really tricky while the code we have now in xlogreader.h is pretty stable. -- Michael