Обсуждение: [PATCH] Add zstd compression for TOAST using extended header format
Want to submit a patch that implements zstd compression for TOAST data using a 20-byte TOAST pointer format, directly addressing the concerns raised in prior discussions [1][2][3].
A bit of a background in the 2022 thread [3], Robert Haas suggested:
"we had better reserve the fourth bit pattern for something extensible e.g. another byte or several to specify the actual method"
i.e. something like:
00 = PGLZ
01 = LZ4
10 = reserved for future emergencies
11 = extended header with additional type byte
Michael also asked whether we should have "something a bit more extensible for the design of an extensible varlena header."
This patch implements that idea.
The format:
struct varatt_external_extended {
int32 va_rawsize; /* same as legacy */
uint32 va_extinfo; /* cmid=3 signals extended format */
uint8 va_flags; /* feature flags */
uint8 va_data[3]; /* va_data[0] = compression method */
Oid va_valueid; /* same as legacy */
Oid va_toastrelid; /* same as legacy */
};
- Zstd only applies to external TOAST, not inline compression. The 2-bit limit in va_tcinfo stays as-is for inline data, where pglz/lz4 work fine anyway. Zstd's wins show up on larger values.
- A GUC use_extended_toast_header controls whether pglz/lz4 also use the 20-byte format (defaults to off for compatibility, can enable it if you want consistency).
- Legacy 16-byte pointers continue to work - we check the vartag to determine which format to read.
The 4 extra bytes per pointer is negligible for typical TOAST data sizes, and it gives us room to grow.
Dharin
Вложения
Apologies for the spam, updated the patch with the tests corrected.
Thanks,
Dharin
Hello PG Hackers,
Want to submit a patch that implements zstd compression for TOAST data using a 20-byte TOAST pointer format, directly addressing the concerns raised in prior discussions [1][2][3].
A bit of a background in the 2022 thread [3], Robert Haas suggested:
"we had better reserve the fourth bit pattern for something extensible e.g. another byte or several to specify the actual method"
i.e. something like:
00 = PGLZ
01 = LZ4
10 = reserved for future emergencies
11 = extended header with additional type byte
Michael also asked whether we should have "something a bit more extensible for the design of an extensible varlena header."
This patch implements that idea.
The format:
struct varatt_external_extended {
int32 va_rawsize; /* same as legacy */
uint32 va_extinfo; /* cmid=3 signals extended format */
uint8 va_flags; /* feature flags */
uint8 va_data[3]; /* va_data[0] = compression method */
Oid va_valueid; /* same as legacy */
Oid va_toastrelid; /* same as legacy */
};A few notes:
- Zstd only applies to external TOAST, not inline compression. The 2-bit limit in va_tcinfo stays as-is for inline data, where pglz/lz4 work fine anyway. Zstd's wins show up on larger values.
- A GUC use_extended_toast_header controls whether pglz/lz4 also use the 20-byte format (defaults to off for compatibility, can enable it if you want consistency).
- Legacy 16-byte pointers continue to work - we check the vartag to determine which format to read.
The 4 extra bytes per pointer is negligible for typical TOAST data sizes, and it gives us room to grow.Regards,
Dharin
Вложения
Want to submit a patch that implements zstd compression for TOAST data using a 20-byte TOAST pointer format, directly addressing the concerns raised in prior discussions [1][2][3].
A bit of a background in the 2022 thread [3], The overall suggestion was to have something extensible for the TOAST header
i.e. something like:
00 = PGLZ
01 = LZ4
10 = reserved for future emergencies
11 = extended header with additional type byte
This patch implements that idea.
The new header format:
struct varatt_external_extended {
int32 va_rawsize; /* same as legacy */
uint32 va_extinfo; /* cmid=3 signals extended format */
uint8 va_flags; /* feature flags */
uint8 va_data[3]; /* va_data[0] = compression method */
Oid va_valueid; /* same as legacy */
Oid va_toastrelid; /* same as legacy */
};
- Zstd only applies to external TOAST, not inline compression. The 2-bit limit in va_tcinfo stays as-is for inline data, where pglz/lz4 work fine anyway. Zstd's wins show up on larger values.
- A GUC use_extended_toast_header controls whether pglz/lz4 also use the 20-byte format (defaults to off for compatibility, can enable it if you want consistency).
- Legacy 16-byte pointers continue to work - we check the vartag to determine which format to read.
The 4 extra bytes per pointer is negligible for typical TOAST data sizes, and it gives us room to grow.
Dharin
Вложения
Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format
On 16.12.25 11:51, Dharin Shah wrote: > - Zstd only applies to external TOAST, not inline compression. The 2-bit > limit in va_tcinfo stays as-is for inline data, where pglz/lz4 work fine > anyway. Zstd's wins show up on larger values. This is a very complicated patch. To motivate it, you should show some detailed performance measurements that show these wins.
Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format
On Wed, Dec 17, 2025 at 04:11:38PM +0100, Peter Eisentraut wrote: > On 16.12.25 11:51, Dharin Shah wrote: > > - Zstd only applies to external TOAST, not inline compression. The 2-bit > > limit in va_tcinfo stays as-is for inline data, where pglz/lz4 work fine > > anyway. Zstd's wins show up on larger values. > > This is a very complicated patch. To motivate it, you should show some > detailed performance measurements that show these wins. Yes, this is expected for any patch posted. Zstd is an improved version of lz4, acting as a sort of industry standard these days, and any byte sequences I have looked at points that zstd leads kind of always to a better compression ratio for less or equivalent CPU cost compared to LZ4. Not saying that numbers are not required, they are. But I strongly suspect numbers among these lines. FWIW, it's not a complicated patch, it is a large mechanical patch that enforces a bunch of TOAST code paths to do what it wants. If we are going to do something about that and agree on something, I think that we should just use a new vartag_external for this matter (spoiler: I think we should use a new vartag_external value), but keep the toast structure at 16 bytes all the time, leaving alone the extra bit in the existing varatt_external structure so as there is no impact for heap relations if zstd is used, as long as the TOAST value is 32 bits. The patch introduces a new vartag_external with VARTAG_ONDISK_EXTENDED, so while it leads to a better compatibility, it also means that all zstd entries have to pay an extra amount of space in the main relation as an effect of a different default_toast_compression. The difficulty is not in the implementation, it would be on agreeing on what folks would be OK with in terms if vartag and varatt structures, and that's one of the oldest areas of the PG code, that has complications and assumptions of its own. The test implementation looks wrong to me. Why is there any need for an extra test module test_toast_ext? You could just reuse the same structure as compression_lz4.sql, but adapted to zstd. That's why a split with compression.sql has been done in 74a3fc36f314, FYI. You should also aim at splitting the patch more to make it easier to review: one of the sticky point of this area of the code is to untie completely DefaultCompressionId, its GUC and the TOAST code. The GUC default_toast_compression accepts by design only 4 values. This needs to go further, and should be refactored as a piece of its own. And also, I would prefer if the 32-bit value issue is tackled first, but that's a digression here, for a different thread. :) -- Michael
Вложения
Thanks for the detailed feedback — this is really helpful.
I want to make sure I understand your main point: you're OK with a new `vartag_external`, but prefer we avoid increasing the heap TOAST pointer from 16 -> 20 bytes since every zstd-toasted value would pay +4 bytes in the main heap tuple.
I also realize the "compatibility" of the extended header doesn't buy us much — we'll need to support the existing 16-byte varatt_external forever for backward compatibility. Adding a 20-byte structure just means two formats to maintain indefinitely.
A couple clarifying questions if we go with new vartag (e.g., `VARTAG_ONDISK_ZSTD`), same 16-byte `varatt_external` payload, vartag as discriminator
1. How should we handle future methods beyond zstd? One tag per method, or store a method id elsewhere (e.g., in TOAST chunk header)?
2. And re: "as long as the TOAST value is 32 bits" — are you referring to the 30-bit extsize field in va_extinfo (i.e., avoid stealing bits from extsize for method encoding)?
Test | Rows | Uncompressed | PGLZ | LZ4 | ZSTD | PGLZ/ZSTD | LZ4/ZSTD |
T1: Large JSON (~18KB/row) | 500 | ~9,000 KB | 1496 KB | 1528 KB | 976 KB | 1.53x | 1.57x |
T2: Repetitive Text (~246KB/row) | 500 | ~123,000 KB | 1672 KB | 648 KB | 248 KB | 6.74x | 2.61x |
T3: MD5 Hash Data (~16KB/row) | 500 | ~8,000 KB | 8288 KB | 8232 KB | 4256 KB | 1.95x | 1.93x |
T4: Server Logs (~3.5KB/row) | 1000 | ~3,500 KB | 400 KB | 352 KB | 456 KB | 0.88x | 0.77x |
Key findings (i guess well known at this point):
- ZSTD excels for repetitive/pattern-heavy data (6.7x better than PGLZ)
- For low-redundancy data (MD5 hashes), ZSTD still achieves ~2x better
- The T4 result showing zstd as "worse" is not about compression quality - it's about missing inline storage support. ZSTD actually compresses better, but pays unnecessary TOAST overhead.
On Testing and Patch Structure
Agreed on both points:
- I'll split the GUC refactoring into a separate preparatory patch
Once you confirm which representation you're advocating, I'll respin accordingly.
Thanks,
Dharin
On Wed, Dec 17, 2025 at 04:11:38PM +0100, Peter Eisentraut wrote:
> On 16.12.25 11:51, Dharin Shah wrote:
> > - Zstd only applies to external TOAST, not inline compression. The 2-bit
> > limit in va_tcinfo stays as-is for inline data, where pglz/lz4 work fine
> > anyway. Zstd's wins show up on larger values.
>
> This is a very complicated patch. To motivate it, you should show some
> detailed performance measurements that show these wins.
Yes, this is expected for any patch posted. Zstd is an improved
version of lz4, acting as a sort of industry standard these days, and
any byte sequences I have looked at points that zstd leads kind of
always to a better compression ratio for less or equivalent CPU cost
compared to LZ4. Not saying that numbers are not required, they are.
But I strongly suspect numbers among these lines.
FWIW, it's not a complicated patch, it is a large mechanical patch
that enforces a bunch of TOAST code paths to do what it wants. If we
are going to do something about that and agree on something, I think
that we should just use a new vartag_external for this matter
(spoiler: I think we should use a new vartag_external value), but
keep the toast structure at 16 bytes all the time, leaving alone the
extra bit in the existing varatt_external structure so as there is no
impact for heap relations if zstd is used, as long as the TOAST value
is 32 bits. The patch introduces a new vartag_external with
VARTAG_ONDISK_EXTENDED, so while it leads to a better compatibility,
it also means that all zstd entries have to pay an extra amount of
space in the main relation as an effect of a different
default_toast_compression. The difficulty is not in the
implementation, it would be on agreeing on what folks would be OK
with in terms if vartag and varatt structures, and that's one of the
oldest areas of the PG code, that has complications and assumptions of
its own.
The test implementation looks wrong to me. Why is there any need for
an extra test module test_toast_ext? You could just reuse the same
structure as compression_lz4.sql, but adapted to zstd. That's why a
split with compression.sql has been done in 74a3fc36f314, FYI.
You should also aim at splitting the patch more to make it easier to
review: one of the sticky point of this area of the code is to untie
completely DefaultCompressionId, its GUC and the TOAST code. The GUC
default_toast_compression accepts by design only 4 values. This needs
to go further, and should be refactored as a piece of its own.
And also, I would prefer if the 32-bit value issue is tackled first,
but that's a digression here, for a different thread. :)
--
Michael
Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format
On Thu, Dec 18, 2025 at 10:44:22PM +0100, Dharin Shah wrote: > I want to make sure I understand your main point: you're OK with a new > `vartag_external`, but prefer we avoid increasing the heap TOAST pointer > from 16 -> 20 bytes since every zstd-toasted value would pay +4 bytes in > the main heap tuple. That would be my choice, yes. Not sure about the opinion of others on this matter. > I also realize the "compatibility" of the extended header doesn't buy us > much — we'll need to support the existing 16-byte varatt_external forever > for backward compatibility. Adding a 20-byte structure just means two > formats to maintain indefinitely. Yes. Patches have to maintain on-disk compatibility. > A couple clarifying questions if we go with new vartag (e.g., > `VARTAG_ONDISK_ZSTD`), same 16-byte `varatt_external` payload, vartag as > discriminator > 1. How should we handle future methods beyond zstd? One tag per method, or > store a method id elsewhere (e.g., in TOAST chunk header)? My suspicion would be that we could either use a new set of vartags in the future for each compression method. When it comes to zstd there is something that comes in play: we could set some bits related to dictionnaries at tuple level. Not sure if this is the best design or if using an attribute-level option is more adapted (for example a JSONB blob could be applied as an attribute with common keys in a dictionnary saving a lot of on-disk space even before compression), but keeping some bits free in the 16-byte header leaves this option open with a new vartag_external. Saying that, zstd is good enough that I strongly suspect that we would not regret it for quite a few years. One issue that has pushed towards the addition of lz4 as an option for toast compression is that pglz was worse in terms of CPU cost. zlib is also more expensive than lz4 or zstd, especially at very high compression level for usually little compression gains. > 2. And re: "as long as the TOAST value is 32 bits" — are you referring to > the 30-bit extsize field in va_extinfo (i.e., avoid stealing bits from > extsize for method encoding)? I mean extending the TOAST value to 8 bytes, as per the following issues: https://www.postgresql.org/message-id/764273.1669674269%40sss.pgh.pa.us https://commitfest.postgresql.org/patch/5830/ > *Key findings (i guess well known at this point):* > - ZSTD excels for repetitive/pattern-heavy data (6.7x better than PGLZ) > - For low-redundancy data (MD5 hashes), ZSTD still achieves ~2x better > - The T4 result showing zstd as "worse" is not about compression quality - > it's about missing inline storage support. ZSTD actually compresses better, > but pays unnecessary TOAST overhead. > > I'll share the detailed benchmark script with the next patch revision. But > also a potential path forward could be that we could just fully replace > pglz (can bring it up later in different thread) I don't think that we will ever be able to remove pglz. It would be nice, as final result of course, but I also expect that not being able to decompress pglz data is going to lead to a lot of user pain. That would be also very expensive to check at upgrade for large instances. > *On Testing and Patch Structure* > Agreed on both points: > - I'll use `compression_zstd.sql` following the `compression_lz4.sql` > pattern (removing the test_toast_ext module) Okay. > - I'll split the GUC refactoring into a separate preparatory patch This refactoring, if done nicely, is worth an independent piece. It's something that I have actually done for the sake of the other thread, though the result was not really much liked by others. Perhaps I'm just lacking imagination with this abstraction, and I'd surely welcome different ideas. -- Michael
Вложения
Following up on my earlier patch submission, I've reworked the zstd TOAST compression implementation based on our discussion here. The new patch now avoids the 20-byte extended header.
Current Approach
- New `VARTAG_ONDISK_ZSTD` (value 19) for ZSTD external storage
- Maintains existing 16-byte varatt_external structure
- ZSTD external-only (no inline compression)
ZSTD: 22 MB (7.60x) - 38.7% space savings vs LZ4
PGLZ: 36 MB (4.76x)
LZ4: 36 MB (4.66x)
Key findings:
- External-only avoids consuming cmid=3 and extended header complexity
- Slice access: no ZSTD-specific optimization (follow-up area)
- Hybrid inline/external for small values: not in this patch (feedback welcome)
Reviewer Questions - Is vartag-based external-only acceptable?
- Should compression level (currently 3) be configurable? - Is the external storage overhead for small values acceptable, or is hybrid inline/external behavior needed?
On Thu, Dec 18, 2025 at 10:44:22PM +0100, Dharin Shah wrote:
> I want to make sure I understand your main point: you're OK with a new
> `vartag_external`, but prefer we avoid increasing the heap TOAST pointer
> from 16 -> 20 bytes since every zstd-toasted value would pay +4 bytes in
> the main heap tuple.
That would be my choice, yes. Not sure about the opinion of others on
this matter.
> I also realize the "compatibility" of the extended header doesn't buy us
> much — we'll need to support the existing 16-byte varatt_external forever
> for backward compatibility. Adding a 20-byte structure just means two
> formats to maintain indefinitely.
Yes. Patches have to maintain on-disk compatibility.
> A couple clarifying questions if we go with new vartag (e.g.,
> `VARTAG_ONDISK_ZSTD`), same 16-byte `varatt_external` payload, vartag as
> discriminator
> 1. How should we handle future methods beyond zstd? One tag per method, or
> store a method id elsewhere (e.g., in TOAST chunk header)?
My suspicion would be that we could either use a new set of vartags in
the future for each compression method. When it comes to zstd there
is something that comes in play: we could set some bits related to
dictionnaries at tuple level. Not sure if this is the best design or
if using an attribute-level option is more adapted (for example a
JSONB blob could be applied as an attribute with common keys in a
dictionnary saving a lot of on-disk space even before compression),
but keeping some bits free in the 16-byte header leaves this option
open with a new vartag_external. Saying that, zstd is good enough
that I strongly suspect that we would not regret it for quite a few
years. One issue that has pushed towards the addition of lz4 as an
option for toast compression is that pglz was worse in terms of CPU
cost. zlib is also more expensive than lz4 or zstd, especially at
very high compression level for usually little compression gains.
> 2. And re: "as long as the TOAST value is 32 bits" — are you referring to
> the 30-bit extsize field in va_extinfo (i.e., avoid stealing bits from
> extsize for method encoding)?
I mean extending the TOAST value to 8 bytes, as per the following
issues:
https://www.postgresql.org/message-id/764273.1669674269%40sss.pgh.pa.us
https://commitfest.postgresql.org/patch/5830/
> *Key findings (i guess well known at this point):*
> - ZSTD excels for repetitive/pattern-heavy data (6.7x better than PGLZ)
> - For low-redundancy data (MD5 hashes), ZSTD still achieves ~2x better
> - The T4 result showing zstd as "worse" is not about compression quality -
> it's about missing inline storage support. ZSTD actually compresses better,
> but pays unnecessary TOAST overhead.
>
> I'll share the detailed benchmark script with the next patch revision. But
> also a potential path forward could be that we could just fully replace
> pglz (can bring it up later in different thread)
I don't think that we will ever be able to remove pglz. It would be
nice, as final result of course, but I also expect that not being able
to decompress pglz data is going to lead to a lot of user pain. That
would be also very expensive to check at upgrade for large instances.
> *On Testing and Patch Structure*
> Agreed on both points:
> - I'll use `compression_zstd.sql` following the `compression_lz4.sql`
> pattern (removing the test_toast_ext module)
Okay.
> - I'll split the GUC refactoring into a separate preparatory patch
This refactoring, if done nicely, is worth an independent piece. It's
something that I have actually done for the sake of the other thread,
though the result was not really much liked by others. Perhaps I'm
just lacking imagination with this abstraction, and I'd surely welcome
different ideas.
--
Michael
Вложения
On Thu, Dec 18, 2025 at 5:44 PM Michael Paquier <michael@paquier.xyz> wrote: > On Thu, Dec 18, 2025 at 10:44:22PM +0100, Dharin Shah wrote: > > I'll share the detailed benchmark script with the next patch revision. But > > also a potential path forward could be that we could just fully replace > > pglz (can bring it up later in different thread) > > I don't think that we will ever be able to remove pglz. It would be > nice, as final result of course, but I also expect that not being able > to decompress pglz data is going to lead to a lot of user pain. That > would be also very expensive to check at upgrade for large instances. > Agreed that I can't see pglz being removed any time soon, if ever. Thinking through what a conversion process would look like seems unwieldy at best, so I think we definitely need it for backwards compatibility, plus I think it is useful to have a self-contained option. I'd almost suggest we should look at replacing lz4, but I don't think that is significantly easier, it just has a smaller, more invested, blast radius. That said, I do suspect ztsd could quickly become a popular recommendation and/or default among users / consultants / service providers. Robert Treat https://xzilla.net
Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format
On Wed, Dec 24, 2025 at 11:50:48AM -0500, Robert Treat wrote: > Agreed that I can't see pglz being removed any time soon, if ever. > Thinking through what a conversion process would look like seems > unwieldy at best, so I think we definitely need it for backwards > compatibility, plus I think it is useful to have a self-contained > option. I'd almost suggest we should look at replacing lz4, but I > don't think that is significantly easier, it just has a smaller, more > invested, blast radius. Backward-compatibility requirements make a replacement of LZ4 basically impossible to me, for the same reasons as pglz. We could not replace the bit used in the va_extinfo to track if LZ4 compression is used, either. One thing that I do wonder is if it would make things simpler in the long-run if we introduced a new separated vartag for LZ4-compressed external TOAST pointers as well. At least we'd have a leaner design: it means that we have to keep the varatt_external available on read, but we could update to the new format when writing entries. Or perhaps that's not worth the complication based on the last sentence you are writing.. > That said, I do suspect ztsd could quickly > become a popular recommendation and/or default among users / > consultants / service providers. .. Because I strongly suspect that this is going to be true, and that zstd would just be a better replacement over lz4. That's a trend that I see is already going on for wal_compression. Note that I am not on board with simply reusing varatt_external for zstd-compressed entries, neither do I think that this is the best move ever. It makes the core patch simpler, but it makes things like ToastCompressionId more complicated to think about. If anything, I'd consider a rename of varatt_external as the best way to go with an intermediate "translation" structure only used in memory as I am proposing on the other thread (something that others seem meh enough about but I am not seeing alternate proposals floating around, either). This would make things like detoast_external_attr() less confusing, I think, as the latest patch posted on this thread is actually proving with its shortcut for toast_fetch_datum as one example of something I'd rather not do.. -- Michael
Вложения
Agreed — I don’t think it’s realistic or practical to talk about deprecating or replacing pglz (or lz4) given on-disk compatibility requirements.
> Note that I am not on board with simply reusing varatt_external for
> zstd-compressed entries, neither do I think that this is the best move
> ever. It makes the core patch simpler, but it makes things like
> ToastCompressionId more complicated to think about. If anything, I'd
> consider a rename of varatt_external as the best way to go with an
> intermediate "translation" structure only used in memory as I am
> proposing on the other thread (something that others seem meh enough
> about but I am not seeing alternate proposals floating around,
> either). This would make things like detoast_external_attr() less
> confusing, I think, as the latest patch posted on this thread is
> actually proving with its shortcut for toast_fetch_datum as one
> example of something I'd rather not do..
On the design: I understand & share the same concerns that we’d end up with multiple “sources of truth” for external compression method identification (pglz/lz4 via va_extinfo bits, zstd via vartag), and that this pushes method-specific shortcuts into detoast paths.
Would you be OK if I split this into two steps?
1.First, a refactor-only patch introducing a small decoded/in-memory representation of an external TOAST pointer, so detoast/toast code paths don’t have to reason directly about tcinfo vs vartag vs va_extinfo. This would be a cleanup with no on-disk format change and no behavioral change for existing methods. Is this the same “translation structure” approach you mentioned in the other thread? If you can point me to it, I’ll align with that proposal.
2. Then, a follow-up patch adding zstd using VARTAG_ONDISK_ZSTD, implemented on top of that abstraction to keep zstd handling centralized and minimize special-casing in detoast.
If that direction matches what you had in mind, I can first post the proposed translation structure/API for feedback before respinning the zstd patch.
Thanks,
Dharin
On Wed, Dec 24, 2025 at 11:50:48AM -0500, Robert Treat wrote:
> Agreed that I can't see pglz being removed any time soon, if ever.
> Thinking through what a conversion process would look like seems
> unwieldy at best, so I think we definitely need it for backwards
> compatibility, plus I think it is useful to have a self-contained
> option. I'd almost suggest we should look at replacing lz4, but I
> don't think that is significantly easier, it just has a smaller, more
> invested, blast radius.
Backward-compatibility requirements make a replacement of LZ4
basically impossible to me, for the same reasons as pglz. We could
not replace the bit used in the va_extinfo to track if LZ4 compression
is used, either. One thing that I do wonder is if it would make
things simpler in the long-run if we introduced a new separated vartag
for LZ4-compressed external TOAST pointers as well. At least we'd
have a leaner design: it means that we have to keep the
varatt_external available on read, but we could update to the new
format when writing entries. Or perhaps that's not worth the
complication based on the last sentence you are writing..
> That said, I do suspect ztsd could quickly
> become a popular recommendation and/or default among users /
> consultants / service providers.
.. Because I strongly suspect that this is going to be true, and that
zstd would just be a better replacement over lz4. That's a trend that
I see is already going on for wal_compression.
Note that I am not on board with simply reusing varatt_external for
zstd-compressed entries, neither do I think that this is the best move
ever. It makes the core patch simpler, but it makes things like
ToastCompressionId more complicated to think about. If anything, I'd
consider a rename of varatt_external as the best way to go with an
intermediate "translation" structure only used in memory as I am
proposing on the other thread (something that others seem meh enough
about but I am not seeing alternate proposals floating around,
either). This would make things like detoast_external_attr() less
confusing, I think, as the latest patch posted on this thread is
actually proving with its shortcut for toast_fetch_datum as one
example of something I'd rather not do..
--
Michael
Following up on the discussion about avoiding method-specific shortcuts in detoast paths, this patch is a refactor-only step: it introduces a small decoded/in-memory representation of an on-disk external TOAST pointer, and refactors detoast_attr() and detoast_attr_slice() to use it.
The goal is to centralize “how do we interpret an external datum?” so that detoast code paths don’t have to reason directly about va_extinfo encoding vs payload layout details. This is intended as groundwork for a follow-up patch adding a new vartag-based method (e.g., zstd) without scattering special cases in detoast paths.
Key changes
Thanks Michael & Robert,
Agreed — I don’t think it’s realistic or practical to talk about deprecating or replacing pglz (or lz4) given on-disk compatibility requirements.
> Note that I am not on board with simply reusing varatt_external for
> zstd-compressed entries, neither do I think that this is the best move
> ever. It makes the core patch simpler, but it makes things like
> ToastCompressionId more complicated to think about. If anything, I'd
> consider a rename of varatt_external as the best way to go with an
> intermediate "translation" structure only used in memory as I am
> proposing on the other thread (something that others seem meh enough
> about but I am not seeing alternate proposals floating around,
> either). This would make things like detoast_external_attr() less
> confusing, I think, as the latest patch posted on this thread is
> actually proving with its shortcut for toast_fetch_datum as one
> example of something I'd rather not do..
On the design: I understand & share the same concerns that we’d end up with multiple “sources of truth” for external compression method identification (pglz/lz4 via va_extinfo bits, zstd via vartag), and that this pushes method-specific shortcuts into detoast paths.
Would you be OK if I split this into two steps?
1.First, a refactor-only patch introducing a small decoded/in-memory representation of an external TOAST pointer, so detoast/toast code paths don’t have to reason directly about tcinfo vs vartag vs va_extinfo. This would be a cleanup with no on-disk format change and no behavioral change for existing methods. Is this the same “translation structure” approach you mentioned in the other thread? If you can point me to it, I’ll align with that proposal.
2. Then, a follow-up patch adding zstd using VARTAG_ONDISK_ZSTD, implemented on top of that abstraction to keep zstd handling centralized and minimize special-casing in detoast.
If that direction matches what you had in mind, I can first post the proposed translation structure/API for feedback before respinning the zstd patch.
Thanks,
DharinOn Thu, Dec 25, 2025 at 1:25 AM Michael Paquier <michael@paquier.xyz> wrote:On Wed, Dec 24, 2025 at 11:50:48AM -0500, Robert Treat wrote:
> Agreed that I can't see pglz being removed any time soon, if ever.
> Thinking through what a conversion process would look like seems
> unwieldy at best, so I think we definitely need it for backwards
> compatibility, plus I think it is useful to have a self-contained
> option. I'd almost suggest we should look at replacing lz4, but I
> don't think that is significantly easier, it just has a smaller, more
> invested, blast radius.
Backward-compatibility requirements make a replacement of LZ4
basically impossible to me, for the same reasons as pglz. We could
not replace the bit used in the va_extinfo to track if LZ4 compression
is used, either. One thing that I do wonder is if it would make
things simpler in the long-run if we introduced a new separated vartag
for LZ4-compressed external TOAST pointers as well. At least we'd
have a leaner design: it means that we have to keep the
varatt_external available on read, but we could update to the new
format when writing entries. Or perhaps that's not worth the
complication based on the last sentence you are writing..
> That said, I do suspect ztsd could quickly
> become a popular recommendation and/or default among users /
> consultants / service providers.
.. Because I strongly suspect that this is going to be true, and that
zstd would just be a better replacement over lz4. That's a trend that
I see is already going on for wal_compression.
Note that I am not on board with simply reusing varatt_external for
zstd-compressed entries, neither do I think that this is the best move
ever. It makes the core patch simpler, but it makes things like
ToastCompressionId more complicated to think about. If anything, I'd
consider a rename of varatt_external as the best way to go with an
intermediate "translation" structure only used in memory as I am
proposing on the other thread (something that others seem meh enough
about but I am not seeing alternate proposals floating around,
either). This would make things like detoast_external_attr() less
confusing, I think, as the latest patch posted on this thread is
actually proving with its shortcut for toast_fetch_datum as one
example of something I'd rather not do..
--
Michael
Вложения
Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format
On Mon, Dec 29, 2025 at 02:45:27PM +0100, Dharin Shah wrote:
> The goal is to centralize “how do we interpret an external datum?” so that
> detoast code paths don’t have to reason directly about va_extinfo encoding
> vs payload layout details. This is intended as groundwork for a follow-up
> patch adding a new vartag-based method (e.g., zstd) without scattering
> special cases in detoast paths.
+static bool
+decode_external_toast_pointer(const struct varlena *attr,
+ DecodedExternalToast *decoded)
[...]
+typedef enum ToastDecompressMethod
+{
+ TOAST_DECOMP_NONE = 0,
+ TOAST_DECOMP_PGLZ = 1,
+ TOAST_DECOMP_LZ4 = 2
+} ToastDecompressMethod;
+
+typedef struct DecodedExternalToast
+{
+ Oid toastrelid;
+ Oid valueid;
+ uint32 rawsize; /* Decompressed size; for future methods without tcinfo */
+ uint32 extsize; /* On-disk payload size */
+ ToastDecompressMethod method;
+ uint8 flags;
+} DecodedExternalToast;
Yeah, honestly this is a layer I have been thinking about as well as
one option, but contrary to you I have been focusing on putting that
into varatt.h, with the exception of the value being an Oid8. I think
that you have an interesting point in focusing your implementation to
be stored in the detoast part, though. I'd need to spend a bit more
time to see the result this would lead at with the larger 8-byte issue
in mind, but this is something that would come at no real cost as it
has no function pointer redirection compared to what I was first
envisioning on the other thread. That's especially true if it makes
the CompressionId business easier to mold around when adding a new
vartag.
> Why HAS_TCINFO?
> - Previously, “is compressed?” was used as a proxy for whether the external
> payload begins with tcinfo. This patch makes that explicit: HAS_TCINFO
> captures payload layout, which is distinct from whether the value is
> compressed. This separation is needed for future methods that may store
> external compressed payloads without tcinfo.
It is possible to model the on-memory data as we want. This
suggestion would be OK with some flags.
--
Michael
Вложения
After looking more closely at your “8‑byte TOAST values / infinite loop” thread and patch series, I see this is very much the same direction you outlined there: introduce a normalized in-memory representation for external pointers (toast_external_data) and keep most call sites from having to reason about vartag_external/va_extinfo details directly [1].
For this refactor patch I kept the decoder local to detoast.c to minimize scope and avoid committing to a broader API boundary too early. But if the consensus heads toward a shared interface closer to the format definitions (as in your toast_external approach), I’m happy to respin/rework this patch to align with that direction, rather than working on parallel abstractions. It should also be straightforward to mold this refactor in the direction of the 8‑byte value-id work without changing the overall detoast structure.
On HAS_TCINFO flag: the intent is to make payload layout explicit. In the current code, “external is compressed” effectively implies “payload begins with tcinfo”, which is wired into fetch/slice logic. For a vartag-based follow-up (e.g., zstd), we may want compressed payloads without a tcinfo prefix, so having an explicit flag keeps detoast paths uniform and avoids method-specific shortcuts.
On Mon, Dec 29, 2025 at 02:45:27PM +0100, Dharin Shah wrote:
> The goal is to centralize “how do we interpret an external datum?” so that
> detoast code paths don’t have to reason directly about va_extinfo encoding
> vs payload layout details. This is intended as groundwork for a follow-up
> patch adding a new vartag-based method (e.g., zstd) without scattering
> special cases in detoast paths.
+static bool
+decode_external_toast_pointer(const struct varlena *attr,
+ DecodedExternalToast *decoded)
[...]
+typedef enum ToastDecompressMethod
+{
+ TOAST_DECOMP_NONE = 0,
+ TOAST_DECOMP_PGLZ = 1,
+ TOAST_DECOMP_LZ4 = 2
+} ToastDecompressMethod;
+
+typedef struct DecodedExternalToast
+{
+ Oid toastrelid;
+ Oid valueid;
+ uint32 rawsize; /* Decompressed size; for future methods without tcinfo */
+ uint32 extsize; /* On-disk payload size */
+ ToastDecompressMethod method;
+ uint8 flags;
+} DecodedExternalToast;
Yeah, honestly this is a layer I have been thinking about as well as
one option, but contrary to you I have been focusing on putting that
into varatt.h, with the exception of the value being an Oid8. I think
that you have an interesting point in focusing your implementation to
be stored in the detoast part, though. I'd need to spend a bit more
time to see the result this would lead at with the larger 8-byte issue
in mind, but this is something that would come at no real cost as it
has no function pointer redirection compared to what I was first
envisioning on the other thread. That's especially true if it makes
the CompressionId business easier to mold around when adding a new
vartag.
> Why HAS_TCINFO?
> - Previously, “is compressed?” was used as a proxy for whether the external
> payload begins with tcinfo. This patch makes that explicit: HAS_TCINFO
> captures payload layout, which is distinct from whether the value is
> compressed. This separation is needed for future methods that may store
> external compressed payloads without tcinfo.
It is possible to model the on-memory data as we want. This
suggestion would be OK with some flags.
--
Michael
Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format
On Wed, Dec 31, 2025 at 04:02:24PM +0100, Dharin Shah wrote: > Let me know what you’d prefer for next steps: keep this patch as a > detoast-local refactor, or respin it to align more directly with a shared > decoded external-pointer interface in the direction of the 8‑byte work. My apologies for the rather long silence on this thread. As the next step of this project, I am going to put my hands of what you are suggesting here, and see how I can align it with the 64-bit toast value patch: https://www.postgresql.org/message-id/CAOj6k6dEVi0NvLjMLDhyrJS_n_NZO5D_OU89AO1u53u6NCDDwQ@mail.gmail.com What I am pretty sure about at this stage is that there is little love for the patch set I have sent on the other thread where I have been using pointer redirections for the TOAST function calls with callbacks (perhaps I'll be able to apply some of the renaming patches anyway, nobody would scream at me for that), at least nobody has put a +1 on it or just ignored it, so this approach feels dead to me. What you are suggesting upthread, though, is a direction I'd like to dig into and this comes down to how I can unify what you want to do for zstd and what I want to do with Oid8. Perhaps that you are right and that it is just simpler to invest on an interface in the detoast code, but I still see that there is nothing done for the logical decoding or amcheck code paths, which is something my other patch is able to deal with transparently. -- Michael