Обсуждение: Refactor code around GUC default_toast_compression
Hi all,
While hacking on the TOAST code, I have been annoyed more than once
with the following piece in toast_compression.h:
/*
* Built-in compression method ID. The toast compression header will store
* this in the first 2 bits of the raw length. These built-in compression
* method IDs are directly mapped to the built-in compression methods.
*
* Don't use these values for anything other than understanding the meaning
* of the raw bits from a varlena; in particular, if the goal is to identify
* a compression method, use the constants TOAST_PGLZ_COMPRESSION, etc.
* below. We might someday support more than 4 compression methods, but
* we can never have more than 4 values in this enum, because there are
* only 2 bits available in the places where this is stored.
*/
typedef enum ToastCompressionId
{
TOAST_PGLZ_COMPRESSION_ID = 0,
TOAST_LZ4_COMPRESSION_ID = 1,
TOAST_INVALID_COMPRESSION_ID = 2,
} ToastCompressionId;
This is due the fact that we have only two bits that can be used in
va_tcinfo or va_extinfo. While looking at the addition of a new
compression method, this was causing a mess, so I have hacked the
attached patch, that makes the addition of more compression methods
easier. The idea is centralized in toast_compression.c, with the
addition of a registry that knows about all the TOAST compression
methods and its meta-data:
- name
- GUC enum values.
- attcompression char value.
- varatt on-disk value.
This is coupled with a set of translation routines, used in other code
paths. This has also the merit to remove TOAST_INVALID_COMPRESSION_ID
from the list of GUC values, which did not really make sense to begin
with. I don't deny that the addition of a new compression method
would require more tweaks, particularly for the decompression part,
but I think that this is a nice cleanup anyway. This is added to the
next commit fest, to be considered for v20.
Thanks,
--
Michael
Вложения
Hi,
On Fri, 1 May 2026 at 13:21, Michael Paquier <michael@paquier.xyz> wrote:
Hi all,
While hacking on the TOAST code, I have been annoyed more than once
with the following piece in toast_compression.h:
/*
* Built-in compression method ID. The toast compression header will store
* this in the first 2 bits of the raw length. These built-in compression
* method IDs are directly mapped to the built-in compression methods.
*
* Don't use these values for anything other than understanding the meaning
* of the raw bits from a varlena; in particular, if the goal is to identify
* a compression method, use the constants TOAST_PGLZ_COMPRESSION, etc.
* below. We might someday support more than 4 compression methods, but
* we can never have more than 4 values in this enum, because there are
* only 2 bits available in the places where this is stored.
*/
typedef enum ToastCompressionId
{
TOAST_PGLZ_COMPRESSION_ID = 0,
TOAST_LZ4_COMPRESSION_ID = 1,
TOAST_INVALID_COMPRESSION_ID = 2,
} ToastCompressionId;
This is due the fact that we have only two bits that can be used in
va_tcinfo or va_extinfo. While looking at the addition of a new
compression method, this was causing a mess, so I have hacked the
attached patch, that makes the addition of more compression methods
easier. The idea is centralized in toast_compression.c, with the
addition of a registry that knows about all the TOAST compression
methods and its meta-data:
- name
- GUC enum values.
- attcompression char value.
- varatt on-disk value.
I looked at the patch, and the refactoring direction looks reasonable
to me. I noticed a few small things worth cleaning up (although
it is for v20, just wanted to drop it here for future)
1. The patch includes an unrelated hunk in
doc/src/sgml/ref/alter_index.sgml, adding text about `ALTER INDEX ...
ATTACH PARTITION`. That looks like an accidental carry-over from another
patch and shouldn't be there ig.
2. The comment in src/include/access/toast_compression.h describing
default_toast_compression looks stale after this change? It still says
that the GUC value is one of the char values stored in
pg_attribute.attcompression, but the patch changes it to use the new
ToastCompressionGucValue enum values instead.(Maybe I'm
doc/src/sgml/ref/alter_index.sgml, adding text about `ALTER INDEX ...
ATTACH PARTITION`. That looks like an accidental carry-over from another
patch and shouldn't be there ig.
2. The comment in src/include/access/toast_compression.h describing
default_toast_compression looks stale after this change? It still says
that the GUC value is one of the char values stored in
pg_attribute.attcompression, but the patch changes it to use the new
ToastCompressionGucValue enum values instead.(Maybe I'm
missing something)
3. One minor point: CompressionIdToMethod() seems to be added as a public
helper, but I could not find any callers in this patch. Also,pg_column_compression() still keeps its own cmid-to-name switch. If the
intent is to centralize these mappings in the registry, perhaps that code
could use the new helper path as well, otherwise the unused helper may not
be necessary yet (though it might be in future).
Thanks for the patch!
Regards,
Ayush
Regards,
Ayush
On Fri, May 01, 2026 at 02:44:07PM +0530, Ayush Tiwari wrote: > 1. The patch includes an unrelated hunk in > doc/src/sgml/ref/alter_index.sgml, adding text about `ALTER INDEX ... > ATTACH PARTITION`. That looks like an accidental carry-over from another > patch and shouldn't be there ig. Sorry about that. That feels like a rebase fart. > 2. The comment in src/include/access/toast_compression.h describing > default_toast_compression looks stale after this change? It still says > that the GUC value is one of the char values stored in > pg_attribute.attcompression, but the patch changes it to use the new > ToastCompressionGucValue enum values instead.(Maybe I'm > missing something) Nope, you are missing nothing. I was re-reading the patch and I think that we could just remove the whole paragraph. Even by doing so we lose no information. > 3. One minor point: CompressionIdToMethod() seems to be added as a public > helper, but I could not find any callers in this patch. Oops, removed. I may have used it at some point. > Also, > pg_column_compression() still keeps its own cmid-to-name switch. If the > intent is to centralize these mappings in the registry, perhaps that code > could use the new helper path as well, otherwise the unused helper may > not be necessary yet (though it might be in future). Yes, this is part of the extra tweaks that would be needed when added a new compression method. This part looks at a varlena pointer, retrieves the on-disk ID. So this is left as-is on purpose, like the direct TOAST decompress business based on varlena pointers. Attached is a v2, to keep the CI happy for as long as we can use it. -- Michael
Вложения
> On May 2, 2026, at 06:43, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Fri, May 01, 2026 at 02:44:07PM +0530, Ayush Tiwari wrote:
>> 1. The patch includes an unrelated hunk in
>> doc/src/sgml/ref/alter_index.sgml, adding text about `ALTER INDEX ...
>> ATTACH PARTITION`. That looks like an accidental carry-over from another
>> patch and shouldn't be there ig.
>
> Sorry about that. That feels like a rebase fart.
>
>> 2. The comment in src/include/access/toast_compression.h describing
>> default_toast_compression looks stale after this change? It still says
>> that the GUC value is one of the char values stored in
>> pg_attribute.attcompression, but the patch changes it to use the new
>> ToastCompressionGucValue enum values instead.(Maybe I'm
>> missing something)
>
> Nope, you are missing nothing. I was re-reading the patch and I think
> that we could just remove the whole paragraph. Even by doing so we
> lose no information.
>
>> 3. One minor point: CompressionIdToMethod() seems to be added as a public
>> helper, but I could not find any callers in this patch.
>
> Oops, removed. I may have used it at some point.
>
>> Also,
>> pg_column_compression() still keeps its own cmid-to-name switch. If the
>> intent is to centralize these mappings in the registry, perhaps that code
>> could use the new helper path as well, otherwise the unused helper may
>> not be necessary yet (though it might be in future).
>
> Yes, this is part of the extra tweaks that would be needed when added
> a new compression method. This part looks at a varlena pointer,
> retrieves the on-disk ID. So this is left as-is on purpose, like the
> direct TOAST decompress business based on varlena pointers.
>
> Attached is a v2, to keep the CI happy for as long as we can use it.
> --
> Michael
> <v2-0001-Refactor-some-code-logic-around-GUC-default_toast.patch>
Overall looks good. A few small comments:
1
```
/*
* GUC support.
- *
- * default_toast_compression is an integer for purposes of the GUC machinery,
- * but the value is one of the char values defined below, as they appear in
- * pg_attribute.attcompression, e.g. TOAST_PGLZ_COMPRESSION.
*/
extern PGDLLIMPORT int default_toast_compression;
```
Before this patch, default_toast_compression stores 'p'/'l'. With this patch, it is changed to store 0/1. Would it be
betterto rename this variable?
Otherwise, a third-party extension that relies on this variable could silently misbehave. I understand that a major
releaseis allowed to change API/ABI contracts, but a build failure would be better than silent misbehavior. Or at least
weshould document this change somewhere.
2
```
#ifdef USE_LZ4
-#define DEFAULT_TOAST_COMPRESSION TOAST_LZ4_COMPRESSION
+#define DEFAULT_TOAST_COMPRESSION TOAST_LZ4_COMPRESSION_GUC
#else
-#define DEFAULT_TOAST_COMPRESSION TOAST_PGLZ_COMPRESSION
+#define DEFAULT_TOAST_COMPRESSION TOAST_PGLZ_COMPRESSION_GUC
#endif
```
Would it better to also rename DEFAULT_TOAST_COMPRESSION to DEFAULT_TOAST_COMPRESSION_GUC.
3
```
+#define TOAST_COMPRESS_PGLZ 0
+#define TOAST_COMPRESS_LZ4 1
+#define TOAST_COMPRESS_INVALID 2
```
Now TOAST_COMPRESS_PGLZ is 0, and TOAST_PGLZ_COMPRESSION is ‘p’. When they appear together in the code, it’s hard to
guesswhich is 0 and which is ‘p’. So, would it better to rename TOAST_COMPRESS_PGLZ to TOAST_PGLZ_COMPRESS_ID, and
renameTOAST_PGLZ_COMPRESSION to TOAST_PGLZ_COMPRESS_METHOD?
4
```
/*
- * Call appropriate compression routine for the compression method.
+ * Translate the compression method char to the on-disk compression ID
+ * via the Method Registry, then dispatch to the appropriate compression
+ * routine.
*/
+ cmid = MethodToCompressionId(cmethod);
switch (cmethod)
{
case TOAST_PGLZ_COMPRESSION:
tmp = pglz_compress_datum((const varlena *) DatumGetPointer(value));
- cmid = TOAST_PGLZ_COMPRESSION_ID;
break;
case TOAST_LZ4_COMPRESSION:
tmp = lz4_compress_datum((const varlena *) DatumGetPointer(value));
- cmid = TOAST_LZ4_COMPRESSION_ID;
break;
default:
elog(ERROR, "invalid compression method %c", cmethod);
```
As the switch/default explicitly rejects invalid cmethod, I feel slightly better for readability to place "cmid =
MethodToCompressionId(cmethod);"after the switch clause.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/