Discussion: [PATCH] Add zstd compression for TOAST using extended header format

[PATCH] Add zstd compression for TOAST using extended header format

From: Dharin Shah
Date:
Hello PG Hackers,

I would like to submit a patch that implements zstd compression for TOAST data using a 20-byte TOAST pointer format, directly addressing the concerns raised in prior discussions [1][2][3].

A bit of background: in the 2022 thread [3], Robert Haas suggested:
"we had better reserve the fourth bit pattern for something extensible e.g. another byte or several to specify the actual method"

i.e. something like:
00 = PGLZ
01 = LZ4
10 = reserved for future emergencies
11 = extended header with additional type byte
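
To make the bit layout concrete, here is a minimal sketch of packing and decoding such a 2-bit id in a 32-bit va_extinfo word. The 30-bit size / 2-bit method split follows PostgreSQL's existing VARLENA_EXTSIZE_BITS convention; the enum names and helper functions are hypothetical and are not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Low 30 bits: external size; top 2 bits: compression id.
 * Same split as PostgreSQL's VARLENA_EXTSIZE_BITS. */
#define EXTSIZE_BITS 30
#define EXTSIZE_MASK ((1U << EXTSIZE_BITS) - 1)

static_assert(EXTSIZE_MASK == 0x3FFFFFFFu, "size field is 30 bits wide");

/* Hypothetical names for the four 2-bit patterns listed above. */
enum cmid2 {
    CMID2_PGLZ     = 0, /* 00 */
    CMID2_LZ4      = 1, /* 01 */
    CMID2_RESERVED = 2, /* 10 */
    CMID2_EXTENDED = 3  /* 11: actual method is stored in an extra byte */
};

/* Pack an external size and a 2-bit id into one 32-bit word. */
static inline uint32_t extinfo_pack(uint32_t extsize, enum cmid2 cmid)
{
    return (extsize & EXTSIZE_MASK) | ((uint32_t) cmid << EXTSIZE_BITS);
}

/* Extract the 2-bit id from the top of the word. */
static inline enum cmid2 extinfo_cmid(uint32_t extinfo)
{
    return (enum cmid2) (extinfo >> EXTSIZE_BITS);
}

/* Extract the 30-bit external size. */
static inline uint32_t extinfo_size(uint32_t extinfo)
{
    return extinfo & EXTSIZE_MASK;
}

/* True when the reader must consult the additional method byte. */
static inline bool extinfo_needs_method_byte(uint32_t extinfo)
{
    return extinfo_cmid(extinfo) == CMID2_EXTENDED;
}
```

With this layout the existing pglz and lz4 ids keep their current values, and only the 11 pattern sends the reader to an extra byte.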

Michael also asked whether we should have "something a bit more extensible for the design of an extensible varlena header."

This patch implements that idea.
The format:

  struct varatt_external_extended {
      int32   va_rawsize;     /* same as legacy */
      uint32  va_extinfo;     /* cmid=3 signals extended format */
      uint8   va_flags;       /* feature flags */
      uint8   va_data[3];     /* va_data[0] = compression method */
      Oid     va_valueid;     /* same as legacy */
      Oid     va_toastrelid;  /* same as legacy */
  };
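
As a quick sanity check of the sizes quoted in this message, the sketch below restates the struct next to the legacy four-field layout and asserts the 16-byte and 20-byte totals at compile time. The typedefs are stand-ins for the PostgreSQL ones, and the name varatt_external_legacy is only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL typedefs used in the struct above. */
typedef int32_t  int32;
typedef uint32_t uint32;
typedef uint8_t  uint8;
typedef uint32_t Oid;

/* Legacy pointer payload: four 4-byte fields (illustrative name). */
struct varatt_external_legacy {
    int32   va_rawsize;
    uint32  va_extinfo;
    Oid     va_valueid;
    Oid     va_toastrelid;
};

/* Extended payload as described in this message. */
struct varatt_external_extended {
    int32   va_rawsize;
    uint32  va_extinfo;
    uint8   va_flags;
    uint8   va_data[3];
    Oid     va_valueid;
    Oid     va_toastrelid;
};

/* The flags byte plus the 3 data bytes fill exactly one 4-byte slot,
 * so no padding is needed before va_valueid. */
static_assert(sizeof(struct varatt_external_legacy) == 16,
              "legacy payload is 16 bytes");
static_assert(sizeof(struct varatt_external_extended) == 20,
              "extended payload is 20 bytes");
static_assert(offsetof(struct varatt_external_extended, va_flags) == 8,
              "flags follow the two shared 4-byte fields");
static_assert(offsetof(struct varatt_external_extended, va_valueid) == 12,
              "va_valueid stays 4-byte aligned");
```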

A few notes:

- Zstd only applies to external TOAST, not inline compression. The 2-bit limit in va_tcinfo stays as-is for inline data, where pglz/lz4 work fine anyway. Zstd's wins show up on larger values.
- A GUC, use_extended_toast_header, controls whether pglz/lz4 also use the 20-byte format. It defaults to off for compatibility; enable it if you want a consistent pointer format across all methods.
- Legacy 16-byte pointers continue to work - we check the vartag to determine which format to read.
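
The vartag check in the last note could look roughly like the sketch below, which reads either payload into one format-independent view. This is an assumption-laden illustration, not code from the patch: TAG_ONDISK mirrors the value of VARTAG_ONDISK in PostgreSQL, while TAG_ONDISK_EXTENDED, struct toast_ref, and read_toast_ref are hypothetical names and values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TAG_ONDISK          18  /* value of VARTAG_ONDISK in PostgreSQL */
#define TAG_ONDISK_EXTENDED 19  /* hypothetical value for the new format */

struct ptr_legacy {
    int32_t  va_rawsize;
    uint32_t va_extinfo;
    uint32_t va_valueid;
    uint32_t va_toastrelid;
};

struct ptr_extended {
    int32_t  va_rawsize;
    uint32_t va_extinfo;
    uint8_t  va_flags;
    uint8_t  va_data[3];   /* va_data[0] = compression method */
    uint32_t va_valueid;
    uint32_t va_toastrelid;
};

/* Format-independent view handed to the rest of the code (hypothetical). */
struct toast_ref {
    int32_t  rawsize;
    uint32_t extinfo;
    uint32_t valueid;
    uint32_t toastrelid;
    bool     extended;
    uint8_t  flags;    /* 0 for legacy pointers */
    uint8_t  method;   /* va_data[0]; 0 for legacy pointers */
};

/* Decode a pointer payload according to its tag.
 * Returns false for tags this sketch does not handle. */
bool read_toast_ref(uint8_t tag, const void *payload, struct toast_ref *out)
{
    assert(payload != NULL && out != NULL);
    memset(out, 0, sizeof(*out));

    if (tag == TAG_ONDISK) {
        struct ptr_legacy p;
        memcpy(&p, payload, sizeof(p));   /* payload may be unaligned */
        out->rawsize = p.va_rawsize;
        out->extinfo = p.va_extinfo;
        out->valueid = p.va_valueid;
        out->toastrelid = p.va_toastrelid;
        return true;
    }
    if (tag == TAG_ONDISK_EXTENDED) {
        struct ptr_extended p;
        memcpy(&p, payload, sizeof(p));   /* payload may be unaligned */
        out->rawsize = p.va_rawsize;
        out->extinfo = p.va_extinfo;
        out->valueid = p.va_valueid;
        out->toastrelid = p.va_toastrelid;
        out->extended = true;
        out->flags = p.va_flags;
        out->method = p.va_data[0];
        return true;
    }
    return false;
}
```

Copying through memcpy rather than casting the payload pointer matters because TOAST pointers are stored unaligned inside tuples.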

The 4 extra bytes per pointer are negligible for typical TOAST data sizes, and they give us room to grow.

Regards,
Dharin 
Attachments

Re: [PATCH] Add zstd compression for TOAST using extended header format

From: Dharin Shah
Date:
Hello,

Apologies for the spam; I have updated the patch with the tests corrected.

Thanks,
Dharin

On Sat, Dec 13, 2025 at 6:31 PM Dharin Shah <dharinshah95@gmail.com> wrote:
Attachments