Discussion: Compress prune/freeze records with Delta Frame of Reference algorithm


Compress prune/freeze records with Delta Frame of Reference algorithm

From: Evgeny Voropaev
Date:
Hi hackers,

A prune/freeze record contains four sequences of integers representing
frozen, redirected, unused, and dead tuples. Currently, these
sequences are uncompressed, which adds overhead. Eliminating empty
leading bits in these offsets can reduce the size of the record, and
applying bit-packing algorithms can reduce it even further. Reducing
WAL record size also lessens the load on disk and network, which are
heavily used for storing WAL and transmitting it to replicas. For
example, with BLCKSZ equal to 8192, the maximum tuple offset is around
580. Despite this, all offsets are stored as 16-bit integers with no
compression applied.

The proposed patch compresses the unused and dead sequences using a
customized Delta Frame of Reference (DFoR) algorithm. The frozen and
redirected sequences cannot be compressed with DFoR because the order
of their elements is significant, and DFoR does not yet support
unsorted sequences. The theoretical compression ratio for dfor_u16 can
reach up to 16.
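For readers unfamiliar with the technique, here is a minimal self-contained sketch of the Delta Frame-of-Reference idea for a sorted uint16_t sequence. The function names and layout here are invented for this illustration and are not the patch's actual dfor_u16 API: the first value serves as the frame of reference, and the remaining values are stored as deltas bit-packed at the width of the largest delta.

```c
#include <assert.h>
#include <stdint.h>

/* Bits needed to represent v (0 needs 0 bits). */
static unsigned
bits_needed(uint16_t v)
{
	unsigned	n = 0;

	while (v)
	{
		n++;
		v >>= 1;
	}
	return n;
}

/* Append the 'width' low bits of v at bit position *pos in buf. */
static void
put_bits(uint8_t *buf, unsigned *pos, uint16_t v, unsigned width)
{
	for (unsigned i = 0; i < width; i++, (*pos)++)
		if (v & (1u << i))
			buf[*pos / 8] |= (uint8_t) (1u << (*pos % 8));
}

/* Read 'width' bits starting at bit position *pos in buf. */
static uint16_t
get_bits(const uint8_t *buf, unsigned *pos, unsigned width)
{
	uint16_t	v = 0;

	for (unsigned i = 0; i < width; i++, (*pos)++)
		if (buf[*pos / 8] & (1u << (*pos % 8)))
			v |= (uint16_t) (1u << i);
	return v;
}

/*
 * Pack a sorted sequence: the caller keeps in[0] (the frame of reference)
 * and the delta width; the n-1 deltas are bit-packed into out, which must
 * be zero-initialized.  Returns the number of packed bits.
 */
static unsigned
dfor_pack(const uint16_t *in, int n, uint8_t *out, unsigned *width)
{
	uint16_t	maxdelta = 0;
	unsigned	pos = 0;

	for (int i = 1; i < n; i++)
		if ((uint16_t) (in[i] - in[i - 1]) > maxdelta)
			maxdelta = (uint16_t) (in[i] - in[i - 1]);
	*width = bits_needed(maxdelta);

	for (int i = 1; i < n; i++)
		put_bits(out, &pos, (uint16_t) (in[i] - in[i - 1]), *width);
	return pos;
}

/* Reverse of dfor_pack: rebuild the sequence from base, width and deltas. */
static void
dfor_unpack(const uint8_t *in, uint16_t base, unsigned width,
			uint16_t *out, int n)
{
	unsigned	pos = 0;

	out[0] = base;
	for (int i = 1; i < n; i++)
		out[i] = (uint16_t) (out[i - 1] + get_bits(in, &pos, width));
}
```

For a dense run of offsets the deltas are all 1, the width is a single bit, and each 16-bit offset shrinks to roughly one bit, which is where the theoretical ratio of 16 comes from.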

The new GUC wal_prune_dfor_compression controls (enables or disables)
compression for prune/freeze records.
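Assuming the patch is applied, the setting would presumably be controlled like any other boolean GUC, e.g.:

```
-- enable prune/freeze record compression (requires the patch)
ALTER SYSTEM SET wal_prune_dfor_compression = on;
SELECT pg_reload_conf();
```

Whether the GUC is reloadable or requires a restart depends on how the patch registers it.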

An integration TAP test, 052_prune_dfor_compression.pl, has been
implemented. It demonstrates an average compression ratio of at least
5 when analyzing prune/freeze records in practice.

The patch series includes:
1. 0001-Implement-vect-and-uniqsortvect-containers-and-bitpack.patch
     - Introduces the vect and uniqsortvect containers and the bitpack
       unit used by DFoR.

2. 0002-Implement-Delta-Frame-of-Reference-compression.patch
     - Implements DFoR itself.

3. 0003-Use-Delta-Frame-of-Reference-DFoR-to-compress-prune.patch
     - Applies DFoR to prune/freeze records.

Unit tests are included and integrated into the PostgreSQL build
system.

Using vect, bitpack, and DFoR for prune/freeze records is just one
example of their application. Independently of prune/freeze, these
algorithms can be effectively used by developers in other areas of
PostgreSQL.

Looking forward to discussion!

Best regards,
Evgeny Voropaev,
Tantor Labs, LLC.

Attachments

Re: Compress prune/freeze records with Delta Frame of Reference algorithm

From: Andrey Borodin
Date:

> On 7 Mar 2026, at 18:56, Evgeny Voropaev <evgeny.voropaev@tantorlabs.com> wrote:
>
> Looking forward to discussion!

Great idea and nice library for DFoR!

Prune/freeze records can be large, and reducing their size would help streaming
replication a lot. The algorithm itself (Delta Frame of Reference bit-packing) is
sound for sorted integer sequences.

Eventually we will have wholesale WAL compression, but the type of
compression proposed here is more promising for this particular case,
because it exploits knowledge about the nature of the compressed data.

The patchset is, obviously, a prototype that is worth working on
further. Perhaps the first thing to do is to fix the CI failures [0].

I do not know if we can bundle an LGPL library. Luckily it is only used
for tests; once all design questions are resolved, we can re-implement
the tests with something else.

As a minor nit: do not use stdlib assert(), use capital Assert() :)

Are we 100% sure qsort() won't allocate something anywhere? sort_template.h seems
to be allocation-free, but just in case...


Best regards, Andrey Borodin.


[0] https://github.com/x4m/postgres_g/runs/67140025428


Re: Compress prune/freeze records with Delta Frame of Reference algorithm

From: Evgeny Voropaev
Date:
Hello Andrey,

> Great idea and nice library for DFoR!

Thank you for your attention. I hope the patch proves useful.

All your proposals and recommendations have been implemented in v02.
The meson settings have also been updated to support the new code.

> Perhaps, first thing to start is to fix CI failures

With the meson configuration fixed, the tests should now pass. Looking
forward to the results.

> As a minor nit: do not use stdlib assert(), use capital Assert()

Done.

> Are we 100% sure qsort() won't allocate something anywhere?
> sort_template.h seems to be allocation-free, but just in case...

Neither the C library nor its documentation gives any guarantee about
qsort's behaviour regarding dynamic memory allocation. So, as you
proposed, qsort has been replaced with sort_template, which we trust.
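For reference, sort_template.h builds a type-specialized in-place sort from a few macros; an instantiation for offsets might look roughly like the comment below (a sketch, not the patch's actual code). The self-contained stand-in that follows demonstrates the property that matters here: sorting in place, with no heap allocation at all.

```c
#include <assert.h>
#include <stdint.h>

/*
 * In PostgreSQL, lib/sort_template.h generates a specialized sort from a
 * few macros, along these lines (sketch):
 *
 *     #define ST_SORT sort_offsets
 *     #define ST_ELEMENT_TYPE OffsetNumber
 *     #define ST_COMPARE(a, b) (*(a) < *(b) ? -1 : *(a) > *(b))
 *     #define ST_SCOPE static
 *     #define ST_DEFINE
 *     #include "lib/sort_template.h"
 *
 * The stand-in below is self-contained: a simple in-place insertion sort
 * with no malloc/palloc anywhere, unlike qsort(), whose allocation
 * behaviour is unspecified.
 */
static void
sort_offsets(uint16_t *a, int n)
{
	for (int i = 1; i < n; i++)
	{
		uint16_t	key = a[i];
		int			j = i - 1;

		/* shift larger elements right, then drop key into place */
		while (j >= 0 && a[j] > key)
		{
			a[j + 1] = a[j];
			j--;
		}
		a[j + 1] = key;
	}
}
```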

Once the tests have passed, I hope we can move further.

Best regards, Evgeny Voropaev.

Attachments

Re: Compress prune/freeze records with Delta Frame of Reference algorithm

From: Evgeny Voropaev
Date:
Rebased the patch onto 516310ed4db and added checks in vect_reserve
that the destination pointer passed to memcpy is not NULL.

Looking forward to CFBOT results.


Attachments

Re: Compress prune/freeze records with Delta Frame of Reference algorithm

From: Andres Freund
Date:
Hi,

On 2026-03-07 21:56:18 +0800, Evgeny Voropaev wrote:
> A prune/freeze record contains four sequences of integers representing
> frozen, redirected, unused, and dead tuples. Currently, these
> sequences are uncompressed, which adds overhead. Eliminating empty
> leading bits in these offsets can reduce the size of the record, and
> applying bit-packing algorithms can reduce it even further. Reducing
> WAL record size also lessens the load on disk and network, which are
> heavily used for storing WAL and transmitting it to replicas. For
> example, with BLCKSZ equal to 8192, the maximum tuple offset is around
> 580. Despite this, all offsets are stored as 16-bit integers with no
> compression applied.

I'm unconvinced that this is a serious problem - typically the overhead of WAL
volume due to pruning / freezing is due to the full page images emitted, not
the raw size of the records. Once an FPI is emitted, this doesn't matter.

What gains have you measured in somewhat realistic workloads?

Greetings,

Andres Freund



Re: Compress prune/freeze records with Delta Frame of Reference algorithm

From: Evgeny Voropaev
Date:
Hello Andres,

> I'm unconvinced that this is a serious problem - typically the overhead of WAL
> volume due to pruning / freezing is due to the full page images emitted, not
> the raw size of the records. Once an FPI is emitted, this doesn't matter.
> 
> What gains have you measured in somewhat realistic workloads?

So far, we have had no tests in any real production environment.
Moreover, the load in the new test
(recovery/t/052_prune_dfor_compression.pl) is quite contrived. However,
it demonstrates a compression ratio of more than 5, measured over the
total size of all prune/freeze records with no filtering.

The next step is to implement compression of unsorted sequences. This
will allow PostgreSQL to also compress the 'frozen' and 'redirected'
offset sequences, which should yield a greater compression ratio.

But I agree with you, Andres: we need practical results to estimate the
benefit. I hope we can test it on a real workload soon.

I also hope that, independently of its use in prune/freeze records,
DFoR itself can be used to compress sequences elsewhere in PostgreSQL.

Best regards,
Evgeny Voropaev.




Re: Compress prune/freeze records with Delta Frame of Reference algorithm

From: Evgeny Voropaev
Date:
Rebased onto 2102ebb1953.
Fixed warnings about gnu_printf in tap.c.
Fixed a wrong destination pointer in memcpy in vect_init.
Fixed an intl library linking problem for the dfor tests under FreeBSD.

Attachments

Re: Compress prune/freeze records with Delta Frame of Reference algorithm

From: Tomas Vondra
Date:
On 3/24/26 15:28, Evgeny Voropaev wrote:
> Hello Andres,
> 
>> I'm unconvinced that this is a serious problem - typically the
>> overhead of WAL
>> volume due to pruning / freezing is due to the full page images
>> emitted, not
>> the raw size of the records. Once an FPI is emitted, this doesn't matter.
>>
>> What gains have you measured in somewhat realistic workloads?
> 
> So far, we have had no tests in any real production environment.
> Moreover, the load in the new test (recovery/
> t/052_prune_dfor_compression.pl) is quite contrived. However, it
> demonstrates a compression ratio of more than 5, and it is measured for
> an overall size of all prune/freeze records with no filtering.
> 
> Further development is the implementation of compression of unsorted
> sequences. This is going to allow PostgreSQL to compress also the
> 'frozen' and the 'redirected' offset sequences, which should result in a
> greater compression ratio.
> 
> But I agree with you, Andres, we need practical results to estimate a
> profit. I wish we would test it on some real load soon.
> 
> Also I hope, independently of its usage in prune/freeze records, the
> DFoR itself might be used for compression sequences in other places of PG.
> 

IMHO Andres is right. A ~170kB patch really should present some numbers
quantifying the expected benefit. It doesn't need to be a real workload
from production, but something plausible enough. Even some basic
back-of-the-envelope calculations might be enough to show the promise.
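To illustrate, a back-of-the-envelope estimate of the kind suggested above could be as simple as the following; all numbers are invented for the example, not measurements from the patch:

```c
#include <assert.h>

/*
 * Hypothetical sizes for a prune record carrying n dead item offsets.
 * Raw: n 16-bit OffsetNumbers.  Packed (DFoR-style): a 16-bit base
 * offset, a 5-bit delta-width field, and (n - 1) deltas bit-packed at
 * delta_width bits each.
 */
static int
raw_bits(int n)
{
	return n * 16;
}

static int
packed_bits(int n, int delta_width)
{
	return 16 + 5 + (n - 1) * delta_width;
}
```

With n = 200 and 3-bit deltas this gives 3200 vs 618 bits, a ratio of about 5.2, in the same ballpark as the ratio the TAP test reports; only realistic workloads can show whether typical records actually look like this, and whether the saving survives FPIs.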

Without this, the cost/benefit is so unclear most senior contributors
will probably review something else. You need to make the case why this
is worth it.

I only quickly skimmed the patches, for exactly this reason. I'm a bit
confused why this needs to add the whole libtap thing in 0001, instead
of just testing this through the SQL interface (same as test_aio etc.).

Also, I find it somewhat unlikely we'd import a GPLv3 library like this,
even if it's just a testing framework. Even ignoring the question of
having a different license for some of the code, it'd mean maintenance
burden (maybe libtap is stable/mature, no idea). I don't see why this
would be better than "write a SQL callable test module".


regards

-- 
Tomas Vondra