Discussion: Compress ReorderBuffer spill files using LZ4

Compress ReorderBuffer spill files using LZ4

From: Julien Tachoires
Hi,

When the content of a large transaction (size exceeding
logical_decoding_work_mem) and its sub-transactions has to be
reordered during logical decoding, all the changes are written
to disk in temporary files located in pg_replslot/<slot_name>.
Decoding very large transactions by multiple replication slots can
lead to disk space saturation and high I/O utilization.

When compiled with LZ4 support (--with-lz4), this patch enables data
compression/decompression of these temporary files. Each transaction
change that must be written to disk (ReorderBufferDiskChange) is now
compressed and encapsulated in a new structure.

Three compression strategies are implemented:

1. LZ4 streaming compression is the preferred one and works
   efficiently for small individual changes.
2. LZ4 regular compression is used when the changes are too large for
   the streaming API.
3. No compression: when compression fails, the change is stored
   uncompressed. (A sketch of this fallback logic follows.)
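
A minimal sketch of how such a change could be compressed with the
fallback behavior described in strategy 3, using the plain LZ4 API
(all struct and function names here are illustrative, not the ones
used in the patch):

  #include <string.h>
  #include <lz4.h>

  /* Hypothetical on-disk header; the (possibly compressed) payload
   * follows it. */
  typedef enum SpillCompression
  {
      SPILL_COMPRESSION_NONE = 0,
      SPILL_COMPRESSION_LZ4
  } SpillCompression;

  typedef struct SpillChunkHeader
  {
      SpillCompression method;    /* how the payload was stored */
      int         raw_size;       /* size before compression */
      int         stored_size;    /* size actually written to disk */
  } SpillChunkHeader;

  /*
   * Compress one change into 'dst' and fill the header.  The caller
   * must size 'dst' to at least LZ4_compressBound(len), which also
   * leaves room for the uncompressed fallback.
   */
  static void
  spill_compress_change(const char *src, int len, char *dst,
                        SpillChunkHeader *hdr)
  {
      int         compressed = 0;

      hdr->raw_size = len;

      if (len <= LZ4_MAX_INPUT_SIZE)
          compressed = LZ4_compress_default(src, dst, len,
                                            LZ4_compressBound(len));

      if (compressed > 0 && compressed < len)
      {
          hdr->method = SPILL_COMPRESSION_LZ4;
          hdr->stored_size = compressed;
      }
      else
      {
          /* compression failed or did not shrink: store as-is */
          hdr->method = SPILL_COMPRESSION_NONE;
          hdr->stored_size = len;
          memcpy(dst, src, len);
      }
  }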

When not using compression, the following case generates 1590 MB of
spill files:

  CREATE TABLE t (i INTEGER PRIMARY KEY, t TEXT);
  INSERT INTO t
    SELECT i, 'Hello number n°'||i::TEXT
    FROM generate_series(1, 10000000) as i;

With LZ4 compression, the same case creates 653 MB of spill files:
58.9% less disk space usage.
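
For reference, per-slot spill activity can be watched through the
pg_stat_replication_slots view:

  SELECT slot_name, spill_txns, spill_count, spill_bytes
    FROM pg_stat_replication_slots;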

Open items:

1. The spill_bytes column from pg_stat_get_replication_slot() still
returns the plain data size, not the compressed size. Should we expose
the compressed size when compression occurs?

2. Do we want a GUC to switch compression on/off?

Regards,

JT


Re: Compress ReorderBuffer spill files using LZ4

From: Amit Kapila
On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:
>
> When the content of a large transaction (size exceeding
> logical_decoding_work_mem) and its sub-transactions has to be
> reordered during logical decoding, all the changes are written
> to disk in temporary files located in pg_replslot/<slot_name>.
> Decoding very large transactions by multiple replication slots can
> lead to disk space saturation and high I/O utilization.
>

Why can't one use 'streaming' option to send changes to the client
once it reaches the configured limit of 'logical_decoding_work_mem'?
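
For reference, that option is set per subscription; a minimal example,
with made-up subscription and publication names:

  CREATE SUBSCRIPTION mysub
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION mypub
    WITH (streaming = on);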

>
> 2. Do we want a GUC to switch compression on/off?
>

It depends on the overhead of decoding. Did you try to measure the
decoding overhead of decompression when reading compressed files?

--
With Regards,
Amit Kapila.



Re: Compress ReorderBuffer spill files using LZ4

From: Dilip Kumar
On Thu, Jun 6, 2024 at 4:43 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:
> >
> > When the content of a large transaction (size exceeding
> > logical_decoding_work_mem) and its sub-transactions has to be
> > reordered during logical decoding, all the changes are written
> > to disk in temporary files located in pg_replslot/<slot_name>.
> > Decoding very large transactions by multiple replication slots can
> > lead to disk space saturation and high I/O utilization.
> >
>
> Why can't one use 'streaming' option to send changes to the client
> once it reaches the configured limit of 'logical_decoding_work_mem'?
>
> >
> > 2. Do we want a GUC to switch compression on/off?
> >
>
> It depends on the overhead of decoding. Did you try to measure the
> decoding overhead of decompression when reading compressed files?

I think it depends on the trade-off between the I/O savings from
reducing the data size and the performance cost of compressing and
decompressing the data. This balance is highly dependent on the
hardware. For example, if you have a very slow disk and a powerful
processor, compression could be advantageous. Conversely, if the disk
is very fast, the I/O savings might be minimal, and the compression
overhead could outweigh the benefits. Additionally, the effectiveness
of compression also depends on the compression ratio, which varies
with the type of data being compressed.
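
As a rough back-of-envelope model of that trade-off (an illustration
under simplified assumptions: sequential writes, single-threaded
compression), take change size $S$, disk bandwidth $B_{disk}$,
compression throughput $B_{comp}$, and compressed/raw ratio $r$:

  \[
  T_{plain} = \frac{S}{B_{disk}}, \qquad
  T_{lz4} = \frac{S}{B_{comp}} + \frac{r\,S}{B_{disk}}
  \]

Compression pays off when $T_{lz4} < T_{plain}$, i.e. when
$B_{comp} > B_{disk} / (1 - r)$. With the ~0.41 ratio reported at the
top of this thread and a 500 MB/s disk, the threshold is about
850 MB/s of compression throughput.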

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Compress ReorderBuffer spill files using LZ4

From: Julien Tachoires
On Thu, Jun 6, 2024 at 04:13, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:
> >
> > When the content of a large transaction (size exceeding
> > logical_decoding_work_mem) and its sub-transactions has to be
> > reordered during logical decoding, all the changes are written
> > to disk in temporary files located in pg_replslot/<slot_name>.
> > Decoding very large transactions by multiple replication slots can
> > lead to disk space saturation and high I/O utilization.
> >
>
> Why can't one use 'streaming' option to send changes to the client
> once it reaches the configured limit of 'logical_decoding_work_mem'?

That's right: setting the subscription's 'streaming' option to 'on'
moves the problem from the publisher to the subscribers. This patch
tries to improve the default situation, when 'streaming' is set to
'off'.

> > 2. Do we want a GUC to switch compression on/off?
> >
>
> It depends on the overhead of decoding. Did you try to measure the
> decoding overhead of decompression when reading compressed files?

Quick benchmarking on my laptop shows about 1% overhead.

Table DDL:
CREATE TABLE t (i INTEGER PRIMARY KEY, t TEXT);

Data generated with:
INSERT INTO t SELECT i, 'Text number n°'||i::TEXT FROM
generate_series(1, 10000000) as i;

Restoration duration measured using timestamps of log messages:
"DEBUG:  restored XXXX/YYYY changes from disk"

HEAD: 25.54s, 25.94s, 25.516s, 26.267s, 26.11s / avg=25.874s
Patch: 26.872s, 26.311s, 25.753s, 26.003s, 25.843s / avg=26.156s

Regards,

JT



Re: Compress ReorderBuffer spill files using LZ4

From: Amit Kapila
On Thu, Jun 6, 2024 at 6:22 PM Julien Tachoires <julmon@gmail.com> wrote:
>
> On Thu, Jun 6, 2024 at 04:13, Amit Kapila <amit.kapila16@gmail.com> wrote:
> >
> > On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:
> > >
> > > When the content of a large transaction (size exceeding
> > > logical_decoding_work_mem) and its sub-transactions has to be
> > > reordered during logical decoding, all the changes are written
> > > to disk in temporary files located in pg_replslot/<slot_name>.
> > > Decoding very large transactions by multiple replication slots can
> > > lead to disk space saturation and high I/O utilization.
> > >
> >
> > Why can't one use 'streaming' option to send changes to the client
> > once it reaches the configured limit of 'logical_decoding_work_mem'?
>
> That's right, setting subscription's option 'streaming' to 'on' moves
> the problem away from the publisher to the subscribers. This patch
> tries to improve the default situation when 'streaming' is set to
> 'off'.
>

Can we think of changing the default to 'parallel'? BTW, it would be
better to use 'parallel' for the 'streaming' option if the workload
has large transactions. Is there a reason to keep the current default
in this case?

> > > 2. Do we want a GUC to switch compression on/off?
> > >
> >
> > It depends on the overhead of decoding. Did you try to measure the
> > decoding overhead of decompression when reading compressed files?
>
> Quick benchmarking executed on my laptop shows 1% overhead.
>

Thanks. We probably need different types of data (say, random data in
a bytea column, etc.) for this.
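
For instance, a nearly incompressible data set could be generated with
pgcrypto (a sketch; the table name and sizes are arbitrary):

  CREATE EXTENSION IF NOT EXISTS pgcrypto;
  CREATE TABLE t_random (i INTEGER PRIMARY KEY, b BYTEA);
  INSERT INTO t_random
    SELECT i, gen_random_bytes(100)
    FROM generate_series(1, 1000000) AS i;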

--
With Regards,
Amit Kapila.



Re: Compress ReorderBuffer spill files using LZ4

From: Alvaro Herrera
On 2024-Jun-06, Amit Kapila wrote:

> On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:
> >
> > When the content of a large transaction (size exceeding
> > logical_decoding_work_mem) and its sub-transactions has to be
> > reordered during logical decoding, all the changes are written
> > to disk in temporary files located in pg_replslot/<slot_name>.
> > Decoding very large transactions by multiple replication slots can
> > lead to disk space saturation and high I/O utilization.

I like the general idea of compressing the output of logical decoding.
It's not so clear to me that we only want to do so for spilling to disk;
for instance, if the two nodes communicate over a slow network, it may
even be beneficial to compress when streaming, so to this question:

> Why can't one use 'streaming' option to send changes to the client
> once it reaches the configured limit of 'logical_decoding_work_mem'?

I would say that streaming doesn't necessarily have to mean we don't
want compression, because for some users it might be beneficial.

I think a GUC would be a good idea.  Also, what if for whatever reason
you want a different compression algorithm or different compression
parameters?  Looking at the existing compression UI we offer in
pg_basebackup, perhaps you could add something like this:

compress_logical_decoding = none
compress_logical_decoding = lz4:42
compress_logical_decoding = spill-zstd:99

"none" says to never use compression (perhaps should be the default),
"lz4:42" says to use lz4 with parameters 42 on both spilling and
streaming, and "spill-zstd:99" says to use Zstd with parameter 99 but
only for spilling to disk.

(I don't mean to say that you should implement Zstd compression with
this patch, only that you should choose the implementation so that
adding Zstd support (or whatever) later is just a matter of adding some
branches here and there.  With the current #ifdef you propose, it's hard
to do that.  Maybe separate the parts that depend on the specific
algorithm to algorithm-agnostic functions.)
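
One possible shape for that separation, sketched as a simple callback
table (all names hypothetical, not a concrete proposal):

  #include <lz4.h>

  typedef struct SpillCompressor
  {
      const char *name;
      /* both return bytes written to dst, or a value <= 0 on failure */
      int  (*compress) (const char *src, int srclen,
                        char *dst, int dstcap);
      int  (*decompress) (const char *src, int srclen,
                          char *dst, int dstcap);
  } SpillCompressor;

  #ifdef USE_LZ4
  static int
  lz4_spill_compress(const char *src, int srclen, char *dst, int dstcap)
  {
      return LZ4_compress_default(src, dst, srclen, dstcap);
  }

  static int
  lz4_spill_decompress(const char *src, int srclen, char *dst, int dstcap)
  {
      return LZ4_decompress_safe(src, dst, srclen, dstcap);
  }

  static const SpillCompressor lz4_compressor = {
      "lz4", lz4_spill_compress, lz4_spill_decompress
  };
  #endif

Adding Zstd (or keeping pglz) would then just mean providing another
SpillCompressor instance, with no changes to the callers.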

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/



Re: Compress ReorderBuffer spill files using LZ4

From: Julien Tachoires
On Thu, Jun 6, 2024 at 06:40, Amit Kapila <amit.kapila16@gmail.com> wrote:
>
> On Thu, Jun 6, 2024 at 6:22 PM Julien Tachoires <julmon@gmail.com> wrote:
> >
> > On Thu, Jun 6, 2024 at 04:13, Amit Kapila <amit.kapila16@gmail.com> wrote:
> > >
> > > On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:
> > > >
> > > > When the content of a large transaction (size exceeding
> > > > logical_decoding_work_mem) and its sub-transactions has to be
> > > > reordered during logical decoding, all the changes are written
> > > > to disk in temporary files located in pg_replslot/<slot_name>.
> > > > Decoding very large transactions by multiple replication slots can
> > > > lead to disk space saturation and high I/O utilization.
> > > >
> > >
> > > Why can't one use 'streaming' option to send changes to the client
> > > once it reaches the configured limit of 'logical_decoding_work_mem'?
> >
> > That's right, setting subscription's option 'streaming' to 'on' moves
> > the problem away from the publisher to the subscribers. This patch
> > tries to improve the default situation when 'streaming' is set to
> > 'off'.
> >
>
> Can we think of changing the default to 'parallel'? BTW, it would be
> better to use 'parallel' for the 'streaming' option, if the workload
> has large transactions. Is there a reason to use a default value in
> this case?

You're certainly right: if using the streaming API helps avoid bad
situations and has no downside, it could be used by default.

> > > > 2. Do we want a GUC to switch compression on/off?
> > > >
> > >
> > > It depends on the overhead of decoding. Did you try to measure the
> > > decoding overhead of decompression when reading compressed files?
> >
> > Quick benchmarking executed on my laptop shows 1% overhead.
> >
>
> Thanks. We probably need different types of data (say random data in
> bytea column, etc.) for this.

Yes, good idea; I will run new tests along those lines.

Thank you!

Regards,

JT



Re: Compress ReorderBuffer spill files using LZ4

From: Julien Tachoires
On Thu, Jun 6, 2024 at 07:24, Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2024-Jun-06, Amit Kapila wrote:
>
> > On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:
> > >
> > > When the content of a large transaction (size exceeding
> > > logical_decoding_work_mem) and its sub-transactions has to be
> > > reordered during logical decoding, all the changes are written
> > > to disk in temporary files located in pg_replslot/<slot_name>.
> > > Decoding very large transactions by multiple replication slots can
> > > lead to disk space saturation and high I/O utilization.
>
> I like the general idea of compressing the output of logical decoding.
> It's not so clear to me that we only want to do so for spilling to disk;
> for instance, if the two nodes communicate over a slow network, it may
> even be beneficial to compress when streaming, so to this question:
>
> > Why can't one use 'streaming' option to send changes to the client
> > once it reaches the configured limit of 'logical_decoding_work_mem'?
>
> I would say that streaming doesn't necessarily have to mean we don't
> want compression, because for some users it might be beneficial.

Interesting idea; I will evaluate how to compress/decompress the data
in transit during streaming and how good the compression ratio would be.

> I think a GUC would be a good idea.  Also, what if for whatever reason
> you want a different compression algorithm or different compression
> parameters?  Looking at the existing compression UI we offer in
> pg_basebackup, perhaps you could add something like this:
>
> compress_logical_decoding = none
> compress_logical_decoding = lz4:42
> compress_logical_decoding = spill-zstd:99
>
> "none" says to never use compression (perhaps should be the default),
> "lz4:42" says to use lz4 with parameters 42 on both spilling and
> streaming, and "spill-zstd:99" says to use Zstd with parameter 99 but
> only for spilling to disk.

I agree: if the server was compiled with support for multiple
compression libraries, users should be able to choose which one they
want to use.

> (I don't mean to say that you should implement Zstd compression with
> this patch, only that you should choose the implementation so that
> adding Zstd support (or whatever) later is just a matter of adding some
> branches here and there.  With the current #ifdef you propose, it's hard
> to do that.  Maybe separate the parts that depend on the specific
> algorithm to algorithm-agnostic functions.)

Makes sense; I will rework the patch along those lines.

Thank you!

Regards,

JT



Re: Compress ReorderBuffer spill files using LZ4

From: Dilip Kumar
On Thu, Jun 6, 2024 at 7:54 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2024-Jun-06, Amit Kapila wrote:
>
> > On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:
> > >
> > > When the content of a large transaction (size exceeding
> > > logical_decoding_work_mem) and its sub-transactions has to be
> > > reordered during logical decoding, all the changes are written
> > > to disk in temporary files located in pg_replslot/<slot_name>.
> > > Decoding very large transactions by multiple replication slots can
> > > lead to disk space saturation and high I/O utilization.
>
> I like the general idea of compressing the output of logical decoding.
> It's not so clear to me that we only want to do so for spilling to disk;
> for instance, if the two nodes communicate over a slow network, it may
> even be beneficial to compress when streaming, so to this question:
>
> > Why can't one use 'streaming' option to send changes to the client
> > once it reaches the configured limit of 'logical_decoding_work_mem'?
>
> I would say that streaming doesn't necessarily have to mean we don't
> want compression, because for some users it might be beneficial.

+1

> I think a GUC would be a good idea.  Also, what if for whatever reason
> you want a different compression algorithm or different compression
> parameters?  Looking at the existing compression UI we offer in
> pg_basebackup, perhaps you could add something like this:
>
> compress_logical_decoding = none
> compress_logical_decoding = lz4:42
> compress_logical_decoding = spill-zstd:99
>
> "none" says to never use compression (perhaps should be the default),
> "lz4:42" says to use lz4 with parameters 42 on both spilling and
> streaming, and "spill-zstd:99" says to use Zstd with parameter 99 but
> only for spilling to disk.
>

I think the compression option should be supported at the CREATE
SUBSCRIPTION level instead of being controlled by a GUC. This way, we
can decide on compression for each subscription individually rather
than applying it to all subscribers. It makes more sense for the
subscriber to control this, especially when we are planning to
compress the data sent downstream.
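
Something along these lines, presumably (the 'compression' option
below is hypothetical, purely to illustrate the idea):

  CREATE SUBSCRIPTION mysub
    CONNECTION 'host=publisher dbname=postgres'
    PUBLICATION mypub
    WITH (streaming = parallel, compression = 'lz4');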

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Compress ReorderBuffer spill files using LZ4

From: Alvaro Herrera
On 2024-Jun-07, Dilip Kumar wrote:

> I think the compression option should be supported at the CREATE
> SUBSCRIPTION level instead of being controlled by a GUC. This way, we
> can decide on compression for each subscription individually rather
> than applying it to all subscribers. It makes more sense for the
> subscriber to control this, especially when we are planning to
> compress the data sent downstream.

True.  (I think we have some options that are in GUCs for the general
behavior and can be overridden by per-subscription options for specific
tailoring; would that make sense here?  I think it does, considering
that what we mostly want is to save disk space in the publisher when
spilling to disk.)

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"I can't go to a restaurant and order food because I keep looking at the
fonts on the menu.  Five minutes later I realize that it's also talking
about food" (Donald Knuth)



Re: Compress ReorderBuffer spill files using LZ4

From: Dilip Kumar
On Fri, Jun 7, 2024 at 2:39 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2024-Jun-07, Dilip Kumar wrote:
>
> > I think the compression option should be supported at the CREATE
> > SUBSCRIPTION level instead of being controlled by a GUC. This way, we
> > can decide on compression for each subscription individually rather
> > than applying it to all subscribers. It makes more sense for the
> > subscriber to control this, especially when we are planning to
> > compress the data sent downstream.
>
> True.  (I think we have some options that are in GUCs for the general
> behavior and can be overridden by per-subscription options for specific
> tailoring; would that make sense here?  I think it does, considering
> that what we mostly want is to save disk space in the publisher when
> spilling to disk.)

Yeah, that makes sense.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: Compress ReorderBuffer spill files using LZ4

From: Amit Kapila
On Thu, Jun 6, 2024 at 7:54 PM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
>
> On 2024-Jun-06, Amit Kapila wrote:
>
> > On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:
> > >
> > > When the content of a large transaction (size exceeding
> > > logical_decoding_work_mem) and its sub-transactions has to be
> > > reordered during logical decoding, all the changes are written
> > > to disk in temporary files located in pg_replslot/<slot_name>.
> > > Decoding very large transactions by multiple replication slots can
> > > lead to disk space saturation and high I/O utilization.
>
> I like the general idea of compressing the output of logical decoding.
> It's not so clear to me that we only want to do so for spilling to disk;
> for instance, if the two nodes communicate over a slow network, it may
> even be beneficial to compress when streaming, so to this question:
>
> > Why can't one use 'streaming' option to send changes to the client
> > once it reaches the configured limit of 'logical_decoding_work_mem'?
>
> I would say that streaming doesn't necessarily have to mean we don't
> want compression, because for some users it might be beneficial.
>

Fair enough. It would be an interesting feature if we see wider
usefulness of compression/decompression of logical changes. For
example, if this can improve the performance of applying large
transactions (aka reduce the apply lag for them) even when the
'streaming' option is 'parallel' then it would have a much wider
impact.

--
With Regards,
Amit Kapila.



Re: Compress ReorderBuffer spill files using LZ4

От
Amit Kapila
Дата:
On Fri, Jun 7, 2024 at 2:08 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> I think the compression option should be supported at the CREATE
> SUBSCRIPTION level instead of being controlled by a GUC. This way, we
> can decide on compression for each subscription individually rather
> than applying it to all subscribers. It makes more sense for the
> subscriber to control this, especially when we are planning to
> compress the data sent downstream.
>

Yes, that makes sense. However, we then need to provide this option
via SQL APIs as well for other plugins.

--
With Regards,
Amit Kapila.



Re: Compress ReorderBuffer spill files using LZ4

From: Tomas Vondra
On 6/6/24 16:24, Alvaro Herrera wrote:
> On 2024-Jun-06, Amit Kapila wrote:
> 
>> On Thu, Jun 6, 2024 at 4:28 PM Julien Tachoires <julmon@gmail.com> wrote:
>>>
>>> When the content of a large transaction (size exceeding
>>> logical_decoding_work_mem) and its sub-transactions has to be
>>> reordered during logical decoding, all the changes are written
>>> to disk in temporary files located in pg_replslot/<slot_name>.
>>> Decoding very large transactions by multiple replication slots can
>>> lead to disk space saturation and high I/O utilization.
> 
> I like the general idea of compressing the output of logical decoding.
> It's not so clear to me that we only want to do so for spilling to disk;
> for instance, if the two nodes communicate over a slow network, it may
> even be beneficial to compress when streaming, so to this question:
> 
>> Why can't one use 'streaming' option to send changes to the client
>> once it reaches the configured limit of 'logical_decoding_work_mem'?
> 
> I would say that streaming doesn't necessarily have to mean we don't
> want compression, because for some users it might be beneficial.
> 
> I think a GUC would be a good idea.  Also, what if for whatever reason
> you want a different compression algorithm or different compression
> parameters?  Looking at the existing compression UI we offer in
> pg_basebackup, perhaps you could add something like this:
> 
> compress_logical_decoding = none
> compress_logical_decoding = lz4:42
> compress_logical_decoding = spill-zstd:99
> 
> "none" says to never use compression (perhaps should be the default),
> "lz4:42" says to use lz4 with parameters 42 on both spilling and
> streaming, and "spill-zstd:99" says to use Zstd with parameter 99 but
> only for spilling to disk.
> 
> (I don't mean to say that you should implement Zstd compression with
> this patch, only that you should choose the implementation so that
> adding Zstd support (or whatever) later is just a matter of adding some
> branches here and there.  With the current #ifdef you propose, it's hard
> to do that.  Maybe separate the parts that depend on the specific
> algorithm to algorithm-agnostic functions.)
> 

I haven't been following the "libpq compression" thread, but wouldn't
that also do compression for the streaming case? That was my assumption,
at least, and it seems like the right way - we probably don't want to
patch every place that sends data over the network independently, right?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Compress ReorderBuffer spill files using LZ4

From: Tomas Vondra
On 6/6/24 12:58, Julien Tachoires wrote:
> ...
>
> When compiled with LZ4 support (--with-lz4), this patch enables data
> compression/decompression of these temporary files. Each transaction
> change that must be written to disk (ReorderBufferDiskChange) is now
> compressed and encapsulated in a new structure.
> 

I'm a bit confused, but why tie this to having lz4? Why shouldn't this
be supported even for pglz, or whatever algorithms we add in the future?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Compress ReorderBuffer spill files using LZ4

From: Julien Tachoires
On Fri, Jun 7, 2024 at 05:59, Tomas Vondra
<tomas.vondra@enterprisedb.com> wrote:
>
> On 6/6/24 12:58, Julien Tachoires wrote:
> > ...
> >
> > When compiled with LZ4 support (--with-lz4), this patch enables data
> > compression/decompression of these temporary files. Each transaction
> > change that must be written to disk (ReorderBufferDiskChange) is now
> > compressed and encapsulated in a new structure.
> >
>
> I'm a bit confused, but why tie this to having lz4? Why shouldn't this
> be supported even for pglz, or whatever algorithms we add in the future?

That's right; I'm reworking the patch along those lines.

Regards,

JT