Обсуждение: Refactoring of compression options in pg_basebackup

Поиск
Список
Период
Сортировка

Refactoring of compression options in pg_basebackup

От
Michael Paquier
Дата:
Hi all,
(Added Georgios in CC.)

When working on the support of LZ4 for pg_receivewal, walmethods.c has
gained an extra parameter for the compression method.  It gets used on
the DirectoryMethod instead of the compression level to decide which
type of compression is used.  One thing that I left out during this
previous work is that the TarMethod also gained knowledge of this
compression method, but we still use the compression level to check if
tars should be compressed or not.

This is wrong on multiple aspects.  First, this is not consistent with
the directory method, making walmethods.c harder to figure out.
Second, this is not extensible if we want to introduce more
compression methods in pg_basebackup, like LZ4.  This reflects on the
options used by pg_receivewal and pg_basebackup, that are not
inconsistent as well.

The attached patch refactors the code of pg_basebackup and the
TarMethod of walmethods.c to use the compression method where it
should, splitting entirely the logic related the compression level.

This is one step toward the introduction of LZ4 in pg_basebackup, but
this refactoring is worth doing on its own, hence a separate thread to
deal with this problem first.  The options of pg_basebackup are
reworked to be consistent with pg_receivewal, as follows:
- --compress ranges now from 1 to 9, instead of 0 to 9.
- --compression-method={none,gzip} is added, the default is none, same
as HEAD.
- --gzip/-z has the same meaning as before, being just a synonym of
--compression-method=gzip with the default compression level of ZLIB
assigned if there is no --compress.

One more thing that I have noticed while hacking this stuff is that we
have no regression tests for gzip with pg_basebackup, so I have added
some that are skipped when not compiling the code with ZLIB.

Opinions?
--
Michael

Вложения

Re: Refactoring of compression options in pg_basebackup

От
Robert Haas
Дата:
On Sat, Dec 18, 2021 at 6:29 AM Michael Paquier <michael@paquier.xyz> wrote:
> This is one step toward the introduction of LZ4 in pg_basebackup, but
> this refactoring is worth doing on its own, hence a separate thread to
> deal with this problem first.  The options of pg_basebackup are
> reworked to be consistent with pg_receivewal, as follows:
> - --compress ranges now from 1 to 9, instead of 0 to 9.
> - --compression-method={none,gzip} is added, the default is none, same
> as HEAD.
> - --gzip/-z has the same meaning as before, being just a synonym of
> --compression-method=gzip with the default compression level of ZLIB
> assigned if there is no --compress.

One thing we should keep in mind is that I'm also working on adding
server-side compression, initially with gzip, but Jeevan Ladhe has
posted patches to extend that to LZ4. So however we structure the
options they should take that into account. Currently that patch set
adds --server-compression={none,gzip,gzipN} where N from 1 to 9, but
perhaps it should be done differently. Not sure.

> One more thing that I have noticed while hacking this stuff is that we
> have no regression tests for gzip with pg_basebackup, so I have added
> some that are skipped when not compiling the code with ZLIB.

If they don't decompress the backup and run pg_verifybackup on it then
I'm not sure how much they help. Yet, I don't know how to do that
portably.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Refactoring of compression options in pg_basebackup

От
gkokolatos@pm.me
Дата:
Hi,

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
On Saturday, December 18th, 2021 at 12:29 PM, Michael Paquier
<michael@paquier.xyz> wrote:
>Hi all,
>(Added Georgios in CC.)

thank you for the shout out.

>When working on the support of LZ4 for pg_receivewal, walmethods.c has
>gained an extra parameter for the compression method.  It gets used on
>the DirectoryMethod instead of the compression level to decide which
>type of compression is used.  One thing that I left out during this
>previous work is that the TarMethod also gained knowledge of this
>compression method, but we still use the compression level to check if
>tars should be compressed or not.
>
>This is wrong on multiple aspects.  First, this is not consistent with
>the directory method, making walmethods.c harder to figure out.
>Second, this is not extensible if we want to introduce more
>compression methods in pg_basebackup, like LZ4.  This reflects on the
>options used by pg_receivewal and pg_basebackup, that are not
>inconsistent as well.

Agreed with all the above.

>The attached patch refactors the code of pg_basebackup and the
>TarMethod of walmethods.c to use the compression method where it
>should, splitting entirely the logic related the compression level.

Thanks.

>This is one step toward the introduction of LZ4 in pg_basebackup, but
>this refactoring is worth doing on its own, hence a separate thread to
>deal with this problem first.  The options of pg_basebackup are
>reworked to be consistent with pg_receivewal, as follows:
>- --compress ranges now from 1 to 9, instead of 0 to 9.
>- --compression-method={none,gzip} is added, the default is none, same
>as HEAD.
>- --gzip/-z has the same meaning as before, being just a synonym of
>--compression-method=gzip with the default compression level of ZLIB
>assigned if there is no --compress.

Indeed this is consistent with pg_receivewal. It gets my +1.

>One more thing that I have noticed while hacking this stuff is that we
>have no regression tests for gzip with pg_basebackup, so I have added
>some that are skipped when not compiling the code with ZLIB.

As far as the code is concerned, I have a minor nitpick.

+       if (compression_method == COMPRESSION_NONE)
+           streamer = bbstreamer_plain_writer_new(archive_filename,
+                                                  archive_file);
 #ifdef HAVE_LIBZ
-       if (compresslevel != 0)
+       else if (compression_method == COMPRESSION_GZIP)
        {
            strlcat(archive_filename, ".gz", sizeof(archive_filename));
            streamer = bbstreamer_gzip_writer_new(archive_filename,
                                                  archive_file,
                                                  compresslevel);
        }
-       else
 #endif
-           streamer = bbstreamer_plain_writer_new(archive_filename,
-                                                  archive_file);
-
+       else
+       {
+           Assert(false);      /* not reachable */
+       }

The above block moves the initialization of 'streamer' within two conditional
blocks. Despite this being correct, it is possible that some compilers will
complain for lack of initialization of 'streamer' when it is eventually used a
bit further ahead in:
        if (must_parse_archive)
            streamer = bbstreamer_tar_archiver_new(streamer);

I propose to initialize streamer to NULL when declared at the top of
CreateBackupStreamer().

As far as the tests are concerned, I think that 2 too many tests are skipped
when HAVE_LIBZ is not defined to be 1. The patch reads:

+Check ZLIB compression if available.
+SKIP:
+{
+   skip "postgres was not built with ZLIB support", 5
+     if (!check_pg_config("#define HAVE_LIBZ 1"));
+
+   $node->command_ok(
+       [
+           'pg_basebackup',        '-D',
+           "$tempdir/backup_gzip", '--compression-method',
+           'gzip', '--compress', '1', '--no-sync', '--format', 't'
+       ],
+       'pg_basebackup with --compress and --compression-method=gzip');
+
+   # Verify that the stored files are generated with their expected
+   # names.
+   my @zlib_files = glob "$tempdir/backup_gzip/*.tar.gz";
+   is(scalar(@zlib_files), 2,
+       "two files created with gzip (base.tar.gz and pg_wal.tar.gz)");
+
+   # Check the integrity of the files generated.
+   my $gzip = $ENV{GZIP_PROGRAM};
+   skip "program gzip is not found in your system", 1
+     if ( !defined $gzip
+       || $gzip eq ''
+       || system_log($gzip, '--version') != 0);
+
+   my $gzip_is_valid = system_log($gzip, '--test', @zlib_files);
+   is($gzip_is_valid, 0, "gzip verified the integrity of compressed data");
+   rmtree("$tempdir/backup_gzip");
+}

You can see that after the check_pg_config() test, only 3 tests follow,
namely:
 *  $node->command_ok()
 *  is(scalar(), ...)
 *  is($gzip_is_valid, ...)

>Opinions?

Other than the minor issues above, I think this is a solid improvement. +1

>--
>Michael

Cheers,
//Georgios



Re: Refactoring of compression options in pg_basebackup

От
Michael Paquier
Дата:
(Added Jeevan in CC, for awareness)

On Mon, Jan 03, 2022 at 03:35:57PM +0000, gkokolatos@pm.me wrote:
> I propose to initialize streamer to NULL when declared at the top of
> CreateBackupStreamer().

Yes, that may be noisy.  Done this way.

> You can see that after the check_pg_config() test, only 3 tests follow,
> namely:
>  *  $node->command_ok()
>  *  is(scalar(), ...)
>  *  is($gzip_is_valid, ...)

Indeed.  The CF bot was complaining about that, actually.

Thinking more about this stuff, pg_basebackup --compress is an option
that exists already for a couple of years, and that's independent of
the backend-side compression that Robert and Jeevan are working on, so
I'd like to move on this code cleanup.  We can always figure out the
LZ4 part for pg_basebackup after, if necessary.

Attached is an updated patch.  The CF bot should improve with that.
--
Michael

Вложения

Re: Refactoring of compression options in pg_basebackup

От
gkokolatos@pm.me
Дата:
On Wednesday, January 5th, 2022 at 9:00 AM, Michael Paquier <michael@paquier.xyz> wrote:

> On Mon, Jan 03, 2022 at 03:35:57PM +0000, gkokolatos@pm.me wrote:
> > I propose to initialize streamer to NULL when declared at the top of
> > CreateBackupStreamer().
>
> Yes, that may be noisy.  Done this way.

Great.

> > You can see that after the check_pg_config() test, only 3 tests follow,
> > namely:
> >  *  $node->command_ok()
> >  *  is(scalar(), ...)
> >  *  is($gzip_is_valid, ...)
>
> Indeed.  The CF bot was complaining about that, actually.

Great.

> Thinking more about this stuff, pg_basebackup --compress is an option
> that exists already for a couple of years, and that's independent of
> the backend-side compression that Robert and Jeevan are working on, so
> I'd like to move on this code cleanup.  We can always figure out the
> LZ4 part for pg_basebackup after, if necessary.

I agree that the cleanup in itself is helpful. It feels awkward to have two
utilities under the same path, with distinct options for the same
functionality.

When the backend-side compression is completed, were there really be a need for
client-side compression? If yes, then it seems logical to have distinct options
for them and this cleanup makes sense. If not, then it seems logical to maintain
the current options list and 'simply' change the internals of the code, and this
cleanup makes sense.

> Attached is an updated patch.  The CF bot should improve with that.

+1

> --
> Michael

Cheers,
//Georgios



Re: Refactoring of compression options in pg_basebackup

От
Robert Haas
Дата:
On Mon, Dec 20, 2021 at 7:43 PM Michael Paquier <michael@paquier.xyz> wrote:
> Yeah, consistency would be good.  For the client-side compression of
> LZ4, we have shaped things around the existing --compress option, and
> there is 6f164e6 that offers an API to parse that at option-level,
> meaning less custom error strings.  I'd like to think that splitting
> the compression level and the compression method is still the right
> choice, except if --server-compression combined with a client-side
> compression is a legal combination.  This would not really make sense,
> though, no?  So we'd better block this possibility from the start?

Right. It's blocked right now, but Tushar noticed on the other thread
that the error message isn't as good as it could be, so I'll improve
that a bit. Still the issue wasn't overlooked.

> > If they don't decompress the backup and run pg_verifybackup on it then
> > I'm not sure how much they help. Yet, I don't know how to do that
> > portably.
>
> They help in checking that an environment does not use a buggy set of
> GZIP, at least.  Using pg_verifybackup on a base backup with
> --format='t' could be tweaked with $ENV{TAR} for the tarballs
> generation, for example, as we do in some other tests.  Option sets
> like "xvf" or "zxvf" should be rather portable across the buildfarm,
> no?  I'd like to think that this is not a requirement for adding
> checks in the compression path, as a first step, though, but I agree
> that it could be extended more.

Oh, well, if we have a working tar available, then it's not so bad. I
was thinking we couldn't really count on that, especially on Windows.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Refactoring of compression options in pg_basebackup

От
Tom Lane
Дата:
Robert Haas <robertmhaas@gmail.com> writes:
> Oh, well, if we have a working tar available, then it's not so bad. I
> was thinking we couldn't really count on that, especially on Windows.

I think the existing precedent is to skip the test if tar isn't there,
cf pg_basebackup/t/010_pg_basebackup.pl.  But certainly the majority of
buildfarm animals have it.

            regards, tom lane



Re: Refactoring of compression options in pg_basebackup

От
Robert Haas
Дата:
On Wed, Jan 5, 2022 at 4:17 AM <gkokolatos@pm.me> wrote:
> When the backend-side compression is completed, were there really be a need for
> client-side compression? If yes, then it seems logical to have distinct options
> for them and this cleanup makes sense. If not, then it seems logical to maintain
> the current options list and 'simply' change the internals of the code, and this
> cleanup makes sense.

I think we're going to want to offer both options. We can't know
whether the user prefers to consume CPU cycles on the server or on the
client. Compressing on the server has the advantage of potentially
saving transfer bandwidth, but the server is also often the busiest
part of the whole system, and users are often keen to offload as much
work as possible.

Given that, I'd like us to be thinking about what the full set of
options looks like once we have (1) compression on either the server
or the client and (2) multiple compression algorithms and (3) multiple
compression levels. Personally, I don't really like the decision made
by this proposed patch. In this patch's view of the world, -Z is a way
of providing the compression level for whatever compression algorithm
you happen to have selected, but I think of -Z as being the upper-case
version of -z which I think of as selecting specifically gzip. It's
not particularly intuitive to me that in a command like pg_basebackup
--compress=<something>, <something> is a compression level rather than
an algorithm. So what I would propose is probably something like:

pg_basebackup --compress=ALGORITHM [--compression-level=NUMBER]
pg_basebackup --server-compress=ALGORITHM [--compression-level=NUMBER]

And then make -z short for --compress=gzip and -Z <n> short for
--compress=gzip --compression-level=<n>. That would be a
backward-incompatible change to the definition of --compress, but as
long as -Z <n> works the same as today, I don't think many people will
notice. If we like, we can notice if the argument to --compress is an
integer and suggest using either -Z or --compress=gzip
--compression-level=<n> instead.

In the proposed patch, you end up with pg_basebackup
--compression-method=lz4 -Z2 meaning compression with lz4 level 2. I
find that quite odd, though as with all such things, opinions may
vary. In my proposal, that would be an error, because it would be
equivalent to --compress=lz4 --compress=gzip --compression-level=2,
and would thus involve conflicting compression method specifications.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Refactoring of compression options in pg_basebackup

От
Michael Paquier
Дата:
On Wed, Jan 05, 2022 at 10:33:38AM -0500, Robert Haas wrote:
> I think we're going to want to offer both options. We can't know
> whether the user prefers to consume CPU cycles on the server or on the
> client. Compressing on the server has the advantage of potentially
> saving transfer bandwidth, but the server is also often the busiest
> part of the whole system, and users are often keen to offload as much
> work as possible.

Yeah.  There are cases for both.  I just got to wonder whether it
makes sense to allow both server-side and client-side compression to
be used at the same time.  That would be a rather strange case, but
well, with the correct set of options that could be possible.

> Given that, I'd like us to be thinking about what the full set of
> options looks like once we have (1) compression on either the server
> or the client and (2) multiple compression algorithms and (3) multiple
> compression levels. Personally, I don't really like the decision made
> by this proposed patch. In this patch's view of the world, -Z is a way
> of providing the compression level for whatever compression algorithm
> you happen to have selected, but I think of -Z as being the upper-case
> version of -z which I think of as selecting specifically gzip. It's
> not particularly intuitive to me that in a command like pg_basebackup
> --compress=<something>, <something> is a compression level rather than
> an algorithm. So what I would propose is probably something like:
>
> pg_basebackup --compress=ALGORITHM [--compression-level=NUMBER]
> pg_basebackup --server-compress=ALGORITHM [--compression-level=NUMBER]
>
> And then make -z short for --compress=gzip and -Z <n> short for
> --compress=gzip --compression-level=<n>. That would be a
> backward-incompatible change to the definition of --compress, but as
> long as -Z <n> works the same as today, I don't think many people will
> notice. If we like, we can notice if the argument to --compress is an
> integer and suggest using either -Z or --compress=gzip
> --compression-level=<n> instead.

My view of things is slightly different, aka I'd rather keep
--compress to mean a compression level with an integer option, but
introduce a --compression-method={lz4,gzip,none}, with -Z being a
synonym of --compression-method=gzip.  That's at least the path we
chose for pg_receivewal.  I don't mind sticking with one way or
another, as what you are proposing is basically the same thing I have
in mind, but both tools ought to use the same set of options.

Hmm.  Perhaps at the end the problem is with --compress, where we
don't know if it means a compression level or a compression method?
For me, --compress means the former, and for you the latter.  So a
third way of seeing things is to drop completely --compress, but have
one --compression-method and one --compression-level.  That would
bring a clear split.  Or just one --compression-method for the
client-side compression as you are proposing for the server-side
compression, however I'd like to think that a split between the method
and level is more intuitive.

> In the proposed patch, you end up with pg_basebackup
> --compression-method=lz4 -Z2 meaning compression with lz4 level 2. I
> find that quite odd, though as with all such things, opinions may
> vary. In my proposal, that would be an error, because it would be
> equivalent to --compress=lz4 --compress=gzip --compression-level=2,
> and would thus involve conflicting compression method specifications.

It seems to me that you did not read the patch closely enough.  The
attached patch does not add support for LZ4 in pg_basebackup on the
client-side yet.  Once it gets added, though, the idea is that  using
--compress with LZ4 would result in an error.  That's what happens
with pg_receivewal on HEAD, for one.  The patch just shapes things to
plug LZ4 more easily in the existing code of pg_basebackup.c, and
walmethods.c.

So..  As of now, it is actually possible to cut the pie in three
parts.  There are no real objections to the cleanup of walmethods.c
and the addition of some conditional TAP tests with pg_basebackup and
client-side compression, as far as I can see, only to the option
renaming part.  Attached are two patches, then.  0001 is the cleanup
of walmethods.c to rely the compression method, with more tests (tests
that could be moved into their own patch, as well).  0002 is the
addition of the options I suggested upthread, but we may change that
depending on what gets used for the server-side compression for
consistency so I am not suggesting to merge that until we agree on the
full picture.  The point of this thread was mostly about 0001, so I am
fine to discard 0002.  Thoughts?
--
Michael

Вложения

Re: Refactoring of compression options in pg_basebackup

От
Michael Paquier
Дата:
On Wed, Jan 05, 2022 at 10:22:06AM -0500, Tom Lane wrote:
> I think the existing precedent is to skip the test if tar isn't there,
> cf pg_basebackup/t/010_pg_basebackup.pl.  But certainly the majority of
> buildfarm animals have it.

Even Windows environments should be fine, aka recent edc2332.
--
Michael

Вложения

Re: Refactoring of compression options in pg_basebackup

От
Robert Haas
Дата:
On Thu, Jan 6, 2022 at 12:04 AM Michael Paquier <michael@paquier.xyz> wrote:
> Yeah.  There are cases for both.  I just got to wonder whether it
> makes sense to allow both server-side and client-side compression to
> be used at the same time.  That would be a rather strange case, but
> well, with the correct set of options that could be possible.

I don't think it makes sense to support that. On the one hand, it
doesn't seem useful: compressing already-compressed data doesn't
usually work very well. Alternatively, I suppose the intent could be
to compress one way for transfer and then decompress and recompress
for storage, but that seems too inefficient to take seriously. On the
other hand, it requires a more complex user interface, and it's
already fairly complicated anyway.

> My view of things is slightly different, aka I'd rather keep
> --compress to mean a compression level with an integer option, but
> introduce a --compression-method={lz4,gzip,none}, with -Z being a
> synonym of --compression-method=gzip.  That's at least the path we
> chose for pg_receivewal.  I don't mind sticking with one way or
> another, as what you are proposing is basically the same thing I have
> in mind, but both tools ought to use the same set of options.

Did you mean that -z would be a synonym for --compression-method=gzip?
It doesn't really make sense for -Z to be that, unless it's also
setting the compression level.

My objection to --compress=$LEVEL is that the compression level seems
like it ought to rightfully be subordinate to the choice of algorithm.
In general, there's no reason why a compression algorithm has to offer
a choice of compression levels at all, or why they have to be numbered
0 through 9. For example, lz4 on my system claims to offer compression
levels from 1 through 12, plus a separate set of "fast" compression
levels starting with 1 and going up to an unspecified number. And then
it also has options to favor decompression speed, change the block
size, and a few other parameters. We don't necessarily want to expose
all of those options, but we should structure things so that we could
if it became important. The way to do that is to make the compression
algorithm the primary setting, and then anything else you can set for
that compressor is somehow a subordinate setting.

Put another way, we don't decide first that we want to compress with
level 7, and then afterward decide whether that's gzip, lz4, or bzip2.
We pick the compressor first, and then MAYBE think about changing the
compression level.

> > In the proposed patch, you end up with pg_basebackup
> > --compression-method=lz4 -Z2 meaning compression with lz4 level 2. I
> > find that quite odd, though as with all such things, opinions may
> > vary. In my proposal, that would be an error, because it would be
> > equivalent to --compress=lz4 --compress=gzip --compression-level=2,
> > and would thus involve conflicting compression method specifications.
>
> It seems to me that you did not read the patch closely enough.  The
> attached patch does not add support for LZ4 in pg_basebackup on the
> client-side yet.  Once it gets added, though, the idea is that  using
> --compress with LZ4 would result in an error.  That's what happens
> with pg_receivewal on HEAD, for one.  The patch just shapes things to
> plug LZ4 more easily in the existing code of pg_basebackup.c, and
> walmethods.c.

Well what I was looking at was this:

- printf(_("  -Z, --compress=0-9     compress tar output with given
compression level\n"));
+ printf(_("  -Z, --compress=1-9     compress tar output with given
compression level\n"));
+ printf(_("      --compression-method=METHOD\n"
+ "                         method to compress data\n"));

That seems to show that, post-patch, the argument to -Z would be a
compression level, even if --compression-method were something other
than gzip.

It's possible that I haven't read something carefully enough, but to
me, what I said seems to be a straightforward conclusion based on
looking at the usage help in the patch. So if I came to the wrong
conclusion, perhaps that usage help isn't reflecting the situation you
intend to create, or not as clearly as it ought.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Refactoring of compression options in pg_basebackup

От
Michael Paquier
Дата:
On Thu, Jan 06, 2022 at 09:27:19AM -0500, Robert Haas wrote:
> Did you mean that -z would be a synonym for --compression-method=gzip?
> It doesn't really make sense for -Z to be that, unless it's also
> setting the compression level.

Yes, I meant "-z", not "-Z", to be a synonym of
--compression-method=gzip.  Sorry for the typo.

> My objection to --compress=$LEVEL is that the compression level seems
> like it ought to rightfully be subordinate to the choice of algorithm.
> In general, there's no reason why a compression algorithm has to offer
> a choice of compression levels at all, or why they have to be numbered
> 0 through 9. For example, lz4 on my system claims to offer compression
> levels from 1 through 12, plus a separate set of "fast" compression
> levels starting with 1 and going up to an unspecified number. And then
> it also has options to favor decompression speed, change the block
> size, and a few other parameters. We don't necessarily want to expose
> all of those options, but we should structure things so that we could
> if it became important. The way to do that is to make the compression
> algorithm the primary setting, and then anything else you can set for
> that compressor is somehow a subordinate setting.

For any compression method, that maps to an integer, so..  But I am
not going to fight hard on that.

> Put another way, we don't decide first that we want to compress with
> level 7, and then afterward decide whether that's gzip, lz4, or bzip2.
> We pick the compressor first, and then MAYBE think about changing the
> compression level.

Which is why things should be checked once all the options are
processed.  I'd recommend that you read the option patch a bit more,
that may help.

> Well what I was looking at was this:
>
> - printf(_("  -Z, --compress=0-9     compress tar output with given
> compression level\n"));
> + printf(_("  -Z, --compress=1-9     compress tar output with given
> compression level\n"));
> + printf(_("      --compression-method=METHOD\n"
> + "                         method to compress data\n"));
>
> That seems to show that, post-patch, the argument to -Z would be a
> compression level, even if --compression-method were something other
> than gzip.

Yes, after the patch --compress would be a compression level.  And, if
attempting to use with --compression-method set to "none", or
potentially "lz4", it would just fail.  If not using this "gzip", the
compression level is switched to Z_DEFAULT_COMPRESSION.  That's this
area of the patch, FWIW:
+   /*
+    * Compression-related options.
+    */
+   switch (compression_method)

> It's possible that I haven't read something carefully enough, but to
> me, what I said seems to be a straightforward conclusion based on
> looking at the usage help in the patch. So if I came to the wrong
> conclusion, perhaps that usage help isn't reflecting the situation you
> intend to create, or not as clearly as it ought.

Perhaps the --help output could be clearer, then.  Do you have a
suggestion?

Bringing walmethods.c at the same page for the directory and the tar
methods was my primary goal here, and the tests are a bonus, so I've
applied this part for now, leaving pg_basebackup alone until we figure
out the layer of options we should use.  Perhaps it would be better to
revisit that stuff once the server-side compression has landed.
--
Michael

Вложения

Re: Refactoring of compression options in pg_basebackup

От
Robert Haas
Дата:
On Fri, Jan 7, 2022 at 1:43 AM Michael Paquier <michael@paquier.xyz> wrote:
> Which is why things should be checked once all the options are
> processed.  I'd recommend that you read the option patch a bit more,
> that may help.

I don't think that the problem here is my lack of understanding. I
have two basic concerns about your proposed patch:

1. If, as you propose, we add a new flag --compression-method=METHOD
then how will the user specify server-side compression?
2. If, as we seem to agree, the compression method is more important
than the compression level, then why is the option to set the
less-important thing called just --compress, and the option to set the
more important thing has a longer name?

I proposed to solve both of these problems by using
--compression-level=NUMBER to set the compression level and
--compress=METHOD or --server-compress=METHOD to set the algorithm and
specify on which side it is to be applied. If, instead of doing that,
we go with what you have proposed here, then I don't understand how to
fit server-side compression into the framework in a reasonably concise
way. I think we would end up with something like pg_basebackup
--compression-method=lz4 --compress-on-server, which seems rather long
and awkward. Do you have a better idea?

I think I understand what the patch is doing. I just think it creates
a problem for my patch. And I'd like to know whether you have an idea
how to solve that problem. And if not, then I'd like you to consider
the solution that I am proposing rather than the patch you've already
got.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Refactoring of compression options in pg_basebackup

От
Michael Paquier
Дата:
On Thu, Jan 13, 2022 at 01:36:00PM -0500, Robert Haas wrote:
> 1. If, as you propose, we add a new flag --compression-method=METHOD
> then how will the user specify server-side compression?

This would require a completely different option switch, which is
basically the same thing as what you are suggesting with
--server-compress.

> 2. If, as we seem to agree, the compression method is more important
> than the compression level, then why is the option to set the
> less-important thing called just --compress, and the option to set the
> more important thing has a longer name?

I agree that the method is more important than the level for most
users, and I would not mind dropping completely --compress in favor of
something else, which is something I implied upthread.

> I proposed to solve both of these problems by using
> --compression-level=NUMBER to set the compression level and
> --compress=METHOD or --server-compress=METHOD to set the algorithm and
> specify on which side it is to be applied. If, instead of doing that,
> we go with what you have proposed here, then I don't understand how to
> fit server-side compression into the framework in a reasonably concise
> way. I think we would end up with something like pg_basebackup
> --compression-method=lz4 --compress-on-server, which seems rather long
> and awkward. Do you have a better idea?

Using --compression-level=NUMBER and --server-compress=METHOD to
specify a server-side compression method with a level is fine by me,
but I find the reuse of --compress to specify a compression method
confusing as it maps with the past option we have kept in
pg_basebackup for a couple of years now.  Based on your suggested set
of options, we could then have a --client-compress=METHOD and
--compression-level=NUMBER to specify a client-side compression method
with a level.  If we do that, I guess that we should then:
1) Block the combination of --server-compress and --client-compress.
2) Remove the existing -Z/--compress and -z/--gzip.

You have implied 1) upthread as far as I recall, 2) is something I am
adding on top of it.

> I think I understand what the patch is doing. I just think it creates
> a problem for my patch. And I'd like to know whether you have an idea
> how to solve that problem. And if not, then I'd like you to consider
> the solution that I am proposing rather than the patch you've already
> got.

I am fine to drop this thread's patch with its set of options and work
on top of your proposal, aka what's drafted two paragraphs above.
--
Michael

Вложения

Re: Refactoring of compression options in pg_basebackup

От
Robert Haas
Дата:
On Thu, Jan 13, 2022 at 10:23 PM Michael Paquier <michael@paquier.xyz> wrote:
> Using --compression-level=NUMBER and --server-compress=METHOD to
> specify a server-side compression method with a level is fine by me,
> but I find the reuse of --compress to specify a compression method
> confusing as it maps with the past option we have kept in
> pg_basebackup for a couple of years now.  Based on your suggested set
> of options, we could then have a --client-compress=METHOD and
> --compression-level=NUMBER to specify a client-side compression method
> with a level.  If we do that, I guess that we should then:
> 1) Block the combination of --server-compress and --client-compress.
> 2) Remove the existing -Z/--compress and -z/--gzip.

I could live with that. I'm not sure that --client-compress instead of
reusing --compress is going to be better ... but I don't think it's
awful so much as just not my first choice. I also don't think it would
be horrid to leave -z, --gzip, and -Z as shorthands for the
--client-compress=gzip with --compression-level also in the last case,
instead of removing all that stuff.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Refactoring of compression options in pg_basebackup

От
Michael Paquier
Дата:
On Fri, Jan 14, 2022 at 04:53:12PM -0500, Robert Haas wrote:
> On Thu, Jan 13, 2022 at 10:23 PM Michael Paquier <michael@paquier.xyz> wrote:
>> Using --compression-level=NUMBER and --server-compress=METHOD to
>> specify a server-side compression method with a level is fine by me,
>> but I find the reuse of --compress to specify a compression method
>> confusing as it maps with the past option we have kept in
>> pg_basebackup for a couple of years now.  Based on your suggested set
>> of options, we could then have a --client-compress=METHOD and
>> --compression-level=NUMBER to specify a client-side compression method
>> with a level.  If we do that, I guess that we should then:
>> 1) Block the combination of --server-compress and --client-compress.
>> 2) Remove the existing -Z/--compress and -z/--gzip.
>
> I could live with that. I'm not sure that --client-compress instead of
> reusing --compress is going to be better ... but I don't think it's
> awful so much as just not my first choice. I also don't think it would
> be horrid to leave -z, --gzip, and -Z as shorthands for the
> --client-compress=gzip with --compression-level also in the last case,
> instead of removing all that stuff.

Okay.  So, based on this feedback, I guess that something like the
attached would be what we are looking for.  I have maximized the
amount of code removed with the removal of -z/-Z,  but I won't fight
hard if the consensus is to keep them, either.  We could also keep
-z/--gzip, and stick -Z to the new --compression-level with
--compress removed.
--
Michael

Вложения

Re: Refactoring of compression options in pg_basebackup

От
Magnus Hagander
Дата:
On Fri, Jan 14, 2022 at 10:53 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Thu, Jan 13, 2022 at 10:23 PM Michael Paquier <michael@paquier.xyz> wrote:
> > Using --compression-level=NUMBER and --server-compress=METHOD to
> > specify a server-side compression method with a level is fine by me,
> > but I find the reuse of --compress to specify a compression method
> > confusing as it maps with the past option we have kept in
> > pg_basebackup for a couple of years now.  Based on your suggested set
> > of options, we could then have a --client-compress=METHOD and
> > --compression-level=NUMBER to specify a client-side compression method
> > with a level.  If we do that, I guess that we should then:
> > 1) Block the combination of --server-compress and --client-compress.
> > 2) Remove the existing -Z/--compress and -z/--gzip.
>
> I could live with that. I'm not sure that --client-compress instead of
> reusing --compress is going to be better ... but I don't think it's
> awful so much as just not my first choice. I also don't think it would
> be horrid to leave -z, --gzip, and -Z as shorthands for the
> --client-compress=gzip with --compression-level also in the last case,
> instead of removing all that stuff.

It never makes sense to compress *both* in server and client, right?

One argument in that case for using --compress would be that we could
have that one take options like --compress=gzip (use gzip in the
client) and --compress=server-lz4 (use lz4 on the server), and
automatically make it impossible to do both. And maybe also accept
--compress=client-gzip (which would be the same as just specifying
gzip).

That would be an argument for actually keeping --compress and not
using --client-compress, because obviously it would be silly to have
--client-compress=server-lz4...

And yes, I agree that considering both server and client compression
even if we don't have server compression makes sense, since we don't
want to change things around again when we get it.

We could perhaps also consider accepting --compress=gzip:7
(<method>:<level>) as a way to specify the level, for both client and
server side.

I think having --client-compress and --server-compress separately but
having --compression-level *not* being separate would be confusing and
I *think* that's what the current patch proposes?

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: Refactoring of compression options in pg_basebackup

От
Michael Paquier
Дата:
On Sat, Jan 15, 2022 at 04:15:26PM +0100, Magnus Hagander wrote:
> I think having --client-compress and --server-compress separately but
> having --compression-level *not* being separate would be confusing and
> I *think* that's what the current patch proposes?

Yep, your understanding is right.  The last version of the patch
posted does exactly that.
--
Michael

Вложения

Re: Refactoring of compression options in pg_basebackup

От
Robert Haas
Дата:
On Fri, Jan 14, 2022 at 9:54 PM Michael Paquier <michael@paquier.xyz> wrote:
> Okay.  So, based on this feedback, I guess that something like the
> attached would be what we are looking for.  I have maximized the
> amount of code removed with the removal of -z/-Z,  but I won't fight
> hard if the consensus is to keep them, either.  We could also keep
> -z/--gzip, and stick -Z to the new --compression-level with
> --compress removed.

I mean, I really don't understand the benefit of removing -z and -Z.
-z can remain a synonym for --client-compress=gzip and -Z for
--client-compress=gzip --compression-level=$N and nobody will be
harmed. Taking them out reduces backward compatibility for no gain
that I can see.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Refactoring of compression options in pg_basebackup

От
Robert Haas
Дата:
On Sat, Jan 15, 2022 at 10:15 AM Magnus Hagander <magnus@hagander.net> wrote:
> It never makes sense to compress *both* in server and client, right?

Yeah.

> One argument in that case for using --compress would be that we could
> have that one take options like --compress=gzip (use gzip in the
> client) and --compress=server-lz4 (use lz4 on the server), and
> automatically make it impossible to do both. And maybe also accept
> --compress=client-gzip (which would be the same as just specifying
> gzip).
>
> That would be an argument for actually keeping --compress and not
> using --client-compress, because obviously it would be silly to have
> --client-compress=server-lz4...

I still like distinguishing it using the option name, but differently:
--compress=METHOD and --server-compress=METHOD. But this is also a
reasonable proposal.

> And yes, I agree that considering both server and client compression
> even if we don't have server compression makes sense, since we don't
> want to change things around again when we get it.

Especially not because I'm pretty close to having a committable patch
and intend to try to get this into v15. See the refactoring
basebackup.c thread.

> We could perhaps also consider accepting --compress=gzip:7
> (<method>:<level>) as a way to specify the level, for both client and
> server side.

That's not crazy either. Initially I was thinking --compression=gzip7
but then it turns out lz4 is one of the methods we want to use, and
lz47 would be, err, slightly unclear. lz4:7 is better, for sure.

> I think having --client-compress and --server-compress separately but
> having --compression-level *not* being separate would be confusing and
> I *think* that's what the current patch proposes?

Depends on what you mean by "separate". There's no proposal to have
--client-compression-level and also --server-compression-level.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Refactoring of compression options in pg_basebackup

От
Magnus Hagander
Дата:
On Mon, Jan 17, 2022 at 3:18 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Sat, Jan 15, 2022 at 10:15 AM Magnus Hagander <magnus@hagander.net> wrote:
> > It never makes sense to compress *both* in server and client, right?
>
> Yeah.
>
> > One argument in that case for using --compress would be that we could
> > have that one take options like --compress=gzip (use gzip in the
> > client) and --compress=server-lz4 (use lz4 on the server), and
> > automatically make it impossible to do both. And maybe also accept
> > --compress=client-gzip (which would be the same as just specifying
> > gzip).
> >
> > That would be an argument for actually keeping --compress and not
> > using --client-compress, because obviously it would be silly to have
> > --client-compress=server-lz4...
>
> I still like distinguishing it using the option name, but differently:
> --compress=METHOD and --server-compress=METHOD. But this is also a
> reasonable proposal.
>
> > And yes, I agree that considering both server and client compression
> > even if we don't have server compression makes sense, since we don't
> > want to change things around again when we get it.
>
> Especially not because I'm pretty close to having a committable patch
> and intend to try to get this into v15. See the refactoring
> basebackup.c thread.
>
> > We could perhaps also consider accepting --compress=gzip:7
> > (<method>:<level>) as a way to specify the level, for both client and
> > server side.
>
> That's not crazy either. Initially I was thinking --compression=gzip7
> but then it turns out lz4 is one of the methods we want to use, and
> lz47 would be, err, slightly unclear. lz4:7 is better, for sure.
>
> > I think having --client-compress and --server-compress separately but
> > having --compression-level *not* being separate would be confusing and
> > I *think* that's what th e current patch proposes?
>
> Depends on what you mean by "separate". There's no proposal to have
> --client-compression-level and also --server-compression-level.

I mean that I think it would be confusing to have
--client-compression=x, --server-compression=y, and
compression-level=z as the options. Why, in that scenario, does the
"compression" part get two parameters, but the "compression level"
part get one. In that case, there should either be --compression=x and
--compression-level=z (which is what I'd suggest, per above), or there
should be --client-compression, --server-compression,
--client-compression-level and --server-compression-level, for it to
be consistent. But having one of them be split in two parameters and
the other one not, is what I'd consider confusing.

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: Refactoring of compression options in pg_basebackup

От
Magnus Hagander
Дата:
On Mon, Jan 17, 2022 at 4:56 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Sat, Jan 15, 2022 at 04:15:26PM +0100, Magnus Hagander wrote:
> > I think having --client-compress and --server-compress separately but
> > having --compression-level *not* being separate would be confusing and
> > I *think* that's what the current patch proposes?
>
> Yep, your understanding is right.  The last version of the patch
> posted does exactly that.

Ok. Then that is exactly what I think is confusing, and thus object to :)

-- 
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/



Re: Refactoring of compression options in pg_basebackup

От
Robert Haas
Дата:
On Mon, Jan 17, 2022 at 9:27 AM Magnus Hagander <magnus@hagander.net> wrote:
> I mean that I think it would be confusing to have
> --client-compression=x, --server-compression=y, and
> compression-level=z as the options. Why, in that scenario, does the
> "compression" part get two parameters, but the "compression level"
> part get one. In that case, there should either be --compression=x and
> --compression-level=z (which is what I'd suggest, per above), or there
> should be --client-compression, --server-compression,
> --client-compression-level and --server-compression-level, for it to
> be consistent. But having one of them be split in two parameters and
> the other one not, is what I'd consider confusing.

I don't find that confusing, but confusion is a pretty subjective
experience so that doesn't really prove anything. Of the two
alternatives that you propose, I prefer --compress=["server-"]METHOD
and --compression-level=NUMBER to having both
--client-compression-level and --server-compression-level. To me,
that's still a bit more surprising than my proposal, because having
the client compress stuff and having the server compress stuff feel
like somewhat different kinds of things ... but it's unsurprising that
I like my own proposal, and what really matters is that we converge
relatively quickly on something we can all live with.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Refactoring of compression options in pg_basebackup

От
Alvaro Herrera
Дата:
On 2022-Jan-17, Robert Haas wrote:

> Of the two
> alternatives that you propose, I prefer --compress=["server-"]METHOD
> and --compression-level=NUMBER to having both
> --client-compression-level and --server-compression-level. To me,
> that's still a bit more surprising than my proposal, because having
> the client compress stuff and having the server compress stuff feel
> like somewhat different kinds of things ... but it's unsurprising that
> I like my own proposal, and what really matters is that we converge
> relatively quickly on something we can all live with.

I think having a single option where you specify everything is simpler.
I propose we accept these forms:

--compress=[{server,client}-]method[:level]    new in 15
--compress=level        (accepted by 14)
-Z level            (accepted by 14)
-z                (accepted by 14)

This way, compatibility with the existing release is maintained; and we
introduce all the new functionality without cluttering the interface.

So starting from 15, in addition to the already supported forms, users
will be able to do

--compress=server-gzip:8    (completely specified options)
--compress=client-lz4        (client-side lz4 compression, default level)
--compress=zstd            (server-side zstd compression)

there's a bit of string parsing required to implement, but that seems
okay to me -- the UI seems clear enough and easily documented.



One missing feature in this spec is the ability to specify compression
to be used with whatever the default method is.  I'm not sure we want to
allow for that, but it could be
--compress=client
--compress=server
which uses whatever method is default, with whatever level is default,
at either side.

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/



Re: Refactoring of compression options in pg_basebackup

От
"David G. Johnston"
Дата:
On Mon, Jan 17, 2022 at 8:17 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Jan 17, 2022 at 9:27 AM Magnus Hagander <magnus@hagander.net> wrote:
> I mean that I think it would be confusing to have
> --client-compression=x, --server-compression=y, and
> compression-level=z as the options. Why, in that scenario, does the
> "compression" part get two parameters, but the "compression level"
> part get one. In that case, there should either be --compression=x and
> --compression-level=z (which is what I'd suggest, per above), or there
> should be --client-compression, --server-compression,
> --client-compression-level and --server-compression-level, for it to
> be consistent. But having one of them be split in two parameters and
> the other one not, is what I'd consider confusing.

I don't find that confusing, but confusion is a pretty subjective
experience so that doesn't really prove anything. Of the two
alternatives that you propose, I prefer --compress=["server-"]METHOD
and --compression-level=NUMBER to having both
--client-compression-level and --server-compression-level. To me,
that's still a bit more surprising than my proposal, because having
the client compress stuff and having the server compress stuff feel
like somewhat different kinds of things ... but it's unsurprising that
I like my own proposal, and what really matters is that we converge
relatively quickly on something we can all live with.


Quick look-over of the email thread:

The bare "--compress" option isn't liked anymore.  I would prefer that we officially deprecate -z, -Z, and --compress but otherwise leave them alone for backward compatibility.

We do not want to entertain performing both server and client compression.  It thus seems undesirable to have different sets of options for them.  Therefore:

--compression-method={gzip|lz4|...}
--compression-level={string} (which can be any string value, the validation logic for compression-method will evaluate what is provided and error if it is not happy, each method would have its own default)
--compression-location={client|server} (Can be added once server compression is active.  I would suggest it would default to server-side compression - which would be a change in behavior by necessity)

If you really want a concise option here I say we make available:

--compression={method}[;string][;{client|server}]

The two trailing optional (with default) sub-arguments are unambiguous as to which one is present if only two sub-arguments are provided.

David J.

Re: Refactoring of compression options in pg_basebackup

От
"David G. Johnston"
Дата:
On Mon, Jan 17, 2022 at 8:41 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2022-Jan-17, Robert Haas wrote:

> Of the two
> alternatives that you propose, I prefer --compress=["server-"]METHOD
> and --compression-level=NUMBER to having both
> --client-compression-level and --server-compression-level. To me,
> that's still a bit more surprising than my proposal, because having
> the client compress stuff and having the server compress stuff feel
> like somewhat different kinds of things ... but it's unsurprising that
> I like my own proposal, and what really matters is that we converge
> relatively quickly on something we can all live with.

I think having a single option where you specify everything is simpler.
I propose we accept these forms:

--compress=[{server,client}-]method[:level]     new in 15
--compress=level                (accepted by 14)
-Z level                        (accepted by 14)
-z                              (accepted by 14)

I am also in favor of this option.  Whether this is better than deprecating --compress and introducing --compression I am having trouble deciding.  My personal preference is to add --compression and leave --compress alone and deprecated; but we don't usually do anything with deprecations and having users seeing both --compress and --compression out in the wild, even if never at the same time, is bound to elicit questions (though so is seeing --compress with "number only" rules and "composite value" rules...)


This way, compatibility with the existing release is maintained; and we
introduce all the new functionality without cluttering the interface.

I would still "clutter" the interface with:

--compress-method
--compress-options (extending from my prior post, I would make this more generic - i.e., not named "level" -  and deal with valid values, meaning, and format, in a per-method description in the documentation)
--compress-location

Users have different preferences for what they want to use, and it provides a level of self-documentation for the composite specification and a degree of explicitness for the actual documentation of the methods.

One missing feature in this spec is the ability to specify compression
to be used with whatever the default method is.  I'm not sure we want to
allow for that

I'm not too keen on making a default method in code.  Saying "if in doubt gzip is a widely used compression method." in the documentation seems sufficient.

David J.

Re: Refactoring of compression options in pg_basebackup

От
Robert Haas
Дата:
On Mon, Jan 17, 2022 at 11:50 AM David G. Johnston
<david.g.johnston@gmail.com> wrote:
>> I think having a single option where you specify everything is simpler.
>> I propose we accept these forms:
>>
>> --compress=[{server,client}-]method[:level]     new in 15
>> --compress=level                (accepted by 14)
>> -Z level                        (accepted by 14)
>> -z                              (accepted by 14)
>
> I am also in favor of this option.  Whether this is better than deprecating --compress and introducing --compression
Iam having trouble deciding.  My personal preference is to add --compression and leave --compress alone and deprecated;
butwe don't usually do anything with deprecations and having users seeing both --compress and --compression out in the
wild,even if never at the same time, is bound to elicit questions (though so is seeing --compress with "number only"
rulesand "composite value" rules...) 

Alvaro's proposal is fine with me. I don't see any value in replacing
--compress with --compression. It's longer but not superior in any way
that I can see. Having both seems worst of all -- that's just
confusing.

> I'm not too keen on making a default method in code.  Saying "if in doubt gzip is a widely used compression method."
inthe documentation seems sufficient. 

Yeah, I agree that a default method doesn't seem necessary. People who
want to compress without thinking hard can use -z; others can say what
they want.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: Refactoring of compression options in pg_basebackup

От
Michael Paquier
Дата:
On Mon, Jan 17, 2022 at 12:48:12PM -0500, Robert Haas wrote:
> Alvaro's proposal is fine with me. I don't see any value in replacing
> --compress with --compression. It's longer but not superior in any way
> that I can see. Having both seems worst of all -- that's just
> confusing.

Okay, that looks like a consensus, then.  Robert, would it be better
to gather all that on the thread that deals with the server-side
compression?  Doing that here would be fine by me, with the option to
only specify the client.  Now it would be a bit weird to do things
with only the client part and not the server part :)
--
Michael

Вложения

Re: Refactoring of compression options in pg_basebackup

От
Robert Haas
Дата:
On Mon, Jan 17, 2022 at 8:36 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Mon, Jan 17, 2022 at 12:48:12PM -0500, Robert Haas wrote:
> > Alvaro's proposal is fine with me. I don't see any value in replacing
> > --compress with --compression. It's longer but not superior in any way
> > that I can see. Having both seems worst of all -- that's just
> > confusing.
>
> Okay, that looks like a consensus, then.  Robert, would it be better
> to gather all that on the thread that deals with the server-side
> compression?  Doing that here would be fine by me, with the option to
> only specify the client.  Now it would be a bit weird to do things
> with only the client part and not the server part :)

I think it could make sense for you implement
--compress=METHOD[:LEVEL], keeping -z and -Z N as synonyms for
--compress=gzip and --compress=gzip:N, and with --compress=N being
interpreted as --compress=gzip:N. Then I'll generalize that to
--compress=[{client|server}-]METHOD[:LEVEL] on the other thread. I
think we should leave it where, if you don't say either client or
server, you get client, because that's the historical behavior.

If that doesn't work for you, please let me know what you would prefer.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Refactoring of compression options in pg_basebackup

От
Michael Paquier
Дата:
On Tue, Jan 18, 2022 at 10:04:56AM -0500, Robert Haas wrote:
> I think it could make sense for you implement
> --compress=METHOD[:LEVEL], keeping -z and -Z N as synonyms for
> --compress=gzip and --compress=gzip:N, and with --compress=N being
> interpreted as --compress=gzip:N. Then I'll generalize that to
> --compress=[{client|server}-]METHOD[:LEVEL] on the other thread. I
> think we should leave it where, if you don't say either client or
> server, you get client, because that's the historical behavior.
>
> If that doesn't work for you, please let me know what you would prefer.

WFM.  Attached is a patch that extends --compress to handle a method
with an optional compression level.  Some extra tests are added to
cover all that.

Thoughts?
--
Michael

Вложения

Re: Refactoring of compression options in pg_basebackup

От
Robert Haas
Дата:
On Tue, Jan 18, 2022 at 11:27 PM Michael Paquier <michael@paquier.xyz> wrote:
> WFM.  Attached is a patch that extends --compress to handle a method
> with an optional compression level.  Some extra tests are added to
> cover all that.

I think that this will reject something like --compress=nonetheless by
telling you that 't' is not a valid separator. I think it would be
better to code this so that we first identify the portion preceding
the first colon, or the whole string if there is no colon. Then we
check whether that part is a compression method that we recognize. If
not, we complain. If so, we then check whatever is after the separator
for validity - and this might differ by type. For example, we could
then immediately reject none:4, and if in the future we want to allow
lz4:fast3, we could.

I think the code that handles the bare integer case should be at the
top of the function and should return, because that code is short.
Then the rest of the function doesn't need to be indented as deeply.

"First check after the compression method" seems like it would be
better written "First check for the compression method" or "First
check the compression method".

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Refactoring of compression options in pg_basebackup

От
Alvaro Herrera
Дата:
On 2022-Jan-19, Michael Paquier wrote:

> +    printf(_("  -Z, --compress=[{gzip,none}[:LEVEL] or [LEVEL]\n"
> +             "                         compress tar output with given compression method or level\n"));

Note there is an extra [ before the {gzip bit.

-- 
Álvaro Herrera         PostgreSQL Developer  —  https://www.EnterpriseDB.com/
"I suspect most samba developers are already technically insane...
Of course, since many of them are Australians, you can't tell." (L. Torvalds)



Re: Refactoring of compression options in pg_basebackup

От
Michael Paquier
Дата:
On Wed, Jan 19, 2022 at 12:50:44PM -0300, Alvaro Herrera wrote:
> Note there is an extra [ before the {gzip bit.

Thanks!
--
Michael

Вложения

Re: Refactoring of compression options in pg_basebackup

От
Michael Paquier
Дата:
On Wed, Jan 19, 2022 at 08:35:23AM -0500, Robert Haas wrote:
> I think that this will reject something like --compress=nonetheless by
> telling you that 't' is not a valid separator. I think it would be
> better to code this so that we first identify the portion preceding
> the first colon, or the whole string if there is no colon. Then we
> check whether that part is a compression method that we recognize. If
> not, we complain.

Well, if no colon is specified, we still need to check if optarg
is a pure integer if it does not match any of the supported methods,
as --compress=0 should be backward compatible with no compression and
--compress=1~9 should imply gzip, no?

> If so, we then check whatever is after the separator
> for validity - and this might differ by type. For example, we could
> then immediately reject none:4, and if in the future we want to allow
> lz4:fast3, we could.

Okay.

> I think the code that handles the bare integer case should be at the
> top of the function and should return, because that code is short.
> Then the rest of the function doesn't need to be indented as deeply.

Done this way, I hope.
--
Michael

Вложения

Re: Refactoring of compression options in pg_basebackup

От
Robert Haas
Дата:
On Thu, Jan 20, 2022 at 2:03 AM Michael Paquier <michael@paquier.xyz> wrote:
> Well, if no colon is specified, we still need to check if optarg
> is a pure integer if it does not match any of the supported methods,
> as --compress=0 should be backward compatible with no compression and
> --compress=1~9 should imply gzip, no?

Yes.

> Done this way, I hope.

This looks better, but this part could be switched around:

+ /*
+ * Check if the first part of the string matches with a supported
+ * compression method.
+ */
+ if (pg_strcasecmp(firstpart, "gzip") != 0 &&
+ pg_strcasecmp(firstpart, "none") != 0)
+ {
+ /*
+ * It does not match anything known, so check for the
+ * backward-compatible case of only an integer, where the implied
+ * compression method changes depending on the level value.
+ */
+ if (!option_parse_int(firstpart, "-Z/--compress", 0,
+   INT_MAX, levelres))
+ exit(1);
+
+ *methodres = (*levelres > 0) ?
+ COMPRESSION_GZIP : COMPRESSION_NONE;
+ return;
+ }
+
+ /* Supported method found. */
+ if (pg_strcasecmp(firstpart, "gzip") == 0)
+ *methodres = COMPRESSION_GZIP;
+ else if (pg_strcasecmp(firstpart, "none") == 0)
+ *methodres = COMPRESSION_NONE;

You don't need to test for gzip and none in two places each. Just make
the block with the "It does not match ..." comment the "else" clause
for this last part.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Refactoring of compression options in pg_basebackup

От
Michael Paquier
Дата:
On Thu, Jan 20, 2022 at 10:25:43AM -0500, Robert Haas wrote:
> You don't need to test for gzip and none in two places each. Just make
> the block with the "It does not match ..." comment the "else" clause
> for this last part.

Indeed, that looks better.  I have done an extra pass on this stuff
this morning, and applied it, so we should be done here.
--
Michael

Вложения

Re: Refactoring of compression options in pg_basebackup

От
Robert Haas
Дата:
On Thu, Jan 20, 2022 at 9:18 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Thu, Jan 20, 2022 at 10:25:43AM -0500, Robert Haas wrote:
> > You don't need to test for gzip and none in two places each. Just make
> > the block with the "It does not match ..." comment the "else" clause
> > for this last part.
>
> Indeed, that looks better.  I have done an extra pass on this stuff
> this morning, and applied it, so we should be done here.

Thanks. One thing I just noticed is that the enum we're using here is
called WalCompressionMethod. But we're not compressing WAL. We're
compressing tarfiles of the data directory.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Refactoring of compression options in pg_basebackup

От
Michael Paquier
Дата:
On Fri, Jan 21, 2022 at 09:57:41AM -0500, Robert Haas wrote:
> Thanks. One thing I just noticed is that the enum we're using here is
> called WalCompressionMethod. But we're not compressing WAL. We're
> compressing tarfiles of the data directory.

Also, having this enum in walmethods.h is perhaps not the best place
either, even more if you plan to use that in pg_basebackup for the
server-side compression.  One idea is to rename this enum to
DataCompressionMethod, moving it into a new header, like common.h as
of the attached.
--
Michael

Вложения

Re: Refactoring of compression options in pg_basebackup

От
Robert Haas
Дата:
On Sat, Jan 22, 2022 at 12:47 AM Michael Paquier <michael@paquier.xyz> wrote:
> Also, having this enum in walmethods.h is perhaps not the best place
> either, even more if you plan to use that in pg_basebackup for the
> server-side compression.  One idea is to rename this enum to
> DataCompressionMethod, moving it into a new header, like common.h as
> of the attached.

Well, we also have CompressionAlgorithm competing for the same job.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: Refactoring of compression options in pg_basebackup

От
Michael Paquier
Дата:
On Tue, Jan 25, 2022 at 03:14:13PM -0500, Robert Haas wrote:
> On Sat, Jan 22, 2022 at 12:47 AM Michael Paquier <michael@paquier.xyz> wrote:
>> Also, having this enum in walmethods.h is perhaps not the best place
>> either, even more if you plan to use that in pg_basebackup for the
>> server-side compression.  One idea is to rename this enum to
>> DataCompressionMethod, moving it into a new header, like common.h as
>> of the attached.
>
> Well, we also have CompressionAlgorithm competing for the same job.

Sure, but I don't think that it is a good idea to unify that yet, at
least not until pg_dump is able to handle LZ4 as an option, as the
main benefit that we'd gain here is to be able to change the code to a
switch/case without defaults where we would detect code paths that
require a refresh once adding support for a new option.
--
Michael

Вложения

Re: Refactoring of compression options in pg_basebackup

От
Robert Haas
Дата:
On Tue, Jan 25, 2022 at 8:15 PM Michael Paquier <michael@paquier.xyz> wrote:
> Sure, but I don't think that it is a good idea to unify that yet, at
> least not until pg_dump is able to handle LZ4 as an option, as the
> main benefit that we'd gain here is to be able to change the code to a
> switch/case without defaults where we would detect code paths that
> require a refresh once adding support for a new option.

I think those places could just throw a "lz4 compression is not
supported" elog() and then you could just grep for everyplace where
that string appears. But I am not of a mind to fight about it. I was
just pointing out the duplication.

-- 
Robert Haas
EDB: http://www.enterprisedb.com