Discussion: refactoring basebackup.c


refactoring basebackup.c

From
Robert Haas
Date:
Hi,

I'd like to propose a fairly major refactoring of the server's
basebackup.c. The current code isn't horrific or anything, but the
base backup mechanism has grown quite a few features over the years
and all of the code knows about all of the features. This is going to
make it progressively more difficult to add additional features, and I
have a few in mind that I'd like to add, as discussed below and also
on several other recent threads.[1][2] The attached patch set shows
what I have in mind. It needs more work, but I believe that there's
enough here for someone to review the overall direction, and even some
of the specifics, and hopefully give me some useful feedback.

This patch set is built around the idea of creating two new
abstractions, a base backup sink -- or bbsink -- and a base backup
archiver -- or bbarchiver.  Each of these works like a foreign data
wrapper or custom scan or TupleTableSlot. That is, there's a table of
function pointers that act like method callbacks. Every implementation
can allocate a struct of sufficient size for its own bookkeeping data,
and the first member of the struct is always the same, and basically
holds the data that all implementations must store, including a
pointer to the table of function pointers. If we were using C++,
bbarchiver and bbsink would be abstract base classes.
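
As a rough sketch of what I mean (all the names below are invented for illustration; the structs in the actual patch set differ), the C idiom looks something like this:

```c
#include <assert.h>
#include <stddef.h>

/* A table of function pointers acts as method callbacks; every
 * implementation embeds a common base as its first member. */

struct bbsink;

typedef struct bbsink_ops
{
    void        (*begin_archive) (struct bbsink *sink, const char *name);
    void        (*archive_contents) (struct bbsink *sink,
                                     const char *data, size_t len);
    void        (*end_archive) (struct bbsink *sink);
} bbsink_ops;

/* Common state every implementation must store, including the pointer
 * to its table of function pointers. Always the first member. */
typedef struct bbsink
{
    const bbsink_ops *ops;
} bbsink;

/* A concrete sink adds its own bookkeeping after the base. */
typedef struct bbsink_counter
{
    bbsink      base;           /* must be first */
    size_t      bytes_seen;
} bbsink_counter;

static void
counter_begin_archive(bbsink *sink, const char *name)
{
    (void) sink;
    (void) name;
}

static void
counter_archive_contents(bbsink *sink, const char *data, size_t len)
{
    bbsink_counter *self = (bbsink_counter *) sink;

    (void) data;
    self->bytes_seen += len;
}

static void
counter_end_archive(bbsink *sink)
{
    (void) sink;
}

static const bbsink_ops counter_ops = {
    counter_begin_archive,
    counter_archive_contents,
    counter_end_archive
};

/* Callers see only the abstract type, as with a C++ base class. */
static size_t
demo(void)
{
    bbsink_counter c = {{&counter_ops}, 0};
    bbsink     *sink = &c.base;

    sink->ops->begin_archive(sink, "base.tar");
    sink->ops->archive_contents(sink, "hello", 5);
    sink->ops->end_archive(sink);
    return c.bytes_seen;
}
```

The caller dispatches through `sink->ops` without knowing which concrete implementation it has, which is the whole point of the abstraction.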

They represent closely-related concepts, so much so that I initially
thought we could get by with just one new abstraction layer. I found
on experimentation that this did not work well, so I split it up into
two and that worked a lot better. The distinction is this: a bbsink is
something to which you can send a bunch of archives -- currently, each
would be a tarfile -- and also a backup manifest. A bbarchiver is
something to which you send every file in the data directory
individually, or at least the ones that are getting backed up, plus
any that are being injected into the backup (e.g. the backup_label).
Commonly, a bbsink will do something with the data and then forward it
to a subsequent bbsink, or a bbarchiver will do something with the
data and then forward it to a subsequent bbarchiver or bbsink. For
example, there's a bbarchiver_tar object which, like any bbarchiver,
sees all the files and their contents as input. The output is a
tarfile, which gets sent to a bbsink. As things stand in the patch set
now, the tar archives are ultimately sent to the "libpq" bbsink, which
sends them to the client.
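
To make the archiver-to-sink relationship concrete, here is a minimal sketch with invented names, where a small fixed-size "header" stands in for a real 512-byte tar header:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* The archiver is fed files one at a time, frames each one, and
 * forwards the resulting archive bytes to a downstream sink. */

#define FAKE_HEADER_LEN 16      /* stand-in for a real tar header */

typedef struct demo_sink
{
    size_t      archive_bytes;  /* archive output received so far */
} demo_sink;

static void
sink_contents(demo_sink *sink, const char *data, size_t len)
{
    (void) data;
    sink->archive_bytes += len;
}

typedef struct demo_archiver
{
    demo_sink  *next;           /* downstream sink in the chain */
} demo_archiver;

/* File-level entry point: one call per file being backed up. */
static void
archiver_add_file(demo_archiver *archiver, const char *name,
                  const char *data, size_t len)
{
    char        header[FAKE_HEADER_LEN] = {0};

    strncpy(header, name, FAKE_HEADER_LEN - 1);
    sink_contents(archiver->next, header, FAKE_HEADER_LEN);     /* framing */
    sink_contents(archiver->next, data, len);                   /* payload */
}
```

The archiver's input interface is per-file; its output is archive bytes, which is exactly the sink's input interface, so the two layers compose.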

In the future, we could have other bbarchivers. For example, we could
add a "pax", "zip", or "cpio" bbarchiver that produces archives in that
format, and any given backup could choose which one to use. Or, we
could have a bbarchiver that runs each individual file through a
compression algorithm and then forwards the resulting data to a
subsequent bbarchiver. That would make it easy to produce a tarfile of
individually compressed files, which is one possible way of creating a
seekable archive.[3] Likewise, we could have other bbsinks. For
example, we could have a "localdisk" bbsink that causes the server to
write the backup somewhere in the local filesystem instead of
streaming it out over libpq. Or, we could have an "s3" bbsink that
writes the archives to S3. We could also have bbsinks that compress
the input archives using some compressor (e.g. lz4, zstd, bzip2, ...)
and forward the resulting compressed archives to the next bbsink in
the chain. I'm not trying to pass judgement on whether any of these
particular things are things we want to do, nor am I saying that this
patch set solves all the problems with doing them. However, I believe
it will make such things a whole lot easier to implement, because all
of the knowledge about whatever new functionality is being added is
centralized in one place, rather than being spread across the entirety
of basebackup.c. As an example of this, look at how 0010 changes
basebackup.c and basebackup_tar.c: afterwards, basebackup.c no longer
knows anything that is tar-specific, whereas right now it knows about
tar-specific things in many places.

Here's an overview of this patch set:

0001-0003 are cleanup patches that I have posted for review on
separate threads.[4][5] They are included here to make it easy to
apply this whole series if someone wishes to do so.

0004 is a minor refactoring that reduces by 1 the number of functions
in basebackup.c that know about the specifics of tarfiles. It is just
a preparatory patch and probably not very interesting.

0005 invents the bbsink abstraction.

0006 creates basebackup_libpq.c and moves all code that knows about
the details of sending archives via libpq there. The functionality is
exposed for use by basebackup.c as a new type of bbsink, bbsink_libpq.

0007 creates basebackup_throttle.c and moves all code that knows about
throttling backups there. The functionality is exposed for use by
basebackup.c as a new type of bbsink, bbsink_throttle. This means that
the throttling logic could be reused to throttle output to any final
destination. Essentially, this is a bbsink that just passes everything
it gets through to the next bbsink, but with a rate limit. If
throttling's not enabled, no bbsink_throttle object is created, so all
of the throttling code is completely out of the execution pipeline.
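
A toy version of such a pass-through rate limiter might look like the following (names invented; real throttling code sleeps when the quota for the current time slice is exhausted, while this sketch only counts how often it would have slept, to stay self-contained):

```c
#include <assert.h>
#include <stddef.h>

typedef struct throttle_sink
{
    size_t      bytes_per_slice;    /* quota per time slice */
    size_t      used_this_slice;    /* bytes forwarded this slice */
    int         sleeps;             /* times we would have slept */
    size_t      forwarded;          /* stand-in for the next bbsink */
} throttle_sink;

/* Forward a chunk downstream, pausing whenever the quota is hit. */
static void
throttle_contents(throttle_sink *t, const char *data, size_t len)
{
    (void) data;

    while (len > 0)
    {
        size_t      room = t->bytes_per_slice - t->used_this_slice;
        size_t      n = (len < room) ? len : room;

        t->forwarded += n;          /* pass through to the next sink */
        t->used_this_slice += n;
        len -= n;

        if (t->used_this_slice == t->bytes_per_slice)
        {
            t->sleeps++;            /* real code would sleep here */
            t->used_this_slice = 0;
        }
    }
}
```

Because the sink only sees and forwards bytes, it is oblivious to whether the final destination is libpq, local disk, or anything else, which is what makes the throttling logic reusable.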

0008 creates basebackup_progress.c and moves all code that knows about
progress reporting there. The functionality is exposed for use by
basebackup.c as a new type of bbsink, bbsink_progress. Since the
abstraction doesn't fit perfectly in this case, some extra functions
are added to work around the problem. This is not entirely elegant,
but I still think it's an improvement over what we have now, and
I don't have a better idea.

0009 invents the bbarchiver abstraction.

0010 invents two new bbarchivers, a tar bbarchiver and a tarsize
bbarchiver, and refactors basebackup.c to make use of them. The tar
bbarchiver puts the files it sees into tar archives and forwards the
resulting archives to a bbsink. The tarsize bbarchiver is used to
support the PROGRESS option to the BASE_BACKUP command. It just
estimates the size of the backup by summing up the file sizes without
reading them. This approach is good for a couple of reasons. First,
without something like this, it's impossible to keep basebackup.c from
knowing something about the tar format, because the PROGRESS option
doesn't just figure out how big the files to be backed up are: it
figures out how big it thinks the archives will be, and that involves
tar-specific considerations. This area needs more work, as the whole
idea of measuring progress by estimating the archive size is going to
break down as soon as server-side compression is in the picture.
Second, this makes the code path that we use for figuring out the
backup size much more similar to the path we use for
performing the actual backup. For instance, with this patch, we
include the exact same files in the calculation that we will include
in the backup, and in the same order, something that's not true today.
The basebackup_tar.c file added by this patch is sadly lacking in
comments, which I will add in a future version of the patch set. I
think, though, that it will not be too unclear what's going on here.

0011 invents another new kind of bbarchiver. This bbarchiver just
eavesdrops on the stream of files to facilitate backup manifest
construction, and then forwards everything through to a subsequent
bbarchiver. Like bbsink_throttle, it can be entirely omitted if not
used. This patch is a bit clunky at the moment and needs some polish,
but it is another demonstration of how these abstractions can be used
to simplify basebackup.c, so that basebackup.c only has to worry about
determining what should be backed up and not have to worry much about
all the specific things that need to be done as part of that.

Although this patch set adds quite a bit of code on net, it makes
basebackup.c considerably smaller and simpler, removing more than 400
lines of code from that file, about 20% of the current total. There
are some gratifying changes vs. the status quo. For example, in
master, we have this:

sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
                bool sendtblspclinks, backup_manifest_info *manifest,
                const char *spcoid)

Notably, the sizeonly flag makes the function not do what the name of
the function suggests that it does. Also, we've got to pass some extra
fields through to enable specific features. With the patch set, the
equivalent function looks like this:

archive_directory(bbarchiver *archiver, const char *path, int basepathlen,
                                  List *tablespaces, bool sendtblspclinks)

The question "what should I do with the directories and files we find
as we recurse?" is now answered by the choice of which bbarchiver to
pass to the function, rather than by the values of sizeonly, manifest,
and spcoid. That's not night and day, but I think it's better,
especially as you imagine adding more features in the future. The
really important part, for me, is that you can make the bbarchiver do
anything you like without needing to make any more changes to this
function. It just arranges to invoke your callbacks. You take it from
there.
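
A sketch of that point (the interface and the hard-coded "directory" below are invented for illustration; the traversal only arranges to invoke callbacks, so what happens to each file is decided entirely by which archiver is passed in):

```c
#include <assert.h>
#include <stddef.h>

typedef struct bbarchiver
{
    void        (*archive_file) (struct bbarchiver *a,
                                 const char *path, size_t size);
    /* per-implementation state would follow */
} bbarchiver;

/* Knows what to visit, not what to do with it. */
static void
archive_directory(bbarchiver *archiver)
{
    static const struct
    {
        const char *path;
        size_t      size;
    }           files[] = {
        {"global/pg_control", 8192},
        {"base/1/1259", 90112},
    };

    for (size_t i = 0; i < sizeof(files) / sizeof(files[0]); i++)
        archiver->archive_file(archiver, files[i].path, files[i].size);
}

/* A "tarsize"-style archiver: sums sizes without reading anything. */
typedef struct size_archiver
{
    bbarchiver  base;           /* must be first */
    size_t      total;
} size_archiver;

static void
size_archive_file(bbarchiver *a, const char *path, size_t size)
{
    (void) path;
    ((size_archiver *) a)->total += size;
}
```

Passing a tar-writing archiver instead of the size-summing one would change what happens to every file without touching archive_directory at all.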

One pretty major question that this patch set doesn't address is what
the user interface for any of the hypothetical features mentioned
above ought to look like, or how basebackup.c ought to support them.
The syntax for the BASE_BACKUP command, like the contents of
basebackup.c, has grown organically, and doesn't seem to be very
scalable. Also, the wire protocol - a series of CopyData results which
the client is entirely responsible for knowing how to interpret and
about which the server provides only minimal information - doesn't
much lend itself to extensibility. Some careful design work is likely
needed in both areas, and this patch does not try to do any of it. I
am quite interested in discussing those questions, but I felt that
they weren't the most important problems to solve first.

What do you all think?

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] http://postgr.es/m/CA+TgmoZubLXYR+Pd_gi3MVgyv5hQdLm-GBrVXkun-Lewaw12Kg@mail.gmail.com
[2] http://postgr.es/m/CA+TgmoYr7+-0_vyQoHbTP5H3QGZFgfhnrn6ewDteF=kUqkG=Fw@mail.gmail.com
[3] http://postgr.es/m/CA+TgmoZQCoCyPv6fGoovtPEZF98AXCwYDnSB0=p5XtxNY68r_A@mail.gmail.com
and following
[4] http://postgr.es/m/CA+TgmoYq+59SJ2zBbP891ngWPA9fymOqntqYcweSDYXS2a620A@mail.gmail.com
[5] http://postgr.es/m/CA+TgmobWbfReO9-XFk8urR1K4wTNwqoHx_v56t7=T8KaiEoKNw@mail.gmail.com



Re: refactoring basebackup.c

From
Robert Haas
Date:

Re: refactoring basebackup.c

From
Andres Freund
Date:
Hi,

On 2020-05-08 16:53:09 -0400, Robert Haas wrote:
> They represent closely-related concepts, so much so that I initially
> thought we could get by with just one new abstraction layer. I found
> on experimentation that this did not work well, so I split it up into
> two and that worked a lot better. The distinction is this: a bbsink is
> something to which you can send a bunch of archives -- currently, each
> would be a tarfile -- and also a backup manifest. A bbarchiver is
> something to which you send every file in the data directory
> individually, or at least the ones that are getting backed up, plus
> any that are being injected into the backup (e.g. the backup_label).
> Commonly, a bbsink will do something with the data and then forward it
> to a subsequent bbsink, or a bbarchiver will do something with the
> data and then forward it to a subsequent bbarchiver or bbsink. For
> example, there's a bbarchiver_tar object which, like any bbarchiver,
> sees all the files and their contents as input. The output is a
> tarfile, which gets sent to a bbsink. As things stand in the patch set
> now, the tar archives are ultimately sent to the "libpq" bbsink, which
> sends them to the client.

Hm.

I wonder if there's cases where recursively forwarding like this will
cause noticeable performance effects. The only operation that seems
frequent enough to potentially be noticeable would be "chunks" of the
file. So perhaps it'd be good to make sure we read in large enough
chunks?

> 0010 invents two new bbarchivers, a tar bbarchiver and a tarsize
> bbarchiver, and refactors basebackup.c to make use of them. The tar
> bbarchiver puts the files it sees into tar archives and forwards the
> resulting archives to a bbsink. The tarsize bbarchiver is used to
> support the PROGRESS option to the BASE_BACKUP command. It just
> estimates the size of the backup by summing up the file sizes without
> reading them. This approach is good for a couple of reasons. First,
> without something like this, it's impossible to keep basebackup.c from
> knowing something about the tar format, because the PROGRESS option
> doesn't just figure out how big the files to be backed up are: it
> figures out how big it thinks the archives will be, and that involves
> tar-specific considerations.

ISTM that it's not actually good to have the progress calculations
include the tar overhead. As you say:

> This area needs more work, as the whole idea of measuring progress by
> estimating the archive size is going to break down as soon as
> server-side compression is in the picture.

This, to me, indicates that we should measure the progress solely based
on how much of the "source" data was processed. The overhead of tar, the
reduction due to compression, shouldn't be included.


> What do you all think?

I've not thought enough about the specifics, but I think it looks like
it's going roughly in a better direction.

One thing I wonder about is how stateful the interface is. Archivers
will pretty much always track which file is currently open etc. Somehow
such a repeating state machine seems a bit ugly - but I don't really
have a better answer.

Greetings,

Andres Freund



Re: refactoring basebackup.c

From
Robert Haas
Date:
On Fri, May 8, 2020 at 5:27 PM Andres Freund <andres@anarazel.de> wrote:
> I wonder if there's cases where recursively forwarding like this will
> cause noticeable performance effects. The only operation that seems
> frequent enough to potentially be noticeable would be "chunks" of the
> file. So perhaps it'd be good to make sure we read in large enough
> chunks?

Yeah, that needs to be tested. Right now the chunk size is 32kB but it
might be a good idea to go larger. Another thing is that right now the
chunk size is tied to the protocol message size, and I'm not sure
whether the size that's optimal for disk reads is also optimal for
protocol messages.

> This, to me, indicates that we should measure the progress solely based
> on how much of the "source" data was processed. The overhead of tar, the
> reduction due to compression, shouldn't be included.

I don't think it's a particularly bad thing that we include a small
amount of progress for sending an empty file, a directory, or a
symlink. That could make the results more meaningful if you have a
database with lots of empty relations in it. However, I agree that the
effect of compression shouldn't be included. To get there, I think we
need to redesign the wire protocol. Right now, the server has no way
of letting the client know how many uncompressed bytes it's sent, and
the client has no way of figuring it out without uncompressing, which
seems like something we want to avoid.

There are some other problems with the current wire protocol, too:

1. The syntax for the BASE_BACKUP command is large and unwieldy. We
really ought to adopt an extensible options syntax, like COPY, VACUUM,
EXPLAIN, etc. do, rather than using a zillion ad-hoc bolt-ons, each
with bespoke lexer and parser support.

2. The client is sent a list of tablespaces and is supposed to use
that to expect an equal number of archives, computing the name for
each one on the client side from the tablespace info. However, I think
we should be able to support modes like "put all the tablespaces in a
single archive" or "send a separate archive for every 256GB" or "ship
it all to the cloud and don't send me any archives". To get there, I
think we should have the server send the archive name to the clients,
and the client should just keep receiving the next archive until it's
told that there are no more. Then if there's one archive or ten
archives or no archives, the client doesn't have to care. It just
receives what the server sends until it hears that there are no more.
It also doesn't know how the server is naming the archives; the server
can, for example, adjust the archive names based on which compression
format is being chosen, without knowledge of the server's naming
conventions needing to exist on the client side.

I think we should keep support for the current BASE_BACKUP command but
either add a new variant using extensible options, or else invent a
whole new command with a different name (BACKUP, SEND_BACKUP,
whatever) that takes extensible options. This command should send back
all the archives and the backup manifest using a single COPY stream
rather than multiple COPY streams. Within the COPY stream, we'll
invent a sub-protocol, e.g. based on the first letter of the message,
e.g.:

t = Tablespace boundary. No further message payload. Indicates, for
progress reporting purposes, that we are advancing to the next
tablespace.
f = Filename. The remainder of the message payload is the name of the
next file that will be transferred.
d = Data. The next four bytes contain the number of uncompressed bytes
covered by this message, for progress reporting purposes. The rest of
the message is payload, possibly compressed. Could be empty, if the
data is being shipped elsewhere, and these messages are only being
sent to update the client's notion of progress.
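
Under that framing, a 'd' message could be built and consumed roughly like this. This is only an illustration of the proposal, not an implemented wire format, and the byte order (big-endian, as elsewhere in the protocol) is my assumption:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* 'd' message: one type byte, a four-byte count of uncompressed bytes
 * covered, then the (possibly compressed) payload. */

/* Server side: build a 'd' message; returns the total message length. */
static size_t
build_data_msg(char *buf, uint32_t uncompressed_len,
               const char *payload, size_t payload_len)
{
    buf[0] = 'd';
    buf[1] = (char) (uncompressed_len >> 24);
    buf[2] = (char) (uncompressed_len >> 16);
    buf[3] = (char) (uncompressed_len >> 8);
    buf[4] = (char) uncompressed_len;
    memcpy(buf + 5, payload, payload_len);
    return 5 + payload_len;
}

/* Client side: update progress without touching the payload at all. */
static uint32_t
read_uncompressed_len(const char *msg)
{
    return ((uint32_t) (unsigned char) msg[1] << 24) |
           ((uint32_t) (unsigned char) msg[2] << 16) |
           ((uint32_t) (unsigned char) msg[3] << 8) |
           ((uint32_t) (unsigned char) msg[4]);
}
```

Note that the client can track uncompressed progress from the length field alone, even when the payload is compressed or empty.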

> I've not thought enough about the specifics, but I think it looks like
> it's going roughly in a better direction.

Good to hear.

> One thing I wonder about is how stateful the interface is. Archivers
> will pretty much always track which file is currently open etc. Somehow
> such a repeating state machine seems a bit ugly - but I don't really
> have a better answer.

I thought about that a bit, too. There might be some way to unify that
by having some common context object that's defined by basebackup.c
and all archivers get it, so that they have some commonly-desired
details without needing bespoke code, but I'm not sure at this point
whether that will actually produce a nicer result. Even if we don't
have it initially, it seems like it wouldn't be very hard to add it
later, so I'm not too stressed about it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: refactoring basebackup.c

From
Sumanta Mukherjee
Date:
Hi Robert,
Please see my comments inline below.

On Tue, May 12, 2020 at 12:33 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Yeah, that needs to be tested. Right now the chunk size is 32kB but it
> might be a good idea to go larger. Another thing is that right now the
> chunk size is tied to the protocol message size, and I'm not sure
> whether the size that's optimal for disk reads is also optimal for
> protocol messages.

One needs to balance the number of packets sent across the network, so
if the disk read size and the network packet size could be unified, it
might provide a better optimization.
 

> I don't think it's a particularly bad thing that we include a small
> amount of progress for sending an empty file, a directory, or a
> symlink. That could make the results more meaningful if you have a
> database with lots of empty relations in it. However, I agree that the
> effect of compression shouldn't be included. To get there, I think we
> need to redesign the wire protocol. Right now, the server has no way
> of letting the client know how many uncompressed bytes it's sent, and
> the client has no way of figuring it out without uncompressing, which
> seems like something we want to avoid.


I agree here too, except that one might cringe if we have too many tar
files, but sending the extra amount from these tar files looks okay to me.
 
> There are some other problems with the current wire protocol, too:
>
> 1. The syntax for the BASE_BACKUP command is large and unwieldy. We
> really ought to adopt an extensible options syntax, like COPY, VACUUM,
> EXPLAIN, etc. do, rather than using a zillion ad-hoc bolt-ons, each
> with bespoke lexer and parser support.
>
> 2. The client is sent a list of tablespaces and is supposed to use
> that to expect an equal number of archives, computing the name for
> each one on the client side from the tablespace info. However, I think
> we should be able to support modes like "put all the tablespaces in a
> single archive" or "send a separate archive for every 256GB" or "ship
> it all to the cloud and don't send me any archives". To get there, I
> think we should have the server send the archive name to the clients,
> and the client should just keep receiving the next archive until it's
> told that there are no more. Then if there's one archive or ten
> archives or no archives, the client doesn't have to care. It just
> receives what the server sends until it hears that there are no more.
> It also doesn't know how the server is naming the archives; the server
> can, for example, adjust the archive names based on which compression
> format is being chosen, without knowledge of the server's naming
> conventions needing to exist on the client side.

One thing to remember here is that a balance would need to be struck
between the number of options we provide and people coming back to ask
which combinations do not work. For example, if a user script says "put
all the tablespaces in a single archive" and somebody later changes the
script to break it down at 256GB, there is a conflict, and which one
takes precedence needs to be decided. When the number of options like
this becomes very large, it could lead to some complications.
  
> I think we should keep support for the current BASE_BACKUP command but
> either add a new variant using extensible options, or else invent a
> whole new command with a different name (BACKUP, SEND_BACKUP,
> whatever) that takes extensible options. This command should send back
> all the archives and the backup manifest using a single COPY stream
> rather than multiple COPY streams. Within the COPY stream, we'll
> invent a sub-protocol, e.g. based on the first letter of the message,
> e.g.:
>
> t = Tablespace boundary. No further message payload. Indicates, for
> progress reporting purposes, that we are advancing to the next
> tablespace.
> f = Filename. The remainder of the message payload is the name of the
> next file that will be transferred.
> d = Data. The next four bytes contain the number of uncompressed bytes
> covered by this message, for progress reporting purposes. The rest of
> the message is payload, possibly compressed. Could be empty, if the
> data is being shipped elsewhere, and these messages are only being
> sent to update the client's notion of progress.
 
I support this.
 
> I thought about that a bit, too. There might be some way to unify that
> by having some common context object that's defined by basebackup.c
> and all archivers get it, so that they have some commonly-desired
> details without needing bespoke code, but I'm not sure at this point
> whether that will actually produce a nicer result. Even if we don't
> have it initially, it seems like it wouldn't be very hard to add it
> later, so I'm not too stressed about it.
 
--
Sumanta Mukherjee
The Enterprise PostgreSQL Company


Re: refactoring basebackup.c

From
Dilip Kumar
Date:
On Sat, May 9, 2020 at 2:23 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> Hi,
>
> I'd like to propose a fairly major refactoring of the server's
> basebackup.c. The current code isn't horrific or anything, but the
> base backup mechanism has grown quite a few features over the years
> and all of the code knows about all of the features. This is going to
> make it progressively more difficult to add additional features, and I
> have a few in mind that I'd like to add, as discussed below and also
> on several other recent threads.[1][2] The attached patch set shows
> what I have in mind. It needs more work, but I believe that there's
> enough here for someone to review the overall direction, and even some
> of the specifics, and hopefully give me some useful feedback.
>
> This patch set is built around the idea of creating two new
> abstractions, a base backup sink -- or bbsink -- and a base backup
> archiver -- or bbarchiver.  Each of these works like a foreign data
> wrapper or custom scan or TupleTableSlot. That is, there's a table of
> function pointers that act like method callbacks. Every implementation
> can allocate a struct of sufficient size for its own bookkeeping data,
> and the first member of the struct is always the same, and basically
> holds the data that all implementations must store, including a
> pointer to the table of function pointers. If we were using C++,
> bbarchiver and bbsink would be abstract base classes.
>
> They represent closely-related concepts, so much so that I initially
> thought we could get by with just one new abstraction layer. I found
> on experimentation that this did not work well, so I split it up into
> two and that worked a lot better. The distinction is this: a bbsink is
> something to which you can send a bunch of archives -- currently, each
> would be a tarfile -- and also a backup manifest. A bbarchiver is
> something to which you send every file in the data directory
> individually, or at least the ones that are getting backed up, plus
> any that are being injected into the backup (e.g. the backup_label).
> Commonly, a bbsink will do something with the data and then forward it
> to a subsequent bbsink, or a bbarchiver will do something with the
> data and then forward it to a subsequent bbarchiver or bbsink. For
> example, there's a bbarchiver_tar object which, like any bbarchiver,
> sees all the files and their contents as input. The output is a
> tarfile, which gets send to a bbsink. As things stand in the patch set
> now, the tar archives are ultimately sent to the "libpq" bbsink, which
> sends them to the client.
>
> In the future, we could have other bbarchivers. For example, we could
> add "pax", "zip", or "cpio" bbarchiver which produces archives of that
> format, and any given backup could choose which one to use. Or, we
> could have a bbarchiver that runs each individual file through a
> compression algorithm and then forwards the resulting data to a
> subsequent bbarchiver. That would make it easy to produce a tarfile of
> individually compressed files, which is one possible way of creating a
> seekable achive.[3] Likewise, we could have other bbsinks. For
> example, we could have a "localdisk" bbsink that cause the server to
> write the backup somewhere in the local filesystem instead of
> streaming it out over libpq. Or, we could have an "s3" bbsink that
> writes the archives to S3. We could also have bbsinks that compresses
> the input archives using some compressor (e.g. lz4, zstd, bzip2, ...)
> and forward the resulting compressed archives to the next bbsink in
> the chain. I'm not trying to pass judgement on whether any of these
> particular things are things we want to do, nor am I saying that this
> patch set solves all the problems with doing them. However, I believe
> it will make such things a whole lot easier to implement, because all
> of the knowledge about whatever new functionality is being added is
> centralized in one place, rather than being spread across the entirety
> of basebackup.c. As an example of this, look at how 0010 changes
> basebackup.c and basebackup_tar.c: afterwards, basebackup.c no longer
> knows anything that is tar-specific, whereas right now it knows about
> tar-specific things in many places.
>
> Here's an overview of this patch set:
>
> 0001-0003 are cleanup patches that I have posted for review on
> separate threads.[4][5] They are included here to make it easy to
> apply this whole series if someone wishes to do so.
>
> 0004 is a minor refactoring that reduces by 1 the number of functions
> in basebackup.c that know about the specifics of tarfiles. It is just
> a preparatory patch and probably not very interesting.
>
> 0005 invents the bbsink abstraction.
>
> 0006 creates basebackup_libpq.c and moves all code that knows about
> the details of sending archives via libpq there. The functionality is
> exposed for use by basebackup.c as a new type of bbsink, bbsink_libpq.
>
> 0007 creates basebackup_throttle.c and moves all code that knows about
> throttling backups there. The functionality is exposed for use by
> basebackup.c as a new type of bbsink, bbsink_throttle. This means that
> the throttling logic could be reused to throttle output to any final
> destination. Essentially, this is a bbsink that just passes everything
> it gets through to the next bbsink, but with a rate limit. If
> throttling's not enabled, no bbsink_throttle object is created, so all
> of the throttling code is completely out of the execution pipeline.
>
> 0008 creates basebackup_progress.c and moves all code that knows about
> progress reporting there. The functionality is exposed for use by
> basebackup.c as a new type of bbsink, bbsink_progress. Since the
> abstraction doesn't fit perfectly in this case, some extra functions
> are added to work around the problem. This is not entirely elegant,
> but I don't think it's still an improvement over what we have now, and
> I don't have a better idea.
>
> 0009 invents the bbarchiver abstraction.
>
> 0010 invents two new bbarchivers, a tar bbarchiver and a tarsize
> bbarchiver, and refactors basebackup.c to make use of them. The tar
> bbarchiver puts the files it sees into tar archives and forwards the
> resulting archives to a bbsink. The tarsize bbarchiver is used to
> support the PROGRESS option to the BASE_BACKUP command. It just
> estimates the size of the backup by summing up the file sizes without
> reading them. This approach is good for a couple of reasons. First,
> without something like this, it's impossible to keep basebackup.c from
> knowing something about the tar format, because the PROGRESS option
> doesn't just figure out how big the files to be backed up are: it
> figures out how big it thinks the archives will be, and that involves
> tar-specific considerations. This area needs more work, as the whole
> idea of measuring progress by estimating the archive size is going to
> break down as soon as server-side compression is in the picture.
> Second, this makes the code path that we use for figuring out the
> backup size details much more similar to the path we use for
> performing the actual backup. For instance, with this patch, we
> include the exact same files in the calculation that we will include
> in the backup, and in the same order, something that's not true today.
> The basebackup_tar.c file added by this patch is sadly lacking in
> comments, which I will add in a future version of the patch set. I
> think, though, that it will not be too unclear what's going on here.
>
> 0011 invents another new kind of bbarchiver. This bbarchiver just
> eavesdrops on the stream of files to facilitate backup manifest
> construction, and then forwards everything through to a subsequent
> bbarchiver. Like bbsink_throttle, it can be entirely omitted if not
> used. This patch is a bit clunky at the moment and needs some polish,
> but it is another demonstration of how these abstractions can be used
> to simplify basebackup.c, so that basebackup.c only has to worry about
> determining what should be backed up and not have to worry much about
> all the specific things that need to be done as part of that.
>
> Although this patch set adds quite a bit of code on net, it makes
> basebackup.c considerably smaller and simpler, removing more than 400
> lines of code from that file, about 20% of the current total. There
> are some gratifying changes vs. the status quo. For example, in
> master, we have this:
>
> sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
>                 bool sendtblspclinks, backup_manifest_info *manifest,
>                 const char *spcoid)
>
> Notably, the sizeonly flag makes the function not do what the name of
> the function suggests that it does. Also, we've got to pass some extra
> fields through to enable specific features. With the patch set, the
> equivalent function looks like this:
>
> archive_directory(bbarchiver *archiver, const char *path, int basepathlen,
>                                   List *tablespaces, bool sendtblspclinks)
>
> The question "what should I do with the directories and files we find
> as we recurse?" is now answered by the choice of which bbarchiver to
> pass to the function, rather than by the values of sizeonly, manifest,
> and spcoid. That's not night and day, but I think it's better,
> especially as you imagine adding more features in the future. The
> really important part, for me, is that you can make the bbarchiver do
> anything you like without needing to make any more changes to this
> function. It just arranges to invoke your callbacks. You take it from
> there.
>
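The dispatch described in the quoted paragraph is the standard function-pointer-table idiom. Here is a minimal, self-contained sketch of that pattern (all names invented for illustration; the real bbarchiver API is richer):

```c
#include <stdint.h>

/* Method table: what to do with each file found during the walk. */
typedef struct archiver_ops
{
    void        (*archive_file) (void *archiver, const char *path,
                                 uint64_t size);
} archiver_ops;

/* Common header; every implementation embeds this as its first member,
 * so a pointer to the implementation is also a pointer to this. */
typedef struct archiver
{
    const archiver_ops *ops;
} archiver;

/* A "size only" implementation: it just totals the file sizes. */
typedef struct sizeonly_archiver
{
    archiver    base;
    uint64_t    total;
} sizeonly_archiver;

void
sizeonly_archive_file(void *a, const char *path, uint64_t size)
{
    (void) path;                /* a real archiver would use it */
    ((sizeonly_archiver *) a)->total += size;
}

const archiver_ops sizeonly_ops = {sizeonly_archive_file};

/* The directory walker needs no sizeonly/manifest/spcoid parameters:
 * it just invokes the callbacks of whatever archiver it was handed. */
void
archive_directory_demo(archiver *a, const char **paths,
                       const uint64_t *sizes, int n)
{
    for (int i = 0; i < n; i++)
        a->ops->archive_file(a, paths[i], sizes[i]);
}
```

The walker stays fixed while behavior varies: a manifest-building or tar-writing archiver is just another ops table, with no new flags threaded through the traversal.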
> One pretty major question that this patch set doesn't address is what
> the user interface for any of the hypothetical features mentioned
> above ought to look like, or how basebackup.c ought to support them.
> The syntax for the BASE_BACKUP command, like the contents of
> basebackup.c, has grown organically, and doesn't seem to be very
> scalable. Also, the wire protocol - a series of CopyData results which
> the client is entirely responsible for knowing how to interpret and
> about which the server provides only minimal information - doesn't
> much lend itself to extensibility. Some careful design work is likely
> needed in both areas, and this patch does not try to do any of it. I
> am quite interested in discussing those questions, but I felt that
> they weren't the most important problems to solve first.
>
> What do you all think?

The overall idea looks quite nice.  I had a look at some of the
patches, at least 0005 and 0006.  At first look, I have one comment.

+/*
+ * Each archive is set as a separate stream of COPY data, and thus begins
+ * with a CopyOutResponse message.
+ */
+static void
+bbsink_libpq_begin_archive(bbsink *sink, const char *archive_name)
+{
+ SendCopyOutResponse();
+}

Some of the bbsink_libpq_* functions are directly calling pq_* e.g.
bbsink_libpq_begin_backup whereas others are calling SendCopy*
functions and therein those are calling pq_* functions.  I think
bbsink_libpq_* function can directly call pq_* functions instead of
adding one more level of the function call.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Robert Haas
Date:
On Tue, May 12, 2020 at 4:32 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> Some of the bbsink_libpq_* functions are directly calling pq_* e.g.
> bbsink_libpq_begin_backup whereas others are calling SendCopy*
> functions and therein those are calling pq_* functions.  I think
> bbsink_libpq_* function can directly call pq_* functions instead of
> adding one more level of the function call.

I think all the helper functions have more than one caller, though.
That's why I created them - to avoid duplicating code.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: refactoring basebackup.c

From
Dilip Kumar
Date:
On Wed, May 13, 2020 at 1:56 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, May 12, 2020 at 4:32 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > Some of the bbsink_libpq_* functions are directly calling pq_* e.g.
> > bbsink_libpq_begin_backup whereas others are calling SendCopy*
> > functions and therein those are calling pq_* functions.  I think
> > bbsink_libpq_* function can directly call pq_* functions instead of
> > adding one more level of the function call.
>
> I think all the helper functions have more than one caller, though.
> That's why I created them - to avoid duplicating code.

You are right, somehow I missed that part.  Sorry for the noise.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Suraj Kharage
Date:
Hi,

Did some performance testing by varying TAR_SEND_SIZE with Robert's refactor patch and without the patch to check the impact.

Below are the details:

Backup type: local backup using pg_basebackup
Data size: Around 200GB (200 tables - each table around 1.05 GB)
Different TAR_SEND_SIZE values: 8kB, 32kB (default value), 128kB, 1MB (1024kB)

Server details:
RAM: 500 GB
CPU: Architecture x86_64, op-mode(s) 32-bit/64-bit, Byte Order Little Endian, 128 CPUs
Filesystem: ext4

                          8kB               32kB (default)    128kB             1024kB
Without refactor patch    real 10m22.718s   real 8m36.245s    real 6m54.299s    real 18m3.511s
                          user 1m23.629s    user 1m8.471s     user 0m55.690s    user 1m38.197s
                          sys  8m51.410s    sys  7m21.520s    sys  5m46.502s    sys  9m36.517s
With refactor patch       real 10m11.350s   real 8m56.226s    real 7m26.678s    real 18m17.230s
(Robert's patch)          user 1m25.038s    user 1m9.774s     user 0m54.833s    user 1m42.749s
                          sys  8m39.226s    sys  7m41.032s    sys  6m20.057s    sys  9m53.704s

The above numbers are taken from the minimum of two runs of each scenario.

I can see that when we have TAR_SEND_SIZE as 32kB or 128kB, it gives us good performance, whereas for 1MB it takes 2.5x more time.

Please let me know your thoughts/suggestions on the same.

--

Thanks & Regards, 
Suraj kharage, 
EnterpriseDB Corporation, 
The Postgres Database Company.

Re: refactoring basebackup.c

From
Sumanta Mukherjee
Date:
Hi Suraj,

Two points I wanted to mention.

  1. The max transfer rate when TAR_SEND_SIZE is 128kB is at most 0.48 GB/sec. Is there a way to find out what buffer size is being used? That could help explain part of the puzzle.
  2. Secondly, taking just the minimum of two runs is a bit questionable. How do we justify the performance numbers and establish that the differences are not just noise? It might be better to do several runs of each kind, fit a basic linear model, and report the standard deviation. Order statistics such as min(X1, X2, ..., Xn) are generally biased estimators; a variance calculation for a biased statistic is tricky, so the results could be corrupted by noise.

With Regards,
Sumanta Mukherjee.


On Wed, May 13, 2020 at 9:31 AM Suraj Kharage <suraj.kharage@enterprisedb.com> wrote:
> Did some performance testing by varying TAR_SEND_SIZE with Robert's refactor patch and without the patch to check the impact.
> [...]
> The above numbers are taken from the minimum of two runs of each scenario.
>
> I can see, when we have TAR_SEND_SIZE as 32kB or 128kB, it is giving us good performance whereas for 1MB it is taking 2.5x more time.

Re: refactoring basebackup.c

From
Robert Haas
Date:
On Wed, May 13, 2020 at 12:01 AM Suraj Kharage <suraj.kharage@enterprisedb.com> wrote:
> [...]
> The above numbers are taken from the minimum of two runs of each scenario.
>
> I can see, when we have TAR_SEND_SIZE as 32kB or 128kB, it is giving us good performance whereas for 1MB it is taking 2.5x more time.
>
> Please let me know your thoughts/suggestions on the same.

So the patch came out slightly faster at 8kB and slightly slower in the other tests. That's kinda strange. I wonder if it's just noise. How much do the results vary run to run?

I would've expected (and I think Andres thought the same) that a smaller block size would be bad for the patch and a larger block size would be good, but that's not what these numbers show.

I wouldn't worry too much about the regression at 1MB. Probably what's happening there is that we're losing some concurrency - perhaps with smaller block sizes the OS can buffer the entire chunk in the pipe connecting pg_basebackup to the server and start on the next one, but when you go up to 1MB it doesn't fit and ends up spending a lot of time waiting for data to be read from the pipe. Wait event profiling might tell you more. Probably what this suggests is that you want the largest buffer size that doesn't cause you to overrun the network/pipe buffer and no larger. Unfortunately, I have no idea how we'd figure that out dynamically, and I don't see a reason to believe that everyone will have the same size buffers.
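As an aside on the "how big is the pipe buffer" question: on Linux specifically, a pipe's capacity can be queried with fcntl, which gives a flavor of what a dynamic probe might look like. This is a sketch under that Linux-only assumption; it does not cover sockets or other platforms, which is part of why doing this portably is hard:

```c
#define _GNU_SOURCE             /* for F_GETPIPE_SZ on Linux */
#include <fcntl.h>
#include <unistd.h>

/* Return the kernel buffer capacity of a pipe, or -1 if it cannot be
 * determined (non-Linux platforms, or fcntl failure). */
long
pipe_buffer_size(int fd)
{
#ifdef F_GETPIPE_SZ
    return fcntl(fd, F_GETPIPE_SZ);
#else
    (void) fd;
    return -1;
#endif
}
```

Even where this works, the matching question for the network path (socket send buffers, intermediate hops) has no equally direct answer, so a portable auto-tuning scheme remains elusive.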
 
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Re: refactoring basebackup.c

From
Suraj Kharage
Date:
Hi,

On Wed, May 13, 2020 at 7:49 PM Robert Haas <robertmhaas@gmail.com> wrote:

So the patch came out slightly faster at 8kB and slightly slower in the other tests. That's kinda strange. I wonder if it's just noise. How much do the results vary run to run?
It is not varying much except for the 8kB run. Please see the details below for both runs of each scenario.

Without refactor patch:
                8kB                32kB (default)     128kB              1024kB
1st run         real 10m50.924s    real 8m36.245s     real 7m8.690s      real 18m16.898s
                user 1m29.774s     user 1m8.471s      user 0m54.840s     user 1m39.105s
                sys  9m13.058s     sys  7m21.520s     sys  6m1.725s      sys  9m42.803s
2nd run         real 10m22.718s    real 8m44.455s     real 6m54.299s     real 18m3.511s
                user 1m23.629s     user 1m7.896s      user 0m55.690s     user 1m38.197s
                sys  8m51.410s     sys  7m28.909s     sys  5m46.502s     sys  9m36.517s

With refactor patch:
                8kB                32kB (default)     128kB              1024kB
1st run         real 10m11.350s    real 8m56.226s     real 7m26.678s     real 19m5.218s
                user 1m25.038s     user 1m9.774s      user 0m54.833s     user 1m44.122s
                sys  8m39.226s     sys  7m41.032s     sys  6m20.057s     sys  10m17.623s
2nd run         real 11m30.500s    real 9m4.103s      real 7m26.713s     real 18m17.230s
                user 1m45.221s     user 1m6.893s      user 0m54.868s     user 1m42.749s
                sys  9m37.815s     sys  7m49.393s     sys  6m19.652s     sys  9m53.704s

--

Thanks & Regards, 
Suraj kharage, 
EnterpriseDB Corporation, 
The Postgres Database Company.

Re: refactoring basebackup.c

From
Dipesh Pandit
Date:
Hi,

I have repeated the experiment with 8K block size and found that the results are not varying much after applying the patch.
Please find the details below.

Backup type: local backup using pg_basebackup
Data size: Around 200GB (200 tables - each table around 1.05 GB)
TAR_SEND_SIZE value: 8kB

Server details:
RAM: 500 GB
CPU: Architecture x86_64, op-mode(s) 32-bit/64-bit, Byte Order Little Endian, 128 CPUs
Filesystem: ext4

Results:

Iteration    Without refactor patch    With refactor patch
1st run      real 10m19.001s           real 9m45.291s
             user 1m37.895s            user 1m23.192s
             sys  8m33.008s            sys  8m14.993s
2nd run      real 9m33.970s            real 9m30.560s
             user 1m19.490s            user 1m22.124s
             sys  8m6.062s             sys  8m0.979s
3rd run      real 9m19.327s            real 8m59.241s
             user 1m21.772s            user 1m19.001s
             sys  7m50.613s            sys  7m32.645s
4th run      real 9m56.873s            real 9m52.290s
             user 1m22.370s            user 1m22.175s
             sys  8m27.054s            sys  8m23.052s
5th run      real 9m45.343s            real 9m49.633s
             user 1m23.113s            user 1m23.122s
             sys  8m15.418s            sys  8m19.240s

Later I connected with Suraj to validate the experiment details and found that the setup and steps followed are exactly the same in this
experiment when compared with the previous experiment.

Thanks,
Dipesh

On Thu, May 14, 2020 at 7:50 AM Suraj Kharage <suraj.kharage@enterprisedb.com> wrote:
> On Wed, May 13, 2020 at 7:49 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > So the patch came out slightly faster at 8kB and slightly slower in the other tests. That's kinda strange. I wonder if it's just noise. How much do the results vary run to run?
>
> It is not varying much except for the 8kB run. Please see below details for both runs of each scenario.
> [...]

Re: refactoring basebackup.c

From
Suraj Kharage
Date:

On Tue, Jun 30, 2020 at 10:45 AM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
Hi,

I have repeated the experiment with 8K block size and found that the results are not varying much after applying the patch.
Please find the details below.


Later I connected with Suraj to validate the experiment details and found that the setup and steps followed are exactly the same in this
experiment when compared with the previous experiment.


Thanks Dipesh.
It looks like the results are not varying much with your run as you followed the same steps.
One of my runs with 8kB that took more time than the others might have been affected by noise at that time.

--

Thanks & Regards, 
Suraj kharage, 

Re: refactoring basebackup.c

From
Robert Haas
Date:
On Fri, May 8, 2020 at 4:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
> So it might be good if I'd remembered to attach the patches. Let's try
> that again.

Here's an updated patch set. This is now rebased over master and
includes as 0001 the patch I posted separately at
http://postgr.es/m/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com
but drops some other patches that were committed meanwhile. 0002-0009
of this series are basically the same as 0004-0011 from the previous
series, except for rebasing and fixing a bug I discovered in what's
now 0006. 0012 does a refactoring of pg_basebackup along similar lines
to the server-side refactoring from patches earlier in the series.
0012 is a really terrible, hacky, awful demonstration of how this
infrastructure can support server-side compression. If you apply it
and take a tar-format backup without -R, you will get .tar files that
are actually .tar.gz files. You can rename them, decompress them, and
use pg_verifybackup to check that everything is OK. If you try to do
anything else with 0012 applied, everything will break.

In the process of working on this, I learned a lot about how
pg_basebackup actually works, and found out about a number of things
that, with the benefit of hindsight, seem like they might not have
been the best way to go.

1. pg_basebackup -R injects recovery.conf (on older versions) or
injects standby.signal and appends to postgresql.auto.conf (on newer
versions) by parsing the tar file sent by the server and editing it on
the fly. From the point of view of server-side compression, this is
not ideal, because if you want to make these kinds of changes when
server-side compression is in use, you'd have to decompress the stream
on the client side in order to figure out where in the stream you ought
to inject your changes. But having to do that is a major expense. If
the client instead told the server what to change when generating the
archive, and the server did it, this expense could be avoided. It
would have the additional advantage that the backup manifest could
reflect the effects of those changes; right now it doesn't, and
pg_verifybackup just knows to expect differences in those files.

2. According to the comments, some tar programs require two tar blocks
(i.e. 512-byte blocks) of zero bytes at the end of an archive. The
server does not generate these blocks of zero bytes, so it basically
creates a tar file that works fine with my copy of tar but might break
with somebody else's. Instead, the client appends 1024 zero bytes to
the end of every file it receives from the server. That is an odd way
of fixing this problem, and it makes things rather inflexible. If the
server sends you any kind of a file OTHER THAN a tar file with the
last 1024 zero bytes stripped off, then adding 1024 zero bytes will be
the wrong thing to do. It would be better if the server just generated
fully correct tar files (whatever we think that means) and the client
wrote out exactly what it got from the server. Then, we could have the
server generate cpio archives or zip files or gzip-compressed tar
files or lz4-compressed tar files or anything we like, and the client
wouldn't really need to care as long as it didn't need to extract
those archives. That seems a lot cleaner.
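For reference, the end-of-archive marker in question is nothing more than two 512-byte blocks of zeros, so a server emitting "fully correct" tar files would simply append them itself. A minimal sketch (a hypothetical helper, not the server's actual code):

```c
#include <stdio.h>
#include <string.h>

#define TAR_BLOCK 512

/* Append the standard tar end-of-archive marker: two 512-byte blocks
 * of zeros. Returns 0 on success, -1 on write failure. */
int
tar_write_trailer(FILE *out)
{
    char        zeros[2 * TAR_BLOCK];

    memset(zeros, 0, sizeof(zeros));
    if (fwrite(zeros, 1, sizeof(zeros), out) != sizeof(zeros))
        return -1;
    return 0;
}
```

With the trailer written server-side, the client can treat every archive as an opaque byte stream regardless of format.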

3. The way that progress reporting works relies on the server knowing
exactly how large the archive sent to the client is going to be.
Progress as reckoned by the client is equal to the number of archive
payload bytes the client has received. This works OK with a tar
because we know how big the tar file is going to be based on the size
of the input files we intend to send, but it's unsuitable for any sort
of compressed archive (tar.gz, zip, whatever) because the compression
ratio cannot be predicted in advance. It would be better if the server
sent the payload bytes (possibly compressed) interleaved with progress
indicators, so that the client could correctly indicate that, say, the
backup is 30% complete because 30GB of 100GB has been processed on the
server side, even though the amount of data actually received by the
client might be 25GB or 20GB or 10GB or whatever because it got
compressed before transmission.

4. A related consideration is that we might want to have an option to
do something with the backup other than send it to the client. For
example, it might be useful to have an option for pg_basebackup to
tell the server to write the backup files to some specified server
directory, or to, say, S3. There are security concerns there, and I'm
not proposing to do anything about this immediately, but it seems like
something we might eventually want to have. In such a case, we are not
going to send any payload to the client, but the client probably still
wants progress indicators, so the current system of coupling progress
to the number of bytes received by the client breaks down for that
reason also.

5. As things stand today, the client must know exactly how many
archives it should expect to receive from the server and what each one
is. It can do that, because it knows to expect one archive per
tablespace, and the archive must be an uncompressed tarfile, so there
is no ambiguity. But, if the server could send archives to other
places, or send other kinds of archives to the client, then this would
become more complex. There is no intrinsic reason why the logic on the
client side can't simply be made more complicated in order to cope,
but it doesn't seem like great design, because then every time you
enhance the server, you've also got to enhance the client, and that
limits cross-version compatibility, and also seems more fragile. I
would rather that the server advertise the number of archives and the
names of each archive to the client explicitly, allowing the client to
be dumb unless it needs to post-process (e.g. extract) those archives.

Putting all of the above together, what I propose - but have not yet
tried to implement - is a new COPY sub-protocol for taking base
backups. Instead of sending a COPY stream per archive, the server
would send a single COPY stream where the first byte of each message
is a type indicator, like we do with the replication sub-protocol
today. For example, if the first byte is 'a' that could indicate that
we're beginning a new archive and the rest of the message would
indicate the archive name and perhaps some flags or options. If the
first byte is 'p' that could indicate that we're sending archive
payload, perhaps with the first four bytes of the message being
progress, i.e. the number of newly-processed bytes on the server side
prior to any compression, and the remaining bytes being payload. On
receipt of such a message, the client would increment the progress
indicator by the value indicated in those first four bytes, and then
process the remaining bytes by writing them to a file or whatever
behavior the user selected via -Fp, -Ft, -Z, etc. To be clear, I'm not
saying that this specific thing is the right thing, just something of
this sort. The server would need to continue supporting the current
multi-copy protocol for compatibility with older pg_basebackup
versions, and pg_basebackup would need to continue to support it for
compatibility with older server versions, but we could use the new
approach going forward. (Or, we could break compatibility, but that
would probably be unpopular and seems unnecessary and even risky to me
at this point.)
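To make the framing concrete, here is a sketch of a client-side dispatcher for the hypothetical sub-protocol above. The message letters and layouts ('a' for archive start, 'p' with a 4-byte big-endian progress prefix) are just the proposal from this message, not an implemented protocol:

```c
#include <stdint.h>
#include <string.h>

/* Client-side state for the hypothetical backup sub-protocol. */
typedef struct backup_stream_state
{
    char        current_archive[256];   /* name from the last 'a' message */
    uint64_t    progress;       /* server-side bytes processed */
} backup_stream_state;

/* Decode a big-endian uint32 as it would arrive on the wire. */
static uint32_t
read_be32(const unsigned char *p)
{
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
        ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Handle one CopyData message. Returns the number of payload bytes the
 * caller should write out (0 for non-payload messages), or -1 for an
 * unrecognized message type. */
int
handle_backup_message(backup_stream_state *state,
                      const unsigned char *msg, int len)
{
    size_t      namelen;

    switch (msg[0])
    {
        case 'a':               /* archive start: rest is the name */
            namelen = (size_t) (len - 1);
            if (namelen > sizeof(state->current_archive) - 1)
                namelen = sizeof(state->current_archive) - 1;
            memset(state->current_archive, 0,
                   sizeof(state->current_archive));
            memcpy(state->current_archive, msg + 1, namelen);
            return 0;
        case 'p':               /* payload: 4-byte progress, then data */
            state->progress += read_be32(msg + 1);
            return len - 5;
        default:
            return -1;
    }
}
```

Note how progress advances by the server-side (pre-compression) figure in each 'p' message, independently of how many payload bytes actually arrive, which is exactly what decouples the progress display from the compression ratio.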

The ideas in the previous paragraph would address #3-#5 directly, but
they also indirectly address #2 because while we're switching
protocols we could easily move the padding with zero bytes to the
server side, and I think we should. #1 is a bit of a separate
consideration. To tackle #1 along the lines proposed above, the client
needs a way to send the recovery.conf contents to the server so that
the server can inject them into the tar file. It's not exactly clear
to me what the best way of permitting this is, and maybe there's a
totally different approach that would be altogether better. One thing
to consider is that we might well want the client to be able to send
*multiple* chunks of data to the server at the start of a backup. For
instance, suppose we want to support incremental backups. I think the
right approach is for the client to send the backup_manifest file from
the previous full backup to the server. What exactly the server does
with it afterward depends on your preferred approach, but the
necessary information is there. Maybe incremental backup is based on
comparing cryptographic checksums, so the server looks at all the
files and sends to the client those where the checksum (hopefully
SHA-something!) does not match. I wouldn't favor this approach myself,
but I know some people like it. Or maybe it's based on finding blocks
modified since the LSN of the previous backup; the manifest has enough
information for that to work, too. In such an approach, there can be
altogether new files with old LSNs, because files can be flat-copied
without changing block LSNs, so it's important to have the complete
list of files from the previous backup, and that too is in the
manifest. There are even timestamps for the bold among you. Anyway, my
point is to advocate for a design where the client says (1) I want a
backup with these options and then (2) here are 0, 1, or >1 files
(recovery parameters and/or backup manifest and/or other things) in
support of that and then the server hands back a stream of archives
which the client may or may not choose to post-process.
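The LSN-based variant can be sketched in a few lines; the manifest layout and function names here are invented for illustration. The key rule from the paragraph above is that a file absent from the previous backup's manifest must be sent in full regardless of its block LSNs, since flat-copied files can carry old LSNs:

```c
#include <stdint.h>
#include <string.h>

/* One entry from the previous backup's manifest (hypothetical layout;
 * the real manifest also carries sizes, checksums and timestamps). */
typedef struct prev_entry
{
    const char *path;
} prev_entry;

/* A file must be copied in full if it was not present in the previous
 * backup: it may have been flat-copied into place, so its block LSNs
 * can be old even though the file is new. */
int
file_in_prev_backup(const prev_entry *prev, int nprev, const char *path)
{
    for (int i = 0; i < nprev; i++)
        if (strcmp(prev[i].path, path) == 0)
            return 1;
    return 0;
}

/* For files that did exist, only blocks stamped at or after the start
 * LSN of the previous backup need to be resent. */
int
block_needs_sending(uint64_t block_lsn, uint64_t prev_start_lsn)
{
    return block_lsn >= prev_start_lsn;
}
```

A real implementation would of course use a hash table rather than a linear scan, but the decision structure is the point here.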

It's tempting to think about solving this problem by appealing to
CopyBoth, but I think that might be the wrong idea. The reason we use
CopyBoth for the replication subprotocol is because there's periodic
messages flowing in both directions that are only loosely coupled to
each other. Apart from reading frequently enough to avoid a deadlock
because both sides have full write buffers, each end of the connection
can kind of do whatever it wants. But for the kinds of use cases I'm
talking about here, that's not so. First the client talks and the
server acknowledges, then the reverse. That doesn't mean we couldn't
use CopyBoth, but maybe a CopyIn followed by a CopyOut would be more
straightforward. I am not sure of the details here and am happy to
hear the ideas of others.

One final thought is that the options framework for pg_basebackup is a
little unfortunate. As of today, what the client receives, always, is
a series of tar files. If you say -Fp, it doesn't change the backup
format; it just extracts the tar files. If you say -Ft, it doesn't. If
you say -Ft but also -Z, it compresses the tar files. Thinking just
about server-side compression and ignoring for the moment more remote
features like alternate archive formats (e.g. zip) or things like
storing the backup to an alternate location rather than returning it
to the client, you probably want the client to be able to specify at
least (1) server-side compression (perhaps with one of several
algorithms) and the client just writes the results, (2) server-side
compression (still with a choice of algorithm) and the client
decompresses but does not extract, (3) server-side compression (still
with a choice of algorithms) and the client decompresses and extracts,
(4) client-side compression (with a choice of algorithms), and (5)
client-side extraction. You might also want (6) server-side
compression (with a choice of algorithms) and client-side decompresses
and then re-compresses with a different algorithm (e.g. lz4 on the
server to save bandwidth at moderate CPU cost into parallel bzip2 on
the client for minimum archival storage). Or, as also discussed
upthread, you might want (7) server-side compression of each file
individually, so that you get a seekable archive of individually
compressed files (e.g. to support fast delta restore). I think that
with these refactoring patches - and the wire protocol redesign
mentioned above - it is very reasonable to offer as many of these
options as we believe users will find useful, but it is not very clear
how to extend the current command-line option framework to support
them. It probably would have been better if pg_basebackup, instead of
having -Fp and -Ft, just had an --extract option that you could either
specify or omit, because that would not have presumed anything about
the archive format, but the existing structure is well-baked at this
point.

Thanks,

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Attachments

Re: refactoring basebackup.c

From
Andres Freund
Date:
Hi,

On 2020-07-29 11:31:26 -0400, Robert Haas wrote:
> Here's an updated patch set. This is now rebased over master and
> includes as 0001 the patch I posted separately at
> http://postgr.es/m/CA+TgmobAczXDRO_Gr2euo_TxgzaH1JxbNxvFx=HYvBinefNH8Q@mail.gmail.com
> but drops some other patches that were committed meanwhile. 0002-0009
> of this series are basically the same as 0004-0011 from the previous
> series, except for rebasing and fixing a bug I discovered in what's
> now 0006. 0012 does a refactoring of pg_basebackup along similar lines
> to the server-side refactoring from patches earlier in the series.

Have you tested whether this still works against older servers? Or do
you think we should not have that as a goal?


> 1. pg_basebackup -R injects recovery.conf (on older versions) or
> injects standby.signal and appends to postgresql.auto.conf (on newer
> versions) by parsing the tar file sent by the server and editing it on
> the fly. From the point of view of server-side compression, this is
> not ideal, because if you want to make these kinds of changes when
> server-side compression is in use, you'd have to decompress the stream
> on the client side in order to figure out where in the stream you ought
> to inject your changes. But having to do that is a major expense. If
> the client instead told the server what to change when generating the
> archive, and the server did it, this expense could be avoided. It
> would have the additional advantage that the backup manifest could
> reflect the effects of those changes; right now it doesn't, and
> pg_verifybackup just knows to expect differences in those files.

Hm. I don't think I terribly like the idea of things like -R having to
be processed server side. That'll be awfully annoying to keep working
across versions, for one. But perhaps the config file should just not be
in the main tar file going forward?

I think we should eventually be able to use one archive for multiple
purposes, e.g. to set up a standby as well as using it for a base
backup. Or multiple standbys with different tablespace remappings.


> 2. According to the comments, some tar programs require two tar blocks
> (i.e. 512-byte blocks) of zero bytes at the end of an archive. The
> server does not generate these blocks of zero bytes, so it basically
> creates a tar file that works fine with my copy of tar but might break
> with somebody else's. Instead, the client appends 1024 zero bytes to
> the end of every file it receives from the server. That is an odd way
> of fixing this problem, and it makes things rather inflexible. If the
> server sends you any kind of a file OTHER THAN a tar file with the
> last 1024 zero bytes stripped off, then adding 1024 zero bytes will be
> the wrong thing to do. It would be better if the server just generated
> fully correct tar files (whatever we think that means) and the client
> wrote out exactly what it got from the server. Then, we could have the
> server generate cpio archives or zip files or gzip-compressed tar
> files or lz4-compressed tar files or anything we like, and the client
> wouldn't really need to care as long as it didn't need to extract
> those archives. That seems a lot cleaner.

Yea.


> 5. As things stand today, the client must know exactly how many
> archives it should expect to receive from the server and what each one
> is. It can do that, because it knows to expect one archive per
> tablespace, and the archive must be an uncompressed tarfile, so there
> is no ambiguity. But, if the server could send archives to other
> places, or send other kinds of archives to the client, then this would
> become more complex. There is no intrinsic reason why the logic on the
> client side can't simply be made more complicated in order to cope,
> but it doesn't seem like great design, because then every time you
> enhance the server, you've also got to enhance the client, and that
> limits cross-version compatibility, and also seems more fragile. I
> would rather that the server advertise the number of archives and the
> names of each archive to the client explicitly, allowing the client to
> be dumb unless it needs to post-process (e.g. extract) those archives.

ISTM that that can help to some degree, but things like tablespace
remapping etc IMO aren't best done server side, so I think the client
will continue to need to know about the contents to a significant
degree?


> Putting all of the above together, what I propose - but have not yet
> tried to implement - is a new COPY sub-protocol for taking base
> backups. Instead of sending a COPY stream per archive, the server
> would send a single COPY stream where the first byte of each message
> is a type indicator, like we do with the replication sub-protocol
> today. For example, if the first byte is 'a' that could indicate that
> we're beginning a new archive and the rest of the message would
> indicate the archive name and perhaps some flags or options. If the
> first byte is 'p' that could indicate that we're sending archive
> payload, perhaps with the first four bytes of the message being
> progress, i.e. the number of newly-processed bytes on the server side
> prior to any compression, and the remaining bytes being payload. On
> receipt of such a message, the client would increment the progress
> indicator by the value indicated in those first four bytes, and then
> process the remaining bytes by writing them to a file or whatever
> behavior the user selected via -Fp, -Ft, -Z, etc.

Wonder if there's a way to get this to be less stateful. It seems a bit
ugly that the client would know what the last 'a' was for a 'p'? Perhaps
we could actually make 'a' include an identifier for each archive, and
then 'p' would append to a specific archive? That would then also
allow for concurrent processing of those archives on the server side.

I'd personally rather have a separate message type for progress and
payload. Seems odd to have to send payload messages with 0 payload just
because we want to update progress (in case of uploading to
e.g. S3). And I think it'd be nice if we could have a more extensible
progress measurement approach than a fixed length prefix. E.g. it might
be nice to allow it to report both the overall progress, as well as a
per archive progress. Or we might want to send progress when uploading
to S3, even when not having pre-calculated the total size of the data
directory.
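
To make the framing concrete, here is a minimal, self-contained sketch in C of how a client might parse the proposed messages. The struct and function names are invented for illustration, and the 4-byte big-endian progress prefix follows the description in the quoted proposal above, not any committed protocol.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Toy model of the proposed COPY sub-protocol framing: the first byte of
 * each message is a type indicator ('a' = new archive, 'p' = payload),
 * and a 'p' message carries a 4-byte big-endian progress delta before
 * the payload bytes.  All names here are invented for illustration.
 */
typedef struct
{
	char		type;			/* 'a' or 'p' */
	uint32_t	progress;		/* pre-compression bytes processed */
	const char *payload;		/* remaining message bytes */
	size_t		payload_len;
} BackupMessage;

static int
parse_backup_message(const char *buf, size_t len, BackupMessage *msg)
{
	if (len < 1)
		return -1;				/* empty message */
	msg->type = buf[0];
	if (msg->type == 'p')
	{
		if (len < 5)
			return -1;			/* progress prefix is mandatory */
		msg->progress = ((uint32_t) (unsigned char) buf[1] << 24) |
			((uint32_t) (unsigned char) buf[2] << 16) |
			((uint32_t) (unsigned char) buf[3] << 8) |
			((uint32_t) (unsigned char) buf[4]);
		msg->payload = buf + 5;
		msg->payload_len = len - 5;
	}
	else
	{
		/* 'a' (and any future types): everything after the type byte */
		msg->progress = 0;
		msg->payload = buf + 1;
		msg->payload_len = len - 1;
	}
	return 0;
}
```

Andres's variant, an archive identifier inside each 'p' message, would just mean a few more fixed bytes after the type byte before the payload, parsed the same way.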


Greetings,

Andres Freund



Re: refactoring basebackup.c

From
Robert Haas
Date
On Fri, Jul 31, 2020 at 12:49 PM Andres Freund <andres@anarazel.de> wrote:
> Have you tested whether this still works against older servers? Or do
> you think we should not have that as a goal?

I haven't tested that recently but I intended to keep it working. I'll
make sure to nail that down before I get to the point of committing
anything, but I don't expect big problems. It's kind of annoying to
have so much backward compatibility stuff here but I think ripping any
of that out should wait for another time.

> Hm. I don't think I terribly like the idea of things like -R having to
> be processed server side. That'll be awfully annoying to keep working
> across versions, for one. But perhaps the config file should just not be
> in the main tar file going forward?

That'd be a user-visible change, though, whereas what I'm proposing
isn't. Instead of directly injecting stuff, the client can just send
it to the server and have the server inject it, provided the server is
new enough. Cross-version issues don't seem to be any worse than now.
That being said, I don't love it, either. We could just suggest to
people that using -R together with server compression is

> I think we should eventually be able to use one archive for multiple
> purposes, e.g. to set up a standby as well as using it for a base
> backup. Or multiple standbys with different tablespace remappings.

I don't think I understand your point here.

> ISTM that that can help to some degree, but things like tablespace
> remapping etc IMO aren't best done server side, so I think the client
> will continue to need to know about the contents to a significant
> degree?

If I'm not mistaken, those mappings are only applied with -Fp i.e. if
we're extracting. And it's no problem to jigger things in that case;
we can only do this if we understand the archive in the first place.
The problem is when you have to decompress and recompress to jigger
things.

> Wonder if there's a way to get this to be less stateful. It seems a bit
> ugly that the client would know what the last 'a' was for a 'p'? Perhaps
> we could actually make 'a' include an identifier for each archive, and
> then 'p' would append to a specific archive? That would then also
> allow for concurrent processing of those archives on the server side.

...says the guy working on asynchronous I/O. I don't know, it's not a
bad idea, but I think we'd have to change a LOT of code to make it
actually do something useful. I feel like this could be added as a
later extension of the protocol, rather than being something that we
necessarily need to do now.

> I'd personally rather have a separate message type for progress and
> payload. Seems odd to have to send payload messages with 0 payload just
> because we want to update progress (in case of uploading to
> e.g. S3). And I think it'd be nice if we could have a more extensible
> progress measurement approach than a fixed length prefix. E.g. it might
> be nice to allow it to report both the overall progress, as well as a
> per archive progress. Or we might want to send progress when uploading
> to S3, even when not having pre-calculated the total size of the data
> directory.

I don't mind a separate message type here, but if you want merging of
short messages with adjacent longer messages to generate a minimal
number of system calls, that might have some implications for the
other thread where we're talking about how to avoid extra memory
copies when generating protocol messages. If you don't mind them going
out as separate network packets, then it doesn't matter.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: refactoring basebackup.c

From
Mark Dilger
Date

> On Jul 29, 2020, at 8:31 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Fri, May 8, 2020 at 4:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
>> So it might be good if I'd remembered to attach the patches. Let's try
>> that again.
>
> Here's an updated patch set.

Hi Robert,

v2-0001 through v2-0009 still apply cleanly, but v2-0010 no longer applies.  It seems to be conflicting with Heikki's
work from August.  Could you rebase, please?

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: refactoring basebackup.c

From
Robert Haas
Date
On Wed, Oct 21, 2020 at 12:14 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> v2-0001 through v2-0009 still apply cleanly, but v2-0010 no longer applies.  It seems to be conflicting with Heikki's
> work from August.  Could you rebase, please?

Here at last is a new version. I've dropped the "bbarchiver" patch for
now, added a new patch that I'll talk about below, and revised the
others. I'm pretty happy with the code now, so I guess the main things
that I'd like feedback on are (1) whether design changes seem to be
needed and (2) the UI. Once we have that stuff hammered out, I'll work
on adding documentation, which is missing at present. The interesting
patches in terms of functionality are 0006 and 0007; the rest is
preparatory refactoring.

0006 adds a concept of base backup "targets," which means that it lets
you send the base backup to someplace other than the client. You
specify the target using a new "-t" option to pg_basebackup. By way of
example, 0006 adds a "blackhole" target which throws the backup away
instead of sending it anywhere, and also a "server" target which
stores the backup to the server filesystem in lieu of streaming it to
the client. So you can say something like "pg_basebackup -Xnone -Ft -t
server:/backup/2021-07-08" and, provided that you're superuser, the
server will try to drop the backup there. At present, you can't use
-Fp or -Xfetch or -Xstream with a backup target, because that
functionality is implemented on the client side. I think that's an
acceptable restriction. Eventually I imagine we will want to have
targets like "aws" or "s3" or maybe some kind of plug-in system for
new targets. I haven't designed anything like that yet, but I think
it's probably not all that hard to generalize what I've got.
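
As a rough illustration of what a "-t name:detail" target specification involves on the parsing side, here is a hedged sketch; the function name and interface are invented, not taken from the patch set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Illustrative parser for a backup-target specification of the form
 * "name" or "name:detail", e.g. "blackhole" or "server:/backup/2021-07-08".
 * The function name and interface are invented, not the patch's API.
 */
static int
parse_backup_target(const char *spec, char *name, size_t name_size,
					const char **detail)
{
	const char *colon = strchr(spec, ':');
	size_t		len = colon ? (size_t) (colon - spec) : strlen(spec);

	if (len == 0 || len >= name_size)
		return -1;				/* empty or over-long target name */
	memcpy(name, spec, len);
	name[len] = '\0';
	/* "server" needs a path after the colon; "blackhole" takes none */
	*detail = colon ? colon + 1 : NULL;
	return 0;
}
```

A plug-in system for targets would presumably dispatch on the parsed name, looking up a table of target implementations.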

0007 adds server-side compression; currently, it only supports
server-side compression using gzip, but I hope that it won't be hard
to generalize that to support LZ4 as well, and Andres told me he
thinks we should aim to support zstd since that library has built-in
parallel compression which is very appealing in this context. So you
say something like "pg_basebackup -Ft --server-compression=gzip -D
/backup/2021-07-08" or, if you want that compressed backup stored on
the server and compressed as hard as possible, you could say
"pg_basebackup -Xnone -Ft --server-compression=gzip9 -t
server:/backup/2021-07-08". Unfortunately, here again there are a
number of features that are implemented on the client side, and they
don't work in combination with this. -Fp could be made to work by
teaching the client to decompress; I just haven't written the code to
do that. It's probably not very useful in general, but maybe there's a
use case if you're really tight on network bandwidth. Making -R work
looks outright useless, because the client would have to get the whole
compressed tarfile from the server and then uncompress it, edit the
tar file, and recompress. That seems like a thing no one can possibly
want. Also, if you say pg_basebackup -Ft -D- >whatever.tar, the server
injects the backup manifest into the tarfile, which if you used
--server-compression would require decompressing and recompressing the
whole thing, so it doesn't seem worth supporting. It's more likely to
be a footgun than to help anybody. This option can be used with
-Xstream or -Xfetch, but it doesn't compress pg_wal.tar, because
that's generated on the client side.

The thing I'm really unhappy with here is the -F option to
pg_basebackup, which presently allows only p for plain or t for tar.
For purposes of these patches, I've essentially treated this as if -Fp
means "I want the tar files the server sends to be extracted" and
"-Ft" as if it means "I'm happy with them the way they are." Under
that interpretation, it's fine for --server-compression to cause e.g.
base.tar.gz to be written, because that's what the server sent. But
it's not really a "tar" output format; it's a "tar.gz" output format.
However, it doesn't seem to make any sense to define -Fz to mean "I
want tar.gz output" because -Z or -z already produces tar.gz output
when used with -Ft, and also because it would be redundant to make
people specify both -Fz and --server-compression. Similarly, when you
use --target, the output format is arguably, well, nothing. I mean,
some tar files got stored to the target, but you don't have them, but
again it seems redundant to have people specify --target and then also
have to change the argument to -F. Hindsight being 20-20, I think we
would have been better off not having a -Ft or -Fp option at all, and
having an --extract option that says you want to extract what the
server sends you, but it's probably too late to make that change now.
Or maybe it isn't, and we should just break command-line argument
compatibility for v15. I don't know. Opinions appreciated, especially
if they are nuanced.

If you're curious about what the other patches in the series do,
here's a very fast recap; see commit messages for more. 0001 revises
the grammar for some replication commands to use an extensible-options
syntax. 0002 is a trivial refactoring of basebackup.c. 0003 and 0004
refactor the server's basebackup.c and the client's pg_basebackup.c,
respectively, by introducing abstractions called bbsink and
bbstreamer. 0005 introduces a new COPY sub-protocol for taking base
backups. I think it's worth mentioning that I believe that this
refactoring is quite powerful and could let us do a bunch of other
things that this patch set doesn't attempt. For instance, since this
makes it pretty easy to implement server-side compression, it could
probably also pretty easily be made to do server-side encryption, if
you're brave enough to want to have a discussion on pgsql-hackers
about how to design an encryption feature.

Thanks to my colleague Tushar Ahuja for helping test some of this code.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c

From
tushar
Date
On 7/8/21 9:26 PM, Robert Haas wrote:
> Here at last is a new version.
Please refer to this scenario, where a backup target using
--server-compression causes the server to close the connection
unexpectedly if we don't provide the --no-manifest option:

[tushar@localhost bin]$ ./pg_basebackup --server-compression=gzip4  -t 
server:/tmp/data_1  -Xnone
NOTICE:  WAL archiving is not enabled; you must ensure that all required 
WAL segments are copied through other means to complete the backup
pg_basebackup: error: could not read COPY data: server closed the 
connection unexpectedly
     This probably means the server terminated abnormally
     before or while processing the request.

If we try the same scenario with -Ft, it works:

[tushar@localhost bin]$ ./pg_basebackup --server-compression=gzip4  -Ft 
-D data_0 -Xnone
NOTICE:  WAL archiving is not enabled; you must ensure that all required 
WAL segments are copied through other means to complete the backup
[tushar@localhost bin]$

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Re: refactoring basebackup.c

From
Dilip Kumar
Date
On Mon, Jul 12, 2021 at 5:51 PM tushar <tushar.ahuja@enterprisedb.com> wrote:
>
> On 7/8/21 9:26 PM, Robert Haas wrote:
> > Here at last is a new version.
> Please refer to this scenario, where a backup target using
> --server-compression causes the server to close the connection
> unexpectedly if we don't provide the --no-manifest option:
>
> [tushar@localhost bin]$ ./pg_basebackup --server-compression=gzip4  -t
> server:/tmp/data_1  -Xnone
> NOTICE:  WAL archiving is not enabled; you must ensure that all required
> WAL segments are copied through other means to complete the backup
> pg_basebackup: error: could not read COPY data: server closed the
> connection unexpectedly
>      This probably means the server terminated abnormally
>      before or while processing the request.
>

I think the problem is that bbsink_gzip_end_archive() is not
forwarding the end request to the next bbsink.  The attached patch
should fix it.
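
For readers following along, the shape of this bug can be modeled in a few lines of standalone C; this is an illustrative toy, not the actual bbsink code, and all names are invented.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Minimal model of chained bbsink-style stages: each stage has an
 * end_archive callback and a pointer to the next stage.  A filtering
 * stage such as a gzip sink must finish its own work *and* forward the
 * call, or downstream stages never see the end of the archive.
 */
typedef struct sink
{
	void		(*end_archive) (struct sink *);
	struct sink *next;
	int			archive_ended;	/* bookkeeping for this sketch */
} sink;

static void
forward_end_archive(sink *s)
{
	if (s->next != NULL)
		s->next->end_archive(s->next);
}

/* terminal sink: just records that the archive ended */
static void
server_end_archive(sink *s)
{
	s->archive_ended = 1;
}

/* buggy version: finishes its own state but forgets to forward */
static void
gzip_end_archive_buggy(sink *s)
{
	s->archive_ended = 1;		/* flush our own compression state */
	/* missing: forward_end_archive(s) */
}

/* fixed version, in the spirit of the patch described above */
static void
gzip_end_archive_fixed(sink *s)
{
	s->archive_ended = 1;
	forward_end_archive(s);
}
```

With the buggy callback, the downstream sink never completes its archive, which matches the observed "server closed the connection unexpectedly" failure mode.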


-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c

From
tushar
Date
On 7/8/21 9:26 PM, Robert Haas wrote:
Here at last is a new version.
If I try to perform pg_basebackup using the "-t server" option against
localhost versus a remote machine, I can see a difference in backup size.

The data directory's size is:

[edb@centos7tushar bin]$ du -sch data/
578M    data/
578M    total

-h=localhost

[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/all_data2 -h localhost   -Xnone --no-manifest -P -v
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
NOTICE:  all required WAL segments have been archived                          
329595/329595 kB (100%), 1/1 tablespace                                        
pg_basebackup: base backup completed

[edb@centos7tushar bin]$ du -sch /tmp/all_data2
322M    /tmp/all_data2
322M    total
[edb@centos7tushar bin]$

-h=remote

[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/all_data2 -h <remote IP> -Xnone --no-manifest -P -v
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
NOTICE:  all required WAL segments have been archived                          
170437/170437 kB (100%), 1/1 tablespace                                        
pg_basebackup: base backup completed

[edb@0 bin]$ du -sch /tmp/all_data2
167M    /tmp/all_data2
167M    total
[edb@0 bin]$

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

Re: refactoring basebackup.c

From
Dilip Kumar
Date
On Fri, Jul 16, 2021 at 12:43 PM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Mon, Jul 12, 2021 at 5:51 PM tushar <tushar.ahuja@enterprisedb.com> wrote:
> >
> > On 7/8/21 9:26 PM, Robert Haas wrote:
> > > Here at last is a new version.
> > Please refer to this scenario, where a backup target using
> > --server-compression causes the server to close the connection
> > unexpectedly if we don't provide the --no-manifest option:
> >
> > [tushar@localhost bin]$ ./pg_basebackup --server-compression=gzip4  -t
> > server:/tmp/data_1  -Xnone
> > NOTICE:  WAL archiving is not enabled; you must ensure that all required
> > WAL segments are copied through other means to complete the backup
> > pg_basebackup: error: could not read COPY data: server closed the
> > connection unexpectedly
> >      This probably means the server terminated abnormally
> >      before or while processing the request.
> >
>
> I think the problem is that bbsink_gzip_end_archive() is not
> forwarding the end request to the next bbsink.  The attached patch
> should fix it.

I was going through the patch, I think the refactoring made the base
backup code really clean and readable.  I have a few minor
suggestions.

v3-0003

1.
+    Assert(sink->bbs_next != NULL);
+    bbsink_begin_archive(sink->bbs_next, gz_archive_name);

I have noticed that the interface for forwarding the request to the next
bbsink is not uniform; for example, bbsink_gzip_begin_archive()
calls bbsink_begin_archive(sink->bbs_next, gz_archive_name) to
forward the request to the next bbsink, whereas
bbsink_progress_begin_backup() calls
bbsink_forward_begin_backup(sink). I think it would be good to keep
the usage uniform.

2.
I have noticed that the bbsink_copytblspc_* functions are not forwarding
the request to the next sink; that's probably because we assume this
should always be the last sink.  I agree that's true for this patch, but
the commit message says this might change in the future, so wouldn't it
be good to keep the interface generic? I mean, bbsink_copytblspc_new()
should take the next sink as an input, and the caller can pass NULL.
The other APIs can then try to forward the request if next is not NULL.

3.
It would make more sense to order the functions in
basebackup_progress.c the same way as in the other files, i.e.
bbsink_progress_begin_backup, bbsink_progress_archive_contents, and
then bbsink_progress_end_archive; this will also be in sync with the
function pointer declarations in bbsink_ops.

v3-0005-
4.
+ *
+ * 'copystream' sends a starts a single COPY OUT operation and transmits
+ * all the archives and the manifest if present during the course of that

typo 'copystream' sends a starts a single COPY OUT -->  'copystream'
sends a single COPY OUT


--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
tushar
Date
On 7/16/21 12:43 PM, Dilip Kumar wrote:
> I think the problem is that bbsink_gzip_end_archive() is not
> forwarding the end request to the next bbsink.  The attached patch
> should fix it.

Thanks, Dilip. The reported issue seems to be fixed now with your patch:

[edb@centos7tushar bin]$ ./pg_basebackup --server-compression=gzip4  -t 
server:/tmp/data_2 -v  -Xnone -R
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
NOTICE:  all required WAL segments have been archived
pg_basebackup: base backup completed
[edb@centos7tushar bin]$

OR

[edb@centos7tushar bin]$ ./pg_basebackup   -t server:/tmp/pv1 -Xnone   
--server-compression=gzip4 -r 1024  -P
NOTICE:  all required WAL segments have been archived
23133/23133 kB (100%), 1/1 tablespace
[edb@centos7tushar bin]$

Please refer to this scenario, where the -R option works with '-t server'
but not with -Ft.

--not working

[edb@centos7tushar bin]$ ./pg_basebackup --server-compression=gzip4  
-Ft  -D ccv   -Xnone  -R --no-manifest
pg_basebackup: error: unable to parse archive: base.tar.gz
pg_basebackup: only tar archives can be parsed
pg_basebackup: the -R option requires pg_basebackup to parse the archive
pg_basebackup: removing data directory "ccv"

--working

[edb@centos7tushar bin]$ ./pg_basebackup --server-compression=gzip4 -t   
server:/tmp/ccv    -Xnone  -R --no-manifest
NOTICE:  all required WAL segments have been archived
[edb@centos7tushar bin]$

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Re: refactoring basebackup.c

From
Dilip Kumar
Date
On Mon, Jul 19, 2021 at 6:02 PM tushar <tushar.ahuja@enterprisedb.com> wrote:
>
> On 7/16/21 12:43 PM, Dilip Kumar wrote:
> > I think the problem is that bbsink_gzip_end_archive() is not
> > forwarding the end request to the next bbsink.  The attached patch
> > should fix it.
>
> Thanks, Dilip. The reported issue seems to be fixed now with your patch

Thanks for the confirmation.

> Please refer to this scenario, where the -R option works with '-t server'
> but not with -Ft.
>
> --not working
>
> [edb@centos7tushar bin]$ ./pg_basebackup --server-compression=gzip4
> -Ft  -D ccv   -Xnone  -R --no-manifest
> pg_basebackup: error: unable to parse archive: base.tar.gz
> pg_basebackup: only tar archives can be parsed
> pg_basebackup: the -R option requires pg_basebackup to parse the archive
> pg_basebackup: removing data directory "ccv"

As per the error message and code, if -R is given then we need to
inject the recovery configuration file, and that is only supported with
the tar format; since you are enabling server compression, the output is
no longer a .tar archive, so it gives an error.

> --working
>
> [edb@centos7tushar bin]$ ./pg_basebackup --server-compression=gzip4 -t
> server:/tmp/ccv    -Xnone  -R --no-manifest
> NOTICE:  all required WAL segments have been archived
> [edb@centos7tushar bin]$

I am not sure why this works; from the code I could not tell whether,
when the backup target is the server, we do anything with the -R
option or just silently ignore it.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Mark Dilger
Date

> On Jul 8, 2021, at 8:56 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> The interesting
> patches in terms of functionality are 0006 and 0007;

The difficulty in v3-0007 with pg_basebackup only knowing how to parse tar archives seems to be a natural consequence
of not sufficiently abstracting out the handling of the tar format.  If the bbsink and bbstreamer abstractions fully
encapsulated a set of parsing callbacks, then pg_basebackup wouldn't contain things like:

    streamer = bbstreamer_tar_parser_new(streamer);

but instead would use the parser callbacks without knowledge of whether they were parsing tar vs. cpio vs. whatever.
It just seems really odd that pg_basebackup is using the extensible abstraction layer and then defeating the purpose by
knowing too much about the format.  It might even be a useful exercise to write cpio support into this patch set rather
than waiting until v16, just to make sure the abstraction layer doesn't have tar-specific assumptions left over.


    printf(_("  -F, --format=p|t       output format (plain (default), tar)\n"));

    printf(_("  -z, --gzip             compress tar output\n"));
    printf(_("  -Z, --compress=0-9     compress tar output with given compression level\n"));

This is the pre-existing --help output, not changed by your patch, but if you anticipate that other output formats will
be supported in future releases, perhaps it's better not to write the --help output in such a way as to imply that -z
and -Z are somehow connected with the choice of tar format?  Would changing the --help now make for less confusion
later?  I'm just asking...

The new options to pg_basebackup should have test coverage in src/bin/pg_basebackup/t/010_pg_basebackup.pl, though I
expect you are waiting to hammer out the interface before writing the tests.

> the rest is
> preparatory refactoring.

patch v3-0001:

The new function AppendPlainCommandOption writes too many spaces, which does no harm, but seems silly, resulting in
lines like:

  LOG:  received replication command: BASE_BACKUP ( LABEL 'pg_basebackup base backup',  PROGRESS,  WAIT 0,  MANIFEST
'yes')


patch v3-0003:

The introduction of the sink abstraction seems incomplete, as basebackup.c still has knowledge of things like tar
headers.  Calls like _tarWriteHeader(sink, ...) feel like an abstraction violation.  I expected perhaps this would get
addressed in later patches, but it doesn't.

+ * 'bbs_buffer' is the buffer into which data destined for the bbsink
+ * should be stored. It must be a multiple of BLCKSZ.
+ *
+ * 'bbs_buffer_length' is the allocated length of the buffer.

The length must be a multiple of BLCKSZ, not the pointer.


patch-v3-0005:

+ * 'copystream' sends a starts a single COPY OUT operation and transmits

too many verbs.

+ * Regardless of which method is used, we sent a result set with

"is used" vs. "sent" verb tense mismatch.

+ * So we only check it after the number of bytes sine the last check reaches

typo.  s/sine/since/

-    * (2) we need to inject backup_manifest or recovery configuration into it.
+    * (2) we need to inject backup_manifest or recovery configuration into
+    * it.

src/bin/pg_basebackup/pg_basebackup.c contains word wrap changes like the above which would better be left to a
different commit, if done at all.

+   if (state.manifest_file !=NULL)

Need a space after !=


—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: refactoring basebackup.c

From
Robert Haas
Date
On Mon, Jul 19, 2021 at 2:51 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> The difficulty in v3-0007 with pg_basebackup only knowing how to parse tar archives seems to be a natural consequence
> of not sufficiently abstracting out the handling of the tar format.  If the bbsink and bbstreamer abstractions fully
> encapsulated a set of parsing callbacks, then pg_basebackup wouldn't contain things like:
>
>     streamer = bbstreamer_tar_parser_new(streamer);
>
> but instead would use the parser callbacks without knowledge of whether they were parsing tar vs. cpio vs. whatever.
> It just seems really odd that pg_basebackup is using the extensible abstraction layer and then defeating the purpose by
> knowing too much about the format.  It might even be a useful exercise to write cpio support into this patch set rather
> than waiting until v16, just to make sure the abstraction layer doesn't have tar-specific assumptions left over.

Well, I had a patch in an earlier patch set that tried to get
knowledge of tar out of basebackup.c, but it couldn't use the bbsink
abstraction; it needed a whole separate abstraction layer which I had
called bbarchiver with a different API. So I dropped it, for fear of
being told, not without some justification, that I was just changing
things for the sake of changing them, and also because having exactly
one implementation of some interface is really not great. I do
conceptually like the idea of making the whole thing flexible enough
to generate cpio or zip archives, because like you I think that having
tar-specific stuff all over the place is grotty, but I have a feeling
there's little market demand for having pg_basebackup produce cpio,
pax, zip, iso, etc. archives. On the other hand, server-side
compression and server-side backup seem like functionality with real
utility. Still, if you or others want to vote for resurrecting
bbarchiver on the grounds that general code cleanup is worthwhile for
its own sake, I'm OK with that, too.

I don't really understand what your problem is with how the patch set
leaves pg_basebackup. On the server side, because I dropped the
bbarchiver stuff, basebackup.c still ends up knowing a bunch of stuff
about tar. pg_basebackup.c, however, really doesn't know anything much
about tar any more. It knows that if it's getting a tar file and needs
to parse a tar file then it had better call the tar parsing code, but
that seems difficult to avoid. What we can avoid, and I think the
patch set does, is pg_basebackup.c having any real knowledge of what
the tar parser is doing under the hood.

Thanks also for the detailed comments. I'll try to use the right number of
verbs in each sentence in the next version of the patch. I will also
look into the issues mentioned by Dilip and Tushar.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Mark Dilger
Date

> On Jul 20, 2021, at 11:57 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> I don't really understand what your problem is with how the patch set
> leaves pg_basebackup.

I don't have a problem with how the patch set leaves pg_basebackup.

> On the server side, because I dropped the
> bbarchiver stuff, basebackup.c still ends up knowing a bunch of stuff
> about tar. pg_basebackup.c, however, really doesn't know anything much
> about tar any more. It knows that if it's getting a tar file and needs
> to parse a tar file then it had better call the tar parsing code, but
> that seems difficult to avoid.

I was only imagining having a callback for injecting manifests or recovery configurations.  It is not necessary that
this be done in the current patch set, or perhaps ever.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: refactoring basebackup.c

From
Robert Haas
Date
On Tue, Jul 20, 2021 at 4:03 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> I was only imagining having a callback for injecting manifests or recovery configurations.  It is not necessary that
> this be done in the current patch set, or perhaps ever.

A callback where?

I actually think the ideal scenario would be if the server always did
all the work and the client wasn't involved in editing the tarfile,
but it's not super-easy to get there from here. We could add an option
to tell the server whether to inject the manifest into the archive,
which probably wouldn't be too bad. For it to inject the recovery
configuration, we'd have to send that configuration to the server
somehow. I thought about using COPY BOTH mode instead of COPY OUT mode
to allow for stuff like that, but it seems pretty complicated, and I
wasn't really sure that we'd get consensus that it was better even if
I went to the trouble of coding it up.

If we don't do that and stick with the current system where it's
handled on the client side, then I agree that we want to separate the
tar-specific concerns from the injection-type concerns, which the
patch does by making those operations different kinds of bbstreamer
that know only a relatively limited amount about what each other are
doing. You get [server] => [tar parser] => [recovery injector] => [tar
archiver], where the [recovery injector] step nukes the archive file
headers for the files it adds or modifies, and the [tar archiver] step
fixes them up again. So the only thing that the [recovery injector]
piece needs to know is that if it makes any changes to a file, it
should send that file to the next step with a 0-length archive header,
and all the [tar archiver] piece needs to know is that already-valid
headers can be left alone and 0-length ones need to be regenerated.
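
The header handoff between the injector and archiver steps can be modeled in miniature; this is a toy illustration with invented names and sizes, not the real bbstreamer code.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Toy model of the handoff: the recovery-injector stage marks any file
 * it modifies by zeroing its archive-header length, and the tar-archiver
 * stage regenerates headers only for those members, leaving
 * already-valid headers alone.
 */
#define TAR_HEADER_LEN 512

typedef struct
{
	const char *name;
	size_t		header_len;		/* 0 = header must be regenerated */
	int			modified;
} member;

static void
injector_step(member *m, int will_modify)
{
	if (will_modify)
	{
		m->modified = 1;
		m->header_len = 0;		/* signal downstream: header is stale */
	}
}

static void
archiver_step(member *m)
{
	if (m->header_len == 0)
		m->header_len = TAR_HEADER_LEN; /* rebuild the header */
	/* otherwise pass the already-valid header through untouched */
}
```

The point of the design is that neither stage needs to know why the other did what it did: a zero-length header is the entire contract between them.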

There may be a better scheme; I don't think this is perfectly elegant.
I do think it's better than what we've got now.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Mark Dilger
Date

> On Jul 21, 2021, at 8:09 AM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> A callback where?

If you were going to support lots of formats, not just tar, you might want the streamer class for each format to have a
callback which sets up the injector, rather than having CreateBackupStreamer do it directly.  Even then, having now
studied CreateBackupStreamer a bit more, the idea seems less appealing than it did initially.  I don't think it makes
things any cleaner when only supporting tar, and maybe not even when supporting multiple formats, so I'll withdraw the
suggestion.

—
Mark Dilger
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company






Re: refactoring basebackup.c

From
Robert Haas
Date
On Wed, Jul 21, 2021 at 12:11 PM Mark Dilger
<mark.dilger@enterprisedb.com> wrote:
> If you were going to support lots of formats, not just tar, you might want the streamer class for each format to have
> a callback which sets up the injector, rather than having CreateBackupStreamer do it directly.  Even then, having now
> studied CreateBackupStreamer a bit more, the idea seems less appealing than it did initially.  I don't think it makes
> things any cleaner when only supporting tar, and maybe not even when supporting multiple formats, so I'll withdraw the
> suggestion.

Gotcha. I think if we had a lot of formats I'd probably make a
separate function where you passed in the file extension and archive
type and it hands you back a parser for the appropriate kind of
archive, or something like that. And then maybe a second, similar
function where you pass in the injector and archive type and it wraps
an archiver of the right type around it and hands that back. But I
don't think that's worth doing until we have 2 or 3 formats, which may
or may not happen any time in the foreseeable future.
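
The kind of dispatch function described here might look something like this (purely hypothetical names; nothing like this exists in the patch set):

```c
#include <assert.h>
#include <string.h>

typedef enum
{
	ARCHIVE_TAR,
	ARCHIVE_UNKNOWN
} archive_type;

/*
 * Hypothetical helper along the lines described above: map a file
 * extension to an archive type, so a caller could then ask for the
 * matching parser or archiver.  Not part of the actual patch set.
 */
static archive_type
archive_type_from_extension(const char *filename)
{
	const char *dot = strrchr(filename, '.');

	if (dot != NULL && strcmp(dot, ".tar") == 0)
		return ARCHIVE_TAR;
	return ARCHIVE_UNKNOWN;
}
```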

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
tushar
Date
On 7/19/21 8:29 PM, Dilip Kumar wrote:
> I am not sure why this is working; from the code I could not find
> whether, if the backup target is server, we are doing anything with the
> -R option or we are just silently ignoring it

OK, in another scenario I can see "-t server" working with the
"--server-compression" option but not with -z or -Z?

"-t server" with option "-z" or "-Z" (fails):

[tushar@localhost bin]$ ./pg_basebackup -t server:/tmp/dataN -Xnone  -z  
--no-manifest -p 9033
pg_basebackup: error: only tar mode backups can be compressed
Try "pg_basebackup --help" for more information.

[tushar@localhost bin]$ ./pg_basebackup -t server:/tmp/dataNa -Z 1
-Xnone  --server-compression=gzip4  --no-manifest -p 9033
pg_basebackup: error: only tar mode backups can be compressed
Try "pg_basebackup --help" for more information.

"-t server" with "server-compression"  (working)

[tushar@localhost bin]$ ./pg_basebackup -t server:/tmp/dataN -Xnone  
--server-compression=gzip4  --no-manifest -p 9033
NOTICE:  WAL archiving is not enabled; you must ensure that all required 
WAL segments are copied through other means to complete the backup
[tushar@localhost bin]$

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Re: refactoring basebackup.c

From
Robert Haas
Date
On Thu, Jul 22, 2021 at 1:14 PM tushar <tushar.ahuja@enterprisedb.com> wrote:
> On 7/19/21 8:29 PM, Dilip Kumar wrote:
> > I am not sure why this is working; from the code I could not find
> > whether, if the backup target is server, we are doing anything with the
> > -R option or we are just silently ignoring it
>
> OK, in another scenario I can see "-t server" working with the
> "--server-compression" option but not with -z or -Z?

Right. The error messages or documentation might need some work, but
it's expected that you won't be able to do client-side compression if
the backup is being sent someplace other than to the client.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Jeevan Ladhe
Date
0007 adds server-side compression; currently, it only supports
server-side compression using gzip, but I hope that it won't be hard
to generalize that to support LZ4 as well, and Andres told me he
thinks we should aim to support zstd since that library has built-in
parallel compression which is very appealing in this context. 

Thanks, Robert for laying the foundation here.
So, I gave a try to LZ4 streaming API for server-side compression.
LZ4 APIs are documented here[1].

With the attached WIP patch, I am now able to take the backup using the lz4
compression. The attached patch is basically applicable on top of Robert's V3
patch-set[2].

I could take the backup using the command:
pg_basebackup -t server:/tmp/data_lz4 -Xnone --server-compression=lz4

Further, when I restored the backup `/tmp/data_lz4` and started the server, I
could see the tables I created, along with the data inserted on the original
server.

When I tried to look into the binary difference between the original data
directory and the backup `data_lz4` directory here is how it looked:

$ diff -qr data/ /tmp/data_lz4
Only in /tmp/data_lz4: backup_label
Only in /tmp/data_lz4: backup_manifest
Only in data/base: pgsql_tmp
Only in /tmp/data_lz4: base.tar
Only in /tmp/data_lz4: base.tar.lz4
Files data/global/pg_control and /tmp/data_lz4/global/pg_control differ
Files data/logfile and /tmp/data_lz4/logfile differ
Only in data/pg_stat: db_0.stat
Only in data/pg_stat: global.stat
Only in data/pg_subtrans: 0000
Only in data/pg_wal: 000000010000000000000099.00000028.backup
Only in data/pg_wal: 00000001000000000000009A
Only in data/pg_wal: 00000001000000000000009B
Only in data/pg_wal: 00000001000000000000009C
Only in data/pg_wal: 00000001000000000000009D
Only in data/pg_wal: 00000001000000000000009E
Only in data/pg_wal/archive_status: 000000010000000000000099.00000028.backup.done
Only in data/: postmaster.opts

For now, what concerns me here is the following `LZ4F_compressUpdate()` API,
which does the core work of streaming compression:

size_t LZ4F_compressUpdate(LZ4F_cctx* cctx,
                                       void* dstBuffer, size_t dstCapacity,
                                 const void* srcBuffer, size_t srcSize,
                                 const LZ4F_compressOptions_t* cOptPtr);

where `dstCapacity` is provided by an earlier call to
`LZ4F_compressBound()`, which returns the minimum `dstCapacity` required to
guarantee success of `LZ4F_compressUpdate()` for a given `srcSize` and
`preferences` in the worst-case scenario. `LZ4F_compressBound()` is:

size_t LZ4F_compressBound(size_t srcSize, const LZ4F_preferences_t* prefsPtr);

Now, the hard lesson here is that the `dstCapacity` returned by
`LZ4F_compressBound()` even for a single byte (i.e. 1 as `srcSize`) is about
~256kB (it seems to have something to do with the blockSize in the lz4 frame
that we chose; the minimum we can have is 64kB), even though the actual length
of the compressed data produced by `LZ4F_compressUpdate()` is much smaller.
Meanwhile, the destination buffer length available to us, i.e.
`mysink->base.bbs_next->bbs_buffer_length`, is only 32kB. If, in the call to
`LZ4F_compressUpdate()`, I directly pass
`mysink->base.bbs_next->bbs_buffer + bytes_written` as `dstBuffer` and the
value returned by `LZ4F_compressBound()` as `dstCapacity`, that seems clearly
incorrect, since the actual output buffer space remaining is much less than
the worst case calculated by `LZ4F_compressBound()`.

For now, I am creating a temporary buffer of the required size, passing it
for compression, asserting that the actual compressed bytes fit within
whatever length we have available, and then copying the result to our output
buffer.

To give an example, I put some logging statements, and I can see in the log:
"
bytes remaining in mysink->base.bbs_next->bbs_buffer: 16537
input size to be compressed: 512
estimated size for compressed buffer by LZ4F_compressBound(): 262667
actual compressed size: 16
"
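
To spell out the arithmetic from that log: the worst-case bound is roughly one full block plus the input plus a small fixed overhead. The ~11-byte constant below is inferred from the numbers in the log (262667 - 262144 - 512), not a documented value:

```c
#include <assert.h>
#include <stddef.h>

#define LZ4_BLOCK_SIZE	(256 * 1024)	/* blockSize chosen in the patch */
#define SINK_BUFFER_LEN	(32 * 1024)		/* bbs_buffer_length of the next sink */

/*
 * Rough stand-in for LZ4F_compressBound(): one full block plus the
 * input plus a small fixed overhead.  The constant 11 is inferred from
 * the log output above and is an assumption, not a documented value.
 */
static size_t
worst_case_bound(size_t src_size)
{
	return LZ4_BLOCK_SIZE + src_size + 11;
}
```

Even for a 512-byte input, the bound (262667 bytes) dwarfs the 32kB sink buffer, which is why this version of the patch falls back to an intermediate buffer.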

I will really appreciate any inputs, comments, or suggestions here.

Regards,
Jeevan Ladhe
 
Attachments

Re: refactoring basebackup.c

From
Robert Haas
Date
On Wed, Sep 8, 2021 at 2:14 PM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> To give an example, I put some logging statements, and I can see in the log:
> "
> bytes remaining in mysink->base.bbs_next->bbs_buffer: 16537
> input size to be compressed: 512
> estimated size for compressed buffer by LZ4F_compressBound(): 262667
> actual compressed size: 16
> "

That is pretty lame. I don't know why it needs a ~256k buffer to
produce 16 bytes of output.

The way the gzip APIs I used work, you tell it how big the output
buffer is and it writes until it fills that buffer, or until the input
buffer is empty, whichever happens first. But this seems to be the
other way around: you tell it how much input you have, and it tells
you how big a buffer it needs. To handle that elegantly, I think I
need to make some changes to the design of the bbsink stuff. What I'm
thinking is that each bbsink somehow tells the next bbsink how big to
make the buffer. So if the LZ4 buffer is told that its buffer should
be at least, I don't know, say 64kB. Then it can compute how large an
output buffer the LZ4 library requires for 64kB. Hopefully we can
assume that liblz4 never needs a smaller buffer for a larger input.
Then we can assume that if a 64kB input requires, say, a 300kB output
buffer, every possible input < 64kB also requires an output buffer <=
300 kB.

But we can't just say, well, we were asked to create a 64kB buffer (or
whatever) so let's ask the next bbsink for a 300kB buffer (or
whatever), because then as soon as we write any data at all into it
the remaining buffer space might be insufficient for the next chunk.
So instead what I think we should do is have bbsink_lz4 set the size
of the next sink's buffer to its own buffer size +
LZ4F_compressBound(its own buffer size). So in this example if it's
asked to create a 64kB buffer and LZ4F_compressBound(64kB) = 300kB
then it asks the next sink to set the buffer size to 364kB. Now, that
means that there will always be at least 300 kB available in the
output buffer until we've accumulated a minimum of 64 kB of compressed
data, and then at that point we can flush.

I think this would be relatively clean and would avoid the need for
the double copying that the current design forced you to do. What do
you think?
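
The sizing rule proposed above can be sketched as follows (with made-up numbers standing in for LZ4F_compressBound(); the real code would query liblz4):

```c
#include <assert.h>
#include <stddef.h>

/*
 * Sketch of the proposed sizing rule: if this sink's buffer is 64kB
 * and LZ4F_compressBound(64kB) were 300kB, the next sink's buffer
 * would be 364kB, and we flush downstream as soon as at least 64kB of
 * compressed data has accumulated, guaranteeing that 300kB of space
 * always remains for the next LZ4F_compressUpdate() call.
 */
typedef struct
{
	size_t		own_buffer_len;		/* e.g. 64kB */
	size_t		next_buffer_len;	/* own_buffer_len + bound(own_buffer_len) */
	size_t		bytes_written;		/* compressed bytes accumulated so far */
} lz4_sink_sketch;

static size_t
next_sink_buffer_len(size_t own_len, size_t bound_of_own_len)
{
	return own_len + bound_of_own_len;
}

/* After each LZ4F_compressUpdate(), decide whether to flush downstream. */
static int
should_flush(const lz4_sink_sketch *s)
{
	return s->bytes_written >= s->own_buffer_len;
}
```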

+ /*
+ * If we do not have enough space left in the output buffer for this
+ * chunk to be written, first archive the already written contents.
+ */
+ if (nextChunkLen > mysink->base.bbs_next->bbs_buffer_length -
mysink->bytes_written ||
+ mysink->bytes_written >= mysink->base.bbs_next->bbs_buffer_length)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }

I think this is flat-out wrong. It assumes that the compressor will
never generate more than N bytes of output given N bytes of input,
which is not true. Not sure there's much point in fixing it now
because with the changes described above this code will have to change
anyway, but I think it's just lucky that this has worked for you in
your testing.

+ /*
+ * LZ4F_compressUpdate() returns the number of bytes written into output
+ * buffer. We need to keep track of how many bytes have been cumulatively
+ * written into the output buffer(bytes_written). But,
+ * LZ4F_compressUpdate() returns 0 in case the data is buffered and not
+ * written to output buffer, set autoFlush to 1 to force the writing to the
+ * output buffer.
+ */
+ prefs->autoFlush = 1;

I don't see why this should be necessary. Elsewhere you have code that
caters to bytes being stuck inside LZ4's buffer, so why do we also
require this?

Thanks for researching this!

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Robert Haas
Date
On Wed, Sep 8, 2021 at 3:39 PM Robert Haas <robertmhaas@gmail.com> wrote:
> The way the gzip APIs I used work, you tell it how big the output
> buffer is and it writes until it fills that buffer, or until the input
> buffer is empty, whichever happens first. But this seems to be the
> other way around: you tell it how much input you have, and it tells
> you how big a buffer it needs. To handle that elegantly, I think I
> need to make some changes to the design of the bbsink stuff. What I'm
> thinking is that each bbsink somehow tells the next bbsink how big to
> make the buffer.

Here's a new patch set with that design change (and a bug fix for 0001).

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c

From
Jeevan Ladhe
Date
Thanks, Robert for your response.

On Thu, Sep 9, 2021 at 1:09 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Sep 8, 2021 at 2:14 PM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> To give an example, I put some logging statements, and I can see in the log:
> "
> bytes remaining in mysink->base.bbs_next->bbs_buffer: 16537
> input size to be compressed: 512
> estimated size for compressed buffer by LZ4F_compressBound(): 262667
> actual compressed size: 16
> "

That is pretty lame. I don't know why it needs a ~256k buffer to
produce 16 bytes of output.

As I mentioned earlier, I think it has something to do with the lz4
blocksize. Currently, I have chosen it as 256kB, which is 262144 bytes,
and here LZ4F_compressBound() has returned 262667 for worst-case
accommodation of 512 bytes, i.e. 262144 (256kB) + 512 + I guess some
bookkeeping bytes. If I choose the blocksize as 64kB instead, this turns
out to be 66059, which is 65536 (64kB) + 512 + bookkeeping bytes.

The way the gzip APIs I used work, you tell it how big the output
buffer is and it writes until it fills that buffer, or until the input
buffer is empty, whichever happens first. But this seems to be the
other way around: you tell it how much input you have, and it tells
you how big a buffer it needs. To handle that elegantly, I think I
need to make some changes to the design of the bbsink stuff. What I'm
thinking is that each bbsink somehow tells the next bbsink how big to
make the buffer. So if the LZ4 buffer is told that its buffer should
be at least, I don't know, say 64kB. Then it can compute how large an
output buffer the LZ4 library requires for 64kB. Hopefully we can
assume that liblz4 never needs a smaller buffer for a larger input.
Then we can assume that if a 64kB input requires, say, a 300kB output
buffer, every possible input < 64kB also requires an output buffer <=
300 kB.

I agree, this assumption is fair enough.

But we can't just say, well, we were asked to create a 64kB buffer (or
whatever) so let's ask the next bbsink for a 300kB buffer (or
whatever), because then as soon as we write any data at all into it
the remaining buffer space might be insufficient for the next chunk.
So instead what I think we should do is have bbsink_lz4 set the size
of the next sink's buffer to its own buffer size +
LZ4F_compressBound(its own buffer size). So in this example if it's
asked to create a 64kB buffer and LZ4F_compressBound(64kB) = 300kB
then it asks the next sink to set the buffer size to 364kB. Now, that
means that there will always be at least 300 kB available in the
output buffer until we've accumulated a minimum of 64 kB of compressed
data, and then at that point we can flush. 
I think this would be relatively clean and would avoid the need for
the double copying that the current design forced you to do. What do
you think?

I think this should work.
 

+ /*
+ * If we do not have enough space left in the output buffer for this
+ * chunk to be written, first archive the already written contents.
+ */
+ if (nextChunkLen > mysink->base.bbs_next->bbs_buffer_length -
mysink->bytes_written ||
+ mysink->bytes_written >= mysink->base.bbs_next->bbs_buffer_length)
+ {
+ bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
+ mysink->bytes_written = 0;
+ }

I think this is flat-out wrong. It assumes that the compressor will
never generate more than N bytes of output given N bytes of input,
which is not true. Not sure there's much point in fixing it now
because with the changes described above this code will have to change
anyway, but I think it's just lucky that this has worked for you in
your testing.

I see your point. But for it to be accurate, I think we then need to
consider the return value of LZ4F_compressBound() to check whether that
many bytes are available. But, as explained earlier, our output buffer is
already far smaller than that.
 
 
+ /*
+ * LZ4F_compressUpdate() returns the number of bytes written into output
+ * buffer. We need to keep track of how many bytes have been cumulatively
+ * written into the output buffer(bytes_written). But,
+ * LZ4F_compressUpdate() returns 0 in case the data is buffered and not
+ * written to output buffer, set autoFlush to 1 to force the writing to the
+ * output buffer.
+ */
+ prefs->autoFlush = 1;

I don't see why this should be necessary. Elsewhere you have code that
caters to bytes being stuck inside LZ4's buffer, so why do we also
require this?

This is needed to know the actual bytes written to the output buffer. If it
is set to 0, then LZ4F_compressUpdate() would return either 0 or the actual
bytes written to the output buffer, depending on whether it has buffered the
data or really flushed it to the output buffer.

IIUC, you are referring to the following comment for bbsink_lz4_end_archive():

"
 * There might be some data inside lz4's internal buffers; we need to get
 * that flushed out, also finalize the lz4 frame and then get that forwarded
 * to the successor sink as archive content.
"

I think it should be modified to:

"
 * Finalize the lz4 frame and then get that forwarded to the successor sink as
 * archive content.
"

Regards,
Jeevan Ladhe.

Re: refactoring basebackup.c

From
Dilip Kumar
Date
On Fri, Sep 10, 2021 at 5:25 AM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Sep 8, 2021 at 3:39 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > The way the gzip APIs I used work, you tell it how big the output
> > buffer is and it writes until it fills that buffer, or until the input
> > buffer is empty, whichever happens first. But this seems to be the
> > other way around: you tell it how much input you have, and it tells
> > you how big a buffer it needs. To handle that elegantly, I think I
> > need to make some changes to the design of the bbsink stuff. What I'm
> > thinking is that each bbsink somehow tells the next bbsink how big to
> > make the buffer.
>
> Here's a new patch set with that design change (and a bug fix for 0001).

Seems like nothing has been done about the issue reported in [1]

This one line change shall fix the issue,

--- a/src/backend/replication/basebackup_gzip.c
+++ b/src/backend/replication/basebackup_gzip.c
@@ -264,6 +264,8 @@ bbsink_gzip_end_archive(bbsink *sink)
                bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
                mysink->bytes_written = 0;
        }
+
+       bbsink_forward_end_archive(sink);
 }


[1] https://www.postgresql.org/message-id/CAFiTN-uhg4iKA7FGWxaG9J8WD_LTx655%2BAUW3_KiK1%3DSakQy4A%40mail.gmail.com

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Robert Haas
Date
On Mon, Sep 13, 2021 at 6:03 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
>> + /*
>> + * If we do not have enough space left in the output buffer for this
>> + * chunk to be written, first archive the already written contents.
>> + */
>> + if (nextChunkLen > mysink->base.bbs_next->bbs_buffer_length -
>> mysink->bytes_written ||
>> + mysink->bytes_written >= mysink->base.bbs_next->bbs_buffer_length)
>> + {
>> + bbsink_archive_contents(sink->bbs_next, mysink->bytes_written);
>> + mysink->bytes_written = 0;
>> + }
>>
>> I think this is flat-out wrong. It assumes that the compressor will
>> never generate more than N bytes of output given N bytes of input,
>> which is not true. Not sure there's much point in fixing it now
>> because with the changes described above this code will have to change
>> anyway, but I think it's just lucky that this has worked for you in
>> your testing.
>
> I see your point. But for it to be accurate, I think we need to then
> considered the return value of LZ4F_compressBound() to check if that
> many bytes are available. But, as explained earlier our output buffer is
> already way smaller than that.

Well, in your last version of the patch, you kind of had two output
buffers: a bigger one that you use internally and then the "official"
one which is associated with the next sink. With my latest patch set
you should be able to make that go away by just arranging for the next
sink's buffer to be as big as you need it to be. But, if we were going
to stick with using an extra buffer, then the solution would not be to
do this, but to copy the internal buffer to the official buffer in
multiple chunks if needed. So don't bother doing this here but just
wait and see how much data you get and then chunk it to the next
sink's buffer, calling bbsink_archive_contents() multiple times if
required. That would be annoying and expensive so I'm glad we're not
doing it that way, but it could be done correctly.
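
The chunked-copy fallback described here could look roughly like this (a sketch; the returned counter stands in for calls to bbsink_archive_contents()):

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Forward 'len' bytes from an oversized internal buffer through a
 * destination buffer of 'dst_len' bytes, in pieces.  Each full piece
 * would be handed to bbsink_archive_contents() in the real code; here
 * we just append into 'out' and count the calls.
 */
static size_t
copy_in_chunks(const char *src, size_t len, size_t dst_len, char *out)
{
	size_t		ncalls = 0;

	while (len > 0)
	{
		size_t		n = (len < dst_len) ? len : dst_len;

		memcpy(out, src, n);	/* stand-in for filling the sink buffer */
		out += n;
		src += n;
		len -= n;
		ncalls++;				/* stand-in for bbsink_archive_contents() */
	}
	return ncalls;
}
```

This shows why the approach is correct but expensive: every byte gets copied an extra time, once into the internal buffer and once into the sink buffer.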

>> + /*
>> + * LZ4F_compressUpdate() returns the number of bytes written into output
>> + * buffer. We need to keep track of how many bytes have been cumulatively
>> + * written into the output buffer(bytes_written). But,
>> + * LZ4F_compressUpdate() returns 0 in case the data is buffered and not
>> + * written to output buffer, set autoFlush to 1 to force the writing to the
>> + * output buffer.
>> + */
>> + prefs->autoFlush = 1;
>>
>> I don't see why this should be necessary. Elsewhere you have code that
>> caters to bytes being stuck inside LZ4's buffer, so why do we also
>> require this?
>
> This is needed to know the actual bytes written in the output buffer. If it is
> set to 0, then LZ4F_compressUpdate() would randomly return 0 or actual
> bytes are written to the output buffer, depending on whether it has buffered
> or really flushed data to the output buffer.

The problem is that if we autoflush, I think it will cause the
compression ratio to be less good. Try un-lz4ing a file that is
produced this way and then re-lz4 it and compare the size of the
re-lz4'd file to the original one. Compressors rely on postponing
decisions about how to compress until they've seen as much of the
input as possible, and flushing forces them to decide earlier, and
maybe making a decision that isn't as good as it could have been. So I
believe we should look for a way of avoiding this. Now I realize
there's a problem there with doing that and also making sure the
output buffer is large enough, and I'm not quite sure how we solve
that problem, but there is probably a way to do it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Robert Haas
Date

Re: refactoring basebackup.c

From
Dilip Kumar
Date
On Mon, Sep 13, 2021 at 9:42 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Mon, Sep 13, 2021 at 7:19 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> > Seems like nothing has been done about the issue reported in [1]
> >
> > This one line change shall fix the issue,
>
> Oops. Try this version.

Thanks, this version works fine.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Sergei Kornilov
Date
Hello

I found that in 0001 you propose to rename a few options. Perhaps we could rename another option for clarity? I think
the FAST (is it about some bandwidth limit?) and WAIT (wait for what? a checkpoint?) option names are confusing.

Could we replace FAST with "CHECKPOINT [fast|spread]" and WAIT with WAIT_WAL_ARCHIVED? I think such names would be more
descriptive.

-        if (PQserverVersion(conn) >= 100000)
-            /* pg_recvlogical doesn't use an exported snapshot, so suppress */
-            appendPQExpBufferStr(query, " NOEXPORT_SNAPSHOT");
+        /* pg_recvlogical doesn't use an exported snapshot, so suppress */
+        if (use_new_option_syntax)
+            AppendStringCommandOption(query, use_new_option_syntax,
+                                       "SNAPSHOT", "nothing");
+        else
+            AppendPlainCommandOption(query, use_new_option_syntax,
+                                     "NOEXPORT_SNAPSHOT");

In 0002, it looks like condition for 9.x releases was lost?

Also my gcc version 8.3.0 is not happy with v5-0007-Support-base-backup-targets.patch and produces:

basebackup.c: In function ‘parse_basebackup_options’:
basebackup.c:970:7: error: ‘target_str’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
       errmsg("target '%s' does not accept a target detail",
       ^~~~~~

regards, Sergei



Re: refactoring basebackup.c

From
Jeevan Ladhe
Date
Thanks for the newer set of the patches Robert!

I was wondering if we should change the bbs_buffer_length in bbsink to
be size_t instead of int, because that's what most of the compression
libraries have their length variables defined as.

Regards,
Jeevan Ladhe

On Mon, Sep 13, 2021 at 9:42 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Sep 13, 2021 at 7:19 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> Seems like nothing has been done about the issue reported in [1]
>
> This one line change shall fix the issue,

Oops. Try this version.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: refactoring basebackup.c

From
Jeevan Ladhe
Date
>> + /*
>> + * LZ4F_compressUpdate() returns the number of bytes written into output
>> + * buffer. We need to keep track of how many bytes have been cumulatively
>> + * written into the output buffer(bytes_written). But,
>> + * LZ4F_compressUpdate() returns 0 in case the data is buffered and not
>> + * written to output buffer, set autoFlush to 1 to force the writing to the
>> + * output buffer.
>> + */
>> + prefs->autoFlush = 1;
>>
>> I don't see why this should be necessary. Elsewhere you have code that
>> caters to bytes being stuck inside LZ4's buffer, so why do we also
>> require this?
>
> This is needed to know the actual bytes written in the output buffer. If it is
> set to 0, then LZ4F_compressUpdate() would randomly return 0 or actual
> bytes are written to the output buffer, depending on whether it has buffered
> or really flushed data to the output buffer.

The problem is that if we autoflush, I think it will cause the
compression ratio to be less good. Try un-lz4ing a file that is
produced this way and then re-lz4 it and compare the size of the
re-lz4'd file to the original one. Compressors rely on postponing
decisions about how to compress until they've seen as much of the
input as possible, and flushing forces them to decide earlier, and
maybe making a decision that isn't as good as it could have been. So I
believe we should look for a way of avoiding this. Now I realize
there's a problem there with doing that and also making sure the
output buffer is large enough, and I'm not quite sure how we solve
that problem, but there is probably a way to do it.

Yes, you are right here, and I could verify this fact with an experiment.
When autoFlush is 1, the file gets less compressed, i.e. the compressed file
is larger than the one generated when autoFlush is set to 0. But, as of now,
I can't think of a solution, since we need the count of bytes actually
written to the output buffer in order to advance our write position in it.

Regards,
Jeevan Ladhe
 

Re: refactoring basebackup.c

From
Jeevan Ladhe
Date
Hi Robert,

Here is a patch for lz4 based on the v5 set of patches. The patch adapts to
the bbsink changes, and is now able to provision the required output buffer
length using the new callback function bbsink_lz4_begin_backup().

Sample command to take backup:
pg_basebackup -t server:/tmp/data_lz4 -Xnone --server-compression=lz4

Please let me know your thoughts.

Regards,
Jeevan Ladhe

On Mon, Sep 13, 2021 at 9:42 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Sep 13, 2021 at 7:19 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
> Seems like nothing has been done about the issue reported in [1]
>
> This one line change shall fix the issue,

Oops. Try this version.

--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments

Re: refactoring basebackup.c

From
Robert Haas
Date
On Tue, Sep 14, 2021 at 11:30 AM Sergei Kornilov <sk@zsrv.org> wrote:
> I found that in 0001 you propose to rename a few options. Perhaps we could rename another option for clarity? I think
> the FAST (is it about some bandwidth limit?) and WAIT (wait for what? a checkpoint?) option names are confusing.
> Could we replace FAST with "CHECKPOINT [fast|spread]" and WAIT with WAIT_WAL_ARCHIVED? I think such names would be more
> descriptive.

I think CHECKPOINT { 'spread' | 'fast' } is probably a good idea; the
options logic for pg_basebackup uses the same convention, and if
somebody ever wanted to introduce a third kind of checkpoint, it would
be a lot easier if you could just make pg_basebackup -cbanana send
CHECKPOINT 'banana' to the server. I don't think renaming WAIT ->
WAIT_WAL_ARCHIVED has much value. The replication grammar isn't really
intended to be consumed directly by end-users, and it's also not clear
that WAIT_WAL_ARCHIVED would attract more support than any of 5 or 10
other possible variants. I'd rather leave it alone.

> -               if (PQserverVersion(conn) >= 100000)
> -                       /* pg_recvlogical doesn't use an exported snapshot, so suppress */
> -                       appendPQExpBufferStr(query, " NOEXPORT_SNAPSHOT");
> +               /* pg_recvlogical doesn't use an exported snapshot, so suppress */
> +               if (use_new_option_syntax)
> +                       AppendStringCommandOption(query, use_new_option_syntax,
> +                                                                          "SNAPSHOT", "nothing");
> +               else
> +                       AppendPlainCommandOption(query, use_new_option_syntax,
> +                                                                        "NOEXPORT_SNAPSHOT");
>
> In 0002, it looks like condition for 9.x releases was lost?

Good catch, thanks.

I'll post an updated version of these two patches on the thread
dedicated to those two patches, which can be found at
http://postgr.es/m/CA+Tgmob2cbCPNbqGoixp0J6aib0p00XZerswGZwx-5G=0M+BMA@mail.gmail.com

> Also my gcc version 8.3.0 is not happy with v5-0007-Support-base-backup-targets.patch and produces:
>
> basebackup.c: In function ‘parse_basebackup_options’:
> basebackup.c:970:7: error: ‘target_str’ may be used uninitialized in this function [-Werror=maybe-uninitialized]
>        errmsg("target '%s' does not accept a target detail",
>        ^~~~~~

OK, I'll fix that. Thanks.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Robert Haas
Date
On Tue, Sep 21, 2021 at 7:54 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> I was wondering if we should change the bbs_buffer_length in bbsink to
> be size_t instead of int, because that's what most of the compression
> libraries have their length variables defined as.

I looked into this and found that I was already using size_t or Size
in a bunch of related places, so this seems to make sense.

Here's a new patch set, responding also to Sergei's comments.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c

From
Robert Haas
Date
On Tue, Sep 21, 2021 at 9:08 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Yes, you are right here, and I could verify this fact with an experiment.
> When autoFlush is 1, the file gets less compressed, i.e. the compressed file
> is larger than the one generated when autoFlush is set to 0. But, as of now,
> I can't think of a solution, since we need the count of bytes actually
> written to the output buffer in order to advance our write position in it.

I don't understand why you think we need to do that. What happens if
you just change prefs->autoFlush = 1 to set it to 0 instead? What I
think will happen is that you'll call LZ4F_compressUpdate a bunch of
times without outputting anything, and then suddenly one of the calls
will produce a bunch of output all at once. But so what? I don't see
that anything in bbsink_lz4_archive_contents() would get broken by
that.

It would be a problem if LZ4F_compressUpdate() didn't produce anything
and also didn't buffer the data internally, and expected us to keep
the input around. That we would have difficulty doing, because we
wouldn't be calling LZ4F_compressUpdate() if we didn't need to free up
some space in that sink's input buffer. But if it buffers the data
internally, I don't know why we care.
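
To illustrate the behavior being described here (a sketch against the liblz4
frame API, not code from the patch):

```c
/*
 * Sketch: with prefs.autoFlush = 0, LZ4F_compressUpdate() may consume
 * the input while writing zero bytes to the destination -- the data is
 * simply buffered inside the compression context -- and a later call
 * then emits a whole block at once.
 */
size_t n = LZ4F_compressUpdate(ctx, dst, dst_capacity, src, src_len, NULL);

if (LZ4F_isError(n))
    elog(ERROR, "lz4 compression failed: %s", LZ4F_getErrorName(n));

bytes_written += n;     /* n == 0 here is perfectly normal */
```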

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Robert Haas
Date:
On Tue, Sep 21, 2021 at 9:35 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Here is a patch for lz4 based on the v5 set of patches. The patch adapts with the
> bbsink changes, and is now able to make the provision for the required length
> for the output buffer using the new callback function bbsink_lz4_begin_backup().
>
> Sample command to take backup:
> pg_basebackup -t server:/tmp/data_lz4 -Xnone --server-compression=lz4
>
> Please let me know your thoughts.

This pretty much looks right, with the exception of the autoFlush
thing about which I sent a separate email. I need to write docs for
all of this, and ideally test cases. It might also be good if
pg_basebackup had an option to un-gzip or un-lz4 archives, but I
haven't thought too hard about what would be required to make that
work.

+ if (opt->compression == BACKUP_COMPRESSION_LZ4)

else if

+ /* First of all write the frame header to destination buffer. */
+ Assert(CHUNK_SIZE >= LZ4F_HEADER_SIZE_MAX);
+ headerSize = LZ4F_compressBegin(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer,
+ CHUNK_SIZE,
+ prefs);

I think this is wrong. I think you should be passing bbs_buffer_length
instead of CHUNK_SIZE, and I think you can just delete CHUNK_SIZE. If
you think otherwise, why?

+ * sink's bbs_buffer of length that can accomodate the compressed input

Spelling.

+ * Make it next multiple of BLCKSZ since the buffer length is expected so.

The buffer length is expected to be a multiple of BLCKSZ, so round up.

+ * If we are falling short of available bytes needed by
+ * LZ4F_compressUpdate() per the upper bound that is decided by
+ * LZ4F_compressBound(), send the archived contents to the next sink to
+ * process it further.

If the number of available bytes has fallen below the value computed
by LZ4F_compressBound(), ask the next sink to process the data so that
we can empty the buffer.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Jeevan Ladhe
Date:


On Tue, Sep 21, 2021 at 10:27 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Sep 21, 2021 at 9:08 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Yes, you are right here, and I could verify this fact with an experiment.
> When autoflush is 1, the file gets less compressed i.e. the compressed file
> is of more size than the one generated when autoflush is set to 0.
> But, as of now, I couldn't think of a solution as we need to really advance the
> bytes written to the output buffer so that we can write into the output buffer.

I don't understand why you think we need to do that. What happens if
you just change prefs->autoFlush = 1 to set it to 0 instead? What I
think will happen is that you'll call LZ4F_compressUpdate a bunch of
times without outputting anything, and then suddenly one of the calls
will produce a bunch of output all at once. But so what? I don't see
that anything in bbsink_lz4_archive_contents() would get broken by
that.

It would be a problem if LZ4F_compressUpdate() didn't produce anything
and also didn't buffer the data internally, and expected us to keep
the input around. That we would have difficulty doing, because we
wouldn't be calling LZ4F_compressUpdate() if we didn't need to free up
some space in that sink's input buffer. But if it buffers the data
internally, I don't know why we care.

If I set prefs->autoFlush to 0, then LZ4F_compressUpdate() returns an
error: ERROR_dstMaxSize_tooSmall after a few iterations.

After digging a bit into the source of LZ4F_compressUpdate() in the LZ4 repository, I
see that it throws this error when the destination buffer capacity, which in
our case is mysink->base.bbs_next->bbs_buffer_length, is less than the
compress bound that it calculates internally by calling LZ4F_compressBound()
for buffered_bytes + the input buffer (CHUNK_SIZE in this case). Not sure
how we can control this.

Regards,
Jeevan Ladhe 

Re: refactoring basebackup.c

From
Jeevan Ladhe
Date:

On Tue, Sep 21, 2021 at 10:50 PM Robert Haas <robertmhaas@gmail.com> wrote:

+ if (opt->compression == BACKUP_COMPRESSION_LZ4)

else if

+ /* First of all write the frame header to destination buffer. */
+ Assert(CHUNK_SIZE >= LZ4F_HEADER_SIZE_MAX);
+ headerSize = LZ4F_compressBegin(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer,
+ CHUNK_SIZE,
+ prefs);

I think this is wrong. I think you should be passing bbs_buffer_length
instead of CHUNK_SIZE, and I think you can just delete CHUNK_SIZE. If
you think otherwise, why?

+ * sink's bbs_buffer of length that can accomodate the compressed input

Spelling.

+ * Make it next multiple of BLCKSZ since the buffer length is expected so.

The buffer length is expected to be a multiple of BLCKSZ, so round up.

+ * If we are falling short of available bytes needed by
+ * LZ4F_compressUpdate() per the upper bound that is decided by
+ * LZ4F_compressBound(), send the archived contents to the next sink to
+ * process it further.

If the number of available bytes has fallen below the value computed
by LZ4F_compressBound(), ask the next sink to process the data so that
we can empty the buffer.

Thanks for your comments, Robert.
Here is the patch addressing the comments, except the one regarding the
autoFlush flag setting.
 
Kindly have a look.

Regards,
Jeevan Ladhe
 
Attachments

Re: refactoring basebackup.c

From
Robert Haas
Date:
On Wed, Sep 22, 2021 at 12:41 PM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> If I set prefs->autoFlush to 0, then LZ4F_compressUpdate() returns an
> error: ERROR_dstMaxSize_tooSmall after a few iterations.
>
> After digging a bit in the source of LZ4F_compressUpdate() in LZ4 repository, I
> see that it throws this error when the destination buffer capacity, which in
> our case is mysink->base.bbs_next->bbs_buffer_length is less than the
> compress bound which it calculates internally by calling LZ4F_compressBound()
> internally for buffered_bytes + input buffer(CHUNK_SIZE in this case). Not sure
> how can we control this.

Uggh. My guess had been that the reason why
LZ4F_compressBound() was returning such a large value was that it
had to allow for the possibility of bytes inside of its internal
buffers. But, if the amount of internally buffered data counts against
the argument that you have to pass to LZ4F_compressBound(), then that
makes it more complicated.

Still, there's got to be a simple way to make this work, and it can't
involve setting autoFlush. Like, look at this:

https://github.com/lz4/lz4/blob/dev/examples/frameCompress.c

That uses the same APIs that we're using here and a fixed-size input buffer
and a fixed-size output buffer, just as we have here, to compress a
file. And it probably works, because otherwise it likely wouldn't be
in the "examples" directory. And it sets autoFlush to 0.
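
The core of that example's loop looks roughly like this (paraphrased, with
error handling elided):

```c
/* Output buffer sized once for the worst case of one input chunk. */
size_t const out_capacity = LZ4F_compressBound(IN_CHUNK_SIZE, &prefs);

for (;;)
{
    size_t const read_len = fread(in_buf, 1, IN_CHUNK_SIZE, fin);

    if (read_len == 0)
        break;

    /* With autoFlush = 0 this may return 0: input buffered in ctx. */
    size_t const out_len = LZ4F_compressUpdate(ctx, out_buf, out_capacity,
                                               in_buf, read_len, NULL);

    fwrite(out_buf, 1, out_len, fout);
}
```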

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Jeevan Ladhe
Date:
Still, there's got to be a simple way to make this work, and it can't
involve setting autoFlush. Like, look at this:

https://github.com/lz4/lz4/blob/dev/examples/frameCompress.c

That uses the same APIs that we're using here and a fixed-size input buffer
and a fixed-size output buffer, just as we have here, to compress a
file. And it probably works, because otherwise it likely wouldn't be
in the "examples" directory. And it sets autoFlush to 0.

Thanks, Robert. I have seen this example, and it is similar to what we have.
I went through each of the steps and it appears that I have done it correctly.
I am still trying to debug and figure out where it is going wrong.

I am going to try hooking pg_basebackup up with the lz4 source and
debugging both.

Regards,
Jeevan Ladhe

Re: refactoring basebackup.c

From
Jeevan Ladhe
Date:
Hi Robert,

I have fixed the autoFlush issue. Basically, I was wrongly initializing
the lz4 preferences in bbsink_lz4_begin_archive() instead of
bbsink_lz4_begin_backup(). I have fixed the issue in the attached
patch, please have a look at it.

Regards,
Jeevan Ladhe


On Fri, Sep 24, 2021 at 6:27 PM Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:
Still, there's got to be a simple way to make this work, and it can't
involve setting autoFlush. Like, look at this:

https://github.com/lz4/lz4/blob/dev/examples/frameCompress.c

That uses the same APIs that we're using here and a fixed-size input buffer
and a fixed-size output buffer, just as we have here, to compress a
file. And it probably works, because otherwise it likely wouldn't be
in the "examples" directory. And it sets autoFlush to 0.

Thanks, Robert. I have seen this example, and it is similar to what we have.
I went through each of the steps and it appears that I have done it correctly.
I am still trying to debug and figure out where it is going wrong.

I am going to try hooking pg_basebackup up with the lz4 source and
debugging both.

Regards,
Jeevan Ladhe
Attachments

Re: refactoring basebackup.c

From
Jeevan Ladhe
Date:
Hi Robert,

I think the patch v6-0007-Support-base-backup-targets.patch has broken
the case for multiple tablespaces. When I tried to take the backup
for target 'none' and extract the base.tar I was not able to locate
tablespace_map file.

I debugged and figured out in normal tar backup i.e. '-Ft' case
pg_basebackup command is sent with TABLESPACE_MAP to the server:
BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS,
TABLESPACE_MAP, MANIFEST 'yes', TARGET 'client')

But, with the target command i.e. "pg_basebackup -t server:/tmp/data_v1
-Xnone", we are not sending the TABLESPACE_MAP, here is how the command
is sent:
BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS, MANIFEST
'yes', TARGET 'server', TARGET_DETAIL '/tmp/data_none')

I am attaching a patch to fix this issue.

With the patch the command sent is now:
BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS, MANIFEST
'yes', TABLESPACE_MAP, TARGET 'server', TARGET_DETAIL '/tmp/data_none')

Regards,
Jeevan Ladhe

On Tue, Sep 21, 2021 at 10:22 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Sep 21, 2021 at 7:54 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> I was wondering if we should change the bbs_buffer_length in bbsink to
> be size_t instead of int, because that's what most of the compression
> libraries have their length variables defined as.

I looked into this and found that I was already using size_t or Size
in a bunch of related places, so this seems to make sense.

Here's a new patch set, responding also to Sergei's comments.

--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments

Re: refactoring basebackup.c

From
Robert Haas
Date:
On Thu, Oct 7, 2021 at 7:50 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> I think the patch v6-0007-Support-base-backup-targets.patch has broken
> the case for multiple tablespaces. When I tried to take the backup
> for target 'none' and extract the base.tar I was not able to locate
> tablespace_map file.
>
> I debugged and figured out in normal tar backup i.e. '-Ft' case
> pg_basebackup command is sent with TABLESPACE_MAP to the server:
> BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS,
> TABLESPACE_MAP, MANIFEST 'yes', TARGET 'client')
>
> But, with the target command i.e. "pg_basebackup -t server:/tmp/data_v1
> -Xnone", we are not sending the TABLESPACE_MAP, here is how the command
> is sent:
> BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS, MANIFEST
> 'yes', TARGET 'server', TARGET_DETAIL '/tmp/data_none')
>
> I am attaching a patch to fix this issue.

Thanks. Here's a new patch set incorporating that change. I committed
the preparatory patches to add an extensible options syntax for
CREATE_REPLICATION_SLOT and BASE_BACKUP, so those patches are no
longer included in this patch set. Barring objections, I will also
push 0001, a small preparatory refactoring patch, soon.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c

From
Robert Haas
Date:
On Tue, Oct 5, 2021 at 5:51 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> I have fixed the autoFlush issue. Basically, I was wrongly initializing
> the lz4 preferences in bbsink_lz4_begin_archive() instead of
> bbsink_lz4_begin_backup(). I have fixed the issue in the attached
> patch, please have a look at it.

Thanks for the new patch. Seems like this is getting closer, but:

+/*
+ * Read the input buffer in CHUNK_SIZE length in each iteration and pass it to
+ * the lz4 compression. Defined as 8k, since the input buffer is multiple of
+ * BLCKSZ i.e. multiple of 8k.
+ */
+#define CHUNK_SIZE 8192

BLCKSZ does not have to be 8kB.

+ size_t compressedSize;
+ int nextChunkLen = CHUNK_SIZE;
+
+ /* Last chunk to be read from the input. */
+ if (avail_in < CHUNK_SIZE)
+ nextChunkLen = avail_in;

This is the only place where CHUNK_SIZE gets used, and I don't think I
see any point to it. I think the 5th argument to LZ4F_compressUpdate
could just be avail_in. And as soon as you do that then I think
bbsink_lz4_archive_contents() no longer needs to be a loop. For gzip,
the output buffer isn't guaranteed to be big enough to write all the
data, so the compression step can fail to compress all the data. But
LZ4 forces us to make the output buffer big enough that no such
failure can happen. Therefore, that can't happen here except if you
artificially limit the amount of data that you pass to
LZ4F_compressUpdate() to something less than the size of the input
buffer. And I don't see any reason to do that.
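
That is, the whole callback could reduce to a single call, something like
this (a sketch only, using the names from the patch under discussion):

```c
/*
 * Sketch: bbs_buffer_length of the next sink was sized using
 * LZ4F_compressBound(), so a single call always has room for its output.
 */
compressedSize = LZ4F_compressUpdate(mysink->ctx,
                                     mysink->base.bbs_next->bbs_buffer
                                         + mysink->bytes_written,
                                     mysink->base.bbs_next->bbs_buffer_length
                                         - mysink->bytes_written,
                                     mysink->base.bbs_buffer,
                                     avail_in, NULL);
```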

+ /* First of all write the frame header to destination buffer. */
+ headerSize = LZ4F_compressBegin(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer,
+ mysink->base.bbs_next->bbs_buffer_length,
+ &mysink->prefs);

+ compressedSize = LZ4F_compressEnd(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ NULL);

I think there's some issue with these two chunks of code. What happens
if one of these functions wants to write more data than will fit in
the output buffer? It seems like either there needs to be some code
someplace that ensures adequate space in the output buffer at the time
of these calls, or else there needs to be a retry loop that writes as
much of the data as possible, flushes the output buffer, and then
loops to generate more output data. But there's clearly no retry loop
here, and I don't see any code that guarantees that the output buffer
has to be large enough (and in the case of LZ4F_compressEnd, have
enough remaining space) either. In other words, all the same concerns
that apply to LZ4F_compressUpdate() also apply here ... but in
LZ4F_compressUpdate() you seem to BOTH have a retry loop and ALSO code
to make sure that the buffer is certain to be large enough (which is
more than you need, you only need one of those) and here you seem to
have NEITHER of those things (which is not enough, you need one or the
other).

+ /* Initialize compressor object. */
+ prefs->frameInfo.blockSizeID = LZ4F_max256KB;
+ prefs->frameInfo.blockMode = LZ4F_blockLinked;
+ prefs->frameInfo.contentChecksumFlag = LZ4F_noContentChecksum;
+ prefs->frameInfo.frameType = LZ4F_frame;
+ prefs->frameInfo.contentSize = 0;
+ prefs->frameInfo.dictID = 0;
+ prefs->frameInfo.blockChecksumFlag = LZ4F_noBlockChecksum;
+ prefs->compressionLevel = 0;
+ prefs->autoFlush = 0;
+ prefs->favorDecSpeed = 0;
+ prefs->reserved[0] = 0;
+ prefs->reserved[1] = 0;
+ prefs->reserved[2] = 0;

How about instead using memset() to zero the whole thing and then
omitting the zero initializations? That seems like it would be less
fragile, if the upstream structure definition ever changes.
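
That is, something like (sketch):

```c
LZ4F_preferences_t prefs;

/* Zero every field, including any added by future liblz4 versions... */
memset(&prefs, 0, sizeof(LZ4F_preferences_t));

/* ...then set only what needs a non-default value. */
prefs.frameInfo.blockSizeID = LZ4F_max256KB;
```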

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Jeevan Ladhe
Date:
Thanks, Robert for reviewing the patch.


On Tue, Oct 12, 2021 at 11:09 PM Robert Haas <robertmhaas@gmail.com> wrote:

This is the only place where CHUNK_SIZE gets used, and I don't think I
see any point to it. I think the 5th argument to LZ4F_compressUpdate
could just be avail_in. And as soon as you do that then I think
bbsink_lz4_archive_contents() no longer needs to be a loop.

Agree. Removed the CHUNK_SIZE and the loop.
 

+ /* First of all write the frame header to destination buffer. */
+ headerSize = LZ4F_compressBegin(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer,
+ mysink->base.bbs_next->bbs_buffer_length,
+ &mysink->prefs);

+ compressedSize = LZ4F_compressEnd(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ NULL);

I think there's some issue with these two chunks of code. What happens
if one of these functions wants to write more data than will fit in
the output buffer? It seems like either there needs to be some code
someplace that ensures adequate space in the output buffer at the time
of these calls, or else there needs to be a retry loop that writes as
much of the data as possible, flushes the output buffer, and then
loops to generate more output data. But there's clearly no retry loop
here, and I don't see any code that guarantees that the output buffer
has to be large enough (and in the case of LZ4F_compressEnd, have
enough remaining space) either. In other words, all the same concerns
that apply to LZ4F_compressUpdate() also apply here ... but in
LZ4F_compressUpdate() you seem to BOTH have a retry loop and ALSO code
to make sure that the buffer is certain to be large enough (which is
more than you need, you only need one of those) and here you seem to
have NEITHER of those things (which is not enough, you need one or the
other).

Fair enough. I have made the change in bbsink_lz4_begin_backup() to
make sure we reserve enough extra bytes for the header and the footer that
are written by LZ4F_compressBegin() and LZ4F_compressEnd() respectively.
LZ4F_compressBound(), when passed an input size of "0", gives the
upper bound for the output buffer needed by LZ4F_compressEnd().

How about instead using memset() to zero the whole thing and then
omitting the zero initializations? That seems like it would be less
fragile, if the upstream structure definition ever changes.

Made this change.

Please review the patch, and let me know your comments.

Regards,
Jeevan Ladhe
Attachments

Re: refactoring basebackup.c

From
Robert Haas
Date:
On Thu, Oct 14, 2021 at 1:21 PM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Agree. Removed the CHUNK_SIZE and the loop.

Try harder. :-)

The loop is gone, but CHUNK_SIZE itself seems to have evaded the executioner.

> Fair enough. I have made the change in the bbsink_lz4_begin_backup() to
> make sure we reserve enough extra bytes for the header and the footer those
> are written by LZ4F_compressBegin() and LZ4F_compressEnd() respectively.
> The LZ4F_compressBound() when passed the input size as "0", would give
> the upper bound for output buffer needed by the LZ4F_compressEnd().

I think this is not the best way to accomplish the goal. Adding
LZ4F_compressBound(0) to next_buf_len makes the buffer substantially
bigger for something that's only going to happen once. We are assuming
in any case, I think, that LZ4F_compressBound(0) <=
LZ4F_compressBound(mysink->base.bbs_buffer_length), so all you need to
do is have bbsink_end_archive() empty the buffer, if necessary, before
calling LZ4F_compressEnd(). With just that change, you can set
next_buf_len = LZ4F_HEADER_SIZE_MAX + mysink->output_buffer_bound --
but that's also more than you need. You can instead do next_buf_len =
Min(LZ4F_HEADER_SIZE_MAX, mysink->output_buffer_bound). Now, you're
probably thinking that won't work, because bbsink_lz4_begin_archive()
could fill up the buffer partway, and then the first call to
bbsink_lz4_archive_contents() could overrun it. But that problem can
be solved by reversing the order of operations in
bbsink_lz4_archive_contents(): before you call LZ4F_compressUpdate(),
test whether you need to empty the buffer first, and if so, do it.

That's actually less confusing than the way you've got it, because as
you have it written, we don't really know why we're emptying the
buffer -- is it to prepare for the next call to LZ4F_compressUpdate(),
or is it to prepare for the call to LZ4F_compressEnd()? How do we know
now how much space the next person writing into the buffer is going to
need? It seems better if bbsink_lz4_archive_contents() empties the
buffer before calling LZ4F_compressUpdate() if that call might not
have enough space, and likewise bbsink_lz4_end_archive() empties the
buffer before calling LZ4F_compressEnd() if that's needed. That way,
each callback makes the space *it* needs, not the space the *next*
caller needs. (bbsink_lz4_end_archive() still needs to ALSO empty the
buffer after LZ4F_compressEnd(), so we don't orphan any data.)
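
Schematically, the callback would then start like this (a sketch, with names
as in the patch under discussion):

```c
/*
 * Sketch for bbsink_lz4_archive_contents(): make room for *this* call
 * before compressing, rather than flushing for the next caller.
 */
if (mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written <
    LZ4F_compressBound(len, &mysink->prefs))
{
    bbsink_archive_contents(mysink->base.bbs_next, mysink->bytes_written);
    mysink->bytes_written = 0;
}

/*
 * bbsink_lz4_end_archive() would do the same before LZ4F_compressEnd(),
 * using LZ4F_compressBound(0, &mysink->prefs) as the worst case, and
 * empty the buffer once more afterwards so no data is orphaned.
 */
```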

On another note, if the call to LZ4F_freeCompressionContext() is
required in bbsink_lz4_end_archive(), then I think this code is going
to just leak the memory used by the compression context if an error
occurs before this code is reached. That kind of sucks. The way to fix
it, I suppose, is a TRY/CATCH block, but I don't think that can be
something internal to basebackup_lz4.c: I think the bbsink stuff would
need to provide some kind of infrastructure for basebackup_lz4.c to
use. It would be a lot better if we could instead get LZ4 to allocate
memory using palloc(), but a quick Google search suggests that you
can't accomplish that without recompiling liblz4, and that's not
workable since we don't want to require a liblz4 built specifically
for PostgreSQL. Do you see any other solution?
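
The shape such infrastructure would take is roughly this (hypothetical
sketch; the bbsink_cleanup() hook shown here is made up, not part of the
posted patches):

```c
/*
 * Hypothetical sketch: a cleanup hook invoked on error, so a sink such
 * as basebackup_lz4 can release resources not allocated with palloc().
 */
PG_TRY();
{
    /* drive the backup through the sink chain */
}
PG_CATCH();
{
    bbsink_cleanup(sink);   /* hypothetical per-sink callback */
    PG_RE_THROW();
}
PG_END_TRY();
```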

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Jeevan Ladhe
Date:
Hi Robert,

> The loop is gone, but CHUNK_SIZE itself seems to have evaded the executioner.

I am sorry, but I did not really get it. Or is it what you have pointed
out in the following paragraphs?

> I think this is not the best way to accomplish the goal. Adding
> LZ4F_compressBound(0) to next_buf_len makes the buffer substantially
> bigger for something that's only going to happen once.

Yes, you are right. I missed this.

> We are assuming in any case, I think, that LZ4F_compressBound(0) <=
> LZ4F_compressBound(mysink->base.bbs_buffer_length), so all you need to
> do is have bbsink_end_archive() empty the buffer, if necessary, before
> calling LZ4F_compressEnd().

This is a fair enough assumption.

> With just that change, you can set
> next_buf_len = LZ4F_HEADER_SIZE_MAX + mysink->output_buffer_bound --
> but that's also more than you need. You can instead do next_buf_len =
> Min(LZ4F_HEADER_SIZE_MAX, mysink->output_buffer_bound). Now, you're
> probably thinking that won't work, because bbsink_lz4_begin_archive()
> could fill up the buffer partway, and then the first call to
> bbsink_lz4_archive_contents() could overrun it. But that problem can
> be solved by reversing the order of operations in
> bbsink_lz4_archive_contents(): before you call LZ4F_compressUpdate(),
> test whether you need to empty the buffer first, and if so, do it.

I am still not able to see how we can survive with a mere
size of Min(LZ4F_HEADER_SIZE_MAX, mysink->output_buffer_bound).
LZ4F_HEADER_SIZE_MAX is defined as 19 in lz4 library. With this
proposal, it is almost guaranteed that the next buffer length will
be always set to 19, which will result in failure of a call to
LZ4F_compressUpdate() with the error LZ4F_ERROR_dstMaxSize_tooSmall,
even if we had called bbsink_archive_contents() before.

> That's actually less confusing than the way you've got it, because as
> you have it written, we don't really know why we're emptying the
> buffer -- is it to prepare for the next call to LZ4F_compressUpdate(),
> or is it to prepare for the call to LZ4F_compressEnd()? How do we know
> now how much space the next person writing into the buffer is going to
> need? It seems better if bbsink_lz4_archive_contents() empties the
> buffer before calling LZ4F_compressUpdate() if that call might not
> have enough space, and likewise bbsink_lz4_end_archive() empties the
> buffer before calling LZ4F_compressEnd() if that's needed. That way,
> each callback makes the space *it* needs, not the space the *next*
> caller needs. (bbsink_lz4_end_archive() still needs to ALSO empty the
> buffer after LZ4F_compressEnd(), so we don't orphan any data.)

Sure, I get your point here.

> On another note, if the call to LZ4F_freeCompressionContext() is
> required in bbsink_lz4_end_archive(), then I think this code is going
> to just leak the memory used by the compression context if an error
> occurs before this code is reached. That kind of sucks.

Yes, the LZ4F_freeCompressionContext() is needed to clear the
LZ4F_cctx. The structure LZ4F_cctx_s maintains internal stages
of compression, internal buffers, etc.

> The way to fix
> it, I suppose, is a TRY/CATCH block, but I don't think that can be
> something internal to basebackup_lz4.c: I think the bbsink stuff would
> need to provide some kind of infrastructure for basebackup_lz4.c to
> use. It would be a lot better if we could instead get LZ4 to allocate
> memory using palloc(), but a quick Google search suggests that you
> can't accomplish that without recompiling liblz4, and that's not
> workable since we don't want to require a liblz4 built specifically
> for PostgreSQL. Do you see any other solution?

You mean the way gzip allows us to use our own alloc and free functions
by means of providing the function pointers for them. Unfortunately,
no, LZ4 does not have that kind of provision. Maybe that makes a
good proposal for LZ4 library ;-).
I cannot think of another solution to it right away.

Regards,
Jeevan Ladhe

Re: refactoring basebackup.c

From
Robert Haas
Date:
On Fri, Oct 15, 2021 at 7:54 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> > The loop is gone, but CHUNK_SIZE itself seems to have evaded the executioner.
>
> I am sorry, but I did not really get it. Or it is what you have pointed
> in the following paragraphs?

I mean #define CHUNK_SIZE is still in the patch.

> > With just that change, you can set
> > next_buf_len = LZ4F_HEADER_SIZE_MAX + mysink->output_buffer_bound --
> > but that's also more than you need. You can instead do next_buf_len =
> > Min(LZ4F_HEADER_SIZE_MAX, mysink->output_buffer_bound). Now, you're
> > probably thinking that won't work, because bbsink_lz4_begin_archive()
> > could fill up the buffer partway, and then the first call to
> > bbsink_lz4_archive_contents() could overrun it. But that problem can
> > be solved by reversing the order of operations in
> > bbsink_lz4_archive_contents(): before you call LZ4F_compressUpdate(),
> > test whether you need to empty the buffer first, and if so, do it.
>
> I am still not able to get - how can we survive with a mere
> size of Min(LZ4F_HEADER_SIZE_MAX, mysink->output_buffer_bound).
> LZ4F_HEADER_SIZE_MAX is defined as 19 in lz4 library. With this
> proposal, it is almost guaranteed that the next buffer length will
> be always set to 19, which will result in failure of a call to
> LZ4F_compressUpdate() with the error LZ4F_ERROR_dstMaxSize_tooSmall,
> even if we had called bbsink_archive_contents() before.

Sorry, should have been Max(), not Min().

> You mean the way gzip allows us to use our own alloc and free functions
> by means of providing the function pointers for them. Unfortunately,
> no, LZ4 does not have that kind of provision. Maybe that makes a
> good proposal for LZ4 library ;-).
> I cannot think of another solution to it right away.

OK. Will give it some thought.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Jeevan Ladhe
Date:
Hi Robert,

I mean #define CHUNK_SIZE is still in the patch.

Oops, removed now. 
 
> > With just that change, you can set
> > next_buf_len = LZ4F_HEADER_SIZE_MAX + mysink->output_buffer_bound --
> > but that's also more than you need. You can instead do next_buf_len =
> > Min(LZ4F_HEADER_SIZE_MAX, mysink->output_buffer_bound). Now, you're
> > probably thinking that won't work, because bbsink_lz4_begin_archive()
> > could fill up the buffer partway, and then the first call to
> > bbsink_lz4_archive_contents() could overrun it. But that problem can
> > be solved by reversing the order of operations in
> > bbsink_lz4_archive_contents(): before you call LZ4F_compressUpdate(),
> > test whether you need to empty the buffer first, and if so, do it.
>
> I am still not able to get - how can we survive with a mere
> size of Min(LZ4F_HEADER_SIZE_MAX, mysink->output_buffer_bound).
> LZ4F_HEADER_SIZE_MAX is defined as 19 in lz4 library. With this
> proposal, it is almost guaranteed that the next buffer length will
> be always set to 19, which will result in failure of a call to
> LZ4F_compressUpdate() with the error LZ4F_ERROR_dstMaxSize_tooSmall,
> even if we had called bbsink_archive_contents() before.

Sorry, should have been Max(), not Min().

Ahh ok.
I looked into the code of LZ4F_compressBound() and the header size is
already included in the calculations, so when we compare
LZ4F_HEADER_SIZE_MAX and mysink->output_buffer_bound, the latter
will always be greater, and hence a sufficient length for the output buffer. I
made this change in the attached patch, and also cleared the buffer by
calling bbsink_archive_contents() before calling LZ4F_compressUpdate().
Similarly cleared the buffer before calling LZ4F_compressEnd().

> You mean the way gzip allows us to use our own alloc and free functions
> by means of providing the function pointers for them. Unfortunately,
> no, LZ4 does not have that kind of provision. Maybe that makes a
> good proposal for LZ4 library ;-).
> I cannot think of another solution to it right away.

OK. Will give it some thought.

I have started a thread[1] on LZ4 community for this, but so far no
reply on that.

Regards,
Jeevan Ladhe

Attachments

Re: refactoring basebackup.c

From
Robert Haas
Date:
On Fri, Oct 15, 2021 at 8:05 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > You mean the way gzip allows us to use our own alloc and free functions
> > by means of providing the function pointers for them. Unfortunately,
> > no, LZ4 does not have that kind of provision. Maybe that makes a
> > good proposal for LZ4 library ;-).
> > I cannot think of another solution to it right away.
>
> OK. Will give it some thought.

Here's a new patch set. I've tried adding a "cleanup" callback to the
bbsink method and ensuring that it gets called even in case of an
error. The code for that is untested since I have no use for it with
the existing basebackup sink types, so let me know how it goes when
you try to use it for LZ4.

I've also added documentation for the new pg_basebackup options in
this version, and I fixed up a couple of these patches to be
pgindent-clean when they previously were not.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c

From
Jeevan Ladhe
Date:
Thanks, Robert for the patches.

I tried to take a backup using gzip compression and got a core.

$ pg_basebackup -t server:/tmp/data_gzip -Xnone --server-compression=gzip
NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup
pg_basebackup: error: could not read COPY data: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.


The backtrace:

(gdb) bt
#0  0x0000000000000000 in ?? ()
#1  0x0000558264bfc40a in bbsink_cleanup (sink=0x55826684b5f8) at ../../../src/include/replication/basebackup_sink.h:268
#2  0x0000558264bfc838 in bbsink_forward_cleanup (sink=0x55826684b710) at basebackup_sink.c:124
#3  0x0000558264bf4cab in bbsink_cleanup (sink=0x55826684b710) at ../../../src/include/replication/basebackup_sink.h:268
#4  0x0000558264bf7738 in SendBaseBackup (cmd=0x55826683bd10) at basebackup.c:1020
#5  0x0000558264c10915 in exec_replication_command (
    cmd_string=0x5582667bc580 "BASE_BACKUP ( LABEL 'pg_basebackup base backup',  PROGRESS,  MANIFEST 'yes',  TABLESPACE_MAP,  TARGET 'server',  TARGET_DETAIL '/tmp/data_gzip',  COMPRESSION 'gzip')") at walsender.c:1731
#6  0x0000558264c8a69b in PostgresMain (dbname=0x5582667e84d8 "", username=0x5582667e84b8 "hadoop") at postgres.c:4493
#7  0x0000558264bb10a6 in BackendRun (port=0x5582667de160) at postmaster.c:4560
#8  0x0000558264bb098b in BackendStartup (port=0x5582667de160) at postmaster.c:4288
#9  0x0000558264bacb55 in ServerLoop () at postmaster.c:1801
#10 0x0000558264bac2ee in PostmasterMain (argc=3, argv=0x5582667b68c0) at postmaster.c:1473
#11 0x0000558264aa0950 in main (argc=3, argv=0x5582667b68c0) at main.c:198


bbsink_gzip_ops have the cleanup() callback set to NULL, and when the
bbsink_cleanup() callback is triggered, it tries to invoke a function that
is NULL. I think either bbsink_gzip_ops should set the cleanup callback
to bbsink_forward_cleanup or we should be calling the cleanup() callback
from PG_CATCH instead of PG_FINALLY()? But in the latter case, even if
we call from PG_CATCH, it will have a similar problem for gzip and other
sinks which may not need a custom cleanup() callback in case there is any
error before the backup could finish up normally.

I have attached a patch to fix this.

Thoughts?

Regards,
Jeevan Ladhe

On Tue, Oct 26, 2021 at 1:45 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Oct 15, 2021 at 8:05 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > You mean the way gzip allows us to use our own alloc and free functions
> > by means of providing the function pointers for them. Unfortunately,
> > no, LZ4 does not have that kind of provision. Maybe that makes a
> > good proposal for LZ4 library ;-).
> > I cannot think of another solution to it right away.
>
> OK. Will give it some thought.

Here's a new patch set. I've tried adding a "cleanup" callback to the
bbsink method and ensuring that it gets called even in case of an
error. The code for that is untested since I have no use for it with
the existing basebackup sink types, so let me know how it goes when
you try to use it for LZ4.

I've also added documentation for the new pg_basebackup options in
this version, and I fixed up a couple of these patches to be
pgindent-clean when they previously were not.

--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments

Re: refactoring basebackup.c

From
Robert Haas
Date:
On Fri, Oct 29, 2021 at 8:59 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> bbsink_gzip_ops have the cleanup() callback set to NULL, and when the
> bbsink_cleanup() callback is triggered, it tries to invoke a function that
> is NULL. I think either bbsink_gzip_ops should set the cleanup callback
> to bbsink_forward_cleanup or we should be calling the cleanup() callback
> from PG_CATCH instead of PG_FINALLY()? But in the latter case, even if
> we call from PG_CATCH, it will have a similar problem for gzip and other
> sinks which may not need a custom cleanup() callback in case there is any
> error before the backup could finish up normally.
>
> I have attached a patch to fix this.

Yes, this is the right fix. Apologies for the oversight.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Jeevan Ladhe
Date:

I have implemented the cleanup callback bbsink_lz4_cleanup() in the attached patch.


Please have a look and let me know of any comments.


Regards,

Jeevan Ladhe


On Fri, Oct 29, 2021 at 6:54 PM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Oct 29, 2021 at 8:59 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> bbsink_gzip_ops have the cleanup() callback set to NULL, and when the
> bbsink_cleanup() callback is triggered, it tries to invoke a function that
> is NULL. I think either bbsink_gzip_ops should set the cleanup callback
> to bbsink_forward_cleanup or we should be calling the cleanup() callback
> from PG_CATCH instead of PG_FINALLY()? But in the latter case, even if
> we call from PG_CATCH, it will have a similar problem for gzip and other
> sinks which may not need a custom cleanup() callback in case there is any
> error before the backup could finish up normally.
>
> I have attached a patch to fix this.

Yes, this is the right fix. Apologies for the oversight.

--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments

Re: refactoring basebackup.c

From
Robert Haas
Date:
On Tue, Nov 2, 2021 at 7:53 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> I have implemented the cleanup callback bbsink_lz4_cleanup() in the attached patch.
>
> Please have a look and let me know of any comments.

Looks pretty good. I think you should work on stuff like documentation
and tests, and I need to do some work on that stuff, too. Also, I
think you should try to figure out how to support different
compression levels. For gzip, I did that by making gzip1..gzip9
possible compression settings. But that might not have been the right
idea because something like lz43 to mean lz4 at level 3 would be
confusing. Also, for the lz4 command line utility, there's not only
"lz4 -3" which means LZ4 with level 3 compression, but also "lz4
--fast=3" which selects "ultra-fast compression level 3" rather than
regular old level 3. And apparently LZ4 levels go up to 12 rather than
just 9 like gzip. I'm thinking maybe we should go with something like
"gzip@9" rather than just "gzip9" to mean gzip with compression level
9, and then things like "lz4@3" or "lz4@fast3" would select either the
regular compression levels or the ultra-fast compression levels.

Meanwhile, I think it's probably OK for me to go ahead and commit
0001-0003 from my patches at this point, since it seems we have pretty
good evidence that the abstraction basically works, and there doesn't
seem to be any value in holding off and maybe having to do a bunch
more rebasing. We may also want to look into making -Fp work with
--server-compression, which would require pg_basebackup to know how to
decompress. I'm actually not sure if this is worthwhile; you'd need to
have a network connection slow enough that it's worth spending a lot
of CPU time compressing on the server and decompressing on the client
to make up for the cost of network transfer. But some people might
have that case. It might make it easier to test this, too, since we
probably can't rely on having an LZ4 binary installed. Another thing
that you probably need to investigate is also supporting client-side
LZ4 compression. I think that is probably a really desirable addition
to your patch set, since people might find it odd if that were
exclusively a server-side option. Hopefully it's not that much work.

One minor nitpick in terms of the code:

+ mysink->bytes_written = mysink->bytes_written + headerSize;

I would use += here.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Robert Haas
Date:
On Tue, Nov 2, 2021 at 10:32 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Looks pretty good. I think you should work on stuff like documentation
> and tests, and I need to do some work on that stuff, too. Also, I
> think you should try to figure out how to support different
> compression levels.

On second thought, maybe we don't need to do this. There's a thread on
"Teach pg_receivewal to use lz4 compression" which concluded that
supporting different compression levels was unnecessary.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Robert Haas
Date:
On Tue, Nov 2, 2021 at 10:32 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Meanwhile, I think it's probably OK for me to go ahead and commit
> 0001-0003 from my patches at this point, since it seems we have pretty
> good evidence that the abstraction basically works, and there doesn't
> seem to be any value in holding off and maybe having to do a bunch
> more rebasing.

I went ahead and committed 0001 and 0002, but got nervous about
proceeding with 0003. For those who may not have been following along
closely, what was 0003 and is now 0001 introduces a new COPY
subprotocol for taking backups. That probably needs to be documented
and as of now the patch does not do that, but the bigger question is
what to do about backward compatibility. I wrote the patch in such a
way that, post-patch, the server can do backups either the way that we
do them now, or the new way that it introduces, but I'm wondering if I
should rip that out and just support the new way only. If you run a
newer pg_basebackup against an older server, it will work, and still
does with the patch. If, however, you run an older pg_basebackup
against a newer server, it complains. For example running a pg13
pg_basebackup against a pg14 cluster produces this:

pg_basebackup: error: incompatible server version 14.0
pg_basebackup: removing data directory "pgstandby"

Now for all I know there is out-of-core software out there that speaks
the replication protocol and can take base backups using it and would
like it to continue working as it does today, and that's easy for me
to do, because that's the way the patch works. But on the other hand
since the patch adapts the in-core tools to use the new method when
talking to a new server, we wouldn't have test coverage for the old
method any more, which might possibly make it annoying to maintain.
But then again that is a problem we could leave for the future, and
rip it out then rather than now. I'm not sure which way to jump.
Anyone else have thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c

From
Robert Haas
Date:
On Fri, Nov 5, 2021 at 11:50 AM Robert Haas <robertmhaas@gmail.com> wrote:
> I went ahead and committed 0001 and 0002, but got nervous about
> proceeding with 0003.

It turns out that these commits are causing failures on prairiedog.
Per email from Tom off-list, that's apparently because prairiedog has
a fussy version of tar that doesn't like it when you omit the trailing
NUL blocks that are supposed to be part of a tar file. So how did this
get broken?

It turns out that in the current state of the world, the server sends
an almost-tarfile to the client. What I mean by an almost-tarfile is
that it sends something that looks like a valid tarfile except that
the two blocks of trailing NUL bytes are omitted. Prior to these
patches, that was a very strategic omission, because the pg_basebackup
code wants to edit the tar files, and it wasn't smart enough to parse
them, so it just received all the data from the server, then added any
members that it wanted to add (e.g. recovery.signal) and then added
the terminator itself. I would classify this as an ugly hack, but it
worked. With these changes, the client is now capable of really
parsing a tarfile, so it would have no problem injecting new files
into the archive whether or not the server terminates it properly. It
also has no problem adding the two blocks of terminating NUL bytes if
the server omits them, but not otherwise. All in all, it's
significantly smarter code.

However, I also set things up so that the client doesn't bother
parsing the tar file from the server if it's not doing anything that
requires editing the tar file on the fly. That saves some overhead,
and it's also important for the rest of the patch set, which wants to
make it so that the server could send us something besides a tarfile,
like maybe a .tar.gz. We can't just have a convention of adding 1024
NUL bytes to any file the server sends us unless what the server sends
us is always and precisely an unterminated tarfile.  Unfortunately,
that means that in the case where the tar parsing logic isn't used,
the tar file ends up without the proper terminator. Because most 'tar'
implementations are happy to ignore that defect, the tests pass on my
machine, but not on prairiedog. I think I realized this problem at
some point during the development process of this patch, but then I
forgot about it again and ended up committing something that has a
problem of which, at some earlier point in time, I had been entirely
aware. Oops.

It's tempting to try to fix this problem by changing the server so
that it properly terminates the tar files it sends to the client.
Honestly, I don't know how we ever thought it was OK to design a
protocol for base backups that involved the server sending something
that is almost but not quite a valid tarfile. However, that's not
quite good enough, because pg_basebackup is supposed to be backward
compatible, so we'd still have the same problem if a new version of
pg_basebackup were used with an old server. So what I'm inclined to do
is fix both the server and pg_basebackup. On the server side, properly
terminate the tarfile. On the client side, if we're talking to a
pre-v15 server and don't need to parse the tarfile, blindly add 1024
NUL bytes at the end.

I think I can get patches for this done today. Please let me know ASAP
if you have objections to this line of attack.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> It turns out that these commits are causing failures on prairiedog.
> Per email from Tom off-list, that's apparently because prairiedog has
> a fussy version of tar that doesn't like it when you omit the trailing
> NUL blocks that are supposed to be part of a tar file.

FTR, prairiedog is green.  It's Noah's AIX menagerie that's complaining.

It's actually a little bit disturbing that we're only seeing a failure
on that one platform, because that means that nothing else is anchoring
us to the strict POSIX specification for tarfile format.  We knew that
GNU tar is forgiving about missing trailing zero blocks, but apparently
so is BSD tar.

One part of me wants to add some explicit test for the trailing blocks.
Another says, well, the *de facto* tar standard seems not to require
the trailing blocks, never mind the letter of POSIX --- so when AIX
dies, will anyone care anymore?  Maybe not.

            regards, tom lane



Re: refactoring basebackup.c

From
Robert Haas
Date:
On Mon, Nov 8, 2021 at 10:59 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > It turns out that these commits are causing failures on prairiedog.
> > Per email from Tom off-list, that's apparently because prairiedog has
> > a fussy version of tar that doesn't like it when you omit the trailing
> > NUL blocks that are supposed to be part of a tar file.
>
> FTR, prairiedog is green.  It's Noah's AIX menagerie that's complaining.

Woops.

> It's actually a little bit disturbing that we're only seeing a failure
> on that one platform, because that means that nothing else is anchoring
> us to the strict POSIX specification for tarfile format.  We knew that
> GNU tar is forgiving about missing trailing zero blocks, but apparently
> so is BSD tar.

Yeah.

> One part of me wants to add some explicit test for the trailing blocks.
> Another says, well, the *de facto* tar standard seems not to require
> the trailing blocks, never mind the letter of POSIX --- so when AIX
> dies, will anyone care anymore?  Maybe not.

FWIW, I think both of those are pretty defensible positions. Honestly,
I'm not sure how likely the bug is to recur once we fix it here,
either. The only reason this is a problem is because of the kludge of
having the server generate the entire output file except for the last
1kB. If we eliminate that behavior I don't know that this particular
problem is especially likely to come back. But adding a test isn't
stupid either, just a bit tricky to write. When I was testing locally
this morning I found that there were considerably more than 1024 zero
bytes at the end of the file because the last file it backs up is
pg_control which ends with lots of zero bytes. So it's not sufficient
to just write a test that checks for non-zero bytes in the last 1kB of
the file. What I think you'd need to do is figure out the number of
files in the archive and the sizes of each one, and based on that work
out how big the tar archive should be: 512 bytes per file or directory
or symlink plus enough extra 512 byte chunks to cover the contents of
each file plus an extra 1024 bytes at the end. That doesn't seem
particularly simple to code. We could run 'tar tvf' and parse the
output to get the number of files and their lengths, but that seems
likely to cause more portability headaches than the underlying issue.
Since pg_basebackup now has the logic to do all of this parsing
internally, we could make it complain if it receives from a v15+
server an archive trailer that is not 1024 bytes of zeroes, but that
wouldn't help with this exact problem, because the issue in this case
is when pg_basebackup decides it doesn't need to parse in the first
place. We could add a pg_basebackup option
--force-parsing-and-check-if-the-server-seems-broken, but that seems
like overkill to me. So overall I'm inclined to just do nothing about
this unless someone has a better idea how to write a reasonable test.

Anyway, here's my proposal for fixing the issue immediately before us.
0001 adds logic to pad out the unterminated tar archives, and 0002
makes the server terminate its tar archives while preserving the logic
added by 0001 for cases where we're talking to an older server. I
assume that it's best to get something committed quickly here so will
do that in ~4 hours if there are no major objections, or sooner if I
hear some enthusiastic endorsement.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c

From
Robert Haas
Date:
On Mon, Nov 8, 2021 at 11:34 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Anyway, here's my proposal for fixing the issue immediately before us.
> 0001 adds logic to pad out the unterminated tar archives, and 0002
> makes the server terminate its tar archives while preserving the logic
> added by 0001 for cases where we're talking to an older server. I
> assume that it's best to get something committed quickly here so will
> do that in ~4 hours if there are no major objections, or sooner if I
> hear some enthusiastic endorsement.

I have now committed 0001 and will wait to see what the buildfarm
thinks about that before doing anything more.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Robert Haas
Date:
On Mon, Nov 8, 2021 at 4:41 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Nov 8, 2021 at 11:34 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > Anyway, here's my proposal for fixing the issue immediately before us.
> > 0001 adds logic to pad out the unterminated tar archives, and 0002
> > makes the server terminate its tar archives while preserving the logic
> > added by 0001 for cases where we're talking to an older server. I
> > assume that it's best to get something committed quickly here so will
> > do that in ~4 hours if there are no major objections, or sooner if I
> > hear some enthusiastic endorsement.
>
> I have now committed 0001 and will wait to see what the buildfarm
> thinks about that before doing anything more.

It seemed OK, so I have now committed 0002 as well.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Dmitry Dolgov
Date:
> On Fri, Nov 05, 2021 at 11:50:01AM -0400, Robert Haas wrote:
> On Tue, Nov 2, 2021 at 10:32 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > Meanwhile, I think it's probably OK for me to go ahead and commit
> > 0001-0003 from my patches at this point, since it seems we have pretty
> > good evidence that the abstraction basically works, and there doesn't
> > seem to be any value in holding off and maybe having to do a bunch
> > more rebasing.
>
> I went ahead and committed 0001 and 0002, but got nervous about
> proceeding with 0003.

Hi,

I'm observing a strange issue which I can only relate to bef47ff85d,
where the bbsink abstraction was introduced. The problem is a failing
assertion when doing:

    DETAIL:  Failed process was running: BASE_BACKUP ( LABEL 'pg_basebackup base backup',  PROGRESS,  WAIT 0,  MAX_RATE 102400, MANIFEST 'yes')

Walsender tries to send a backup manifest, but crashes on the throttling sink:

    #2  0x0000560857b551af in ExceptionalCondition (conditionName=0x560857d15d27 "sink->bbs_next != NULL", errorType=0x560857d15c23 "FailedAssertion", fileName=0x560857d15d15 "basebackup_sink.c", lineNumber=91) at assert.c:69
    #3  0x0000560857918a94 in bbsink_forward_manifest_contents (sink=0x5608593f73f8, len=32768) at basebackup_sink.c:91
    #4  0x0000560857918d68 in bbsink_throttle_manifest_contents (sink=0x5608593f7450, len=32768) at basebackup_throttle.c:125
    #5  0x00005608579186d0 in bbsink_manifest_contents (sink=0x5608593f7450, len=32768) at ../../../src/include/replication/basebackup_sink.h:240
    #6  0x0000560857918b1b in bbsink_forward_manifest_contents (sink=0x5608593f74e8, len=32768) at basebackup_sink.c:94
    #7  0x0000560857911edc in bbsink_manifest_contents (sink=0x5608593f74e8, len=32768) at ../../../src/include/replication/basebackup_sink.h:240
    #8  0x00005608579129f6 in SendBackupManifest (manifest=0x7ffdaea9d120, sink=0x5608593f74e8) at backup_manifest.c:373

Looking at the similar bbsink_throttle_archive_contents it's not clear
why comments for both functions (archive and manifest throttling) say
"pass archive contents to next sink", but only bbsink_throttle_manifest_contents
does pass bbs_next into the bbsink_forward_manifest_contents. Is it
supposed to be like that? Passing the same sink object instead of the next
one into bbsink_forward_manifest_contents seems to solve the problem in
this case.



Re: refactoring basebackup.c

From
Robert Haas
Date:
On Mon, Nov 15, 2021 at 11:25 AM Dmitry Dolgov <9erthalion6@gmail.com> wrote:
> Walsender tries to send a backup manifest, but crashes on the throttling sink:
>
>     #2  0x0000560857b551af in ExceptionalCondition (conditionName=0x560857d15d27 "sink->bbs_next != NULL", errorType=0x560857d15c23 "FailedAssertion", fileName=0x560857d15d15 "basebackup_sink.c", lineNumber=91) at assert.c:69
>     #3  0x0000560857918a94 in bbsink_forward_manifest_contents (sink=0x5608593f73f8, len=32768) at basebackup_sink.c:91
>     #4  0x0000560857918d68 in bbsink_throttle_manifest_contents (sink=0x5608593f7450, len=32768) at basebackup_throttle.c:125
>     #5  0x00005608579186d0 in bbsink_manifest_contents (sink=0x5608593f7450, len=32768) at ../../../src/include/replication/basebackup_sink.h:240
>     #6  0x0000560857918b1b in bbsink_forward_manifest_contents (sink=0x5608593f74e8, len=32768) at basebackup_sink.c:94
>     #7  0x0000560857911edc in bbsink_manifest_contents (sink=0x5608593f74e8, len=32768) at ../../../src/include/replication/basebackup_sink.h:240
>     #8  0x00005608579129f6 in SendBackupManifest (manifest=0x7ffdaea9d120, sink=0x5608593f74e8) at backup_manifest.c:373
>
> Looking at the similar bbsink_throttle_archive_contents it's not clear
> why comments for both functions (archive and manifest throttling) say
> "pass archive contents to next sink", but only bbsink_throttle_manifest_contents
> does pass bbs_next into the bbsink_forward_manifest_contents. Is it
> supposed to be like that? Passing the same sink object instead of the next
> one into bbsink_forward_manifest_contents seems to solve the problem in
> this case.

Yeah, that's what it should be doing. I'll commit a fix, thanks for
the report and diagnosis.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Robert Haas
Date:
On Mon, Nov 15, 2021 at 2:23 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Yeah, that's what it should be doing. I'll commit a fix, thanks for
> the report and diagnosis.

Here's a new patch set.

0001 - When I committed the patch to add the missing 2 blocks of zero
bytes to the tar archives generated by the server, I failed to adjust
the documentation. So 0001 does that. This is the only new patch in
the series. I was not sure whether to just remove the statement from
the documentation saying that those blocks aren't included, or whether
to mention that we used to include them and no longer do. I went for
the latter; opinions welcome.

0002 - This adds a new COPY subprotocol for taking base backups. I've
improved it over the previous version by adding documentation. I'm
still seeking comments on the points I raised in
http://postgr.es/m/CA+TgmobrOXbDh+hCzzVkD3weV3R-QRy3SPa=FRb_Rv9wF5iPJw@mail.gmail.com
but what I'm leaning toward doing is committing the patch as is and
then submitting a patch - or maybe several patches - later to rip some of this
and a few other old things out. That way the debate - or lack thereof
- about what to do here doesn't have to block the main patch set, and
also, it feels safer to make removing the existing stuff a separate
effort rather than doing it now.

0003 - This adds "server" and "blackhole" as backup targets. In this
version, I've improved the documentation. Also, the previous version
only let you use a backup target with -Xnone, and I realized that was
stupid. -Xfetch is OK too. -Xstream still doesn't work, since that's
implemented via client-side logic. I think this still needs some work
to be committable, like adding tests, but I don't expect to make any
major changes.

0004 - Server-side gzip compression. Similar level of maturity to 0003.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c

From
Jeevan Ladhe
Date:
Hi Robert,

Please find the lz4 compression patch here that basically has:

1. Documentation
2. pgindent run over it.
3. Your comment about using "+=" addressed

I have not included the compression level per your comment below:
---------
> "On second thought, maybe we don't need to do this. There's a thread on
> "Teach pg_receivewal to use lz4 compression" which concluded that
> supporting different compression levels was unnecessary."
---------

Regards,
Jeevan Ladhe

On Wed, Nov 17, 2021 at 3:17 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Mon, Nov 15, 2021 at 2:23 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Yeah, that's what it should be doing. I'll commit a fix, thanks for
> the report and diagnosis.

Here's a new patch set.

0001 - When I committed the patch to add the missing 2 blocks of zero
bytes to the tar archives generated by the server, I failed to adjust
the documentation. So 0001 does that. This is the only new patch in
the series. I was not sure whether to just remove the statement from
the documentation saying that those blocks aren't included, or whether
to mention that we used to include them and no longer do. I went for
the latter; opinions welcome.

0002 - This adds a new COPY subprotocol for taking base backups. I've
improved it over the previous version by adding documentation. I'm
still seeking comments on the points I raised in
http://postgr.es/m/CA+TgmobrOXbDh+hCzzVkD3weV3R-QRy3SPa=FRb_Rv9wF5iPJw@mail.gmail.com
but what I'm leaning toward doing is committing the patch as is and
then submitting a patch - or maybe several patches - later to rip some of this
and a few other old things out. That way the debate - or lack thereof
- about what to do here doesn't have to block the main patch set, and
also, it feels safer to make removing the existing stuff a separate
effort rather than doing it now.

0003 - This adds "server" and "blackhole" as backup targets. In this
version, I've improved the documentation. Also, the previous version
only let you use a backup target with -Xnone, and I realized that was
stupid. -Xfetch is OK too. -Xstream still doesn't work, since that's
implemented via client-side logic. I think this still needs some work
to be committable, like adding tests, but I don't expect to make any
major changes.

0004 - Server-side gzip compression. Similar level of maturity to 0003.

--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments

Re: refactoring basebackup.c

From
tushar
Date:
On 11/22/21 11:05 PM, Jeevan Ladhe wrote:
> Please find the lz4 compression patch here that basically has:
Thanks. Could you please rebase your patch? It is failing at my end:

[edb@centos7tushar pg15_lz]$ git apply /tmp/v8-0001-LZ4-compression.patch
error: patch failed: doc/src/sgml/ref/pg_basebackup.sgml:230
error: doc/src/sgml/ref/pg_basebackup.sgml: patch does not apply
error: patch failed: src/backend/replication/Makefile:19
error: src/backend/replication/Makefile: patch does not apply
error: patch failed: src/backend/replication/basebackup.c:64
error: src/backend/replication/basebackup.c: patch does not apply
error: patch failed: src/include/replication/basebackup_sink.h:285
error: src/include/replication/basebackup_sink.h: patch does not apply

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Re: refactoring basebackup.c

From
Jeevan Ladhe
Date:
Hi Tushar,

You need to apply Robert's v10 version patches 0002, 0003 and 0004, before applying the lz4 patch (v8 version).
Please let me know if you still face any issues.

Regards,
Jeevan Ladhe

On Mon, Dec 27, 2021 at 7:01 PM tushar <tushar.ahuja@enterprisedb.com> wrote:
On 11/22/21 11:05 PM, Jeevan Ladhe wrote:
> Please find the lz4 compression patch here that basically has:
Thanks, Could you please rebase your patch, it is failing at my end -

[edb@centos7tushar pg15_lz]$ git apply /tmp/v8-0001-LZ4-compression.patch
error: patch failed: doc/src/sgml/ref/pg_basebackup.sgml:230
error: doc/src/sgml/ref/pg_basebackup.sgml: patch does not apply
error: patch failed: src/backend/replication/Makefile:19
error: src/backend/replication/Makefile: patch does not apply
error: patch failed: src/backend/replication/basebackup.c:64
error: src/backend/replication/basebackup.c: patch does not apply
error: patch failed: src/include/replication/basebackup_sink.h:285
error: src/include/replication/basebackup_sink.h: patch does not apply

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

Re: refactoring basebackup.c

From
tushar
Date:
On 12/28/21 1:11 PM, Jeevan Ladhe wrote:
> You need to apply Robert's v10 version patches 0002, 0003 and 0004, 
> before applying the lz4 patch(v8 version).
Thanks, able to apply now.

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Re: refactoring basebackup.c

From
tushar
Date:
On 11/22/21 11:05 PM, Jeevan Ladhe wrote:
> Please find the lz4 compression patch here that basically has:
One small issue: in the "pg_basebackup --help" output, we are not displaying
the lz4 value under the --server-compression option:


[edb@tusharcentos7-v14 bin]$ ./pg_basebackup --help | grep server-compression
       --server-compression=none|gzip|gzip[1-9]

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Re: refactoring basebackup.c

From
tushar
Date:
On 11/22/21 11:05 PM, Jeevan Ladhe wrote:
> Please find the lz4 compression patch here that basically has:
Please refer to this scenario, where --server-compression compresses only
the base backup into lz4 format but not the pg_wal directory:

[edb@centos7tushar bin]$ ./pg_basebackup -Ft --server-compression=lz4 
-Xstream -D foo

[edb@centos7tushar bin]$ ls foo
backup_manifest  base.tar.lz4  pg_wal.tar

The same is valid for gzip as well, if --server-compression is set to gzip:

[edb@centos7tushar bin]$ ./pg_basebackup -Ft --server-compression=gzip4 
-Xstream -D foo1

[edb@centos7tushar bin]$ ls foo1
backup_manifest  base.tar.gz  pg_wal.tar

If this scenario is valid, then both folders should be in lz4 format;
otherwise, we should get an error, something like "not a valid option"?

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Re: refactoring basebackup.c

From:
Robert Haas
Date:
On Mon, Jan 3, 2022 at 12:12 PM tushar <tushar.ahuja@enterprisedb.com> wrote:
> On 11/22/21 11:05 PM, Jeevan Ladhe wrote:
> > Please find the lz4 compression patch here that basically has:
> Please refer to this  scenario , where --server-compression is only
> compressing
> base backup into lz4 format but not pg_wal directory
>
> [edb@centos7tushar bin]$ ./pg_basebackup -Ft --server-compression=lz4
> -Xstream -D foo
>
> [edb@centos7tushar bin]$ ls foo
> backup_manifest  base.tar.lz4  pg_wal.tar
>
> this same is valid for gzip as well if server-compression is set to gzip
>
> edb@centos7tushar bin]$ ./pg_basebackup -Ft --server-compression=gzip4
> -Xstream -D foo1
>
> [edb@centos7tushar bin]$ ls foo1
> backup_manifest  base.tar.gz  pg_wal.tar
>
> if this scenario is valid then both the folders format should be in lz4
> format otherwise we should
> get an error something like - not a valid option ?

Before sending an email like this, it would be a good idea to read the
documentation for the --server-compression option.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From:
tushar
Date:
On 1/4/22 8:07 PM, Robert Haas wrote:
> Before sending an email like this, it would be a good idea to read the
> documentation for the --server-compression option.
Sure, Thanks Robert.

One scenario where I feel the error message is confusing; if this
combination is not supported at all, then the error message needs to be
a little clearer.

If we use -z (or -Z) with -t, we are getting this error:
[edb@centos7tushar bin]$  ./pg_basebackup -t server:/tmp/test0 -Xfetch -z
pg_basebackup: error: only tar mode backups can be compressed
Try "pg_basebackup --help" for more information.

but after removing the -z option, the backup is in tar mode only

edb@centos7tushar bin]$  ./pg_basebackup -t server:/tmp/test0 -Xfetch
[edb@centos7tushar bin]$ ls /tmp/test0
backup_manifest  base.tar

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Re: refactoring basebackup.c

From:
Robert Haas
Date:
On Wed, Jan 5, 2022 at 5:11 AM tushar <tushar.ahuja@enterprisedb.com> wrote:
> One scenario where I feel error message is confusing and if it is not
> supported at all then error message need to be a little bit more clear
>
> if we use -z  (or -Z ) with -t , we are getting this error
> [edb@centos7tushar bin]$  ./pg_basebackup -t server:/tmp/test0 -Xfetch -z
> pg_basebackup: error: only tar mode backups can be compressed
> Try "pg_basebackup --help" for more information.
>
> but after removing -z option  backup is in tar mode only
>
> edb@centos7tushar bin]$  ./pg_basebackup -t server:/tmp/test0 -Xfetch
> [edb@centos7tushar bin]$ ls /tmp/test0
> backup_manifest  base.tar

OK, fair enough, I can adjust the error message for that case.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From:
tushar
Date:


On Tue, Dec 28, 2021 at 1:12 PM Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:
Hi Tushar,

You need to apply Robert's v10 version patches 0002, 0003 and 0004, before applying the lz4 patch(v8 version).
Please let me know if you still face any issues.

Thanks, Jeevan.
I tested the --server-compression option using various other options of pg_basebackup, and also checked that -t/--server-compression from pg_basebackup of v15 will
throw an error if the server version is v14 or below. Things are looking good to me.
Two open issues:
1) The lz4 value is missing for --server-compression in pg_basebackup --help
2) Error messages need improvement when using -t server with -z/-Z
 
regards, 

Re: refactoring basebackup.c

From:
Jeevan Ladhe
Date:
Hi,

Similar to LZ4 server-side compression, I have also tried to add a ZSTD
server-side compression in the attached patch. I have done some initial
testing and things seem to be working.

Example run:
pg_basebackup -t server:/tmp/data_zstd -Xnone --server-compression=zstd

The patch surely needs some grooming, but I am expecting some initial
review, especially in the area where we are trying to close the zstd stream
in bbsink_zstd_end_archive(). We need to tell the zstd library to end the
compression by calling ZSTD_compressStream2() with the
ZSTD_e_end directive. But this also needs an input buffer, which, per
example [1] line 686, I have taken as an empty ZSTD_inBuffer.

Thanks, Tushar for testing the LZ4 patch. I have added the LZ4 option in
the pg_basebackup help now.

Note: Before applying these patches please apply Robert's v10 version
of patches 0002, 0003, and 0004.


Regards,
Jeevan Ladhe

On Wed, Jan 5, 2022 at 10:24 PM tushar <tushar.ahuja@enterprisedb.com> wrote:


On Tue, Dec 28, 2021 at 1:12 PM Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:
Hi Tushar,

You need to apply Robert's v10 version patches 0002, 0003 and 0004, before applying the lz4 patch(v8 version).
Please let me know if you still face any issues.

Thanks, Jeevan.
I tested the --server-compression option using various other options of pg_basebackup, and also checked that -t/--server-compression from pg_basebackup of v15 will
throw an error if the server version is v14 or below. Things are looking good to me.
Two open issues:
1) The lz4 value is missing for --server-compression in pg_basebackup --help
2) Error messages need improvement when using -t server with -z/-Z
 
regards, 
Attachments

Re: refactoring basebackup.c

From:
Robert Haas
Date:
On Tue, Nov 16, 2021 at 4:47 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Here's a new patch set.

And here's another one.

I've committed the first two patches from the previous set, the second
of those just today, and so we're getting down to the meat of the
patch set.

0001 adds "server" and "blackhole" as backup targets. It now has some
tests. This might be more or less ready to ship, unless somebody else
sees a problem, or I find one.

0002 adds server-side gzip compression. This one hasn't got tests yet.
Also, it's going to need some adjustment based on the parallel
discussion on the new options structure.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c

From:
Robert Haas
Date:
On Tue, Jan 18, 2022 at 9:43 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> The patch surely needs some grooming, but I am expecting some initial
> review, especially in the area where we are trying to close the zstd stream
> in bbsink_zstd_end_archive(). We need to tell the zstd library to end the
> compression by calling ZSTD_compressStream2() with the
> ZSTD_e_end directive. But this also needs an input buffer, which, per
> example [1] line 686, I have taken as an empty ZSTD_inBuffer.

As far as I can see, this is correct. I found
https://zstd.docsforge.com/dev/api-documentation/#streaming-compression-howto
which seems to endorse what you've done here.

One (minor) thing that I notice is that, the way you've written the
loop in bbsink_zstd_end_archive(), I think it will typically call
bbsink_archive_contents() twice. It will flush whatever is already
present in the next sink's buffer as a result of the previous calls to
bbsink_zstd_archive_contents(), and then it will call
ZSTD_compressStream2() which will partially refill the buffer you just
emptied, and then there will be nothing left in the internal buffer,
so it will call bbsink_archive_contents() again. But ... the initial
flush may not have been necessary. It could be that there was enough
space already in the output buffer for the ZSTD_compressStream2() call
to succeed without a prior flush. So maybe:

do
{
    yet_to_flush = ZSTD_compressStream2(..., ZSTD_e_end);
    check ZSTD_isError here;
    if (mysink->zstd_outBuf.pos > 0)
        bbsink_archive_contents();
} while (yet_to_flush > 0);

I believe this might be very slightly more efficient.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From:
Dipesh Pandit
Date:
Hi,

I have added support for decompressing a gzip-compressed tar file
at the client. pg_basebackup can enable server-side compression for
plain-format backup with this change.

Added a gzip extractor which decompresses the compressed archive
and forwards it to the next streamer. I have done initial testing and
am working on updating the test coverage.

Note: Before applying the patch, please apply Robert's v11 version
of the patches 0001 and 0002.

Thanks,
Dipesh
Attachments

Re: refactoring basebackup.c

From:
Robert Haas
Date:
On Wed, Jan 19, 2022 at 7:16 AM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
> I have added support for decompressing a gzip-compressed tar file
> at the client. pg_basebackup can enable server-side compression for
> plain-format backup with this change.
>
> Added a gzip extractor which decompresses the compressed archive
> and forwards it to the next streamer. I have done initial testing and
> am working on updating the test coverage.

Cool. It's going to need some documentation changes, too.

I don't like the way you coded this in CreateBackupStreamer(). I would
like the decision about whether to use
bbstreamer_gzip_extractor_new(), and/or throw an error about not being
able to parse an archive, to based on the file type i.e. "did we get a
.tar.gz file?" rather than on whether we asked for server-side
compression. Notice that the existing logic checks whether we actually
got a .tar file from the server rather than assuming that's what must
have happened.

As a matter of style, I don't think it's good for the only thing
inside of an "if" statement to be another "if" statement. The two
could be merged, but we also don't want to have the "if" conditional
be too complex. I am imagining that this should end up saying
something like if (must_parse_archive && !is_tar && !is_tar_gz) {
pg_log_error(...

+     * "windowBits" must be greater than or equal to "windowBits" value
+     * provided to deflateInit2 while compressing.

It would be nice to clarify why we know the value we're using is safe.
Maybe we're using the maximum possible value, in which case you could
just add that to the end of the comment: "...so we use the maximum
possible value for safety."

+    /*
+     * End of the stream, if there is some pending data in output buffers then
+     * we must forward it to next streamer.
+     */
+    if (res == Z_STREAM_END) {
+        bbstreamer_content(mystreamer->base.bbs_next, member, mystreamer->base.bbs_buffer.data,
+                mystreamer->bytes_written, context);
+    }

Uncuddle the brace.

It probably doesn't make much difference, but I would be inclined to
do the final flush in bbstreamer_gzip_extractor_finalize() rather than
here. That way we rely on our own notion of when there's no more input
data rather than zlib's notion. Probably terrible things are going to
happen if those two ideas don't match up .... but there might be some
other compression algorithm that doesn't return a distinguishing code
at end-of-stream. Such an algorithm would have to take care of any
leftover data in the finalize function, so I think we should do that
here too, so the code can be similar in all cases.

Perhaps we should move all the gzip stuff to a new file bbstreamer_gzip.c.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From:
tushar
Date:
On 1/18/22 8:12 PM, Jeevan Ladhe wrote:
> Similar to LZ4 server-side compression, I have also tried to add a ZSTD
> server-side compression in the attached patch.
Thanks Jeevan. While testing, I found one scenario where the server
crashes while performing pg_basebackup
with --server-compression=zstd on a large data set the second time.

Steps to reproduce
--PG sources ( apply v11-0001,v11-0001,v9-0001,v9-0002 , configure 
--with-lz4,--with-zstd, make/install, initdb, start server)
--insert huge data (./pgbench -i -s 2000 postgres)
--restart the server (./pg_ctl -D data restart)
--pg_basebackup ( ./pg_basebackup  -t server:/tmp/yc1 
--server-compression=zstd -R  -Xnone -n -N -l 'ccc' --no-estimate-size -v)
--insert huge data (./pgbench -i -s 1000 postgres)
--restart the server (./pg_ctl -D data restart)
--run pg_basebackup again (./pg_basebackup  -t server:/tmp/yc11 
--server-compression=zstd -v  -Xnone )

[edb@centos7tushar bin]$ ./pg_basebackup  -t server:/tmp/yc11 
--server-compression=zstd -v  -Xnone
pg_basebackup: initiating base backup, waiting for checkpoint to complete
2022-01-19 21:23:26.508 IST [30219] LOG:  checkpoint starting: force wait
2022-01-19 21:23:26.608 IST [30219] LOG:  checkpoint complete: wrote 0 
buffers (0.0%); 0 WAL file(s) added, 1 removed, 0 recycled; write=0.001 
s, sync=0.001 s, total=0.101 s; sync files=0, longest=0.000 s, 
average=0.000 s; distance=16369 kB, estimate=16369 kB
pg_basebackup: checkpoint completed
TRAP: FailedAssertion("len > 0 && len <= sink->bbs_buffer_length", File: 
"../../../src/include/replication/basebackup_sink.h", Line: 208, PID: 30226)
postgres: walsender edb [local] sending backup "pg_basebackup base 
backup"(ExceptionalCondition+0x7a)[0x94ceca]
postgres: walsender edb [local] sending backup "pg_basebackup base 
backup"[0x7b9a08]
postgres: walsender edb [local] sending backup "pg_basebackup base 
backup"[0x7b9be2]
postgres: walsender edb [local] sending backup "pg_basebackup base 
backup"[0x7b5b30]
postgres: walsender edb [local] sending backup "pg_basebackup base 
backup"(SendBaseBackup+0x563)[0x7b7053]
postgres: walsender edb [local] sending backup "pg_basebackup base 
backup"(exec_replication_command+0x961)[0x7c9a41]
postgres: walsender edb [local] sending backup "pg_basebackup base 
backup"(PostgresMain+0x92f)[0x81ca3f]
postgres: walsender edb [local] sending backup "pg_basebackup base 
backup"[0x48e430]
postgres: walsender edb [local] sending backup "pg_basebackup base 
backup"(PostmasterMain+0xfd2)[0x785702]
postgres: walsender edb [local] sending backup "pg_basebackup base 
backup"(main+0x1c6)[0x48fb96]
/lib64/libc.so.6(__libc_start_main+0xf5)[0x7f63642c8555]
postgres: walsender edb [local] sending backup "pg_basebackup base 
backup"[0x48feb5]
pg_basebackup: error: could not read COPY data: server closed the 
connection unexpectedly
     This probably means the server terminated abnormally
     before or while processing the request.
2022-01-19 21:25:34.485 IST [30205] LOG:  server process (PID 30226) was 
terminated by signal 6: Aborted
2022-01-19 21:25:34.485 IST [30205] DETAIL:  Failed process was running: 
BASE_BACKUP ( LABEL 'pg_basebackup base backup', PROGRESS,  MANIFEST 
'yes',  TABLESPACE_MAP,  TARGET 'server', TARGET_DETAIL '/tmp/yc11',  
COMPRESSION 'zstd')
2022-01-19 21:25:34.485 IST [30205] LOG:  terminating any other active 
server processes
[edb@centos7tushar bin]$ 2022-01-19 21:25:34.489 IST [30205] LOG: all 
server processes terminated; reinitializing
2022-01-19 21:25:34.536 IST [30228] LOG:  database system was 
interrupted; last known up at 2022-01-19 21:23:26 IST
2022-01-19 21:25:34.669 IST [30228] LOG:  database system was not 
properly shut down; automatic recovery in progress
2022-01-19 21:25:34.671 IST [30228] LOG:  redo starts at 9/7000028
2022-01-19 21:25:34.671 IST [30228] LOG:  invalid record length at 
9/7000148: wanted 24, got 0
2022-01-19 21:25:34.671 IST [30228] LOG:  redo done at 9/7000110 system 
usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
2022-01-19 21:25:34.673 IST [30229] LOG:  checkpoint starting: 
end-of-recovery immediate wait
2022-01-19 21:25:34.713 IST [30229] LOG:  checkpoint complete: wrote 3 
buffers (0.0%); 0 WAL file(s) added, 0 removed, 0 recycled; write=0.003 
s, sync=0.001 s, total=0.041 s; sync files=2, longest=0.001 s, 
average=0.001 s; distance=0 kB, estimate=0 kB
2022-01-19 21:25:34.718 IST [30205] LOG:  database system is ready to 
accept connections

Observation -

If we change the server-compression method from zstd to lz4, the crash
does NOT happen.

[edb@centos7tushar bin]$ ./pg_basebackup  -t server:/tmp/ycc1 
--server-compression=lz4 -v  -Xnone
pg_basebackup: initiating base backup, waiting for checkpoint to complete
2022-01-19 21:27:51.642 IST [30229] LOG:  checkpoint starting: force wait
2022-01-19 21:27:51.687 IST [30229] LOG:  checkpoint complete: wrote 0 
buffers (0.0%); 0 WAL file(s) added, 1 removed, 0 recycled; write=0.001 
s, sync=0.001 s, total=0.046 s; sync files=0, longest=0.000 s, 
average=0.000 s; distance=16383 kB, estimate=16383 kB
pg_basebackup: checkpoint completed

NOTICE:  WAL archiving is not enabled; you must ensure that all required 
WAL segments are copied through other means to complete the backup
pg_basebackup: base backup completed
[edb@centos7tushar bin]$

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Re: refactoring basebackup.c

From:
Robert Haas
Date:
On Wed, Jan 19, 2022 at 7:16 AM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
> I have done initial testing and
> am working on updating the test coverage.

I spent some time thinking about test coverage for the server-side
backup code today and came up with the attached (v12-0003). It does an
end-to-end test that exercises server-side backup and server-side
compression and then untars the backup and validity-checks it using
pg_verifybackup. In addition to being good test coverage for these
patches, it also plugs a gap in the test coverage of pg_verifybackup,
which currently has no test case that untars a tar-format backup and
then verifies the result. I couldn't figure out a way to do that back
at the time I was working on pg_verifybackup, because I didn't think
we had any existing precedent for using 'tar' from a TAP test. But it
was pointed out to me that we do, so I used that as the model for this
test. It should be easy to generalize this test case to test lz4 and
zstd as well, I think. But I guess we'll still need something
different to test what your patch is doing.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c

From:
Dipesh Pandit
Date:
Hi,

Thanks for the feedback, I have incorporated the suggestions and
updated a new patch v2.

> I spent some time thinking about test coverage for the server-side
> backup code today and came up with the attached (v12-0003). It does an
> end-to-end test that exercises server-side backup and server-side
> compression and then untars the backup and validity-checks it using
> pg_verifybackup. In addition to being good test coverage for these
> patches, it also plugs a gap in the test coverage of pg_verifybackup,
> which currently has no test case that untars a tar-format backup and
> then verifies the result. I couldn't figure out a way to do that back
> at the time I was working on pg_verifybackup, because I didn't think
> we had any existing precedent for using 'tar' from a TAP test. But it
> was pointed out to me that we do, so I used that as the model for this
> test. It should be easy to generalize this test case to test lz4 and
> zstd as well, I think. But I guess we'll still need something
> different to test what your patch is doing.

I tried to add the test coverage for server side gzip compression with
plain format backup using pg_verifybackup. I have modified the test
to use a flag specific to plain format. If this flag is set then it takes a
plain format backup (with server compression enabled) and verifies
this using pg_verifybackup. I have updated (v2-0002) for the test
coverage.

> It's going to need some documentation changes, too.
Yes, I am working on it.

Note: Before applying the patches, please apply Robert's v12 version
of the patches 0001, 0002 and 0003.

Thanks,
Dipesh
Attachments

Re: refactoring basebackup.c

From:
Robert Haas
Date:
On Thu, Jan 20, 2022 at 8:00 AM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
> Thanks for the feedback, I have incorporated the suggestions and
> updated a new patch v2.

Cool. I'll do a detailed review later, but I think this is going in a
good direction.

> I tried to add the test coverage for server side gzip compression with
> plain format backup using pg_verifybackup. I have modified the test
> to use a flag specific to plain format. If this flag is set then it takes a
> plain format backup (with server compression enabled) and verifies
> this using pg_verifybackup. I have updated (v2-0002) for the test
> coverage.

Interesting approach. This unfortunately has the effect of making that
test case file look a bit incoherent -- the comment at the top of the
file isn't really accurate any more, for example, and the plain_format
flag does more than just cause us to use -Fp; it also causes us NOT to
use --target server:X. However, that might be something we can figure
out a way to clean up. Alternatively, we could have a new test case
file that is structured like 002_algorithm.pl but looping over
compression methods rather than checksum algorithms, and testing each
one with --server-compress and -Fp. It might be easier to make that
look nice (but I'm not 100% sure).

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From:
Robert Haas
Date:
On Wed, Jan 19, 2022 at 4:26 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I spent some time thinking about test coverage for the server-side
> backup code today and came up with the attached (v12-0003).

I committed the base backup target patch yesterday, and today I
updated the remaining code in light of Michael Paquier's commit
5c649fe153367cdab278738ee4aebbfd158e0546. Here is the resulting patch.

Michael, I am proposing that we remove this message as part of this commit:

-                pg_log_info("no value specified for compression
level, switching to default");

I think most people won't want to specify a compression level, so
emitting a message when they don't seems too verbose.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c

From:
Robert Haas
Date:
On Thu, Jan 20, 2022 at 11:10 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Thu, Jan 20, 2022 at 8:00 AM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
> > Thanks for the feedback, I have incorporated the suggestions and
> > updated a new patch v2.
>
> Cool. I'll do a detailed review later, but I think this is going in a
> good direction.

Here is a more detailed review.

+    if (inflateInit2(zs, 15 + 16) != Z_OK)
+    {
+        pg_log_error("could not initialize compression library");
+        exit(1);
+
+    }

Extra blank line.

+    /* At present, we only know how to parse tar and gzip archives. */

gzip -> tar.gz. You can gzip something that is not a tar.

+     * Extract the gzip compressed archive using a gzip extractor and then
+     * forward it to next streamer.

This comment is not good. First, we're not necessarily doing it.
Second, it just describes what the code does, not why it does it.
Maybe something like "If the user requested both that the server
compress the backup and also that we extract the backup, we need to
decompress it."

+    if (server_compression != NULL)
+    {
+        if (strcmp(server_compression, "gzip") == 0)
+            server_compression_type = BACKUP_COMPRESSION_GZIP;
+        else if (strlen(server_compression) == 5 &&
+                strncmp(server_compression, "gzip", 4) == 0 &&
+                server_compression[4] >= '1' && server_compression[4] <= '9')
+        {
+            server_compression_type = BACKUP_COMPRESSION_GZIP;
+            server_compression_level = server_compression[4] - '0';
+        }
+    }
+    else
+        server_compression_type = BACKUP_COMPRESSION_NONE;

I think this is not required any more. I think probably some other
things need to be adjusted as well, based on Michael's changes and the
updates in my patch to match.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From:
Dipesh Pandit
Date:
Hi,

> Here is a more detailed review.

Thanks for the feedback, I have incorporated the suggestions
and updated a new version of the patch (v3-0001).

The required documentation changes are also incorporated in
updated patch (v3-0001).

> Interesting approach. This unfortunately has the effect of making that
> test case file look a bit incoherent -- the comment at the top of the
> file isn't really accurate any more, for example, and the plain_format
> flag does more than just cause us to use -Fp; it also causes us NOT to
> use --target server:X. However, that might be something we can figure
> out a way to clean up. Alternatively, we could have a new test case
> file that is structured like 002_algorithm.pl but looping over
> compression methods rather than checksum algorithms, and testing each
> one with --server-compress and -Fp. It might be easier to make that
> look nice (but I'm not 100% sure).

Added a new test case file "009_extract.pl" to test server compressed plain
format backup (v3-0002).

> I committed the base backup target patch yesterday, and today I
> updated the remaining code in light of Michael Paquier's commit
> 5c649fe153367cdab278738ee4aebbfd158e0546. Here is the resulting patch.

The v13 patch does not apply on the latest head; it requires a rebase. I have applied
it on commit dc43fc9b3aa3e0fa9c84faddad6d301813580f88 to validate the gzip
decompression patches.

Thanks,
Dipesh
Attachments

Re: refactoring basebackup.c

From:
Robert Haas
Date:
On Mon, Jan 24, 2022 at 9:30 AM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
> v13 patch does not apply on the latest head, it requires a rebase. I have applied
> it on commit dc43fc9b3aa3e0fa9c84faddad6d301813580f88 to validate gzip
> decompression patches.

It only needed trivial rebasing; I have committed it after doing that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



RE: refactoring basebackup.c

From:
"Shinoda, Noriyoshi (PN Japan FSIP)"
Date:
Hi, 
Thank you for committing a great feature. I have tested the committed features. 
The attached small patch fixes the output of the --help message. In the previous commit, only gzip and none were
output, but in the attached patch, client-gzip and server-gzip are added.
 

Regards,
Noriyoshi Shinoda
-----Original Message-----
From: Robert Haas <robertmhaas@gmail.com> 
Sent: Saturday, January 22, 2022 3:33 AM
To: Dipesh Pandit <dipesh.pandit@gmail.com>; Michael Paquier <michael@paquier.xyz>
Cc: Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>; tushar <tushar.ahuja@enterprisedb.com>; Dmitry Dolgov
<9erthalion6@gmail.com>; Mark Dilger <mark.dilger@enterprisedb.com>; pgsql-hackers@postgresql.org

Subject: Re: refactoring basebackup.c

On Wed, Jan 19, 2022 at 4:26 PM Robert Haas <robertmhaas@gmail.com> wrote:
> I spent some time thinking about test coverage for the server-side 
> backup code today and came up with the attached (v12-0003).

I committed the base backup target patch yesterday, and today I updated the remaining code in light of Michael
Paquier's commit 5c649fe153367cdab278738ee4aebbfd158e0546. Here is the resulting patch.


Michael, I am proposing that we remove this message as part of this commit:

-                pg_log_info("no value specified for compression
level, switching to default");

I think most people won't want to specify a compression level, so emitting a message when they don't seems too
verbose.

--
Robert Haas
EDB: http://www.enterprisedb.com 

Attachments

Re: refactoring basebackup.c

From:
Dagfinn Ilmari Mannsåker
Date:
"Shinoda, Noriyoshi (PN Japan FSIP)" <noriyoshi.shinoda@hpe.com> writes:

> Hi, 
> Thank you for committing a great feature. I have tested the committed features. 
> The attached small patch fixes the output of the --help message. In the
> previous commit, only gzip and none were output, but in the attached
> patch, client-gzip and server-gzip are added.

I think it would be better to write that as `[{client,server}-]gzip`,
especially as we add more compression algorithms, where it would
presumably become `[{client,server}-]METHOD` (assuming all methods are
supported on both the client and server side).

I also noticed that in the docs, the `client` and `server` are marked up
as replaceable parameters, when they are actually literals, plus the
hyphen is misplaced.  The `--checkpoint` option also has the `fast` and
`spread` literals marked up as parameters.

All of these are fixed in the attached patch.

- ilmari

From 8e3d191917984a6d17f2c72212d90c96467463b0 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org>
Date: Tue, 25 Jan 2022 13:04:05 +0000
Subject: [PATCH] pg_basebackup documentation and help fixes

Don't mark up literals as replaceable parameters and indicate alternatives
correctly with {...|...}.
---
 doc/src/sgml/ref/pg_basebackup.sgml   | 6 +++---
 src/bin/pg_basebackup/pg_basebackup.c | 2 +-
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 1d0df346b9..98c89751b3 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -400,7 +400,7 @@
       <term><option>-Z <replaceable class="parameter">level</replaceable></option></term>
       <term><option>-Z <replaceable
class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
       <term><option>--compress=<replaceable class="parameter">level</replaceable></option></term>
-      <term><option>--compress=[[{<replaceable class="parameter">client|server</replaceable>-}]<replaceable
class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
+      <term><option>--compress=[[{client|server}-]<replaceable
class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
       <listitem>
        <para>
         Requests compression of the backup. If <literal>client</literal> or
@@ -441,8 +441,8 @@
 
     <variablelist>
      <varlistentry>
-      <term><option>-c <replaceable class="parameter">fast|spread</replaceable></option></term>
-      <term><option>--checkpoint=<replaceable class="parameter">fast|spread</replaceable></option></term>
+      <term><option>-c {fast|spread}</option></term>
+      <term><option>--checkpoint={fast|spread}</option></term>
       <listitem>
        <para>
         Sets checkpoint mode to fast (immediate) or spread (the default)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 72c27c78d0..46f6f53e9b 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -391,7 +391,7 @@ usage(void)
     printf(_("  -X, --wal-method=none|fetch|stream\n"
              "                         include required WAL files with specified method\n"));
     printf(_("  -z, --gzip             compress tar output\n"));
-    printf(_("  -Z, --compress={gzip,none}[:LEVEL] or [LEVEL]\n"
+    printf(_("  -Z, --compress={[{client,server}-]gzip,none}[:LEVEL] or [LEVEL]\n"
              "                         compress tar output with given compression method or level\n"));
     printf(_("\nGeneral options:\n"));
     printf(_("  -c, --checkpoint=fast|spread\n"
-- 
2.30.2


Re: refactoring basebackup.c

From:
Dagfinn Ilmari Mannsåker
Date:
Dagfinn Ilmari Mannsåker <ilmari@ilmari.org> writes:

> "Shinoda, Noriyoshi (PN Japan FSIP)" <noriyoshi.shinoda@hpe.com> writes:
>
>> Hi, 
>> Thank you for committing a great feature. I have tested the committed features. 
>> The attached small patch fixes the output of the --help message. In the
>> previous commit, only gzip and none were output, but in the attached
>> patch, client-gzip and server-gzip are added.
>
> I think it would be better to write that as `[{client,server}-]gzip`,
> especially as we add more compression algorithms, where it would
> presumably become `[{client,server}-]METHOD` (assuming all methods are
> supported on both the client and server side).
>
> I also noticed that in the docs, the `client` and `server` are marked up
> as replaceable parameters, when they are actually literals, plus the
> hyphen is misplaced.  The `--checkpoint` option also has the `fast` and
> `spread` literals marked up as parameters.
>
> All of these are fixed in the attached patch.

I just noticed there was a superfluous [ in the SGML documentation, and
that the short form was missing the [{client|server}-] part.  Updated
patch attached.

- ilmari

From 2164f1a9fc97a5f88f57c7cc9cdafa67398dcc0e Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Dagfinn=20Ilmari=20Manns=C3=A5ker?= <ilmari@ilmari.org>
Date: Tue, 25 Jan 2022 13:04:05 +0000
Subject: [PATCH v2] pg_basebackup documentation and help fixes

Don't mark up literals as replaceable parameters and indicate alternatives
correctly with {...|...}, and add missing [{client,server}-] to the
-Z form.
---
 doc/src/sgml/ref/pg_basebackup.sgml   | 8 ++++----
 src/bin/pg_basebackup/pg_basebackup.c | 2 +-
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/doc/src/sgml/ref/pg_basebackup.sgml b/doc/src/sgml/ref/pg_basebackup.sgml
index 1d0df346b9..a5e03d2c66 100644
--- a/doc/src/sgml/ref/pg_basebackup.sgml
+++ b/doc/src/sgml/ref/pg_basebackup.sgml
@@ -398,9 +398,9 @@
 
      <varlistentry>
       <term><option>-Z <replaceable class="parameter">level</replaceable></option></term>
-      <term><option>-Z <replaceable
class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
+      <term><option>-Z [{client|server}-]<replaceable
class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
       <term><option>--compress=<replaceable class="parameter">level</replaceable></option></term>
-      <term><option>--compress=[[{<replaceable class="parameter">client|server</replaceable>-}]<replaceable
class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
+      <term><option>--compress=[{client|server}-]<replaceable
class="parameter">method</replaceable></option>[:<replaceable>level</replaceable>]</term>
       <listitem>
        <para>
         Requests compression of the backup. If <literal>client</literal> or
@@ -441,8 +441,8 @@
 
     <variablelist>
      <varlistentry>
-      <term><option>-c <replaceable class="parameter">fast|spread</replaceable></option></term>
-      <term><option>--checkpoint=<replaceable class="parameter">fast|spread</replaceable></option></term>
+      <term><option>-c {fast|spread}</option></term>
+      <term><option>--checkpoint={fast|spread}</option></term>
       <listitem>
        <para>
         Sets checkpoint mode to fast (immediate) or spread (the default)
diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 72c27c78d0..46f6f53e9b 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -391,7 +391,7 @@ usage(void)
     printf(_("  -X, --wal-method=none|fetch|stream\n"
              "                         include required WAL files with specified method\n"));
     printf(_("  -z, --gzip             compress tar output\n"));
-    printf(_("  -Z, --compress={gzip,none}[:LEVEL] or [LEVEL]\n"
+    printf(_("  -Z, --compress={[{client,server}-]gzip,none}[:LEVEL] or [LEVEL]\n"
              "                         compress tar output with given compression method or level\n"));
     printf(_("\nGeneral options:\n"));
     printf(_("  -c, --checkpoint=fast|spread\n"
-- 
2.30.2


Re: refactoring basebackup.c

From
tushar
Date:
On 1/22/22 12:03 AM, Robert Haas wrote:
> I committed the base backup target patch yesterday, and today I
> updated the remaining code in light of Michael Paquier's commit
> 5c649fe153367cdab278738ee4aebbfd158e0546. Here is the resulting patch.
Thanks, Robert. I tested against the latest PG HEAD and found a few issues:

A) Getting a syntax error if -z is used along with -t

[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/data902 -z -Xfetch
pg_basebackup: error: could not initiate base backup: ERROR:  syntax error

OR

[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/t2 
--compress=server-gzip:9 -Xfetch -v -z
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: error: could not initiate base backup: ERROR:  syntax error

B) No information about "client-gzip" or "server-gzip" is shown under the
"--compress" option in ./pg_basebackup --help.

C) The -R option is silently ignored

[edb@centos7tushar bin]$  ./pg_basebackup  -Z 4  -v  -t server:/tmp/pp 
-Xfetch -R
pg_basebackup: initiating base backup, waiting for checkpoint to complete
pg_basebackup: checkpoint completed
pg_basebackup: write-ahead log start point: 0/30000028 on timeline 1
pg_basebackup: write-ahead log end point: 0/30000100
pg_basebackup: base backup completed
[edb@centos7tushar bin]$

Go to the /tmp/pp folder and extract it - there is no "standby.signal" file,
and if we start a cluster against this data directory, it will not be in
standby mode.

If this is not supported, then I think we should throw an error.

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Re: refactoring basebackup.c

From
Robert Haas
Date:
On Tue, Jan 25, 2022 at 8:42 AM Dagfinn Ilmari Mannsåker
<ilmari@ilmari.org> wrote:
> I just noticed there was a superfluous [ in the SGML documentation, and
> that the short form was missing the [{client|server}-] part.  Updated
> patch attached.

Committed, thanks.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Michael Paquier
Date:
On Tue, Jan 25, 2022 at 03:54:52AM +0000, Shinoda, Noriyoshi (PN Japan FSIP) wrote:
> Michael, I am proposing that we remove this message as part of
> this commit:
>
> -                pg_log_info("no value specified for compression
> level, switching to default");
>
> I think most people won't want to specify a compression level, so
> emitting a message when they don't seems too verbose.

(Just noticed this message as I am not in CC.)
Removing this message is fine by me, thanks!
--
Michael

Attachments

Re: refactoring basebackup.c

From
Michael Paquier
Date:
On Tue, Jan 25, 2022 at 09:52:12PM +0530, tushar wrote:
> C) The -R option is silently ignored
>
> Go to the /tmp/pp folder and extract it - there is no "standby.signal" file
> and if we start a cluster against this data directory, it will not be in
> standby mode.

Yeah, I don't think it's good to silently ignore the option, and we
should not generate the file on the server-side.  Rather than erroring
in this case, you'd better add the file to the existing compressed
file of the base data folder on the client-side.

This makes me wonder whether we should begin tracking any open items
for v15..  We don't want to lose track of any issue with features
committed already in the tree.
--
Michael

Attachments

Re: refactoring basebackup.c

From
Robert Haas
Date:
On Tue, Jan 25, 2022 at 8:23 PM Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Jan 25, 2022 at 03:54:52AM +0000, Shinoda, Noriyoshi (PN Japan FSIP) wrote:
> > Michael, I am proposing that we remove this message as part of
> > this commit:
> >
> > -                pg_log_info("no value specified for compression
> > level, switching to default");
> >
> > I think most people won't want to specify a compression level, so
> > emitting a message when they don't seems too verbose.
>
> (Just noticed this message as I am not in CC.)
> Removing this message is fine by me, thanks!

Oh, I thought I'd CC'd you. I know I meant to do so.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Robert Haas
Date:
On Tue, Jan 25, 2022 at 11:22 AM tushar <tushar.ahuja@enterprisedb.com> wrote:
> A)Getting syntax error if -z is used along with -t
>
> [edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/data902 -z -Xfetch
> pg_basebackup: error: could not initiate base backup: ERROR:  syntax error

Oops. The attached patch should fix this.

> B)No information of "client-gzip" or "server-gzip" added under
> "--compress" option/method of ./pg_basebackup --help.

Already fixed by e1f860f13459e186479319aa9f65ef184277805f.

> C) The -R option is silently ignored

The attached patch should fix this, too.

Thanks for finding these issues.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c

From
Dipesh Pandit
Date:
Hi,

> It only needed trivial rebasing; I have committed it after doing that.

I have updated the patches to support server compression (gzip) for
plain format backup. Please find attached v4 patches.

Thanks,
Dipesh
Attachments

Re: refactoring basebackup.c

From
tushar
Date:
On 1/27/22 2:15 AM, Robert Haas wrote:
> The attached patch should fix this, too.
Thanks, the issues seem to be fixed now.

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Re: refactoring basebackup.c

From
Robert Haas
Date:
On Thu, Jan 27, 2022 at 7:15 AM tushar <tushar.ahuja@enterprisedb.com> wrote:
> On 1/27/22 2:15 AM, Robert Haas wrote:
> > The attached patch should fix this, too.
> Thanks, the issues seem to be fixed now.

Cool. I committed that patch.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
tushar
Date:
On 1/27/22 10:17 PM, Robert Haas wrote:
> Cool. I committed that patch.
Thanks. Please refer to this scenario, where the level is set to 0 for
server-gzip but the archives are still compressed:

[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/11 --gzip 
--compress=0 -Xnone
NOTICE:  all required WAL segments have been archived
[edb@centos7tushar bin]$ ls /tmp/11
16384.tar  backup_manifest  base.tar


[edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/10 --gzip 
--compress=server-gzip:0 -Xnone
NOTICE:  all required WAL segments have been archived
[edb@centos7tushar bin]$ ls /tmp/10
16384.tar.gz  backup_manifest  base.tar.gz

0 means no compression, so the archives should not be compressed if we
specify server-gzip:0; shouldn't both of the above scenarios match?

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Re: refactoring basebackup.c

From
Robert Haas
Date:
On Thu, Jan 27, 2022 at 12:08 PM tushar <tushar.ahuja@enterprisedb.com> wrote:
> On 1/27/22 10:17 PM, Robert Haas wrote:
> > Cool. I committed that patch.
> Thanks. Please refer to this scenario, where the level is set to 0 for
> server-gzip but the archives are still compressed:
>
> [edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/11 --gzip
> --compress=0 -Xnone
> NOTICE:  all required WAL segments have been archived
> [edb@centos7tushar bin]$ ls /tmp/11
> 16384.tar  backup_manifest  base.tar
>
>
> [edb@centos7tushar bin]$ ./pg_basebackup -t server:/tmp/10 --gzip
> --compress=server-gzip:0 -Xnone
> NOTICE:  all required WAL segments have been archived
> [edb@centos7tushar bin]$ ls /tmp/10
> 16384.tar.gz  backup_manifest  base.tar.gz
>
> 0 means no compression, so the archives should not be compressed if we
> specify server-gzip:0; shouldn't both of the above scenarios match?

Well what's weird here is that you are using both --gzip and also
--compress. Those both control the same behavior, so it's a surprising
idea to specify both. But I guess if someone does, we should make the
second one fully override the first one. Here's a patch to try to do
that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c

From
Robert Haas
Date:
On Thu, Jan 27, 2022 at 2:37 AM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
> I have updated the patches to support server compression (gzip) for
> plain format backup. Please find attached v4 patches.

I made a pass over these patches today and made a bunch of minor
corrections. New version attached. The two biggest things I changed
are (1) s/gzip_extractor/gzip_compressor/, because I feel like you
extract an archive like a tarfile, but that is not what is happening
here, this is not an archive and (2) I took a few bits out of the
test case that didn't seem to be necessary. There wasn't any reason
that I could see why testing for PG_VERSION needed to be skipped when
the compression method is 'none', so my first thought was to just take
out the 'if' statement around that, but then after more thought that
test and the one for pg_verifybackup are certainly going to fail if
those files are not present, so why have an extra test? It might make
sense if we were only conditionally able to run pg_verifybackup and
wanted to have some test coverage even when we can't, but that's not
the case here, so I see no point.

I studied this a bit to see whether I needed to make any adjustments
along the lines of 4f0bcc735038e96404fae59aa16ef9beaf6bb0aa in order
for this to work on msys. I think I don't, because 002_algorithm.pl
and 003_corruption.pl both pass $backup_path, not $real_backup_path,
to command_ok -- and I think something inside there does the
translation, which is weird, but we might as well be consistent.
008_untar.pl and 4f0bcc735038e96404fae59aa16ef9beaf6bb0aa needed to do
something different because --target server:X confused the msys magic,
but I think that shouldn't be an issue for this patch. However, I
might be wrong.

Barring objections or problems, I plan to commit this version
tomorrow. I'd do it today, but I have plans for tonight that are
incompatible with discovering that the build farm hates this ....

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c

From
Dipesh Pandit
Date:
Hi,

> I made a pass over these patches today and made a bunch of minor
> corrections. New version attached. The two biggest things I changed
> are (1) s/gzip_extractor/gzip_compressor/, because I feel like you
> extract an archive like a tarfile, but that is not what is happening
> here, this is not an archive and (2) I took a few bits out of the
> test case that didn't seem to be necessary. There wasn't any reason
> that I could see why testing for PG_VERSION needed to be skipped when
> the compression method is 'none', so my first thought was to just take
> out the 'if' statement around that, but then after more thought that
> test and the one for pg_verifybackup are certainly going to fail if
> those files are not present, so why have an extra test? It might make
> sense if we were only conditionally able to run pg_verifybackup and
> wanted to have some test coverage even when we can't, but that's not
> the case here, so I see no point.

Thanks. This makes sense.

+#ifdef HAVE_LIBZ
+   /*
+    * If the user has requested a server compressed archive along with archive
+    * extraction at client then we need to decompress it.
+    */
+   if (format == 'p' && compressmethod == COMPRESSION_GZIP &&
+           compressloc == COMPRESS_LOCATION_SERVER)
+       streamer = bbstreamer_gzip_decompressor_new(streamer);
+#endif

I think it is not required to have the HAVE_LIBZ check in pg_basebackup.c
while creating a new gzip writer/decompressor. This check is already in
place in bbstreamer_gzip_writer_new() and bbstreamer_gzip_decompressor_new(),
and those throw an error in case the build does not have the required
library support. I have removed this check from pg_basebackup.c and updated
a delta patch. The patch can be applied on top of the v5 patch.

Thanks,
Dipesh
Attachments

Re: refactoring basebackup.c

From
tushar
Date:
On 1/27/22 11:12 PM, Robert Haas wrote:
Well what's weird here is that you are using both --gzip and also
--compress. Those both control the same behavior, so it's a surprising
idea to specify both. But I guess if someone does, we should make the
second one fully override the first one. Here's a patch to try to do
that.
Right, the current behavior was:

[edb@centos7tushar bin]$ ./pg_basebackup  -t server:/tmp/y101 --gzip -Z none  -Xnone
pg_basebackup: error: cannot use compression level with method none
Try "pg_basebackup --help" for more information.

and even this did not match PG v14 behavior, e.g.
./pg_basebackup -Ft -z -Z none -D /tmp/test1  (works in PG v14 but throws the above error on PG HEAD)

so somewhere we were breaking backward compatibility.

Now with your patch this seems to work fine:

[edb@centos7tushar bin]$ ./pg_basebackup  -t server:/tmp/y101 --gzip -Z none  -Xnone
NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup
[edb@centos7tushar bin]$ ls /tmp/y101
backup_manifest  base.tar

OR

[edb@centos7tushar bin]$  ./pg_basebackup  -t server:/tmp/y0p -Z none  -Xfetch -z
[edb@centos7tushar bin]$ ls /tmp/y0p
backup_manifest  base.tar.gz

But what about server-gzip:0? Should it still produce compressed archives?

[edb@centos7tushar bin]$  ./pg_basebackup  -t server:/tmp/1 --compress=server-gzip:0  -Xfetch
[edb@centos7tushar bin]$ ls /tmp/1
backup_manifest  base.tar.gz

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company 

Re: refactoring basebackup.c

From
Robert Haas
Date:
On Fri, Jan 28, 2022 at 3:54 AM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
> Thanks. This makes sense.
>
> +#ifdef HAVE_LIBZ
> +   /*
> +    * If the user has requested a server compressed archive along with archive
> +    * extraction at client then we need to decompress it.
> +    */
> +   if (format == 'p' && compressmethod == COMPRESSION_GZIP &&
> +           compressloc == COMPRESS_LOCATION_SERVER)
> +       streamer = bbstreamer_gzip_decompressor_new(streamer);
> +#endif
>
> I think it is not required to have HAVE_LIBZ check in pg_basebackup.c
> while creating a new gzip writer/decompressor. This check is already
> in place in bbstreamer_gzip_writer_new() and bbstreamer_gzip_decompressor_new()
> and it throws an error in case the build does not have required library
> support. I have removed this check from pg_basebackup.c and updated
> a delta patch. The patch can be applied on v5 patch.

Right, makes sense. Committed with that change, plus I realized the
skip count in the test case file was wrong after the changes I made
yesterday, so I fixed that as well.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Jeevan Ladhe
Date:
Hi Robert,

I have attached the latest rebased version of the LZ4 server-side compression
patch on the recent commits. This patch also introduces the compression level
and adds a tap test.

Also, while adding the lz4 case in the pg_verifybackup/t/008_untar.pl, I found
an unused variable {have_zlib}. I have attached a cleanup patch for that as well.

Please review and let me know your thoughts.

Regards,
Jeevan Ladhe

Attachments

Re: refactoring basebackup.c

From
Robert Haas
Date:
On Fri, Jan 28, 2022 at 12:48 PM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> I have attached the latest rebased version of the LZ4 server-side compression
> patch on the recent commits. This patch also introduces the compression level
> and adds a tap test.

In view of this morning's commit of
d45099425eb19e420433c9d81d354fe585f4dbd6 I think the threshold for
committing this patch has gone up. We need to make it support
decompression with LZ4 on the client side, as we now have for gzip.

Other comments:

- Even if we were going to support LZ4 only on the server side, surely
it's not right to refuse --compress lz4 and --compress client-lz4 at
the parsing stage. I don't even think the message you added to main()
is reachable.

- In the new test case you set decompress_flags but according to the
documentation I have here, -m is for multiple files (and so should not
be needed here) and -d is for decompression (which is what we want
here). So I'm confused why this is like this.

Other than that this seems like it's in pretty good shape.

> Also, while adding the lz4 case in the pg_verifybackup/t/008_untar.pl, I found
> an unused variable {have_zlib}. I have attached a cleanup patch for that as well.

This part seems clearly correct, so I have committed it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Jeevan Ladhe
Date:


On Sat, Jan 29, 2022 at 1:20 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Fri, Jan 28, 2022 at 12:48 PM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> I have attached the latest rebased version of the LZ4 server-side compression
> patch on the recent commits. This patch also introduces the compression level
> and adds a tap test.

In view of this morning's commit of
d45099425eb19e420433c9d81d354fe585f4dbd6 I think the threshold for
committing this patch has gone up. We need to make it support
decompression with LZ4 on the client side, as we now have for gzip.

Fair enough. Makes sense.
 
- In the new test case you set decompress_flags but according to the
documentation I have here, -m is for multiple files (and so should not
be needed here) and -d is for decompression (which is what we want
here). So I'm confused why this is like this.


'-d' is the default when the file has a .lz4 extension, which is true in our
case, hence I eliminated that. As for introducing '-m': without any option,
or even with an explicit '-d' option, the lz4 command was weirdly writing the
decompressed tar to the console. That's when I saw these two lines in my lz4
man page and tried adding the '-m' option, and it worked:

" It is considered bad practice to rely on implicit output in scripts.
 because the script´s environment may change. Always use explicit
 output in scripts. -c ensures that output will be stdout. Conversely,
 providing a destination name, or using -m ensures that the output will
 be either the specified name, or filename.lz4 respectively."

and

"Similarly, lz4 -m -d can decompress multiple *.lz4 files."
 
This part seems clearly correct, so I have committed it.

Thanks for pushing it.

Regards,
Jeevan Ladhe

Re: refactoring basebackup.c

From
Jeevan Ladhe
Date:
Hi Robert,

I had an offline discussion with Dipesh, and he will be working on the
lz4 client side decompression part.

Please find the attached patch with the following changes:

- Even if we were going to support LZ4 only on the server side, surely
it's not right to refuse --compress lz4 and --compress client-lz4 at
the parsing stage. I don't even think the message you added to main()
is reachable.

I think you are right, I have removed the message and again introduced
the Assert() back.

- In the new test case you set decompress_flags but according to the
documentation I have here, -m is for multiple files (and so should not
be needed here) and -d is for decompression (which is what we want
here). So I'm confused why this is like this.

As explained earlier in the tap test the 'lz4 -d base.tar.lz4' command was
throwing the decompression to stdout. Now, I have removed the '-m',
added '-d' for decompression, and also added the target file explicitly in
the command.

Regards,
Jeevan Ladhe
Attachments

Re: refactoring basebackup.c

From
Robert Haas
Date:
On Mon, Jan 31, 2022 at 6:11 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> I had an offline discussion with Dipesh, and he will be working on the
> lz4 client side decompression part.

OK. I guess we should also be thinking about client-side LZ4
compression. It's probably best to focus on that before worrying about
ZSTD, even though ZSTD would be really cool to have.

>> - In the new test case you set decompress_flags but according to the
>> documentation I have here, -m is for multiple files (and so should not
>> be needed here) and -d is for decompression (which is what we want
>> here). So I'm confused why this is like this.
>
> As explained earlier in the tap test the 'lz4 -d base.tar.lz4' command was
> throwing the decompression to stdout. Now, I have removed the '-m',
> added '-d' for decompression, and also added the target file explicitly in
> the command.

I don't see the behavior you describe here. For me:

[rhaas ~]$ lz4 q.lz4
Decoding file q
q.lz4                : decoded 3785 bytes
[rhaas ~]$ rm q
[rhaas ~]$ lz4 -m q.lz4
[rhaas ~]$ ls q
q
[rhaas ~]$ rm q
[rhaas ~]$ lz4 -d q.lz4
Decoding file q
q.lz4                : decoded 3785 bytes
[rhaas ~]$ rm q
[rhaas ~]$ lz4 -d -m q.lz4
[rhaas ~]$ ls q
q

In other words, on my system, the file gets decompressed with or
without -d, and with or without -m. The only difference I see is that
using -m makes it happen silently, without printing anything on the
terminal. Anyway, I wasn't saying that using -m was necessarily wrong,
just that I didn't understand why you had it like that. Now that I'm
more informed, I recommend that we use -d -m, the former to be
explicit about wanting to decompress and the latter because it either
makes it less noisy (on my system) or makes it work at all (on yours).
It's surprising that the command behavior would be different like that
on different systems, but it is what it is. I think any set of flags
we put here is better than adding more logic in Perl, as it keeps
things simpler.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Jeevan Ladhe
Date:
 
> I think you are right, I have removed the message and again introduced
> the Assert() back.

In my previous version of the patch this was a problem: there should not
be an assert, as the code is still reachable whether it is server-lz4 or
client-lz4. I removed the assert and added a level range check there,
similar to gzip.

Regards,
Jeevan Ladhe

Attachments

Re: refactoring basebackup.c

From
Robert Haas
Date:
On Tue, Jan 18, 2022 at 1:55 PM Robert Haas <robertmhaas@gmail.com> wrote:
> 0001 adds "server" and "blackhole" as backup targets. It now has some
> tests. This might be more or less ready to ship, unless somebody else
> sees a problem, or I find one.

I played around with this a bit and it seems quite easy to extend this
further. So please find attached a couple more patches to generalize
this mechanism.

0001 adds an extensibility framework for backup targets. The idea is
that an extension loaded via shared_preload_libraries can call
BaseBackupAddTarget() to define a new base backup target, which the
user can then access via pg_basebackup --target TARGET_NAME, or if
they want to pass a detail string, pg_basebackup --target
TARGET_NAME:DETAIL. There might be slightly better ways of hooking
this into the system. I'm not unhappy with this approach, but there
might be a better idea out there.

0002 adds an example contrib module called basebackup_to_shell. The
system administrator can set basebackup_to_shell.command='SOMETHING'.
A backup directed to the 'shell' target will cause the server to
execute the configured command once per generated archive, and once
for the backup_manifest, if any. When executing the command, %f gets
replaced with the archive filename (e.g. base.tar) and %d gets
replaced with the detail. The actual contents of the file are passed
to the command's standard input, and it can then do whatever it likes
with that data. Clearly, this is not state of the art; for instance,
if what you really want is to upload the backup files someplace via
HTTP, using this to run 'curl' is probably not so good of an idea as
using an extension module that links with libcurl. That would likely
lead to better error checking, better performance, nicer
configuration, and just generally fewer things that can go wrong. On
the other hand, writing an integration in C is kind of tricky, and
this thing is quite easy to use -- and it does work.

There are a couple of things to be concerned about with 0002 from a
security perspective. First, in a backend environment, we have a
function to spawn a subprocess via popen(), namely OpenPipeStream(),
but there is no function to spawn a subprocess with execve() and end
up with a socket connected to its standard input. And that means that
whatever command the administrator configures is being interpreted by
the shell, which is a potential problem given that we're interpolating
the target detail string supplied by the user, who must have at least
replication privileges but need not be the superuser. I chose to
handle this by allowing the target detail to contain only alphanumeric
characters. Refinement is likely possible, but whether the effort is
worthwhile seems questionable. Second, what if the superuser wants to
allow the use of this module to only some of the users who have
replication privileges? That seems a bit unlikely but it's possible,
so I added a GUC basebackup_to_shell.required_role. If set, the
functionality is only usable by members of the named role. If unset,
anyone with replication privilege can use it. I guess someone could
criticize this as defaulting to the least secure setting, but
considering that you have to have replication privileges to use this
at all, I don't find that argument much to get excited about.

I have to say that I'm incredibly happy with how easy these patches
were to write. I think this is going to make adding new base backup
targets as accessible as we can realistically hope to make it. There
is some boilerplate code, as an examination of the patches will
reveal, but it's not a lot, and at least IMHO it's pretty
straightforward. Granted, coding up a new base backup target is
something only experienced C hackers are likely to do, but the fact
that I was able to throw this together so quickly suggests to me that
I've got the design basically right, and that anyone who does want to
plug into the new mechanism shouldn't have too much trouble doing so.

Thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c

From
Jeevan Ladhe
Date:
Thanks for the patch, Dipesh.
With a quick look at the patch, I have the following observations:

----------------------------------------------------------
In bbstreamer_lz4_compressor_new(), I think this alignment is not needed
on the client side:

    /* Align the output buffer length. */
    compressed_bound += compressed_bound + BLCKSZ - (compressed_bound %
BLCKSZ);
----------------------------------------------------------

In bbstreamer_lz4_compressor_content(), the avail_in and len variables are
both unchanged. I think we can simply rename len to avail_in in the
argument list.
----------------------------------------------------------

Comment:
+        * Update the offset and capacity of output buffer based on based on number
+        * of bytes written to output buffer.

I think it is a thinko:

+        * Update the offset and capacity of output buffer based on number of
+        * bytes written to output buffer.
----------------------------------------------------------

Indentation:

+       if ((mystreamer->base.bbs_buffer.maxlen - mystreamer->bytes_written) <=
+                       footer_bound)

----------------------------------------------------------
Similarly to bbstreamer_lz4_compressor_content(), I think we can change
len to avail_in in bbstreamer_lz4_decompressor_content().


Regards,
Jeevan Ladhe

On Thu, 10 Feb 2022 at 18:11, Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
Hi,

> On Mon, Jan 31, 2022 at 4:41 PM Jeevan Ladhe <jeevan.ladhe@enterprisedb.com> wrote:
Hi Robert,

I had an offline discussion with Dipesh, and he will be working on the
lz4 client side decompression part.

Please find the attached patch to support client side compression
and decompression using lz4.

Added a new lz4 bbstreamer to compress the archive chunks at the
client if the user has specified the --compress=client-lz4:[LEVEL]
option in pg_basebackup. The new streamer accepts archive chunks,
compresses them, and forwards them to the plain writer.

Similarly, if a user has specified a server-compressed lz4 archive
with a plain-format (-F p) backup, then the compressed archive chunks
must be decompressed before they are forwarded to the tar extractor.
Added a new bbstreamer to decompress the compressed archive and
forward it to the tar extractor.

Note: This patch can be applied on Jeevan Ladhe's v12 patch
for lz4 compression.

Thanks,
Dipesh

Re: refactoring basebackup.c

From
Dipesh Pandit
Date:
Hi,

Thanks for the feedback, I have incorporated the suggestions
and updated a new patch. PFA v2 patch.

> I think similar to bbstreamer_lz4_compressor_content() in
> bbstreamer_lz4_decompressor_content() we can change len to avail_in.

In bbstreamer_lz4_decompressor_content(), we are modifying avail_in
based on the number of bytes decompressed in each iteration. I think
we cannot replace it with "len" here.

Jeevan, your v12 patch does not apply on HEAD; it requires a
rebase. I have applied it on commit 400fc6b6487ddf16aa82c9d76e5cfbe64d94f660
to validate my v2 patch.

Thanks,
Dipesh
Attachments

Re: refactoring basebackup.c

From
Jeevan Ladhe
Date:
>Jeevan, Your v12 patch does not apply on HEAD, it requires a
rebase.

Sure, please find the rebased patch attached.

Regards,
Jeevan

On Fri, 11 Feb 2022 at 14:13, Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
Hi,

Thanks for the feedback, I have incorporated the suggestions
and updated a new patch. PFA v2 patch.

> I think similar to bbstreamer_lz4_compressor_content() in
> bbstreamer_lz4_decompressor_content() we can change len to avail_in.

In bbstreamer_lz4_decompressor_content(), we are modifying avail_in
based on the number of bytes decompressed in each iteration. I think
we cannot replace it with "len" here.

Jeevan, Your v12 patch does not apply on HEAD, it requires a
rebase. I have applied it on commit 400fc6b6487ddf16aa82c9d76e5cfbe64d94f660
to validate my v2 patch.

Thanks,
Dipesh
Attachments

Re: refactoring basebackup.c

From
Dipesh Pandit
Date:
> Sure, please find the rebased patch attached.

Thanks, I have validated v2 patch on top of rebased patch.

Thanks,
Dipesh

Re: refactoring basebackup.c

From
Robert Haas
Date:
On Fri, Feb 11, 2022 at 5:58 AM Jeevan Ladhe <jeevanladhe.os@gmail.com> wrote:
> >Jeevan, Your v12 patch does not apply on HEAD, it requires a
> rebase.
>
> Sure, please find the rebased patch attached.

It's Friday today, but I'm feeling brave, and it's still morning here,
so ... committed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Robert Haas
Date:
On Fri, Feb 11, 2022 at 7:20 AM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
> > Sure, please find the rebased patch attached.
>
> Thanks, I have validated v2 patch on top of rebased patch.

I'm still feeling brave, so I committed this too after fixing a few
things. In the process I noticed that we don't have support for LZ4
compression of streamed WAL (cf. CreateWalTarMethod). It would be good
to fix that. I'm not quite sure whether

http://postgr.es/m/pm1bMV6zZh9_4tUgCjSVMLxDX4cnBqCDGTmdGlvBLHPNyXbN18x_k00eyjkCCJGEajWgya2tQLUDpvb2iIwlD22IcUIrIt9WnMtssNh-F9k=@pm.me
is basically what we need or whether something else is required.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Jeevan Ladhe
Date:
Thanks Robert for the bravery :-)

Regards,
Jeevan Ladhe


On Fri, 11 Feb 2022, 20:31 Robert Haas, <robertmhaas@gmail.com> wrote:
On Fri, Feb 11, 2022 at 7:20 AM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
> > Sure, please find the rebased patch attached.
>
> Thanks, I have validated v2 patch on top of rebased patch.

I'm still feeling brave, so I committed this too after fixing a few
things. In the process I noticed that we don't have support for LZ4
compression of streamed WAL (cf. CreateWalTarMethod). It would be good
to fix that. I'm not quite sure whether
http://postgr.es/m/pm1bMV6zZh9_4tUgCjSVMLxDX4cnBqCDGTmdGlvBLHPNyXbN18x_k00eyjkCCJGEajWgya2tQLUDpvb2iIwlD22IcUIrIt9WnMtssNh-F9k=@pm.me
is basically what we need or whether something else is required.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: refactoring basebackup.c

From
Robert Haas
Date:
On Fri, Feb 11, 2022 at 10:29 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> FYI: there's a couple typos in the last 2 patches.

Hmm. OK. But I don't consider "can be optionally specified" incorrect
or worse than "can optionally be specified".

I do agree that spelling words correctly is a good idea.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



RE: refactoring basebackup.c

From
"Shinoda, Noriyoshi (PN Japan FSIP)"
Date:
Hi, Hackers.
Thank you for developing a great feature.
The current help message shown below does not seem to allow specifying 'client-' or 'server-' for lz4
compression.
 --compress = {[{client, server}-]gzip, lz4, none}[:LEVEL]

The attached small patch fixes the help message as follows:
 --compress = {[{client, server}-]{gzip, lz4}, none}[:LEVEL]

Regards,
Noriyoshi Shinoda
-----Original Message-----
From: Robert Haas <robertmhaas@gmail.com> 
Sent: Saturday, February 12, 2022 12:50 AM
To: Justin Pryzby <pryzby@telsasoft.com>
Cc: Jeevan Ladhe <jeevanladhe.os@gmail.com>; Dipesh Pandit <dipesh.pandit@gmail.com>; Abhijit Menon-Sen
<ams@toroid.org>;Dmitry Dolgov <9erthalion6@gmail.com>; Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>; Mark Dilger
<mark.dilger@enterprisedb.com>;pgsql-hackers@postgresql.org; tushar <tushar.ahuja@enterprisedb.com>
 
Subject: Re: refactoring basebackup.c

On Fri, Feb 11, 2022 at 10:29 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> FYI: there's a couple typos in the last 2 patches.

Hmm. OK. But I don't consider "can be optionally specified" incorrect or worse than "can optionally be specified".

I do agree that spelling words correctly is a good idea.

--
Robert Haas
EDB: http://www.enterprisedb.com 



Attachments

Re: refactoring basebackup.c

From
Robert Haas
Date:
On Sat, Feb 12, 2022 at 1:01 AM Shinoda, Noriyoshi (PN Japan FSIP)
<noriyoshi.shinoda@hpe.com> wrote:
> Thank you for developing a great feature.
> The current help message shown below does not seem to be able to specify the 'client-' or 'server-' for lz4
compression.
>  --compress = {[{client, server}-]gzip, lz4, none}[:LEVEL]
>
> The attached small patch fixes the help message as follows:
>  --compress = {[{client, server}-]{gzip, lz4}, none}[:LEVEL]

Hmm. After studying this a bit more closely, I think this might
actually need a bit more revision than what you propose here. In most
places, we use vertical bars to separate alternatives:

  -X, --wal-method=none|fetch|stream

But here, we're using commas in some places and the word "or" in one
case as well:

  -Z, --compress={[{client,server}-]gzip,lz4,none}[:LEVEL] or [LEVEL]

We're also not consistently using braces for grouping, which makes the
order of operations a bit unclear, and it makes no sense to put
brackets around LEVEL when it's the only thing that's part of that
alternative.

A more consistent way of writing the supported syntax would be like this:

  -Z, --compress={[{client|server}-]{gzip|lz4}}[:LEVEL]|LEVEL|none}

I would be somewhat inclined to leave the level-only variant
undocumented and instead write it like this:

  -Z, --compress={[{client|server}-]{gzip|lz4}}[:LEVEL]|none}

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Jeevan Ladhe
Date:
Hi,

Please find the attached updated version of patch for ZSTD server side
compression.

This patch has following changes:

- Fixes the issue Tushar reported[1].
- Adds a tap test.
- Makes document changes related to zstd.
- Updates pg_basebackup help for pg_basebackup. Here I have chosen the
suggestion by Robert upthread (as given below):

>> I would be somewhat inclined to leave the level-only variant
>> undocumented and instead write it like this:
>>  -Z, --compress={[{client|server}-]{gzip|lz4}}[:LEVEL]|none}

- pg_indent on basebackup_zstd.c.

Thanks Tushar, for offline help for testing the patch.

[1] https://www.postgresql.org/message-id/6c3f1558-1e56-9946-78a2-c59340da1dbf%40enterprisedb.com

Regards,
Jeevan Ladhe

On Mon, 14 Feb 2022 at 21:30, Robert Haas <robertmhaas@gmail.com> wrote:
On Sat, Feb 12, 2022 at 1:01 AM Shinoda, Noriyoshi (PN Japan FSIP)
<noriyoshi.shinoda@hpe.com> wrote:
> Thank you for developing a great feature.
> The current help message shown below does not seem to be able to specify the 'client-' or 'server-' for lz4 compression.
>  --compress = {[{client, server}-]gzip, lz4, none}[:LEVEL]
>
> The attached small patch fixes the help message as follows:
>  --compress = {[{client, server}-]{gzip, lz4}, none}[:LEVEL]

Hmm. After studying this a bit more closely, I think this might
actually need a bit more revision than what you propose here. In most
places, we use vertical bars to separate alternatives:

  -X, --wal-method=none|fetch|stream

But here, we're using commas in some places and the word "or" in one
case as well:

  -Z, --compress={[{client,server}-]gzip,lz4,none}[:LEVEL] or [LEVEL]

We're also not consistently using braces for grouping, which makes the
order of operations a bit unclear, and it makes no sense to put
brackets around LEVEL when it's the only thing that's part of that
alternative.

A more consistent way of writing the supported syntax would be like this:

  -Z, --compress={[{client|server}-]{gzip|lz4}}[:LEVEL]|LEVEL|none}

I would be somewhat inclined to leave the level-only variant
undocumented and instead write it like this:

  -Z, --compress={[{client|server}-]{gzip|lz4}}[:LEVEL]|none}

--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments

Re: refactoring basebackup.c

From
Robert Haas
Date:
On Wed, Feb 9, 2022 at 8:41 AM Abhijit Menon-Sen <ams@toroid.org> wrote:
> It took me a while to assimilate these patches, including the backup
> targets one, which I hadn't looked at before. Now that I've wrapped my
> head around how to put the pieces together, I really like the idea. As
> you say, writing non-trivial integrations in C will take some effort,
> but it seems worthwhile. It's also nice that one can continue to use
> pg_basebackup to trigger the backups and see progress information.

Cool. Thanks for having a look.

> Yes, it looks simple to follow the example set by basebackup_to_shell to
> write a custom target. The complexity will be in whatever we need to do
> to store/forward the backup data, rather than in obtaining the data in
> the first place, which is exactly as it should be.

Yeah, that's what made me really happy with how this came out.

Here's v2, rebased and with documentation added.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c

From
tushar
Date:
On 2/15/22 6:48 PM, Jeevan Ladhe wrote:
> Please find the attached updated version of patch for ZSTD server side
Thanks, Jeevan, I again tested with the attached patch, and as mentioned 
the crash is fixed now.

Also, I tested different compression levels with gzip vs. zstd against a
data directory of size 29GB and found these results:

====
./pg_basebackup -t server:/tmp/<directory> --compress=server-zstd:<level> -Xnone -n -N --no-estimate-size -v

--compress=server-zstd:1  = compressed directory size is 1.3GB
--compress=server-zstd:4  = compressed directory size is 1.3GB
--compress=server-zstd:7  = compressed directory size is 1.2GB
--compress=server-zstd:12 = compressed directory size is 1.2GB
====

===
./pg_basebackup -t server:/tmp/<directory> --compress=server-gzip:<level> -Xnone -n -N --no-estimate-size -v

--compress=server-gzip:1 = compressed directory size is 1.8GB
--compress=server-gzip:4 = compressed directory size is 1.6GB
--compress=server-gzip:9 = compressed directory size is 1.6GB
===

-- 
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company




Re: refactoring basebackup.c

From
Jeevan Ladhe
Date:
Thanks Tushar for the testing.

I further worked on ZSTD and now have implemented client side
compression as well. Attached are the patches for both server-side and
client-side compression.

The patch 0001 is a server-side patch, and has not changed since the
last patch version - v10, but, just bumping the version number.

Patch 0002 is the client-side compression patch.

Regards,
Jeevan Ladhe

On Tue, 15 Feb 2022 at 22:24, tushar <tushar.ahuja@enterprisedb.com> wrote:
On 2/15/22 6:48 PM, Jeevan Ladhe wrote:
> Please find the attached updated version of patch for ZSTD server side
Thanks, Jeevan, I again tested with the attached patch, and as mentioned
the crash is fixed now.

Also, I tested different compression levels with gzip vs. zstd against a
data directory of size 29GB and found these results:

====
./pg_basebackup -t server:/tmp/<directory> --compress=server-zstd:<level> -Xnone -n -N --no-estimate-size -v

--compress=server-zstd:1  = compressed directory size is 1.3GB
--compress=server-zstd:4  = compressed directory size is 1.3GB
--compress=server-zstd:7  = compressed directory size is 1.2GB
--compress=server-zstd:12 = compressed directory size is 1.2GB
====

===
./pg_basebackup -t server:/tmp/<directory> --compress=server-gzip:<level> -Xnone -n -N --no-estimate-size -v

--compress=server-gzip:1 = compressed directory size is 1.8GB
--compress=server-gzip:4 = compressed directory size is 1.6GB
--compress=server-gzip:9 = compressed directory size is 1.6GB
===

--
regards,tushar
EnterpriseDB  https://www.enterprisedb.com/
The Enterprise PostgreSQL Company

Attachments

Re: refactoring basebackup.c (zstd)

From
Robert Haas
Date:
On Tue, Feb 15, 2022 at 12:59 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> There's superfluous changes to ./configure unrelated to the changes in
> configure.ac.  Probably because you're using a different version of autotools,
> or a vendor's patched copy.  You can remove the changes with git checkout -p or
> similar.

I noticed this already and fixed it in the version of the patch I
posted on the other thread.

> +++ b/src/backend/replication/basebackup_zstd.c
> +bbsink *
> +bbsink_zstd_new(bbsink *next, int compresslevel)
> +{
> +#ifndef HAVE_LIBZSTD
> +       ereport(ERROR,
> +                       (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
> +                        errmsg("zstd compression is not supported by this build")));
> +#else
>
> This should have an return; like what's added by 71cbbbbe8 and 302612a6c.
> Also, the parens() around errcode aren't needed since last year.

The parens are still acceptable style, though. The return I guess is needed.

> +       bbsink_zstd *sink;
> +
> +       Assert(next != NULL);
> +       Assert(compresslevel >= 0 && compresslevel <= 22);
> +
> +       if (compresslevel < 0 || compresslevel > 22)
> +               ereport(ERROR,
>
> This looks like dead code in assert builds.
> If it's unreachable, it can be elog().

Actually, the right thing to do here is remove the assert, I think. I
don't believe that the code is unreachable. If I'm wrong and it is
unreachable then the test-and-ereport should be removed.

> + * Compress the input data to the output buffer until we run out of input
> + * data. Each time the output buffer falls below the compression bound for
> + * the input buffer, invoke the archive_contents() method for then next sink.
>
> *the next sink ?

Yeah.

> Does anyone plan to include this for pg15 ?  If so, I think at least the WAL
> compression should have support added too.  I'd plan to rebase Michael's patch.
> https://www.postgresql.org/message-id/YNqWd2GSMrnqWIfx@paquier.xyz

Yes, I'd like to get this into PG15. It's very similar to the LZ4
compression support which was already committed, so it feels like
finishing it up and including it in the release makes a lot of sense.
I'm not against the idea of using ZSTD in other places where it makes
sense as well, but I think that's a separate issue from this patch. As
far as I'm concerned, either basebackup compression with ZSTD or WAL
compression with ZSTD could be committed even if the other is not, and
I plan to spend my time on this project, not that project. However, if
you're saying you want to work on the WAL compression stuff, I've got
no objection to that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Alvaro Herrera
Date:
On 2022-Feb-14, Robert Haas wrote:

> A more consistent way of writing the supported syntax would be like this:
> 
>   -Z, --compress={[{client|server}-]{gzip|lz4}}[:LEVEL]|LEVEL|none}
> 
> I would be somewhat inclined to leave the level-only variant
> undocumented and instead write it like this:
> 
>   -Z, --compress={[{client|server}-]{gzip|lz4}}[:LEVEL]|none}

This is hard to interpret for humans though because of the nested
brackets and braces.  It gets considerably easier if you split it in
separate variants:

   -Z, --compress=[{client|server}-]{gzip|lz4}[:LEVEL]
   -Z, --compress=LEVEL
   -Z, --compress=none
                         compress tar output with given compression method or level


or, if you choose to leave the level-only variant undocumented, then

   -Z, --compress=[{client|server}-]{gzip|lz4}[:LEVEL]
   -Z, --compress=none
                         compress tar output with given compression method or level

There still are some nested brackets and braces, but the scope is
reduced enough that interpreting seems quite a bit simpler.

-- 
Álvaro Herrera           39°49'30"S 73°17'W  —  https://www.EnterpriseDB.com/



Re: refactoring basebackup.c

From
Robert Haas
Date:
On Wed, Feb 16, 2022 at 11:11 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> This is hard to interpret for humans though because of the nested
> brackets and braces.  It gets considerably easier if you split it in
> separate variants:
>
>    -Z, --compress=[{client|server}-]{gzip|lz4}[:LEVEL]
>    -Z, --compress=LEVEL
>    -Z, --compress=none
>                          compress tar output with given compression method or level
>
>
> or, if you choose to leave the level-only variant undocumented, then
>
>    -Z, --compress=[{client|server}-]{gzip|lz4}[:LEVEL]
>    -Z, --compress=none
>                          compress tar output with given compression method or level
>
> There still are some nested brackets and braces, but the scope is
> reduced enough that interpreting seems quite a bit simpler.

I could go for that. I'm also just noticing that "none" is not really
a compression method or level, and the statement that it can only
compress "tar" output is no longer correct, because server-side
compression can be used together with -Fp. So maybe we should change
the sentence afterward to something a bit more generic, like "specify
whether and how to compress the backup".

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Jeevan Ladhe
Date:
Hi Everyone,

So, I went ahead and have now also implemented client side decompression
for zstd.

Robert separated[1] the ZSTD configure switch from my original patch
of server side compression and also added documentation related to
the switch. I have included that patch here in the patch series for
simplicity.

The server side compression patch
0002-ZSTD-add-server-side-compression-support.patch has also taken care
of Justin Pryzby's comments[2]. Also, made changes to pg_basebackup help
as suggested by Álvaro Herrera.

[1] https://www.postgresql.org/message-id/CA%2BTgmobRisF-9ocqYDcMng6iSijGj1EZX99PgXA%3D3VVbWuahog%40mail.gmail.com
[2] https://www.postgresql.org/message-id/20220215175944.GY31460%40telsasoft.com

Regards,
Jeevan Ladhe

On Wed, 16 Feb 2022 at 21:46, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Feb 16, 2022 at 11:11 AM Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
> This is hard to interpret for humans though because of the nested
> brackets and braces.  It gets considerably easier if you split it in
> separate variants:
>
>    -Z, --compress=[{client|server}-]{gzip|lz4}[:LEVEL]
>    -Z, --compress=LEVEL
>    -Z, --compress=none
>                          compress tar output with given compression method or level
>
>
> or, if you choose to leave the level-only variant undocumented, then
>
>    -Z, --compress=[{client|server}-]{gzip|lz4}[:LEVEL]
>    -Z, --compress=none
>                          compress tar output with given compression method or level
>
> There still are some nested brackets and braces, but the scope is
> reduced enough that interpreting seems quite a bit simpler.

I could go for that. I'm also just noticing that "none" is not really
a compression method or level, and the statement that it can only
compress "tar" output is no longer correct, because server-side
compression can be used together with -Fp. So maybe we should change
the sentence afterward to something a bit more generic, like "specify
whether and how to compress the backup".

--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments

Re: refactoring basebackup.c

From
Robert Haas
Date:
On Wed, Feb 16, 2022 at 12:46 PM Jeevan Ladhe <jeevanladhe.os@gmail.com> wrote:
> So, I went ahead and have now also implemented client side decompression
> for zstd.
>
> Robert separated[1] the ZSTD configure switch from my original patch
> of server side compression and also added documentation related to
> the switch. I have included that patch here in the patch series for
> simplicity.
>
> The server side compression patch
> 0002-ZSTD-add-server-side-compression-support.patch has also taken care
> of Justin Pryzby's comments[2]. Also, made changes to pg_basebackup help
> as suggested by Álvaro Herrera.

The first hunk of the documentation changes is missing a comma between
gzip and lz4.

+     * At the start of each archive we reset the state to start a new
+     * compression operation. The parameters are sticky and they would stick
+     * around as we are resetting with option ZSTD_reset_session_only.

I don't think "would" is what you mean here. If you say something
would stick around, that means it could be that way it isn't. ("I
would go to the store and buy some apples, but I know they don't have
any so there's no point.") I think you mean "will".

-    printf(_("  -Z,
--compress={[{client,server}-]gzip,lz4,none}[:LEVEL] or [LEVEL]\n"
-             "                         compress tar output with given
compression method or level\n"));
+    printf(_("  -Z, --compress=[{client|server}-]{gzip|lz4|zstd}[:LEVEL]\n"));
+    printf(_("  -Z, --compress=none\n"));

You deleted a line that you should have preserved here.

Overall there doesn't seem to be much to complain about here on a
first read-through. It will be good if we can also fix
CreateWalTarMethod to support LZ4 and ZSTD.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Jeevan Ladhe
Date:
Thanks for the comments Robert. I have addressed your comments in the
attached patch v13-0002-ZSTD-add-server-side-compression-support.patch.
Rest of the patches are similar to v12, but just bumped the version number.

> It will be good if we can also fix
> CreateWalTarMethod to support LZ4 and ZSTD.
Ok we will see, either Dipesh or I will take care of it.

Regards,
Jeevan Ladhe


On Thu, 17 Feb 2022 at 02:37, Robert Haas <robertmhaas@gmail.com> wrote:
On Wed, Feb 16, 2022 at 12:46 PM Jeevan Ladhe <jeevanladhe.os@gmail.com> wrote:
> So, I went ahead and have now also implemented client side decompression
> for zstd.
>
> Robert separated[1] the ZSTD configure switch from my original patch
> of server side compression and also added documentation related to
> the switch. I have included that patch here in the patch series for
> simplicity.
>
> The server side compression patch
> 0002-ZSTD-add-server-side-compression-support.patch has also taken care
> of Justin Pryzby's comments[2]. Also, made changes to pg_basebackup help
> as suggested by Álvaro Herrera.

The first hunk of the documentation changes is missing a comma between
gzip and lz4.

+     * At the start of each archive we reset the state to start a new
+     * compression operation. The parameters are sticky and they would stick
+     * around as we are resetting with option ZSTD_reset_session_only.

I don't think "would" is what you mean here. If you say something
would stick around, that means it could be that way it isn't. ("I
would go to the store and buy some apples, but I know they don't have
any so there's no point.") I think you mean "will".

-    printf(_("  -Z,
--compress={[{client,server}-]gzip,lz4,none}[:LEVEL] or [LEVEL]\n"
-             "                         compress tar output with given
compression method or level\n"));
+    printf(_("  -Z, --compress=[{client|server}-]{gzip|lz4|zstd}[:LEVEL]\n"));
+    printf(_("  -Z, --compress=none\n"));

You deleted a line that you should have preserved here.

Overall there doesn't seem to be much to complain about here on a
first read-through. It will be good if we can also fix
CreateWalTarMethod to support LZ4 and ZSTD.

--
Robert Haas
EDB: http://www.enterprisedb.com
Attachments

Re: refactoring basebackup.c

From
Dipesh Pandit
Date:

Hi,

> > It will be good if we can also fix
> > CreateWalTarMethod to support LZ4 and ZSTD.
> Ok we will see, either Dipesh or I will take care of it.

I took a look at the CreateWalTarMethod to support LZ4 compression
for WAL files. The current implementation involves 3 steps to back up
a WAL file to a tar archive. For each file:
  1. It first writes the header in the function tar_open_for_write, flushes the contents of tar to disk and stores the header offset.
  2.  Next, the contents of WAL are written to the tar archive.
  3. In the end, it recalculates the checksum in function tar_close() and overwrites the header at an offset stored in step #1.
The need for overwriting the header in CreateWalTarMethod is mainly related to
partial WAL files, where the size of the WAL file < WalSegSize. The file is
padded and the checksum is recalculated after adding the pad bytes.

If we go ahead and implement LZ4 support for CreateWalTarMethod then
we have a problem here at step #3. In order to achieve better compression
ratio, compressed LZ4 blocks are linked to each other and these blocks
are decoded sequentially. If we overwrite the header as part of step #3 then
it corrupts the link between compressed LZ4 blocks. Although LZ4 provides
an option to write compressed blocks independently (using the blockMode
option set to LZ4F_blockIndependent), it is still a problem because we cannot
guarantee that overwriting the header after recalculating the checksum will not
overlap the boundary of the next block.

GZIP manages to overcome this problem as it provides an option to turn on/off
compression on the fly while writing a compressed archive with the help of zlib
library function deflateParams(). The current gzip implementation for
CreateWalTarMethod uses this library function to turn off compression just before
step #1 and it writes the uncompressed header of size equal to TAR_BLOCK_SIZE.
It uses the same library function to turn on the compression for writing the contents
of the WAL file as part of step #2. It again turns off the compression just before step
#3 to overwrite the header. The header is overwritten at the same offset with size
equal to TAR_BLOCK_SIZE.

Since GZIP provides this option to enable/disable compression, it is possible to
control the size of data we are writing to a compressed archive. Even if we overwrite
an already written block in a compressed archive there is no risk of it overlapping
with the boundary of the next block. This mechanism is not available in LZ4 and ZSTD.

In order to support LZ4 and ZSTD compression for CreateWalTarMethod we may
need to refactor this code unless I am missing something. We need to somehow
add the padding bytes for a partial WAL file before we send it to the compressed
archive. This will make sure that every file being compressed has size equal to
WalSegSize and requires no later padding. There is then no need to recalculate
the checksum, and we can avoid overwriting the header as part of step #3.

Thoughts?

Thanks,
Dipesh

walmethods.c is kind of a mess (was Re: refactoring basebackup.c)

From
Robert Haas
Date:
On Fri, Mar 4, 2022 at 3:32 AM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
> GZIP manages to overcome this problem as it provides an option to turn on/off
> compression on the fly while writing a compressed archive with the help of zlib
> library function deflateParams(). The current gzip implementation for
> CreateWalTarMethod uses this library function to turn off compression just before
> step #1 and it writes the uncompressed header of size equal to TAR_BLOCK_SIZE.
> It uses the same library function to turn on the compression for writing the contents
> of the WAL file as part of step #2. It again turns off the compression just before step
> #3 to overwrite the header. The header is overwritten at the same offset with size
> equal to TAR_BLOCK_SIZE.

This is a real mess. To me, it seems like a pretty big hack to use
deflateParams() to shut off compression in the middle of the
compressed data stream so that we can go back and overwrite that part
of the data later. It appears that the only reason we need that hack
is because we don't know the file size starting out. Except we kind of
do know the size, because pad_to_size specifies a minimum size for the
file. It's true that the maximum file size is unbounded, but I'm not
sure why that's important. I wonder if anyone else has an idea why we
didn't just set the file size to pad_to_size exactly when we write the
tar header the first time, instead of this IMHO kind of nutty approach
where we back up. I'd try to figure it out from the comments, but
there basically aren't any. I also had a look at the relevant commit
messages and didn't see anything relevant there either. If I'm missing
something, please point it out.

While I'm complaining, I noticed while looking at this code that it is
documented that "The caller must ensure that only one method is
instantiated in any given program, and that it's only instantiated
once!" As far as I can see, this is because somebody thought about
putting all of the relevant data into a struct and then decided on an
alternative strategy of storing some of it there, and the rest in a
global variable. I can't quite imagine why anyone would think that was
a good idea. There may be some reason that I can't see right now, but
here again there appear to be no relevant code comments.

I'm somewhat inclined to wonder whether we could just get rid of
walmethods.c entirely and use the new bbstreamer stuff instead. That
code also knows how to write plain files into a directory, and write
tar archives, and compress stuff, but in my totally biased opinion as
the author of most of that code, it's better code. It has no
restriction on using at most one method per program, or of
instantiating that method only once, and it already has LZ4 support,
and there's a pending patch for ZSTD support that I intend to get
committed soon as well. It also has, and I know I might be beating a
dead horse here, comments. Now, admittedly, it does need to know the
size of each archive member up front in order to work, so if we can't
solve the problem then we can't go this route. But if we can't solve
that problem, then we also can't add LZ4 and ZSTD support to
walmethods.c, because random access to compressed data is not really a
thing, even if we hacked it to work for gzip.

Thoughts?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Robert Haas
Date:
On Wed, Feb 16, 2022 at 8:46 PM Jeevan Ladhe <jeevanladhe.os@gmail.com> wrote:
> Thanks for the comments Robert. I have addressed your comments in the
> attached patch v13-0002-ZSTD-add-server-side-compression-support.patch.
> Rest of the patches are similar to v12, but just bumped the version number.

OK, here's a consolidated patch with all your changes from 0002-0004
as 0001 plus a few proposed edits of my own in 0002. By and large I
think this is fine.

My proposed changes are largely cosmetic, but one thing that isn't is
revising the size - pos <= bound tests to instead check size - pos <
bound. My reasoning for that change is: if the number of bytes
remaining in the buffer is exactly equal to the maximum number we can
write, we don't need to flush it yet. If that sounds correct, we
should fix the LZ4 code the same way.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c

From
Jeevan Ladhe
Date:
Hi Robert,

> My proposed changes are largely cosmetic, but one thing that isn't is
> revising the size - pos <= bound tests to instead check size - pos <
> bound. My reasoning for that change is: if the number of bytes
> remaining in the buffer is exactly equal to the maximum number we can
> write, we don't need to flush it yet. If that sounds correct, we
> should fix the LZ4 code the same way.

I agree with your patch. The patch looks good to me.
Yes, the LZ4 flush check should also be fixed. Please find the attached
patch to fix the LZ4 code.

Regards,
Jeevan Ladhe
Attachments

Re: refactoring basebackup.c

From:
Robert Haas
Date:
On Tue, Mar 8, 2022 at 4:49 AM Jeevan Ladhe <jeevanladhe.os@gmail.com> wrote:
> I agree with your patch. The patch looks good to me.
> Yes, the LZ4 flush check should also be fixed. Please find the attached
> patch to fix the LZ4 code.

OK, committed all that stuff.

I think we also need to fix one other thing. Right now, for LZ4
support we test HAVE_LIBLZ4, but TOAST and XLOG compression are
testing USE_LZ4, so I think we should be doing the same here. And
similarly I think we should be testing USE_ZSTD not HAVE_LIBZSTD.

Patch for that attached.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c

From:
Jeevan Ladhe
Date:
> OK, committed all that stuff.

Thanks for the commit, Robert.
> I think we also need to fix one other thing. Right now, for LZ4
> support we test HAVE_LIBLZ4, but TOAST and XLOG compression are
> testing USE_LZ4, so I think we should be doing the same here. And
> similarly I think we should be testing USE_ZSTD not HAVE_LIBZSTD.

I reviewed the patch, and it seems to be capturing and replacing all the
places of HAVE_LIB* with USE_* correctly.
Just curious, apart from consistency, do you see other problems as well
when testing one vs the other?

Regards,
Jeevan Ladhe

Re: refactoring basebackup.c

From:
Robert Haas
Date:
On Tue, Mar 8, 2022 at 11:32 AM Jeevan Ladhe <jeevanladhe.os@gmail.com> wrote:
> I reviewed the patch, and it seems to be capturing and replacing all the
> places of HAVE_LIB* with USE_* correctly.
> Just curious, apart from consistency, do you see other problems as well
> when testing one vs the other?

So, the kind of problem you would worry about in a case like this is:
suppose that configure detects LIBLZ4, but the user specifies
--without-lz4. Then maybe there is some way for HAVE_LIBLZ4 to be
true, while USE_LZ4 is false, and therefore we should not be
compiling code that uses LZ4 but do anyway. As configure.ac is
currently coded, I think that's impossible, because we only search for
liblz4 if the user says --with-lz4, and if they do that, then USE_LZ4
will be set. Therefore, I don't think there is a live problem here,
just an inconsistency.

Probably still best to clean it up before an angry Andres chases me
down, since I know he's working on the build system...

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From:
Jeevan Ladhe
Date:
OK, got it. Thanks for your insights.

Regards,
Jeevan Ladhe

On Tue, 8 Mar 2022 at 22:23, Robert Haas <robertmhaas@gmail.com> wrote:
On Tue, Mar 8, 2022 at 11:32 AM Jeevan Ladhe <jeevanladhe.os@gmail.com> wrote:
> I reviewed the patch, and it seems to be capturing and replacing all the
> places of HAVE_LIB* with USE_* correctly.
> Just curious, apart from consistency, do you see other problems as well
> when testing one vs the other?

So, the kind of problem you would worry about in a case like this is:
suppose that configure detects LIBLZ4, but the user specifies
--without-lz4. Then maybe there is some way for HAVE_LIBLZ4 to be
true, while USE_LIBLZ4 is false, and therefore we should not be
compiling code that uses LZ4 but do anyway. As configure.ac is
currently coded, I think that's impossible, because we only search for
liblz4 if the user says --with-lz4, and if they do that, then USE_LZ4
will be set. Therefore, I don't think there is a live problem here,
just an inconsistency.

Probably still best to clean it up before an angry Andres chases me
down, since I know he's working on the build system...

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: refactoring basebackup.c

From:
Justin Pryzby
Date:
I'm getting errors from pg_basebackup when using both -D- and --compress=server-*
The issue seems to go away if I use --no-manifest.

$ ./src/bin/pg_basebackup/pg_basebackup -h /tmp -Ft -D- --wal-method none --compress=server-gzip >/dev/null ; echo $?
pg_basebackup: error: tar member has empty name
1

$ ./src/bin/pg_basebackup/pg_basebackup -h /tmp -Ft -D- --wal-method none --compress=server-gzip >/dev/null ; echo $?
NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup
pg_basebackup: error: COPY stream ended before last file was finished
1



Re: refactoring basebackup.c

From:
Robert Haas
Date:
On Thu, Mar 10, 2022 at 8:02 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> I'm getting errors from pg_basebackup when using both -D- and --compress=server-*
> The issue seems to go away if I use --no-manifest.
>
> $ ./src/bin/pg_basebackup/pg_basebackup -h /tmp -Ft -D- --wal-method none --compress=server-gzip >/dev/null ; echo $?
> pg_basebackup: error: tar member has empty name
> 1
>
> $ ./src/bin/pg_basebackup/pg_basebackup -h /tmp -Ft -D- --wal-method none --compress=server-gzip >/dev/null ; echo $?
> NOTICE:  WAL archiving is not enabled; you must ensure that all required WAL segments are copied through other means to complete the backup
> pg_basebackup: error: COPY stream ended before last file was finished
> 1

Thanks for the report. The problem here is that, when the output is
standard output (-D -), pg_basebackup can only produce a single output
file, so the manifest gets injected into the tar file on the client
side rather than being written separately as we do in normal cases.
However, that only works if we're receiving a tar file that we can
parse from the server, and here the server is sending a compressed
tarfile. The current code mistakenly attempts to parse the compressed
tarfile as if it were an uncompressed tarfile, which causes the error
messages that you are seeing (and which I can also reproduce here). We
actually have enough infrastructure available in pg_basebackup now
that we could do the "right thing" in this case: decompress the data
received from the server, parse the resulting tar file, inject the
backup manifest, construct a new tar file, and recompress. However, I
think that's probably not a good idea, because it's unlikely that the
user will understand that the data is being compressed on the server,
then decompressed, and then recompressed again, and the performance of
the resulting pipeline will probably not be very good. So I think we
should just refuse this command. Patch for that attached.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c

From:
Justin Pryzby
Date:
On Fri, Mar 11, 2022 at 10:19:29AM -0500, Robert Haas wrote:
> So I think we should just refuse this command. Patch for that attached.

Sounds right.

Also, I think the magic 8 for .gz should actually be a 7.

I'm not sure why it tests for ".gz" but not ".tar.gz", which would help to make
them all less magic.

commit 1fb1e21ba7a500bb2b85ec3e65f59130fcdb4a7e
Author: Justin Pryzby <pryzbyj@telsasoft.com>
Date:   Thu Mar 10 21:22:16 2022 -0600

    pg_basebackup: make magic numbers less magic
    
    The magic 8 for .gz should actually be a 7.
    
    .tar.gz
    1234567
    
    .tar.lz4
    .tar.zst
    12345678
    
    See d45099425, 751b8d23b, 7cf085f07.

diff --git a/src/bin/pg_basebackup/pg_basebackup.c b/src/bin/pg_basebackup/pg_basebackup.c
index 9f3ecc60fbe..8dd9721323d 100644
--- a/src/bin/pg_basebackup/pg_basebackup.c
+++ b/src/bin/pg_basebackup/pg_basebackup.c
@@ -1223,17 +1223,17 @@ CreateBackupStreamer(char *archive_name, char *spclocation,
     is_tar = (archive_name_len > 4 &&
               strcmp(archive_name + archive_name_len - 4, ".tar") == 0);
 
-    /* Is this a gzip archive? */
-    is_tar_gz = (archive_name_len > 8 &&
-                 strcmp(archive_name + archive_name_len - 3, ".gz") == 0);
+    /* Is this a .tar.gz archive? */
+    is_tar_gz = (archive_name_len > 7 &&
+                 strcmp(archive_name + archive_name_len - 7, ".tar.gz") == 0);
 
-    /* Is this a LZ4 archive? */
+    /* Is this a .tar.lz4 archive? */
     is_tar_lz4 = (archive_name_len > 8 &&
-                  strcmp(archive_name + archive_name_len - 4, ".lz4") == 0);
+                  strcmp(archive_name + archive_name_len - 8, ".tar.lz4") == 0);
 
-    /* Is this a ZSTD archive? */
+    /* Is this a .tar.zst archive? */
     is_tar_zstd = (archive_name_len > 8 &&
-                   strcmp(archive_name + archive_name_len - 4, ".zst") == 0);
+                   strcmp(archive_name + archive_name_len - 8, ".tar.zst") == 0);
 
     /*
      * We have to parse the archive if (1) we're suppose to extract it, or if



Re: refactoring basebackup.c

From:
Robert Haas
Date:
On Fri, Mar 11, 2022 at 11:29 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> Sounds right.

OK, committed.

> Also, I think the magic 8 for .gz should actually be a 7.
>
> I'm not sure why it tests for ".gz" but not ".tar.gz", which would help to make
> them all less magic.
>
> commit 1fb1e21ba7a500bb2b85ec3e65f59130fcdb4a7e
> Author: Justin Pryzby <pryzbyj@telsasoft.com>
> Date:   Thu Mar 10 21:22:16 2022 -0600

Yeah, your patch looks right. Committed that, too.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From:
Robert Haas
Date:
On Tue, Feb 15, 2022 at 11:26 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Feb 9, 2022 at 8:41 AM Abhijit Menon-Sen <ams@toroid.org> wrote:
> > It took me a while to assimilate these patches, including the backup
> > targets one, which I hadn't looked at before. Now that I've wrapped my
> > head around how to put the pieces together, I really like the idea. As
> > you say, writing non-trivial integrations in C will take some effort,
> > but it seems worthwhile. It's also nice that one can continue to use
> > pg_basebackup to trigger the backups and see progress information.
>
> Cool. Thanks for having a look.
>
> > Yes, it looks simple to follow the example set by basebackup_to_shell to
> > write a custom target. The complexity will be in whatever we need to do
> > to store/forward the backup data, rather than in obtaining the data in
> > the first place, which is exactly as it should be.
>
> Yeah, that's what made me really happy with how this came out.
>
> Here's v2, rebased and with documentation added.

I don't hear many comments on this, but I'm pretty sure that it's a
good idea, and there haven't been many objections to this patch series
as a whole, so I'd like to proceed with it. If nobody objects
vigorously, I'll commit this next week.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From:
Andres Freund
Date:
Hi,

On 2022-03-11 10:19:29 -0500, Robert Haas wrote:
> Thanks for the report. The problem here is that, when the output is
> standard output (-D -), pg_basebackup can only produce a single output
> file, so the manifest gets injected into the tar file on the client
> side rather than being written separately as we do in normal cases.
> However, that only works if we're receiving a tar file that we can
> parse from the server, and here the server is sending a compressed
> tarfile. The current code mistakely attempts to parse the compressed
> tarfile as if it were an uncompressed tarfile, which causes the error
> messages that you are seeing (and which I can also reproduce here). We
> actually have enough infrastructure available in pg_basebackup now
> that we could do the "right thing" in this case: decompress the data
> received from the server, parse the resulting tar file, inject the
> backup manifest, construct a new tar file, and recompress. However, I
> think that's probably not a good idea, because it's unlikely that the
> user will understand that the data is being compressed on the server,
> then decompressed, and then recompressed again, and the performance of
> the resulting pipeline will probably not be very good. So I think we
> should just refuse this command. Patch for that attached.

You could also just append a manifest as a compressed tar to the compressed tar
stream. Unfortunately, GNU tar requires -i to read concatenated compressed
archives, so perhaps that's not quite an alternative.

Greetings,

Andres Freund



Re: refactoring basebackup.c

From:
Robert Haas
Date:
On Fri, Mar 11, 2022 at 8:52 PM Andres Freund <andres@anarazel.de> wrote:
> You could also just append a manifest as a compresed tar to the compressed tar
> stream. Unfortunately GNU tar requires -i to read concated compressed
> archives, so perhaps that's not quite an alternative.

s/Unfortunately/Fortunately/ :-p

I think we've already gone way too far in the direction of making this
stuff rely on specific details of the tar format. What if someday we
wanted to switch to pax, cpio, zip, 7zip, whatever, or even just have
one of those things as an option? It's not that I'm dying to have
PostgreSQL produce rar or arj files, but I think we box ourselves into
a corner when we just assume tar everywhere. As an example of a
similar issue with real consequences, consider the recent discovery
that we can't easily add support for LZ4 or ZSTD compression of
pg_wal.tar. The problem is that the existing code tells the gzip
library to emit the tar header as part of the compressed stream
without actually compressing it, and then it goes back and overwrites
that data later! Unsurprisingly, that's not a feature every
compression library offers.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From:
Dipesh Pandit
Date:
Hi,

I tried to implement support for parallel ZSTD compression. The
library provides an option (ZSTD_c_nbWorkers) to specify the
number of compression workers. The number of parallel
workers can be set as part of compression parameter and if this
option is specified then the library performs parallel compression
based on the specified number of workers.

The user can specify the number of parallel workers as part of the
--compress option by appending an integer value after an at sign (@).
(-Z, --compress=[{client|server}-]{gzip|lz4|zstd}[:LEVEL][@WORKERS])

Please find the attached patch v1 with the above changes.

Note: ZSTD library version 1.5.x supports parallel compression
by default; if the library version is lower than 1.5.x, then
parallel compression is available only if the library was compiled
with the build macro ZSTD_MULTITHREAD. If the linked library version
doesn't support parallel compression, then setting the parameter
ZSTD_c_nbWorkers to a value other than 0 has no effect and
returns an error.

Thanks,
Dipesh
Attachments

Re: refactoring basebackup.c (zstd workers)

From:
Justin Pryzby
Date:
On Mon, Mar 14, 2022 at 09:41:35PM +0530, Dipesh Pandit wrote:
> I tried to implement support for parallel ZSTD compression. The
> library provides an option (ZSTD_c_nbWorkers) to specify the
> number of compression workers. The number of parallel
> workers can be set as part of compression parameter and if this
> option is specified then the library performs parallel compression
> based on the specified number of workers.
> 
> User can specify the number of parallel worker as part of
> --compress option by appending an integer value after at sign (@).
> (-Z, --compress=[{client|server}-]{gzip|lz4|zstd}[:LEVEL][@WORKERS])

I suggest using a syntax that's more general than that, maybe something like

:[level=]N,parallel=N,flag,flag,...

For example, someone may want to use zstd "long" mode or (when it's released)
rsyncable mode, or specify fine-grained compression parameters (strategy,
windowLog, hashLog, etc).

I hope the same syntax will be shared with wal_compression and pg_dump.
And libpq, if that patch progresses.

BTW, I think this may be better left for PG16.

-- 
Justin



Re: refactoring basebackup.c (zstd workers)

From:
Robert Haas
Date:
On Mon, Mar 14, 2022 at 12:35 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> I suggest to use a syntax that's more general than that, maybe something like
>
> :[level=]N,parallel=N,flag,flag,...
>
> For example, someone may want to use zstd "long" mode or (when it's released)
> rsyncable mode, or specify fine-grained compression parameters (strategy,
> windowLog, hashLog, etc).

That's an interesting idea. I wonder what the replication protocol
ought to look like in that case. Should we have a COMPRESSION_DETAIL
argument that is just a string, and let the server parse it out? Or
separate protocol-level options? It does feel reasonable to have both
COMPRESSION_LEVEL and COMPRESSION_WORKERS as first-class options, but
I don't know that we want COMPRESSION_HASHLOG true as part of our
first-class grammar.

> I hope the same syntax will be shared with wal_compression and pg_dump.
> And libpq, if that patch progresses.
>
> BTW, I think this may be better left for PG16.

Possibly so ... but if we're thinking of any revisions to the
newly-added grammar, we had better take care of that now, before it's
set in stone.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c (zstd workers)

From:
Justin Pryzby
Date:
On Mon, Mar 14, 2022 at 01:02:20PM -0400, Robert Haas wrote:
> On Mon, Mar 14, 2022 at 12:35 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > I suggest to use a syntax that's more general than that, maybe something like
> >
> > :[level=]N,parallel=N,flag,flag,...
> >
> > For example, someone may want to use zstd "long" mode or (when it's released)
> > rsyncable mode, or specify fine-grained compression parameters (strategy,
> > windowLog, hashLog, etc).
> 
> That's an interesting idea. I wonder what the replication protocol
> ought to look like in that case. Should we have a COMPRESSION_DETAIL
> argument that is just a string, and let the server parse it out? Or
> separate protocol-level options? It does feel reasonable to have both
> COMPRESSION_LEVEL and COMPRESSION_WORKERS as first-class options, but
> I don't know that we want COMPRESSION_HASHLOG true as part of our
> first-class grammar.

I was only referring to the user-facing grammar.

Internally, I was thinking they'd all be handled as first-class options, with
separate struct fields and separate replication protocol options.  If an option
isn't known, it'd be rejected on the client side, rather than causing an error
on the server.

Maybe there'd be an option parser for this in common/ (I think that might
require having new data structure there too, maybe one for each compression
method, or maybe a union{} to handles them all).  Most of the ~100 lines to
support wal_compression='zstd:N' are to parse out the N.

-- 
Justin



Re: refactoring basebackup.c (zstd workers)

From:
Robert Haas
Date:
On Mon, Mar 14, 2022 at 1:11 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> Internally, I was thinking they'd all be handled as first-class options, with
> separate struct fields and separate replication protocol options.  If an option
> isn't known, it'd be rejected on the client side, rather than causing an error
> on the server.

There's some appeal to that, but one downside is that it means that
the client can't be used to fetch data that is compressed in a way
that the server knows about and the client doesn't. I don't think
that's great. Why should, for example, pg_basebackup need to be
compiled with zstd support in order to request zstd compression on the
server side? If the server knows about the brand new
justin-magic-sauce compression algorithm, maybe the client should just
be able to request it and, when given various .jms files by the
server, shrug its shoulders and accept them for what they are. That
doesn't work if -Fp is involved, or similar, but it should work fine
for simple cases if we set things up right.

> Maybe there'd be an option parser for this in common/ (I think that might
> require having new data structure there too, maybe one for each compression
> method, or maybe a union{} to handles them all).  Most of the ~100 lines to
> support wal_compression='zstd:N' are to parse out the N.

Yes, it's actually a very simple feature now that we've got the rest
of the infrastructure set up correctly for it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From:
Jeevan Ladhe
Date:
Thanks for the patch, Dipesh.
I had a look at the patch and also tried to take the backup. I have
following suggestions and observations:

I get following error at my end:

$ pg_basebackup -D /tmp/zstd_bk -Ft -Xfetch --compress=server-zstd:7@4
pg_basebackup: error: could not initiate base backup: ERROR:  could not compress data: Unsupported parameter
pg_basebackup: removing data directory "/tmp/zstd_bk"

This is mostly because I have zstd library version 1.4.4, which
does not have default support for parallel workers. Maybe we should
have a better error, something hinting that parallelism is not
supported by the particular build.

The regression for pg_verifybackup test 008_untar.pl also fails with a
similar error. Here, I think we should have some logic in regression to
skip the test if the parameter is not supported?

+   if (ZSTD_isError(ret))                                                    
+       elog(ERROR,                                                            
+            "could not compress data: %s",                                    
+            ZSTD_getErrorName(ret));    

I think all of this can go on one line, but anyhow we have to improve
the error message here.

Also, just a thought, for the versions where parallelism is not
supported, should we instead just throw a warning and fall back to
non-parallel behavior?

Regards,
Jeevan Ladhe

On Mon, 14 Mar 2022 at 21:41, Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
Hi,

I tried to implement support for parallel ZSTD compression. The
library provides an option (ZSTD_c_nbWorkers) to specify the
number of compression workers. The number of parallel
workers can be set as part of compression parameter and if this
option is specified then the library performs parallel compression
based on the specified number of workers.

User can specify the number of parallel worker as part of
--compress option by appending an integer value after at sign (@).
(-Z, --compress=[{client|server}-]{gzip|lz4|zstd}[:LEVEL][@WORKERS])

Please find the attached patch v1 with the above changes.

Note: ZSTD library version 1.5.x supports parallel compression
by default and if the library version is lower than 1.5.x then
parallel compression is enabled only the source is compiled with build
macro ZSTD_MULTITHREAD. If the linked library version doesn't
support parallel compression then setting the value of parameter
ZSTD_c_nbWorkers to a value other than 0 will be no-op and
returns an error.

Thanks,
Dipesh

Re: refactoring basebackup.c

From:
Robert Haas
Date:
On Tue, Mar 15, 2022 at 6:33 AM Jeevan Ladhe <jeevanladhe.os@gmail.com> wrote:
> I get following error at my end:
>
> $ pg_basebackup -D /tmp/zstd_bk -Ft -Xfetch --compress=server-zstd:7@4
> pg_basebackup: error: could not initiate base backup: ERROR:  could not compress data: Unsupported parameter
> pg_basebackup: removing data directory "/tmp/zstd_bk"
>
> This is mostly because I have the zstd library version v1.4.4, which
> does not have default support for parallel workers. Maybe we should
> have a better error, something that is hinting that the parallelism is
> not supported by the particular build.

I'm not averse to trying to improve that error message, but honestly
I'd consider that to be good enough already to be acceptable. We could
think about trying to add an errhint() telling you that the problem
may be with your libzstd build.

> The regression for pg_verifybackup test 008_untar.pl also fails with a
> similar error. Here, I think we should have some logic in regression to
> skip the test if the parameter is not supported?

Or at least to have the test not fail.

> Also, just a thought, for the versions where parallelism is not
> supported, should we instead just throw a warning and fall back to
> non-parallel behavior?

I don't think so. I think it's better for the user to get an error and
then change their mind and request something we can do.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c (zstd negative compression)

From:
Justin Pryzby
Date:
Should zstd's negative compression levels be supported here ?

Here's a POC patch which is enough to play with it.

$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --compress=zstd |wc -c
12305659
$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --compress=zstd:1 |wc -c
13827521
$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --compress=zstd:0 |wc -c
12304018
$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --compress=zstd:-1 |wc -c
16443893
$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --compress=zstd:-2 |wc -c
17349563
$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --compress=zstd:-4 |wc -c
19452631
$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --compress=zstd:-7 |wc -c
21871505

Also, with a partial regression DB, this crashes when writing to stdout.

$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --compress=lz4 |wc -c
pg_basebackup: bbstreamer_lz4.c:172: bbstreamer_lz4_compressor_content: Assertion `mystreamer->base.bbs_buffer.maxlen >= out_bound' failed.
24117248

#4  0x000055555555e8b4 in bbstreamer_lz4_compressor_content (streamer=0x5555555a5260, member=0x7fffffffc760,
    data=0x7ffff3068010 "{ \"PostgreSQL-Backup-Manifest-Version\": 1,\n\"Files\": [\n{ \"Path\": \"backup_label\", \"Size\": 227, \"Last-Modified\": \"2022-03-16 02:29:11 GMT\", \"Checksum-Algorithm\": \"CRC32C\", \"Checksum\": \"46f69d99\" },\n{ \"Pa"..., len=401072, context=BBSTREAMER_MEMBER_CONTENTS) at bbstreamer_lz4.c:172
        mystreamer = 0x5555555a5260
        next_in = 0x7ffff3068010 "{ \"PostgreSQL-Backup-Manifest-Version\": 1,\n\"Files\": [\n{ \"Path\": \"backup_label\", \"Size\": 227, \"Last-Modified\": \"2022-03-16 02:29:11 GMT\", \"Checksum-Algorithm\": \"CRC32C\", \"Checksum\": \"46f69d99\" },\n{ \"Pa"...
        ...

(gdb) p mystreamer->base.bbs_buffer.maxlen
$1 = 524288
(gdb) p (int) LZ4F_compressBound(len, &mystreamer->prefs)
$4 = 524300

This is with: liblz4-1:amd64 1.9.2-2ubuntu0.20.04.1

Attachments

Re: refactoring basebackup.c (zstd workers)

From:
Robert Haas
Date:
On Mon, Mar 14, 2022 at 1:21 PM Robert Haas <robertmhaas@gmail.com> wrote:
> There's some appeal to that, but one downside is that it means that
> the client can't be used to fetch data that is compressed in a way
> that the server knows about and the client doesn't. I don't think
> that's great. Why should, for example, pg_basebackup need to be
> compiled with zstd support in order to request zstd compression on the
> server side? If the server knows about the brand new
> justin-magic-sauce compression algorithm, maybe the client should just
> be able to request it and, when given various .jms files by the
> server, shrug its shoulders and accept them for what they are. That
> doesn't work if -Fp is involved, or similar, but it should work fine
> for simple cases if we set things up right.

Concretely, I propose the attached patch for v15. It renames the
newly-added COMPRESSION_LEVEL option to COMPRESSION_DETAIL, introduces
a flexible syntax for options along the lines you proposed, and
adjusts things so that a client that doesn't support a particular type
of compression can still request that type of compression from the
server.

I think it's important to do this for v15 so that we don't end up with
backward-compatibility problems down the road.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c (zstd workers)

From:
Justin Pryzby
Date:
diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml
index 9178c779ba..00c593f1af 100644
--- a/doc/src/sgml/protocol.sgml
+++ b/doc/src/sgml/protocol.sgml
@@ -2731,14 +2731,24 @@ The commands accepted in replication mode are:
+        <para>
+          For <literal>gzip</literal> the compression level should be an

gzip comma

+++ b/src/backend/replication/basebackup.c
@@ -18,6 +18,7 @@
 
 #include "access/xlog_internal.h"    /* for pg_start/stop_backup */
 #include "common/file_perm.h"
+#include "common/backup_compression.h"

alphabetical

-                                                errmsg("unrecognized compression algorithm: \"%s\"",
+                                                errmsg("unrecognized compression algorithm \"%s\"",

Most other places seem to say "compression method".  So I'd suggest to change
that here, and in doc/src/sgml/ref/pg_basebackup.sgml.

-    if (o_compression_level && !o_compression)
+    if (o_compression_detail && !o_compression)
         ereport(ERROR,
                 (errcode(ERRCODE_SYNTAX_ERROR),
                  errmsg("compression level requires compression")));

s/level/detail/

 /*
+ * Basic parsing of a value specified for -Z/--compress.
+ *
+ * We're not concerned here with understanding exactly what behavior the
+ * user wants, but we do need to know whether the user is requesting client
+ * or server side compression or leaving it unspecified, and we need to
+ * separate the name of the compression algorithm from the detail string.
+ *
+ * For instance, if the user writes --compress client-lz4:6, we want to
+ * separate that into (a) client-side compression, (b) algorithm "lz4",
+ * and (c) detail "6". Note, however, that all the client/server prefix is
+ * optional, and so is the detail. The algorithm name is required, unless
+ * the whole string is an integer, in which case we assume "gzip" as the
+ * algorithm and use the integer as the detail.
..
  */
 static void
+parse_compress_options(char *option, char **algorithm, char **detail,
+                       CompressionLocation *locationres)

It'd be great if this were re-usable for wal_compression, which I hope in pg16 will
support at least level=N.  And eventually pg_dump.  But those clients shouldn't
accept a client/server prefix.  Maybe the way to handle that is for those tools
to check locationres and reject it if it was specified.

+ * We're not concerned with validation at this stage, so if the user writes
+ * --compress client-turkey:sandwhich, the requested algorithm is "turkey"
+ * and the detail string is "sandwhich". We'll sort out whether that's legal

sp: sandwich

+        WalCompressionMethod    wal_compress_method;

This is confusingly similar to src/include/access/xlog.h:WalCompression.
I think someone else mentioned this before ?

+ * A compression specification specifies the parameters that should be used
+ * when * performing compression with a specific algorithm. The simplest

star

+/*
+ * Get the human-readable name corresponding to a particular compression
+ * algorithm.
+ */
+char *
+get_bc_algorithm_name(bc_algorithm algorithm)

should be const ?

+    /* As a special case, the specification can be a bare integer. */
+    bare_level = strtol(specification, &bare_level_endp, 10);

Should this call expect_integer_value()?
See below.

+            result->parse_error =
+                pstrdup("found empty string where a compression option was expected");

Needs to be localized with _() ?
Also, document that it's pstrdup'd.

+/*
+ * Parse 'value' as an integer and return the result.
+ *
+ * If parsing fails, set result->parse_error to an appropriate message
+ * and return -1.
+ */
+static int
+expect_integer_value(char *keyword, char *value, bc_specification *result)

-1 isn't great, since it's also an integer, and, also a valid compression level
for zstd (did you see my message about that?).  Maybe INT_MIN is ok.

+{
+    int        ivalue;
+    char   *ivalue_endp;
+
+    ivalue = strtol(value, &ivalue_endp, 10);

Should this also set/check errno ?
And check if value != ivalue_endp ?
See strtol(3)

+char *
+validate_bc_specification(bc_specification *spec)
...
+    /*
+     * If a compression level was specified, check that the algorithm expects
+     * a compression level and that the level is within the legal range for
+     * the algorithm.

It would be nice if this could be shared with wal_compression and pg_dump.
We shouldn't need multiple places with structures giving the algorithms and
range of compression levels.

+    unsigned    options;        /* OR of BACKUP_COMPRESSION_OPTION constants */

Should be "unsigned int" or "bits32" ?

The server crashes if I send an unknown option - you should hit that in the
regression tests.

$ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --no-manifest --compress=server-lz4:a|wc -c

TRAP: FailedAssertion("pointer != NULL", File: "../../../../src/include/utils/memutils.h", Line: 123, PID: 8627)
postgres: walsender pryzbyj [local] BASE_BACKUP(ExceptionalCondition+0xa0)[0x560b45d7b64b]
postgres: walsender pryzbyj [local] BASE_BACKUP(pfree+0x5d)[0x560b45dad1ea]
postgres: walsender pryzbyj [local] BASE_BACKUP(parse_bc_specification+0x154)[0x560b45dc5d4f]
postgres: walsender pryzbyj [local] BASE_BACKUP(+0x43d56c)[0x560b45bc556c]
postgres: walsender pryzbyj [local] BASE_BACKUP(SendBaseBackup+0x2d)[0x560b45bc85ca]
postgres: walsender pryzbyj [local] BASE_BACKUP(exec_replication_command+0x3a2)[0x560b45bdddb2]
postgres: walsender pryzbyj [local] BASE_BACKUP(PostgresMain+0x6b2)[0x560b45c39131]
postgres: walsender pryzbyj [local] BASE_BACKUP(+0x40530e)[0x560b45b8d30e]
postgres: walsender pryzbyj [local] BASE_BACKUP(+0x408572)[0x560b45b90572]
postgres: walsender pryzbyj [local] BASE_BACKUP(+0x4087b9)[0x560b45b907b9]
postgres: walsender pryzbyj [local] BASE_BACKUP(PostmasterMain+0x1135)[0x560b45b91d9b]
postgres: walsender pryzbyj [local] BASE_BACKUP(main+0x229)[0x560b45ad0f78]

This is interpreted like client-gzip-1; should multiple specifications of
compress be prohibited ?

| src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --no-manifest --compress=server-lz4 --compress=1



Re: refactoring basebackup.c (zstd workers)

From:
Robert Haas
Date:
Thanks for the review!

I'll address most of these comments later, but quickly for right now...

On Thu, Mar 17, 2022 at 3:41 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> It'd be great if this were re-usable for wal_compression, which I hope in pg16 will
> support at least level=N.  And eventually pg_dump.  But those clients shouldn't
> accept a client/server prefix.  Maybe the way to handle that is for those tools
> to check locationres and reject it if it was specified.
> [...]
> This is confusingly similar to src/include/access/xlog.h:WalCompression.
> I think someone else mentioned this before ?

A couple of people before me have had delusions of grandeur in this
area. We have the WalCompression enum, which has values of the form
COMPRESSION_*, instead of WAL_COMPRESSION_*, as if the WAL were going
to be the only thing that ever got compressed. And pg_dump.h also has
a CompressionAlgorithm enum, with values like COMPR_ALG_*, which isn't
great naming either. Clearly there's some cleanup needed here: if we
can use the same enum for multiple systems, then it can have a name
implying that it's the only game in town, but otherwise both the enum
name and the corresponding value need to use a suitable prefix. I
think that's a job for another patch, probably post-v15. For now I
plan to do the right thing with the new names I'm adding, and leave
the existing names alone. That can be changed in the future, if and
when it seems sensible.

As I said elsewhere, I think the WAL compression stuff is badly
designed and should probably be rewritten completely, maybe to reuse
the bbstreamer stuff. In that case, WalCompressionMethod would
probably go away entirely, making the naming confusion moot, and
picking up zstd and lz4 compression support for free. If that doesn't
happen, we can probably find some way to at least make them share an
enum, but I think that's too hairy to try to clean up right now with
feature freeze pending.

> The server crashes if I send an unknown option - you should hit that in the
> regression tests.
>
> $ src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --no-manifest --compress=server-lz4:a|wc -c
>
> TRAP: FailedAssertion("pointer != NULL", File: "../../../../src/include/utils/memutils.h", Line: 123, PID: 8627)
> postgres: walsender pryzbyj [local] BASE_BACKUP(ExceptionalCondition+0xa0)[0x560b45d7b64b]
> postgres: walsender pryzbyj [local] BASE_BACKUP(pfree+0x5d)[0x560b45dad1ea]
> postgres: walsender pryzbyj [local] BASE_BACKUP(parse_bc_specification+0x154)[0x560b45dc5d4f]
> postgres: walsender pryzbyj [local] BASE_BACKUP(+0x43d56c)[0x560b45bc556c]
> postgres: walsender pryzbyj [local] BASE_BACKUP(SendBaseBackup+0x2d)[0x560b45bc85ca]
> postgres: walsender pryzbyj [local] BASE_BACKUP(exec_replication_command+0x3a2)[0x560b45bdddb2]
> postgres: walsender pryzbyj [local] BASE_BACKUP(PostgresMain+0x6b2)[0x560b45c39131]
> postgres: walsender pryzbyj [local] BASE_BACKUP(+0x40530e)[0x560b45b8d30e]
> postgres: walsender pryzbyj [local] BASE_BACKUP(+0x408572)[0x560b45b90572]
> postgres: walsender pryzbyj [local] BASE_BACKUP(+0x4087b9)[0x560b45b907b9]
> postgres: walsender pryzbyj [local] BASE_BACKUP(PostmasterMain+0x1135)[0x560b45b91d9b]
> postgres: walsender pryzbyj [local] BASE_BACKUP(main+0x229)[0x560b45ad0f78]

That's odd - I thought I had tested that case. Will double-check.

> This is interpreted like client-gzip-1; should multiple specifications of
> compress be prohibited ?
>
> | src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --no-manifest --compress=server-lz4 --compress=1

They're not now and haven't been in the past. I think the last one
should just win (as it apparently does, here). We do that in some
places and throw an error in others and I'm not sure if we have a 100%
consistent rule for it, but flipping one location between one behavior
and the other isn't going to make things more consistent overall.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c (zstd workers)

From:
Robert Haas
Date:
On Thu, Mar 17, 2022 at 3:41 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> gzip comma

I think it's fine the way it's written. If we made that change, then
we'd have a comma for gzip and not for the other two algorithms. Also,
I'm just moving that sentence, so any change that there is to be made
here is a job for some other patch.

> alphabetical

Fixed.

> -                                                errmsg("unrecognized compression algorithm: \"%s\"",
> +                                                errmsg("unrecognized compression algorithm \"%s\"",
>
> Most other places seem to say "compression method".  So I'd suggest to change
> that here, and in doc/src/sgml/ref/pg_basebackup.sgml.

I'm not sure that's really better, and I don't think this patch is
introducing an altogether novel usage. I think I would probably try to
standardize on algorithm rather than method if I were standardizing
the whole source tree, but I think we can leave that discussion for
another time.

> -       if (o_compression_level && !o_compression)
> +       if (o_compression_detail && !o_compression)
>                 ereport(ERROR,
>                                 (errcode(ERRCODE_SYNTAX_ERROR),
>                                  errmsg("compression level requires compression")));
>
> s/level/detail/

Fixed.


> It'd be great if this were re-usable for wal_compression, which I hope in pg16 will
> support at least level=N.  And eventually pg_dump.  But those clients shouldn't
> accept a client/server prefix.  Maybe the way to handle that is for those tools
> to check locationres and reject it if it was specified.

One thing I forgot to mention in my previous response is that I think
the parsing code is actually well set up for this the way I have it.
server- and client- gets parsed off in a different place than we
interpret the rest, which fits well with your observation that other
cases wouldn't have a client or server prefix.

> sp: sandwich

Fixed.

> star

Fixed.

> should be const ?

OK.

>
> +       /* As a special case, the specification can be a bare integer. */
> +       bare_level = strtol(specification, &bare_level_endp, 10);
>
> Should this call expect_integer_value()?
> See below.

I don't think that would be useful. We have no keyword to pass for the
error message, nor would we use the error message if one got
constructed.

> +                       result->parse_error =
> +                               pstrdup("found empty string where a compression option was expected");
>
> Needs to be localized with _() ?
> Also, document that it's pstrdup'd.

Did the latter. The former would need to be fixed in a bunch of places
and while I'm happy to accept an expert opinion on exactly what needs
to be done here, I don't want to try to do it and do it wrong. Better
to let someone with good knowledge of the subject matter patch it up
later than do a crummy job now.

> -1 isn't great, since it's also an integer, and, also a valid compression level
> for zstd (did you see my message about that?).  Maybe INT_MIN is ok.

It really doesn't matter. Could just return 42. The client shouldn't
use the value if there's an error.

> +{
> +       int             ivalue;
> +       char   *ivalue_endp;
> +
> +       ivalue = strtol(value, &ivalue_endp, 10);
>
> Should this also set/check errno ?
> And check if value != ivalue_endp ?
> See strtol(3)

Even after reading the man page for strtol, it's not clear to me that
this is needed. That page represents checking *endptr != '\0' as
sufficient to tell whether an error occurred. Maybe it wouldn't catch
an out of range value, but in practice all of the algorithms we
support now and any we support in the future are going to catch
something clamped to LONG_MIN or LONG_MAX as out of range and display
the correct error message. What's your specific thinking here?

> +       unsigned        options;                /* OR of BACKUP_COMPRESSION_OPTION constants */
>
> Should be "unsigned int" or "bits32" ?

I do not see why either of those would be better.

> The server crashes if I send an unknown option - you should hit that in the
> regression tests.

Turns out I was testing this on the client side but not the server
side. Fixed and added more tests.

v2 attached.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c (zstd workers)

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
>> Should this also set/check errno ?
>> And check if value != ivalue_endp ?
>> See strtol(3)

> Even after reading the man page for strtol, it's not clear to me that
> this is needed. That page represents checking *endptr != '\0' as
> sufficient to tell whether an error occurred.

I'm not sure whose man page you looked at, but the POSIX standard [1]
has a pretty clear opinion about this:

    Since 0, {LONG_MIN} or {LLONG_MIN}, and {LONG_MAX} or {LLONG_MAX} are
    returned on error and are also valid returns on success, an
    application wishing to check for error situations should set errno to
    0, then call strtol() or strtoll(), then check errno.

Checking *endptr != '\0' is for detecting whether there is trailing
garbage after the number; which may be an error case or not as you
choose, but it's a different matter.

            regards, tom lane

[1] https://pubs.opengroup.org/onlinepubs/9699919799/



Re: refactoring basebackup.c (zstd workers)

From:
Justin Pryzby
Date:
On Sun, Mar 20, 2022 at 03:05:28PM -0400, Robert Haas wrote:
> On Thu, Mar 17, 2022 at 3:41 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > -                                                errmsg("unrecognized compression algorithm: \"%s\"",
> > +                                                errmsg("unrecognized compression algorithm \"%s\"",
> >
> > Most other places seem to say "compression method".  So I'd suggest to change
> > that here, and in doc/src/sgml/ref/pg_basebackup.sgml.
> 
> I'm not sure that's really better, and I don't think this patch is
> introducing an altogether novel usage. I think I would probably try to
> standardize on algorithm rather than method if I were standardizing
> the whole source tree, but I think we can leave that discussion for
> another time.

The user-facing docs are already standardized using "compression method", with
2 exceptions, of which one is contrib/ and the other is what I'm suggesting to
make consistent here.

$ git grep 'compression algorithm' doc
doc/src/sgml/pgcrypto.sgml:    Which compression algorithm to use.  Only available if
doc/src/sgml/ref/pg_basebackup.sgml:        compression algorithm is selected, or if server-side compression

> > +                       result->parse_error =
> > +                               pstrdup("found empty string where a compression option was expected");
> >
> > Needs to be localized with _() ?
> > Also, document that it's pstrdup'd.
> 
> Did the latter. The former would need to be fixed in a bunch of places
> and while I'm happy to accept an expert opinion on exactly what needs
> to be done here, I don't want to try to do it and do it wrong. Better
> to let someone with good knowledge of the subject matter patch it up
> later than do a crummy job now.

I believe it just needs _("foo")
See git grep '= _('

I mentioned another issue off-list:
pg_basebackup.c:2741:10: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
 2741 |   Assert(compressloc = COMPRESS_LOCATION_SERVER);
      |          ^~~~~~~~~~~
pg_basebackup.c:2741:3: note: in expansion of macro ‘Assert’
 2741 |   Assert(compressloc = COMPRESS_LOCATION_SERVER);

This crashes the server using your v2 patch:

src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --no-manifest --compress=server-zstd:level,|wc -c

I wonder whether the syntax should really use both ":" and ",".
Maybe ":" isn't needed at all.

This patch also needs to update the other user-facing docs.

typo: contain a an

-- 
Justin



Re: refactoring basebackup.c (zstd workers)

From:
Robert Haas
Date:
On Sun, Mar 20, 2022 at 3:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > Even after reading the man page for strtol, it's not clear to me that
> > this is needed. That page represents checking *endptr != '\0' as
> > sufficient to tell whether an error occurred.
>
> I'm not sure whose man page you looked at, but the POSIX standard [1]
> has a pretty clear opinion about this:
>
>     Since 0, {LONG_MIN} or {LLONG_MIN}, and {LONG_MAX} or {LLONG_MAX} are
>     returned on error and are also valid returns on success, an
>     application wishing to check for error situations should set errno to
>     0, then call strtol() or strtoll(), then check errno.
>
> Checking *endptr != '\0' is for detecting whether there is trailing
> garbage after the number; which may be an error case or not as you
> choose, but it's a different matter.

I think I'm guilty of verbal inexactitude here but not bad coding.
Checking for *endptr != '\0', as I did, is not sufficient to detect
"whether an error occurred," as I alleged. But, in the part of my
response you didn't quote, I believe I made it clear that I only need
to detect garbage, not out-of-range values. And I think *endptr !=
'\0' will do that.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c (zstd workers)

From:
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> I think I'm guilty of verbal inexactitude here but not bad coding.
> Checking for *endptr != '\0', as I did, is not sufficient to detect
> "whether an error occurred," as I alleged. But, in the part of my
> response you didn't quote, I believe I made it clear that I only need
> to detect garbage, not out-of-range values. And I think *endptr !=
> '\0' will do that.

Hmm ... do you consider an empty string to be valid input?

            regards, tom lane



Re: refactoring basebackup.c (zstd workers)

From:
Robert Haas
Date:
On Sun, Mar 20, 2022 at 3:40 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> The user-facing docs are already standardized using "compression method", with
> 2 exceptions, of which one is contrib/ and the other is what I'm suggesting to
> make consistent here.
>
> $ git grep 'compression algorithm' doc
> doc/src/sgml/pgcrypto.sgml:    Which compression algorithm to use.  Only available if
> doc/src/sgml/ref/pg_basebackup.sgml:        compression algorithm is selected, or if server-side compression

Well, if you just count the number of occurrences of each string in
the documentation, sure. But all of the ones that are talking about a
compression method seem to have to do with configurable TOAST
compression, and the fact that the documentation for that feature is
more extensive than for the pre-existing feature that refers to a
compression algorithm does not, at least in my view, turn it into a
project standard from which no deviation is permitted.

> > Did the latter. The former would need to be fixed in a bunch of places
> > and while I'm happy to accept an expert opinion on exactly what needs
> > to be done here, I don't want to try to do it and do it wrong. Better
> > to let someone with good knowledge of the subject matter patch it up
> > later than do a crummy job now.
>
> I believe it just needs _("foo")
> See git grep '= _('

Hmm. Maybe.

> I mentioned another issue off-list:
> pg_basebackup.c:2741:10: warning: suggest parentheses around assignment used as truth value [-Wparentheses]
>  2741 |   Assert(compressloc = COMPRESS_LOCATION_SERVER);
>       |          ^~~~~~~~~~~
> pg_basebackup.c:2741:3: note: in expansion of macro ‘Assert’
>  2741 |   Assert(compressloc = COMPRESS_LOCATION_SERVER);
>
> This crashes the server using your v2 patch:
>
> src/bin/pg_basebackup/pg_basebackup --wal-method fetch -Ft -D - -h /tmp --no-sync --no-manifest --compress=server-zstd:level,|wc -c

Well that's unfortunate. Will fix.

> I wonder whether the syntax should really use both ":" and ",".
> Maybe ":" isn't needed at all.

I don't think we should treat the compression method name in the same
way as a compression algorithm option.

> This patch also needs to update the other user-facing docs.

Which ones exactly?

> typo: contain a an

OK, will fix.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c (zstd workers)

From:
Robert Haas
Date:
On Sun, Mar 20, 2022 at 9:32 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > I think I'm guilty of verbal inexactitude here but not bad coding.
> > Checking for *endptr != '\0', as I did, is not sufficient to detect
> > "whether an error occurred," as I alleged. But, in the part of my
> > response you didn't quote, I believe I made it clear that I only need
> > to detect garbage, not out-of-range values. And I think *endptr !=
> > '\0' will do that.
>
> Hmm ... do you consider an empty string to be valid input?

No, and I thought I had checked properly for that condition before
reaching the point in the code where I call strtol(), but it turns out
I have not, which I guess is what Justin has been trying to tell me
for a few emails now.

I'll send an updated patch tomorrow after looking this all over more carefully.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c (zstd workers)

From:
Justin Pryzby
Date:
On Sun, Mar 20, 2022 at 09:38:44PM -0400, Robert Haas wrote:
> > This patch also needs to update the other user-facing docs.
> 
> Which ones exactly?

I mean pg_basebackup -Z

-Z level
-Z [{client|server}-]method[:level]
--compress=level
--compress=[{client|server}-]method[:level]



Re: refactoring basebackup.c (zstd workers)

From:
Robert Haas
Date:
On Mon, Mar 21, 2022 at 9:18 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> On Sun, Mar 20, 2022 at 09:38:44PM -0400, Robert Haas wrote:
> > > This patch also needs to update the other user-facing docs.
> >
> > Which ones exactly?
>
> I mean pg_basebackup -Z
>
> -Z level
> -Z [{client|server}-]method[:level]
> --compress=level
> --compress=[{client|server}-]method[:level]

Ah, right. Thanks.

Here's v3. I have updated that section of the documentation. I also
went and added a bunch more test cases for validation of compression
detail strings, many inspired by your examples, and fixed all the bugs
that I found in the process. I think the crashes you complained about
are now fixed, but please let me know if I have missed any. I also
added _() calls as you suggested. I searched for the "contain a an"
typo that you mentioned but was not able to find it. Can you give me a
more specific pointer?

I looked a little bit more at the compression method vs. compression
algorithm thing. I agree that there is some inconsistency in
terminology here, but I'm still not sure that we are well-served by
trying to make it totally uniform, especially if we pick the word
"method" as the standard rather than "algorithm". In my opinion,
"method" is less specific than "algorithm". If someone asks me to
choose a compression algorithm, I know that I should give an answer
like "lz4" or "zstd". If they ask me to pick a compression method, I'm
not quite sure whether they want that kind of answer or whether they
want something more detailed, like "use lz4 with compression level 3
and a 1MB block size". After all, that is (at least according to my
understanding of how English works) a perfectly valid answer to the
question "what method should I use to compress this data?" -- but not
to the question "what algorithm should I use to compress this data?".
The latter can ONLY be properly answered by saying something like
"lz4". And I think that's really the root of my hesitation to make the
kinds of changes you want here. If it's just a question of specifying
a compression algorithm and a level, I don't think using the name
"method" for the algorithm is going to be too bad. But as we enrich
the system with multiple compression algorithms each of which may have
multiple and different parameters, I think the whole thing becomes
murkier and the need for precision in language goes up.

Now that is of course an arguable position and you're welcome to
disagree with it, but I think that's part of why I'm hesitating.
Another part of it, at least for me, is that complete uniformity is
not always a positive. I suppose all of us have had the experience at
some point of reading a manual that says something like "to activate
the boil water function, press and release the 'boil water' button"
and rolled our eyes at how useless it was. It's important to me that
we don't fall into that trap. We clearly don't want to go ballistic
and have random inconsistencies in language for no reason, but at the
same time, it's not useful to tell people that METHOD should be
replaced with a compression method and LEVEL with a compression level.
I mean, if you end up saying something like that interspersed with
non-obvious information, that is OK, and I don't want to overstate the
point I'm trying to make. But it seems to me that if there's a little
variation in phrasing and we end up saying that METHOD means the
compression algorithm or that ALGORITHM means the compression method
or whatever, that can actually make things more clear. Here again it's
debatable: how much variation in phraseology is helpful, and at what
point does it just start to seem inconsistent? Well, everyone may have
their own opinion.

I'm not trying to pretend that this patch (or the existing code base)
gets this all right. But I do think that, to the extent that we have a
considered position on what to do here, we can make that change later,
perhaps even after getting some user feedback on what does and does
not make sense to other people. And I also think that what we end up
doing here may well end up being more nuanced than a blanket
search-and-replace. I'm not saying we couldn't make a blanket
search-and-replace. I just don't see it as necessarily creating value,
or being all that closely connected to the goal of this patch, which
is to quickly clean up a forward-compatibility risk before we hit
feature freeze.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c (zstd workers)

From:
Justin Pryzby
Date:
On Mon, Mar 21, 2022 at 12:57:36PM -0400, Robert Haas wrote:
> > typo: contain a an
> I searched for the "contain a an" typo that you mentioned but was not able to
> find it. Can you give me a more specific pointer?

Here:

+ * during parsing, and will otherwise contain a an appropriate error message.

> I looked a little bit more at the compression method vs. compression
> algorithm thing. I agree that there is some inconsistency in
> terminology here, but I'm still not sure that we are well-served by
> trying to make it totally uniform, especially if we pick the word
> "method" as the standard rather than "algorithm". In my opinion,
> "method" is less specific than "algorithm". If someone asks me to
> choose a compression algorithm, I know that I should give an answer
> like "lz4" or "zstd". If they ask me to pick a compression method, I'm
> not quite sure whether they want that kind of answer or whether they
> want something more detailed, like "use lz4 with compression level 3
> and a 1MB block size". After all, that is (at least according to my
> understanding of how English works) a perfectly valid answer to the
> question "what method should I use to compress this data?" -- but not
> to the question "what algorithm should I use to compress this data?".
> The latter can ONLY be properly answered by saying something like
> "lz4". And I think that's really the root of my hesitation to make the
> kinds of changes you want here.

I think "algorithm" could be much more nuanced than "lz4", but I also think
we've spent more than enough time on it now :)

-- 
Justin



Re: refactoring basebackup.c (zstd workers)

From:
Robert Haas
Date:
On Mon, Mar 21, 2022 at 2:22 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> + * during parsing, and will otherwise contain a an appropriate error message.

OK, thanks. v4 attached.

> I think "algorithm" could be much more nuanced than "lz4", but I also think
> we've spent more than enough time on it now :)

Oh dear. But yes.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

multithreaded zstd backup compression for client and server

From:
Robert Haas
Date:
[ Changing subject line in the hopes of attracting more eyeballs. ]

On Mon, Mar 14, 2022 at 12:11 PM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
> I tried to implement support for parallel ZSTD compression.

Here's a new patch for this. It's more of a rewrite than an update,
honestly; commit ffd53659c46a54a6978bcb8c4424c1e157a2c0f1 necessitated
totally different options handling, but I also redid the test cases,
the documentation, and the error message.

For those who may not have been following along, here's an executive
summary: libzstd offers an option for parallel compression. It's
intended to be transparent: you just say you want it, and the library
takes care of it for you. Since we have the ability to do backup
compression on either the client or the server side, we can expose
this option in both locations. That would be cool, because it would
allow for really fast backup compression with a good compression
ratio. It would also mean that we would be, or really libzstd would
be, spawning threads inside the PostgreSQL backend. Short of cats and
dogs living together, it's hard to think of anything more terrifying,
because the PostgreSQL backend is very much not thread-safe. However,
a lot of the things we usually worry about when people make noises
about using threads in the backend don't apply here, because the
threads are hidden away behind libzstd interfaces and can't execute
any PostgreSQL code. Therefore, I think it might be safe to just ...
turn this on. One reason I think that is that this whole approach was
recommended to me by Andres ... but that's not to say that there
couldn't be problems.  I worry a bit that the mere presence of threads
could in some way mess things up, but I don't know what the mechanism
for that would be, and I don't want to postpone shipping useful
features based on nebulous fears.

In my ideal world, I'd like to push this into v15. I've done a lot of
work to improve the backup code in this release, and this is actually
a very small change yet one that potentially enables the project to
get a lot more value out of the work that has already been committed.
That said, I also don't want to break the world, so if you have an
idea what this would break, please tell me.

For those curious as to how this affects performance and backup size,
I loaded up the UK land registry database. That creates a 3769MB
database. Then I backed it up using client-side compression and
server-side compression using the various different algorithms that
are supported in the master branch, plus parallel zstd.

no compression: 3.7GB, 9 seconds
gzip: 1.5GB, 140 seconds with server-side, 141 seconds with client-side
lz4: 2.0GB, 13 seconds with server-side, 12 seconds with client-side

For both parallel and non-parallel zstd compression, I see differences
between the compressed size depending on where the compression is
done. I don't know whether this is an expected behavior of the zstd
library or a bug. Both files uncompress OK and pass pg_verifybackup,
but that doesn't mean we're not, for example, selecting different
compression levels where we shouldn't be. I'll try to figure out
what's going on here.

zstd, client-side: 1.7GB, 17 seconds
zstd, server-side: 1.3GB, 25 seconds
parallel zstd, 4 workers, client-side: 1.7GB, 7.5 seconds
parallel zstd, 4 workers, server-side: 1.3GB, 7.2 seconds

Notice that compressing the backup with parallel zstd is actually
faster than taking an uncompressed backup, even though this test is
all being run on the same machine. That's kind of crazy to me: the
parallel compression is so fast that we save more time on I/O than we
spend compressing. This assumes of course that you have plenty of CPU
resources and limited I/O resources, which won't be true for everyone,
but it's not an unusual situation.

I think the documentation changes in this patch might not be quite up
to scratch. I think there's a brewing problem here: as we add more
compression options, whether or not that happens in this release, and
regardless of what specific options we add, the way things are
structured right now, we're going to end up either duplicating a bunch
of stuff between the pg_basebackup documentation and the BASE_BACKUP
documentation, or else one of those places is going to end up lacking
information that someone reading it might like to have. I'm not
exactly sure what to do about this, though.

This patch contains a trivial adjustment to
PostgreSQL::Test::Cluster::run_log to make it return a useful value
instead of not. I think that should be pulled out and committed
independently regardless of what happens to this patch overall, and
possibly back-patched.

Thanks,

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: multithreaded zstd backup compression for client and server

From:
Andres Freund
Date:
Hi,

On 2022-03-23 16:34:04 -0400, Robert Haas wrote:
> Therefore, I think it might be safe to just ...  turn this on. One reason I
> think that is that this whole approach was recommended to me by Andres ...

I didn't do a super careful analysis of the issues... But I do think it's
pretty much the one case where it "should" be safe.

The most likely source of problem would errors thrown while zstd threads are
alive. Should make sure that that can't happen.


What is the lifetime of the threads zstd spawns? Are they tied to a single
compression call? A single ZSTD_createCCtx()? If the latter, how bulletproof
is our code ensuring that we don't leak such contexts?

If they're short-lived, are we compressing large enough batches to not waste a
lot of time starting/stopping threads?


> but that's not to say that there couldn't be problems.  I worry a bit that
> the mere presence of threads could in some way mess things up, but I don't
> know what the mechanism for that would be, and I don't want to postpone
> shipping useful features based on nebulous fears.

One thing that'd be good to test for is cancelling in-progress server-side
compression.  And perhaps a few assertions that ensure that we don't escape
with some threads still running. That'd have to be platform dependent, but I
don't see a problem with that in this case.



> For both parallel and non-parallel zstd compression, I see differences
> between the compressed size depending on where the compression is
> done. I don't know whether this is an expected behavior of the zstd
> library or a bug. Both files uncompress OK and pass pg_verifybackup,
> but that doesn't mean we're not, for example, selecting different
> compression levels where we shouldn't be. I'll try to figure out
> what's going on here.
>
> zstd, client-side: 1.7GB, 17 seconds
> zstd, server-side: 1.3GB, 25 seconds
> parallel zstd, 4 workers, client-side: 1.7GB, 7.5 seconds
> parallel zstd, 4 workers, server-side: 1.3GB, 7.2 seconds

What causes this fairly massive client-side/server-side size difference?



> +    /*
> +     * We check for failure here because (1) older versions of the library
> +     * do not support ZSTD_c_nbWorkers and (2) the library might want to
> +     * reject unreasonable values (though in practice it does not seem to do
> +     * so).
> +     */
> +    ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_nbWorkers,
> +                                 compress->workers);
> +    if (ZSTD_isError(ret))
> +    {
> +        pg_log_error("could not set compression worker count to %d: %s",
> +                     compress->workers, ZSTD_getErrorName(ret));
> +        exit(1);
> +    }

Will this cause test failures on systems with older zstd?


Greetings,

Andres Freund



Re: multithreaded zstd backup compression for client and server

From
Justin Pryzby
Date:
+        * We check for failure here because (1) older versions of the library
+        * do not support ZSTD_c_nbWorkers and (2) the library might want to
+        * reject unreasonable values (though in practice it does not seem to do
+        * so).
+        */
+       ret = ZSTD_CCtx_setParameter(mysink->cctx, ZSTD_c_nbWorkers,
+                                                                mysink->workers);
+       if (ZSTD_isError(ret))
+               ereport(ERROR,
+                               errcode(ERRCODE_INVALID_PARAMETER_VALUE),
+                               errmsg("could not set compression worker count to %d: %s",
+                                          mysink->workers, ZSTD_getErrorName(ret)));

Also because the library may not be compiled with threading.  A few days ago, I
tried to rebase the original "parallel workers" patch over the COMPRESS DETAIL
patch but then couldn't test it, even after trying various versions of the zstd
package and trying to compile it locally.  I'll try again soon...

I think you should also test the return value when setting the compress level.
Not only because it's generally a good idea, but also because I suggested
supporting negative compression levels.  Which weren't allowed before v1.3.4, and
then the range is only defined since 1.3.6 (ZSTD_minCLevel).  At some point,
the range may have been -7..22 but now it's -131072..22.

lib/compress/zstd_compress.c:int ZSTD_minCLevel(void) { return (int)-ZSTD_TARGETLENGTH_MAX; }
lib/zstd.h:#define ZSTD_TARGETLENGTH_MAX    ZSTD_BLOCKSIZE_MAX
lib/zstd.h:#define ZSTD_BLOCKSIZE_MAX     (1<<ZSTD_BLOCKSIZELOG_MAX)
lib/zstd.h:#define ZSTD_BLOCKSIZELOG_MAX  17
; -1<<17
        -131072
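For reference, libzstd itself exposes the supported range at runtime, so a range check need not hard-code either 1..22 or -131072..22. A minimal sketch of such a check, assuming libzstd >= 1.3.6 (which provides ZSTD_minCLevel()); this is illustrative, not the patch's actual code:

```c
#include <stdio.h>
#include <zstd.h>

/*
 * Validate a requested compression level against the bounds the linked
 * libzstd actually supports, instead of hard-coding the range.
 * Returns 1 if the level is acceptable, 0 otherwise.
 */
static int
zstd_level_in_range(int level)
{
	if (level < ZSTD_minCLevel() || level > ZSTD_maxCLevel())
	{
		fprintf(stderr,
				"compression level %d out of supported range %d..%d\n",
				level, ZSTD_minCLevel(), ZSTD_maxCLevel());
		return 0;
	}
	return 1;
}
```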

Attachments

Re: multithreaded zstd backup compression for client and server

From
Robert Haas
Date:
On Wed, Mar 23, 2022 at 5:14 PM Andres Freund <andres@anarazel.de> wrote:
> The most likely source of problems would be errors thrown while zstd threads are
> alive. Should make sure that that can't happen.
>
> What is the lifetime of the threads zstd spawns? Are they tied to a single
> compression call? A single ZSTD_createCCtx()? If the latter, how bulletproof
> is our code ensuring that we don't leak such contexts?

I haven't found any real documentation explaining how libzstd manages
its threads. I am assuming that it is tied to the ZSTD_CCtx, but I
don't know. I guess I could try to figure it out from the source code.
Anyway, what we have now is a PG_TRY()/PG_CATCH() block around the
code that uses the bbsink, which will cause bbsink_zstd_cleanup() to
get called in the event of an error. That will do ZSTD_freeCCtx().

It's probably also worth mentioning here that even if, contrary to
expectations, the compression threads hang around to the end of time
and chill, in practice nobody is likely to run BASE_BACKUP and then
keep the connection open for a long time afterward. So it probably
wouldn't really affect resource utilization in real-world scenarios
even if the threads never exited, as long as they didn't, you know,
busy-loop in the background. And I assume the actual library behavior
can't be nearly that bad. This is a pretty mainstream piece of
software.

> If they're short-lived, are we compressing large enough batches to not waste a
> lot of time starting/stopping threads?

Well, we're using a single ZSTD_CCtx for an entire base backup. Again,
I haven't found documentation explaining what libzstd is actually
doing, but it's hard to see how we could make the batch any bigger
than that. The context gets reset for each new tablespace, which may
or may not do anything to the compression threads.
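For what it's worth, the per-tablespace reset described above maps onto ZSTD_CCtx_reset() with ZSTD_reset_session_only, which discards the in-progress stream but keeps the context and its parameters (including ZSTD_c_nbWorkers). A minimal sketch of that pattern, not the patch's actual code:

```c
#include <zstd.h>

/*
 * Reuse one long-lived context across archives: reset the session
 * between tablespaces rather than destroying and recreating the
 * context.  Parameters set earlier (level, nbWorkers) are retained;
 * whether any worker threads persist across the reset is an internal
 * detail of libzstd.
 */
static int
reset_for_next_tablespace(ZSTD_CCtx *cctx)
{
	size_t		ret = ZSTD_CCtx_reset(cctx, ZSTD_reset_session_only);

	if (ZSTD_isError(ret))
		return -1;				/* caller reports ZSTD_getErrorName(ret) */
	return 0;
}
```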

> > but that's not to say that there couldn't be problems.  I worry a bit that
> > the mere presence of threads could in some way mess things up, but I don't
> > know what the mechanism for that would be, and I don't want to postpone
> > shipping useful features based on nebulous fears.
>
> > One thing that'd be good to test for is cancelling in-progress server-side
> compression.  And perhaps a few assertions that ensure that we don't escape
> with some threads still running. That'd have to be platform dependent, but I
> don't see a problem with that in this case.

More specific suggestions, please?

> > For both parallel and non-parallel zstd compression, I see differences
> > between the compressed size depending on where the compression is
> > done. I don't know whether this is an expected behavior of the zstd
> > library or a bug. Both files uncompress OK and pass pg_verifybackup,
> > but that doesn't mean we're not, for example, selecting different
> > compression levels where we shouldn't be. I'll try to figure out
> > what's going on here.
> >
> > zstd, client-side: 1.7GB, 17 seconds
> > zstd, server-side: 1.3GB, 25 seconds
> > parallel zstd, 4 workers, client-side: 1.7GB, 7.5 seconds
> > parallel zstd, 4 workers, server-side: 1.3GB, 7.2 seconds
>
> What causes this fairly massive client-side/server-side size difference?

You seem not to have read what I wrote about this exact point in the
text which you quoted.

> Will this cause test failures on systems with older zstd?

I put a bunch of logic in the test case to try to avoid that, so
hopefully not, but if it does, we can adjust the logic.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: multithreaded zstd backup compression for client and server

From
Robert Haas
Date:
On Wed, Mar 23, 2022 at 5:52 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> Also because the library may not be compiled with threading.  A few days ago, I
> tried to rebase the original "parallel workers" patch over the COMPRESS DETAIL
> patch but then couldn't test it, even after trying various versions of the zstd
> package and trying to compile it locally.  I'll try again soon...

Ah. Right, I can update the comment to mention that.

> I think you should also test the return value when setting the compress level.
> Not only because it's generally a good idea, but also because I suggested to
> support negative compression levels.  Which weren't allowed before v1.3.4, and
> then the range is only defined since 1.3.6 (ZSTD_minCLevel).  At some point,
> the range may have been -7..22 but now it's -131072..22.

Yeah, I was thinking that might be a good change. It would require
adjusting some other code though, because right now only compression
levels 1..22 are accepted anyhow.

> lib/compress/zstd_compress.c:int ZSTD_minCLevel(void) { return (int)-ZSTD_TARGETLENGTH_MAX; }
> lib/zstd.h:#define ZSTD_TARGETLENGTH_MAX    ZSTD_BLOCKSIZE_MAX
> lib/zstd.h:#define ZSTD_BLOCKSIZE_MAX     (1<<ZSTD_BLOCKSIZELOG_MAX)
> lib/zstd.h:#define ZSTD_BLOCKSIZELOG_MAX  17
> ; -1<<17
>         -131072

So does that, like, compress the value by making it way bigger? :-)

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: multithreaded zstd backup compression for client and server

From
Justin Pryzby
Date:
On Wed, Mar 23, 2022 at 04:34:04PM -0400, Robert Haas wrote:
> be, spawning threads inside the PostgreSQL backend. Short of cats and
> dogs living together, it's hard to think of anything more terrifying,
> because the PostgreSQL backend is very much not thread-safe. However,
> a lot of the things we usually worry about when people make noises
> about using threads in the backend don't apply here, because the
> threads are hidden away behind libzstd interfaces and can't execute
> any PostgreSQL code. Therefore, I think it might be safe to just ...
> turn this on. One reason I think that is that this whole approach was
> recommended to me by Andres ... but that's not to say that there
> couldn't be problems.  I worry a bit that the mere presence of threads
> could in some way mess things up, but I don't know what the mechanism
> for that would be, and I don't want to postpone shipping useful
> features based on nebulous fears.

Note that the PGDG .RPMs and .DEBs are already linked with pthread, via
libxml => liblzma.

$ ldd /usr/pgsql-14/bin/postgres |grep xm
        libxml2.so.2 => /lib64/libxml2.so.2 (0x00007faab984e000)
$ objdump -p /lib64/libxml2.so.2 |grep NEED
  NEEDED               libdl.so.2
  NEEDED               libz.so.1
  NEEDED               liblzma.so.5
  NEEDED               libm.so.6
  NEEDED               libc.so.6
  VERNEED              0x0000000000019218
  VERNEEDNUM           0x0000000000000005
$ objdump -p /lib64/liblzma.so.5 |grep NEED
  NEEDED               libpthread.so.0



Did you try this on windows at all ?  It's probably no surprise that zstd
implements threading differently there.



Re: multithreaded zstd backup compression for client and server

From
Andres Freund
Date:
Hi,

On 2022-03-23 18:31:12 -0400, Robert Haas wrote:
> On Wed, Mar 23, 2022 at 5:14 PM Andres Freund <andres@anarazel.de> wrote:
> > The most likely source of problems would be errors thrown while zstd threads are
> > alive. Should make sure that that can't happen.
> >
> > What is the lifetime of the threads zstd spawns? Are they tied to a single
> > compression call? A single ZSTD_createCCtx()? If the latter, how bulletproof
> > is our code ensuring that we don't leak such contexts?
> 
> I haven't found any real documentation explaining how libzstd manages
> its threads. I am assuming that it is tied to the ZSTD_CCtx, but I
> don't know. I guess I could try to figure it out from the source code.

I found the following section in the manual [1]:

    ZSTD_c_nbWorkers=400,    /* Select how many threads will be spawned to compress in parallel.
                              * When nbWorkers >= 1, triggers asynchronous mode when invoking ZSTD_compressStream*() :
                              * ZSTD_compressStream*() consumes input and flush output if possible, but immediately
                              * gives back control to caller, while compression is performed in parallel, within worker thread(s).
                              * (note : a strong exception to this rule is when first invocation of ZSTD_compressStream2() sets ZSTD_e_end :
                              *  in which case, ZSTD_compressStream2() delegates to ZSTD_compress2(), which is always a blocking call).
                              * More workers improve speed, but also increase memory usage.
                              * Default value is `0`, aka "single-threaded mode" : no worker is spawned,
                              * compression is performed inside Caller's thread, and all invocations are blocking */

"ZSTD_compressStream*() consumes input ... immediately gives back control"
pretty much confirms that.
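Concretely, the asynchronous mode the manual describes looks roughly like this from the caller's side. This is a sketch of the standard streaming loop, not the patch's code; error handling is abbreviated and dstcap is assumed large enough (e.g. sized with ZSTD_compressBound()) that the output buffer never fills mid-chunk:

```c
#include <zstd.h>

/*
 * Feed one chunk of input to a (possibly multithreaded) context.
 * With nbWorkers >= 1, ZSTD_compressStream2() hands the input off to
 * worker threads and returns quickly; only the final ZSTD_e_end call
 * blocks until all compression has finished.  Returns the number of
 * bytes written to dst, or a zstd error code (check ZSTD_isError()).
 */
static size_t
compress_chunk(ZSTD_CCtx *cctx, const char *src, size_t len,
			   char *dst, size_t dstcap, int last_chunk)
{
	ZSTD_inBuffer in = {src, len, 0};
	ZSTD_outBuffer out = {dst, dstcap, 0};
	ZSTD_EndDirective mode = last_chunk ? ZSTD_e_end : ZSTD_e_continue;
	size_t		remaining;

	do
	{
		remaining = ZSTD_compressStream2(cctx, &out, &in, mode);
		if (ZSTD_isError(remaining))
			return remaining;
		/* finished when input is consumed, or fully flushed on e_end */
	} while (last_chunk ? remaining != 0 : in.pos < in.size);

	return out.pos;
}
```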


Do we care about zstd's memory usage here? I think it's OK to mostly ignore
work_mem/maintenance_work_mem here, but I could also see limiting concurrency
so that estimated memory usage would fit into work_mem/maintenance_work_mem.



> It's probably also worth mentioning here that even if, contrary to
> expectations, the compression threads hang around to the end of time
> and chill, in practice nobody is likely to run BASE_BACKUP and then
> keep the connection open for a long time afterward. So it probably
> wouldn't really affect resource utilization in real-world scenarios
> even if the threads never exited, as long as they didn't, you know,
> busy-loop in the background. And I assume the actual library behavior
> can't be nearly that bad. This is a pretty mainstream piece of
> software.

I'm not really worried about resource utilization, more about the existence of
threads moving us into undefined behaviour territory or such. I don't think
that's possible, but it's IIRC UB to fork() while threads are present and do
pretty much *anything* other than immediately exec*().


> > > but that's not to say that there couldn't be problems.  I worry a bit that
> > > the mere presence of threads could in some way mess things up, but I don't
> > > know what the mechanism for that would be, and I don't want to postpone
> > > shipping useful features based on nebulous fears.
> >
> > One thing that'd be good to tests for is cancelling in-progress server-side
> > compression.  And perhaps a few assertions that ensure that we don't escape
> > with some threads still running. That'd have to be platform dependent, but I
> > don't see a problem with that in this case.
> 
> More specific suggestions, please?

I was thinking of doing something like calling pthread_is_threaded_np() before
and after the zstd section and erroring out if they differ. But I forgot that
that's a macOS-ism.
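For the record, that check would look something like the following. pthread_is_threaded_np() is a non-portable macOS/BSD extension, hence the configure guard; the helper names are mine, and Assert() is assumed to be PostgreSQL's server-side assertion macro:

```c
#ifdef HAVE_PTHREAD_IS_THREADED_NP
#include <pthread.h>

/*
 * Record whether the process was already threaded before entering the
 * zstd section; pass the result to the function below afterward.
 */
static int
threads_before_zstd(void)
{
	return pthread_is_threaded_np();
}

static void
assert_no_new_threads(int was_threaded)
{
	/* If we were single-threaded going in, we must be again. */
	Assert(was_threaded || !pthread_is_threaded_np());
}
#endif
```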


> > > For both parallel and non-parallel zstd compression, I see differences
> > > between the compressed size depending on where the compression is
> > > done. I don't know whether this is an expected behavior of the zstd
> > > library or a bug. Both files uncompress OK and pass pg_verifybackup,
> > > but that doesn't mean we're not, for example, selecting different
> > > compression levels where we shouldn't be. I'll try to figure out
> > > what's going on here.
> > >
> > > zstd, client-side: 1.7GB, 17 seconds
> > > zstd, server-side: 1.3GB, 25 seconds
> > > parallel zstd, 4 workers, client-side: 1.7GB, 7.5 seconds
> > > parallel zstd, 4 workers, server-side: 1.3GB, 7.2 seconds
> >
> > What causes this fairly massive client-side/server-side size difference?
> 
> You seem not to have read what I wrote about this exact point in the
> text which you quoted.

Somehow not...

Perhaps it's related to the amounts of memory fed to ZSTD_compressStream2() in
one invocation? I recall that there's some differences between basebackup
client / serverside around buffer sizes - but that's before all the recent-ish
changes...

Greetings,

Andres Freund

[1] http://facebook.github.io/zstd/zstd_manual.html



Re: multithreaded zstd backup compression for client and server

From
Andres Freund
Date:
On 2022-03-23 18:07:01 -0500, Justin Pryzby wrote:
> Did you try this on windows at all ?

Really should get zstd installed in the windows cf environment...


> It's probably no surprise that zstd implements threading differently there.

Worth noting that we have a few of our own threads running on windows already
- so we're guaranteed to build against the threaded standard libraries etc
already.



Re: multithreaded zstd backup compression for client and server

From
Robert Haas
Date:
On Wed, Mar 23, 2022 at 7:07 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> Did you try this on windows at all ?  It's probably no surprise that zstd
> implements threading differently there.

I did not. I haven't had a properly functioning Windows development
environment in about a decade.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: multithreaded zstd backup compression for client and server

From
Robert Haas
Date:
On Wed, Mar 23, 2022 at 7:31 PM Andres Freund <andres@anarazel.de> wrote:
> I found the following section in the manual [1]:
>
>     ZSTD_c_nbWorkers=400,    /* Select how many threads will be spawned to compress in parallel.
>                               * When nbWorkers >= 1, triggers asynchronous mode when invoking ZSTD_compressStream*() :
>                               * ZSTD_compressStream*() consumes input and flush output if possible, but immediately
>                               * gives back control to caller, while compression is performed in parallel, within worker thread(s).
>                               * (note : a strong exception to this rule is when first invocation of ZSTD_compressStream2() sets ZSTD_e_end :
>                               *  in which case, ZSTD_compressStream2() delegates to ZSTD_compress2(), which is always a blocking call).
>                               * More workers improve speed, but also increase memory usage.
>                               * Default value is `0`, aka "single-threaded mode" : no worker is spawned,
>                               * compression is performed inside Caller's thread, and all invocations are blocking */
>
> "ZSTD_compressStream*() consumes input ... immediately gives back control"
> pretty much confirms that.

I saw that too, but I didn't consider it conclusive. It would be nice
if their documentation had a bit more detail on what's really
happening.

> Do we care about zstd's memory usage here? I think it's OK to mostly ignore
> work_mem/maintenance_work_mem here, but I could also see limiting concurrency
> so that estimated memory usage would fit into work_mem/maintenance_work_mem.

I think it's possible that we want to do nothing and possible that we
want to do something, but I think it's very unlikely that the thing we
want to do is related to maintenance_work_mem. Say we soft-cap the
compression level to the one which we think will fit within
maintenance_work_mem. I think the most likely outcome is that people
will not get the compression level they request and be confused about
why that has happened. It also seems possible that we'll be wrong
about how much memory will be used - say, because somebody changes the
library behavior in a new release - and will limit it to the wrong
level. If we're going to do anything here, I think it should be to
limit based on the compression level itself and not based how much
memory we think that level will use.

But that leaves the question of whether we should even try to impose
some kind of limit, and there I'm not sure. It feels like it might be
overengineered, because we're only talking about users who have
replication privileges, and if those accounts are subverted there are
big problems anyway. I think if we imposed a governance system here it
would get very little use. On the other hand, I think that the higher
zstd compression levels of 20+ can actually use a ton of memory, so we
might want to limit access to those somehow. Apparently on the command
line you have to say --ultra -- not sure if there's a corresponding
API call or if that's a guard that's built specifically into the CLI.

> Perhaps it's related to the amounts of memory fed to ZSTD_compressStream2() in
> one invocation? I recall that there's some differences between basebackup
> client / serverside around buffer sizes - but that's before all the recent-ish
> changes...

That thought occurred to me too but I haven't investigated yet.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



fixing a few backup compression goofs

From
Robert Haas
Date:
On Wed, Mar 23, 2022 at 5:52 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> I think you should also test the return value when setting the compress level.
> Not only because it's generally a good idea, but also because I suggested to
> support negative compression levels.  Which weren't allowed before v1.3.4, and
> then the range is only defined since 1.3.6 (ZSTD_minCLevel).  At some point,
> the range may have been -7..22 but now it's -131072..22.

Hi,

The attached patch fixes a few goofs around backup compression. It
adds a check that setting the compression level succeeds, although it
does not allow the broader range of compression levels Justin notes
above. That can be done separately, I guess, if we want to do it. It
also fixes the problem that client and server-side zstd compression
don't actually compress equally well; that turned out to be a bug in
the handling of compression options. Finally it adds an exit call to
an unlikely failure case so that we would, if that case should occur,
print a message and exit, rather than the current behavior of printing
a message and then dereferencing a null pointer.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: fixing a few backup compression goofs

From
Dipesh Pandit
Date:
Hi,

The changes look good to me.

Thanks,
Dipesh

Re: multithreaded zstd backup compression for client and server

От
Justin Pryzby
Дата:
On Wed, Mar 23, 2022 at 06:57:04PM -0400, Robert Haas wrote:
> On Wed, Mar 23, 2022 at 5:52 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > Also because the library may not be compiled with threading.  A few days ago, I
> > tried to rebase the original "parallel workers" patch over the COMPRESS DETAIL
> > patch but then couldn't test it, even after trying various versions of the zstd
> > package and trying to compile it locally.  I'll try again soon...
> 
> Ah. Right, I can update the comment to mention that.

Actually, I suggest to remove those comments:
| "We check for failure here because..."

That should be the rule rather than the exception, so shouldn't require
justifying why one checks the return value of library and system calls.

In bbsink_zstd_new(), I think you need to check to see if workers were
requested (same as the issue you found with "level").  If someone builds
against a version of zstd which doesn't support some parameter, you'll
currently call SetParameter with that flag anyway, with a default value.
That's not currently breaking anything for me (even though workers=N doesn't
work) but I think it's fragile and could break, maybe when compiled against an
old zstd, or with future options.  SetParameter should only be called when the
user requested to set the parameter.  I handled that for workers in 003, but
didn't touch "level", which is probably fine, but maybe should change for
consistency.
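The rule proposed here, only call ZSTD_CCtx_setParameter() for options the user actually specified, might be sketched like this. The options struct and its "set" flags are illustrative, not the patch's actual fields:

```c
#include <stdbool.h>
#include <zstd.h>

typedef struct zstd_options
{
	bool		level_set;		/* did the user say level=N? */
	int			level;
	bool		workers_set;	/* did the user say workers=N? */
	int			workers;
} zstd_options;

/*
 * Returns 0 on success, otherwise a zstd error code.  Parameters the
 * user did not mention are never passed to the library, so a libzstd
 * built without threading never sees ZSTD_c_nbWorkers at all.
 */
static size_t
apply_zstd_options(ZSTD_CCtx *cctx, const zstd_options *opt)
{
	size_t		ret;

	if (opt->level_set)
	{
		ret = ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel,
									 opt->level);
		if (ZSTD_isError(ret))
			return ret;
	}
	if (opt->workers_set)
	{
		ret = ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers,
									 opt->workers);
		if (ZSTD_isError(ret))
			return ret;
	}
	return 0;
}
```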

src/backend/replication/basebackup_zstd.c:              elog(ERROR, "could not set zstd compression level to %d: %s",
src/bin/pg_basebackup/bbstreamer_gzip.c:                pg_log_error("could not set compression level %d: %s",
src/bin/pg_basebackup/bbstreamer_zstd.c:                        pg_log_error("could not set compression level to: %d: %s",

I'm not sure why these messages sometimes mention the current compression
method and sometimes don't.  I suggest that they shouldn't - errcontext will
have the algorithm, and the user already specified it anyway.  It'd allow the
compiler to merge strings.

Here's a patch for zstd --long mode.  (I don't actually use pg_basebackup, but
I will want to use long mode with pg_dump).  The "strategy" params may also be
interesting, but I haven't played with it.  rsyncable is certainly interesting,
but currently an experimental, nonpublic interface - and a good example of why
to not call SetParameter for params which the user didn't specify: PGDG might
eventually compile postgres against a zstd which supports rsyncable flag.  And
someone might install somewhere which doesn't support rsyncable, but the server
would try to call SetParameter(rsyncable, 0), and the rsyncable ID number
would've changed, so zstd would probably reject it, and basebackup would be
unusable...
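For reference, long mode is exposed through ZSTD_c_enableLongDistanceMatching (with the window optionally pinned via ZSTD_c_windowLog). A hedged sketch, following the same only-set-what-was-requested rule:

```c
#include <zstd.h>

/*
 * Enable zstd long-distance matching: a larger match window that helps
 * on inputs with far-apart redundancy, such as a base backup.  Note
 * that decompressing streams produced with a large window may require
 * the decompressing side to raise its window limit as well.
 */
static size_t
enable_long_mode(ZSTD_CCtx *cctx)
{
	return ZSTD_CCtx_setParameter(cctx,
								  ZSTD_c_enableLongDistanceMatching, 1);
}
```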

$ time src/bin/pg_basebackup/pg_basebackup -h /tmp -Ft -D- --wal-method=none --no-manifest -Z zstd:long=1 --checkpoint fast |wc -c
4625935
real    0m1,334s

$ time src/bin/pg_basebackup/pg_basebackup -h /tmp -Ft -D- --wal-method=none --no-manifest -Z zstd:long=0 --checkpoint fast |wc -c
8426516
real    0m0,880s

Attachments

Re: fixing a few backup compression goofs

From
Robert Haas
Date:
On Fri, Mar 25, 2022 at 9:23 AM Dipesh Pandit <dipesh.pandit@gmail.com> wrote:
> The changes look good to me.

Thanks. Committed.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: multithreaded zstd backup compression for client and server

From
Robert Haas
Date:
On Sun, Mar 27, 2022 at 4:50 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> Actually, I suggest to remove those comments:
> | "We check for failure here because..."
>
> That should be the rule rather than the exception, so shouldn't require
> justifying why one might checks the return value of library and system calls.

I went for modifying the comment rather than removing it. I agree with
you that checking for failure doesn't really require justification,
but I think that in a case like this it is useful to explain what we
know about why it might fail.

> In bbsink_zstd_new(), I think you need to check to see if workers were
> requested (same as the issue you found with "level").

Fixed.

> src/backend/replication/basebackup_zstd.c:              elog(ERROR, "could not set zstd compression level to %d: %s",
> src/bin/pg_basebackup/bbstreamer_gzip.c:                pg_log_error("could not set compression level %d: %s",
> src/bin/pg_basebackup/bbstreamer_zstd.c:                        pg_log_error("could not set compression level to: %d: %s",
>
> I'm not sure why these messages sometimes mention the current compression
> method and sometimes don't.  I suggest that they shouldn't - errcontext will
> have the algorithm, and the user already specified it anyway.  It'd allow the
> compiler to merge strings.

I don't think that errcontext() helps here. On the client side, it
doesn't exist. On the server side, it's not in use. I do see
STATEMENT: <whatever> in the server log when a replication command
throws a server-side error, which is similar, but pg_basebackup
doesn't display that STATEMENT line. I don't really know how to
balance the legitimate desire for fewer messages against the
also-legitimate desire for clarity about where things are failing. I'm
slightly inclined to think that including the algorithm name is
better, because options are in the end algorithm-specific, but it's
certainly debatable. I would be interested in hearing other
opinions...

Here's an updated and rebased version of my patch.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: multithreaded zstd backup compression for client and server

From
Robert Haas
Date:
On Mon, Mar 28, 2022 at 12:57 PM Robert Haas <robertmhaas@gmail.com> wrote:
> Here's an updated and rebased version of my patch.

Well, that only updated the comment on the client side. Let's try again.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c (zstd workers)

From
Robert Haas
Date:
On Mon, Mar 28, 2022 at 4:53 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> I suggest to write it differently, as in 0002.

That doesn't seem better to me. What's the argument for it?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c (zstd workers)

From
Justin Pryzby
Date:
On Mon, Mar 28, 2022 at 05:39:31PM -0400, Robert Haas wrote:
> On Mon, Mar 28, 2022 at 4:53 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > I suggest to write it differently, as in 0002.
> 
> That doesn't seem better to me. What's the argument for it?

I find this much easier to understand:

                /* If we got an error or have reached the end of the string, stop. */
-               if (result->parse_error != NULL || *kwend == '\0' || *vend == '\0')
+               if (result->parse_error != NULL)
+                       break;
+               if (*kwend == '\0')
+                       break;
+               if (vend != NULL && *vend == '\0')
                        break;

than

                /* If we got an error or have reached the end of the string, stop. */
-               if (result->parse_error != NULL || *kwend == '\0' || *vend == '\0')
+               if (result->parse_error != NULL ||
+                       (vend == NULL ? *kwend == '\0' : *vend == '\0'))

Also, why wouldn't *kwend be checked in any case ?



Re: refactoring basebackup.c (zstd workers)

From
Tom Lane
Date:
Justin Pryzby <pryzby@telsasoft.com> writes:
> Also, why wouldn't *kwend be checked in any case ?

I suspect Robert wrote it that way intentionally --- but if so,
I agree it could do with more than zero commentary.

            regards, tom lane



Re: refactoring basebackup.c (zstd workers)

From
Robert Haas
Date:
On Mon, Mar 28, 2022 at 8:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I suspect Robert wrote it that way intentionally --- but if so,
> I agree it could do with more than zero commentary.

Well, the point is, we stop advancing kwend when we get to the end of
the keyword, and *vend when we get to the end of the value. If there's
a value, the end of the keyword can't have been the end of the string,
but the end of the value might have been. If there's no value, the end
of the keyword could be the end of the string.

Maybe if I just put that last sentence into the comment it's clear enough?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c (zstd workers)

From
Robert Haas
Date:
On Tue, Mar 29, 2022 at 8:51 AM Robert Haas <robertmhaas@gmail.com> wrote:
> On Mon, Mar 28, 2022 at 8:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > I suspect Robert wrote it that way intentionally --- but if so,
> > I agree it could do with more than zero commentary.
>
> Well, the point is, we stop advancing kwend when we get to the end of
> the keyword, and *vend when we get to the end of the value. If there's
> a value, the end of the keyword can't have been the end of the string,
> but the end of the value might have been. If there's no value, the end
> of the keyword could be the end of the string.
>
> Maybe if I just put that last sentence into the comment it's clear enough?

Done that way, since I thought it was better to fix the bug than wait
for more feedback on the wording. We can still adjust the wording, or
the coding, if it's not clear enough.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c (zstd workers)

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
>> Maybe if I just put that last sentence into the comment it's clear enough?

> Done that way, since I thought it was better to fix the bug than wait
> for more feedback on the wording. We can still adjust the wording, or
> the coding, if it's not clear enough.

FWIW, I thought that explanation was fine, but I was deferring to
Justin who was the one who thought things were unclear.

            regards, tom lane



Re: refactoring basebackup.c (zstd workers)

From
Justin Pryzby
Date:
On Wed, Mar 30, 2022 at 04:14:47PM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> >> Maybe if I just put that last sentence into the comment it's clear enough?
> 
> > Done that way, since I thought it was better to fix the bug than wait
> > for more feedback on the wording. We can still adjust the wording, or
> > the coding, if it's not clear enough.
> 
> FWIW, I thought that explanation was fine, but I was deferring to
> Justin who was the one who thought things were unclear.

I still think it's unnecessarily confusing to nest "if" and "?:" conditionals
in one statement, instead of 2 or 3 separate "if"s, or "||"s.
But it's also not worth fussing over any more.



Re: refactoring basebackup.c

From
Thomas Munro
Date:
On Thu, Mar 23, 2023 at 2:50 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> In rem: commit 3500ccc3,
>
> for X in `grep -E '^[^*]+event_name = "' src/backend/utils/activity/wait_event.c |
>           sed 's/^.* = "//;s/";$//;/unknown/d'`
> do
>   if ! git grep "$X" doc/src/sgml/monitoring.sgml > /dev/null
>   then
>     echo "$X is not documented"
>   fi
> done
>
> BaseBackupSync is not documented
> BaseBackupWrite is not documented

[Resending with trimmed CC: list, because the mailing list told me to
due to a blocked account, sorry if you already got the above.]



Re: refactoring basebackup.c

From
Robert Haas
Date:
On Wed, Mar 22, 2023 at 10:09 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > BaseBackupSync is not documented
> > BaseBackupWrite is not documented
>
> [Resending with trimmed CC: list, because the mailing list told me to
> due to a blocked account, sorry if you already got the above.]

Bummer. I'll write a patch to fix that tomorrow, unless somebody beats me to it.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: refactoring basebackup.c

From
Robert Haas
Date:
On Thu, Mar 23, 2023 at 4:11 PM Robert Haas <robertmhaas@gmail.com> wrote:
> On Wed, Mar 22, 2023 at 10:09 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > > BaseBackupSync is not documented
> > > BaseBackupWrite is not documented
> >
> > [Resending with trimmed CC: list, because the mailing list told me to
> > due to a blocked account, sorry if you already got the above.]
>
> Bummer. I'll write a patch to fix that tomorrow, unless somebody beats me to it.

Here's a patch for that, and a patch to add the missing error check
Peter noticed.

--
Robert Haas
EDB: http://www.enterprisedb.com

Attachments

Re: refactoring basebackup.c

From
Justin Pryzby
Date:
On Fri, Mar 24, 2023 at 10:46:37AM -0400, Robert Haas wrote:
> On Thu, Mar 23, 2023 at 4:11 PM Robert Haas <robertmhaas@gmail.com> wrote:
> > On Wed, Mar 22, 2023 at 10:09 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > > > BaseBackupSync is not documented
> > > > BaseBackupWrite is not documented
> > >
> > > [Resending with trimmed CC: list, because the mailing list told me to
> > > due to a blocked account, sorry if you already got the above.]
> >
> > Bummer. I'll write a patch to fix that tomorrow, unless somebody beats me to it.
> 
> Here's a patch for that, and a patch to add the missing error check
> Peter noticed.

I think these maybe got forgotten ?



Re: refactoring basebackup.c

From
Robert Haas
Date:
On Wed, Apr 12, 2023 at 10:57 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> I think these maybe got forgotten ?

Committed.

--
Robert Haas
EDB: http://www.enterprisedb.com