Discussion: Making pg_standby compression-friendly
Howdy, all.

I'm interested in compressing archived WAL segments in an environment set up for PITR in the interests of reducing both network traffic and storage requirements. However, pg_standby presently checks file sizes, requiring that an archive segment be exactly the right size to be considered valid. The idea of compressing log segments is not new -- the clearxlogtail project in pgfoundry provides a tool to make such compression more effective, and is explicitly intended for said purpose -- but as of 8.3.4, pg_standby appears not to support such environments; I propose adding such support.

To allow pg_standby to operate in an environment where archive segments are compressed, two behaviors are necessary:

- suppressing the file-size checks. This puts the onus on the user to create these files via an atomic mechanism, but is necessary to allow compressed files to be considered.
- allowing a custom restore command to be provided. This permits the user to specify the mechanism to be used to decompress the segment. One bikeshed is determining whether the user should pass in a command suitable for use in a pipeline or a command which accepts input and output as arguments.

A sample implementation is attached, intended only to kickstart discussion; I'm not attached to either its implementation or its proposed command-line syntax.

Thoughts?
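[Illustration, not part of the attached patch.] With the size check suppressed, the proposal leaves it to the user to make compressed files appear in the archive atomically. One way that might be done is to compress into a temporary file in the archive directory and rename it into place once it is complete. The helper name, its signature and the ".gz" suffix below are assumptions made for this sketch; nothing here is taken from pg_standby itself.

```python
import gzip
import os
import shutil
import tempfile


def archive_compressed(segment_path, archive_dir):
    """Compress a WAL segment into archive_dir so it appears all at once.

    Hypothetical helper: the name, signature and ".gz" suffix are assumptions.
    """
    final_path = os.path.join(
        archive_dir, os.path.basename(segment_path) + ".gz"
    )
    # The temporary file lives in the destination directory so that the
    # final rename never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=archive_dir, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as raw, open(segment_path, "rb") as src:
            with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
                shutil.copyfileobj(src, gz)
            # Push the finished file to disk before it becomes visible.
            raw.flush()
            os.fsync(raw.fileno())
        # A reader sees either no file or the complete file, never a partial one.
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
    return final_path
```

The rename is what provides the atomicity here; whether that guarantee holds on a given network filesystem is something to verify for the setup in question.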
Attachments
Charles Duffy wrote:
> I'm interested in compressing archived WAL segments in an environment
> set up for PITR in the interests of reducing both network traffic and
> storage requirements. However, pg_standby presently checks file sizes,
> requiring that an archive segment be exactly the right size to be
> considered valid. The idea of compressing log segments is not new --
> the clearxlogtail project in pgfoundry provides a tool to make such
> compression more effective, and is explicitly intended for said
> purpose -- but as of 8.3.4, pg_standby appears not to support such
> environments; I propose adding such support.

Can't you decompress the files in whatever script you use to copy them to the archive location?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
On Thu, Oct 23, 2008 at 1:15 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> Charles Duffy wrote:
>> I'm interested in compressing archived WAL segments in an environment
>> set up for PITR in the interests of reducing both network traffic and
>> storage requirements. However, pg_standby presently checks file sizes,
>> requiring that an archive segment be exactly the right size to be
>> considered valid. The idea of compressing log segments is not new --
>> the clearxlogtail project in pgfoundry provides a tool to make such
>> compression more effective, and is explicitly intended for said
>> purpose -- but as of 8.3.4, pg_standby appears not to support such
>> environments; I propose adding such support.
>
> Can't you decompress the files in whatever script you use to copy them to the archive location?

To be sure I understand -- you're proposing a scenario in which the archive_command on the master compresses the files, passes them over to the slave while compressed, and then decompresses them on the slave for storage in their decompressed state? That succeeds in the goal of decreasing network bandwidth, but (1) isn't necessarily easy to implement over NFS, and (2) doesn't succeed in decreasing storage requirements on the slave.
(While pg_standby's behavior is to delete segments which are no longer needed to keep a warm standby slave running, I maintain a separate archive for PITR use with hardlinked copies of those same archive segments; storage on the slave is a much bigger issue in this environment than it would be if the space used for segments were being deallocated as soon as pg_standby chose to unlink them).
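[Illustration.] The hardlinked PITR archive described above can be sketched roughly as follows: each incoming segment gets a second directory entry, so the data stays on disk after the first name is unlinked. The function name and directory layout are hypothetical, and hard links only work when both directories are on the same filesystem.

```python
import os


def link_into_pitr_archive(segment_path, pitr_dir):
    """Give a segment a second name in pitr_dir (hypothetical helper)."""
    target = os.path.join(pitr_dir, os.path.basename(segment_path))
    if not os.path.exists(target):
        # No data is copied; both names refer to the same file on disk.
        os.link(segment_path, target)
    return target
```

Because the blocks are only released once the last link is gone, the space used by a segment is not reclaimed when the standby's copy is unlinked, which is why on-disk size matters more in this arrangement.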
[Heikki, please accept my apologies for the initial off-list response; I wasn't paying enough attention to gmail's default reply behavior].
In terms of compressing/decompressing WAL in archive/restore, please take a look at my project pglesslog, http://pgfoundry.org/projects/pglesslog/

This project compresses WAL segments by replacing full page writes with the corresponding incremental logs. When restored, it inserts dummy WAL records to maintain LSN and file size.

This can be applied to a log-shipping mechanism, asynchronous or synchronous.

2008/10/23 Charles Duffy <charles@dyfis.net>:
> Howdy, all.
>
> I'm interested in compressing archived WAL segments in an environment
> set up for PITR in the interests of reducing both network traffic and
> storage requirements. However, pg_standby presently checks file sizes,
> requiring that an archive segment be exactly the right size to be
> considered valid. The idea of compressing log segments is not new --
> the clearxlogtail project in pgfoundry provides a tool to make such
> compression more effective, and is explicitly intended for said
> purpose -- but as of 8.3.4, pg_standby appears not to support such
> environments; I propose adding such support.
>
> To allow pg_standby to operate in an environment where archive
> segments are compressed, two behaviors are necessary:
>
> - suppressing the file-size checks. This puts the onus on the user to
> create these files via an atomic mechanism, but is necessary to allow
> compressed files to be considered.
> - allowing a custom restore command to be provided. This permits the
> user to specify the mechanism to be used to decompress the segment.
> One bikeshed is determining whether the user should pass in a command
> suitable for use in a pipeline or a command which accepts input and
> output as arguments.
>
> A sample implementation is attached, intended only to kickstart
> discussion; I'm not attached to either its implementation or its
> proposed command-line syntax.
>
> Thoughts?
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers

--
------
Koichi Suzuki
Koichi Suzuki wrote:
> In terms of compress/decompress WAL in archive/restore, please take a
> look at my project pglesslog,
> http://pgfoundry.org/projects/pglesslog/
>
> This project compresses WAL segment by replacing full page writes with
> corresponding incremental logs. When restored, it inserts dummy WAL
> record to maintain LSN and file size.
>
> This can be applied to log-shipping mechanism, asynchronous or synchronous.

I believe Charles' question was: how do you hook that decompression into pg_standby? I suggested that whatever script is run on the standby server to copy xlog files to the archive location, should also call the decompression program, like pglesslog, but apparently there is no such script in his setup.

How would you set up a standby server, using pg_lesslog?

--
Heikki Linnakangas
EnterpriseDB http://www.enterprisedb.com
In the absence of further feedback from y'all (and in the presence of some positive results from internal QA), I'm adding the posted patch as-is to the 2008-11 CommitFest queue. That said, any such additional feedback would be gratefully appreciated.
As Heikki pointed out, the issue is not only how to decompress the compressed WAL, but also how we can keep the archive log compressed after it is handled by pg_standby.

I'm afraid pg_standby cannot handle this on its own and may need some support from the pg core. For example, after closing an archive log in archive recovery, the pg core could call some backend to re-compress the archive log for later use. I'm not sure whether the archive_command argument works in this scenario too, but I'm very sceptical.

Any further thoughts?

-----------------
Koichi Suzuki

2008/10/25 Charles Duffy <Charles_Duffy@messageone.com>:
> In the absence of further feedback from 'yall (and in the presence of some
> positive results from internal QA), I'm adding the posted patch as-is to the
> 2008-11 CommitFest queue. That said, any such additional feedback would be
> gratefully appreciated.
>>> "Koichi Suzuki" <koichi.szk@gmail.com> wrote:
> As Heikki pointed out, the issue is not to decompress the compressed
> WAL, but also how we can keep archive log still compressed after it is
> handled by pg_standby.
>
> I'm afraid pg_standby cannot handle this solely, may need some support
> by the pg core. For example, after closing archive log in archive
> recovery, pg_core can call some backend to re-compress the archive log
> for later use.

Why decompress and re-compress? We're using simple bash scripts, so I can't speak to pg_standby; but we just pipe the file through gunzip in the script called by recovery.conf. The source file isn't modified -- it stays compressed for archiving.

-Kevin
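[Illustration.] The approach Kevin describes uses bash and gunzip; his actual script is not shown in the thread. The same idea is sketched here in Python, assuming gzip-compressed segments: the archived file is only read, and a decompressed copy is written to the destination. The function name is hypothetical.

```python
import gzip
import shutil


def restore_segment(compressed_path, dest_path):
    """Write a decompressed copy of an archived segment (hypothetical helper).

    The compressed source is opened read-only, so it stays compressed
    in the archive.
    """
    with gzip.open(compressed_path, "rb") as src, open(dest_path, "wb") as dst:
        shutil.copyfileobj(src, dst)
```

A wrapper script named in restore_command could call this with the archive path built from %f as the source and %p as the destination.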
Koichi Suzuki wrote:
> As Heikki pointed out, the issue is not to decompress the compressed
> WAL, but also how we can keep archive log still compressed after it is
> handled by pg_standby.

pg_standby makes a *copy* of the segment from the archive, and need only ensure that the copy is decompressed; it has no reason to ever decompress the original version in the archive. I don't see the problem here.