Thread: Making pg_standby compression-friendly

Making pg_standby compression-friendly

From
"Charles Duffy"
Date:
Howdy, all.

I'm interested in compressing archived WAL segments in an environment
set up for PITR in the interests of reducing both network traffic and
storage requirements. However, pg_standby presently checks file sizes,
requiring that an archive segment be exactly the right size to be
considered valid. The idea of compressing log segments is not new --
the clearxlogtail project in pgfoundry provides a tool to make such
compression more effective, and is explicitly intended for said
purpose -- but as of 8.3.4, pg_standby appears not to support such
environments; I propose adding such support.

To allow pg_standby to operate in an environment where archive
segments are compressed, two behaviors are necessary:

 - suppressing the file-size checks. This puts the onus on the user to
create these files via an atomic mechanism, but is necessary to allow
compressed files to be considered.
 - allowing a custom restore command to be provided. This permits the
user to specify the mechanism to be used to decompress the segment.
One bikeshed is determining whether the user should pass in a command
suitable for use in a pipeline or a command which accepts input and
output as arguments.
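
The "atomic mechanism" mentioned in the first point can be sketched roughly as follows. This is only an illustration and not part of the attached patch: the function name `archive_segment_compressed`, the `.gz` suffix, and the `.tmp-` prefix are assumptions made for the example. The idea is that, with size checks suppressed, a reader must never see a half-written compressed segment, so the file is written under a temporary name and renamed into place only once complete.

```python
import gzip
import os
import shutil
import tempfile


def archive_segment_compressed(segment_path: str, archive_dir: str) -> str:
    """Gzip a WAL segment into archive_dir so that it appears atomically.

    Hypothetical helper: the name and the ".gz" suffix are illustrative.
    """
    final_path = os.path.join(
        archive_dir, os.path.basename(segment_path) + ".gz"
    )
    # Create the temporary file in the archive directory itself so the
    # final rename never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=archive_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as raw, open(segment_path, "rb") as src:
            with gzip.GzipFile(fileobj=raw, mode="wb") as gz:
                shutil.copyfileobj(src, gz)
            # Push the compressed bytes to disk before publishing the file.
            raw.flush()
            os.fsync(raw.fileno())
        # The rename is the only step that makes the segment visible
        # under its final name.
        os.replace(tmp_path, final_path)
    except BaseException:
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
    return final_path
```

A consumer that only looks for names ending in `.gz` would then see either no file or a complete one, which is the property the size check otherwise provides.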

A sample implementation is attached, intended only to kickstart
discussion; I'm not attached to either its implementation or its
proposed command-line syntax.

Thoughts?

Attachments

Re: Making pg_standby compression-friendly

From
Heikki Linnakangas
Date:
Charles Duffy wrote:
> I'm interested in compressing archived WAL segments in an environment
> set up for PITR in the interests of reducing both network traffic and
> storage requirements. However, pg_standby presently checks file sizes,
> requiring that an archive segment be exactly the right size to be
> considered valid. The idea of compressing log segments is not new --
> the clearxlogtail project in pgfoundry provides a tool to make such
> compression more effective, and is explicitly intended for said
> purpose -- but as of 8.3.4, pg_standby appears not to support such
> environments; I propose adding such support.

Can't you decompress the files in whatever script you use to copy them 
to the archive location?

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Making pg_standby compression-friendly

From
"Charles Duffy"
Date:
On Thu, Oct 23, 2008 at 1:15 AM, Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> wrote:
> Charles Duffy wrote:
>> I'm interested in compressing archived WAL segments in an environment
>> set up for PITR in the interests of reducing both network traffic and
>> storage requirements. However, pg_standby presently checks file sizes,
>> requiring that an archive segment be exactly the right size to be
>> considered valid. The idea of compressing log segments is not new --
>> the clearxlogtail project in pgfoundry provides a tool to make such
>> compression more effective, and is explicitly intended for said
>> purpose -- but as of 8.3.4, pg_standby appears not to support such
>> environments; I propose adding such support.
>
> Can't you decompress the files in whatever script you use to copy them to the archive location?

To be sure I understand -- you're proposing a scenario in which the archive_command on the master compresses the files, passes them over to the slave while compressed, and then decompresses them on the slave for storage in their decompressed state? That succeeds in the goal of decreasing network bandwidth, but (1) isn't necessarily easy to implement over NFS, and (2) doesn't succeed in decreasing storage requirements on the slave.

(While pg_standby's behavior is to delete segments which are no longer needed to keep a warm standby slave running, I maintain a separate archive for PITR use with hardlinked copies of those same archive segments; storage on the slave is a much bigger issue in this environment than it would be if the space used for segments were being deallocated as soon as pg_standby chose to unlink them).
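
As a rough illustration of the hardlinked PITR archive described above (a sketch only; the helper name `keep_pitr_copy` and the directory layout are assumptions, not taken from the actual setup): a hard link gives the segment a second name, so the data survives after pg_standby unlinks the original.

```python
import os


def keep_pitr_copy(segment_path: str, pitr_dir: str) -> str:
    """Hard-link an archived segment into a separate PITR directory.

    Hypothetical helper. Both paths must be on the same filesystem,
    since hard links cannot span filesystems.
    """
    link_path = os.path.join(pitr_dir, os.path.basename(segment_path))
    if not os.path.exists(link_path):
        # No data is copied; the file simply gains a second name.
        os.link(segment_path, link_path)
    return link_path
```

Because the space is released only when the last link is removed, every segment kept this way continues to occupy its full size on the slave, which is why keeping it compressed matters here.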


[Heikki, please accept my apologies for the initial off-list response; I wasn't paying enough attention to gmail's default reply behavior].

Re: Making pg_standby compression-friendly

From
"Koichi Suzuki"
Date:
Regarding compressing and decompressing WAL during archive and restore,
please take a look at my project pglesslog,
http://pgfoundry.org/projects/pglesslog/

This project compresses WAL segments by replacing full page writes with
the corresponding incremental log records. On restore, it inserts dummy
WAL records to preserve LSNs and file size.

This can be applied to a log-shipping mechanism, asynchronous or synchronous.


2008/10/23 Charles Duffy <charles@dyfis.net>:
> Howdy, all.
>
> I'm interested in compressing archived WAL segments in an environment
> set up for PITR in the interests of reducing both network traffic and
> storage requirements. However, pg_standby presently checks file sizes,
> requiring that an archive segment be exactly the right size to be
> considered valid. The idea of compressing log segments is not new --
> the clearxlogtail project in pgfoundry provides a tool to make such
> compression more effective, and is explicitly intended for said
> purpose -- but as of 8.3.4, pg_standby appears not to support such
> environments; I propose adding such support.
>
> To allow pg_standby to operate in an environment where archive
> segments are compressed, two behaviors are necessary:
>
>  - suppressing the file-size checks. This puts the onus on the user to
> create these files via an atomic mechanism, but is necessary to allow
> compressed files to be considered.
>  - allowing a custom restore command to be provided. This permits the
> user to specify the mechanism to be used to decompress the segment.
> One bikeshed is determining whether the user should pass in a command
> suitable for use in a pipeline or a command which accepts input and
> output as arguments.
>
> A sample implementation is attached, intended only to kickstart
> discussion; I'm not attached to either its implementation or its
> proposed command-line syntax.
>
> Thoughts?
>
>
> --
> Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-hackers
>
>



-- 
------
Koichi Suzuki


Re: Making pg_standby compression-friendly

From
Heikki Linnakangas
Date:
Koichi Suzuki wrote:
> In terms of compress/decompress WAL in archive/restore, please take a
> look at my project pglesslog,
> http://pgfoundry.org/projects/pglesslog/
> 
> This project compresses WAL segment by replacing full page writes with
> corresponding incremental logs.   When restored, it inserts dummy WAL
> record to maintain LSN and file size.
> 
> This can be applied to log-shipping mechanism, asynchronous or synchronous.

I believe Charles' question was: how do you hook that decompression into 
pg_standby? I suggested that whatever script is run on the standby 
server to copy xlog files to the archive location, should also call the 
decompression program, like pglesslog, but apparently there is no such 
script in his setup. How would you set up a standby server, using 
pg_lesslog?

> 2008/10/23 Charles Duffy <charles@dyfis.net>:
>> Howdy, all.
>>
>> I'm interested in compressing archived WAL segments in an environment
>> set up for PITR in the interests of reducing both network traffic and
>> storage requirements. However, pg_standby presently checks file sizes,
>> requiring that an archive segment be exactly the right size to be
>> considered valid. The idea of compressing log segments is not new --
>> the clearxlogtail project in pgfoundry provides a tool to make such
>> compression more effective, and is explicitly intended for said
>> purpose -- but as of 8.3.4, pg_standby appears not to support such
>> environments; I propose adding such support.
>>
>> To allow pg_standby to operate in an environment where archive
>> segments are compressed, two behaviors are necessary:
>>
>>  - suppressing the file-size checks. This puts the onus on the user to
>> create these files via an atomic mechanism, but is necessary to allow
>> compressed files to be considered.
>>  - allowing a custom restore command to be provided. This permits the
>> user to specify the mechanism to be used to decompress the segment.
>> One bikeshed is determining whether the user should pass in a command
>> suitable for use in a pipeline or a command which accepts input and
>> output as arguments.
>>
>> A sample implementation is attached, intended only to kickstart
>> discussion; I'm not attached to either its implementation or its
>> proposed command-line syntax.
>>
>> Thoughts?
>>


--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com


Re: Making pg_standby compression-friendly

From
Charles Duffy
Date:
In the absence of further feedback from y'all (and in the presence of 
some positive results from internal QA), I'm adding the posted patch 
as-is to the 2008-11 CommitFest queue. That said, any such additional 
feedback would be greatly appreciated.



Re: Making pg_standby compression-friendly

From
"Koichi Suzuki"
Date:
As Heikki pointed out, the issue is not only how to decompress the
compressed WAL, but also how to keep the archive log compressed after
it has been handled by pg_standby.

I'm afraid pg_standby cannot handle this on its own and may need some
support from the pg core. For example, after closing an archive log
during archive recovery, the core could call some backend to
re-compress the archive log for later use.

I'm not sure whether the archive_command argument works in this
scenario too, but I'm very skeptical that it does.

Any further thoughts?
-----------------
Koichi Suzuki

2008/10/25 Charles Duffy <Charles_Duffy@messageone.com>:
> In the absence of further feedback from y'all (and in the presence of some
> positive results from internal QA), I'm adding the posted patch as-is to the
> 2008-11 CommitFest queue. That said, any such additional feedback would be
> greatly appreciated.


Re: Making pg_standby compression-friendly

From
"Kevin Grittner"
Date:
>>> "Koichi Suzuki" <koichi.szk@gmail.com> wrote:
> As Heikki pointed out, the issue is not to decompress the compressed
> WAL, but also how we can keep archive log still compressed after it is
> handled by pg_standby.
>
> I'm afraid pg_standby cannot handle this solely, may need some support
> by the pg core.   For example, after closing archive log in archive
> recovery, pg_core can call some backend to re-compress the archive log
> for later use.

Why decompress and re-compress?  We're using simple bash scripts, so I
can't speak to pg_standby; but we just pipe the file through gunzip in
the script called by recovery.conf.  The source file isn't modified --
it stays compressed for archiving.
-Kevin
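
The approach Kevin describes might look roughly like this if written in Python rather than bash. It is a sketch under assumptions: the function name `restore_segment`, the `.gz` suffix, and the `.partial` temporary name are invented for illustration, and the real scripts are not shown in the thread. The compressed file in the archive is only read, never modified.

```python
import gzip
import os
import shutil


def restore_segment(archive_dir: str, wal_name: str, dest_path: str) -> bool:
    """Decompress archive_dir/<wal_name>.gz into dest_path.

    Hypothetical helper: returns False when the segment is not in the
    archive yet, True once dest_path holds the decompressed segment.
    """
    src = os.path.join(archive_dir, wal_name + ".gz")
    if not os.path.exists(src):
        return False
    # Decompress to a temporary name first, then rename, so the server
    # never reads a partially written segment.
    tmp = dest_path + ".partial"
    with gzip.open(src, "rb") as gz, open(tmp, "wb") as out:
        shutil.copyfileobj(gz, out)
    os.replace(tmp, dest_path)
    return True
```

A wrapper script could call this with the `%f` and `%p` values from `restore_command` in recovery.conf and exit non-zero when it returns False.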


Re: Making pg_standby compression-friendly

From
Charles Duffy
Date:
Koichi Suzuki wrote:
> As Heikki pointed out, the issue is not to decompress the compressed
> WAL, but also how we can keep archive log still compressed after it is
> handled by pg_standby.

pg_standby makes a *copy* of the segment from the archive, and need only 
ensure that the copy is decompressed; it has no reason to ever 
decompress the original version in the archive.

I don't see the problem here.