Thread: pg_archivecleanup with multiple slaves
Hi,

First post, forgive me if this is better suited to pgsql-general.

I've got streaming replication set up with two slave servers (PostgreSQL 9.0 on Ubuntu 10.04 LTS). The master pushes the WAL to an NFS export, which is in turn mounted on and picked up by the two slaves.

The problem I have is that pg_archivecleanup (running on one of the slaves) was removing WAL logs before the other slave had picked up the changes, thus breaking replication for the second slave. As an interim fix, I simply disabled the automatic cleanup and figured I'd worry about it later.

Well, later is now and I'm running out of HDD space. So, what's the best (or perhaps, correct) way to handle cleaning up WAL archives when there's more than one slave? My first thought was prefixing the pg_archivecleanup call in recovery.conf's archive_cleanup_command with a "sleep" of a few seconds to allow both slaves to pick up changes before WAL files are cleaned up, but I'm afraid I'll end up with some weird race conditions, with loads of sleeping processes waiting to clean up WAL files that have previously been cleaned up by a recently awoken process.

Thanks in advance,
Ben
I think you are using log shipping, not streaming replication:
http://www.postgresql.org/docs/9.0/static/different-replication-solutions.html
I would just make two copies of each WAL file, one for each slave, in different folders.
That way, if one slave is offline for a period of time, it can catch up when it comes back online.
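A rough sketch of what this could look like as an archive_command wrapper (the function name, directory layout, and paths here are made up for illustration, not taken from a real setup):

```shell
# Sketch only: copy one WAL segment into a per-standby directory.
# The master would invoke a wrapper like this via something similar to:
#   archive_command = 'archive_wal.sh %p %f /mnt/wal_archive'
archive_wal() {
    wal_path=$1      # %p: path of the WAL segment to archive
    wal_file=$2      # %f: bare file name of the segment
    archive_root=$3  # root of the per-slave archive directories
    for dir in "$archive_root/slave1" "$archive_root/slave2"; do
        mkdir -p "$dir"
        # Fail (non-zero) if the segment is already archived, so an
        # existing copy is never silently overwritten.
        [ ! -f "$dir/$wal_file" ] || return 1
        cp "$wal_path" "$dir/$wal_file"
    done
}
```

Each slave's restore_command and archive_cleanup_command would then point at its own directory, so one slave's cleanup can't remove segments the other still needs.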
On Fri, May 20, 2011 at 6:59 AM, Ben Lancaster <benlancaster@holler.co.uk> wrote:
> Hi,
> First post, forgive me if this is better suited to pgsql-general.
> I've got streaming replication set up with two slave servers (PostgreSQL 9.0 on Ubuntu 10.04 LTS). The master pushes the WAL to an NFS export, which is in turn mounted on and picked up by the two slaves.
> The problem I have is that pg_archivecleanup (running on one of the slaves) was removing WAL logs before the other slave had picked up the changes, thus breaking replication for the second slave. As an interim fix, I simply disabled the automatic cleanup and figured I'd worry about it later.
> Well, later is now and I'm running out of HDD space. So, what's the best (or perhaps, correct) way to handle cleaning up WAL archives when there's more than one slave? My first thought was prefixing the pg_archivecleanup call in recovery.conf's archive_cleanup_command with a "sleep" of a few seconds to allow both slaves to pick up changes before WAL files are cleaned up, but I'm afraid I'll end up with some weird race conditions, with loads of sleeping processes waiting to clean up WAL files that have previously been cleaned up by a recently awoken process.
> Thanks in advance,
> Ben
--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
Tim wrote:
> I would just make 2 copies of the WAL file one for each slave in
> different folders.
> That way if one slave is offline for a period of time it can
> catch up when it comes back online.

If you used hard links, that wouldn't even take any extra disk space (beyond the directory space). I would copy to a staging directory, hard link (cp -l) to each of the target directories, and then delete from the staging area.

-Kevin
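The staging-plus-hard-links idea could look roughly like this (function name and directory layout are illustrative, not from a real archive):

```shell
# Sketch: stage the segment once, hard-link it into each standby's
# directory, then drop the staged copy. The file's data blocks exist
# on disk only once; each directory entry is just another link.
stage_and_link() {
    wal_path=$1; wal_file=$2; root=$3
    mkdir -p "$root/staging" "$root/slave1" "$root/slave2"
    cp "$wal_path" "$root/staging/$wal_file"
    cp -l "$root/staging/$wal_file" "$root/slave1/$wal_file"  # hard link
    cp -l "$root/staging/$wal_file" "$root/slave2/$wal_file"  # hard link
    rm "$root/staging/$wal_file"   # data survives via the other links
}
```

When either slave's pg_archivecleanup unlinks its copy, the blocks are freed only once the last link is gone.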
On 20 May 2011, at 12:53, Tim wrote:
> I think you are using Log-Shipping not Streaming-Replication
> http://www.postgresql.org/docs/9.0/static/different-replication-solutions.html

I'm using streaming replication with log shipping as a fallback, as per the walkthrough here: http://brandonkonkle.com/blog/2010/oct/20/postgres-9-streaming-replication-and-django-balanc/

> I would just make 2 copies of the WAL file one for each slave in different folders.
> That way if one slave is offline for a period of time it can catch up when it comes back online.

...hence using a combination of the two. For now, I've reverted to doing the following on an hourly basis:

for f in `find /srv/pg_backups -ctime +0.5`; do pg_archivecleanup /srv/pg_backups `basename $f`; done;

...that is, find anything that was last changed more than half a day ago and clean it up. Not particularly elegant, but it gives me plenty of time for a slave to go down and replay the logs.
On Fri, May 20, 2011 at 7:59 PM, Ben Lancaster <benlancaster@holler.co.uk> wrote:
> The problem I have is that pg_archivecleanup (running on one of the slaves) was removing WAL logs before the other slave had picked up the changes, thus breaking replication for the second slave. As an interim fix, I simply disabled the automatic cleanup and figured I'd worry about it later.

I'm afraid there is no clean solution. In order to address this problem, we probably should change the master for 9.2 so that it collects the information about the cutoff point from each standby and calls pg_archivecleanup.

Regards,

--
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
I don't actually use streaming replication, but what exactly is the problem with the hard-link-for-each-slave solution, with the slaves handling their own pg_archivecleanup?
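For reference, if each slave reads from (and cleans) only its own directory, the per-slave recovery.conf might look something like this (paths and connection details are hypothetical):

```
# recovery.conf on slave1 -- slave2 would use .../slave2 instead
standby_mode = 'on'
restore_command = 'cp /mnt/wal_archive/slave1/%f "%p"'
# %r is the oldest segment still needed by this standby; cleanup
# touches only this slave's directory, so it cannot break the other.
archive_cleanup_command = 'pg_archivecleanup /mnt/wal_archive/slave1 %r'
```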
--
Noodle
Toll Free: 866-258-6951 x 701
Tim.Lewis@vialect.com
http://www.vialect.com
Noodle is a product of Vialect Inc
Follow Noodle on Twitter
http://www.twitter.com/noodle_news
On Fri, May 20, 2011 at 11:30 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
>> The problem I have is that pg_archivecleanup (running on one of the slaves) was removing WAL logs before the other slave had picked up the changes, thus breaking replication for the second slave. As an interim fix, I simply disabled the automatic cleanup and figured I'd worry about it later.
>
> I'm afraid there is no clean solution. In order to address this problem, we probably should change the master for 9.2 so that it collects the information about the cutoff point from each standby and calls pg_archivecleanup.
>
> Regards,
>
> --
> Fujii Masao
> NIPPON TELEGRAPH AND TELEPHONE CORPORATION
> NTT Open Source Software Center
On Fri, May 20, 2011 at 4:58 PM, Tim Lewis <Tim.Lewis@vialect.com> wrote:
> I don't actually use streaming replication, but what exactly is the problem
> with the hard link for each slave solution, and the slaves handling their
> own pg_archivecleanup?

Doing so creates a single point of failure. We had problems with links in earlier versions, so we learned the hard way to stay clear of them.

--
Simon Riggs
http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services