Discussion: Best way to replicate to large number of nodes
I have a replication problem I am hoping someone has come across before and can provide a few ideas.

I am looking at a configuration of one 'writable' node and anywhere from 10 to 300 'read-only' nodes. Almost all of these nodes will be across a WAN from the writable node (some over slow VPN links too). I am looking for a way to replicate as quickly as possible from the writable node to all the read-only nodes. I can pretty much guarantee the read-only nodes will never become master nodes. Also, the updates to the writable node are bunched and at known times (i.e. only updated when I want it updated, not constant updates), but when changes occur, there are a lot of them at once.

We have used Slony-I for other nodes, but those are all 1-master, 2-slave configurations (where either slave could become the master). Some of our admins are worried about trying to maintain a very large cluster (i.e. schema changes).

I took a look at the wiki (http://wiki.postgresql.org/wiki/Replication%2C_Clustering%2C_and_Connection_Pooling) and nothing really jumped out at me. It sounded like pgpool or Mammoth might be interesting, but I was hoping someone would have some opinions before I randomly start trying stuff.

Thanks in advance,
Brian
On Apr 21, 2010, at 1:41 PM, Brian Peschel wrote:

> I have a replication problem I am hoping someone has come across before and can provide a few ideas.
>
> I am looking at a configuration of one 'writable' node and anywhere from 10 to 300 'read-only' nodes. Almost all of these nodes will be across a WAN from the writable node (some over slow VPN links too). I am looking for a way to replicate as quickly as possible from the writable node to all the read-only nodes. I can pretty much guarantee the read-only nodes will never become master nodes. Also, the updates to the writable node are bunched and at known times (i.e. only updated when I want it updated, not constant updates), but when changes occur, there are a lot of them at once.

Two things you didn't address are the acceptable latency of keeping the read-only nodes in sync with the master - can they be different for a day? A minute? Do you need things to stay synchronous? Also, how big is your dataset? A simple pg_dump and some hot scp action after you batch the updates might be able to solve your problem.
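For reference, the pg_dump/scp approach described above would amount to roughly the sketch below; the hostnames, database name, and paths are invented for illustration, and whether it fits depends on the dump size and the slowest links.

    # On the writable node, after a batch of updates (illustrative names only):
    pg_dump -Fc -f /tmp/mydb.dump mydb

    # Push the dump to each read-only node and restore it there.
    for host in ro-node-01 ro-node-02 ro-node-03; do
        scp /tmp/mydb.dump "$host":/tmp/mydb.dump
        ssh "$host" 'pg_restore --clean --dbname=mydb /tmp/mydb.dump'
    done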
> I am looking at a configuration of one 'writable' node and anywhere from 10 to 300 'read-only' nodes. Almost all of these nodes will be across a WAN from the writable node (some over slow VPN links too). I am looking for a way to replicate as quickly as possible from the writable node to all the read-only nodes. I can pretty much guarantee the read-only nodes will never become master nodes. Also, the updates to the writable node are bunched and at known times (i.e. only updated when I want it updated, not constant updates), but when changes occur, there are a lot of them at once.
>
> We have used Slony-I for other nodes. But these are all 1 master, 2 slave configurations (where either slave could become the master). But some of our admins are worried about trying to maintain a very large cluster (i.e. schema changes).

If you are looking for something that can do automatic schema changes, that's a considerably different requirement. But if you really just want a whole bunch of read-only slaves, you might consider:

* Postgres 9.0 and WAL shipping. While not released yet, the overhead is very low compared to some other solutions, so it might be worthwhile to try this out, and perhaps 9.0 will be stable and released by the time your project goes live. This will also handle schema changes. You'd probably want some sort of multicasting for the WAL files.

* Bucardo. Like Slony, it's trigger based, so there is the overhead of triggers firing on every change to the master tables. Unlike Slony, you can have it sit dormant until you want to push the changes. It also only requires a single daemon. However, you'll probably want to set up cascading slaves and use more than one daemon when the number of slaves climbs over 10 or so.

* Slony has a log shipping option that might be good for this use case.

--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
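As a rough sketch of the WAL-shipping option, the settings below show a minimal 9.0-era primary/standby configuration; the paths are invented and the details should be checked against the 9.0 documentation.

    # postgresql.conf on the writable node (illustrative values):
    wal_level = hot_standby                  # or 'archive' if slaves need not serve queries during replay
    archive_mode = on
    # copy each completed WAL segment to a staging area, then fan it out
    # to the read-only nodes (rsync, multicast, etc.)
    archive_command = 'cp %p /var/lib/pgsql/wal_staging/%f'

    # postgresql.conf on each read-only node:
    hot_standby = on

    # recovery.conf on each read-only node:
    standby_mode = 'on'
    restore_command = 'cp /var/lib/pgsql/wal_incoming/%f %p'

Fanning the archived segments out to hundreds of nodes then becomes a file-distribution problem (rsync, a pull script, or the multicasting mentioned above) rather than a database one.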
On 04/22/2010 10:12 AM, Ben Chobot wrote:
> On Apr 21, 2010, at 1:41 PM, Brian Peschel wrote:
>
>> I have a replication problem I am hoping someone has come across before and can provide a few ideas.
>>
>> I am looking at a configuration of one 'writable' node and anywhere from 10 to 300 'read-only' nodes. Almost all of these nodes will be across a WAN from the writable node (some over slow VPN links too). I am looking for a way to replicate as quickly as possible from the writable node to all the read-only nodes. I can pretty much guarantee the read-only nodes will never become master nodes. Also, the updates to the writable node are bunched and at known times (i.e. only updated when I want it updated, not constant updates), but when changes occur, there are a lot of them at once.
>
> Two things you didn't address are the acceptable latency of keeping the read-only nodes in sync with the master - can they be different for a day? A minute? Do you need things to stay synchronous? Also, how big is your dataset? A simple pg_dump and some hot scp action after you batch the updates might be able to solve your problem.

Latency is important. I would say 10 to 15 minutes max, but the shorter the better. I don't have an exact size, but I believe the entire DB is about 10 gig.

We had an idea of having our apps write the SQL statements to a file, rather than using an ODBC driver to change the DBs directly. Then we could scp/rsync the files to the remote machines and execute them there. This just seems like a very manual process though.

- Brian
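Mechanically, that SQL-file idea would come down to something like the following (purely illustrative file names and hosts); the next reply explains why it is fragile.

    # The application appends its statements to a batch file on the writable node,
    # e.g. /var/spool/sqlbatch/2010-04-22-1800.sql, instead of executing them directly.

    # Distribute and apply the batch on each read-only node:
    for host in $(cat /etc/ro-nodes.list); do
        rsync -az /var/spool/sqlbatch/ "$host":/var/spool/sqlbatch/
        ssh "$host" 'psql -d mydb -1 -f /var/spool/sqlbatch/2010-04-22-1800.sql'
    done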
On 2010-04-22, Brian Peschel <brianp@occinc.com> wrote:
> On 04/22/2010 10:12 AM, Ben Chobot wrote:
>> On Apr 21, 2010, at 1:41 PM, Brian Peschel wrote:
>>
>>> I have a replication problem I am hoping someone has come across before and can provide a few ideas.
>>>
>>> I am looking at a configuration of one 'writable' node and anywhere from 10 to 300 'read-only' nodes. Almost all of these nodes will be across a WAN from the writable node (some over slow VPN links too). I am looking for a way to replicate as quickly as possible from the writable node to all the read-only nodes. I can pretty much guarantee the read-only nodes will never become master nodes. Also, the updates to the writable node are bunched and at known times (i.e. only updated when I want it updated, not constant updates), but when changes occur, there are a lot of them at once.
>>
>> Two things you didn't address are the acceptable latency of keeping the read-only nodes in sync with the master - can they be different for a day? A minute? Do you need things to stay synchronous? Also, how big is your dataset? A simple pg_dump and some hot scp action after you batch the updates might be able to solve your problem.
>
> Latency is important. I would say 10 to 15 minutes max, but the shorter the better. I don't have an exact size, but I believe the entire DB is about 10 gig.

That should not be a problem: 10 to 15 second latency is easy to get over slow connections (e.g. satellite) with any of the proposed solutions.

> We had an idea of having our apps write the SQL statements to a file, rather than using an ODBC driver to change the DBs directly. Then we could scp/rsync the files to the remote machines and execute them there. This just seems like a very manual process though.

Yes, and furthermore statement-based SQL replication tends not to work as intended if you have any updates or inserts that invoke non-constant default values like now(), nextval(...), or random().
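To make that failure mode concrete (hypothetical table and columns): the same statement produces different data on each node, because the expressions are evaluated locally at replay time.

    # Executed on the master at 18:00, replayed on a slave ten minutes later:
    psql -d mydb -c "INSERT INTO orders (id, placed_at) VALUES (DEFAULT, now());"
    # -> the master stores its own nextval() for id and 18:00 for placed_at;
    # -> each slave stores whatever its local sequence and clock produce,
    #    so the "replicated" rows silently diverge.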
2010/4/22 Brian Peschel <brianp@occinc.com>:
> On 04/22/2010 10:12 AM, Ben Chobot wrote:
>> On Apr 21, 2010, at 1:41 PM, Brian Peschel wrote:
>>
>>> I have a replication problem I am hoping someone has come across before and can provide a few ideas.
>>>
>>> I am looking at a configuration of one 'writable' node and anywhere from 10 to 300 'read-only' nodes. Almost all of these nodes will be across a WAN from the writable node (some over slow VPN links too). I am looking for a way to replicate as quickly as possible from the writable node to all the read-only nodes. I can pretty much guarantee the read-only nodes will never become master nodes. Also, the updates to the writable node are bunched and at known times (i.e. only updated when I want it updated, not constant updates), but when changes occur, there are a lot of them at once.
>>
>> Two things you didn't address are the acceptable latency of keeping the read-only nodes in sync with the master - can they be different for a day? A minute? Do you need things to stay synchronous? Also, how big is your dataset? A simple pg_dump and some hot scp action after you batch the updates might be able to solve your problem.
>
> Latency is important. I would say 10 to 15 minutes max, but the shorter the better. I don't have an exact size, but I believe the entire DB is about 10 gig.
>
> We had an idea of having our apps write the SQL statements to a file, rather than using an ODBC driver to change the DBs directly. Then we could scp/rsync the files to the remote machines and execute them there. This just seems like a very manual process though.

You need to have a look at PgQ. In short, Skytools will do exactly what you want if I understand your requirements correctly, Londiste being somewhat like Slony.

--
Cédric Villemain
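For anyone evaluating that route: a Londiste (Skytools) setup is driven by a ticker on the provider plus a small ini file per subscriber. The sketch below follows the Skytools 2.x layout with invented database, host, and queue names; the exact parameter names should be verified against the installed version's documentation.

    # londiste_node01.ini (illustrative; one such file per read-only node)
    [londiste]
    job_name       = replica_node01
    provider_db    = dbname=mydb host=master.example.com
    subscriber_db  = dbname=mydb host=ro-node-01.example.com
    pgq_queue_name = mydb_events
    logfile        = /var/log/skytools/%(job_name)s.log
    pidfile        = /var/run/skytools/%(job_name)s.pid

The batched nature of PgQ fits the "bursts of changes at known times" pattern well, though, as with Slony, schema changes still have to be applied to every node separately.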