Discussion: Best way to replicate to large number of nodes


Best way to replicate to large number of nodes

From:
Brian Peschel
Date:
I have a replication problem I am hoping someone has come across before
and can provide a few ideas.

I am looking at a configuration of one 'writable' node and anywhere from
10 to 300 'read-only' nodes.  Almost all of these nodes will be across a
WAN from the writable node (some over slow VPN links too).  I am looking
for a way to replicate as quickly as possible from the writable node to
all the read-only nodes.  I can pretty much guarantee the read-only
nodes will never become master nodes.  Also, the updates to the writable
node are bunched and at known times (ie only updated when I want it
updated, not constant updates), but when changes occur, there are a lot
of them at once.

We have used Slony-I for other nodes, but these are all 1-master, 2-slave
configurations (where either slave could become the master), and some of
our admins are worried about trying to maintain a very large cluster
(e.g. pushing out schema changes).

I took a look at the wiki
(http://wiki.postgresql.org/wiki/Replication%2C_Clustering%2C_and_Connection_Pooling)
and nothing really jumped out at me.  It sounded like pgpool or Mammoth
might be interesting, but I was hoping someone would have some opinions
before I randomly start trying stuff.

Thanks in advance,
Brian


[SPAM] Re: Best way to replicate to large number of nodes

From:
Ben Chobot
Date:
On Apr 21, 2010, at 1:41 PM, Brian Peschel wrote:

> I have a replication problem I am hoping someone has come across before and can provide a few ideas.
>
> I am looking at a configuration of on 'writable' node and anywhere from 10 to 300 'read-only' nodes.  Almost all of
> these nodes will be across a WAN from the writable node (some over slow VPN links too).  I am looking for a way to
> replicate as quickly as possible from the writable node to all the read-only nodes.  I can pretty much guarantee the
> read-only nodes will never become master nodes.  Also, the updates to the writable node are bunched and at known times
> (ie only updated when I want it updated, not constant updates), but when changes occur, there are a lot of them at once.

Two things you didn't address are the acceptable latency of keeping the read-only nodes in sync with the master - can
they be different for a day? A minute? Do you need things to stay synchronous? Also, how big is your dataset? A simple
pg_dump and some hot scp action after you've batched updates might be able to solve your problem.
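
Something along these lines is about all that approach takes (a rough sketch
only; the database name, paths and node list below are invented):

  #!/bin/sh
  # Rough sketch of the dump-and-copy idea; appdb, the paths and
  # readonly-nodes.txt (one hostname per line) are invented names.
  set -e
  pg_dump -Fc -f /tmp/appdb.dump appdb          # custom-format dump of the master
  while read node; do
      scp /tmp/appdb.dump "$node:/tmp/appdb.dump"
      ssh "$node" "pg_restore --clean --dbname=appdb /tmp/appdb.dump"
  done < readonly-nodes.txt

The catch is that the whole dump crosses the WAN to every node on every batch,
which is why the size of the dataset matters.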

Re: Best way to replicate to large number of nodes

From:
"Greg Sabino Mullane"
Date:

> I am looking at a configuration of on 'writable' node and anywhere from
> 10 to 300 'read-only' nodes.  Almost all of these nodes will be across a
> WAN from the writable node (some over slow VPN links too).  I am looking
> for a way to replicate as quickly as possible from the writable node to
> all the read-only nodes.  I can pretty much guarantee the read-only
> nodes will never become master nodes.  Also, the updates to the writable
> node are bunched and at known times (ie only updated when I want it
> updated, not constant updates), but when changes occur, there are a lot
> of them at once.
>
> We have use Slony-I for other nodes.  But these are all 1 master, 2
> slave configurations (where either slave could become the master).  But
> some of our admins are worried about trying to maintain a very large
> size cluster (ie schema changes).

If you are looking for something that can do automatic schema changes,
that's a considerably different requirement. But if you really just want
a whole bunch of read-only slaves, you might consider:

* Postgres 9.0 and WAL shipping. While not released yet, the overhead
is very low compared to some other solutions, so it might be worthwhile
to try this out and perhaps 9.0 will be stable and released by the time
your project goes live. This will also handle schema changes. You'd probably
want some sort of multicasting for the WAL files (a rough configuration
sketch for this option follows these suggestions).

* Bucardo. Like Slony, it's trigger based, so there is the overhead of
triggers firing on every change to the master tables. Unlike Slony, you
can have it sit dormant until you want to push the changes. It also only
requires a single daemon. However, you'll probably want to set up cascading
slaves and use more than one daemon when the number of slaves climbs over
10 or so.

* Slony has a log shipping option that might be good for this use case.
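
For the first option, a minimal sketch of what the configuration might look
like (hostnames and paths are placeholders, and how you fan the WAL files out
to the nodes - rsync, multicast, whatever - is up to you):

  # primary postgresql.conf (9.0)
  wal_level = hot_standby        # emit enough WAL for read-only standbys
  archive_mode = on
  archive_command = 'rsync -a %p wal-hub:/wal_archive/%f'   # fan out from wal-hub

  # each read-only node: postgresql.conf
  hot_standby = on               # allow queries while replaying WAL

  # each read-only node: recovery.conf
  standby_mode = 'on'
  restore_command = 'cp /wal_archive/%f %p'   # read segments as they arrive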

--
Greg Sabino Mullane greg@turnstep.com
End Point Corporation http://www.endpoint.com/
PGP Key: 0x14964AC8 201004221122
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8




Re: [SPAM] Re: Best way to replicate to large number of nodes

From:
Brian Peschel
Date:
On 04/22/2010 10:12 AM, Ben Chobot wrote:
> On Apr 21, 2010, at 1:41 PM, Brian Peschel wrote:
>
>
>> I have a replication problem I am hoping someone has come across before and can provide a few ideas.
>>
>> I am looking at a configuration of on 'writable' node and anywhere from 10 to 300 'read-only' nodes.  Almost all of
>> these nodes will be across a WAN from the writable node (some over slow VPN links too).  I am looking for a way to
>> replicate as quickly as possible from the writable node to all the read-only nodes.  I can pretty much guarantee the
>> read-only nodes will never become master nodes.  Also, the updates to the writable node are bunched and at known times
>> (ie only updated when I want it updated, not constant updates), but when changes occur, there are a lot of them at once.
>>
> Two things you didn't address are the acceptable latency of keeping the read-only nodes in sync with the master - can
> they be different for a day? A minute? Do you need things to stay synchronous? Also, how big is your dataset? A simple
> pg_dump and some hot scp action after you batched updates might be able to solve your problem.

Latency is important.  I would say 10 to 15 minutes max, but the shorter
the better.  I don't have an exact size, but I believe the entire DB is
about 10 gig.

We had an idea of having our apps write the SQL statements to a file,
rather than using an ODBC driver to change the DBs directly.  Then we
could scp/rsync the files to the remote machines and execute them
there.  This just seems like a very manual process though.
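
Scripted, it would look something like this (paths, the database name and the
node list are invented; just to show what the manual process might become):

  #!/bin/sh
  # Hypothetical push script for the write-SQL-to-a-file idea; every
  # name and path here is made up.
  set -e
  BATCH=/var/spool/sqlbatch/batch-$(date +%Y%m%d%H%M).sql   # written by the app instead of ODBC calls
  while read node; do
      rsync -az "$BATCH" "$node:/var/spool/sqlbatch/"       # same path on the remote side
      ssh "$node" "psql -d appdb --single-transaction --set ON_ERROR_STOP=1 -f $BATCH"
  done < readonly-nodes.txt                                 # one hostname per line

At least with --single-transaction and ON_ERROR_STOP a failed batch leaves a
node at its previous state instead of half-applied.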

- Brian

Re: [SPAM] Re: Best way to replicate to large number of nodes

From:
Jasen Betts
Date:
On 2010-04-22, Brian Peschel <brianp@occinc.com> wrote:
>
> On 04/22/2010 10:12 AM, Ben Chobot wrote:
>> On Apr 21, 2010, at 1:41 PM, Brian Peschel wrote:
>>
>>
>>> I have a replication problem I am hoping someone has come across before and can provide a few ideas.
>>>
>>> I am looking at a configuration of on 'writable' node and anywhere from 10 to 300 'read-only' nodes.  Almost all of
>>> these nodes will be across a WAN from the writable node (some over slow VPN links too).  I am looking for a way to
>>> replicate as quickly as possible from the writable node to all the read-only nodes.  I can pretty much guarantee the
>>> read-only nodes will never become master nodes.  Also, the updates to the writable node are bunched and at known times
>>> (ie only updated when I want it updated, not constant updates), but when changes occur, there are a lot of them at once.
>>>
>> Two things you didn't address are the acceptable latency of keeping the read-only nodes in sync with the master - can
>> they be different for a day? A minute? Do you need things to stay synchronous? Also, how big is your dataset? A
>> simple pg_dump and some hot scp action after you batched updates might be able to solve your problem.
>
> Latency is important.  I would say 10 to 15 minutes max, but the shorter
> the better.  I don't have an exact size, but I believe the entire DB is
> about 10 gig.

That should not be a problem; latency of 10 to 15 seconds is easy to get over
slow connections (e.g. satellite) with any of the proposed solutions.

> We had an idea of creating our apps write the SQL statements to a file,
> rather than using an ODBC drive to directly change the DBs.  Then we
> could scp/rsync the files to the remote machines and execute them
> there.  This just seems like a very manual process though.

Yes, and furthermore statement-based SQL replication tends not to work as
intended if you have any updates or inserts that invoke non-constant default
values like now(), nextval(...), or random().
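
A contrived example (table and columns invented) of how the nodes drift apart
when the same statement is replayed everywhere:

  CREATE TABLE events (
      id       serial PRIMARY KEY,           -- nextval() advances independently on each node
      created  timestamptz DEFAULT now(),    -- each node stamps its own clock
      sample   double precision DEFAULT random(),
      payload  text
  );
  -- the very same shipped statement ...
  INSERT INTO events (payload) VALUES ('batched update');
  -- ... yields different id, created and sample values on every node.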





Re: [SPAM] Re: Best way to replicate to large number of nodes

From:
Cédric Villemain
Date:
2010/4/22 Brian Peschel <brianp@occinc.com>:
>
> On 04/22/2010 10:12 AM, Ben Chobot wrote:
>>
>> On Apr 21, 2010, at 1:41 PM, Brian Peschel wrote:
>>
>>
>>>
>>> I have a replication problem I am hoping someone has come across before
>>> and can provide a few ideas.
>>>
>>> I am looking at a configuration of on 'writable' node and anywhere from
>>> 10 to 300 'read-only' nodes.  Almost all of these nodes will be across a WAN
>>> from the writable node (some over slow VPN links too).  I am looking for a
>>> way to replicate as quickly as possible from the writable node to all the
>>> read-only nodes.  I can pretty much guarantee the read-only nodes will never
>>> become master nodes.  Also, the updates to the writable node are bunched and
>>> at known times (ie only updated when I want it updated, not constant
>>> updates), but when changes occur, there are a lot of them at once.
>>>
>>
>> Two things you didn't address are the acceptable latency of keeping the
>> read-only nodes in sync with the master - can they be different for a day? A
>> minute? Do you need things to stay synchronous? Also, how big is your
>> dataset? A simple pg_dump and some hot scp action after you batched updates
>> might be able to solve your problem.
>
> Latency is important.  I would say 10 to 15 minutes max, but the shorter the
> better.  I don't have an exact size, but I believe the entire DB is about 10
> gig.
>
> We had an idea of creating our apps write the SQL statements to a file,
> rather than using an ODBC drive to directly change the DBs.  Then we could
> scp/rsync the files to the remote machines and execute them there.  This
> just seems like a very manual process though.

You need to have a look at PgQ.  (In short, skytools will do exactly
what you want, if I understand your requirements correctly; londiste
is somewhat like Slony.)
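
To give a rough idea of the building block (the queue, consumer and event
below are invented, and the skytools ticker daemon must be running for
batches to appear):

  SELECT pgq.create_queue('app_changes');                   -- once, on the writable node
  SELECT pgq.register_consumer('app_changes', 'node_042');  -- one consumer per read-only node
  SELECT pgq.insert_event('app_changes', 'sql',
                          'UPDATE prices SET amount = amount * 1.05');
  -- each consumer then loops over pgq.next_batch('app_changes', 'node_042'),
  -- applies the events from pgq.get_batch_events(batch_id), and acknowledges
  -- with pgq.finish_batch(batch_id)

londiste packages that loop (plus the triggers that capture changes) for you.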

>
> - Brian



--
Cédric Villemain