Обсуждение: Master/Slave, DB separation or just spend $$$?

Поиск
Список
Период
Сортировка

Master/Slave, DB separation or just spend $$$?

От
Kelvin Quee
Дата:
Hi Performance Wizards!

I need advice on this.

I have a db which is being constantly updated and queried by a few
computers. We are doing datamining. The machine is running on a
moderately powered machine and processors constantly hit 90%.

At the same time, we need to present these data on a web interface.
The performance for the web interface is now very sluggish as most of
the power is occupied by the mining process.

I have thought of a few ways out of this -

1) Buy a mega powered machine (temporal solution, quick fix)
2) Do a master-slave configuration
3) Separate the DB into 2 - One for pure mining purposes, the other
purely for web serving

For (2), I do not know if it will be very effective since the master
will probably have many changes at any moment. I do not understand how
the changes will be propagated from the master to the slave without
impacting the slave's performance. Anyone with more experience here?

(3) seems ideal but is a very very painful solution!

We can possibly use a message queue system but again I am not familiar
with MQ. Will need to do more research.

If you were me, how would you solve this problem?

Thanks!


Kelvin Quee
+65 9177 3635

Re: Master/Slave, DB separation or just spend $$$?

От
Scott Marlowe
Дата:
On Tue, Jul 21, 2009 at 9:47 PM, Kelvin Quee<kelvinq@gmail.com> wrote:
> Hi Performance Wizards!
>
> I need advice on this.
>
> I have a db which is being constantly updated and queried by a few
> computers. We are doing datamining. The machine is running on a
> moderately powered machine and processors constantly hit 90%.

When your CPUs say 90%, is that regular user / sys %, or is it wait %?
The difference is very important.
What kind of hardware are you running on btw? # cpus, memory, # of
drives,type, RAID controller if any?

> At the same time, we need to present these data on a web interface.
> The performance for the web interface is now very sluggish as most of
> the power is occupied by the mining process.
>
> I have thought of a few ways out of this -
>
> 1) Buy a mega powered machine (temporal solution, quick fix)

Depends very much on what your bound by, CPU or IO.  If adding a
couple of 15K SAS drives would double your performance then u don't
need a super powerful machine.

> 2) Do a master-slave configuration

Often a good choice.

> 3) Separate the DB into 2 - One for pure mining purposes, the other
> purely for web serving
>
> For (2), I do not know if it will be very effective since the master
> will probably have many changes at any moment. I do not understand how
> the changes will be propagated from the master to the slave without
> impacting the slave's performance. Anyone with more experience here?
>
> (3) seems ideal but is a very very painful solution!
>
> We can possibly use a message queue system but again I am not familiar
> with MQ. Will need to do more research.

That could be a very complex solution.

> If you were me, how would you solve this problem?

Slony, most likely.

Re: Master/Slave, DB separation or just spend $$$?

От
Kelvin Quee
Дата:
Hi Scott,

Thanks for the quick reply.

I have been staring at *top* for a while and it's mostly been 40% in
userspace and 30% in system. Wait is rather low and never ventures
beyond 1%.

My hardware is a duo core AMD Athlon64 X2 5000+, 1GB RAM and a single
160 GB SATA II hard disk drive.

I will go look at Slony now.

Scott, one question though - If my master is constantly changing,
wouldn't the updates from the master to the slave also slow down the
slave?


Kelvin Quee
+65 9177 3635



On Wed, Jul 22, 2009 at 1:42 PM, Scott Marlowe<scott.marlowe@gmail.com> wrote:
> On Tue, Jul 21, 2009 at 9:47 PM, Kelvin Quee<kelvinq@gmail.com> wrote:
>> Hi Performance Wizards!
>>
>> I need advice on this.
>>
>> I have a db which is being constantly updated and queried by a few
>> computers. We are doing datamining. The machine is running on a
>> moderately powered machine and processors constantly hit 90%.
>
> When your CPUs say 90%, is that regular user / sys %, or is it wait %?
> The difference is very important.
> What kind of hardware are you running on btw? # cpus, memory, # of
> drives,type, RAID controller if any?
>
>> At the same time, we need to present these data on a web interface.
>> The performance for the web interface is now very sluggish as most of
>> the power is occupied by the mining process.
>>
>> I have thought of a few ways out of this -
>>
>> 1) Buy a mega powered machine (temporal solution, quick fix)
>
> Depends very much on what your bound by, CPU or IO.  If adding a
> couple of 15K SAS drives would double your performance then u don't
> need a super powerful machine.
>
>> 2) Do a master-slave configuration
>
> Often a good choice.
>
>> 3) Separate the DB into 2 - One for pure mining purposes, the other
>> purely for web serving
>>
>> For (2), I do not know if it will be very effective since the master
>> will probably have many changes at any moment. I do not understand how
>> the changes will be propagated from the master to the slave without
>> impacting the slave's performance. Anyone with more experience here?
>>
>> (3) seems ideal but is a very very painful solution!
>>
>> We can possibly use a message queue system but again I am not familiar
>> with MQ. Will need to do more research.
>
> That could be a very complex solution.
>
>> If you were me, how would you solve this problem?
>
> Slony, most likely.
>

Re: Master/Slave, DB separation or just spend $$$?

От
Scott Marlowe
Дата:
On Wed, Jul 22, 2009 at 1:52 AM, Kelvin Quee<kelvinq@gmail.com> wrote:
> Hi Scott,
>
> Thanks for the quick reply.
>
> I have been staring at *top* for a while and it's mostly been 40% in
> userspace and 30% in system. Wait is rather low and never ventures
> beyond 1%.
>
> My hardware is a duo core AMD Athlon64 X2 5000+, 1GB RAM and a single
> 160 GB SATA II hard disk drive.

So I take it you're on a tight budget then?  I'm guessing you could
put a single quad core cpu and 8 Gigs of ram in place for a reasonable
price.  I'd highly recommend setting up at least software RAID-1 for
increased reliability.

> I will go look at Slony now.

Might be overkill if you can get by on a single reasonably powerful machine.

> Scott, one question though - If my master is constantly changing,
> wouldn't the updates from the master to the slave also slow down the
> slave?

Yes it will, but the overhead for the slave is much less than the master.

Re: Master/Slave, DB separation or just spend $$$?

От
"Greg Sabino Mullane"
Дата:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: RIPEMD160


> I have a db which is being constantly updated and queried by a few
> computers. We are doing datamining. The machine is running on a
> moderately powered machine and processors constantly hit 90%.
...
> 2) Do a master-slave configuration
> 3) Separate the DB into 2 - One for pure mining purposes, the other
> purely for web serving

Why not combine the two (if I'm understanding correctly)? Use Bucardo
or Slony to make two slaves, one for the web servers to hit (assuming
they are read-only queries), and one to act as a data warehouse.
Your main box gets all the updates but has no selects or complex
queries to weigh it down. If the we server does read and write, have
your app maintain two database handles.

> For (2), I do not know if it will be very effective since the master
> will probably have many changes at any moment. I do not understand how
> the changes will be propagated from the master to the slave without
> impacting the slave's performance. Anyone with more experience here?

The slave will get the updates as well, but in a more efficient manner
as there will be no WHERE clauses or other logic associated with the
original update. Bucardo or Slony will simply COPY over the rows as
needed. Keep in mind that both are asynchronous, so changes won't appear
on the slaves at the same time as the master, but the delay is typically
measured in seconds.

- --
Greg Sabino Mullane greg@turnstep.com
End Point Corporation
PGP Key: 0x14964AC8 200907221229
http://biglumber.com/x/web?pk=2529DF6AB8F79407E94445B4BC9B906714964AC8
-----BEGIN PGP SIGNATURE-----

iEYEAREDAAYFAkpnPpsACgkQvJuQZxSWSsggKgCfT0EbxWQdym30n7IV1J1X6dC6
HRkAoND4nCMVeffE2VW34VVmPcRtLclI
=tTjn
-----END PGP SIGNATURE-----



Re: Master/Slave, DB separation or just spend $$$?

От
Chris Browne
Дата:
kelvinq@gmail.com (Kelvin Quee) writes:
> I will go look at Slony now.

It's worth looking at, but it is not always to be assumed that
replication will necessarily improve scalability of applications; it's
not a "magic wand" to wave such that "presto, it's all faster!"

Replication is helpful from a performance standpoint if there is a lot
of query load where it is permissible to look at *somewhat* out of
date information.

For instance, replication can be quite helpful for pushing load off
for processing accounting data where you tend to be doing analysis on
data from {yesterday, last week, last month, last year}, and where the
data tends to be inherently temporal (e.g. - you're looking at
transactions with dates on them).

On the other hand, any process that anticipates *writing* to the
master database will be more or less risky to try to shift over to a
possibly-somewhat-behind 'slave' system, as will be anything that
needs to be consistent with the "master state."
--
(reverse (concatenate 'string "ofni.secnanifxunil" "@" "enworbbc"))
http://www3.sympatico.ca/cbbrowne/spiritual.html
"Nondeterminism means never having to say you're wrong."  -- Unknown

Re: Master/Slave, DB separation or just spend $$$?

От
David Rees
Дата:
On Wed, Jul 22, 2009 at 12:52 AM, Kelvin Quee<kelvinq@gmail.com> wrote:
> I have been staring at *top* for a while and it's mostly been 40% in
> userspace and 30% in system. Wait is rather low and never ventures
> beyond 1%.

Certainly seems like you are CPU bound.

> My hardware is a duo core AMD Athlon64 X2 5000+, 1GB RAM and a single
> 160 GB SATA II hard disk drive.

Looks like you are on a budget as Scott also suggested - I would also
mirror his recommendation to upgrade to a quad core processor and more
memory.  Hopefully your motherboard supports quad-cores so you don't
have to replace that bit, and you should be able to get at least 4GB
of RAM in there.

If IO load becomes an issue, Velociraptors are fast and don't cost too
much.  Getting a basic RAID1 will help prevent data-loss due to disk
failure - make sure you are making offline backups as well!

> I will go look at Slony now.
>
> Scott, one question though - If my master is constantly changing,
> wouldn't the updates from the master to the slave also slow down the
> slave?

Yes - Slony will increase the load on your source node as it does take
work to do the replication, so unless you are able to offload your CPU
heavy read only queries to the slave machine, it will only bog down
the source node more.

-Dave