Thread: Mirroring existing mysql setup

Mirroring existing mysql setup

From: snacktime
Date:
Where I work we use mysql for a fairly busy website, and I'd like to
eventually start transitioning to postgres if possible.   The largest
obstacle is the lack of replication as a core feature.  I'm well aware
of the history behind why it's not in core, and I saw a post a while
back saying it would be in 8.4.  I'd like to say I think this is a
very good idea, and I know a number of shops personally that did not
go with postgres just for this reason.

So anyways our general setup is that we have one master replicating to
one slave.  We use the slave for generating various leaderboard stats
for our games.  Most of these get generated every 4 hours.   If we
wanted to duplicate this on postgres I'm not sure which replication
option would work best.  Last time I looked at slony you had to edit
configs for each table  you wanted to replicate, and the whole setup
was more complex than it needed to be.  If it's still like that, I
think we would lose more than we gain by moving to postgres.  Once
set up, the replication needs to be free of daily administration other
than routine automated tasks.  We add new tables/remove old ones
almost on a daily basis.

Now for one of the main things we don't like about mysql.  You can't
add indexes without locking the whole table, which means you can't go
back and add indexes later on a production app without shutting down
the whole system.  The effect this has had is that when we add new
features to our games that would normally require an additional
column, we have to add a new table since we can't add an index to the
old table.   When you add indexes in postgres, how much of a
performance hit will the database be taking while adding the index?
I haven't worked on a postgres installation that's as busy as our
mysql installation is.  We get roughly 3-4 million page views per day,
with each page view probably averaging 4-6 db queries.  Probably 20%
of these are cached.  In addition we have our slave which does far
fewer, but more complicated queries.  Quite a few of our tables will
gain thousands of rows per day, some tens of thousands.  Some of our
busiest tables have tens of millions of rows.  We could start to
archive some of these.

Chris

Re: Mirroring existing mysql setup

From: Erik Jones
Date:
On Dec 18, 2008, at 11:32 AM, snacktime wrote:

> Where I work we use mysql for a fairly busy website, and I'd like to
> eventually start transitioning to postgres if possible.   The largest
> obstacle is the lack of replication as a core feature.  I'm well aware
> of the history behind why it's not in core, and I saw a post a while
> back saying it would be in 8.4.  I'd like to say I think this is a
> very good idea, and I know a number of shops personally that did not
> go with postgres just for this reason.
>
> So anyways our general setup is that we have one master replicating to
> one slave.  We use the slave for generating various leaderboard stats
> for our games.  Most of these get generated every 4 hours.   If we
> wanted to duplicate this on postgres I'm not sure which replication
> option would work best.  Last time I looked at slony you had to edit
> configs for each table  you wanted to replicate, and the whole setup
> was more complex than it needed to be.  If it's still like that, I
> think we would lose more than we gain by moving to postgres.  Once
> set up, the replication needs to be free of daily administration other
> than routine automated tasks.  We add new tables/remove old ones
> almost on a daily basis.

You should check out Londiste, part of the Skytools package of
Postgres projects.  For simple, master-slave replication it's *loads*
easier to set up and administer than Slony.  The only reason I could
see to go with Slony right now is if you need some kind of complex
setup with cascaded replication or what-not.  Adding and removing
tables to/from the replication stream is also a cinch in Londiste but
you *do* have to actually do it -- they don't get added automatically
like in MySQL's built-in replication.  However, you may want to wait a
few months with your fingers crossed to see if Hot Standby replication
is ready for 8.4 in March.
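
For a rough idea of what adding a table looks like (this is from memory
of the Skytools 2.x docs, and the config file and table names are just
placeholders), it's one command on the provider and one on the
subscriber:

    londiste.py replic.ini provider add public.new_table
    londiste.py replic.ini subscriber add public.new_table

Dropping a table is the same with "remove" instead of "add", so it's
easy to fold into whatever script already creates your daily tables.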

> Now for one of the main things we don't like about mysql.  You can't
> add indexes without locking the whole table, which means you can't go
> back and add indexes later on a production app without shutting down
> the whole system.  The effect this has had is that when we add new
> features to our games that would normally require an additional
> column, we have to add a new table since we can't add an index to the
> old table.   When you add indexes in postgres, how much of a
> performance hit will the database be taking while adding the index?
> I haven't worked on a postgres installation that's as busy as our
> mysql installation is.  We get roughly 3-4 million page views per day,
> with each page view probably averaging 4-6 db queries.  Probably 20%
> of these are cached.  In addition we have our slave which does far
> fewer, but more complicated queries.  Quite a few of our tables will
> gain thousands of rows per day, some tens of thousands.  Some of our
> busiest tables have tens of millions of rows.  We could start to
> archive some of these.

You can use CREATE INDEX CONCURRENTLY to avoid the table locks.
However, that takes two passes over the data instead of one so there's
a bigger IO hit.
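
For example (table and column names made up here):

    CREATE INDEX CONCURRENTLY game_stats_created_at_idx
        ON game_stats (created_at);

It has to wait out transactions that are already running and it scans
the table twice, so it's slower and heavier on IO than a plain CREATE
INDEX, but reads and writes against the table keep working the whole
time it builds.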

Erik Jones, Database Administrator
Engine Yard
Support, Scalability, Reliability
866.518.9273 x 260
Location: US/Pacific
IRC: mage2k

Re: Mirroring existing mysql setup

From: "Scott Marlowe"
Date:
On Thu, Dec 18, 2008 at 12:32 PM, snacktime <snacktime@gmail.com> wrote:
> Where I work we use mysql for a fairly busy website, and I'd like to
> eventually start transitioning to postgres if possible.   The largest
> obstacle is the lack of replication as a core feature.  I'm well aware
> of the history behind why it's not in core, and I saw a post a while
> back saying it would be in 8.4.  I'd like to say I think this is a
> very good idea, and I know a number of shops personally that did not
> go with postgres just for this reason.

Then they made an emotional decision, not a scientific one. :)  Whether
or not replication is built in comes WAY after a few other factors,
like

Is it reliable?
Is it the right kind of replication?
Is it supported during upgrades?  (Slony was designed to allow
different versions of pgsql to replicate to one another.)
What's the overhead for running it?
Are there any showstopper bugs that would affect you in production?
Can I start replication on disparate machines without downtime?
(Another Slony feature: no need to take down the master to initiate
replication.)

> So anyways our general setup is that we have one master replicating to
> one slave.  We use the slave for generating various leaderboard stats
> for our games.  Most of these get generated every 4 hours.   If we
> wanted to duplicate this on postgres I'm not sure which replication
> option would work best.  Last time I looked at slony you had to edit
> configs for each table  you wanted to replicate, and the whole setup
> was more complex than it needed to be.

With a fairly simple script it's quite easy to automate.  That's how I
do it.  But slony is aimed at a narrow wedge of users, and other
replication engines may be a better choice for what you're doing.
They all make compromises in one area or another.
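
As a sketch of what such a script ends up wrapping (slonik commands
roughly as in the Slony-I docs; the cluster name, conninfos, IDs and
table name are all placeholders, and note that to add a table to an
already-subscribed set you create a new set, subscribe it, and merge
it in):

    cluster name = mycluster;
    node 1 admin conninfo = 'dbname=app host=master user=slony';
    node 2 admin conninfo = 'dbname=app host=slave user=slony';

    create set (id = 2, origin = 1, comment = 'new tables');
    set add table (set id = 2, origin = 1, id = 101,
                   fully qualified name = 'public.new_table');
    subscribe set (id = 2, provider = 1, receiver = 2, forward = no);
    merge set (id = 1, add id = 2, origin = 1);

In practice you also wait for the new subscription to catch up before
the merge.  Generate that from your schema and it's a one-liner to run,
but it's still more ceremony than MySQL's "just follow the binlog"
behavior.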

> If it's still like that, I
> think we would lose more than we gain by moving to postgres.  Once
> set up, the replication needs to be free of daily administration other
> than routine automated tasks.  We add new tables/remove old ones
> almost on a daily basis.

Then I would question your design.  But yeah, slony is a bad choice
here.  pgpool might work, but in general ddl replication in pgsql is
problematic for most replication solutions.

> Now for one of the main things we don't like about mysql.  You can't
> add indexes without locking the whole table, which means you can't go
> back and add indexes later on a production app without shutting down
> the whole system.  The effect this has had is that when we add new
> features to our games that would normally require an additional
> column, we have to add a new table since we can't add an index to the
> old table.   When you add indexes in postgres, how much of a
> performance hit will the database be taking while adding the index?

Actually, this is one of my favorite things about using slony and
pgsql.  With the create index concurrently command, index creation
does not block, and you can have different indexes on the master and
slave.  So, minimal indexes with fast updates on the master, and
custom indexes that let reports run faster on the slave.
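
As a concrete illustration (table and column names invented), the
master might carry only the index the application needs for lookups,
while the reporting index gets built directly on the slave:

    -- on the master: keep writes cheap
    CREATE INDEX CONCURRENTLY scores_player_id_idx
        ON scores (player_id);

    -- on the slave only: for the 4-hourly leaderboard queries
    CREATE INDEX CONCURRENTLY scores_game_score_idx
        ON scores (game_id, score DESC);

Since slony replicates row changes through triggers rather than copying
files, the slave is free to carry indexes the master doesn't have.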

> I haven't worked on a postgres installation that's as busy as our
> mysql installation is.  We get roughly 3-4 million page views per day,
> with each page view probably averaging 4-6 db queries.

It's 1pm here and we've had ~4 million db accesses on our machines.
Looking at the logs for yesterday we had ~10 million db queries run.
This is on our master only.  We see about twice as many selects on the
slave.  These machines are overpowered for what we're doing, with one
12-disk RAID-10 array each, a pair of drives for os/pg_xlog, 32 gigs
of RAM, and 8 Opteron cores, at $11,500 each.  These machines run at a load
factor of 1.5 to 2.5 all day, and can handle load factors into the
dozens before any serious slowdowns occur.  I'm sure with a 6 drive
RAID-10 and a two drive RAID-1 you could get a fast enough machine for
what you're doing.

> Probably 20%
> of these are cached.  In addition we have our slave which does far
> fewer, but more complicated queries.

You could definitely look at optimizing each db differently for each job.