Обсуждение: Data split -- Creating a copy of database without outage

Поиск
Список
Период
Сортировка

Data split -- Creating a copy of database without outage

От
"Igor Shmain"
Дата:

Can you please help with advice?

I need to design a solution for a database which will grow and will require a horizontal split at some moment.

Here is how I am planning to do it: Every record in every table has a shard number. After the database becomes too large (gets too many requests), the tables need to be horizontally split. It means that for every table all records with some specific shard numbers need to be moved to a new database.

My plan is to (1) create a copy of the database on a new server (backup/restore?), (2) synchronize the databases (how?), and then (3) forward all requests for the moved shards to the new database. (I will get rid of the unneeded shards later). An important requirement: there should be no outage for the users.

What are the ways to achieve it?

 

Thank you in advance,
-igorS

 

Re: Data split -- Creating a copy of database without outage

От
Haifeng Liu
Дата:
Why not use a partitioned table? You can write a trigger to control which partition the coming data should be inserted.

Regards,
----
Liu Haifeng




On May 30, 2012, at 4:13 AM, Igor Shmain wrote:

Can you please help with advice?

I need to design a solution for a database which will grow and will require a horizontal split at some moment.

Here is how I am planning to do it: Every record in every table has a shard number. After the database becomes too large (gets too many requests), the tables need to be horizontally split. It means that for every table all records with some specific shard numbers need to be moved to a new database.

My plan is to (1) create a copy of the database on a new server (backup/restore?), (2) synchronize the databases (how?), and then (3) forward all requests for the moved shards to the new database. (I will get rid of the unneeded shards later). An important requirement: there should be no outage for the users.

What are the ways to achieve it?

 

Thank you in advance,
-igorS

 


Re: Data split -- Creating a copy of database without outage

От
"Igor Shmain"
Дата:

Thank you, Liu, for your suggestion.

 

I might be missing something (I am new to postgres), but it seems that your suggestion will not help much in my case.  Since the number of db requests will grow with time (more users will come into the system), the current server will become incapable of serving all the requests quickly enough.

 

The idea is to increase overall calculation capacity of the system by adding more servers to it. Throwing more hardware to a single main server is not an option in my case. Creating multiple replicas (slaves) is also not an good option – it would be way more efficient to have a group of db servers, each serving only some subset of users and hosting data for those users only. Buying new servers in advance is not an option too.

 

What I am looking for is switching some of the users to another db server when the capacity of the existing server(s) is not enough. The point is to do it without interrupting the users’ work (so they do not see that horrible “Sorry! This site is under maintenance…”).

 

If I missed something it would be very kind of you to point this out.

 

Thank you once again,

-igorS

 

 

From: Haifeng Liu [mailto:liuhaifeng@live.com]
Sent: May-29-12 9:13 PM
To: Igor Shmain
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Data split -- Creating a copy of database without outage

 

Why not use a partitioned table? You can write a trigger to control which partition the coming data should be inserted.

 

Regards,

----

Liu Haifeng

 



 

On May 30, 2012, at 4:13 AM, Igor Shmain wrote:



Can you please help with advice?

I need to design a solution for a database which will grow and will require a horizontal split at some moment.

Here is how I am planning to do it: Every record in every table has a shard number. After the database becomes too large (gets too many requests), the tables need to be horizontally split. It means that for every table all records with some specific shard numbers need to be moved to a new database.

My plan is to (1) create a copy of the database on a new server (backup/restore?), (2) synchronize the databases (how?), and then (3) forward all requests for the moved shards to the new database. (I will get rid of the unneeded shards later). An important requirement: there should be no outage for the users.

What are the ways to achieve it?

 

Thank you in advance,
-igorS

 

 

Re: Data split -- Creating a copy of database without outage

От
Craig James
Дата:
On Wed, May 30, 2012 at 8:14 AM, Igor Shmain <igor.shmain@gmail.com> wrote:

Thank you, Liu, for your suggestion.

 

I might be missing something (I am new to postgres), but it seems that your suggestion will not help much in my case.  Since the number of db requests will grow with time (more users will come into the system), the current server will become incapable of serving all the requests quickly enough.

 

The idea is to increase overall calculation capacity of the system by adding more servers to it. Throwing more hardware to a single main server is not an option in my case. Creating multiple replicas (slaves) is also not an good option – it would be way more efficient to have a group of db servers, each serving only some subset of users and hosting data for those users only. Buying new servers in advance is not an option too.

 

What I am looking for is switching some of the users to another db server when the capacity of the existing server(s) is not enough. The point is to do it without interrupting the users’ work (so they do not see that horrible “Sorry! This site is under maintenance…”).


Londiste is very good at replicating a database.  It's normally used for real-time backup or to load-balance read-only applications, but it can also be used for the situation you describe.  At some point when you decide it's necessary to split your database, you would create a new database on a second server, create an empty duplicate schema (using pg_dump's schema-only feature) then install Londiste.  Londiste would replicate your database in real time up to the point where you were ready to make the actual switch.

To switch, you'd simply stop Londiste and simultaneously reconfigure your load-balancing system so that a subset of your users were directed to the new database.  Then you could uninstall Londiste, and clean out each database by removing the user data that is for users on the other server.

You might also consider partitioning.  If you know in advance that you're going to be discarding large subsets of the data, it will be much more efficient if you partition it at the outset.  When you split  your system in two, the cleanup process will be nothing more than dropping partitions.  You won't be plagued by bloated indexes and files.

Craig

 

 

If I missed something it would be very kind of you to point this out.

 

Thank you once again,

-igorS

 

 

From: Haifeng Liu [mailto:liuhaifeng@live.com]
Sent: May-29-12 9:13 PM
To: Igor Shmain
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Data split -- Creating a copy of database without outage

 

Why not use a partitioned table? You can write a trigger to control which partition the coming data should be inserted.

 

Regards,

----

Liu Haifeng

 



 

On May 30, 2012, at 4:13 AM, Igor Shmain wrote:



Can you please help with advice?

I need to design a solution for a database which will grow and will require a horizontal split at some moment.

Here is how I am planning to do it: Every record in every table has a shard number. After the database becomes too large (gets too many requests), the tables need to be horizontally split. It means that for every table all records with some specific shard numbers need to be moved to a new database.

My plan is to (1) create a copy of the database on a new server (backup/restore?), (2) synchronize the databases (how?), and then (3) forward all requests for the moved shards to the new database. (I will get rid of the unneeded shards later). An important requirement: there should be no outage for the users.

What are the ways to achieve it?

 

Thank you in advance,
-igorS

 

 


Re: Data split -- Creating a copy of database without outage

От
Amador Alvarez
Дата:
Hi ,

I would start with a single high performance tuned database focusing mainly on dealing efficiently with concurrent activity and identifying the real hot spots.
If you check out that you really need to go forward on database power, consider on adding  new databases and relocate some users whenever you need it.

Take a look on the different options related to replication, tunning and database balancers, set up some testings and get a deeper understanding on your real needs and performance.

A.A.



On 05/30/2012 08:14 AM, Igor Shmain wrote:

Thank you, Liu, for your suggestion.

 

I might be missing something (I am new to postgres), but it seems that your suggestion will not help much in my case.  Since the number of db requests will grow with time (more users will come into the system), the current server will become incapable of serving all the requests quickly enough.

 

The idea is to increase overall calculation capacity of the system by adding more servers to it. Throwing more hardware to a single main server is not an option in my case. Creating multiple replicas (slaves) is also not an good option – it would be way more efficient to have a group of db servers, each serving only some subset of users and hosting data for those users only. Buying new servers in advance is not an option too.

 

What I am looking for is switching some of the users to another db server when the capacity of the existing server(s) is not enough. The point is to do it without interrupting the users’ work (so they do not see that horrible “Sorry! This site is under maintenance…”).

 

If I missed something it would be very kind of you to point this out.

 

Thank you once again,

-igorS

 

 

From: Haifeng Liu [mailto:liuhaifeng@live.com]
Sent: May-29-12 9:13 PM
To: Igor Shmain
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Data split -- Creating a copy of database without outage

 

Why not use a partitioned table? You can write a trigger to control which partition the coming data should be inserted.

 

Regards,

----

Liu Haifeng

 



 

On May 30, 2012, at 4:13 AM, Igor Shmain wrote:



Can you please help with advice?

I need to design a solution for a database which will grow and will require a horizontal split at some moment.

Here is how I am planning to do it: Every record in every table has a shard number. After the database becomes too large (gets too many requests), the tables need to be horizontally split. It means that for every table all records with some specific shard numbers need to be moved to a new database.

My plan is to (1) create a copy of the database on a new server (backup/restore?), (2) synchronize the databases (how?), and then (3) forward all requests for the moved shards to the new database. (I will get rid of the unneeded shards later). An important requirement: there should be no outage for the users.

What are the ways to achieve it?

 

Thank you in advance,
-igorS

 

 

Re: Data split -- Creating a copy of database without outage

От
"Igor Shmain"
Дата:

Thank you Craig and thank you Amador for your great help!

 

Craig's ideas are very useful - I definitely want to look deeper into Londiste and to use partitioning.

 

Amador's approach makes perfect sense to me.

 

 

Best wishes,

-igor

 

Re: Data split -- Creating a copy of database without outage

От
Jan Nielsen
Дата:
Hi Igor,

On Tue, May 29, 2012 at 2:13 PM, Igor Shmain <igor.shmain@gmail.com> wrote:

I need to design a solution for a database which will grow and will require a horizontal split at some moment.


Ok
 

Here is how I am planning to do it: Every record in every table has a shard number. After the database becomes too large (gets too many requests), the tables need to be horizontally split. It means that for every table all records with some specific shard numbers need to be moved to a new database.


Since PG does not support the notion of a shard at the database layer, you will have to create this yourself. The key to scaling shards is ensuring there there are no cross-shard joins (or any other shared data); this is an application-design question so unless you "own" the application-design, this will be difficult to achieve. If you do own the design, the application's entity-tree will have to be carefully constructed to avoid any sharing of resources across shards. RDBMS programmers are taught just the opposite so this will require diligence from your team with a very clear, likely simple, application-entity design amenable to shard-ing.
 

My plan is to (1) create a copy of the database on a new server (backup/restore?), (2) synchronize the databases (how?), and then (3) forward all requests for the moved shards to the new database. (I will get rid of the unneeded shards later). An important requirement: there should be no outage for the users.


As I mentioned above, it is unlikely that you will be able to shard an RDBMS at this level; you need to look at the application entity-design.

What are the ways to achieve it?


If you mean to dictate a shard solution, very few will be able to help. On the other hand, if you mean to ask for help in finding a "low-cost scale-able PG DB solution for a typical RDBMS application with the potential for rapid growth", then there are a number of options which have been employed with success. For predominately read applications, use a single master database replication strategy:


For predominately write applications, you will likely need a multi-master solution (unless you can dictate the application entity-design) which is more complex territory from a scale-ability and application design standpoint but you might look at Postgres-XS and Londiste for ideas.


Cheers,

Jan

Re: Data split -- Creating a copy of database without outage

От
"Igor Shmain"
Дата:

Hi Jan,

 

Thank you so much for your input and sorry about the delay.

 

I understand most of the limitations, compromises, and sacrifices that come with distributed databases (sharding). But definitely not all of them. After working with databases for about 25 years, I have some basic knowledge about how they work (mssql, oracle, sybase). But I am new to postgres and new to distributed db approach, so any feedback in those areas is extremely welcome :-)

 

If anybody is aware of any successful implementations of distributed approach with postgres and any information around this area, please let me know. If I cannot find a suitable prototype and has to “invent the wheel”, I hope to be able to contribute back to the community with my story one day.

 

 

Thank you once again,

-igor

 

 

From: Jan Nielsen [mailto:jan.sture.nielsen@gmail.com]
Sent: June-02-12 11:00 PM
To: Igor Shmain
Cc: pgsql-admin@postgresql.org
Subject: Re: [ADMIN] Data split -- Creating a copy of database without outage

 

Hi Igor,

On Tue, May 29, 2012 at 2:13 PM, Igor Shmain <igor.shmain@gmail.com> wrote:

I need to design a solution for a database which will grow and will require a horizontal split at some moment.

 

Ok

 

Here is how I am planning to do it: Every record in every table has a shard number. After the database becomes too large (gets too many requests), the tables need to be horizontally split. It means that for every table all records with some specific shard numbers need to be moved to a new database.

 

Since PG does not support the notion of a shard at the database layer, you will have to create this yourself. The key to scaling shards is ensuring there there are no cross-shard joins (or any other shared data); this is an application-design question so unless you "own" the application-design, this will be difficult to achieve. If you do own the design, the application's entity-tree will have to be carefully constructed to avoid any sharing of resources across shards. RDBMS programmers are taught just the opposite so this will require diligence from your team with a very clear, likely simple, application-entity design amenable to shard-ing.

 

My plan is to (1) create a copy of the database on a new server (backup/restore?), (2) synchronize the databases (how?), and then (3) forward all requests for the moved shards to the new database. (I will get rid of the unneeded shards later). An important requirement: there should be no outage for the users.

 

As I mentioned above, it is unlikely that you will be able to shard an RDBMS at this level; you need to look at the application entity-design.

 

What are the ways to achieve it?

 

If you mean to dictate a shard solution, very few will be able to help. On the other hand, if you mean to ask for help in finding a "low-cost scale-able PG DB solution for a typical RDBMS application with the potential for rapid growth", then there are a number of options which have been employed with success. For predominately read applications, use a single master database replication strategy:

 

 

For predominately write applications, you will likely need a multi-master solution (unless you can dictate the application entity-design) which is more complex territory from a scale-ability and application design standpoint but you might look at Postgres-XS and Londiste for ideas.

 

 

Cheers,

 

Jan