Discussion: Which replication is the best for our case ?


Which replication is the best for our case ?

From:
"ben.play"
Date:
Hi guys,

We have a PG database with more than 400 GB of data.
At the moment, a cron job runs every ten minutes and updates about 10,000 rows
using complex algorithms written in PHP.

Each time the cron job runs, the website is almost down, because some queries
have to update the FULL table...

Therefore, I'm asking whether it's possible to duplicate my main database on a
slave server in order to run this cron job on the second server... and then
replicate those changes back to the main database (master).

Which replication is best in this case?

http://www.postgresql.org/docs/9.3/static/warm-standby.html ?

Do you have any links or tutorials that explain this kind of operation?

Thanks a lot!





Re: Which replication is the best for our case ?

From:
Adrian Klaver
Date:
On 06/29/2015 06:02 AM, ben.play wrote:
> Hi guys,
>
> We have a PG database with more than 400 GB of data.
> At the moment, a cron job runs every ten minutes and updates about 10,000 rows
> using complex algorithms written in PHP.
>
> Each time the cron job runs, the website is almost down, because some queries
> have to update the FULL table...
>
> Therefore, I'm asking whether it's possible to duplicate my main database on a
> slave server in order to run this cron job on the second server... and then
> replicate those changes back to the main database (master).

So you want the replication to go from the standby back to the master?
If that is the case, it is not possible.

>
> Which replication is best in this case?
>
> http://www.postgresql.org/docs/9.3/static/warm-standby.html ?
>
> Do you have any links or tutorials that explain this kind of operation?
>
> Thanks a lot!


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Which replication is the best for our case ?

From:
Adrian Klaver
Date:
On 06/29/2015 06:02 AM, ben.play wrote:
> Hi guys,
>
> We have a PG database with more than 400 GB of data.
> At the moment, a cron job runs every ten minutes and updates about 10,000 rows
> using complex algorithms written in PHP.
>
> Each time the cron job runs, the website is almost down, because some queries
> have to update the FULL table...
>
> Therefore, I'm asking whether it's possible to duplicate my main database on a
> slave server in order to run this cron job on the second server... and then
> replicate those changes back to the main database (master).
>
> Which replication is best in this case?
>
> http://www.postgresql.org/docs/9.3/static/warm-standby.html ?
>
> Do you have any links or tutorials that explain this kind of operation?

I should have been clearer in my original post. What you want is not
possible using the procedures found at the link you posted. What you are
looking for is master-to-master replication. This is not something I have
done, so I am not the person to offer detailed advice on that particular
style of replication. I do know that this list sees a lot of activity
around BDR (http://bdr-project.org/docs/stable/index.html), which provides
master-to-master replication.

>
> Thanks a lot!


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Which replication is the best for our case ?

From:
Arthur Silva
Date:
On Mon, Jun 29, 2015 at 10:02 AM, ben.play <benjamin.cohen@playrion.com> wrote:
Hi guys,

We have a PG database with more than 400 GB of data.
At the moment, a cron job runs every ten minutes and updates about 10,000 rows
using complex algorithms written in PHP.

Each time the cron job runs, the website is almost down, because some queries
have to update the FULL table...

Therefore, I'm asking whether it's possible to duplicate my main database on a
slave server in order to run this cron job on the second server... and then
replicate those changes back to the main database (master).

Which replication is best in this case?

http://www.postgresql.org/docs/9.3/static/warm-standby.html ?

Do you have any links or tutorials that explain this kind of operation?

Thanks a lot!





Hello Adrian, can you give us an example of one of these FULL table update queries?

By website down, do you mean slowed to a halt or in read-only mode (due to the update locks)?

Either way, it doesn't look like replication is going to solve your problem.

Re: Which replication is the best for our case ?

From:
Date:

Arthur Silva wrote on Monday, June 29, 2015 5:23 PM:
> Therefore, I'm asking whether it's possible to duplicate my main database on a
> slave server in order to run this cron job on the second server... and then
> replicate those changes back to the main database (master).

> http://www.postgresql.org/docs/9.3/static/warm-standby.html ?

If you do it the other way around, a hot standby (see
http://www.postgresql.org/docs/9.3/static/hot-standby.html) might work for you.

That is, your cron scripts would UPDATE the master server, and the web site
would do read-only queries against the slave server.

That's for the built-in replication of PostgreSQL (sorry, I have no experience
with the add-on replication solutions).
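A minimal sketch of that built-in setup on 9.3, for reference (host names, the
replication user and the values below are placeholders, not a tuned
configuration):

# master, postgresql.conf
wal_level = hot_standby
max_wal_senders = 3
wal_keep_segments = 256            # or configure WAL archiving instead

# master, pg_hba.conf
host  replication  replicator  10.0.0.2/32  md5

# standby, postgresql.conf
hot_standby = on

# standby, recovery.conf
standby_mode = 'on'
primary_conninfo = 'host=10.0.0.1 user=replicator password=secret'

The standby is initialized from a base backup of the master (for example with
pg_basebackup), after which the web site's read-only queries can be pointed at
it while the cron job keeps writing to the master.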

 

 

Re: Which replication is the best for our case ?

From:
Adrian Klaver
Date:
On 06/29/2015 08:23 AM, Arthur Silva wrote:
> On Mon, Jun 29, 2015 at 10:02 AM, ben.play <benjamin.cohen@playrion.com> wrote:
>
>     Hi guys,
>
>     We have a PG database with more than 400 GB of data.
>     At the moment, a cron job runs every ten minutes and updates about
>     10,000 rows using complex algorithms written in PHP.
>
>     Each time the cron job runs, the website is almost down, because some
>     queries have to update the FULL table...
>
>     Therefore, I'm asking whether it's possible to duplicate my main
>     database on a slave server in order to run this cron job on the second
>     server... and then replicate those changes back to the main database
>     (master).
>
>     Which replication is best in this case?
>
>     http://www.postgresql.org/docs/9.3/static/warm-standby.html ?
>
>     Do you have any links or tutorials that explain this kind of operation?
>
>     Thanks a lot!
>
> Hello Adrian, can you give us an example of one of these FULL table update queries?

Actually, it is the OP (Ben) that is going to have to supply that.

>
> By website down, do you mean slowed to a halt or in read-only mode (due to
> the update locks)?
>
> Either way, it doesn't look like replication is going to solve your problem.


--
Adrian Klaver
adrian.klaver@aklaver.com


Re: Which replication is the best for our case ?

From:
Arthur Silva
Date:

On Mon, Jun 29, 2015 at 1:44 PM, Adrian Klaver <adrian.klaver@aklaver.com> wrote:
On 06/29/2015 08:23 AM, Arthur Silva wrote:
On Mon, Jun 29, 2015 at 10:02 AM, ben.play <benjamin.cohen@playrion.com> wrote:

    Hi guys,

    We have a PG database with more than 400 GB of data.
    At the moment, a cron job runs every ten minutes and updates about
    10,000 rows using complex algorithms written in PHP.

    Each time the cron job runs, the website is almost down, because some
    queries have to update the FULL table...

    Therefore, I'm asking whether it's possible to duplicate my main
    database on a slave server in order to run this cron job on the second
    server... and then replicate those changes back to the main database
    (master).

    Which replication is best in this case?

    http://www.postgresql.org/docs/9.3/static/warm-standby.html ?

    Do you have any links or tutorials that explain this kind of operation?

    Thanks a lot!



Hello Adrian, can you give us an example of one of these FULL table update queries?

Actually, it is the OP (Ben) that is going to have to supply that.

By website down, do you mean slowed to a halt or in read-only mode (due to
the update locks)?

Either way, it doesn't look like replication is going to solve your problem.


--
Adrian Klaver
adrian.klaver@aklaver.com

Oh of course Adrian! I must have confused the names at the time. Sorry!

Re: Which replication is the best for our case ?

From:
Jeff Janes
Date:
On Mon, Jun 29, 2015 at 6:02 AM, ben.play <benjamin.cohen@playrion.com> wrote:
Hi guys,

We have a PG database with more than 400 GB of data.
At the moment, a cron job runs every ten minutes and updates about 10,000 rows
using complex algorithms written in PHP.

Each time the cron job runs, the website is almost down, because some queries
have to update the FULL table...

Why is it updating the full table of 400 GB if it only changes 10,000 rows?

If most of the rows are being updated degenerately (they update the column to have the same value it already has), then just add a WHERE clause to filter out those degenerate updates, unless the degenerate update is needed for locking purposes, which is rare.
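For illustration, such a filter could look like this (table and column names
are made up):

-- only touch rows whose value would actually change; the no-op (degenerate)
-- updates are skipped entirely
UPDATE user_scores s
   SET points = c.new_points
  FROM computed_scores c
 WHERE s.user_id = c.user_id
   AND s.points IS DISTINCT FROM c.new_points;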
 

Therefore, I'm asking whether it's possible to duplicate my main database on a
slave server in order to run this cron job on the second server... and then
replicate those changes back to the main database (master).

Sounds like you are trying to use a bulldozer to change a lightbulb.

Improving queries (including the effect running some queries has on the entire system) starts with "EXPLAIN (ANALYZE, BUFFERS)", not with multi-master replication.
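For example (the statement here is only a placeholder for one of the cron
job's real queries):

EXPLAIN (ANALYZE, BUFFERS)
UPDATE user_scores SET points = points + 10 WHERE user_id = 42;

That shows where the time and the buffer I/O actually go, which is usually far
more informative than guessing.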

Cheers,

Jeff

Re: Which replication is the best for our case ?

From:
Melvin Davidson
Date:
I think it would help immensely if you provided details such as the table structure, indexes, the actual UPDATE query, and the reason all rows of the table must be updated.

On Mon, Jun 29, 2015 at 1:15 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
On Mon, Jun 29, 2015 at 6:02 AM, ben.play <benjamin.cohen@playrion.com> wrote:
Hi guys,

We have a PG database with more than 400 GB of data.
At the moment, a cron job runs every ten minutes and updates about 10,000 rows
using complex algorithms written in PHP.

Each time the cron job runs, the website is almost down, because some queries
have to update the FULL table...

Why is it updating the full table of 400 GB if it only changes 10,000 rows?

If most of the rows are being updated degenerately (they update the column to have the same value it already has), then just add a WHERE clause to filter out those degenerate updates, unless the degenerate update is needed for locking purposes, which is rare.

Therefore, I'm asking whether it's possible to duplicate my main database on a
slave server in order to run this cron job on the second server... and then
replicate those changes back to the main database (master).

Sounds like you are trying to use a bulldozer to change a lightbulb.

Improving queries (including the effect running some queries has on the entire system) starts with "EXPLAIN (ANALYZE, BUFFERS)", not with multi-master replication.

Cheers,

Jeff



--
Melvin Davidson
I reserve the right to fantasize.  Whether or not you
wish to share my fantasy is entirely up to you.

Re: Which replication is the best for our case ?

From:
John R Pierce
Date:
On 6/29/2015 10:41 AM, Melvin Davidson wrote:
> I think it would help immensely if you provided details such as the
> table structure, indexes, the actual UPDATE query, and the reason all
> rows of the table must be updated.
>

indeed, the whole model of massaging the entire database every 10
minutes is highly suspect.


--
john r pierce, recycling bits in santa cruz



Re: Which replication is the best for our case ?

From:
"ben.play"
Date:
Hi guys,

Thank you very much for your answers.

In fact, I tried to write the simplest explanation of my problem in order to
be understood...
My project is developed with Symfony and Doctrine (yuck, I know...).

The project is more than 2 years old, and Doctrine makes some bad decisions
and locks the whole table for a while.
We are rewriting the project without Doctrine, but that will not be ready
for another year...

To be more precise: we have a database with more than 400 GB of data and ONE
table with more than 100 GB. This is huge for Doctrine. When the cron job
runs, it writes a lot to disk in temporary files (although we have 128 GB of
RAM...). Of course, every table is properly indexed...

That is why I'm thinking about replication: my server A (master) would be for
my users... and my server B would be reserved for the calculations (and it is
this server B that writes to the database).

This is an image of my dream system:
<http://postgresql.nabble.com/file/n5855916/9e41ce1f-38ea-4fba-a437-a43c598e655c.jpg>
(If you can't see the image:
http://tof.canardpc.com/view/9e41ce1f-38ea-4fba-a437-a43c598e655c.jpg)

Thank you very much for your help!





Re: Which replication is the best for our case ?

From:
Arthur Silva
Date:

On Tue, Jun 30, 2015 at 1:57 PM, ben.play <benjamin.cohen@playrion.com> wrote:
Hi guys,

Thank you very much for your answers.

In fact, I tried to write the simplest explanation of my problem in order to
be understood...
My project is developed with Symfony and Doctrine (yuck, I know...).

The project is more than 2 years old, and Doctrine makes some bad decisions
and locks the whole table for a while.
We are rewriting the project without Doctrine, but that will not be ready
for another year...

To be more precise: we have a database with more than 400 GB of data and ONE
table with more than 100 GB. This is huge for Doctrine. When the cron job
runs, it writes a lot to disk in temporary files (although we have 128 GB of
RAM...). Of course, every table is properly indexed...

That is why I'm thinking about replication: my server A (master) would be for
my users... and my server B would be reserved for the calculations (and it is
this server B that writes to the database).

This is an image of my dream system:
<http://postgresql.nabble.com/file/n5855916/9e41ce1f-38ea-4fba-a437-a43c598e655c.jpg>
(If you can't see the image:
http://tof.canardpc.com/view/9e41ce1f-38ea-4fba-a437-a43c598e655c.jpg)

Thank you very much for your help!





Hello,

Streaming replication will do just fine from ServerA to ServerB, but as for the rest of the data flow I'm afraid we will need more details.


Re: Which replication is the best for our case ?

From:
John R Pierce
Date:
On 6/30/2015 9:57 AM, ben.play wrote:
> To be more precise: we have a database with more than 400 GB of data and ONE
> table with more than 100 GB. This is huge for Doctrine. When the cron job
> runs, it writes a lot to disk in temporary files (although we have 128 GB of
> RAM...). Of course, every table is properly indexed...
>
> That is why I'm thinking about replication: my server A (master) would be for
> my users... and my server B would be reserved for the calculations (and it is
> this server B that writes to the database).
>
> This is an image of my dream system:
> <http://postgresql.nabble.com/file/n5855916/9e41ce1f-38ea-4fba-a437-a43c598e655c.jpg>
> (If you can't see the image:
> http://tof.canardpc.com/view/9e41ce1f-38ea-4fba-a437-a43c598e655c.jpg)


what happens when master A continues to update/insert into these tables
that your cron job is batch-updating on the offline copy? How would you
merge those changes in?



--
john r pierce, recycling bits in santa cruz



Re: Which replication is the best for our case ?

From:
"ben.play"
Date:
In fact, the cron job will:
-> select about 10,000 rows from a big table (>100 GB of data); one user has
about 10 rows
-> each row will be examined by an algorithm
-> at the end of each row, the cron job updates a few parameters for the
user (adds some points, for example)
-> then, it inserts a row into another table to record each transaction for
the user
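A rough sketch of what one batch amounts to, in SQL (the table and column
names are invented for illustration; the per-row computation itself stays in
PHP):

BEGIN;

-- take only the ~10,000 rows this run is responsible for, locking just them
SELECT id, user_id, payload
  FROM big_table
 WHERE processed = false
 ORDER BY id
 LIMIT 10000
   FOR UPDATE;

-- then, for each row, once the PHP algorithm has produced its result:
UPDATE users
   SET points = points + 12            -- value computed by the algorithm
 WHERE id = 1234;                      -- the user of the current row

INSERT INTO user_transactions (user_id, points_delta, created_at)
VALUES (1234, 12, now());

UPDATE big_table SET processed = true WHERE id = 98765;   -- the current row

COMMIT;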

All of these updates and inserts are made ONLY by the cron job...
Therefore the merge can be done easily: nobody else can update this new data.

But... how do big companies like Facebook or YouTube run calculations on
dedicated servers without impacting users?





Re: Which replication is the best for our case ?

From:
John R Pierce
Date:
On 7/1/2015 3:08 AM, ben.play wrote:
> In fact, the cron job will:
> -> select about 10,000 rows from a big table (>100 GB of data); one user has
> about 10 rows
> -> each row will be examined by an algorithm
> -> at the end of each row, the cron job updates a few parameters for the
> user (adds some points, for example)
> -> then, it inserts a row into another table to record each transaction for
> the user
>
> All of these updates and inserts are made ONLY by the cron job...
> Therefore the merge can be done easily: nobody else can update this new data.
>
> But... how do big companies like Facebook or YouTube run calculations on
> dedicated servers without impacting users?

that sort of batch processing is not normally done in database-centric
systems; rather, databases are usually updated continuously in real time
as the events come in via transactions.

your cron task is undoubtedly single-threaded, which means it runs on one
core only, so the whole system ends up waiting on a single task crunching
massive amounts of data while your other processor cores have nothing to do.
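for what it's worth, one common way to spread a batch like this over several
cores (only a rough, hypothetical sketch; it assumes different users' rows are
independent, and all names are invented) is to partition the work by key and
run one worker per partition:

-- start four copies of the cron job; worker k (k = 0..3) selects only its slice
SELECT id, user_id, payload
  FROM big_table
 WHERE processed = false
   AND user_id % 4 = 0          -- replace 0 with this worker's k
 ORDER BY id
 LIMIT 2500
   FOR UPDATE;

since each user_id lands in exactly one partition, the workers never touch the
same user rows and so never contend for the same locks.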

it sounds to me like whoever designed this system didn't have a solid
grip on transactional database processing.



--
john r pierce, recycling bits in santa cruz



Re: Which replication is the best for our case ?

From:
Arthur Silva
Date:

On Wed, Jul 1, 2015 at 7:08 AM, ben.play <benjamin.cohen@playrion.com> wrote:
In fact, the cron job will:
-> select about 10,000 rows from a big table (>100 GB of data); one user has
about 10 rows
-> each row will be examined by an algorithm
-> at the end of each row, the cron job updates a few parameters for the
user (adds some points, for example)
-> then, it inserts a row into another table to record each transaction for
the user

All of these updates and inserts are made ONLY by the cron job...
Therefore the merge can be done easily: nobody else can update this new data.

But... how do big companies like Facebook or YouTube run calculations on
dedicated servers without impacting users?





I'm assuming this query is really HUGE,
otherwise I can't see why it'd bring your database to a halt, especially with that amount of main memory.

That aside, I don't see why you can't send the inserts in small batches back to the master DB.
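One hedged sketch of how that write-back could be wired, assuming server B is
a separate, writable copy of the data rather than a read-only hot standby, and
using postgres_fdw (available in 9.3); every object name below is invented:

-- on server B (the calculation server), once:
CREATE EXTENSION IF NOT EXISTS postgres_fdw;
CREATE SERVER master_a FOREIGN DATA WRAPPER postgres_fdw
    OPTIONS (host 'server-a.example.com', dbname 'app');
CREATE USER MAPPING FOR cron_user SERVER master_a
    OPTIONS (user 'cron_user', password 'secret');
CREATE FOREIGN TABLE master_user_transactions (
    user_id      bigint,
    points_delta integer,
    created_at   timestamptz
) SERVER master_a OPTIONS (table_name 'user_transactions');

-- after each cron run, push the results back in small, short transactions:
BEGIN;
INSERT INTO master_user_transactions (user_id, points_delta, created_at)
SELECT user_id, points_delta, created_at
  FROM local_batch_results
 WHERE batch_id = 42;                  -- one small batch per transaction
COMMIT;

Whether the point updates on the users table go back the same way or through
the application is a separate decision.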

Regards.