Discussion: Replicating hundreds of thousands of rows


Replicating hundreds of thousands of rows

From: Job
Date:
Hello,

we need to replicate hundreds of thousands of rows (for reporting)
between PostgreSQL database nodes that are in different locations.

Currently, we use Rubyrep with PostgreSQL 8.4.22.
It works fine, but it is very slow with massive numbers of rows.

With PostgreSQL 9.x, are there ways to replicate these quantities of
data (in the background, not in real time)?
We need periodic synchronization.

Thank you!
/F

Re: Replicating hundreds of thousands of rows

From: Simon Riggs
Date:
On 25 November 2016 at 06:23, Job <Job@colliniconsulting.it> wrote:
> Hello,
>
> we need to replicate hundreds of thousands of rows (for reporting)
> between PostgreSQL database nodes that are in different locations.
>
> Currently, we use Rubyrep with PostgreSQL 8.4.22.

8.4 is now end-of-life. You should move to the latest version.

> It works fine, but it is very slow with massive numbers of rows.
>
> With PostgreSQL 9.x, are there ways to replicate these quantities of
> data (in the background, not in real time)?
> We need periodic synchronization.

You have a choice of:

* Physical streaming replication, built in from 9.0+
* Logical streaming replication, partially built in from 9.4+ using pglogical
* Logical streaming replication, built in from 10.0+ (not yet released)

Performance is much better than Rubyrep's.
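
For example, here is a minimal sketch of the first option, physical
streaming replication, on 9.x. The hostnames, data directory and the
"repuser" role are placeholders, not taken from your setup:

    # On the primary, enable WAL streaming (postgresql.conf):
    #   wal_level = hot_standby
    #   max_wal_senders = 3
    # and allow the standby in pg_hba.conf:
    #   host  replication  repuser  <standby_ip>/32  md5

    # On the standby, clone the primary (pg_basebackup ships with
    # 9.1+, and -X stream needs 9.2+):
    pg_basebackup -h primary.example.com -U repuser \
        -D /var/lib/postgresql/9.5/main -X stream -P

    # Point the standby at the primary (recovery.conf):
    #   standby_mode = 'on'
    #   primary_conninfo = 'host=primary.example.com user=repuser'
    # and, since you want it for reporting, let it serve read-only
    # queries (postgresql.conf on the standby):
    #   hot_standby = on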

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


Re: Replicating hundreds of thousands of rows

From: Moreno Andreo
Date:
On 25/11/2016 15:23, Job wrote:
> Hello,
Hello, we have been using Rubyrep with PostgreSQL 9.1 and 9.5 for
(almost) real-time synchronization for five years now.
Unfortunately, it seems to be a dead project, so we had to find a good
JRuby developer to make some customizations.
> we need to replicate hundreds of thousands of rows (for reporting)
> between PostgreSQL database nodes that are in different locations.
First: which OS are you using? If not Windows, I'd follow Simon's
directions... we are still using Rubyrep because one of the two hosts
is Windows-based; otherwise we'd have chosen something quicker and with
a smaller memory footprint (the Java VM in some cases eats up an entire
gigabyte of RAM on a client).

Then: are there any bytea fields in your records? If so, with the
current version you can go up to PostgreSQL 9.0.x.
In 9.1 there was a change in bytea handling, and there's a risk that,
once replicated, your data could be corrupted due to backslash
duplication (double-escaping) of bytea fields (I went through this
about four years ago, but I've forgotten most of the details).
We had to modify some modules to avoid this, and it's now running fine
with 9.5.
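
If I remember correctly, it may be related to the bytea_output setting,
which appeared in 9.0 with a default of 'hex' (clients that predate it
only understand the old 'escape' format). A quick check, and a way to
force the old format per database ("mydb" is just a placeholder name):

    psql -d mydb -c "SHOW bytea_output;"
    psql -d mydb -c "ALTER DATABASE mydb SET bytea_output = 'escape';"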

Last: is there a good connection between the hosts involved in
replication? If not (I mean an unstable connection), you should write
your own code to restart synchronization when the line drops.
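
Even a trivial wrapper script can do; a rough sketch (the config file
name is just an example):

    #!/bin/sh
    # Restart rubyrep whenever it exits (e.g. because the line dropped).
    while true; do
        rubyrep replicate -c myrubyrep.conf
        echo "rubyrep exited, restarting in 30 seconds..." >&2
        sleep 30
    done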

> Actually, we use Rubyrep with Postgresql 8.4.22.
> It works fine but it is very slow with a massive numbers of rows.
You're lucky it works...
We encountered several out-of-memory exceptions during replication when
xx_pending_changes was filled with more than 300-400k records.
> With Postgresql 9.x, are there some ways to replicate (in background, not in real time!), these quantities of data?
> We need a periodical syncronization..,
Again, if you need Rubyrep, I'd think about using only the SYNC command
when you need it, with a good configuration to avoid keeping old data
and overwriting newer data.
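
For instance, instead of running continuous replication you could fire
a one-shot sync from cron; a sketch, assuming your config lives in
/etc/myrubyrep.conf (adjust paths to taste):

    # crontab entry: run a sync every night at 02:00
    0 2 * * * rubyrep sync -c /etc/myrubyrep.conf
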
Otherwise, in *nix environments you can benefit from many replication
tools, even ones not tied to PostgreSQL, like Londiste, Bucardo and so
on (but if I were you, I'd choose something that is very close to the
PostgreSQL developers...).
>
> Thank you!
> /F
>