Обсуждение: Logical replication performance

Поиск
Список
Период
Сортировка

Logical replication performance

От
Florian Philippon
Дата:
Hello community!

We are currently testing PostgreSQL 11's built-in logical replication. We are trying to initialize a subscriber (from scratch) from a publisher with a large database (+6TB) with around 220 tables.

We tweaked the configuration parameters below, both on publisher and subscriber, in order to minimize the initial copy data phase delay:

- max_replication_slots
- max_wal_senders
- max_wal_size
- max_worker_processes
- max_logical_replication_workers
- max_sync_workers_per_subscription
- max_worker_processes

The two PostgreSQL instances are using the same hardware: 48 vCPU, 384 GB ram, 10GB network and same version of software (PostgreSQL 11.6).

We pre-loaded the full schema of the database (with indexes and constraints) on the subscriber since it's mandatory to have the logical replication working.

However, the initial copy data phase is quite long (+2 days and still running) for largest tables in the database. There is no load on the publisher since it's a staging environment.
We noticed that logical replication workers processes on the subscriber can reach more than 90% CPU usage per worker.

We understand that we cannot have more than one worker per table running but we would like to know if there is anything that could help us to achieve this initial copy phase more quickly.

We tried another solution: we loaded a minimal schema (without indexes and constraints) on the subscriber and created the subscription. The initial copy phase was way faster (a few hours). Then we created indexes and constraints. Is this a suitable solution for production? Will the logical replication flow be buffered by the replication slots during index creation and get in sync afterwards or will it conflict due to locking issues?

Many thanks for your help.

--
Florian Philippon

Re: Logical replication performance

От
Flavio Henrique Araque Gurgel
Дата:


Em sex., 29 de nov. de 2019 às 17:06, Florian Philippon <florian.philippon@doctolib.com> escreveu:
Hello community!

Hi Florian
 
We are currently testing PostgreSQL 11's built-in logical replication. We are trying to initialize a subscriber (from scratch) from a publisher with a large database (+6TB) with around 220 tables.

We tweaked the configuration parameters below, both on publisher and subscriber, in order to minimize the initial copy data phase delay:

- max_replication_slots
- max_wal_senders
- max_wal_size
- max_worker_processes
- max_logical_replication_workers
- max_sync_workers_per_subscription
- max_worker_processes

The two PostgreSQL instances are using the same hardware: 48 vCPU, 384 GB ram, 10GB network and same version of software (PostgreSQL 11.6).

We pre-loaded the full schema of the database (with indexes and constraints) on the subscriber since it's mandatory to have the logical replication working.

However, the initial copy data phase is quite long (+2 days and still running) for largest tables in the database. There is no load on the publisher since it's a staging environment.
We noticed that logical replication workers processes on the subscriber can reach more than 90% CPU usage per worker.

We understand that we cannot have more than one worker per table running but we would like to know if there is anything that could help us to achieve this initial copy phase more quickly.

We tried another solution: we loaded a minimal schema (without indexes and constraints) on the subscriber and created the subscription. The initial copy phase was way faster (a few hours). Then we created indexes and constraints. Is this a suitable solution for production? Will the logical replication flow be buffered by the replication slots during index creation and get in sync afterwards or will it conflict due to locking issues?

 
You can try the pg_dump over a snapshot and use parallel restore (pg_restore -j option) to your initial data load, it should be much faster than an initial sync. Take a look here:

Best,
Flavio

Re: Logical replication performance

От
Jeff Janes
Дата:
On Fri, Nov 29, 2019 at 11:06 AM Florian Philippon <florian.philippon@doctolib.com> wrote:

We tried another solution: we loaded a minimal schema (without indexes and constraints) on the subscriber and created the subscription. The initial copy phase was way faster (a few hours). Then we created indexes and constraints. Is this a suitable solution for production?

This is probably not suitable for production.  Once the COPY is finished, it still has to replicate row-by-row changes to the table rows which occurred since the starting COPY snapshot.  UPDATEs and DELETEs will probably fail due to the lack of indexes on the “replica identity” columns.  This failure will make the entire transaction, including the COPY, roll back to beginning.  So you there will be no point at which you can build the missing indexes without first losing all the work that was done.  If the master was quiescent (at least in regards to UPDATEs and DELETEs) then it there will be no row-by-row changes to apply between the start of the COPY and the start of transactional replication.  In that case, the COPY will have committed before the system discovers the problem with the “replica identity”, giving you an opportunity to go build the index without losing all of the work.

 
Will the logical replication flow be buffered by the replication slots during index creation and get in sync afterwards or will it conflict due to locking issues?

It can't buffer in the middle of the transaction which includes the initial COPY.

Cheers,

Jeff