Обсуждение: [GENERAL] Question regarding logical replication

Поиск
Список
Период
Сортировка

[GENERAL] Question regarding logical replication

От
Weiping Qu
Дата:
Dear postgresql community,

I have a question regarding understanding the implementation logic 
behind logical replication.

Assume a replication slot created on the master node, will more and more 
data get piled up in the slot and the size of replication slot 
continuously increase if there is no slave reading/dequeuing data out of 
this slot or very slowly, thus incurring high I/Os and slow down the 
transaction throughput?

Looking forward to your explanation.


Kindly review and please share your comments on this matter.




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Question regarding logical replication

От
Alvaro Aguayo Garcia-Rada
Дата:
Hi. I've had experience with both BDR & pglogical. For each replication slot, postgres saves a LSN which points to the
lastxlog entry read by the client. When a client does not reads xlog, for example, if it cannot connect to the server,
thenthe distance between such LSN(pg_replication_slots.restart_lsn) and the current xlog
location(pg_current_xlog_insert_location())will enlarge over the time. Not sure about the following, but postgres will
notclear old xlog entries which are still pending to be read on any replication slot. Such situation may also happen,
inlower degree, if the client cannot read WAL as fast as it's produced. Anyhow, what will happen is xlog will grow more
andmore. However, that will probably not impact performance, as xlog is written anyway. But if you don't have enough
freespace, you could get your partition full of xlog.
 

Regards,

Alvaro Aguayo
Operations Manager
Open Comb Systems E.I.R.L.

Office: (+51-1) 3377813 | Mobile: (+51) 995540103 | (+51) 954183248
Web: www.ocs.pe

----- Original Message -----
From: "Weiping Qu" <qu@informatik.uni-kl.de>
To: "PostgreSql-general" <pgsql-general@postgresql.org>
Sent: Thursday, 26 October, 2017 14:07:54
Subject: [GENERAL] Question regarding logical replication

Dear postgresql community,

I have a question regarding understanding the implementation logic 
behind logical replication.

Assume a replication slot created on the master node, will more and more 
data get piled up in the slot and the size of replication slot 
continuously increase if there is no slave reading/dequeuing data out of 
this slot or very slowly, thus incurring high I/Os and slow down the 
transaction throughput?

Looking forward to your explanation.


Kindly review and please share your comments on this matter.




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Question regarding logical replication

От
Weiping Qu
Дата:
Hi. Thank you very much for such detailed explanation. :)
We are currently testing the overhead of log-based Change Data Capture 
method (i.e. logical decoding) over Postgresql.
The test setting consists of one processing running TPC-C on node1, 
which issued transactions against a database residing on node2, which is 
accessed by three CDC processes running on three different nodes, say 
node3-5.
The difference between three CDC processes are only different table sets 
they are interested while the commonness is that each of they would 
sleep periodically during keeping capturing changes from node2.

We always measured the impact of CDC on original TPC-C workload by 
looking into the transaction throughput on node1.
We selected 0ms, 200ms and 400ms as three different sleeping periods for 
CDC processes to control their impact on node2.
We expect that the longer a sleeping period is set for CDC, the less 
impact is incurred over Postgresql, since less I/Os are triggered to 
fetch data from xlog.
However, the plots showed different trend (currently I don't have plots 
on my laptop) which shows that the more frequently are the CDC processes 
reading from logical slots, the less overhead is incurred over 
PostgreSQL, which leads to higher throughput.

That's the reason why I asked the previous question, whether logical 
slot is implemented as queue.
Without continuous dequeuing the "queue" get larger and larger, thus 
lowering the OLTP workload.

Regards;

Weiping


On 26.10.2017 21:42, Alvaro Aguayo Garcia-Rada wrote:
> Hi. I've had experience with both BDR & pglogical. For each replication slot, postgres saves a LSN which points to
thelast xlog entry read by the client. When a client does not reads xlog, for example, if it cannot connect to the
server,then the distance between such LSN(pg_replication_slots.restart_lsn) and the current xlog
location(pg_current_xlog_insert_location())will enlarge over the time. Not sure about the following, but postgres will
notclear old xlog entries which are still pending to be read on any replication slot. Such situation may also happen,
inlower degree, if the client cannot read WAL as fast as it's produced. Anyhow, what will happen is xlog will grow more
andmore. However, that will probably not impact performance, as xlog is written anyway. But if you don't have enough
freespace, you could get your partition full of xlog.
 
>
> Regards,
>
> Alvaro Aguayo
> Operations Manager
> Open Comb Systems E.I.R.L.
>
> Office: (+51-1) 3377813 | Mobile: (+51) 995540103 | (+51) 954183248
> Web: www.ocs.pe
>
> ----- Original Message -----
> From: "Weiping Qu" <qu@informatik.uni-kl.de>
> To: "PostgreSql-general" <pgsql-general@postgresql.org>
> Sent: Thursday, 26 October, 2017 14:07:54
> Subject: [GENERAL] Question regarding logical replication
>
> Dear postgresql community,
>
> I have a question regarding understanding the implementation logic
> behind logical replication.
>
> Assume a replication slot created on the master node, will more and more
> data get piled up in the slot and the size of replication slot
> continuously increase if there is no slave reading/dequeuing data out of
> this slot or very slowly, thus incurring high I/Os and slow down the
> transaction throughput?
>
> Looking forward to your explanation.
>
>
> Kindly review and please share your comments on this matter.
>
>
>
>



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Question regarding logical replication

От
Francisco Olarte
Дата:
On Thu, Oct 26, 2017 at 10:20 PM, Weiping Qu <qu@informatik.uni-kl.de> wrote:

> However, the plots showed different trend (currently I don't have plots on
> my laptop) which shows that the more frequently are the CDC processes
> reading from logical slots, the less overhead is incurred over PostgreSQL,
> which leads to higher throughput.

Have you accounted for disk caching? Your CDC may be getting log from
the cache when going with little lag but being forced to read from
disk (make the server do it ) when it falls behind.

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Question regarding logical replication

От
Weiping Qu
Дата:
That's a good point and we haven't accounted for disk caching.
Is there any way to confirm this fact in PostgreSQL?

Weiping


On 27.10.2017 11:53, Francisco Olarte wrote:
> On Thu, Oct 26, 2017 at 10:20 PM, Weiping Qu <qu@informatik.uni-kl.de> wrote:
>
>> However, the plots showed different trend (currently I don't have plots on
>> my laptop) which shows that the more frequently are the CDC processes
>> reading from logical slots, the less overhead is incurred over PostgreSQL,
>> which leads to higher throughput.
> Have you accounted for disk caching? Your CDC may be getting log from
> the cache when going with little lag but being forced to read from
> disk (make the server do it ) when it falls behind.
>
> Francisco Olarte.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Question regarding logical replication

От
Francisco Olarte
Дата:
On Fri, Oct 27, 2017 at 12:04 PM, Weiping Qu <qu@informatik.uni-kl.de> wrote:
> That's a good point and we haven't accounted for disk caching.
> Is there any way to confirm this fact in PostgreSQL?

I doubt, as it names indicates cache should be hidden from the db server.

You could monitor the machine with varying lags and see the disk-cache
hit ratio , or monitor the throughput loss, a disk-cache effect should
exhibit a constant part for little lags, where you mostly do cache
reads, then a rising part as you begin reading from disks stabilizing
asyntotically ( as most of the fraction of reads comes from disk, but
it could also exhibit a jump if you are unlucky and you evict pages
you'll need soon ), but it is not a simple thing to measure, specially
with a job mix and long delays.

The xlog can do strange things. IIRC it is normally write-only ( only
used on crash recovery, to archive (ship) it and for log based
replication slots ), but postgres recycles segments ( which can have
an impact on big memory machines ). I do not know to what extent a
modern OS can detect the access pattern and do things like evict the
log pages early after sync.

Francisco Olarte.


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] Question regarding logical replication

От
Weiping Qu
Дата:
Thanks, Francisco. From the plots we got the same feeling, cache reads 
with little lags and high cache hits really don't put extra burden on 
the original write throughput for OLTP transactions. And log-based is 
the most efficient and harm-less one as compared to trigger-based and 
timestamp based change data capture.

Weiping


On 27.10.2017 14:03, Francisco Olarte wrote:
> On Fri, Oct 27, 2017 at 12:04 PM, Weiping Qu <qu@informatik.uni-kl.de> wrote:
>> That's a good point and we haven't accounted for disk caching.
>> Is there any way to confirm this fact in PostgreSQL?
> I doubt, as it names indicates cache should be hidden from the db server.
>
> You could monitor the machine with varying lags and see the disk-cache
> hit ratio , or monitor the throughput loss, a disk-cache effect should
> exhibit a constant part for little lags, where you mostly do cache
> reads, then a rising part as you begin reading from disks stabilizing
> asyntotically ( as most of the fraction of reads comes from disk, but
> it could also exhibit a jump if you are unlucky and you evict pages
> you'll need soon ), but it is not a simple thing to measure, specially
> with a job mix and long delays.
>
> The xlog can do strange things. IIRC it is normally write-only ( only
> used on crash recovery, to archive (ship) it and for log based
> replication slots ), but postgres recycles segments ( which can have
> an impact on big memory machines ). I do not know to what extent a
> modern OS can detect the access pattern and do things like evict the
> log pages early after sync.
>
> Francisco Olarte.



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general