Re: [HACKERS] make async slave to wait for lsn to be replayed

From: Ivan Kartyshov
Subject: Re: [HACKERS] make async slave to wait for lsn to be replayed
Msg-id: c585d9a0-5bda-778e-e628-dd30959d71d3@postgrespro.ru
In reply to: Re: make async slave to wait for lsn to be replayed  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses: Re: [HACKERS] make async slave to wait for lsn to be replayed  (Thom Brown <thom@linux.com>)
Re: [HACKERS] make async slave to wait for lsn to be replayed  (Alexander Korotkov <a.korotkov@postgrespro.ru>)
List: pgsql-hackers

Thank you for the reviews and suggested improvements.
I rewrote the patch to make it more stable.

Changes
=======
I've made a few changes:
1) WAITLSN no longer depends on a snapshot.
2) Check the current replayed LSN rather than checking in xact_redo_commit.
3) Add the syntax WAITLSN_INFINITE '0/693FF800' for an infinite wait and
WAITLSN_NO_WAIT '0/693FF800' to check whether an LSN has been replayed,
as you advised.
4) Reduce the number of loops with GUCs (WalRcvForceReply(), which
doesn't exist in 9.5).
5) Optimize the loop that sets latches.
6) Add two GUCs that let us configure the influence on StartupXLOG:
count_waitlsn (denominator, so that not every LSN is checked)
interval_waitlsn (interval in milliseconds for an additional LSN check)
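
As an illustration of how the two GUCs in item 6 might interact, here is a minimal Python sketch (not the patch's C code). It assumes the replay loop checks waiters either on every count_waitlsn-th replayed record or once interval_waitlsn milliseconds have passed since the last check, whichever comes first. The class and method names are hypothetical.

```python
import time


class WaitLsnThrottle:
    """Hypothetical model: decide when the replay loop should check waiters."""

    def __init__(self, count_waitlsn, interval_waitlsn_ms, clock=time.monotonic):
        self.count_waitlsn = count_waitlsn
        self.interval_s = interval_waitlsn_ms / 1000.0
        self.clock = clock                  # injectable so the model is testable
        self.records_since_check = 0
        self.last_check = clock()

    def should_check(self):
        """Call once per replayed record; True means 'check the waiters now'."""
        self.records_since_check += 1
        now = self.clock()
        if (self.records_since_check >= self.count_waitlsn
                or now - self.last_check >= self.interval_s):
            # Either trigger resets both counters (an assumption of this sketch).
            self.records_since_check = 0
            self.last_check = now
            return True
        return False
```

Under these assumptions, a larger count_waitlsn reduces the work done per replayed record, while interval_waitlsn bounds how long a waiter can go unnoticed when replay is slow.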

Feedback
========
On 09/15/2016 05:41 AM, Thomas Munro wrote:
> You hold a spinlock in one arbitrary slot, but that
> doesn't seem sufficient: another backend may also read it, compute a
> new value and then write it, while holding a different spin lock.  Or
> am I missing something?

We acquire an individual spinlock on each member of the array, so a new
value cannot be computed and written concurrently.
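
The per-member locking described in this reply can be pictured with a small Python model, where threading.Lock stands in for a spinlock and each array slot carries its own lock. The slot layout and helper names are assumptions for illustration, not the patch's data structures. In this sketch the minimum waited-for LSN is recomputed by scanning the slots rather than kept as a shared cached value, which is also an assumption.

```python
import threading


class WaitSlot:
    """One array member: a waited-for LSN guarded by its own lock."""

    def __init__(self):
        self.lock = threading.Lock()   # stand-in for a per-slot spinlock
        self.waited_lsn = None         # None means the slot is unused


def set_waited_lsn(slots, index, lsn):
    # Only the slot being written is locked; other slots stay available.
    with slots[index].lock:
        slots[index].waited_lsn = lsn


def min_waited_lsn(slots):
    """Scan all slots, holding each slot's lock only while reading it."""
    result = None
    for slot in slots:
        with slot.lock:
            lsn = slot.waited_lsn
        if lsn is not None and (result is None or lsn < result):
            result = lsn
    return result
```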

Tested
======
We have tested it on different servers and OSes, in different cases
and workloads. The new version is nearly as fast as vanilla on the primary
and has only a tiny influence on standby performance.

Hardware:
144 Intel Cores with HT
3TB RAM
all data on ramdisk
primary + hot standby on the same node.

A dataset was created with the "pgbench -i -s 1000" command. For each round
of testing we pause replay on the standby, run 1000000 transactions on the
primary with pgbench, resume replay on the standby and measure how long the
replication gap takes to disappear under different standby workloads. The
workload was "WAITLSN ('Very/FarLSN', 1000ms timeout)" followed by
"select abalance from pgbench_accounts where aid = random_aid;"
For vanilla, the 1000ms timeout was enforced on the pgbench side with the -R option.
The waitlsn GUC parameters were adapted for a 1000ms timeout on the standby
with a 35000 tps rate on the primary:
interval_waitlsn = 500 (ms)
count_waitlsn = 30000

With 200 clients, the slave catches up with the master as fast as vanilla,
without significant delay.
With 500 clients, the slave catches up with the master 3% slower than vanilla.
With 1000 clients, 12% slower.
With 5000 clients, 3 times slower, because that is far beyond our hardware's capacity.

How to use it
=============
WAITLSN 'LSN' [, timeout in ms];
WAITLSN_INFINITE 'LSN';
WAITLSN_NO_WAIT 'LSN';

# Wait until LSN 0/303EC60 is replayed, or 10 seconds have passed.
WAITLSN '0/303EC60', 10000;

# Or the same without a timeout.
WAITLSN '0/303EC60';
or
WAITLSN_INFINITE '0/693FF800';

# To check whether an LSN has been replayed, use:
WAITLSN_NO_WAIT '0/693FF800';
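
For reference when comparing LSN values like the ones above: PostgreSQL prints an LSN as two hexadecimal numbers separated by a slash, the high and low 32 bits of a 64-bit WAL position. The helper below is a client-side Python sketch with hypothetical names (parse_lsn, is_replayed); it is not part of the patch.

```python
def parse_lsn(text):
    """Convert an LSN such as '0/303EC60' to a 64-bit integer."""
    high, low = text.split("/")
    return (int(high, 16) << 32) | int(low, 16)


def is_replayed(replayed_lsn, target_lsn):
    """True when the replay position has reached or passed the target."""
    return parse_lsn(replayed_lsn) >= parse_lsn(target_lsn)
```

Comparing the parsed integers avoids the pitfalls of comparing the textual forms, which do not sort correctly as strings.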

Notice: WAITLSN will be released on PostmasterDeath or interruption events
if they come earlier than the target LSN or the timeout.
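
The three forms and the early-release rule can be modelled in a few lines of Python. This is a sketch under assumptions, not the patch's implementation: a threading.Condition stands in for the latch mechanism, LSNs are plain integers, and ReplayTracker, advance, interrupt and wait_for_lsn are hypothetical names. A timeout of None models WAITLSN_INFINITE and a timeout of 0 models WAITLSN_NO_WAIT.

```python
import threading


class ReplayTracker:
    """Hypothetical model of waiting for a replay position."""

    def __init__(self):
        self._cond = threading.Condition()
        self._replayed = 0
        self._interrupted = False

    def advance(self, lsn):
        """Called by the (modelled) replay loop when it reaches a new LSN."""
        with self._cond:
            self._replayed = max(self._replayed, lsn)
            self._cond.notify_all()

    def interrupt(self):
        """Models an interruption that releases all waiters early."""
        with self._cond:
            self._interrupted = True
            self._cond.notify_all()

    def wait_for_lsn(self, target, timeout_ms=None):
        """Return True if target was replayed, False on timeout or interrupt."""
        timeout = None if timeout_ms is None else timeout_ms / 1000.0
        with self._cond:
            self._cond.wait_for(
                lambda: self._replayed >= target or self._interrupted,
                timeout=timeout,
            )
            return self._replayed >= target
```

The return value depends only on whether the target was reached, so a caller released by a timeout or an interruption sees the same False result in this model.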


Thank you for reading; I will be glad to get your feedback.

-- 
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
