Re: [HACKERS] WIP: long transactions on hot standby feedback replica/ proof of concept

Thank you for your valuable comments. I've made a few adjustments.

The main goal of my changes is to let long read-only transactions run on a
replica when hot_standby_feedback is turned on.


Patch1 - hsfeedback_av_truncate.patch prevents 
ResolveRecoveryConflictWithLock from firing on the replica after autovacuum 
lazily truncates the heap on the master, cutting some pages at the end. When 
hot_standby_feedback is on, we know that autovacuum does not remove 
anything that could still be needed on the standby, so there is no 
need to raise any ResolveRecoveryConflict*.

1) Add an isautovacuum flag to xl_standby_locks and xl_smgr_truncate, which 
tells us that the record was generated by autovacuum.

2) When autovacuum decides to trim the table (using lazy_truncate_heap), 
it takes AccessExclusiveLock and sends this lock to the replica, but 
the replica should ignore the AccessExclusiveLock if hot_standby_feedback=on.

3) When the autovacuum truncate WAL message is replayed on the replica, it 
takes an ExclusiveLock on the table instead, so as not to interfere with 
read-only queries.
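
To make 1)-2) concrete, here is a minimal sketch of what it could look like
on the standby side. This is only an illustration of the idea described
above, not the patch itself: the field name isautovacuum, its placement in
xl_standby_locks, and the helper standby_redo_lock_record() are assumptions;
hot_standby_feedback refers to the standby-side GUC.

/* Sketch only: the standby-lock WAL record carrying the proposed flag. */
typedef struct xl_standby_lock
{
    TransactionId xid;      /* xid of the holder of the AccessExclusiveLock */
    Oid         dbOid;      /* database of the locked relation */
    Oid         relOid;     /* OID of the locked relation */
} xl_standby_lock;

typedef struct xl_standby_locks
{
    int         nlocks;         /* number of entries in locks array */
    bool        isautovacuum;   /* proposed flag: record generated by autovacuum */
    xl_standby_lock locks[FLEXIBLE_ARRAY_MEMBER];
} xl_standby_locks;

/*
 * Sketch of the redo side (hypothetical helper): if the locks were taken by
 * autovacuum for lazy_truncate_heap() and hot_standby_feedback is on, skip
 * acquiring the AccessExclusiveLock so read-only queries are not cancelled.
 */
static void
standby_redo_lock_record(xl_standby_locks *xlrec)
{
    int         i;

    if (xlrec->isautovacuum && hot_standby_feedback)
        return;                 /* nothing the standby still needs is removed */

    for (i = 0; i < xlrec->nlocks; i++)
        StandbyAcquireAccessExclusiveLock(xlrec->locks[i].xid,
                                          xlrec->locks[i].dbOid,
                                          xlrec->locks[i].relOid);
}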

There are two ways ResolveRecoveryConflictWithLock is resolved once the 
timers (max_standby_streaming_delay and max_standby_archive_delay) have run 
out:
the backend is idle in transaction (waiting for input) - in this case the 
backend is sent SIGTERM;
the backend is running a query - in this case the running transaction is 
aborted.

How to test:
Make an async replica, turn feedback on, and reduce 
max_standby_streaming_delay.
Make autovacuum more aggressive:
autovacuum = on
autovacuum_max_workers = 1
autovacuum_naptime = 1s
autovacuum_vacuum_threshold = 1
autovacuum_vacuum_cost_delay = 0

Test1:
Here we put load on the master and simulate a long transaction with 
repeated 1-second SEQSCANs on the replica (by calling pg_sleep with a 
1-second duration every 6 seconds).
MASTER        REPLICA
     hot_standby = on
     max_standby_streaming_delay = 1s
     hot_standby_feedback = on
start
CREATE TABLE test AS (SELECT id, 1 AS value
FROM generate_series(1,1) id);
pgbench -T600 -P2 -n --file=master.sql postgres
(update test set value = value;)
     start
     BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
     SELECT pg_sleep(value) FROM test;
     \watch 6

---Autovacuum truncates pages at the end
Result on replica:
FATAL: terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.

On the patched version the lazy vacuum truncation passes without fatal errors.

Only occasionally does an error occur, because this test is too synthetic:
ERROR: canceling statement due to conflict with recovery
DETAIL: User was holding shared buffer pin for too long.
It is caused by ResolveRecoveryConflictWithSnapshot being raised while 
redoing some visibility flags; to avoid this conflict we can run Test2 
instead or increase max_standby_streaming_delay.

Test2:
Here we put load on the master and simulate a long transaction on the 
replica (by taking a LOCK on the table).
MASTER        REPLICA
     hot_standby = on
     max_standby_streaming_delay = 1s
     hot_standby_feedback = on
start
CREATE TABLE test AS (SELECT id, 1 AS value FROM generate_series(1,1) 
id);
pgbench -T600 -P2 -n --file=master.sql postgres
(update test set value = value;)
     start
     BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
     LOCK TABLE test IN ACCESS SHARE MODE;
     select * from test;
     \watch 6

---Autovacuum truncates pages at the end
Result on replica:
FATAL: terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.

On the patched version the lazy vacuum truncation passes without fatal errors.

Test3:
Here we put load on the master and simulate a long transaction with one 
long SEQSCAN on the replica (by calling pg_sleep with a 200-second 
duration).
MASTER        REPLICA
     hot_standby = on
     max_standby_streaming_delay = 4s
     hot_standby_feedback = on
start
CREATE TABLE test AS (SELECT id, 200 AS value
FROM generate_series(1,1) id);
pgbench -T600 -P2 -n --file=master.sql postgres
(update test set value = value;)
     start
     BEGIN TRANSACTION ISOLATION LEVEL READ COMMITTED;
     SELECT pg_sleep(value) FROM test;

---Autovacuum truncates pages at the end
Result on replica:
FATAL: terminating connection due to conflict with recovery
DETAIL: User was holding a relation lock for too long.

On the patched version the lazy vacuum truncation passes without fatal errors.

In the same way we can run transactions with SEQSCAN, INDEXSCAN or BITMAPSCAN.


Patch2 - hsfeedback_noninvalide_xmin.patch
When the walsender is initialized, its xmin in the PROCARRAY is set to 
GetOldestXmin() in order to prevent autovacuum running on the master from 
truncating a relation and removing pages that are still required by the 
replica. This might happen if the master's autovacuum and the replica's 
query start simultaneously, before the replica has reported its xmin 
value.
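
For illustration, the idea could look roughly like this in the walsender 
startup path. This is a sketch under assumptions (the hypothetical helper 
WalSndInitXmin(), the exact hook point, and the GetOldestXmin() signature 
of that era), not necessarily how the patch does it:

/*
 * Sketch only: give the walsender a conservative xmin right at startup,
 * before the first hot_standby_feedback message from the standby arrives.
 */
static void
WalSndInitXmin(void)
{
    if (hot_standby_feedback)
    {
        /*
         * GetOldestXmin() computes a conservative xmin horizon; advertising
         * it in the PROCARRAY keeps vacuum on the master from removing tuple
         * versions (and truncating away pages) that a query just starting on
         * the standby could still need to see.
         */
        MyPgXact->xmin = GetOldestXmin(NULL, PROCARRAY_FLAGS_DEFAULT);
    }
}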

How to test:
Make an async replica, turn feedback on, reduce max_standby_streaming_delay, 
and make autovacuum aggressive:
autovacuum = on
autovacuum_max_workers = 1
autovacuum_naptime = 1s
autovacuum_vacuum_threshold = 1
autovacuum_vacuum_cost_delay = 0

Test:
Here we start the replica and begin a repeatable read transaction on the 
table, then stop the replica's postmaster to prevent it from starting a 
walreceiver worker (on master startup) and sending the master its 
transaction xmin via the hot_standby_feedback message.
MASTER        REPLICA
start
CREATE TABLE test AS (SELECT id, 1 AS value FROM 
generate_series(1,10000000) id);
stop
     start
     BEGIN TRANSACTION ISOLATION LEVEL REPEATABLE READ;
     SELECT * FROM test;
     stop postmaster with gdb
start
DELETE FROM test WHERE id > 0;
wait till autovacuum removes the deleted rows and xmin changes
             release postmaster with gdb
--- Result on replica
FATAL: terminating connection due to conflict with recovery
DETAIL: User query might have needed to see row versions that must be 
removed.

There is one feature of standby behavior that lets us allow autovacuum to 
cut off pages at the end of the relation that no one else needs (because 
they contain only dead, already-removed tuples). If a SEQSCAN or another 
*SCAN on the standby calls mdread() on a page that is damaged or has been 
deleted, it receives a zeroed page, and the query does not fail with an 
ERROR.
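
For reference, the zeroing behavior described here lives in mdread() in 
src/backend/storage/smgr/md.c. Paraphrased (the error message is shortened 
and the surrounding code omitted), the relevant branch is roughly:

/* Short read: the block is past EOF, e.g. it was truncated away. */
if (nbytes != BLCKSZ)
{
    if (zero_damaged_pages || InRecovery)
        MemSet(buffer, 0, BLCKSZ);      /* hand back an all-zeroes page */
    else
        ereport(ERROR,
                (errcode(ERRCODE_DATA_CORRUPTED),
                 errmsg("could not read block %u: read only %d of %d bytes",
                        blocknum, nbytes, BLCKSZ)));
}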

Could you give me your thoughts on these patches?

-- 
Ivan Kartyshov
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company
