Re: [HACKERS] Issues with logical replication
От | Petr Jelinek |
---|---|
Тема | Re: [HACKERS] Issues with logical replication |
Дата | |
Msg-id | d0588e65-289c-9768-7a43-2e8ef4d05620@2ndquadrant.com обсуждение исходный текст |
Ответ на | [HACKERS] Issues with logical replication (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>) |
Ответы |
Re: [HACKERS] Issues with logical replication
(Petr Jelinek <petr.jelinek@2ndquadrant.com>)
Re: [HACKERS] Issues with logical replication (Stas Kelvich <s.kelvich@postgrespro.ru>) |
Список | pgsql-hackers |
On 02/10/17 17:49, Konstantin Knizhnik wrote: > I have faced two issues with logical replication. > Reproducing scenario: > > 1. start two Postgres instances (I start both at local machine). > 2. Initialize pgbench tables at both instances: > pgbench -i -s 10 postgres > 3. Create publication of pgbench_accounts table at one node: > create publication pub for table pgbench_accounts; > 4. Create subscription at another node: > delete from pgbench_accounts; > CREATE SUBSCRIPTION sub connection 'dbname=postgres host=localhost > port=5432 sslmode=disable' publication pub; > CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS > TRIGGER AS $$ > BEGIN > -- RETURN NEW; > END $$ LANGUAGE plpgsql; > CREATE TRIGGER replication_trigger BEFORE INSERT OR UPDATE OR DELETE > ON pgbench_accounts FOR EACH ROW EXECUTE PROCEDURE > replication_trigger_proc(); > ALTER TABLE pgbench_accounts ENABLE ALWAYS TRIGGER replication_trigger; > 5. Start pgbench at primary node: > pgbench -T 1000 -P 2 -c 10 postgres > > > Please notice commented "return new" statement. Invocation of this > function cause error and in log we see repeated messages: > > 2017-10-02 16:47:46.764 MSK [32129] LOG: logical replication table > synchronization worker for subscription "sub", table "pgbench_accounts" > has started > 2017-10-02 16:47:46.771 MSK [32129] ERROR: control reached end of > trigger procedure without RETURN > 2017-10-02 16:47:46.771 MSK [32129] CONTEXT: PL/pgSQL function > replication_trigger_proc() > COPY pgbench_accounts, line 1: "1 1 0 " > 2017-10-02 16:47:46.772 MSK [28020] LOG: worker process: logical > replication worker for subscription 17264 sync 17251 (PID 32129) exited > with exit code 1 > ... > > > After few minutes of work primary node (with publication) is crashed > with the following stack trace: > > Program terminated with signal SIGABRT, Aborted. > #0 0x00007f3608f8ec37 in __GI_raise (sig=sig@entry=6) at > ../nptl/sysdeps/unix/sysv/linux/raise.c:56 > 56 ../nptl/sysdeps/unix/sysv/linux/raise.c: No such file or directory. > (gdb) bt > #0 0x00007f3608f8ec37 in __GI_raise (sig=sig@entry=6) at > ../nptl/sysdeps/unix/sysv/linux/raise.c:56 > #1 0x00007f3608f92028 in __GI_abort () at abort.c:89 > #2 0x00000000009f5740 in ExceptionalCondition (conditionName=0xbf6b30 > "!(((xid) != ((TransactionId) 0)))", > errorType=0xbf69af "FailedAssertion", fileName=0xbf69a8 "lmgr.c", > lineNumber=582) at assert.c:54 > #3 0x000000000086ac1d in XactLockTableWait (xid=0, rel=0x0, ctid=0x0, > oper=XLTW_None) at lmgr.c:582 > #4 0x000000000081c9f7 in SnapBuildWaitSnapshot (running=0x2749198, > cutoff=898498) at snapbuild.c:1400 > #5 0x000000000081c7c7 in SnapBuildFindSnapshot (builder=0x2807910, > lsn=798477760, running=0x2749198) at snapbuild.c:1311 > #6 0x000000000081c339 in SnapBuildProcessRunningXacts > (builder=0x2807910, lsn=798477760, running=0x2749198) at snapbuild.c:1106 Hmm this would suggest that XLOG_RUNNING_XACTS record contains invalid transaction ids but we specifically skip those in GetRunningTransactionData(). Can you check pg_waldump output for that record (lsn is shown in the backtrace)? > > Now fix the trigger function: > CREATE OR REPLACE FUNCTION replication_trigger_proc() RETURNS TRIGGER AS $$ > BEGIN > RETURN NEW; > END $$ LANGUAGE plpgsql; > > And manually perform at master two updates inside one transaction: > > postgres=# begin; > BEGIN > postgres=# update pgbench_accounts set abalance=abalance+1 where aid=26; > UPDATE 1 > postgres=# update pgbench_accounts set abalance=abalance-1 where aid=26; > UPDATE 1 > postgres=# commit; > <hangs> > > and in replica log we see: > 2017-10-02 18:40:26.094 MSK [2954] LOG: logical replication apply > worker for subscription "sub" has started > 2017-10-02 18:40:26.101 MSK [2954] ERROR: attempted to lock invisible > tuple > 2017-10-02 18:40:26.102 MSK [2882] LOG: worker process: logical > replication worker for subscription 16403 (PID 2954) exited with exit > code 1 > > Error happens in trigger.c: > > #3 0x000000000069bddb in GetTupleForTrigger (estate=0x2e36b50, > epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tid=0x2dd08ac, > lockmode=LockTupleNoKeyExclusive, newSlot=0x7ffc4420ec40) at > trigger.c:3103 > #4 0x000000000069b259 in ExecBRUpdateTriggers (estate=0x2e36b50, > epqstate=0x7ffc4420eda0, relinfo=0x2dcfe90, tupleid=0x2dd08ac, > fdw_trigtuple=0x0, slot=0x2dd0240) at trigger.c:2748 > #5 0x00000000006d2395 in ExecSimpleRelationUpdate (estate=0x2e36b50, > epqstate=0x7ffc4420eda0, searchslot=0x2dd0358, slot=0x2dd0240) > at execReplication.c:461 > #6 0x0000000000820894 in apply_handle_update (s=0x7ffc442163b0) at > worker.c:736 We have locked the same tuple in RelationFindReplTupleByIndex() just before this gets called and didn't get the same error. I guess we do something wrong with snapshots. Will need to investigate more. -- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Alexander KorotkovДата:
Сообщение: Re: [HACKERS] GSoC 2017: weekly progress reports (week 6)
Следующее
От: Alexander KorotkovДата:
Сообщение: Re: [HACKERS][BUG] Cache invalidation for queries that contains constof temporary composite type