"Multiple table synchronizations are processed serially" still happens

Поиск
Список
Период
Сортировка
От Guillaume Lelarge
Тема "Multiple table synchronizations are processed serially" still happens
Дата
Msg-id CAECtzeVLZ9jq7QSApLMJmbGwSX9Sj0GP=5JESj2Wt5cFz5GqbQ@mail.gmail.com
обсуждение исходный текст
Ответы Re: "Multiple table synchronizations are processed serially" still happens
Re: "Multiple table synchronizations are processed serially" still happens
Список pgsql-hackers
Hi,

One of my customers has an issue with logical replication. As $SUBJECT says, multiple table synchronization happens serially. To be honest, it doesn't do this every time. It happens when the tables are big enough.


This thread was closed by a commit (https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=6c2003f8a1bbc7c192a2e83ec51581c018aa162f) which apparently fixed the issue for the OP.

Attached is a small test case where it still happens for me on 12.6, 11.11, and 10.16. I can't make it happen on 13.2. I don't know why. It may imply bigger tables for 13.2, but why? I simply don't know.

Anyway, the issue at the end of the test case is that synchronizations sometimes happen serially. You can see on the process list that one walsender process is waiting in "idle in transaction" state:

guillau+  486822    2227  0 10:44 ?        00:00:00 /opt/postgresql/12/bin/postgres
guillau+  486824  486822  0 10:44 ?        00:00:01 postgres: testcase: checkpointer  
guillau+  486825  486822  0 10:44 ?        00:00:04 postgres: testcase: background writer  
guillau+  486826  486822  1 10:44 ?        00:00:06 postgres: testcase: walwriter  
guillau+  486827  486822  0 10:44 ?        00:00:00 postgres: testcase: autovacuum launcher  
guillau+  486828  486822  0 10:44 ?        00:00:00 postgres: testcase: stats collector  
guillau+  486829  486822  0 10:44 ?        00:00:00 postgres: testcase: logical replication launcher  
guillau+  489822  486822  0 10:55 ?        00:00:00 postgres: testcase: logical replication worker for subscription 16436  
guillau+  489824  486822 10 10:55 ?        00:00:01 postgres: testcase: walsender repuser ::1(38770) idle
guillau+  489825  486822 22 10:55 ?        00:00:02 postgres: testcase: logical replication worker for subscription 16436 sync 16416  
guillau+  489826  486822  8 10:55 ?        00:00:00 postgres: testcase: walsender repuser ::1(38772) COPY
guillau+  489827  486822  0 10:55 ?        00:00:00 postgres: testcase: logical replication worker for subscription 16436 sync 16427  
guillau+  489828  486822  0 10:55 ?        00:00:00 postgres: testcase: walsender repuser ::1(38774) idle in transaction waiting

And the log says (from the start of the subscription):

2021-04-28 10:55:32.337 CEST [489822] LOG:  logical replication apply worker for subscription "sub" has started
2021-04-28 10:55:32.342 CEST [489824] LOG:  duration: 0.426 ms  statement: SELECT pg_catalog.set_config('search_path', '', false);
2021-04-28 10:55:32.342 CEST [489824] LOG:  received replication command: IDENTIFY_SYSTEM
2021-04-28 10:55:32.342 CEST [489824] LOG:  received replication command: START_REPLICATION SLOT "sub" LOGICAL 0/0   (proto_version '1', publication_names '"pub"')
2021-04-28 10:55:32.342 CEST [489824] LOG:  starting logical decoding for slot "sub"
2021-04-28 10:55:32.342 CEST [489824] DETAIL:  Streaming transactions committing after 1/FF5D8130, reading WAL from  1/FF5D80F8.
2021-04-28 10:55:32.342 CEST [489824] LOG:  logical decoding found consistent point at 1/FF5D80F8
2021-04-28 10:55:32.342 CEST [489824] DETAIL:  There are no running transactions.
2021-04-28 10:55:32.345 CEST [489825] LOG:  logical replication table synchronization worker for subscription "sub", table "foo" has started
2021-04-28 10:55:32.348 CEST [489826] LOG:  duration: 0.315 ms  statement: SELECT pg_catalog.set_config('search_path', '', false);
2021-04-28 10:55:32.349 CEST [489826] LOG:  duration: 0.041 ms  statement: BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ
2021-04-28 10:55:32.349 CEST [489826] LOG:  received replication command: CREATE_REPLICATION_SLOT "sub_16436_sync_16416" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
2021-04-28 10:55:32.355 CEST [489827] LOG:  logical replication table synchronization worker for subscription "sub", table "bar" has started
2021-04-28 10:55:32.359 CEST [489828] LOG:  duration: 0.431 ms  statement: SELECT pg_catalog.set_config('search_path', '', false);
2021-04-28 10:55:32.359 CEST [489828] LOG:  duration: 0.048 ms  statement: BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ
2021-04-28 10:55:32.360 CEST [489828] LOG:  received replication command: CREATE_REPLICATION_SLOT "sub_16436_sync_16427" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT
2021-04-28 10:55:32.407 CEST [489826] LOG:  logical decoding found consistent point at 1/FF602880
2021-04-28 10:55:32.407 CEST [489826] DETAIL:  There are no running transactions.
2021-04-28 10:55:32.409 CEST [489826] LOG:  duration: 1.262 ms  statement: SELECT c.oid, c.relreplident  FROM pg_catalog.pg_class c  INNER JOIN pg_catalog.pg_namespace n        ON (c.relnamespace = n.oid) WHERE n.nspname =     's01'   AND c.relname = 'foo'   AND c.relkind = 'r'
2021-04-28 10:55:32.410 CEST [489826] LOG:  duration: 1.347 ms  statement: SELECT a.attname,       a.atttypid,       a.atttypmod,       a.attnum = ANY(i.indkey)  FROM pg_catalog.pg_attribute a  LEFT JOIN pg_catalog.pg_index i         ON (i.indexrelid = pg_get_replica_identity_index(16390)) WHERE a.attnum > 0::pg_catalog.int2   AND NOT a.            attisdropped AND a.attgenerated = ''   AND a.attrelid = 16390 ORDER BY a.attnum
2021-04-28 10:55:32.484 CEST [489828] LOG:  logical decoding found initial starting point at 1/FF63D168
2021-04-28 10:55:32.484 CEST [489828] DETAIL:  Waiting for transactions (approximately 1) older than 503 to end.
2021-04-28 10:55:33.484 CEST [489828] LOG:  process 489828 still waiting for ShareLock on transaction 502 after 1000.103 ms
2021-04-28 10:55:33.484 CEST [489828] DETAIL:  Process holding the lock: 489825. Wait queue: 489828.
2021-04-28 10:55:33.496 CEST [489828] LOG:  process 489828 still waiting for ShareLock on transaction 502 after 1012.089 ms

And it logs the "still waiting" message as long as the first table is being synchronized. Once this is done, it releases the lock, and the synchronization of the second table starts.

Is there something I didn't understand on the previous thread? I'd like to know why serial synchronization happens, and if there's a way to avoid it.

Thanks.

Regards.


--
Guillaume.
Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: "邱宇航(烛远)"
Дата:
Сообщение: 回复:Attach to shared memory after fork()
Следующее
От: Guillaume Lelarge
Дата:
Сообщение: Re: "Multiple table synchronizations are processed serially" still happens