Understanding logical replication lag

Поиск

Список

Период

Сортировка

От	Patrick Molgaard
Тема	Understanding logical replication lag
Дата	21 января 2021 г. 21:21:20
Msg-id	CA+Zan4uOGR2PmCYowN6hF-qi=x3_KTkr3SgJ=D23STN4epGWSg@mail.gmail.com обсуждение исходный текст
Список	pgsql-performance

Дерево обсуждения

Hi all,

My top-level query is: I'm using logical replication under pg 9.6 to do a kind of change data capture and I'm seeing occasional extended periods of significant lag. I'm not sure what conceptual model I'm missing in order to understand why this happens.

The details:

I'm running Postgres 9.6.19 from the postgres debian apt repos & the wal2json extension.

I have a custom client application which essentially executes pg_logical_slot_get_changes() for some manually-created logical replication slot on a loop.

I'm monitoring replication lag defined as pg_current_xlog_location() - confirmed_flush_lsn for that slot.

What I'm observing is - very occasionally - an extended period (hours long) wherein:

* The normal database write load continues or slightly increases

* calls to pg_logical_slot_get_changes() return no rows and confirmed_flush_lsn doesn't move

* the duration of a call to pg_logical_slot_get_changes() rises linearly over time

I understand from the docs and research that this is usually caused by a long-running write transaction, but I notice I'm still confused.

* I'm not 100% sure - I'm still confirming - but I'm fairly confident that I don't have any egregiously long write transactions (at least on that scale of hours). Are there any other common scenarios that can result in a similar 'blockage'? e.g some categories of long read-only transactions, or advisory locks, or other kinds of database activity like a vacuum?

* Conversely, from experimenting, it seems as if not all long-running write transactions cause pg_logical_slot_get_changes() to be unable to advance. In fact, I'm not able so far to produce a minimal set of simple queries which show that behaviour.

Given the following sequence of queries I see changes emitted:

-- session 1

begin;

insert into foo(bar,baz) values (1, 1);

-- session 2

begin;

insert into foo(bar,baz) values (2,2);

commit;

-- session 3

select data from pg_logical_slot_get_changes('example-slot', NULL, NULL, 'format-version', '2');

Session 3 is able to return the row from session 2 despite session 1's ongoing transaction starting first and not yet committing. Can you help me understand (or better yet point me to a resource which explains) the underlying logic defining how logical decoding does in fact get blocked by in-flight transactions?

Thanks,

Patrick

В списке pgsql-performance по дате отправления:

Предыдущее

От: Don Seiler
Дата: 16 января 2021 г., 00:18:48
Сообщение: Re: High COMMIT times

Следующее

От: Nagaraj Raj
Дата: 22 января 2021 г., 04:53:26
Сообщение: Query performance issue

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Understanding logical replication lag

Предыдущее

Следующее