Re: BUG #17524: Increase in WAL size due to logical replication with publication contain a table with low activity.

Поиск
Список
Период
Сортировка
От Raman Kumar
Тема Re: BUG #17524: Increase in WAL size due to logical replication with publication contain a table with low activity.
Дата
Msg-id CAG8FmLVDgxkyBPz4d6t9yy63EvzVjcN0Y1nzyhpyV7H3OkkEmg@mail.gmail.com
обсуждение исходный текст
Ответ на Re: BUG #17524: Increase in WAL size due to logical replication with publication contain a table with low activity.  (Amit Kapila <amit.kapila16@gmail.com>)
Список pgsql-bugs
We are using a debezium postgres kafka connector, it does handle the empty transaction as far as i know.

About the large transactions, no there aren't any as most of the time the disk usage spikes when the activity is really low.

I am using AWS RDS to host the postgres instance, so I don't really have an idea on how to check the data/pg_replslot directory.


In order to mitigate the issue temporarily, We have added a dummy table in the publication on which we are inserting random. This keeps the replication lag under control and avoid filling up the disk.

On Tue, Jun 21, 2022 at 7:08 PM Amit Kapila <amit.kapila16@gmail.com> wrote:
On Mon, Jun 20, 2022 at 7:07 PM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      17524
> Logged by:          Raman Kumar
> Email address:      raman.kumar@r-indventures.com
> PostgreSQL version: 13.6
> Operating system:   Ubuntu
> Description:
>
> We are using logical replication to source the event from postgres. and
> publication includes a single table with (Insert, update, delete and
> truncate ).
> Everything works great, Until we run into a strange issue i.e. Increase is
> the disk usage, abnormally.
> On looking further we have found that some of the disk space occupy by the
> wal.
> Using this query
> ```
> Select slot_name, pg_current_wal_lsn(), confirmed_flush_lsn,
> (pg_current_wal_lsn() - confirmed_flush_lsn) as lag from
> pg_replication_slots;
> ```
> ```
> "slot_name"   "pg_current_wal_lsn"  "confirmed_flush_lsn" "lsn_distance"
> "poc_kafka_connect"   "109C/AC001020"   "109C/94909990"   393180816
> ```
>
> ```
> select usename, client_addr, pg_current_wal_lsn(), flush_lsn,
> (pg_current_wal_lsn() - flush_lsn) as lag from pg_stat_replication;
> ```
>
> ```
> "usename" "client_addr" "pg_current_wal_lsn"  "flush_lsn"   "lag"
> "postgres"    "x.x.x.x"   "109C/AC0011A0"   "109C/94909990"   393181200
> ```
>
> We have notice the replication lag. But as the publication includes on the
> single table, the last changes are already replicated/published.
> So Technically there is no replication lag as per publication.
>

The lag is computed based on the difference between the WAL location
written in the publisher and the WAL location confirmed by the
subscriber as replayed. Normally for non-published data, the latest
WAL location is confirmed by the subscriber either because we send an
empty transaction (pair of BEGIN ... COMMIT commands) or via
keep-alive messages. Now, it is not clear why none of those is working
in your case? Are there any large transactions happening on the
non-published tables? You can once check data/pg_replslot directory to
confirm if there are any spill files present there?

--
With Regards,
Amit Kapila.

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: BUG #17522: While using --with-ssl=openssl and PG_TEST_EXTRA='ssl' options, SSL test fails on OpenBSD 7.1
Следующее
От: Andrey Borodin
Дата:
Сообщение: Re: BUG #17528: ERROR: could not access status of transaction 1997627701