Re: Logical replication stops dropping used initial-sync replication slots
From        | hubert depesz lubaczewski
Subject     | Re: Logical replication stops dropping used initial-sync replication slots
Date        |
Msg-id      | 20220325083856.GA26455@depesz.com
In reply to | Re: Logical replication stops dropping used initial-sync replication slots (hubert depesz lubaczewski <depesz@depesz.com>)
Responses   | Re: Logical replication stops dropping used initial-sync replication slots (Amit Kapila <amit.kapila16@gmail.com>)
List        | pgsql-bugs
On Tue, Mar 22, 2022 at 12:26:46PM +0100, hubert depesz lubaczewski wrote:
> On Tue, Mar 22, 2022 at 12:32:41PM +0530, vignesh C wrote:
> > > That wouldn't do me any good. The server I'm trying to upgrade now is
> > > test bench to make sure that I can upgrade using logical replication *at
> > > all*.
> > > Once I will have verified that I can, I have couple hundred servers to
> > > upgrade, that will have concurrent access all the time.
> > Is it possible to get a reproducible test or a script for this problem?
>
> Well, given that we're talking about 38GB database with 60k tables
> i find that unlikely.
>
> But I'll try if I can figure out a way to replicate the problem with
> something that is shareable.

OK. I don't think it will be simple/possible to replicate.

Reason: I set up 10 separate environments, all with the same database on bionic 12.9. Then I created a publication with all tables in specific schemas, and subscribed to it from separate focal pg 14.2 servers. In total, I used 20 servers to replicate the test 10 times.

Effect: in one case replication didn't finish initial sync.

Currently, it's 2022-03-25 08:32:14.10081+00, and on the one source that has the failed subscription I see:

 slot_name                                    |  plugin  | slot_type | datoid | database | temporary | active | active_pid | xmin | catalog_xmin | restart_lsn  | confirmed_flush_lsn
----------------------------------------------+----------+-----------+--------+----------+-----------+--------+------------+------+--------------+--------------+---------------------
 focal14                                      | pgoutput | logical   |  16606 | dbname   | f         | f      |            |      |    161822728 | 8E9/3E70BB58 | 8E9/3E70BB58
 pg_3548429_sync_3343043_7077521990523695096  | pgoutput | logical   |  16606 | dbname   | f         | t      |      32385 |      |    161824826 | 8F0/18327650 | 8F0/1ABF89C0
 pg_3548429_sync_3079712_7077521990523695096  | pgoutput | logical   |  16606 | dbname   | f         | t      |      32378 |      |    161824826 | 8F0/18327650 | 8F0/1ABF89C0
 pg_3548429_sync_3145162_7077521990523695096  | pgoutput | logical   |  16606 | dbname   | f         | t      |      32386 |      |    161824826 | 8F0/18327650 | 8F0/1ABF89C0
 pg_3548429_sync_2993718_7077521990523695096  | pgoutput | logical   |  16606 | dbname   | f         | t      |      32387 |      |    161824826 | 8F0/18327650 | 8F0/1ABF89C0
 pg_3548429_sync_3355696_7077521990523695096  | pgoutput | logical   |  16606 | dbname   | f         | t      |      32379 |      |    161824826 | 8F0/18327650 | 8F0/1ABF89C0
 pg_3548429_sync_3099460_7077521990523695096  | pgoutput | logical   |  16606 | dbname   | f         | t      |      32380 |      |    161824826 | 8F0/18327650 | 8F0/1ABF89C0
 pg_3548429_sync_3162651_7077521990523695096  | pgoutput | logical   |  16606 | dbname   | f         | t      |      32381 |      |    161824826 | 8F0/18327650 | 8F0/1ABF89C0
 pg_3548429_sync_3157454_7077521990523695096  | pgoutput | logical   |  16606 | dbname   | f         | t      |      32382 |      |    161824826 | 8F0/18327650 | 8F0/1ABF89C0
 pg_3548429_sync_3084198_7077521990523695096  | pgoutput | logical   |  16606 | dbname   | f         | t      |      32383 |      |    161824826 | 8F0/18327650 | 8F0/1ABF89C0
 pg_3548429_sync_3142573_7077521990523695096  | pgoutput | logical   |  16606 | dbname   | f         | t      |      32384 |      |    161824826 | 8F0/18327650 | 8F0/1ABF89C0
(11 rows)

pg_stat_replication shows 10 sync connections, and that's it. Their sent_lsn, write_lsn, flush_lsn, replay_lsn seem to be progressing nicely, but nothing is changing.
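(In case anyone wants to check the same thing on their end: the slot listing above is just the full pg_replication_slots view, and the progress check is nothing fancier than the usual pg_stat_replication columns, run on the source server. Roughly like this; the exact queries I ran may have been formatted differently:

=# SELECT * FROM pg_replication_slots;
=# SELECT application_name, state, sent_lsn, write_lsn, flush_lsn, replay_lsn
   FROM pg_stat_replication;
)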
pg_stat_activity contains:

=# select xact_start, query_start, backend_type, query from pg_stat_activity where state is distinct from 'idle';
          xact_start           │          query_start          │         backend_type         │                                                                     query
═══════════════════════════════╪═══════════════════════════════╪══════════════════════════════╪══════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════════
 [null]                        │ [null]                        │ autovacuum launcher          │
 [null]                        │ 2022-03-24 20:45:30.757582+00 │ walsender                    │ START_REPLICATION SLOT "pg_3548429_sync_2993718_7077521990523695096" LOGICAL 8E9/40FF0170 (proto_version '1', publication_names '"focal14"')
 [null]                        │ [null]                        │ pglogical supervisor         │
 [null]                        │ [null]                        │ logical replication launcher │
 [null]                        │ 2022-03-24 20:45:30.730818+00 │ walsender                    │ START_REPLICATION SLOT "pg_3548429_sync_3343043_7077521990523695096" LOGICAL 8E9/40FF0100 (proto_version '1', publication_names '"focal14"')
 [null]                        │ 2022-03-24 20:45:31.65015+00  │ walsender                    │ START_REPLICATION SLOT "pg_3548429_sync_3079712_7077521990523695096" LOGICAL 8E9/40FEFF60 (proto_version '1', publication_names '"focal14"')
 [null]                        │ 2022-03-24 20:45:30.733366+00 │ walsender                    │ START_REPLICATION SLOT "pg_3548429_sync_3145162_7077521990523695096" LOGICAL 8E9/40FF0138 (proto_version '1', publication_names '"focal14"')
 [null]                        │ 2022-03-24 20:45:30.76285+00  │ walsender                    │ START_REPLICATION SLOT "pg_3548429_sync_3355696_7077521990523695096" LOGICAL 8E9/40FEFF98 (proto_version '1', publication_names '"focal14"')
 [null]                        │ 2022-03-24 20:45:31.663299+00 │ walsender                    │ START_REPLICATION SLOT "pg_3548429_sync_3099460_7077521990523695096" LOGICAL 8E9/40FEFFD0 (proto_version '1', publication_names '"focal14"')
 [null]                        │ 2022-03-24 20:45:31.676412+00 │ walsender                    │ START_REPLICATION SLOT "pg_3548429_sync_3162651_7077521990523695096" LOGICAL 8E9/40FF0020 (proto_version '1', publication_names '"focal14"')
 [null]                        │ 2022-03-24 20:45:31.691807+00 │ walsender                    │ START_REPLICATION SLOT "pg_3548429_sync_3157454_7077521990523695096" LOGICAL 8E9/40FF0058 (proto_version '1', publication_names '"focal14"')
 [null]                        │ 2022-03-24 20:45:31.69548+00  │ walsender                    │ START_REPLICATION SLOT "pg_3548429_sync_3084198_7077521990523695096" LOGICAL 8E9/40FF0090 (proto_version '1', publication_names '"focal14"')
 [null]                        │ 2022-03-24 20:45:30.974833+00 │ walsender                    │ START_REPLICATION SLOT "pg_3548429_sync_3142573_7077521990523695096" LOGICAL 8E9/40FF00C8 (proto_version '1', publication_names '"focal14"')
 2022-03-25 08:35:27.971161+00 │ 2022-03-25 08:35:27.971161+00 │ client backend               │ select xact_start, query_start, backend_type, query from pg_stat_activity where state is distinct from 'idle';
 [null]                        │ [null]                        │ background writer            │
 [null]                        │ [null]                        │ checkpointer                 │
 [null]                        │ [null]                        │ walwriter                    │
(17 rows)

So, it looks to me that there is some random-ish condition that causes the problem.

I tried to insulate the pgs from the world as well as I could: no monitoring, no cron, no external access. Just replication and my simple load generator, which runs this loop every minute:

CREATE TABLE depesz_test as SELECT i, repeat('payload', 50) as payload FROM generate_series(1,100000) i;
CREATE INDEX depesz_test_idx on depesz_test(i);
DROP TABLE depesz_test;

I'm out of ideas.

The good-ish news is that when I tried replication in smaller sets (30k tables, not 60k), and/or with just 2 max_sync_workers_per_subscription instead of 10, it never failed. So I guess I'll go that way.

Best regards,

depesz
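(For completeness, a minimal sketch of the subscriber-side change mentioned above, i.e. lowering max_sync_workers_per_subscription; I'm assuming here that it is set cluster-wide, and 2 is just the value from my test:

=# ALTER SYSTEM SET max_sync_workers_per_subscription = 2;
=# SELECT pg_reload_conf();

The parameter is reloadable, so newly started table-sync workers should pick it up without a restart.)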