Обсуждение: BUG #16327: Too many files in pg_replslot folder

Поиск
Список
Период
Сортировка

BUG #16327: Too many files in pg_replslot folder

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      16327
Logged by:          Dmitry Romanenko
Email address:      romdn@inbox.ru
PostgreSQL version: 12.2
Operating system:   Windows
Description:

Hello!
4.5 million rows are inserted from the temporary table. For each of the
three subscriptions, 712,000 xid-*-lsn-*-*.spill files of 3 GB are
generated. The following lines appear in the work log: ERROR: replication
slot is busy processing with PID. Replication stops working.
What to do?

PostgreSQL 12.2, compiled by Visual C++ build 1914, 64-bit
postgresql.auto.conf
max_connections = '30'
shared_buffers = '512MB'
effective_cache_size = '6GB'
maintenance_work_mem = '512MB'
checkpoint_completion_target = '0.9'
wal_buffers = '16MB'
default_statistics_target = '100'
work_mem = '21845kB'
min_wal_size = '2GB'
max_wal_size = '8GB'
max_worker_processes = '4'
max_parallel_workers_per_gather = '2'
max_parallel_workers = '4'
max_parallel_maintenance_workers = '2'
random_page_cost = '2'

P.S. Replication on PostgreSql 11.1 (with default settings) worked fine with
inserting 4.5 million rows. The xid - * - lsn - * - *. Snap files were
smaller and larger in size. But if there were several inserts in a row of
4.5 million lines, an error appeared: ERROR: out of memory.

Best regards, Dmitry


Re: BUG #16327: Too many files in pg_replslot folder

От
Euler Taveira
Дата:
On Sun, 29 Mar 2020 at 11:49, PG Bug reporting form <noreply@postgresql.org> wrote:

P.S. Replication on PostgreSql 11.1 (with default settings) worked fine with
inserting 4.5 million rows. The xid - * - lsn - * - *. Snap files were
smaller and larger in size. But if there were several inserts in a row of
4.5 million lines, an error appeared: ERROR: out of memory.

Could you provide a test case? Are you using subtransactions? logical_decoding_work_mem [1] was added in v13 to ameliorate some high memory usage while decoding transactions. It is weird that it does not 'out of memory' in v11 too (v11 introduced Generational memory allocator to reduce logical decoding memory usage); I expect a high memory usage in v10 in a big transaction scenario.

If you are looking for a workaround, you should try to disable all subscriptions and then enable only one at a time.



Regards,

--
Euler Taveira                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: BUG #16327: Too many files in pg_replslot folder

От
Dmitry
Дата:
29.03.2020 22:52, Euler Taveira пишет:
On Sun, 29 Mar 2020 at 11:49, PG Bug reporting form <noreply@postgresql.org> wrote:

P.S. Replication on PostgreSql 11.1 (with default settings) worked fine with
inserting 4.5 million rows. The xid - * - lsn - * - *. Snap files were
smaller and larger in size. But if there were several inserts in a row of
4.5 million lines, an error appeared: ERROR: out of memory.

Could you provide a test case? Are you using subtransactions? logical_decoding_work_mem [1] was added in v13 to ameliorate some high memory usage while decoding transactions. It is weird that it does not 'out of memory' in v11 too (v11 introduced Generational memory allocator to reduce logical decoding memory usage); I expect a high memory usage in v10 in a big transaction scenario.

If you are looking for a workaround, you should try to disable all subscriptions and then enable only one at a time.



Regards,

--
Euler Taveira                 http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Thanks for the answer! > Could you provide a test case?
I will try to do a test in the coming days.

> Are you using subtransactions?
No. Functions are called sequentially from the windws scheduler. A large file is loaded into a temporary table. In the cursor, entries are processed line by line and placed in another temporary table. From which they are already transferred to the main table. > If you are looking for a workaround, you should try to disable all subscriptions and then enable only one at a time.
Good, but it will be inconvenient for the administrator. Or back to v11.1.

Today noticed in the repetition process.
Select from pg_stat_replication after a while gives:
pid  application_name client_port backend_start       state     sent_lsn   write_lsn  write_lag     replay_lag    reply_time
4916 sub_2            55590       2020-03-30 20:38:20 streaming 5/3344C598 5/3344A7D0 00:00:00.0058 00:00:00.0058 2020-03-30 21:42:50
4724 sub_3            55828       2020-03-30 20:38:34 streaming 5/3344C598 5/3344A7D0 00:00:00.0015 00:00:00.0015 2020-03-30 21:42:50
928  sub_4            59003       2020-03-30 20:38:56 streaming 5/3344C598 5/334442F0                             2020-03-30 21:42:49
3328 sub_2            61992       2020-03-30 21:47:44 startup

And on server 2:
ERROR: failed to start broadcast WAL: ERROR: replication slot "sub_2" is busy processing with PID 4916

And even later:
pid  application_name client_port backend_start       state     sent_lsn   write_lsn  write_lag     replay_lag    reply_time
4536 sub_2            50299       2020-03-30 22:33:33 startup
528  sub_3            53671       2020-03-30 22:31:28 startup
364
  sub_4            51630       2020-03-30 22:28:29 startup

Best regards, Dmitry