Re: replication primary writting infinite number of WAL files

From: Adrian Klaver
Subject: Re: replication primary writting infinite number of WAL files
Msg-id: 38f7d87c-ae22-48e8-a4c4-0acde1ad6eb9@aklaver.com
In reply to: replication primary writting infinite number of WAL files (Les <nagylzs@gmail.com>)
List: pgsql-general
On 11/24/23 03:39, Les wrote:
> Hello,
> Yesterday, the primary server suddenly started
> writing to the pg_wal directory at a crazy pace, 1.5GB/sec, but
> sometimes it went up to over 3GB/sec. The pg_wal directory kept growing
> and didn't stop until it ran out of disk space. It happened so fast that
> we didn't have time to react. We then stopped all applications
> (PostgreSQL clients) because we thought one of them was causing the
> problem.
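A figure like 1.5GB/sec can be cross-checked on the server side. Run SELECT pg_current_wal_lsn(); twice, a known number of seconds apart, and take the difference; pg_wal_lsn_diff() does this subtraction in SQL. The sketch below does the same arithmetic in Python. It assumes the usual LSN text form of two hex numbers separated by a slash. The function names and sample LSNs are hypothetical and only illustrate the calculation.

```python
# Estimate WAL generation rate from two pg_current_wal_lsn() samples.
# Function names and sample values are hypothetical, for illustration only.


def lsn_to_bytes(lsn: str) -> int:
    """Convert an LSN string such as '16/B374D848' to a byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_rate_bytes_per_sec(lsn_start: str, lsn_end: str, seconds: float) -> float:
    """Average WAL bytes written per second between two LSN samples."""
    if seconds <= 0:
        raise ValueError("seconds must be positive")
    return (lsn_to_bytes(lsn_end) - lsn_to_bytes(lsn_start)) / seconds


if __name__ == "__main__":
    # Two hypothetical samples taken 10 seconds apart.
    rate = wal_rate_bytes_per_sec("16/00000000", "19/C0000000", 10.0)
    print(f"{rate / 1024**3:.2f} GiB/s")  # prints 1.50 GiB/s
```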


> The only exception is a sequence 
> value that was moved millions of steps within a single minute. Of 

Did you determine this by looking at select * from some_seq?

> This new instance worked for about 12 hours. This morning, the
> error occurred again, in the same form. Based on our previous
> experience, we immediately deleted the standby and its replication slot,
> and the problem resolved itself (except that the standby had to be
> deleted again). Without rebooting or restarting anything else, the
> problem went away. I managed to save a small part of the pg_wal before
> deleting the slot. When we looked into it, we saw something like this:

Are the servers open to the world, and if so, have you explored whether
there has been an intrusion?

Do you have logs that cover the period from when it transitioned from 
working normally to going haywire?
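The saved pg_wal files may be the only record of the transition. If so, their names alone show which stretch of WAL they cover. Each 24-hex-digit segment name is a timeline ID followed by the high and low parts of the segment number. The minimal sketch below maps a name to its starting LSN. It assumes the default 16MB wal_segment_size (check with SHOW wal_segment_size;). The file names and helper functions are hypothetical and only illustrate the idea.

```python
# Map WAL segment file names to starting LSNs, assuming the default
# 16 MB wal_segment_size. Names and helpers are hypothetical examples.

WAL_SEGMENT_SIZE = 16 * 1024 * 1024  # assumed default; verify on the server
SEGMENTS_PER_XLOGID = 0x100000000 // WAL_SEGMENT_SIZE  # 256 for 16 MB


def parse_wal_name(name: str) -> tuple[int, int]:
    """Return (timeline, starting byte position) for a WAL segment file name."""
    if len(name) != 24:
        raise ValueError(f"not a WAL segment name: {name!r}")
    timeline = int(name[0:8], 16)
    log = int(name[8:16], 16)   # high part of the segment number
    seg = int(name[16:24], 16)  # low part of the segment number
    segno = log * SEGMENTS_PER_XLOGID + seg
    return timeline, segno * WAL_SEGMENT_SIZE


def format_lsn(pos: int) -> str:
    """Format a byte position the way PostgreSQL prints an LSN, e.g. 'A/FF000000'."""
    return f"{pos >> 32:X}/{pos & 0xFFFFFFFF:X}"


def span_bytes(names: list[str]) -> int:
    """Bytes of WAL covered from the first through the last segment file."""
    starts = sorted(parse_wal_name(n)[1] for n in names)
    return starts[-1] + WAL_SEGMENT_SIZE - starts[0]


if __name__ == "__main__":
    saved = [
        "000000010000000A000000FE",
        "000000010000000A000000FF",
        "000000010000000B00000000",
    ]
    for n in saved:
        tli, start = parse_wal_name(n)
        print(n, "timeline", tli, "starts at", format_lsn(start))
    print(span_bytes(saved) // (1024 * 1024), "MB")  # prints 48 MB
```

Once the range is known, pg_waldump --stats=record on those same files would summarize which record types dominate that stretch of WAL.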


> We looked at the PostgreSQL release history, and we see some bug fixes 
> in version 14.7 that might have something to do with this:
> 
> https://www.postgresql.org/docs/release/14.7/ 
> 
> > Ignore invalidated logical-replication slots while determining oldest
> > catalog xmin (Sirisha Chamarthi)
> >
> > A replication slot could prevent cleanup of dead tuples in the system
> > catalogs even after it becomes invalidated due to exceeding
> > max_slot_wal_keep_size. Thus, failure of a replication consumer could
> > lead to indefinitely-large catalog bloat.
>

You are using repmgr, which, as I understand it, uses streaming
replication, not logical replication.
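For what it's worth, max_slot_wal_keep_size (PostgreSQL 13 and later) applies to physical slots as well as logical ones. It can therefore cap the WAL that a stalled standby's slot holds in pg_wal. At the rate reported in this thread, any reasonable cap is used up quickly. The rough sketch below shows the arithmetic, using a hypothetical 100GB cap and a hypothetical helper function.

```python
# Rough time until a replication slot's retained WAL exceeds a cap.
# The 100 GiB cap is hypothetical; 1.5 GiB/s is the rate reported above.

GIB = 1024**3


def seconds_until_cap(cap_bytes: int, rate_bytes_per_sec: float) -> float:
    """Seconds of WAL generation before a stalled slot exceeds the cap."""
    if rate_bytes_per_sec <= 0:
        raise ValueError("rate must be positive")
    return cap_bytes / rate_bytes_per_sec


if __name__ == "__main__":
    print(f"{seconds_until_cap(100 * GIB, 1.5 * GIB):.0f} s")  # prints 67 s
```

The check happens at checkpoint time, so the slot is actually invalidated somewhat later than this figure suggests. Once that happens, the slot no longer protects WAL for that standby.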

> Thank you,
> 
>     Laszlo
> 
> 

-- 
Adrian Klaver
adrian.klaver@aklaver.com
