Re: [HACKERS] Directory pg_replslot is not properly cleaned

From Andres Freund
Subject Re: [HACKERS] Directory pg_replslot is not properly cleaned
Date
Msg-id A6F30A24-1558-439E-B5AC-C7F72E527313@anarazel.de
In reply to Re: [HACKERS] Directory pg_replslot is not properly cleaned  (Fabrízio de Royes Mello <fabriziomello@gmail.com>)
Responses Re: [HACKERS] Directory pg_replslot is not properly cleaned  (Fabrízio de Royes Mello <fabriziomello@gmail.com>)
List pgsql-hackers

On June 7, 2017 11:29:28 AM PDT, "Fabrízio de Royes Mello" <fabriziomello@gmail.com> wrote:
>On Fri, Jun 2, 2017 at 6:37 PM, Fabrízio de Royes Mello <fabriziomello@gmail.com> wrote:
>>
>>
>> On Fri, Jun 2, 2017 at 6:32 PM, Fabrízio de Royes Mello <fabriziomello@gmail.com> wrote:
>> >
>> > Hi all,
>> >
>> > This week I faced an out-of-disk-space problem on an 8TB production
>> > cluster. During the investigation we noticed that pg_replslot was the
>> > culprit, growing by more than 1TB in less than one hour.
>> >
>> > We're using PostgreSQL 9.5.6 with pglogical 1.2.2 replicating to a
>> > new 9.6 instance and planning the upgrade soon.
>> >
>> > What did I do? I freed some disk space, just enough to start
>> > PostgreSQL and begin the investigation. During startup recovery the
>> > files inside pg_replslot were removed entirely, so our
>> > out-of-disk-space problem disappeared. Then the server came up and
>> > the physical slaves attached to the master normally, but the logical
>> > slaves did not, staying stalled in the 'catchup' state.
>> >
>> > At that point the "pg_replslot" directory started growing fast
>> > again, which forced us to drop the logical replication slot, and we
>> > lost the logical slave.
>> >
>> > After googling for a while I found this thread [1] about a similar
>> > issue reported by Dmitriy Sarafannikov, with replies from Andres and
>> > Álvaro.
>> >
>> > I ran the test case provided by Dmitriy [1] against branches:
>> > - REL9_4_STABLE
>> > - REL9_5_STABLE
>> > - REL9_6_STABLE
>> > - master
>> >
>> > After all the tests the issue remains, including with the new
>> > logical replication feature (CREATE PUBLICATION / CREATE
>> > SUBSCRIPTION). Only after a restart was "pg_replslot" properly
>> > cleaned. The typo in ReorderBufferIterTXNInit that Dmitriy complained
>> > about has been fixed, but the issue remains.
>> >
>> > It seems no one complained about this issue again and the thread
>> > was lost.
>> >
>> > Attached is a reworked version of Dmitriy's patch that seems to
>> > solve the issue. I confess I don't know enough about the replication
>> > slot code to really know whether it's the best solution.
>> >
>> > Regards,
>> >
>> > [1] https://www.postgresql.org/message-id/1457621358.355011041%40f382.i.mail.ru
>> >
>>
>> Just adding Dmitriy to the conversation... the email address I
>> provided previously was wrong.
>>
>
>Does anyone have any thoughts on this critical issue?
>

I plan to look into it over the next few days.

Andres
--
Sent from my Android device with K-9 Mail. Please excuse my brevity.
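
For anyone who wants to watch for the growth described in the quoted report, here is a minimal sketch, not taken from the thread or from the attached patch. It assumes that each replication slot has its own subdirectory under pg_replslot and that the files filling the disk live inside those subdirectories. The function name slot_dir_sizes and the path argument are hypothetical, chosen for illustration only.

```python
import os
from typing import Dict


def slot_dir_sizes(replslot_path: str) -> Dict[str, int]:
    """Map each per-slot subdirectory name to the total size in bytes
    of the regular files found beneath it.

    replslot_path is assumed to point at the pg_replslot directory of
    a data directory; top-level entries that are not directories are
    skipped.
    """
    sizes: Dict[str, int] = {}
    with os.scandir(replslot_path) as entries:
        for entry in entries:
            if not entry.is_dir(follow_symlinks=False):
                continue
            total = 0
            for root, _dirs, files in os.walk(entry.path):
                for name in files:
                    try:
                        total += os.path.getsize(os.path.join(root, name))
                    except OSError:
                        # A running server may remove a file between
                        # the directory listing and the size lookup.
                        pass
            sizes[entry.name] = total
    return sizes
```

Calling slot_dir_sizes on the pg_replslot path periodically and comparing the results would show which slot's directory is growing; it only reads file sizes and does not touch the slot contents.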


