Thread: WAL rotation question version 8.3.0

WAL rotation question version 8.3.0

From
Evan Rempel
Date:
I am setting up a postgresql server (duh) and am using archive_mode=on.
The archive command that I am using sends data to an enterprise
backup server across the network, and I must be able to handle outages
of that server without taking down the postgresql server.

Short outages are fine because the archive_command will return a nonzero
result to postgresql and it will be retried every minute until successful.

If the backup server is out for a longer time, new WAL files will be created
by postgresql. This will eventually fill the pg_xlog filesystem and bad things
happen :-( To protect the production database functionality, when the pg_xlog
filesystem reaches some percentage full (we chose 90%) then the archive_command
starts reporting a success (return of zero) even though it is not able to
archive the xlog files.
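
The behaviour described above can be sketched as a small wrapper script. This is only an illustration under assumptions, not the script actually in use: the `send_to_backup` command, the function names, and the idea of calling it as `archive_command = 'wal_archive.py %p %f'` are all hypothetical. Only the 90% threshold comes from the message.

```python
import os
import shutil
import subprocess

FULL_THRESHOLD = 0.90  # fraction of the pg_xlog filesystem, as chosen in the message


def exit_code(archive_ok, used_fraction, threshold=FULL_THRESHOLD):
    """Return the status to hand back to PostgreSQL for one WAL segment."""
    if archive_ok:
        return 0  # the segment really was archived
    if used_fraction >= threshold:
        return 0  # deliberately report success; the segment is NOT archived
    return 1  # report failure so PostgreSQL keeps the segment and retries


def used_fraction(path):
    """Fraction of the filesystem holding `path` that is in use."""
    usage = shutil.disk_usage(path)
    return usage.used / usage.total


def main(argv):
    # argv[1] is %p (path to the segment), argv[2] is %f (its file name).
    wal_path, wal_name = argv[1], argv[2]
    try:
        # "send_to_backup" is a placeholder for the site's real backup client.
        ok = subprocess.run(["send_to_backup", wal_path, wal_name]).returncode == 0
    except OSError:
        ok = False  # the client could not even be started
    xlog_dir = os.path.dirname(os.path.abspath(wal_path))
    return exit_code(ok, used_fraction(xlog_dir))
```

A real script would finish with `sys.exit(main(sys.argv))`. Keeping the decision in `exit_code` makes the "lie" explicit and easy to log, since every zero returned by the second branch marks a gap in the archive.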

I understand that this prevents me from doing a disaster recovery AND prevents
me from doing a point in time restore, but in our opinion it is better than letting
the database crash.

Now to the question.

Once the archive_command starts lying about its success, postgresql deletes
a number of the xlog files that it has been told have been successfully archived.
Why does it do this? Can I control it? Can I turn it off?


--
Evan Rempel,       Senior Systems Administrator
University of Victoria

Re: WAL rotation question version 8.3.0

From
Alvaro Herrera
Date:
Evan Rempel wrote:

> Now to the question.
>
> Once the archive_command starts lying about its success, postgresql deletes
> a number of the xlog files that it has been told have been successfully archived.
> Why does it do this? Can I control it? Can I turn it off?

Because they're no longer needed.

If you want to keep those files, make the archive_command not lie.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: WAL rotation question version 8.3.0

From
Evan Rempel
Date:
Alvaro Herrera wrote:
> Evan Rempel wrote:
>
>> Now to the question.
>>
>> Once the archive_command starts lying about its success, postgresql deletes
>> a number of the xlog files that it has been told have been successfully archived.
>> Why does it do this? Can I control it? Can I turn it off?
>
> Because they're no longer needed.
>
> If you want to keep those files, make the archive_command not lie.


Normally postgres will rename the old WAL files that have been archived and are no longer needed,
keeping the number of WAL files constant. In this case, it actually deletes them.
Why is the behaviour different?

--
Evan Rempel

Re: WAL rotation question version 8.3.0

From
Alvaro Herrera
Date:
Evan Rempel wrote:
> Alvaro Herrera wrote:
>> Evan Rempel wrote:
>>
>>> Now to the question.
>>>
>>> Once the archive_command starts lying about its success, postgresql deletes
>>> a number of the xlog files that it has been told have been successfully archived.
>>> Why does it do this? Can I control it? Can I turn it off?
>>
>> Because they're no longer needed.
>>
>> If you want to keep those files, make the archive_command not lie.
>
>
> Normally postgres will rename the old WAL files that have been archived and are no longer needed,
> keeping the number of WAL files constant. In this case, it actually deletes them.
> Why is the behaviour different?

Renaming files is done because the files will be reused in the future
under the new name.  However, after a long archiver failure, new files
need to be created to hold the extra data.  When the archiver is
restored, those excess files can be deleted because they're not needed
for recycling.  (The number of files to keep for recycling is a function
of checkpoint_segments.)
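
The rename-or-delete choice described above can be illustrated with a toy model. This is a simplified sketch, not the server's actual code: the function name and its parameters are hypothetical, and `recycle_limit` merely stands in for the value that is "a function of checkpoint_segments".

```python
def dispose_of_old_segments(old_segments, recycled_ahead, recycle_limit):
    """Split archived, no-longer-needed segments into (renamed, deleted).

    old_segments   -- names of segments that are archived and not needed
    recycled_ahead -- how many pre-renamed segments are already waiting for reuse
    recycle_limit  -- how many such segments the server wants to keep on hand
    """
    renamed, deleted = [], []
    for seg in old_segments:
        if recycled_ahead + len(renamed) < recycle_limit:
            renamed.append(seg)  # kept, to be reused under a future name
        else:
            deleted.append(seg)  # surplus to the recycling pool, so removed
    return renamed, deleted


# Normal operation: few old segments, so all of them are recycled.
normal = dispose_of_old_segments(["a", "b", "c"], recycled_ahead=0, recycle_limit=7)

# After a long archiver outage: many old segments, most are surplus.
backlog = [f"seg{i}" for i in range(40)]
after_outage = dispose_of_old_segments(backlog, recycled_ahead=0, recycle_limit=7)
```

In the first case nothing is deleted, which matches the constant file count seen in normal operation. In the second case only 7 of the 40 segments are renamed and the remaining 33 are deleted.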

--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: WAL rotation question version 8.3.0

From
Evan Rempel
Date:
>>>> Now to the question.
>>>>
>>>> Once the archive_command starts lying about its success, postgresql deletes
>>>> a number of the xlog files that it has been told have been successfully archived.
>>>> Why does it do this? Can I control it? Can I turn it off?
>>> Because they're no longer needed.
>>>
>>> If you want to keep those files, make the archive_command not lie.
>>
>> Normally postgres will rename the old WAL files that have been archived and are no longer needed,
>> keeping the number of WAL files constant. In this case, it actually deletes them.
>> Why is the behaviour different?
>
> Renaming files is done because the files will be reused in the future
> under the new name.  However, after a long archiver failure, new files
> need to be created to hold the extra data.  When the archiver is
> restored, those excess files can be deleted because they're not needed
> for recycling.  (The number of files to keep for recycling is a function
> of checkpoint_segments.)

So it looks like postgresql will try to keep 2.5 * checkpoint_segments files,
and if it has more that have been reported as archived, then it will
start removing them.

Does this sound correct?
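
For a rough sense of scale, the estimate can be written out as arithmetic. This sketch assumes the bound given in the 8.3 documentation for normal operation, (2 + checkpoint_completion_target) * checkpoint_segments + 1 files, which reduces to the 2.5 * checkpoint_segments figure above (plus one) when checkpoint_completion_target is 0.5. The function names, the rounding down, and the 16 MB segment size (the default build setting) are assumptions of this example.

```python
SEGMENT_MB = 16  # default WAL segment size; an assumption for this sketch


def max_wal_files(checkpoint_segments, checkpoint_completion_target=0.5):
    """Approximate upper bound on WAL files kept in normal operation."""
    return int((2 + checkpoint_completion_target) * checkpoint_segments + 1)


def max_wal_megabytes(checkpoint_segments, checkpoint_completion_target=0.5):
    """The same bound expressed as disk space in megabytes."""
    return max_wal_files(checkpoint_segments, checkpoint_completion_target) * SEGMENT_MB


files_default = max_wal_files(3)    # 8 files with checkpoint_segments = 3
files_tuned = max_wal_files(10)     # 26 files with checkpoint_segments = 10
space_tuned = max_wal_megabytes(10) # 416 MB with checkpoint_segments = 10
```

Files beyond this count that have been reported as archived are the ones that become candidates for deletion rather than recycling.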

--
Evan Rempel