Re: Sample archive_command is still problematic

From Kevin Grittner
Subject Re: Sample archive_command is still problematic
Date
Msg-id 1407782188.34521.YahooMailNeo@web122305.mail.ne1.yahoo.com
In reply to Re: Sample archive_command is still problematic  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Sample archive_command is still problematic
List pgsql-docs
Josh Berkus <josh@agliodbs.com> wrote:
> On 08/11/2014 10:21 AM, Kevin Grittner wrote:
>>> Is there some good reason why "test ! -f" was added to the
>>> sample?
>>
>> In an environment with more than one cluster archiving, it is
>> otherwise way too easy to copy a config file and have the WAL files
>> of the two systems overwriting one another.  I consider a check for
>> an already existing file on the target to be very good practice.
>> The errors in the log are a clue that something went wrong, and
>> give you a chance to fix things without data loss.
>
> It depends on what you're guarding against.  In the case I was dealing
> with, the master crashed in the middle of an archive write.  As a
> result, the file existed, but was incomplete, and *needed* to be
> overwritten.  But because of 'test -f' archiving just kept failing.

I've seen that happen, too.  It's just that the script I used sent
an email to the DBAs when that happened, so the problem was quickly
investigated and resolved.  Also, our monitoring "big board" set an
"LED" to red if we went an hour without a new WAL appearing in the
archive directory.  IMV the archiving script should ensure there is
no data loss, and you should have monitoring or alert systems in
place to know when things stall.
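
As an illustration of the "no new WAL in an hour" check described above, here is a minimal shell sketch. It is not the monitoring that was actually used; the function name archive_is_fresh and its arguments are hypothetical, and it assumes a find that supports -mmin (GNU and BSD find do, POSIX does not require it).

```shell
#!/bin/sh
# archive_is_fresh DIR [MAX_AGE_MINUTES]
# Succeeds (exit status 0) if at least one regular, non-hidden file in DIR
# was modified within the last MAX_AGE_MINUTES (default 60).
# Hypothetical helper for illustration only.
archive_is_fresh() {
    dir=$1
    max_age=${2:-60}

    # -mmin -N matches files modified less than N minutes ago.
    # Hidden files are skipped so that temporary copies do not count.
    recent=$(find "$dir" -type f ! -name '.*' -mmin "-$max_age")

    [ -n "$recent" ]
}
```

A cron job or monitoring agent could call `archive_is_fresh /path/to/archive 60` and raise an alarm on a nonzero exit status; the path here is a placeholder.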

>> The problem with the recommended command is that cp is not atomic.
>> The file can be read before the contents are materialized, causing
>> early end to recovery.  I have seen it happen.  The right way to do
>> this is to copy to a different name or directory and mv the file
>> into place once it is complete -- or use software which does that
>> automatically, like rsync does.
>
> Yeah, realistically, I think we need to start supplying a script or two
> in /contrib and referencing that.  I'm not sure how to make it work for
> the Windows users though.

That might work.  We should do something, though.  The example we
give in the docs is not production quality IMO, and is something of
an embarrassment.  The problem is, it may be hard to get agreement
on what that should look like.  As a DBA, I insisted on the check
for an existing file.  I also insisted on having scripts send an
email to the DBAs on the first occurrence of a failure (but not to
spam us on each and every failed attempt).
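
To make the requirements in this thread concrete, here is a rough shell sketch that combines them: refuse to overwrite an existing file, copy under a temporary name and mv into place, and alert only on the first failure. It is one possible shape, not an agreed-upon or official script. All function names, the marker file, and the send_alert hook are assumptions made for illustration, and it does not address fsync or Windows.

```shell
#!/bin/sh
# Hypothetical archive helper functions; names and layout are illustrative.

# send_alert: reads a message on stdin. Placeholder that writes to stderr;
# a real script might pipe the message to a mail command instead.
send_alert() {
    cat >&2
}

# archive_wal SRC NAME ARCHIVE_DIR
# Returns 0 only when the file is completely in place under its final name.
archive_wal() {
    src=$1
    name=$2
    dir=$3
    tmp="$dir/.$name.tmp.$$"

    # Refuse to overwrite: an existing file may belong to another cluster.
    if [ -e "$dir/$name" ]; then
        echo "archive_wal: $dir/$name already exists" >&2
        return 1
    fi

    # Copy under a temporary name so a reader never sees a partial file.
    if ! cp "$src" "$tmp"; then
        rm -f "$tmp"
        return 1
    fi

    # A rename within one filesystem is atomic, so the final name appears
    # only once the contents are complete.
    if ! mv "$tmp" "$dir/$name"; then
        rm -f "$tmp"
        return 1
    fi
}

# notify_once MARKER MESSAGE
# Sends MESSAGE only if MARKER does not exist yet, then creates MARKER,
# so repeated retries of the same failure do not send repeated alerts.
notify_once() {
    marker=$1
    if [ -e "$marker" ]; then
        return 0
    fi
    : > "$marker"
    printf '%s\n' "$2" | send_alert
}

# archive_wal_with_alert SRC NAME ARCHIVE_DIR MARKER
# Clears MARKER after a success so the next failure alerts again.
archive_wal_with_alert() {
    if archive_wal "$1" "$2" "$3"; then
        rm -f "$4"
        return 0
    fi
    notify_once "$4" "WAL archiving failed for $2"
    return 1
}
```

A wrapper script containing these functions could end with `archive_wal_with_alert "$1" "$2" /mnt/server/archivedir /var/tmp/archive_failed` and be referenced as `archive_command = '/path/to/wrapper.sh %p %f'`; every path in that example is a placeholder. Note that a file left incomplete under its final name by some other tool would still block archiving here, which is the case where the alert is meant to bring in a human.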

--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

