Re: Missing important information in backup.sgml

From Gunnar "Nick" Bluth
Subject Re: Missing important information in backup.sgml
Date
Msg-id 80e18b98-9b72-ddee-2a34-76302e5d8a0b@pro-open.de
In response to Re: Missing important information in backup.sgml  (Stephen Frost <sfrost@snowman.net>)
Responses Re: Missing important information in backup.sgml  (Gunnar "Nick" Bluth <gunnar.bluth@pro-open.de>)
List pgsql-docs
On 16.11.2016 at 15:36, Stephen Frost wrote:
> Gunnar, all,
>
> * Gunnar "Nick" Bluth (gunnar.bluth.extern@elster.de) wrote:
>> On 16.11.2016 at 11:37, Gunnar "Nick" Bluth wrote:
>>> I ran into this issue (see patch) a few times over the past years, and
>>> tend to forget it again (sigh!). Today I had to clean up a few hundred
>>> GB of unarchived WALs, so I decided to write a patch for the
>>> documentation this time.
>>
>> Uhm, well, the actual problem was a stale replication slot... and
>> tomatoes on my eyes, it seems ;-/. Ashes etc.!
>>
>> However, I still think a warning on (esp. rsync's) RCs >= 128 is worth
>> considering (see -v2 attached).
>
> Frankly, I wouldn't suggest including such wording as it would imply
> that using a bare rsync command is an acceptable configuration of
> archive_command.  It isn't.  At the very least, a bare rsync does
> nothing to ensure that the WAL has been fsync'd to permanent storage
> before returning, leading to potential data loss due to the WAL
> segment being removed by PG before the new segment has been permanently
> stored.

I for one consider a UPS-backed server in a different DC a pretty good
starting point, and I reckon many others do as well... obviously it's
not a belt-and-braces solution, but my guess would be that > 90% of
users have something similar in place, many of them actually using rsync
(or scp) one way or another (or, think WAL-E et al., how do you force
an fsync on AWS?!?).
In environments where there's a risk of the WAL segment being
overwritten before that target server has fsync'd, heck, yeah, you're
right. But then you'd probably have something quite sophisticated in
place, and hate to see allegedly random _FATAL_ errors that are _not
documented outside the source code_ even more. Esp. when you can't tell
for sure (from the docs) if archiving your WAL segment will be retried
or not.
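
To make the two concerns above concrete (fsync before reporting success,
and exit codes the archiver treats as FATAL), here is a minimal sketch of
a wrapper script, under stated assumptions. The script name
archive_wal.py, the function names and the archive directory are all
hypothetical, and it only covers copying to a locally mounted directory on
a POSIX system, not rsync or scp to a remote host. As far as I read the
archiver source, a command that dies on a signal or exits with a status
above 125 is what gets logged at FATAL; take that threshold as an
assumption and check it against your version.

```python
#!/usr/bin/env python3
"""Hypothetical archive_command wrapper: copy one WAL segment, fsync it,
and only ever exit with a small status (0, 1 or 2)."""
import os
import shutil
import sys


def fsync_path(path):
    """Flush a file or directory to stable storage (POSIX only)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def archive_segment(src, dest_dir):
    """Copy src into dest_dir durably; refuse to overwrite an existing file."""
    final = os.path.join(dest_dir, os.path.basename(src))
    if os.path.exists(final):
        raise FileExistsError(final)
    tmp = final + ".tmp"
    with open(src, "rb") as fin, open(tmp, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())   # file contents reach the disk
    os.rename(tmp, final)         # the final name appears only when complete
    fsync_path(dest_dir)          # make the rename itself durable
    return final


def main(argv):
    """Return 0 on success, 1 on any failure, 2 on wrong usage."""
    if len(argv) != 3:
        print("usage: archive_wal.py <segment-path> <archive-dir>",
              file=sys.stderr)
        return 2
    try:
        archive_segment(argv[1], argv[2])
    except Exception as exc:      # any failure becomes a plain exit 1
        print(f"archive failed: {exc}", file=sys.stderr)
        return 1
    return 0

# When used as a script, end the file with: sys.exit(main(sys.argv))
```

It could be wired up as archive_command = 'python3 /path/to/archive_wal.py
%p /mnt/archive' (both paths hypothetical). Mapping every failure to exit
status 1 is a deliberate simplification; it cannot hide the case where the
wrapper itself is killed by a signal.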

> The PG documentation around archive command is, at best, a starting
> point for individuals who wish to implement their own proper backup
> solution, not as examples of good practice for production environments.

True. Which doesn't mean there's no room for more hints, like "ok, we
throw a FATAL error sometimes, but that's not really a problem, you
know, it's just external software that basically everyone uses at one
point or another doing odd things sometimes" ;-).

Alas, I've been chasing a red herring today, because when you find your
pg_xlog cluttered with old files _and_ see FATAL archiving messages in
your logs, your first thought is not "there's probably a replication slot
left over", but "uh oh, those archive_command calls failed, so something
might be somehow stuck now".
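
For the leftover-slot case, a small sketch of the check that would have
saved the detour. It assumes the rows come from something like SELECT
slot_name, active, restart_lsn FROM pg_replication_slots, and the current
position from pg_current_xlog_location() (pg_current_wal_lsn() from
version 10 on). The function names parse_lsn and inactive_slots are
hypothetical, and fetching the rows is left out.

```python
def parse_lsn(lsn):
    """Convert an LSN such as '16/B374D848' into a byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def inactive_slots(rows, current_lsn):
    """Return (slot_name, bytes_behind) for inactive slots, largest first.

    rows: iterable of (slot_name, active, restart_lsn) tuples;
    restart_lsn may be None for a slot that has never been used.
    """
    now = parse_lsn(current_lsn)
    behind = []
    for name, active, restart_lsn in rows:
        if active or restart_lsn is None:
            continue
        behind.append((name, now - parse_lsn(restart_lsn)))
    return sorted(behind, key=lambda item: item[1], reverse=True)
```

An inactive slot that is many gigabytes behind is a likely reason for old
segments piling up in pg_xlog, whatever the archiver happens to be logging.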

I'll try to come up with something more comprehensive, taking your
comments into account...

> Thanks!
>
> Stephen

Thank you for considering this! ;-)

Cheers,
--
Gunnar "Nick" Bluth
RHCE/SCLA

Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
_____________________________________________________________
In 1984 mainstream users were choosing VMS over UNIX.
Ten years later they are choosing Windows over UNIX.
What part of that message aren't you getting? - Tom Payne

