Re: Missing important information in backup.sgml
From | Gunnar "Nick" Bluth |
---|---|
Subject | Re: Missing important information in backup.sgml |
Date | |
Msg-id | cd880db6-1ff6-8fa7-4868-249421cd6fb2@pro-open.de |
In reply to | Re: Missing important information in backup.sgml ("Gunnar \"Nick\" Bluth" <gunnar.bluth@pro-open.de>) |
Responses | Re: Missing important information in backup.sgml (Kevin Grittner <kgrittn@gmail.com>) |
List | pgsql-docs |
On 16.11.2016 at 22:07, Gunnar "Nick" Bluth wrote:
> On 16.11.2016 at 15:36, Stephen Frost wrote:
>> Gunnar, all,
>>
>> * Gunnar "Nick" Bluth (gunnar.bluth.extern@elster.de) wrote:
>>> On 16.11.2016 at 11:37, Gunnar "Nick" Bluth wrote:
>>>> I ran into this issue (see patch) a few times over the past years, and
>>>> tend to forget it again (sigh!). Today I had to clean up a few hundred
>>>> GB of unarchived WALs, so I decided to write a patch for the
>>>> documentation this time.
>>>
>>> Uhm, well, the actual problem was a stale replication slot... and
>>> tomatoes on my eyes, it seems ;-/. Ashes etc.!
>>>
>>> However, I still think a warning on (esp. rsync's) RCs >= 128 is worth
>>> considering (see -v2 attached).
>>
>> Frankly, I wouldn't suggest including such wording as it would imply
>> that using a bare rsync command is an acceptable configuration of
>> archive_command. It isn't. At the very least, a bare rsync does
>> nothing to ensure that the WAL has been fsync'd to permanent storage
>> before returning, leading to potential data loss due to the WAL
>> segment being removed by PG before the new segment has been permanently
>> stored.
>
> I for myself deem a UPS-backed server in a different DC a pretty good
> starting point, and I reckon many others do as well... obviously it's
> not a belt and braces solution, but my guess would be that > 90% of
> users have something similar in place, many of them actually using rsync
> (or scp) one way or the other (or, think WAL-E et al., how do you force
> an fsync on AWS?!?).
> In environments where there's a risk of the WAL segment being
> overwritten before that target server has fsync'd, heck, yeah, you're
> right. But then you'd probably have something quite sophisticated in
> place, and hate to see allegedly random _FATAL_ errors that are _not
> documented outside the source code_ even more. Esp. when you can't tell
> for sure (from the docs) if archiving your WAL segment will be retried
> or not.
>
>> The PG documentation around archive command is, at best, a starting
>> point for individuals who wish to implement their own proper backup
>> solution, not as examples of good practice for production environments.
>
> True. Which doesn't mean there's no room for more hints, like "ok, we
> throw a FATAL error sometimes, but they're not really a problem, you
> know, it's just external software that basically everyone uses at one
> point or the other doing odd things sometimes" ;-).
>
> Alas, I've been hunting a red herring today, cause when you find your
> pg_xlog cluttered with old files _and_ see FATAL archiving messages in
> your logs, your first thought is not "there's prolly a replication slot
> left over", but "uh oh, those archive_command calls failed, so something
> might be somehow stuck now".
>
> I'll try to come up with something more comprehensive, taking your
> comments into account...

So, attached is what I came up with. It's obviously not "complete",
however it points out the RC >= 128 "quirk" and also mentions Stephen's
remarks on rsync (although to get actual _data loss_, you'd have to have
a power outage in the DC caused by your PG server exploding... ;-).

Cheers,
-- 
Gunnar "Nick" Bluth
RHCE/SCLA

Mobil +49 172 8853339
Email: gunnar.bluth@pro-open.de
_____________________________________________________________
In 1984 mainstream users were choosing VMS over UNIX. Ten years later
they are choosing Windows over UNIX. What part of that message aren't you
getting? - Tom Payne
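
To illustrate Stephen's fsync remark quoted above, here is a minimal sketch of what a more careful archive step could look like on a POSIX system. It is not the attached patch and not anything from the PostgreSQL docs. The function name `archive_segment`, the `.tmp` suffix and the choice of exit codes are assumptions made for illustration. The sketch copies one segment, fsyncs the file and then its directory before reporting success, and only ever returns 0 or 1, so it stays clear of the RC >= 128 range discussed in this thread.

```python
import os
import shutil


def archive_segment(src_path: str, dest_dir: str) -> int:
    """Copy one WAL segment into dest_dir durably.

    Returns a shell-style exit code: 0 on success, 1 on any failure.
    Hypothetical helper; names and exit codes are illustrative only.
    """
    dest_path = os.path.join(dest_dir, os.path.basename(src_path))
    if os.path.exists(dest_path):
        # Refuse to overwrite a segment that is already archived.
        return 1
    tmp_path = dest_path + ".tmp"
    try:
        with open(src_path, "rb") as src, open(tmp_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
            dst.flush()
            # Force the file contents to stable storage before renaming.
            os.fsync(dst.fileno())
        os.rename(tmp_path, dest_path)
        # fsync the directory so the new name itself survives a crash
        # (POSIX assumption; this does not work on Windows).
        dir_fd = os.open(dest_dir, os.O_RDONLY)
        try:
            os.fsync(dir_fd)
        finally:
            os.close(dir_fd)
    except OSError:
        return 1
    return 0
```

To be used from archive_command, this would still need a small command-line entry point that passes the segment path (%p) and the archive directory to `archive_segment` and exits with its return value. It only covers a locally mounted target; it says nothing about how to get the same guarantee through rsync, scp or an object store.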