Re: hot backups: am I doing it wrong, or do we have a problem with pg_clog?

From Merlin Moncure
Subject Re: hot backups: am I doing it wrong, or do we have a problem with pg_clog?
Date
Msg-id BANLkTikKjTNwx+0uGHMcjDFVPMdKHxGgPA@mail.gmail.com
In reply to hot backups: am I doing it wrong, or do we have a problem with pg_clog?  (Daniel Farina <daniel@heroku.com>)
List pgsql-hackers
On Thu, Apr 21, 2011 at 6:15 AM, Daniel Farina <daniel@heroku.com> wrote:
> To start at the end of this story: "DETAIL:  Could not read from file
> "pg_clog/007D" at offset 65536: Success."
>
> This is a message we received on a standby that we were bringing
> online as part of a test.  The clog file was present, but apparently
> too small for Postgres (or at least I think this is what the message
> meant), so one could stub in another clog file and then continue
> recovery successfully (modulo the voodoo of stubbing in clog files in
> general).  I am unsure if this is due to an interesting race condition
> in Postgres or a result of my somewhat-interesting hot-backup
> protocol, which is slightly more involved than the norm.  I will
> describe what it does here:
>
> 1) Call pg_start_backup()
> 2) Crawl the entire Postgres cluster directory structure, except
> pg_xlog, noting the size of every file present
> 3) Begin writing tar files, but *only up to the size noted during
> the original crawl of the cluster directory,* so that if a file
> grows between the original size snapshot and the subsequent
> read() of the file, the extra bytes are not added to the tar
>  3a) If a file has been partially truncated, I pad the tar member
> with "\0" bytes up to the size sampled in step 2, since I am
> streaming the tar file and cannot seek back in the stream to
> adjust the member's size
> 4) Call pg_stop_backup()
>
> The reason I go to this trouble is that I use many completely
> disjoint tar files to do parallel compression, decompression,
> uploading, and downloading of the base backup of the database,
> and I want to be able to control the size of these files
> up-front.  The "\0" padding is needed because a streamed tar
> archive writes each member's size into its header before the
> data, so the size cannot be adjusted afterwards; the truncation
> to the sizes snapshotted in step 2 is what makes it possible to
> split the files between volumes even in the presence of
> concurrent growth while I'm performing the hot backup (ex: a
> handful of nearly-empty heap files can grow rapidly under a
> concurrent bulk load if I get unlucky, and I don't intend to
> rely on luck).
>
> Any ideas?  Or does it sound like I'm making some bookkeeping errors
> and should review my code again?  It does work most of the time.  I
> have not yet gotten a sense of how often this reproduces.
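
For concreteness, the size-capping in steps 2-3a can be sketched
with Python's tarfile module in streaming mode; stream_backup,
CappedPaddedReader, and the sizes mapping are illustrative names
for this sketch, not Daniel's actual code:

import os
import tarfile

class CappedPaddedReader:
    """File-like object that yields exactly `size` bytes from
    `fileobj`: bytes past `size` (concurrent growth) are never
    read, and a shortfall (concurrent truncation) is padded with
    zero bytes."""
    def __init__(self, fileobj, size):
        self.fileobj = fileobj
        self.remaining = size

    def read(self, n=-1):
        if n < 0 or n > self.remaining:
            n = self.remaining
        data = self.fileobj.read(n) if n else b""
        if len(data) < n:              # file shrank under us: zero-pad
            data += b"\0" * (n - len(data))
        self.remaining -= n
        return data

def stream_backup(cluster_dir, out_stream, sizes):
    """Write one streaming tar whose member sizes are pinned to
    the sizes sampled during the step-2 crawl
    (sizes: relative path -> byte count)."""
    with tarfile.open(fileobj=out_stream, mode="w|") as tar:
        for relpath, snapshot_size in sizes.items():
            info = tarfile.TarInfo(name=relpath)
            info.size = snapshot_size  # member size fixed up-front
            with open(os.path.join(cluster_dir, relpath), "rb") as f:
                tar.addfile(info, CappedPaddedReader(f, snapshot_size))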

Everyone here is going to assume the problem is in your (too?) fancy
tar/delta archiving approach, because we can't see that code and it
just sounds suspicious.  A busted clog file is of course very
noteworthy, but to rule out your own code you should try to reproduce
the problem using a more standard method of taking the base backup.

Have you considered using rsync instead?
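
A minimal sketch of that more standard method, for comparison:
bracket a plain file-level copy with pg_start_backup() and
pg_stop_backup().  The paths, backup label, and database name here
are placeholders, not anything from this thread:

import subprocess

def rsync_base_backup(cluster_dir, dest):
    """Plain-vanilla base backup: a file-level rsync bracketed by
    pg_start_backup()/pg_stop_backup().  Example arguments:
    cluster_dir="/var/lib/postgresql/9.0/main",
    dest="backup-host:/backups/base"."""
    psql = lambda sql: subprocess.check_call(
        ["psql", "-c", sql, "postgres"])
    psql("SELECT pg_start_backup('rsync-test');")
    try:
        # pg_xlog is excluded as usual; the WAL needed to recover
        # comes from the separate WAL archive.
        subprocess.check_call(
            ["rsync", "-a", "--exclude=pg_xlog/*",
             cluster_dir + "/", dest])
    finally:
        psql("SELECT pg_stop_backup();")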

merlin

