Discussion: RE: Proposal: More flexible backup/restore via pg_dump

RE: Proposal: More flexible backup/restore via pg_dump

From:
Philip Warner
Date:
At 10:55 27/06/00 +0100, Peter Mount wrote:
>comments prefixed with PM...
>
>PM: So can most other Unix-based formats. On the intranet server here, I
>pg_dump into /tmp, then include them in a tar piped to the tape drive.

Since you are using an intermediate file, there would be no change. fseek
is only an issue on tape drives and pipes, not files on disk. I suspect
that most people can afford the overhead of restoring the backup file to
disk before restoring the database, but it'd be nice to start out as
flexible as possible. In the back of my mind is the fact that when the WAL
and storage manager are going, raw data backups should be possible (and
fast) - but then again, maybe it's a pipe dream.
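Philip's point that only tapes and pipes cause trouble can be shown with a small Python sketch. The helper name `open_for_random_access` is hypothetical and not part of pg_dump. It checks whether the input can seek and, if not, spools it to a temporary file on disk first, which is the "restore the backup file to disk" fallback mentioned above.

```python
import shutil
import tempfile


def open_for_random_access(stream):
    """Hypothetical helper: return a seekable stream with the same bytes.

    Regular files on disk are returned unchanged. Pipes (and tape devices
    read as plain character streams) cannot seek, so their contents are
    copied to a temporary file first, trading scratch disk space for the
    ability to seek.
    """
    if stream.seekable():
        return stream
    spool = tempfile.TemporaryFile()  # removed automatically when closed
    shutil.copyfileobj(stream, spool)
    spool.seek(0)
    return spool
```

For a dump piped into a restore tool, this would be called as `open_for_random_access(sys.stdin.buffer)`.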


>PM: The problem with blobs hasn't been with dumping them (I have some Java
>code that does it into a compressed zip file), but restoring them - you
>can't create a blob with a specific OID, so any references in existing
>tables will break. I currently get round it by updating the tables after the
>restore - but it's ugly and easy to break :(

I assumed this would have to happen - hence why it will not be in the first
version. With all the TOAST stuff coming, and talk about the storage
manager, I still live in hope of a better BLOB system...


>PM: Having a set of APIs (either accessible directly into the backend,
>and/or via some fastpath call) would be useful indeed.

By this I assume you mean APIs to get to database data, not backup data.


>
>This is probably an issue. One of the motivations for this utility is to
>allow partial restores (e.g. table data for one table only), and
>arbitrarily ordered restores. But I may have a solution:
>
>PM: That would be useful. I don't know about CPIO, but tar stores the TOC at
>the start of each file (so you can actually join two tar files together and
>still read all the files). In this way, you could put the table name as the
>"filename" in the header, so partial restores could be done.
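Peter's tar idea can be sketched with Python's standard `tarfile` module, assuming one archive member per table named after the table. The helpers `write_dump` and `restore_one` are hypothetical. The `"w|"` and `"r|"` modes are `tarfile`'s stream modes, which never seek, so the same code would work on a pipe. A partial restore then becomes one forward scan that stops at the matching header.

```python
import io
import tarfile


def write_dump(out, tables):
    """Hypothetical: write each table's data as one tar member named after it."""
    with tarfile.open(fileobj=out, mode="w|") as tar:  # stream mode: no seeks
        for name, data in tables.items():
            info = tarfile.TarInfo(name=name)  # table name stored in the header
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))


def restore_one(inp, wanted):
    """Hypothetical: scan the archive forward and return one table's data."""
    with tarfile.open(fileobj=inp, mode="r|") as tar:  # stream mode: no seeks
        for member in tar:
            if member.name == wanted:
                return tar.extractfile(member).read()
    return None
```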

Well, the way my plans work, I'll use either a munged OID or an arbitrary
unique ID as the file name. All meaningful access has to go via the TOC.
But that's just a detail; the basic idea is what I'm implementing. 
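A TOC in which member names are opaque IDs might look like the following minimal sketch. The `TocEntry` fields, the `"TABLE DATA"` kind label and the `<id>.dat` naming are assumptions made for illustration, not Philip's actual format. Selection for a partial restore works entirely on the TOC, and the opaque ID is used only to locate the data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TocEntry:  # hypothetical TOC record
    dump_id: int  # arbitrary unique ID; also determines the data member name
    kind: str     # e.g. "TABLE" for a definition, "TABLE DATA" for its rows
    name: str     # meaningful object name, e.g. a table name


def data_member_name(entry: TocEntry) -> str:
    # The member name means nothing by itself; only the TOC maps it to a table.
    return f"{entry.dump_id}.dat"


def select_entries(toc, names, kinds=None):
    """Return the TOC entries for the requested objects, in TOC order."""
    return [e for e in toc
            if e.name in names and (kinds is None or e.kind in kinds)]


toc = [
    TocEntry(1, "TABLE", "customers"),
    TocEntry(2, "TABLE", "orders"),
    TocEntry(3, "TABLE DATA", "customers"),
    TocEntry(4, "TABLE DATA", "orders"),
]
wanted = select_entries(toc, {"orders"}, {"TABLE DATA"})
print([data_member_name(e) for e in wanted])  # ['4.dat']
```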

It's very tempting to say tape restores are only possible in the order in
which the backup file was written ('pg_restore --stream' ?), and that
reordering/selection is only possible if you put the file on disk.


>
>PM: How about IOCTLs? I know that ArcServe on both NT & Unixware can seek
>through the tape, so there must be a way of doing it.

Maybe; I know BackupExec also does some kind of seek to update the TOC at the
end of a backup (which is what I need to do). Then again, maybe that's just
a rewind. I don't want to get into custom tape formats...
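The "seek back and update the TOC at the end" approach looks roughly like this on a seekable file. The layout (an 8-byte magic string, a 64-bit TOC offset, the data, then a JSON TOC) and the names `MAGIC`, `write_archive` and `read_item` are invented for this sketch. The single `out.seek(start)` is exactly the step that fails on a pipe and cannot be relied on with a tape.

```python
import io
import json
import struct

MAGIC = b"PGDMPSK1"             # hypothetical magic string for this sketch
HEADER = struct.Struct("<8sQ")  # magic + little-endian 64-bit TOC offset


def write_archive(out, items):
    """Hypothetical: write items, append a TOC, then patch the header."""
    start = out.tell()
    out.write(HEADER.pack(MAGIC, 0))  # placeholder: TOC offset not known yet
    toc = []
    for name, data in items:
        toc.append({"name": name, "offset": out.tell(), "length": len(data)})
        out.write(data)
    toc_offset = out.tell()
    out.write(json.dumps(toc).encode("utf-8"))
    end = out.tell()
    out.seek(start)  # requires a seekable output
    out.write(HEADER.pack(MAGIC, toc_offset))
    out.seek(end)


def read_item(inp, name):
    """Hypothetical: random-access read of one item via header and TOC."""
    magic, toc_offset = HEADER.unpack(inp.read(HEADER.size))
    if magic != MAGIC:
        raise ValueError("not an archive written by write_archive")
    inp.seek(toc_offset)
    for entry in json.loads(inp.read().decode("utf-8")):  # TOC runs to EOF
        if entry["name"] == name:
            inp.seek(entry["offset"])
            return inp.read(entry["length"])
    return None


buf = io.BytesIO()
write_archive(buf, [("customers", b"1\tAlice\n"), ("orders", b"7\t42\n")])
buf.seek(0)
print(read_item(buf, "orders"))  # b'7\t42\n'
```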

Do we have any tape experts out there?


>PM: Tar can do this sequentially, which I've had to do many times over the
>years - restoring just one file from a tape, sequential access is probably
>the only way.

It's just nasty when the user reorders the restoration of tables and
metadata. In the worst case it might mean hundreds of scans of the tape. I'd
hate to have my name associated with something so unfriendly (even if it is
the operator's fault).
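The worst case can be quantified with a rough model. It assumes a strictly forward-reading tape where each backward step in the requested order costs a rewind and a fresh pass. The function name `tape_passes` and the sample object names are hypothetical.

```python
def tape_passes(file_order, requested_order):
    """Hypothetical: count forward-only scans needed for requested_order.

    Each item is read when the scan reaches it; if the next requested item
    lies before the current position, the tape must be rewound and a new
    pass started.
    """
    position = {item: i for i, item in enumerate(file_order)}
    passes = 1 if requested_order else 0
    for prev, nxt in zip(requested_order, requested_order[1:]):
        if position[nxt] < position[prev]:
            passes += 1
    return passes


on_tape = ["schema", "customers", "orders", "indexes"]
print(tape_passes(on_tape, on_tape))                                       # 1
print(tape_passes(on_tape, ["schema", "orders", "customers", "indexes"]))  # 2
print(tape_passes(on_tape, list(reversed(on_tape))))                       # 4
```

Restoring in file order always costs a single pass. Under this model, a fully reversed order over n objects costs n passes.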


>PM: The tar spec should be around somewhere - just be careful, the obvious
>source I was thinking of would be GPL'd, and we don't want to be polluted :-)

That was my problem. I've got some references now, and I'll look at them.
At least everything I've written so far can be put in PG.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|
                                 |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: Proposal: More flexible backup/restore via pg_dump

From:
Giles Lean
Date:
> Do we have any tape experts out there?

Don't even try.  The guaranteed portable subset of tape drive,
interface, device driver, and OS semantics is pretty limited.

I'm confident I can stream one (tape) file smaller than one tape's capacity
to a drive and read it back sequentially.  These days you can probably
expect reliable end of media handling as well, but don't be too sure
what value errno will have when you do hit the end of a tape.
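Giles's caution about `errno` suggests a writer that treats any write error, or a short write, as a possible end of media. Such a writer simply reports how far it got and does not branch on a particular `errno` value. The following is a minimal sketch: the function name is hypothetical, and treating every error as "maybe end of tape" is an assumption that a real tool would refine.

```python
def write_until_error(out, blocks):
    """Hypothetical: write blocks in order; return (written, finished, error).

    Any OSError or short write stops the loop and is treated as a possible
    end of media. The exception is returned for reporting only; its errno
    is not interpreted, because it varies between drives, drivers and systems.
    """
    written = 0
    for block in blocks:
        try:
            n = out.write(block)
        except OSError as exc:
            return written, False, exc
        if n is not None and n < len(block):
            return written, False, None  # short write: possibly end of media too
        written += 1
    return written, True, None
```

Short writes are only visible on an unbuffered file object (for example one opened with `buffering=0`). A buffered writer either writes everything or raises.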

As soon as you start looking to deal with more advanced facilities you
will discover portability problems:

- autochanger interface differences
- head positioning on tapes with multiple files  (SysV v. BSD, anyone?)
- random access (yup, supported on some drives  ... probably all obsolete)
- "fast search marks" and similar

Some of these things can vary on the one OS if a tape drive is
connected to different interfaces, since different drivers may be
used.

BTW I'm not a tape expert.  The problems in practice may be greater or
lesser than I've suggested.

I would be trying really hard to work out a backup format that allows
a one pass restore.  Rummaging around in the database when making the
backup and using some scratch space at that time isn't too bad.  Using
scratch space upon restore is more problematic; restore problems are
traditionally found at the worst possible moment!
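Giles's suggestion (do the extra work and use scratch space at backup time so that the restore is one forward pass) can be sketched as follows. The layout (a length-prefixed JSON TOC followed by the data in TOC order) and both function names are assumptions for this sketch. `items` stands in for table data that a real dump would fetch from the database. The output is written strictly sequentially, so it could go to a pipe or tape, and the restore never seeks.

```python
import io
import json
import shutil
import struct
import tempfile

LEN = struct.Struct("<Q")  # 64-bit length prefix (an assumption of this sketch)


def backup_one_pass(out, items):
    """Hypothetical: write the TOC first, then the data, never seeking `out`.

    Data is spooled to scratch space at backup time so that the TOC, which
    needs every item's length, can be written before any of the data.
    """
    toc = []
    with tempfile.TemporaryFile() as scratch:
        for name, data in items:
            toc.append({"name": name, "length": len(data)})
            scratch.write(data)
        toc_bytes = json.dumps(toc).encode("utf-8")
        out.write(LEN.pack(len(toc_bytes)))
        out.write(toc_bytes)
        scratch.seek(0)  # seeks the scratch file only, never the output
        shutil.copyfileobj(scratch, out)


def restore_one_pass(inp, wanted):
    """Hypothetical: read the TOC, then stream forward keeping wanted items."""
    (toc_len,) = LEN.unpack(inp.read(LEN.size))
    toc = json.loads(inp.read(toc_len).decode("utf-8"))
    restored = {}
    for entry in toc:
        # Always read forward, even to skip; assumes a buffered stream whose
        # read(n) returns n bytes unless EOF is reached.
        data = inp.read(entry["length"])
        if entry["name"] in wanted:
            restored[entry["name"]] = data
    return restored


buf = io.BytesIO()
backup_one_pass(buf, [("customers", b"1\tAlice\n"), ("orders", b"7\t42\n")])
buf.seek(0)
print(restore_one_pass(buf, {"orders"}))  # {'orders': b'7\t42\n'}
```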

Regards,

Giles