Обсуждение: AW: Proposal: More flexible backup/restore via pg_dump

Поиск
Список
Период
Сортировка

AW: Proposal: More flexible backup/restore via pg_dump

От
Zeugswetter Andreas SB
Дата:
> Maybe; I know BackupExec also does some kind of seek to 
> update the TOC at
> end of a backup (which is what I need to do). Then again, 
> maybe that's just
> a rewind. I don't want to get into custom tape formats...
> 
> Do we have any tape experts out there?

Dont lock yourself in on the tape issue, it is the pipes that 
actually add value to the utility, and those can't rewind, seek
or whatever.

pipes can:compress, split output, write to storage managers, stripe output,
.....

I guess we would want two formats, one for pipe, and one for a standard
directory.

Andreas


Re: AW: Proposal: More flexible backup/restore via pg_dump

От
Tom Lane
Дата:
Zeugswetter Andreas SB <ZeugswetterA@wien.spardat.at> writes:
> pipes can:
>     compress, split output, write to storage managers, stripe output,
> .....

Right, the thing we *really* want is to preserve the fact that pg_dump
can write its output to a pipeline ... and that a restore can read from
one.  If you can improve performance when you find you do have a
seekable source/destination file, fine, but the utilities must NOT
require it.

> I guess we would want two formats, one for pipe, and one for a standard
> directory.

At the risk of becoming tiresome, "tar" format is eminently pipeable...
        regards, tom lane


Re: AW: Proposal: More flexible backup/restore via pg_dump

От
Philip Warner
Дата:
At 10:48 27/06/00 -0400, Tom Lane wrote:
>
>Right, the thing we *really* want is to preserve the fact that pg_dump
>can write its output to a pipeline ... and that a restore can read from
>one.  If you can improve performance when you find you do have a
>seekable source/destination file, fine, but the utilities must NOT
>require it.

OK, the limitation will have to be that reordering of *data* loads (as
opposed to metadata) will not be possible in piped data. This is only a
problem if RI constraints are loaded.

I *could* dump the compressed data to /tmp, but I would guess that in most
cases when the archive file is being piped it's because the file won't fit
on a local disk.

Does this sound reasonable?


>> I guess we would want two formats, one for pipe, and one for a standard
>> directory.
>
>At the risk of becoming tiresome, "tar" format is eminently pipeable...
>

No, it's good...I'll never feel guilty about asking for optimizer hints again.

More seriously, though, if I pipe a tar file, I still can't reorder the
*data* files without saving them to disk, which is what I want to avoid.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: AW: Proposal: More flexible backup/restore via pg_dump

От
Tom Lane
Дата:
Philip Warner <pjw@rhyme.com.au> writes:
> More seriously, though, if I pipe a tar file, I still can't reorder the
> *data* files without saving them to disk, which is what I want to avoid.

True.  This is not an issue on the dump side, of course, since you can
choose what order you're going to write the tables in.  On the restore
side, you have no alternative but to restore the tables in the order
they appear on tape.  Of course the DBA can run the restore utility
more than once and extract a subset of tables each time, but I don't
see how the restore utility can be expected to do that for him.
(Except if it finds it does have the ability to seek in its input file,
but I dunno if it's a good idea to use that case for anything except
under-the-hood performance improvement, ie quickly skipping over the
data you don't need.  Features that don't work all the time are not
good in my book.)

Basically I think we want to assume that pg_dump will write the tables
in an order that's OK for restoring.  If we can arrange for RI checks
not to be installed until after all the data is loaded, this shouldn't
be a big problem, seems like.
        regards, tom lane


Re: AW: Proposal: More flexible backup/restore via pg_dump

От
Philip Warner
Дата:
At 11:23 27/06/00 -0400, Tom Lane wrote:
>Of course the DBA can run the restore utility
>more than once and extract a subset of tables each time, but I don't
>see how the restore utility can be expected to do that for him.

Only works with seek (ie. a file).


>Features that don't work all the time are not
>good in my book.)

The *only* bit that won't work is being able to select the table data load
order, and I can fix that by writing tables that are wanted later to /tmp
if seek is unavailable. This *may* not be a problem, and probably should be
presented as an option to the user if restoring from non-seekable media.
Assuming that the backup was originally written to seekable media, I will
be able to tell the user how much space will be required, which should help.

I don't suppose anyone knows of a way of telling if a file handle supports
seek?


>Basically I think we want to assume that pg_dump will write the tables
>in an order that's OK for restoring.  If we can arrange for RI checks
>not to be installed until after all the data is loaded, this shouldn't
>be a big problem, seems like.

Part of the motivation for this utility was to allow DBAs to fix the
ordering at restore time, but otherwise I totally agree. Unfortunately I
don't think the RI checks can be delayed at this stage - can they? 

I don't suppose there is a 'disable constraints' command? Or the ability to
set all constraints as deferrred until commit-time?


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: AW: Proposal: More flexible backup/restore via pg_dump

От
Giles Lean
Дата:
> I don't suppose anyone knows of a way of telling if a file handle supports
> seek?

The traditional method is to call lseek() and see what happens.

> Part of the motivation for this utility was to allow DBAs to fix the
> ordering at restore time, but otherwise I totally agree. Unfortunately I
> don't think the RI checks can be delayed at this stage - can they? 

The current pg_dump handles the data and then adds the constraints.

Otherwise there are "chicken and egg" problems where two tables have
mutual RI constraints.  Even at the tuple level two tuples can be
mutually dependent.

Regards,

Giles




Re: AW: Proposal: More flexible backup/restore via pg_dump

От
Philip Warner
Дата:
At 08:48 28/06/00 +1000, Giles Lean wrote:
>
>> Part of the motivation for this utility was to allow DBAs to fix the
>> ordering at restore time, but otherwise I totally agree. Unfortunately I
>> don't think the RI checks can be delayed at this stage - can they? 
>
>The current pg_dump handles the data and then adds the constraints.

Not as far as I can see; that's what I want to do, bu there is no
implemented syntax for doing it. pg_dump simply dumps the table definition
with constraints (at least on 7.0.2).

>Otherwise there are "chicken and egg" problems where two tables have
>mutual RI constraints.  Even at the tuple level two tuples can be
>mutually dependent.

Absolutely. And AFAICT, these happen with pg_dump.


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: AW: Proposal: More flexible backup/restore via pg_dump

От
"Stephan Szabo"
Дата:
> At 08:48 28/06/00 +1000, Giles Lean wrote:
> >Otherwise there are "chicken and egg" problems where two tables have
> >mutual RI constraints.  Even at the tuple level two tuples can be
> >mutually dependent.
>
> Absolutely. And AFAICT, these happen with pg_dump.

This will happen for check constraints, but not for foreign key
constraints...
It actually adds the fk constraints later with CREATE CONSTRAINT TRIGGER
after the data dump is finished. And, if you do separate schema and data
dumps, the wierd statements at the top and bottom of the data dump turn
off triggers and then turn them on again (in the most painful way possible).
However those cases do not actually guarantee the validity of the data in
between.



Re: AW: Proposal: More flexible backup/restore via pg_dump

От
Philip Warner
Дата:
At 12:06 28/06/00 -0700, Stephan Szabo wrote:
>
>This will happen for check constraints, but not for foreign key
>constraints...
>It actually adds the fk constraints later with CREATE CONSTRAINT TRIGGER
>after the data dump is finished. And, if you do separate schema and data
>dumps, the wierd statements at the top and bottom of the data dump turn
>off triggers and then turn them on again (in the most painful way possible).

Thanks for this information!

I had not seen those statements before; I have been mistakenly modifying
6.5.3 sources, not 7.0.2. I will incorporate them in my work. Is there any
way of also disabling all constraint checking while loading the data?


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: AW: Proposal: More flexible backup/restore via pg_dump

От
"Stephan Szabo"
Дата:
I think my previous message went dead...

> >This will happen for check constraints, but not for foreign key
> >constraints...
> >It actually adds the fk constraints later with CREATE CONSTRAINT TRIGGER
> >after the data dump is finished. And, if you do separate schema and data
> >dumps, the wierd statements at the top and bottom of the data dump turn
> >off triggers and then turn them on again (in the most painful way
possible).
>
> Thanks for this information!
>
> I had not seen those statements before; I have been mistakenly modifying
> 6.5.3 sources, not 7.0.2. I will incorporate them in my work. Is there any
> way of also disabling all constraint checking while loading the data?

Well, for unique you could remove/recreate the unique index.  NOT NULL is
probably
not worth bothering with.  Check constraints might be able to be turned off
for a table
by setting relchecks to 0 in the pg_class row and the resetting it after
data is loaded (sort
of like what we do on data only dumps for triggers).

The problem is that the create constraint trigger, playing with reltriggers
and playing with
relchecks doesn't guarantee that the data being loaded is correct.  And if
you remove
and recreate a unique index, you might not get the index back at the end,
and then
you've lost the information that there was supposed to be a unique or
primary key on
the table.

It might be a good idea to have some sort of pg_constraint (or whatever)
that holds
this data, since that would also make it easier to make the constraint
naming SQL compliant
(no duplicate constraint names within schema - that includes automatically
generated ones),
and it might help if we ever try to make deferrable check/primary key/etc...



Re: AW: Proposal: More flexible backup/restore via pg_dump

От
Tom Lane
Дата:
"Stephan Szabo" <sszabo@kick.com> writes:
>> I had not seen those statements before; I have been mistakenly modifying
>> 6.5.3 sources, not 7.0.2. I will incorporate them in my work. Is there any
>> way of also disabling all constraint checking while loading the data?

> Well, for unique you could remove/recreate the unique index.  NOT NULL
> is probably not worth bothering with.  Check constraints might be able
> to be turned off for a table by setting relchecks to 0 in the pg_class
> row and the resetting it after data is loaded (sort of like what we do
> on data only dumps for triggers).

There's no need to disable NOT NULL, nor unique constraints either,
since those are purely local to a table --- if they're going to fail,
altering load order doesn't prevent it.  The things you need to worry
about are constraint expressions that cause references to other tables
(perhaps indirectly via a function).

If we had ALTER TABLE ADD CONSTRAINT then the problem would be largely
solved, I believe.  This should be a minor exercise --- the heavy
lifting is already done, because heap.c's AddRelationRawConstraints()
is already set up to be invokable on a pre-existing relation.  Also
the parser knows how to parse ALTER TABLE ADD CONSTRAINT ... I think
all that's missing is a few lines of glue code in command.c.
        regards, tom lane


Re: AW: Proposal: More flexible backup/restore via pg_dump

От
"Stephan Szabo"
Дата:
> There's no need to disable NOT NULL, nor unique constraints either,
> since those are purely local to a table --- if they're going to fail,
> altering load order doesn't prevent it.  The things you need to worry
Is there a speed difference with doing a copy on a table with an index
versus creating
the index at the end?  I've been assuming that the latter was faster (and
that that was
part of what he wanted)

> about are constraint expressions that cause references to other tables
> (perhaps indirectly via a function).
Yeah, that's actually a big problem, since that's actually also a constraint
on the other table
as well, and as far as I know, we aren't yet constraining the other table.

> If we had ALTER TABLE ADD CONSTRAINT then the problem would be largely
> solved, I believe.  This should be a minor exercise --- the heavy
> lifting is already done, because heap.c's AddRelationRawConstraints()
> is already set up to be invokable on a pre-existing relation.  Also
> the parser knows how to parse ALTER TABLE ADD CONSTRAINT ... I think
> all that's missing is a few lines of glue code in command.c.

Does the AddRelationRawConstraints() check that the constraint is satisified
as well when
you add it?   It didn't look like it did, but I could be missing something.
That's another requirement of ALTER TABLE ADD CONSTRAINT.  That was the
bit I wasn't sure how to do for other generic constraints when I added the
foreign key one.




Re: AW: Proposal: More flexible backup/restore via pg_dump

От
Tom Lane
Дата:
"Stephan Szabo" <sszabo@kick.com> writes:
>> If we had ALTER TABLE ADD CONSTRAINT then the problem would be largely
>> solved, I believe.  This should be a minor exercise --- the heavy
>> lifting is already done, because heap.c's AddRelationRawConstraints()
>> is already set up to be invokable on a pre-existing relation.

> Does the AddRelationRawConstraints() check that the constraint is
> satisified as well when you add it?  It didn't look like it did, but I
> could be missing something.

Oh, you're right, it does not.  So you'd first have to run through the
table and verify that the constraint holds for each existing tuple.
Doesn't seem like a big deal though...
        regards, tom lane


Re: AW: Proposal: More flexible backup/restore via pg_dump

От
Philip Warner
Дата:
At 20:30 29/06/00 -0400, Tom Lane wrote:
>
>Oh, you're right, it does not.  So you'd first have to run through the
>table and verify that the constraint holds for each existing tuple.
>Doesn't seem like a big deal though...
>

Does this mean somebody is likely to do it? It'd certainly make
backup/restore more reliable.

I'm almost at the point of asking for testers with the revised
pg_dump/pg_restore, so I'll go with what I have for now, but it would make
life a lot less messy. Since the new version *allows* table restoration
intermixed with metadata, and in any order, I need to update pg_class
repeatedly (I assume there may be system triggers that need to execute when
metadata is changed).


----------------------------------------------------------------
Philip Warner                    |     __---_____
Albatross Consulting Pty. Ltd.   |----/       -  \
(A.C.N. 008 659 498)             |          /(@)   ______---_
Tel: (+61) 0500 83 82 81         |                 _________  \
Fax: (+61) 0500 83 82 82         |                 ___________ |
Http://www.rhyme.com.au          |                /           \|                                |    --________--
PGP key available upon request,  |  /
and from pgp5.ai.mit.edu:11371   |/


Re: AW: Proposal: More flexible backup/restore via pg_dump

От
"Stephan Szabo"
Дата:
> At 20:30 29/06/00 -0400, Tom Lane wrote:
> >
> >Oh, you're right, it does not.  So you'd first have to run through the
> >table and verify that the constraint holds for each existing tuple.
> >Doesn't seem like a big deal though...
> >
> 
> Does this mean somebody is likely to do it? It'd certainly make
> backup/restore more reliable.

I'll take a stab at it. It might take me a while to get stuff working
but it shouldn't take too long before the beginnings are there.