On Thu, Sep 29, 2005 at 12:50:13AM +0300, Hannu Krosing wrote:
> On T, 2005-09-27 at 17:57 -0500, Jim C. Nasby wrote:
> > On Tue, Sep 27, 2005 at 02:47:46PM -0400, Jan Wieck wrote:
> > > On 9/24/2005 8:17 PM, Jim C. Nasby wrote:
> > >
> > > >Would it be difficult to vacuum as part of a dump? The reasoning behind
> > > >this is that you have to read the table to do the dump anyway,
> > >
> > > I think aside from what's been said so far, it would be rather difficult
> > > anyway. pg_dump relies on MVCC and requires to run in one transaction to
> > > see a consistent snapshot while vacuum jiggles around with transactions
> > > in some rather non-standard way.
> >
> > Is this true even if they were in different connections?
> >
> > My (vague) understanding of the vacuum process is that it first vacuums
> > indexes, and then vacuums the heap.
>
> actually (lazy) vacuum does this
>
> 1) scan heap, collect ctids of rows to remove
> 2) clean indexes
> 3) clean heap
>
> > Since we don't dump indexes, there's
> > nothing for backup to do while those are vacuumed, so my idea is:
> >
> > pg_dump:
> > foreach (table)
> > spawn vacuum
> > wait for vacuum to hit heap
> > start copy
> > wait for analyze to finish
> > next;
>
> probably the first heap scan of vacuum would go faster than dump as it
> does not have to write out anything, and the second scan ( nr 3 in above
> list ) would be either faster or slower, as it has to lock each page and
> rearrange tuples there.
>
> so it would be very hard to synchronize vacuum with either of them.
Well, I guess it depends on what the dump was writing to. Also depends
on available cache I expect.
Is this something that could be hacked together fairly easy just for
testing purposes? Would firing off a VACUUM tablename at the same time
as a COPY tablename be a good enough approximation?
--
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461