Discussion: Vacuum questions...


Vacuum questions...

From
"Jim C. Nasby"
Date:
Would it be difficult to vacuum as part of a dump? The reasoning behind
this is that you have to read the table to do the dump anyway, so it
would be a good time to be able to piggy-back other operations that need
to read the entire table on top. I know vacuuming of indexes complicates
this, so it's probably not as simple as just firing off a vacuum and
copy at the same time (although that idea is probably worth testing,
since it might still be a win).

When dropping a table or index, is its space immediately released in
the FSM?

Also, would it be possible to add some means to check the status of a
running vacuum? Even with vacuum verbose, once it starts in on a large
table you have no way to know how far along it is.

Finally, if vacuum_delay is enabled, does vacuum_cost_page_miss consider
a miss as not in the database buffer, or not in the kernel buffer? I
remember discussions about trying to track IO request times to try and
determine if something came out of kernel buffers or not, but AFAIK
that's all vaporware right now...
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Vacuum questions...

From
"Joshua D. Drake"
Date:
Jim C. Nasby wrote:

>Would it be difficult to vacuum as part of a dump? The reasoning behind
>this is that you have to read the table to do the dump anyway, so it
>would be a good time to be able to piggy-back other operations that need
>to read the entire table on top. I know vacuuming of indexes complicates
>this, so it's probably not as simple as just firing off a vacuum and
>copy at the same time (although that idea is probably worth testing,
>since it might still be a win).
>  
>
This would be a nightmare on a large database. Think of how
long it takes to dump 20 gig, then add how long it is going to
take to vacuum a DB of that size, and now think about a 500 gig
database.

Actually this also probably would not gain you much in 8.1
as vacuum in theory is already dealing with itself.

>When dropping a table or index, is it's space immediately released in
>the FSM?
>  
>
I would have to double-check, but I believe you would have to
vacuum to reclaim the space to the FSM, because the relation
is still there, just like when you DELETE (but not TRUNCATE).

>Also, would it be possible to add some means to check the status of a
>running vacuum? Even with vacuum verbose, once it starts in on a large
>table you have no way to know how far along it is.
>  
>
That is an interesting thought... Perhaps a quick scan of
the table to see how many dead rows there are? Then check
back every n/10? Hmmm... I am not a C guy, so I don't know if
that is technically feasible (although probably possible), but it
is interesting from a DBA perspective.

Although that could be an issue on a large table as well, I think.

>Finally, if vacuum_delay is enabled, does vacuum_cost_page_miss consider
>a miss as not in the database buffer, or not in the kernel buffer? I
>remember discussions about trying to track IO request times to try and
>determine if something came out of kernel buffers or not, but AFAIK
>that's all vaporware right now...
>  
>
Good question, anyone else?

Sincerely,

Joshua D. Drake





-- 
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/



Re: Vacuum questions...

From
Alvaro Herrera
Date:
On Sat, Sep 24, 2005 at 07:17:38PM -0500, Jim C. Nasby wrote:

> Finally, if vacuum_delay is enabled, does vacuum_cost_page_miss consider
> a miss as not in the database buffer, or not in the kernel buffer?

The database buffer.

> I
> remember discussions about trying to track IO request times to try and
> determine if something came out of kernel buffers or not, but AFAIK
> that's all vaporware right now...

I don't remember the discussion, but it certainly hasn't been
implemented.
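
For reference, a minimal sketch of the cost-based delay settings involved,
settable per session before a manual VACUUM (the values and the table name
below are illustrative assumptions, not recommendations):

    -- Illustrative values only; check your installation's defaults.
    SET vacuum_cost_delay = 10;       -- ms to sleep once the cost limit is reached (0 disables)
    SET vacuum_cost_page_hit = 1;     -- cost for a page found in shared buffers
    SET vacuum_cost_page_miss = 10;   -- cost for a page not in shared buffers (kernel cache not considered)
    SET vacuum_cost_page_dirty = 20;  -- cost for a page vacuum dirties
    SET vacuum_cost_limit = 200;      -- accumulated cost that triggers the sleep
    VACUUM VERBOSE some_big_table;    -- hypothetical table name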

-- 
Alvaro Herrera                                http://www.PlanetPostgreSQL.org
"Escucha y olvidarás; ve y recordarás; haz y entenderás" (Confucio)


Re: Vacuum questions...

From
Hannu Krosing
Date:
On L, 2005-09-24 at 20:25 -0700, Joshua D. Drake wrote:

> Actually this also probably would not gain you much in 8.1
> as vacuum in theory is already dealing with itself.

Interesting. Could you explain it in a more detailed way ?
How does vacuum "deal with itself" in 8.1 ?

> >Also, would it be possible to add some means to check the status of a
> >running vacuum? Even with vacuum verbose, once it starts in on a large
> >table you have no way to know how far along it is.
> >  
> >
> That is an interesting thought... Perhaps a quick scan of
> the table to see how many dead rows there are? Then check
> back every n/10 ? Hmmm... I am not a C guy so I don't know if
> that is technically feasible (although probably possible) but it
> is interesting from a DBA perspective.

Not sure about a "quick scan" approach, especially for tables big enough
that the progress info would be interesting (in my experience a scan is
never quick).

Perhaps VACUUM could send some statistics after each N pages and this
would then be available through something similar to pg_statistics
table.

-- 
Hannu Krosing <hannu@skype.net>



Re: Vacuum questions...

From
"Joshua D. Drake"
Date:
Hannu Krosing wrote:

>On L, 2005-09-24 at 20:25 -0700, Joshua D. Drake wrote:
>
>  
>
>>Actually this also probably would not gain you much in 8.1
>>as vacuum in theory is already dealing with itself.
>>    
>>
>
>Interesting. Could you explain it in a more detailed way ?
>How does vacuum "deal with itself" in 8.1 ?
>  
>
Autovacuum is integrated into the backend for 8.1

>Not sure of a "quick scan" approach, espacially for tables big enough
>for the progress info would be interesting (in my experience a scan is
>never quick).
>  
>
It would be a seq scan, so on a larger table it would probably
take a long time. I was thinking there could be some mapping of
known dead rows or something, so we didn't have to scan
the pages for the statistics.

We would of course scan to do the actual work, but if vacuum
cleared the map while doing the work it might not be that bad.

Sincerely,

Joshua D. Drake


>Perhaps VACUUM could send some statistics after each N pages and this
>would then be available through something similar to pg_statistics
>table.
>
>  
>


-- 
Your PostgreSQL solutions company - Command Prompt, Inc. 1.800.492.2240
PostgreSQL Replication, Consulting, Custom Programming, 24x7 support
Managed Services, Shared and Dedicated Hosting
Co-Authors: plPHP, plPerlNG - http://www.commandprompt.com/



Re: Vacuum questions...

From
Tom Lane
Date:
Hannu Krosing <hannu@skype.net> writes:
> Perhaps VACUUM could send some statistics after each N pages and this
> would then be available through something similar to pg_statistics
> table.

Why not just have it send some text to be displayed in the "current
command" field of pg_stat_activity?  The infrastructure is all there
already for that.
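
A rough sketch of what that could look like from the DBA side, assuming
VACUUM updated its pg_stat_activity entry (the progress text is hypothetical;
only the 8.1-era column names are real, and current_query is only populated
when stats_command_string is on):

    -- Poll the hypothetical progress text of running vacuums.
    SELECT procpid, current_query
      FROM pg_stat_activity
     WHERE current_query LIKE 'VACUUM%';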
        regards, tom lane


Re: Vacuum questions...

From
Gaetano Mendola
Date:
Joshua D. Drake wrote:
> Hannu Krosing wrote:
> 
>> On L, 2005-09-24 at 20:25 -0700, Joshua D. Drake wrote:
>>
>>  
>>
>>> Actually this also probably would not gain you much in 8.1
>>> as vacuum in theory is already dealing with itself.
>>>   
>>
>> Interesting. Could you explain it in a more detailed way ?
>> How does vacuum "deal with itself" in 8.1 ?
>>  
>>
> Autovacuum is integrated into the backend for 8.1

Can I set the autovacuum parameters per table instead of per
engine? I'm using pg_autovacuum right now on 7.4 and it's not
enough, because some tables (one that implements a materialized
view, for example) fall outside the engine's average usage, and
other tables are so huge that they don't get analyzed for months...


Regards
Gaetano Mendola




Re: Vacuum questions...

From
Alvaro Herrera
Date:
On Mon, Sep 26, 2005 at 05:41:24PM +0200, Gaetano Mendola wrote:
> Joshua D. Drake wrote:

> > Autovacuum is integrated into the backend for 8.1
> 
> Can I set the autovacuum parameter per table instead of per
> engine ?

Yes.
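
In 8.1 the per-table knobs live in a system catalog rather than in
postgresql.conf; a sketch from memory of what an entry might look like
(column list, values, and the table name are assumptions to double-check
against the 8.1 docs; -1 is meant as "use the global default"):

    -- Sketch only: per-table autovacuum settings via the pg_autovacuum catalog.
    INSERT INTO pg_autovacuum
        (vacrelid, enabled,
         vac_base_thresh, vac_scale_factor,
         anl_base_thresh, anl_scale_factor,
         vac_cost_delay, vac_cost_limit)
    VALUES ('my_matview'::regclass, true,
            200, 0.05,
            100, 0.02,
            -1, -1);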

-- 
Alvaro Herrera                         Architect, http://www.EnterpriseDB.com
Jude: I wish humans laid eggs
Ringlord: Why would you want humans to lay eggs?
Jude: So I can eat them


Re: Vacuum questions...

From
"Jim C. Nasby"
Date:
On Sat, Sep 24, 2005 at 08:25:30PM -0700, Joshua D. Drake wrote:
> Jim C. Nasby wrote:
> 
> >Would it be difficult to vacuum as part of a dump? The reasoning behind
> >this is that you have to read the table to do the dump anyway, so it
> >would be a good time to be able to piggy-back other operations that need
> >to read the entire table on top. I know vacuuming of indexes complicates
> >this, so it's probably not as simple as just firing off a vacuum and
> >copy at the same time (although that idea is probably worth testing,
> >since it might still be a win).
> > 
> >
> This would be a nightmare on a large database. Think of how
> long it takes to dump 20 gig, now add how long it is going to
> take to vacuum that size of DB, now think about a 500 gig
> database.

Who says that the speed of the dump is limited by the disks the
database is on? I suspect in many cases it isn't.

Anyway, this would certainly be an optional step, so if it turns out it
hurts you in your environment, you don't have to use it.

> Actually this also probably would not gain you much in 8.1
> as vacuum in theory is already dealing with itself.

Maybe, maybe not. If you're already reading the entire table to dump it,
why read the entire table again some other time to vacuum it? Just get
it all done at once.

Whenever we get a 'map of pages that need vacuuming', which will
presumably greatly speed up vacuum, then maybe your argument will make
sense. Right now I don't see how it helps anything.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Vacuum questions...

From
"Jim C. Nasby"
Date:
On Sun, Sep 25, 2005 at 11:50:14AM -0400, Tom Lane wrote:
> Hannu Krosing <hannu@skype.net> writes:
> > Perhaps VACUUM could send some statistics after each N pages and this
> > would then be available through something similar to pg_statistics
> > table.
> 
> Why not just have it send some text to be displayed in the "current
> command" field of pg_stat_activity?  The infrastructure is all there
> already for that.

If someone wanted to write a nice interface showing the status of a
vacuum it would be easier if they didn't have to parse a text field...
but *anything* would be a vast improvement over what we have now.

newbie TODO?
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Vacuum questions...

From
Gaetano Mendola
Date:
Alvaro Herrera wrote:
> On Mon, Sep 26, 2005 at 05:41:24PM +0200, Gaetano Mendola wrote:
>> Joshua D. Drake wrote:
> 
>>> Autovacuum is integrated into the backend for 8.1
>> Can I set the autovacuum parameter per table instead of per
>> engine ?
> 
> Yes.

Finally :-)

good work.

Regards
Gaetano Mendola


Re: Vacuum questions...

From
Gaetano Mendola
Date:
Gaetano Mendola wrote:
> Alvaro Herrera wrote:
>> On Mon, Sep 26, 2005 at 05:41:24PM +0200, Gaetano Mendola wrote:
>>> Joshua D. Drake wrote:
>>>> Autovacuum is integrated into the backend for 8.1
>>> Can I set the autovacuum parameter per table instead of per
>>> engine ?
>> Yes.
> 

Reading the 8.1 release note I found:


Move /contrib/pg_autovacuum into the main server (Alvaro)
   Integrating autovacuum into the server allows it to be automatically
   started and stopped in sync with the database server, and allows
   autovacuum to be configured from postgresql.conf.


Maybe it would be useful to mention that it was not exactly pg_autovacuum
that was moved, because, for example, you can now set parameters per table
and pg_autovacuum could not.


Regards
Gaetano Mendola




Re: Vacuum questions...

From
Jan Wieck
Date:
On 9/24/2005 8:17 PM, Jim C. Nasby wrote:

> Would it be difficult to vacuum as part of a dump? The reasoning behind
> this is that you have to read the table to do the dump anyway, 

I think, aside from what's been said so far, it would be rather difficult
anyway. pg_dump relies on MVCC and has to run in one transaction to
see a consistent snapshot, while vacuum jiggles around with transactions
in some rather non-standard way.


Jan

-- 
#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#================================================== JanWieck@Yahoo.com #


Re: Vacuum questions...

From
"Jim C. Nasby"
Date:
On Tue, Sep 27, 2005 at 02:47:46PM -0400, Jan Wieck wrote:
> On 9/24/2005 8:17 PM, Jim C. Nasby wrote:
> 
> >Would it be difficult to vacuum as part of a dump? The reasoning behind
> >this is that you have to read the table to do the dump anyway, 
> 
> I think aside from what's been said so far, it would be rather difficult 
> anyway. pg_dump relies on MVCC and requires to run in one transaction to 
> see a consistent snapshot while vacuum jiggles around with transactions 
> in some rather non-standard way.

Is this true even if they were in different connections?

My (vague) understanding of the vacuum process is that it first vacuums
indexes, and then vacuums the heap. Since we don't dump indexes, there's
nothing for backup to do while those are vacuumed, so my idea is:

pg_dump:
foreach (table)
    spawn vacuum
    wait for vacuum to hit heap
    start copy
    wait for analyze to finish
next;

dump_vacuum (table):
foreach (index on table)
    vacuum index
next;

notify pg_dump we're going to start vacuum of heap
vacuum heap

if we should analyze {
    analyze table
}

notify pg_dump analyze is done
exit

AFAIK, this should allow both to run in separate transactions. Granted,
it would slow down the dump, since it would have to wait while indexes
were being vacuumed, but it would win when it came to the heap.
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Vacuum questions...

From
Tom Lane
Date:
"Jim C. Nasby" <jnasby@pervasive.com> writes:
> AFAIK, this should allow both to run in seperate transactions.

... and pretty much destroy any synchronization between the two scans,
which was sort of the point wasn't it?
        regards, tom lane


Re: Vacuum questions...

From
"Jim C. Nasby"
Date:
On Tue, Sep 27, 2005 at 07:12:21PM -0400, Tom Lane wrote:
> "Jim C. Nasby" <jnasby@pervasive.com> writes:
> > AFAIK, this should allow both to run in seperate transactions.
> 
> ... and pretty much destroy any synchronization between the two scans,
> which was sort of the point wasn't it?

Aren't there ways to sync them outside of a transaction? My theory is
that you don't need to synchronize them at the tuple level, since
whichever one gets ahead reading the HEAP will be pulling the data off
the drive, while the one that's behind will just grab it out of the
buffer (or at worst, the kernel's cache). So all you should need to do
is start both scans at about (as in within a few seconds) the same time.

Heck, if vacuum were made to put more verbose info in its process status,
then it could be as simple as having pg_dump start a vacuum of a table
in a separate connection and just watching for the status to indicate it
had started vacuuming the table.

I *think* this shouldn't be too hard to test, which is good since it's
all theory right now. :)
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461


Re: Vacuum questions...

From
Hannu Krosing
Date:
On T, 2005-09-27 at 17:57 -0500, Jim C. Nasby wrote:
> On Tue, Sep 27, 2005 at 02:47:46PM -0400, Jan Wieck wrote:
> > On 9/24/2005 8:17 PM, Jim C. Nasby wrote:
> > 
> > >Would it be difficult to vacuum as part of a dump? The reasoning behind
> > >this is that you have to read the table to do the dump anyway, 
> > 
> > I think aside from what's been said so far, it would be rather difficult 
> > anyway. pg_dump relies on MVCC and requires to run in one transaction to 
> > see a consistent snapshot while vacuum jiggles around with transactions 
> > in some rather non-standard way.
> 
> Is this true even if they were in different connections?
> 
> My (vague) understanding of the vacuum process is that it first vacuums
> indexes, and then vacuums the heap. 

Actually, (lazy) vacuum does this:

1) scan heap, collect ctids of rows to remove
2) clean indexes
3) clean heap

> Since we don't dump indexes, there's
> nothing for backup to do while those are vacuumed, so my idea is:
> 
> pg_dump:
> foreach (table)
>     spawn vacuum
>     wait for vacuum to hit heap
>     start copy
>     wait for analyze to finish
> next;

Probably the first heap scan of vacuum would go faster than the dump, as it
does not have to write out anything, and the second scan (nr 3 in the list
above) would be either faster or slower, as it has to lock each page and
rearrange the tuples there.

So it would be very hard to synchronize vacuum with either of them.

-- 
Hannu Krosing <hannu@skype.net>



Re: Vacuum questions...

From
"Jim C. Nasby"
Date:
On Thu, Sep 29, 2005 at 12:50:13AM +0300, Hannu Krosing wrote:
> On T, 2005-09-27 at 17:57 -0500, Jim C. Nasby wrote:
> > On Tue, Sep 27, 2005 at 02:47:46PM -0400, Jan Wieck wrote:
> > > On 9/24/2005 8:17 PM, Jim C. Nasby wrote:
> > > 
> > > >Would it be difficult to vacuum as part of a dump? The reasoning behind
> > > >this is that you have to read the table to do the dump anyway, 
> > > 
> > > I think aside from what's been said so far, it would be rather difficult 
> > > anyway. pg_dump relies on MVCC and requires to run in one transaction to 
> > > see a consistent snapshot while vacuum jiggles around with transactions 
> > > in some rather non-standard way.
> > 
> > Is this true even if they were in different connections?
> > 
> > My (vague) understanding of the vacuum process is that it first vacuums
> > indexes, and then vacuums the heap. 
> 
> actually (lazy) vacuum does this
> 
> 1) scan heap, collect ctids of rows to remove
> 2) clean indexes
> 3) clean heap
> 
> > Since we don't dump indexes, there's
> > nothing for backup to do while those are vacuumed, so my idea is:
> > 
> > pg_dump:
> > foreach (table)
> >     spawn vacuum
> >     wait for vacuum to hit heap
> >     start copy
> >     wait for analyze to finish
> > next;
> 
> probably the first heap scan of vacuum would go faster than dump as it
> does not have to write out anything, and the second scan ( nr 3 in above
> list ) would be either faster or slower, as it has to lock each page and
> rearrange tuples there.
> 
> so it would be very hard to synchronize vacuum with either of them.

Well, I guess it depends on what the dump was writing to. It also depends
on available cache, I expect.

Is this something that could be hacked together fairly easily, just for
testing purposes? Would firing off a VACUUM tablename at the same time
as a COPY tablename be a good enough approximation?
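
Something along these lines might be a rough first approximation, run as two
psql sessions started at roughly the same moment (the table name and output
target are hypothetical, and this only measures wall-clock overlap, not a
real piggy-backed scan):

    -- Session 1:
    \timing
    VACUUM VERBOSE big_table;

    -- Session 2 (COPY to a server-side file needs superuser):
    \timing
    COPY big_table TO '/dev/null';

    -- Compare the combined wall-clock time against running the same two
    -- commands back to back on a cold cache.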
-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461