Discussion: Impact of vacuum full...
Hello, I was wondering if someone could enlighten me as to the impact on the entire database of running VACUUM FULL against a single table. The reason I ask is that at the company we work for we have a very large number of queue-type tables that fill up and empty out on a regular basis; on average, every two days per table would be a good estimate. These tables, when filled, can contain anywhere from fewer than 50 to upwards of 500,000 entries.

We recently moved our db over to a new box as the old one wasn't cutting it (we haven't been able to run a successful VACUUM FULL on the whole db in over a year, and after the transfer our db was smaller by 50GB!!!). Thus we are looking into a more aggressive vacuuming strategy for these tables in order to keep our database lean and clean, both for quick dumps and for general performance.

Also, what kind of times am I looking at to do VACUUM FULLs on single tables of different sizes?

Thank you for any advice/input you may have...

--
erik jones <erik@myemma.com>
software development
emma(r)
Erik,

On Fri, 2006-07-21 at 17:13, Erik Jones wrote:
> Hello, I was wondering if someone could enlighten me as to the impact to
> the entire database of running VACUUM FULL against a single table. The
> reason I ask is that at company we work for we have a very large number
> of queue type tables that fill up and empty out on a regular basis, on
> the average every two days per table would be a good estimate.

We also have some queue-like tables here, though they tend to be small and our fill-and-empty cycle is minutes, not days. I solved this kind of problem using the CLUSTER command, which completely rebuilds the table and, as an added benefit, leaves it physically ordered by the index you have chosen to cluster on.

I think clustering is faster than VACUUM FULL, and it rebuilds your indexes too (which VACUUM FULL won't do). One interesting detail is that CLUSTER does not respect MVCC, i.e. it will remove all dead tuples regardless of whether there are older transactions still running which could see them. This might be a problem for you, but for my queue-like tables it was a big help in staying clean, as CLUSTER is able to shrink them even in the presence of long-running transactions which would normally prevent cleaning dead tuples back to the oldest running transaction.

For small tables CLUSTER is a nice feature... but beware that it locks the table exclusively, so if you have a big table you might have a long downtime during the clustering operation while the table is not accessible. That said, I use it for fairly big tables too, occasionally, when I need to clean up stuff...

Cheers,
Csaba.
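As a sketch of Csaba's suggestion (the table and index names here are hypothetical, and the syntax shown is the 8.1-era form, `CLUSTER indexname ON tablename`):

```sql
-- Rebuild the queue table, reclaiming all dead space and rebuilding
-- its indexes. The table is locked exclusively while this runs.
CLUSTER email_queue_pkey ON email_queue;

-- Once a table has been clustered, later runs can omit the index name:
CLUSTER email_queue;
```

Because of the exclusive lock, this is best scheduled for the windows when the queue is empty or idle anyway.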
Csaba Nagy wrote:
> Erik,
>
> On Fri, 2006-07-21 at 17:13, Erik Jones wrote:
>
>> Hello, I was wondering if someone could enlighten me as to the impact to
>> the entire database of running VACUUM FULL against a single table. The
>> reason I ask is that at company we work for we have a very large number
>> of queue type tables that fill up and empty out on a regular basis, on
>> the average every two days per table would be a good estimate.
>>
>
> We also do have here some queue-like tables, though they tend to be
> small and our clean-empty rate is minutes not days. I solved this kind
> of problem using the CLUSTER command, which completely rebuilds the
> table, and as an added benefit it will be physically ordered using the
> index you have chosen to cluster on.
>
> I think clustering is faster than vacuum full, and cleans your indexes
> too (which vacuum full won't do). One interesting detail is that CLUSTER
> is not respecting MVCC, i.e. it will clean all dead tuples regardless if
> there are older transactions running which could see them. This might be
> a problem for you, but for my queue-like tables was a big help to stay
> clean, as CLUSTER is able to shrink them even in the presence
> long-running transactions which normally would prevent cleaning dead
> tuples back to the oldest running transaction.
>
> For small tables CLUSTER is a nice feature... but beware that it locks
> the table exclusively, so if you have a big table you might have a long
> down-time during the clustering operation where the table is not
> accessible. That said, I use it for fairly big tables too occasionally
> when I need to clean up stuff...
>

That is an excellent idea; however, what are the effects of CLUSTER on empty tables?
Considering that most of our queue tables sit empty until they're used, our main concern is keeping the disk space they use available, and our 'cleaning' activities will be done whenever the tables empty out (unless they are scheduled for use within, say, an hour). I have to wonder whether CLUSTER would do anything to an empty table with no actual data to cluster.

--
erik jones <erik@myemma.com>
software development
emma(r)
On Fri, 2006-07-21 at 10:13, Erik Jones wrote:
> Hello, I was wondering if someone could enlighten me as to the impact to
> the entire database of running VACUUM FULL against a single table. The
> reason I ask is that at company we work for we have a very large number
> of queue type tables that fill up and empty out on a regular basis

HOLD ON! Do you empty them by doing something like

delete from table

with no where clause?

If so, then try truncating the table. That will clean it completely and reclaim all the dead space, plus it's faster than delete anyway.

If that doesn't help, look at scheduling more aggressive plain vacuums (not just autovacuum, but cron-job vacuums on specific tables that you know have a high turnover).

VACUUM FULL is basically admitting your regular vacuum schedule isn't / can't be aggressive enough.
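The distinction Scott is drawing can be sketched like this (the table name is hypothetical):

```sql
-- Row-by-row: every deleted row becomes a dead tuple that only a
-- later VACUUM can reclaim.
DELETE FROM email_queue;

-- Whole-table: discards the contents outright and returns the disk
-- space immediately; takes an exclusive lock and cannot take a
-- WHERE clause.
TRUNCATE TABLE email_queue;
```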
Scott Marlowe wrote:
> On Fri, 2006-07-21 at 10:13, Erik Jones wrote:
>
>> Hello, I was wondering if someone could enlighten me as to the impact to
>> the entire database of running VACUUM FULL against a single table. The
>> reason I ask is that at company we work for we have a very large number
>> of queue type tables that fill up and empty out on a regular basis
>>
>
> HOLD ON! Do you empty them by doing something like
>
> delete from table
>
> with no where clause?
>
> If so, then try truncating the table. That will clean it completely and
> reclaim all the dead space, plus it's faster than delete anyway.
>
> If that doesn't help, look at scheduling more aggressive plain vacuums
> (no just autovacuum, but cron job vacuum on specific tables that you
> know have a high turnover).
>
> Vacuum full is basically admitting your regular vacuum schedule isn't /
> can't be aggressive enough.
>

No!!! The table is filled and entries are deleted one at a time, or in groups, but definitely not all at once. So, then, what is the difference between scheduling regular vacuums on specific tables vs. scheduling VACUUM FULL on specific tables? Basically, what I want to do is ensure that when I clean out a table, a row or rows at a time, the space is immediately freed up.

--
erik jones <erik@myemma.com>
software development
emma(r)
On Fri, 21 Jul 2006 10:35:17 -0500
Erik Jones <erik@myemma.com> wrote:

> Csaba Nagy wrote:
[snip]
> That is an excellent idea, however, what are the effects of CLUSTER on
> empty tables? Considering that most of our queue tables sit empty until
> their used, our main concern is keep the disk space that they use
> available and our 'cleaning' activities will be done whenever the tables
> empty out (unless they are scheduled for use within, say, an hour) I
> have to wonder at whether or not CLUSTER would do anything to an empty
> table with no actual data to cluster.

Based on that description, have you considered using TRUNCATE to clear out the tables when you're done using them? TRUNCATE is faster than DELETE, and I believe it's the equivalent of dropping and recreating the table, which means it will free up the space.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com
On Fri, 21 Jul 2006 11:40:32 -0500
Erik Jones <erik@myemma.com> wrote:

> No!!! The table is filled and entries are deleted one at a time, or in
> groups, but definitely not all at once. So, then what is the
> difference between scheduling regular vacuum on specific tables v.
> scheduling vacuum full on specific tables? Basically, what I want to do
> is to ensure that when I clean out a table row or rows at a time, the
> space is immediately freed up.

Why would you want to do that? PostgreSQL is not designed to operate in that fashion, and it's inefficient. You're forcing PostgreSQL to constantly increase and decrease the amount of disk space allocated for the table, which isn't the best way to do things.

Much better is to schedule frequent VACUUMs so that the table size reaches an equilibrium. This way there is always a little free space in the table, so rows can be added or updated without increasing the table size. This is how PostgreSQL is designed to run. If you don't have enough disk space to do this, then you didn't purchase large enough drives to hold your DB efficiently. You can do the VACUUM FULL as you describe, but it's non-optimal.

To answer your original question directly: VACUUM FULL is expensive. It needs to lock the table for the duration of its work, and other transactions will block during the operation. Depending on the table size, it could be prohibitively time-consuming.

--
Bill Moran
Potential Technologies
http://www.potentialtech.com
On Fri, 2006-07-21 at 11:40, Erik Jones wrote:
> Scott Marlowe wrote:
>
>> On Fri, 2006-07-21 at 10:13, Erik Jones wrote:
>>
>>> Hello, I was wondering if someone could enlighten me as to the impact to
>>> the entire database of running VACUUM FULL against a single table. The
>>> reason I ask is that at company we work for we have a very large number
>>> of queue type tables that fill up and empty out on a regular basis
>>>
>>
>> HOLD ON! Do you empty them by doing something like
>>
>> delete from table
>>
>> with no where clause?
>>
>> If so, then try truncating the table. That will clean it completely and
>> reclaim all the dead space, plus it's faster than delete anyway.
>>
>> If that doesn't help, look at scheduling more aggressive plain vacuums
>> (no just autovacuum, but cron job vacuum on specific tables that you
>> know have a high turnover).
>>
>> Vacuum full is basically admitting your regular vacuum schedule isn't /
>> can't be aggressive enough.
>>
>
> No!!! The table is filled and entries are deleted one at a time, or in
> groups, but definitely not all at once. So, then what is the
> difference between scheduling regular vacuum on specific tables v.
> scheduling vacuum full on specific tables? Basically, what I want to do
> is to ensure that when I clean out a table row or rows at a time, the
> space is immediately freed up.

Oh, ok. Misunderstood based on your description there.

With regular vacuum, PostgreSQL marks the freed tuples as available, and the next time someone writes to the table it uses the freed-up space. Eventually, the table should stop growing and reach a kind of "stable state" where it averages some percentage free (20 to 50% is generally considered optimal).

If the space used by your table continues to grow, this points to a possible problem with not having a large enough free space map.

Since regular vacuums are MUCH cheaper in terms of locking and such, it might be practical to schedule a plain vacuum at the end of any large deletes that you currently run.

I'd use regular cronned vacuums on the tables that you know grow a lot (or just hit the whole db and not worry about it) and run occasional VACUUM VERBOSE / VACUUM FULL VERBOSE by hand to see if you have problems with your free space map being too small.
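A cron-based schedule along these lines might look like the following crontab fragment (the database and table names are hypothetical; vacuumdb is the command-line wrapper around VACUUM):

```
# Plain (non-FULL) vacuum of a known high-turnover table, every 15 minutes:
*/15 * * * *  vacuumdb --quiet --table=email_queue emma_db

# Nightly vacuum plus analyze of the whole database:
0 3 * * *     vacuumdb --quiet --analyze emma_db
```

Plain vacuums like these only take locks that coexist with normal reads and writes, which is why they can run this often without blocking the application.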
Scott Marlowe wrote:
> On Fri, 2006-07-21 at 11:40, Erik Jones wrote:
>
>> Scott Marlowe wrote:
>>
>>> On Fri, 2006-07-21 at 10:13, Erik Jones wrote:
>>>
>>>> Hello, I was wondering if someone could enlighten me as to the impact to
>>>> the entire database of running VACUUM FULL against a single table. The
>>>> reason I ask is that at company we work for we have a very large number
>>>> of queue type tables that fill up and empty out on a regular basis
>>>>
>>>
>>> HOLD ON! Do you empty them by doing something like
>>>
>>> delete from table
>>>
>>> with no where clause?
>>>
>>> If so, then try truncating the table. That will clean it completely and
>>> reclaim all the dead space, plus it's faster than delete anyway.
>>>
>>> If that doesn't help, look at scheduling more aggressive plain vacuums
>>> (no just autovacuum, but cron job vacuum on specific tables that you
>>> know have a high turnover).
>>>
>>> Vacuum full is basically admitting your regular vacuum schedule isn't /
>>> can't be aggressive enough.
>>>
>>
>> No!!! The table is filled and entries are deleted one at a time, or in
>> groups, but definitely not all at once. So, then what is the
>> difference between scheduling regular vacuum on specific tables v.
>> scheduling vacuum full on specific tables? Basically, what I want to do
>> is to ensure that when I clean out a table row or rows at a time, the
>> space is immediately freed up.
>>
>
> Oh, ok. Misunderstood based on your description there.
>
> With regular vacuum, postgresql marks the freed tuples as available, and
> the next time someone writes to the table it uses the freed up space.
> Eventually, the table should stop growing and reach a kind of "stable
> state" where it averages some percentage free (20 to 50% is generally
> considered optimal).
>
> If the space used by your table continues to grow, this points to a
> possible problem with not having a large enough free space map.
>
> Since regular vacuums are MUCH cheaper in terms of locking and such, it
> might be practical to schedule a plain vacuum at the end of any large
> deletes that you currently run.
>
> I'd use regular cronned vacuums on the tables that you know grown a lot
> (or just hit the whole db and not worry about it) and run occasional
> vacuum verbose / vacuum full verbose by hand to see if you have problems
> with your Free Space Map being too small.
>

Awesome! Thanks, guys, for all of your input/advice. That's pretty much how I thought stuff worked after reading the docs, but I was confused/misled by other inputs. With regard to the free space map and max_fsm_relations: is using the value of "SELECT COUNT(*) FROM pg_class;" plus some room for growth a good way to set that?

--
erik jones <erik@myemma.com>
software development
emma(r)
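One point worth noting on Erik's question: pg_class counts every relation in the database, including views, sequences, and system catalogs, so COUNT(*) on it is a generous upper bound for max_fsm_relations. A rough comparison (the relkind filter below is an assumption about which relation types you care to track):

```sql
-- Everything pg_class knows about:
SELECT count(*) FROM pg_class;

-- Restricted to plain tables, indexes, and toast tables:
SELECT count(*) FROM pg_class WHERE relkind IN ('r', 'i', 't');
```

Headroom on top of that number is cheap, since each free-space-map relation slot costs only on the order of tens of bytes of shared memory.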
Scott Marlowe wrote:
> On Fri, 2006-07-21 at 15:49, Erik Jones wrote:
>
>> Scott Marlowe wrote:
>>
>>> I'd use regular cronned vacuums on the tables that you know grown a lot
>>> (or just hit the whole db and not worry about it) and run occasional
>>> vacuum verbose / vacuum full verbose by hand to see if you have problems
>>> with your Free Space Map being too small.
>>>
>>
>> Awesome! Thanks, guys, for all of your input/advice. That's pretty
>> much how I thought stuff worked after reading the docs but was
>> confused/misled by other inputs. With regards to the Free Space Map and
>> max_fsm_relations: is using the value of "SELECT COUNT(*) FROM
>> pg_class;" plus some room for growth a good way to set that?
>>
>
> I always use vacuum verbose to see that. At the end, it'll have a part
> that looks like this:
>
> INFO: free space map: 35 relations, 18903 pages stored; 17504 total
> pages needed
> DETAIL: Allocated FSM size: 5000 relations + 100000 pages = 894 kB
> shared memory.
> VACUUM
>
> So, on this machine, we can handle 5000 relations and 100,000 total
> pages, and we're only using the space of 35 relations and ~20,000 pages.
>
> If the pages needed exceeds the allocated size, you've got problems.
>
> You've got to run the database for a while to see what the state will be
> like over time.

Well, just to give you guys an idea of the size of the db we're working with (and, hopefully, to make you postgres developers proud), a 'SELECT COUNT(*) FROM pg_class;' returns a number over 300k for the number of relations in the db. I really can't go much further into that, or the reason we have so many, due to my NDA and such, but I'm sure you can see why this has been such a big issue for me.

Thanks again for all of your help, and I'll be back soon with some questions regarding ANALYZE that I've got cooking up in the back of my head...

--
erik jones <erik@myemma.com>
software development
emma(r)
On Fri, 2006-07-21 at 15:49, Erik Jones wrote:
> Scott Marlowe wrote:
>
> > I'd use regular cronned vacuums on the tables that you know grown a lot
> > (or just hit the whole db and not worry about it) and run occasional
> > vacuum verbose / vacuum full verbose by hand to see if you have problems
> > with your Free Space Map being too small.
>
> Awesome! Thanks, guys, for all of your input/advice. That's pretty
> much how I thought stuff worked after reading the docs but was
> confused/misled by other inputs. With regards to the Free Space Map and
> max_fsm_relations: is using the value of "SELECT COUNT(*) FROM
> pg_class;" plus some room for growth a good way to set that?

I always use vacuum verbose to see that. At the end, it'll have a part that looks like this:

INFO: free space map: 35 relations, 18903 pages stored; 17504 total pages needed
DETAIL: Allocated FSM size: 5000 relations + 100000 pages = 894 kB shared memory.
VACUUM

So, on this machine, we can handle 5000 relations and 100,000 total pages, and we're only using the space of 35 relations and ~20,000 pages.

If the pages needed exceeds the allocated size, you've got problems.

You've got to run the database for a while to see what the state will be like over time.
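The allocated figures in that DETAIL line come straight from two postgresql.conf settings; if the "total pages needed" or relation count reported by VACUUM VERBOSE ever exceeds the allocation, these can be raised (the values below simply mirror the allocation shown in Scott's output, not recommendations):

```
# postgresql.conf -- changing either setting requires a server restart
max_fsm_relations = 5000     # keep above the relation count VACUUM VERBOSE reports
max_fsm_pages = 100000       # keep above the "total pages needed" figure
```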