Discussion: Re: [QUESTIONS] Business cases
On Sat, 17 Jan 1998, Tom wrote:

> How are large users handling the vacuum problem? vacuum locks other
> users out of tables too long. I don't need a lot of performance (a few
> per minute), but I need to be able to handle queries non-stop.

Not sure, but this one is about the only major thing that is continuing
to bother me :(  Is there any method of improving this?

> Also, how are people handling tables with lots of rows? The 8k tuple
> size can waste a lot of space. I need to be able to handle a 2 million
> row table, which will eat up 16GB, plus more for indexes.

This one is improved upon in v6.3, where at compile time you can
stipulate the tuple size. We are looking into making this an 'initdb'
option instead, so that you can have the same binary for multiple
"servers", but any database created under a particular server will be
constrained by that tuple size.

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org  secondary: scrappy@{freebsd|postgresql}.org
On Sat, 17 Jan 1998, The Hermit Hacker wrote:

> On Sat, 17 Jan 1998, Tom wrote:
>
> > How are large users handling the vacuum problem? vacuum locks other
> > users out of tables too long. I don't need a lot of performance (a
> > few per minute), but I need to be able to handle queries non-stop.
>
> Not sure, but this one is about the only major thing that is
> continuing to bother me :(  Is there any method of improving this?

vacuum seems to do a _lot_ of stuff. It seems that crash recovery
features and maintenance features should be separated. I believe the
only required maintenance features are recovering space used by deleted
tuples and updating stats? Neither of these should need to lock the
database for long periods of time.

> > Also, how are people handling tables with lots of rows? The 8k tuple
> > size can waste a lot of space. I need to be able to handle a 2
> > million row table, which will eat up 16GB, plus more for indexes.
>
> This one is improved upon in v6.3, where at compile time you can
> stipulate the tuple size. We are looking into making this an 'initdb'
> option instead, so that you can have the same binary for multiple
> "servers", but any database created under a particular server will be
> constrained by that tuple size.

That might help a bit, but some tables may have big rows and some not.
For example, my 2 million row table requires only two date fields and 7
integer fields. That isn't very much data. However, I'd like to be able
to join against another table with much larger rows.

> Marc G. Fournier
> Systems Administrator @ hub.org
> primary: scrappy@hub.org  secondary: scrappy@{freebsd|postgresql}.org

Tom
Tom wrote:

> > > How are large users handling the vacuum problem? vacuum locks
> > > other users out of tables too long. I don't need a lot of
> > > performance (a few per minute), but I need to be able to handle
> > > queries non-stop.
> >
> > Not sure, but this one is about the only major thing that is
> > continuing to bother me :(  Is there any method of improving this?
>
> vacuum seems to do a _lot_ of stuff. It seems that crash recovery
> features and maintenance features should be separated. I believe the
> only required maintenance features are recovering space used by
> deleted tuples and updating stats? Neither of these should need to
> lock the database for long periods of time.

Would it be possible to add an option to VACUUM, like a max number of
blocks to sweep? Or is this impossible because of the way PG works?

Would it be possible to (for example) compact data from the front of
the file to make one block free somewhere near the beginning of the
file, and then move rows from the last block to this new, empty block?

-- To limit the number of rows to compact:
psql=> VACUUM MoveMax 1000;  -- move max 1000 rows

-- To limit the time used for vacuuming:
psql=> VACUUM MaxSweep 1000; -- sweep max 1000 blocks

Could this work with the current method of updating statistics?

*** Btw, why doesn't PG update statistics when inserting/updating?

/* m */
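[Editorial note: the incremental-compaction idea proposed above can be sketched in miniature. This is a hypothetical illustration in Python, not PostgreSQL source; the page model (a list of slots, `None` marking a slot freed by a deleted tuple) and the function name are invented for the example.]

```python
TUPLES_PER_PAGE = 4  # toy value; a real 8k page holds far more tuples

def incremental_vacuum(pages, max_sweep):
    """Compact at most max_sweep pages from the front of the 'file'.

    pages is a list of pages; each page is a list of row values, with
    None marking a slot freed by a deleted tuple. Holes near the front
    are filled by moving live rows in from the tail, so fully emptied
    pages at the end of the file can be truncated.
    """
    swept = 0
    for page in pages:
        if swept >= max_sweep:
            break  # honor the sweep limit: leave the rest for next time
        swept += 1
        for slot in range(len(page)):
            if page[slot] is not None:
                continue
            # fill this hole with the last live row found in the file
            for tail in reversed(pages):
                if tail is page:
                    break  # no live rows beyond the current page
                for i in range(len(tail) - 1, -1, -1):
                    if tail[i] is not None:
                        page[slot], tail[i] = tail[i], None
                        break
                else:
                    continue  # this tail page had no live row; go earlier
                break
    # drop now-empty pages at the end of the file
    while pages and all(r is None for r in pages[-1]):
        pages.pop()
    return pages

pages = [[1, None, 2, None], [3, 4, None, None], [5, None, None, None]]
print(incremental_vacuum(pages, max_sweep=1))
# sweeping only the first page still frees the last page for truncation
```

Because each call touches a bounded number of pages, a lock would only need to be held for a short, predictable time per call, which is the point of the proposed MaxSweep-style option.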
> > Also, how are people handling tables with lots of rows? The 8k tuple
> > size can waste a lot of space. I need to be able to handle a 2
> > million row table, which will eat up 16GB, plus more for indexes.
>
> This one is improved upon in v6.3, where at compile time you can
> stipulate the tuple size. We are looking into making this an 'initdb'
> option instead, so that you can have the same binary for multiple
> "servers", but any database created under a particular server will be
> constrained by that tuple size.

Tom's "problem" is probably not as bad as he thinks. The 8k tuple size
limit is a result of the current 8k page size limit, but multiple
tuples are allowed on a page. They are just not allowed to span pages.
So, there is some wasted space (as is true for many, most, or all
commercial dbs also), but it is on average only the size of half a
tuple, and can be easily predicted once the tuple sizes are known.

- Tom (T2?)
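[Editorial note: a back-of-the-envelope check of the two sizing claims above. The 64-byte tuple size is a guess for "2 dates + 7 integers plus per-tuple header overhead", not the real on-disk layout.]

```python
PAGE_SIZE = 8192  # the 8k page size discussed in the thread

def table_size(rows, tuple_size, page_size=PAGE_SIZE):
    """Bytes needed when tuples pack onto pages but may not span them."""
    tuples_per_page = page_size // tuple_size
    pages = -(-rows // tuples_per_page)  # ceiling division
    return pages * page_size

# The feared reading: one row per 8k page.
print(2_000_000 * PAGE_SIZE / 1024**3)   # about 15.3 GB -- the "16GB"

# The actual behavior Tom (T2) describes, assuming ~64-byte tuples.
print(table_size(2_000_000, 64) / 1024**2)  # about 122 MB
```

The two-orders-of-magnitude gap between the figures is exactly the point of the reply: tuples packing onto shared pages, not one tuple per page, is what determines the table size.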
> Could this work with the current method of updating statistics?
>
> *** Btw, why doesn't PG update statistics when inserting/updating?

Too slow to do at that time. You need to scan all the data to get an
accurate figure.

--
Bruce Momjian
maillist@candle.pha.pa.us
Bruce Momjian wrote:

> > *** Btw, why doesn't PG update statistics when inserting/updating?
>
> Too slow to do at that time. You need to scan all the data to get an
> accurate figure.

Is it not possible to take the current statistics and the latest
changes and calculate new statistics from that?

What information is kept in these statistics? How are they used?
Obviously this is more than just the number of rows, but what exactly?

/* m */
> Bruce Momjian wrote:
>
> > > *** Btw, why doesn't PG update statistics when inserting/updating?
> >
> > Too slow to do at that time. You need to scan all the data to get
> > an accurate figure.
>
> Is it not possible to take the current statistics and the latest
> changes and calculate new statistics from that?
>
> What information is kept in these statistics? How are they used?
> Obviously this is more than just the number of rows, but what exactly?

Look in commands/vacuum.c. It measures the spread-ness of the data, and
there is no way to get this figure on-the-fly unless you maintain
buckets for each range of data values and increment/decrement counts as
values are added or subtracted. Seeing as the MySQL optimizer is a
single file, and so is the executor, I doubt that is what it is doing.
Probably it just keeps a count of how many rows are in the table.

--
Bruce Momjian
maillist@candle.pha.pa.us
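[Editorial note: the bucket scheme Bruce mentions in passing can be sketched as follows. This is a hypothetical illustration; the class name, bucket boundaries, and interface are invented for the example and are not what commands/vacuum.c computes.]

```python
import bisect

class RangeStats:
    """Maintain per-range counts incrementally, so a rough picture of
    the data's spread is available without scanning the whole table."""

    def __init__(self, boundaries):
        # boundaries [0, 100, 1000] give ranges:
        # (-inf, 0], (0, 100], (100, 1000], (1000, +inf)
        self.boundaries = sorted(boundaries)
        self.counts = [0] * (len(self.boundaries) + 1)

    def _bucket(self, value):
        return bisect.bisect_right(self.boundaries, value)

    def insert(self, value):
        self.counts[self._bucket(value)] += 1   # on INSERT

    def delete(self, value):
        self.counts[self._bucket(value)] -= 1   # on DELETE

    def total(self):
        return sum(self.counts)

stats = RangeStats([0, 100, 1000])
for v in (5, 50, 500, 5000):
    stats.insert(v)
stats.delete(50)
print(stats.counts)  # one live row left in each of the upper ranges
```

The trade-off Bruce's reply points at is visible even in this toy: every insert and delete pays a small extra cost to keep the counts current, which is exactly the overhead avoided by computing statistics only at vacuum time.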