Discussion: Re: [QUESTIONS] Business cases


Re: [QUESTIONS] Business cases

From:
The Hermit Hacker
Date:
On Sat, 17 Jan 1998, Tom wrote:

>   How are large users handling the vacuum problem?  vacuum locks other
> users out of tables for too long.  I don't need a lot of performance (a few
> queries per minute), but I need to be able to handle queries non-stop.

    Not sure, but this one is about the only major thing that is continuing
to bother me :(  Is there any method of improving this?

>   Also, how are people handling tables with lots of rows?  The 8k tuple
> size can waste a lot of space.  I need to be able to handle a 2 million
> row table, which will eat up 16GB, plus more for indexes.

    This one is improved upon in v6.3, where at compile time you can stipulate
the tuple size.  We are looking into making this an 'initdb' option instead,
so that you can have the same binary for multiple "servers", but any database
created under a particular server will be constrained by that tuple size.
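
    (The knob in question is the block-size constant in the source tree; if
I remember right it is the BLCKSZ define in the configuration header, though
exactly which header it lives in has moved around between versions.  Roughly:

/* block size -- the 8k default; a tuple cannot be larger than one block */
#define BLCKSZ 8192

Raise it at build time and the tuple size limit rises with it.)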

Marc G. Fournier
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org


Re: [QUESTIONS] Business cases

From:
Tom
Date:
On Sat, 17 Jan 1998, The Hermit Hacker wrote:

> On Sat, 17 Jan 1998, Tom wrote:
>
> >   How are large users handling the vacuum problem?  vacuum locks other
> > users out of tables for too long.  I don't need a lot of performance (a few
> > queries per minute), but I need to be able to handle queries non-stop.
>
>     Not sure, but this one is about the only major thing that is continuing
> to bother me :(  Is there any method of improving this?

  vacuum seems to do a _lot_ of stuff.  It seems that crash recovery
features and maintenance features should be separated.  I believe the
only required maintenance features are recovering space used by deleted
tuples and updating stats?  Neither of these should need to lock the
database for long periods of time.

> >   Also, how are people handling tables with lots of rows?  The 8k tuple
> > size can waste a lot of space.  I need to be able to handle a 2 million
> > row table, which will eat up 16GB, plus more for indexes.
>
>     This one is improved upon in v6.3, where at compile time you can stipulate
> the tuple size.  We are looking into making this an 'initdb' option instead,
> so that you can have the same binary for multiple "servers", but any database
> created under a particular server will be constrained by that tuple size.

  That might help a bit, but some tables may have big rows and some not.
For example, my 2 million row table only requires two date fields and
7 integer fields.  That isn't very much data.  However, I'd
like to be able to join against another table with much larger rows.

> Marc G. Fournier
> Systems Administrator @ hub.org
> primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org

Tom


Re: Re: [HACKERS] Re: [QUESTIONS] Business cases

From:
Mattias Kregert
Date:
Tom wrote:
> > >   How are large users handling the vacuum problem?  vacuum locks other
> > > users out of tables for too long.  I don't need a lot of performance (a few
> > > queries per minute), but I need to be able to handle queries non-stop.
> >
> >       Not sure, but this one is about the only major thing that is continuing
> > to bother me :(  Is there any method of improving this?
>
>   vacuum seems to do a _lot_ of stuff.  It seems that crash recovery
> features and maintenance features should be separated.  I believe the
> only required maintenance features are recovering space used by deleted
> tuples and updating stats?  Neither of these should need to lock the
> database for long periods of time.

Would it be possible to add an option to VACUUM, like a max number
of blocks to sweep? Or is this impossible because of the way PG works?

Would it be possible to (for example) compact data from the front of
the file to make one block free somewhere near the beginning of the
file and then move rows from the last block to this new, empty block?

-- To limit the number of rows to compact:
psql=> VACUUM MoveMax 1000; -- move max 1000 rows

-- To limit the time used for vacuuming:
psql=> VACUUM MaxSweep 1000; -- Sweep max 1000 blocks
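
To make the second idea concrete, here is a small standalone sketch (plain
C, nothing to do with the actual backend code, and every name in it is made
up): a toy "table" of fixed-size tuples in fixed-size blocks, where deleted
tuples are only flagged dead, and compaction moves the live tuples out of
the last block into free slots nearer the front so the tail block could be
dropped.

/*
 * Toy illustration of the compaction idea above.  All names and the
 * layout are invented for this example only.
 */
#include <stdio.h>
#include <string.h>

#define TUPLES_PER_BLOCK 4
#define NBLOCKS          3

typedef struct {
    int  live;              /* 0 = slot is free (tuple was deleted)  */
    char data[32];          /* fixed-size payload, just for the demo */
} Tuple;

static Tuple table[NBLOCKS][TUPLES_PER_BLOCK];

/* Find a free slot in any block before "limit"; return 1 and set *b, *s. */
static int find_free_slot(int limit, int *b, int *s)
{
    for (*b = 0; *b < limit; (*b)++)
        for (*s = 0; *s < TUPLES_PER_BLOCK; (*s)++)
            if (!table[*b][*s].live)
                return 1;
    return 0;
}

/* Move every live tuple out of the last block into earlier free slots. */
static int compact_tail_block(void)
{
    int last = NBLOCKS - 1;
    int b, s;

    for (int i = 0; i < TUPLES_PER_BLOCK; i++) {
        if (!table[last][i].live)
            continue;
        if (!find_free_slot(last, &b, &s))
            return 0;                       /* no room earlier on       */
        table[b][s] = table[last][i];       /* move the tuple forward   */
        table[last][i].live = 0;            /* and free its old slot    */
    }
    return 1;                               /* tail block is now empty  */
}

int main(void)
{
    /* a few live tuples scattered around, including two in the tail block */
    table[0][0].live = 1; strcpy(table[0][0].data, "row A");
    table[2][1].live = 1; strcpy(table[2][1].data, "row B");
    table[2][3].live = 1; strcpy(table[2][3].data, "row C");

    if (compact_tail_block())
        printf("tail block emptied; the file could shrink by one block\n");

    for (int b = 0; b < NBLOCKS; b++)
        for (int i = 0; i < TUPLES_PER_BLOCK; i++)
            if (table[b][i].live)
                printf("block %d, slot %d: %s\n", b, i, table[b][i].data);
    return 0;
}

Whether something of this shape fits PG's on-disk format and locking I
don't know; it is just the operation I have in mind.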

Could this work with the current method of updating statistics?


*** Btw, why doesn't PG update statistics when inserting/updating?


/* m */

Re: [HACKERS] Re: [QUESTIONS] Business cases

From:
"Thomas G. Lockhart"
Date:
> >   Also, how are people handling tables with lots of rows?  The 8k tuple
> > size can waste a lot of space.  I need to be able to handle a 2 million
> > row table, which will eat up 16GB, plus more for indexes.
>
>         This one is improved upon in v6.3, where at compile time you can stipulate
> the tuple size.  We are looking into making this an 'initdb' option instead,
> so that you can have the same binary for multiple "servers", but any database
> created under a particular server will be constrained by that tuple size.

Tom's "problem" is probably not a bad as he thinks. The 8k tuple size limit is a
result of the current 8k page size limit, but multiple tuples are allowed on a page.
They are just not allowed to span pages. So, there is some wasted space (as is true
for many, most, or all commercial dbs also) but it is on average only the size of
half a tuple, and can be easily predicted once the tuple sizes are known.
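
To put rough numbers on Tom's case (the per-tuple overhead below is only an
assumed figure for illustration, not something I have measured):

  row data:   2 date + 7 int fields at ~4 bytes each   ~   36 bytes
  row total:  with ~50 bytes of per-tuple overhead     ~   86 bytes
  rows/page:  8192 / 86                                ~   95 tuples per 8k page
  table size: 2,000,000 / 95 ~ 21,000 pages * 8k       ~  170 MB

so the table itself should land in the hundreds of megabytes, not 16GB,
before indexes.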

                                                         - Tom (T2?)


Re: Re: [HACKERS] Re: [QUESTIONS] Business cases

From:
Bruce Momjian
Date:
> Could this work with the current method of updating statistics?
>
>
> *** Btw, why doesn't PG update statistics when inserting/updating?

Too slow to do at that time.  You need to scan all the data to get an
accurate figure.

--
Bruce Momjian
maillist@candle.pha.pa.us

Re: [HACKERS] Re: [QUESTIONS] Business cases

From:
Mattias Kregert
Date:
Bruce Momjian wrote:
>
> > *** Btw, why doesn't PG update statistics when inserting/updating?
>
> Too slow to do at that time.  You need to scan all the data to get an
> accurate figure.

Is it not possible to take the current statistics and the latest
changes and calculate new statistics from that?

What information is kept in these statistics? How are they used?
Obviously this is more than just the number of rows, but what exactly?

/* m */

Re: [HACKERS] Re: [QUESTIONS] Business cases

From:
Bruce Momjian
Date:
>
> Bruce Momjian wrote:
> >
> > > *** Btw, why doesn't PG update statistics when inserting/updating?
> >
> > Too slow to do at that time.  You need to scan all the data to get an
> > accurate figure.
>
> Is it not possible to take the current statistics and the latest
> changes and calculate new statistics from that?
>
> What information is kept in these statistics? How are they used?
> Obviously this is more than just the number of rows, but what exactly?

Look in commands/vacuum.c.  It measures how spread out the data is, and
there is no way to get this figure on-the-fly unless you maintain
buckets for each range of data values and increment/decrement them as
values are added or subtracted.  Seeing as the MySQL optimizer is a single
file, and so is the executor, I doubt that is what it is doing.  It
probably just keeps a count of how many rows are in the table.
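
Just to illustrate the bucket idea (purely a sketch of the technique, not
how commands/vacuum.c actually computes or stores its statistics; the
value range and bucket count are made up):

/*
 * Sketch of incrementally maintained value-distribution buckets:
 * bump a counter on every insert, drop it on every delete, so the
 * rough spread of the data is always available without a full scan.
 */
#include <stdio.h>

#define NBUCKETS 10
#define MINVAL   0
#define MAXVAL   1000

static long bucket[NBUCKETS];

static int bucket_for(int value)
{
    if (value < MINVAL)  return 0;
    if (value >= MAXVAL) return NBUCKETS - 1;
    return (value - MINVAL) * NBUCKETS / (MAXVAL - MINVAL);
}

/* called on INSERT */
static void stats_insert(int value) { bucket[bucket_for(value)]++; }

/* called on DELETE; an UPDATE is a delete plus an insert */
static void stats_delete(int value) { bucket[bucket_for(value)]--; }

int main(void)
{
    stats_insert(10);
    stats_insert(15);
    stats_insert(990);
    stats_delete(15);

    for (int i = 0; i < NBUCKETS; i++)
        printf("bucket %d [%d..%d): %ld rows\n",
               i, MINVAL + i * (MAXVAL - MINVAL) / NBUCKETS,
               MINVAL + (i + 1) * (MAXVAL - MINVAL) / NBUCKETS,
               bucket[i]);
    return 0;
}

The cost is extra work on every insert, update, and delete, which is
exactly the "too slow to do at that time" trade-off mentioned above.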

--
Bruce Momjian
maillist@candle.pha.pa.us