Re: Why is my database so big?

Поиск
Список
Период
Сортировка
От Stephen Frost
Тема Re: Why is my database so big?
Дата
Msg-id 20160222145808.GB13092@tamriel.snowman.net
обсуждение исходный текст
Ответ на Re: Why is my database so big?  ("FarjadFarid\(ChkNet\)" <farjad.farid@checknetworks.com>)
Список pgsql-general
All,

* FarjadFarid(ChkNet) (farjad.farid@checknetworks.com) wrote:
> Tom, thanks for your unbiased detailed response.
>
> Interesting post.

Please don't top-post.  My comments are in-line, below.

> -----Original Message-----
> From: pgsql-general-owner@postgresql.org
> [mailto:pgsql-general-owner@postgresql.org] On Behalf Of Tom Lane
> Sent: 22 February 2016 05:06
> To: Andrew Smith
> Cc: pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Why is my database so big?
>
> Andrew Smith <laconical@gmail.com> writes:
> > I am setting up a proof of concept database to store some historical data.
> > Whilst I've used PostgreSQL a bit in the past this is the first time
> > I've looked into disk usage due to the amount of data that could
> > potentially be stored. I've done a quick test and I'm a little
> > confused as to why it is occupying so much space on disk. Here is my table
> definition:
>
> > CREATE TABLE "TestSize"
> > (
> >   "Id" integer NOT NULL,
> >   "Time" timestamp without time zone NOT NULL,
> >   "Value" real NOT NULL,
> >   "Status" smallint NOT NULL,
> >   PRIMARY KEY ("Id", "Time")
> > );
>
> > CREATE INDEX test_index ON "TestSize" ("Id");

Note that you don't really need an index on "Id" because including a
primary key will automatically include an index on those fields, and an
index over ("Id", "Time") can be used to satisfy queries which have a
conditional on just the "Id" column.  Removing that extra index will
likely help with space issues.

> > With a completely empty table the database is 7 MB. After I insert 1
> > million records into the table the database is 121 MB. My
> > understanding is that each of the fields is sized as follows:
>
> > integer - 4 bytes
> > timestamp without time zone - 8 bytes
> > real - 4 bytes
> > smallint - 2 bytes

I'd recommend against using timestamp w/o time zone.  For starters, as
noted, it's not actually saving you any space over timestamp w/ time
zone, and second, it makes working with that field painful and prone to
error.

> The long and the short of it is that Postgres is more oriented to OLTP-style
> applications where access to and update of individual rows is the key
> performance metric.  If you're dealing with historical data and mainly want
> aggregated query results, it's possible you'd get better performance and
> more-compact storage from a column-store database.
>
> There's ongoing investigation into extending Postgres to support
> column-style storage for better support of applications like that; but any
> such feature is probably several years away, and it will not come without
> performance compromises of its own.

One approach to dealing with the PG per-row overhead is to use arrays
instead of rows, when you don't need that per-row visibility
information.  I've found that to be very successful for single-column
tables, but the technique may work reasonably well for other structures
also.

Thanks!

Stephen

Вложения

В списке pgsql-general по дате отправления:

Предыдущее
От: "FarjadFarid\(ChkNet\)"
Дата:
Сообщение: Re: Why is my database so big?
Следующее
От: Seamus Abshere
Дата:
Сообщение: Why does query planner choose slower BitmapAnd ?