Re: splitting data into multiple tables

From Viji V Nair
Subject Re: splitting data into multiple tables
Date
Msg-id 84c89ac11001252358k498053cdpa0c05f6e5ece3303@mail.gmail.com
In reply to Re: splitting data into multiple tables  (nair rajiv <nair331@gmail.com>)
Responses Re: splitting data into multiple tables
List pgsql-performance


On Tue, Jan 26, 2010 at 9:18 AM, nair rajiv <nair331@gmail.com> wrote:


On Tue, Jan 26, 2010 at 6:19 AM, Andres Freund <andres@anarazel.de> wrote:
On Tuesday 26 January 2010 01:39:48 nair rajiv wrote:
> On Tue, Jan 26, 2010 at 1:01 AM, Craig James <craig_james@emolecules.com> wrote:
>           I am working on a project that will extract structured content
> from Wikipedia and put it into our database. Before loading the data I
> wrote a script to find out the number of rows every table would have
> once the data is in, and I found there is a table which will have
> approximately 50,000,000 rows after data harvesting.
> Is it advisable to keep so much data in one table?
That depends on your access patterns, i.e. how many rows you are accessing at
the same time, and whether those rows have some common locality.

         I'll give a brief idea of what this table looks like. The important
columns are subject, predicate and object. Given a predicate and an object one
should be able to get all the subjects, and given a subject and a predicate one
should be able to retrieve all the objects. I have created indexes on these
three columns.
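
For concreteness, a minimal sketch of such a triple table; the table and index
names are made up, and composite indexes are shown here since they match the
two lookup patterns directly (three single-column indexes also work):

    CREATE TABLE datatable (
        subject   text NOT NULL,
        predicate text NOT NULL,
        object    text NOT NULL
    );

    -- (predicate, object) -> subjects, and (subject, predicate) -> objects
    CREATE INDEX datatable_pred_obj_idx  ON datatable (predicate, object);
    CREATE INDEX datatable_subj_pred_idx ON datatable (subject, predicate);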


>           I have read about 'partitioning' a table. Another idea I have is
> to break the table into different tables after the number of rows in a
> table has reached a certain limit, say 1,000,000. For example, dividing a
> table 'datatable' into 'datatable_a', 'datatable_b', each having
> 1,000,000 rows.
> I need advice on whether I should go for partitioning or the approach I
> have thought of.
Your approach is pretty close to partitioning - except that partitioning makes
the split mostly invisible to the outside, so it is imho preferable.
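
For reference, a minimal sketch of inheritance-based partitioning as PostgreSQL
supports it here, splitting on subject only to keep the example short (all
names are made up):

    CREATE TABLE datatable (
        subject   text NOT NULL,
        predicate text NOT NULL,
        object    text NOT NULL
    );

    -- children hold the rows; the CHECK constraints let the planner
    -- skip children that cannot match (constraint_exclusion)
    CREATE TABLE datatable_a (CHECK (subject <  'm')) INHERITS (datatable);
    CREATE TABLE datatable_b (CHECK (subject >= 'm')) INHERITS (datatable);

    -- route inserts on the parent to the right child
    CREATE FUNCTION datatable_insert() RETURNS trigger AS $$
    BEGIN
        IF NEW.subject < 'm' THEN
            INSERT INTO datatable_a VALUES (NEW.*);
        ELSE
            INSERT INTO datatable_b VALUES (NEW.*);
        END IF;
        RETURN NULL;  -- the row has already been stored in a child
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER datatable_insert_trg
        BEFORE INSERT ON datatable
        FOR EACH ROW EXECUTE PROCEDURE datatable_insert();

Queries against the parent then only scan the children whose CHECK constraint
can match, provided constraint_exclusion is enabled; note that indexes have to
be created on each child separately.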

>           We have an HP server with 32GB RAM and 16 processors. The storage
> has 24TB of disk space (1TB/HD), which we have put on RAID-5. It would be
> great if we could know the parameters that can be changed in the postgres
> configuration file so that the database makes maximum use of the server
> we have, e.g. parameters that would increase the speed of inserts and
> selects.
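
For reference, a sketch of the postgresql.conf parameters most often tuned for
this kind of bulk-load-then-query workload, with illustrative starting values
for a 32GB machine (not figures from this thread):

    shared_buffers = 8GB            # ~25% of RAM is a common starting point
    effective_cache_size = 24GB     # what the OS page cache will likely hold
    work_mem = 64MB                 # per sort/hash per backend, so keep modest
    maintenance_work_mem = 1GB      # speeds up CREATE INDEX after the load
    checkpoint_segments = 64        # fewer, larger checkpoints while loading
    wal_buffers = 16MB
    synchronous_commit = off        # only if losing the last moments of
                                    # commits on a crash is acceptable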
Not using RAID-5 would possibly be a good start - many people (me included)
have experienced bad write performance on it, though it depends a great deal on
the controller/implementation.
RAID-10 is normally considered more advantageous despite its lower usable
space ratio.
Did you create one big RAID-5 out of all the disks? That's not a good idea,
because it's pretty likely that another disk will fail while you are rebuilding
a previously failed disk, and in that configuration that unfortunately means
you have lost all your data (in the most common implementations at least).

No, I am using only 12TB, i.e. 12 HDs, of the 24TB I have.

A 15k rpm SAS drive will give you a throughput of around 12MB/s and 120 IOPS. From that you can calculate the number of disks, specifically spindles, needed to reach your desired throughput and IOPS.
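
As a worked example of that calculation (the target figure is made up): to
sustain 2,400 random IOPS at 120 IOPS per spindle you need 2400 / 120 = 20
spindles, and since RAID-10 sends every write to two disks, a write-heavy load
would call for roughly twice that; the 12 disks mentioned above top out around
12 x 120 = 1,440 IOPS before any RAID overhead.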
 

Andres

PS: Your lines are strangely wrapped...

