Data warehousing requirements

Поиск
Список
Период
Сортировка
От Gabriele Bartolini
Тема Data warehousing requirements
Дата
Msg-id 6.1.2.0.2.20041006230239.0201bd40@box.tin.it
обсуждение исходный текст
Список pgsql-performance
Hi guys,

    I just discussed about my problem on IRC. I am building a Web usage
mining system based on Linux, PostgreSQL and C++ made up of an OLTP
database which feeds several and multi-purpose data warehouses about users'
behaviour on HTTP servers.

    I modelled every warehouse using the star schema, with a fact table and
then 'n' dimension tables linked using a surrogate ID.

    Discussing with the guys of the chat, I came up with these conclusions,
regarding the warehouse's performance:

1) don't use referential integrity in the facts table
2) use INTEGER and avoid SMALLINT and NUMERIC types for dimensions' IDs
3) use an index for every dimension's ID in the fact table

    As far as administration is concerned: run VACUUM ANALYSE daily and
VACUUM FULL periodically.

    Is there anything else I should keep in mind?

    Also, I was looking for advice regarding hardware requirements for a
data warehouse system that needs to satisfy online queries. I have indeed
no idea at the moment. I can only predict 4 million about records a month
in the fact table, does it make sense or not? is it too much?

    Data needs to be easily backed up and eventually replicated.

    Having this in mind, what hardware architecture should I look for? How
many hard disks do I need, what kind and what RAID solution do you suggest
me to adopt (5 or 10 - I think)?

Thank you so much,
-Gabriele
--
Gabriele Bartolini: Web Programmer, ht://Dig & IWA/HWG Member, ht://Check
maintainer
Current Location: Prato, Toscana, Italia
angusgb@tin.it | http://www.prato.linux.it/~gbartolini | ICQ#129221447
 > "Leave every hope, ye who enter!", Dante Alighieri, Divine Comedy, The
Inferno

---
Outgoing mail is certified Virus Free.
Checked by AVG anti-virus system (http://www.grisoft.com).
Version: 6.0.773 / Virus Database: 520 - Release Date: 05/10/2004

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: sequential scan on select distinct
Следующее
От: Paul Ramsey
Дата:
Сообщение: Re: The never ending quest for clarity on shared_buffers