Postgresql 8.4 and Data warehousing

Поиск
Список
Период
Сортировка
От Dhaval Rami
Тема Postgresql 8.4 and Data warehousing
Дата
Msg-id 18e902e00911020318m517baf0ep170bc321e674933b@mail.gmail.com
обсуждение исходный текст
Список pgsql-admin
Data warehousing using postgresql is it possible ?

Answer is yes, it is possible.

Postgresql has always been considered as not suitable for data
warehousing because of advanced database features present in it, like
transaction support , locking, etc.

But the fact is: We cannot decide performance of a database just by
reading about it, best way is to practically implement the scenario,
tune it and compare it with other databases.

I did just that and following are the points one should consider to
achieve data warehouse in postgresql.

    Handling transactions and locks
        -Optimum transaction size
        -Parallelism to avoid locks
    PostgreSQL parameter configuration
        -Tweaking postgresql configuration parameters to best utilize
available hardware.
    Compartmentalize data storage
        -Making logically physical division of data on disk
    Views and fact tables
        -Frequency of create and update view and maintaining fact tables
    Column Index planning
        -Deciding use of index.
    Query planning
        -Planning a query based on requested criteria
    Hardware/Software properties that effect performance
        -RAM, HDD properties and softwares that can increase /
decrease system performance

Case study: Open source SIEM Cyberoam-iview
-> http://sourceforge.net/projects/cyberoam-iview/

Cyberoam-iView is a practically successful example of A Real time data
warehouse using unchanged postgresql 8.4 and files.

-->Features:
-Supports up to 60,000 Events/Sec. ( collect, parse, process, insert
into postgresql tables)
-1040 predefined reports (OLAP Aggregated cubes).
-Storage capacity: Aggregated cubes with granularity of 5 minutes for
1 week, cubes with granularity of 4 hours for 1 year, cubes with
granularity of 24 hours for 7 years. Fact table in compressed form.
-Response time < 30 seconds for most of the reports.
-Real time view update, Aggregation cubes updated every 5 min.
-Reports for any time range selection (up to 7 years) supported.

-->Limitations:
-Does not support ad hoc aggregated report generation from fact table,
All reports needs to be predefined at the time of deployment.
-Fact table need to be extracted and loaded before searching.
-Each cube is updated with top n significant aggregated records where
0 < n < 4001.

-->Minimum Hardware requirements :
Pentium P4 processor 3.0 G Hz
2 GB RAM
1 TB HDD , standard company with read write speed > 50 MBps

Regards
Dhaval Rami

В списке pgsql-admin по дате отправления:

Предыдущее
От: Kevin Kempter
Дата:
Сообщение: Re: dupes for PK and other UNIQUE indexes
Следующее
От: Shruthi A
Дата:
Сообщение: Need help in enabling remote connection