Re: Offending My Tender Sensibilities -OR- OLTP on a Star Schema

Поиск
Список
Период
Сортировка
От Andy Colson
Тема Re: Offending My Tender Sensibilities -OR- OLTP on a Star Schema
Дата
Msg-id 52F40D19.6060107@squeakycode.net
обсуждение исходный текст
Ответ на Offending My Tender Sensibilities -OR- OLTP on a Star Schema  (Roy Anderson <roy.anderson@gmail.com>)
Список pgsql-general
On 2/4/2014 10:06 PM, Roy Anderson wrote:
> We have an OLTP database and no data warehouse. We are currently
> planning out a build for a data warehouse however (possibly using
> Hadoop). "X" is recommending that we convert our current, normalized
> OLTP database into a flattened Star Schema.
>
> The primary rationale for this OLTP flattening is that since most of
> the app calls to the database are SELECT statements, the DB should be
> optimized for that. A peripheral rationale I've heard is that the
> 1NF-3NF database is hard to understand sometimes.
>
> I'm not going to state what I believe. I would like to hear my
> esteemed colleagues take on this situation. What would you do? What
> would you say? Is there anything you would investigate? In short, what
> say you?
>
> Thanks for your time.
>
>

I think NF is more about correctness than about speed.  People saying
they denormalized a database to improve speed also de-corrected it, and
I doubt they improved the speed.

The only experience I have with data warehousing is storing my apache
logs in PG.  I have have many mega-millions of rows.  So many that I
don't run select count(*) because it takes forever.

I have an index on hitdate, so I can pull any small number of records
based on time very quickly, but I can never select all of them.  When I
add detail rows to the table I also add to a table of hourly summaries.
  My graphs are run off the summary tables and pulling a few months
worth of data is very quick.

So I guess it would come down to how you need to access your data
warehouse.  If you need to "learn" things by hitting every detail record
for all time, then PG isn't gonna work well for you.

If you can do indexed queries and return small subsets, PG will work
great.  If you can keep ongoing summaries you'll be fine too.
Rebuilding a summary table (or creating a new one) is painful, but
possible as long as your not in a huge hurry.

-Andy



В списке pgsql-general по дате отправления:

Предыдущее
От: David Johnston
Дата:
Сообщение: Re: subtracting from a date
Следующее
От: Edson Richter
Дата:
Сообщение: Re: Help with connection issue - started today