Re: designing time dimension for star schema

Поиск

Список

Период

Сортировка

От	CS DBA
Тема	Re: designing time dimension for star schema
Дата	10 февраля 2014 г. 20:21:03
Msg-id	52F90A73.10408@consistentstate.com обсуждение исходный текст
Ответ на	designing time dimension for star schema (Mark Wong <markwkm@gmail.com>)
Список	pgsql-general

Дерево обсуждения

I've done a lot of DSS architecture. A couple of thoughts:

- in most cases the ETL process figures out the time id's as part of the
preparation and then does bulk loads into the fact tables
      I would be very concerned about performance of a trigger that
fired for every row on the fact table

      you mention you want to do data streaming instead of bulk loads,
can you elaborate?


- When querying a star schema one of the performance features is the
fact that all joins to the dimension tables are performed via a numeric
key, such as:
"select * from fact, time_dim, geo_dim
   where fact.time_id = time_dim.time_id..."

In the case of this being a timestamp I suspect the performance would
take a hit, depending on the size of your fact table and the
scope/volume of your DSS queries this could easily be a show stopper
based on the assumption that the database can do a numeric binary search
much faster than a timestamp search





On 2/10/14, 9:45 AM, Mark Wong wrote:
> Hello everybody,
>
> I was wondering if anyone had any experiences they can share when
> designing the time dimension for a star schema and the like.  I'm
> curious about how well it would work to use a timestamp for the
> attribute key, as opposed to a surrogate key, and populating the time
> dimension with triggers on insert to the fact tables.  This is
> something that would have data streaming in (as oppose to bulk
> loading) and I think we want time granularity to the minute.
>
> A simplified example:
>
> -- Time dimension
> CREATE TABLE time (
>      datetime TIMESTAMP WITH TIME ZONE NOT NULL,
>      day_of_week SMALLINT NOT NULL
> );
> CREATE UNIQUE INDEX ON time (datetime);
>
> -- Fact
> CREATE TABLE fact(
>      datetime TIMESTAMP WITH TIME ZONE NOT NULL,
>      FOREIGN KEY (datetime) REFERENCES time(datetime)
> );
>
> -- Function to populate the time dimension
> CREATE OR REPLACE FUNCTION decompose_timestamp() RETURNS TRIGGER AS $$
> BEGIN
>      NEW.datetime = date_trunc('minutes', NEW.datetime);
>      INSERT INTO time (datetime, day_of_week)
>      VALUES (NEW.datetime, date_part('dow', NEW.datetime));
>      RETURN NEW;
> EXCEPTION
>      WHEN unique_violation THEN
>          -- Do nothing if the timestamp already exists in the dimension table.
>          RETURN new;
> END; $$
> LANGUAGE 'plpgsql';
>
> CREATE TRIGGER populate_time BEFORE INSERT
> ON fact FOR EACH ROW
> EXECUTE PROCEDURE decompose_timestamp();
>
> Regards,
> Mark
>
>

В списке pgsql-general по дате отправления:

Предыдущее

От: Andrew Sullivan
Дата: 10 февраля 2014 г., 20:09:14
Сообщение: Re: How to turn off DEBUG statements from psql commends

Следующее

От: peterlen
Дата: 10 февраля 2014 г., 20:50:30
Сообщение: Re: How to turn off DEBUG statements from psql commends

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Re: designing time dimension for star schema

Предыдущее

Следующее