Re: best db schema for time series data?

Поиск
Список
Период
Сортировка
От Chris Browne
Тема Re: best db schema for time series data?
Дата
Msg-id 8762vxfcqb.fsf@cbbrowne.afilias-int.info
обсуждение исходный текст
Ответ на best db schema for time series data?  (Louis-David Mitterrand <vindex+lists-pgsql-performance@apartia.org>)
Ответы Re: best db schema for time series data?  (Louis-David Mitterrand <vindex+lists-pgsql-performance@apartia.org>)
Список pgsql-performance
vindex+lists-pgsql-performance@apartia.org (Louis-David Mitterrand)
writes:
> I have to collect lots of prices from web sites and keep track of their
> changes. What is the best option?
>
> 1) one 'price' row per price change:
>
>     create table price (
>         id_price primary key,
>         id_product integer references product,
>         price integer
>     );
>
> 2) a single 'price' row containing all the changes:
>
>     create table price (
>         id_price primary key,
>         id_product integer references product,
>         price integer[] -- prices are 'pushed' on this array as they change
>     );
>
> Which is bound to give the best performance, knowing I will often need
> to access the latest and next-to-latest prices?

I'd definitely bias towards #1, but with a bit of a change...

create table product (
  id_product serial primary key
);

create table price (
   id_product integer references product,
   as_at timestamptz default now(),
   primary key (id_product, as_at),
   price integer
);

The query to get the last 5 prices for a product should be
splendidly efficient:

   select price, as_at from price
    where id_product = 17
    order by as_at desc limit 5;

(That'll use the PK index perfectly nicely.)

If you needed higher performance, for "latest price," then I'd add a
secondary table, and use triggers to copy latest price into place:

  create table latest_prices (
     id_product integer primary key references product,
     price integer
  );

create or replace function capture_latest_price () returns trigger as $$
declare
begin
    delete from latest_prices where id_product = NEW.id_product;
    insert into latest_prices (id_product,price) values
       (NEW.id_product, NEW.price);
    return NEW;
end
$$ language plpgsql;

create trigger price_capture after insert on price execute procedure capture_latest_price();

This captures *just* the latest price for each product.  (There's a bit
of race condition - if there are two concurrent price updates, one will
fail, which wouldn't happen without this trigger in place.)
--
"... Turns   out that JPG  was in  fact using his  brain... and   I am
inclined to encourage him  to continue the practice  even if  it isn't
exactly what I  would have done myself."   -- Alan Bawden  (way out of
context)

В списке pgsql-performance по дате отправления:

Предыдущее
От: Harald Fuchs
Дата:
Сообщение: Re: best db schema for time series data?
Следующее
От: Greg Smith
Дата:
Сообщение: Re: Defaulting wal_sync_method to fdatasync on Linux for 9.1?