Re: Understanding the differences between Temporal tables, CDC and Time Series.

Поиск
Список
Период
Сортировка
От SQL Padawan
Тема Re: Understanding the differences between Temporal tables, CDC and Time Series.
Дата
Msg-id Ivk4J_pes2viw4ya19FuyEafyzHtybivspMEGRDDZ6mkcWfu9Ht3jpdyJaF9svnSxCEiyOVz8tTNoIivpe9j4RdQi-6NP1Br1cpbc6z-Mvs=@protonmail.com
обсуждение исходный текст
Ответ на Re: Understanding the differences between Temporal tables, CDC and Time Series.  (Peter Eisentraut <peter.eisentraut@enterprisedb.com>)
Список pgsql-general

Hi Peter, and thanks for answering.


> > I'm unclear as to the "philosophical" distinctions here and would be
> > grateful if anybody could explain the diffence(s) between them?

> A time series database contains data records containing a time stamp

> A temporal database contains records that have a timestamp range


OK - so we have two stock "ticker" dbs - one a Time Series db (TS-db) and one a Temporal db (T-db).

All of the code below is on a fiddle here

https://dbfiddle.uk/?rdbms=postgres_13&fiddle=207f38e5c7d6e7861c402c2c4926840c

CREATE TABLE ts_portfolio  -- time series
(
  txn_id  INTEGER NOT NULL,
  company TEXT NOT NULL,
  price   INTEGER NOT NULL,
  txn_ts  TIMESTAMP NOT NULL
);

with a few records


INSERT INTO ts_portfolio VALUES
(1234,       'DB',     10,  '07/11/21 00:12:00'),    -- record inserted on 07/11/21
(2345,       'DB',     20,  '08/11/21 00:12:00'),    -- record inserted on 08/11/21
(3456,       'DB',     30,  '09/11/21 00:12:00'),    -- record inserted on 09/11/21
(4567,       'DB',     40,  '10/11/21 00:12:00'),    -- record inserted on 10/11/21
(5678,       'DB',     50,  '11/11/21 00:12:00');    -- record inserted on 11/11/21


So, for example, in order to query the share price AS OF 00:00:00 on the 09/11

SELECT company, price
FROM ts_portfolio
WHERE company = 'DB'
AND txn_ts <= '09/11/21 00:00:00'
ORDER BY txn_ts DESC
LIMIT 1;

We pick up the price from the latest transaction either before or exactly on 09/11/2021 00:00:00

Things become tricky when we want to see the prices on a given day: 09/11/2021

--
-- UNION better here? See EXPLAIN in fiddle - not important for this debate!
--

WITH cte1 AS
(
  SELECT txn_id AS t_start
  FROM ts_portfolio
  WHERE company = 'DB'
  AND txn_ts <= '09/11/21 00:00:00'
  ORDER BY txn_ts DESC
  LIMIT 1
), cte2 AS
(
  SELECT txn_id AS t_end
  FROM ts_portfolio
  WHERE company = 'DB'
  AND txn_ts <= '09/11/21 23:59:59.999999'
  ORDER BY txn_ts DESC
  LIMIT 1
)
SELECT
  t.txn_id, t.company, t.price
FROM
  ts_portfolio t
WHERE txn_id BETWEEN (SELECT t_start FROM cte1) AND (SELECT t_end FROM cte2);


txn_id    company    price
2345    DB    20
3456    DB    30

which is correct - from 08/11 midday till 09/11 midday, the price was 20 and then at midday 09/11 it rose to 30!


Becoming complex.

However, a Temporal (versioned) table would only require one extra field:

  txn_t_from TIMESTAMP NOT NULL,
  txn_t_to   TIMESTAMP NOT NULL  -- extra field - see fiddle for data inserts and updates!


Each INSERT (behind the scenes) is an INSERT and an UPDATE.

INSERT INTO tdb_portfolio VALUES
(1000, 'DB',  5, '01/01/1900',        '07/11/21 00:12:00'),  -- inserted in the past, updated 07/11 @ 12:00
(1234, 'DB', 10, '07/11/21 00:12:00', '08/11/21 00:12:00'),  -- record inserted on 07/11/21 - updated 08/11 @ 12:00
(2345, 'DB', 20, '08/11/21 00:12:00', '09/11/21 00:12:00'),  -- record inserted on 08/11/21 - and so on...
(3456, 'DB', 30, '09/11/21 00:12:00', '10/11/21 00:12:00'),  -- record inserted on 09/11/21
(4567, 'DB', 40, '10/11/21 00:12:00', '11/11/21 00:12:00'),  -- record inserted on 10/11/21
(5678, 'DB', 50, '11/11/21 00:12:00', '19/01/2038');  -- record inserted on 11/11/21 - never updated (effectively
INFINITYuntil next update) 



The queries become simpler:

SELECT txn_id, company, price
FROM tdb_portfolio AS OF '09/11/2021 00:00:00';

and the complex one above collapses to:

SELECT txn_id, company, price FROM tdb_portfolio
FOR SYSTEM_TIME BETWEEN ('09/11/2021 00:00:00' AND '09/11/2021 23:59:59.999999');

or maybe if there was a DATE(SYSTEM_TIME) function, ... FOR DATE(SYSTEM_TIME) = '09/11/2021';


I'm assuming that (when properly introduced into PostgreSQL), there will be optimisations for these sorts of query.

Do temporal tables bring anything else "to the party" - the augmented functionality is a nice-to-have, but hardly
earth-shattering?

Have I missed out on anything important?

Thx, SQLP!






В списке pgsql-general по дате отправления:

Предыдущее
От: Peter Eisentraut
Дата:
Сообщение: Re: Understanding the differences between Temporal tables, CDC and Time Series.
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: E.1.2. Migration to Version 14