Обсуждение: Understanding the differences between Temporal tables, CDC and Time Series.
Good morning to everyone,
I'm looking into Temporal Tables (TT - i.e. System/Application time versioning) but I would like to understand the differences between TTs and two other features (which appear similar in many ways) - Change Data Capture and Time Series.
Are Temporal Tables (in a sense) a query framework framework around CDC?
And if that's the case, what then are Time Series - which are specifically designed to host (time) changing data?
Are Temporal Tables (again, in a sense) Time Series databases which only record changes in time(-stamps)?
I'm unclear as to the "philosophical" distinctions here and would be grateful if anybody could explain the diffence(s) between them?
I did look at this myself and found these posts on StackOverflow, but I don't feel as if they've fundamentally explained the difference(s).
Any ideas, discussion, references, URLs welcome.
Thx,
SQLP!
Sent with ProtonMail Secure Email.
Re: Understanding the differences between Temporal tables, CDC and Time Series.
От
Peter Eisentraut
Дата:
On 10.11.21 07:43, SQL Padawan wrote: > I'm looking into Temporal Tables (TT - i.e. System/Application time > versioning) but I would like to understand the differences between TTs > and two other features (which appear similar in many ways) - Change Data > Capture and Time Series. > > Are Temporal Tables (in a sense) a query framework framework around CDC? > > And if that's the case, what then are Time Series - which are > specifically designed to host (time) changing data? > > Are Temporal Tables (again, in a sense) Time Series databases which only > record changes in time(-stamps)? > > I'm unclear as to the "philosophical" distinctions here and would be > grateful if anybody could explain the diffence(s) between them? A time series database contains data records containing a time stamp (a point in time). These records represent things that happened, for example log files, measurements, events. This data is then usually used for analytics, such as, how many web site visitors did we have per hour over the last week. A temporal database contains records that have a timestamp range (in PostgreSQL terms, also known as period or interval, delimited by two points in time). There are (at least) two ways to use this. One is that the database system automatically records when a record was inserted, updated, and deleted, so that historical versions of records are automatically kept, and you can then query, what was the value of this at this time, or more generally what was the result of this query at this time. Use cases here are record-keeping, auditing, and the like. The other is that the database users themselves enter when a record is supposed to be valid, and the database system takes this into account in queries. So you could have product prices or salaries that automatically change at a certain time. This is basically a tool to make development of certain database applications more convenient. So these two concepts are really totally separate and independent things, except that they have a word based on "time" in their name.
Re: Understanding the differences between Temporal tables, CDC and Time Series.
От
SQL Padawan
Дата:
Hi Peter, and thanks for answering. > > I'm unclear as to the "philosophical" distinctions here and would be > > grateful if anybody could explain the diffence(s) between them? > A time series database contains data records containing a time stamp > A temporal database contains records that have a timestamp range OK - so we have two stock "ticker" dbs - one a Time Series db (TS-db) and one a Temporal db (T-db). All of the code below is on a fiddle here https://dbfiddle.uk/?rdbms=postgres_13&fiddle=207f38e5c7d6e7861c402c2c4926840c CREATE TABLE ts_portfolio -- time series ( txn_id INTEGER NOT NULL, company TEXT NOT NULL, price INTEGER NOT NULL, txn_ts TIMESTAMP NOT NULL ); with a few records INSERT INTO ts_portfolio VALUES (1234, 'DB', 10, '07/11/21 00:12:00'), -- record inserted on 07/11/21 (2345, 'DB', 20, '08/11/21 00:12:00'), -- record inserted on 08/11/21 (3456, 'DB', 30, '09/11/21 00:12:00'), -- record inserted on 09/11/21 (4567, 'DB', 40, '10/11/21 00:12:00'), -- record inserted on 10/11/21 (5678, 'DB', 50, '11/11/21 00:12:00'); -- record inserted on 11/11/21 So, for example, in order to query the share price AS OF 00:00:00 on the 09/11 SELECT company, price FROM ts_portfolio WHERE company = 'DB' AND txn_ts <= '09/11/21 00:00:00' ORDER BY txn_ts DESC LIMIT 1; We pick up the price from the latest transaction either before or exactly on 09/11/2021 00:00:00 Things become tricky when we want to see the prices on a given day: 09/11/2021 -- -- UNION better here? See EXPLAIN in fiddle - not important for this debate! -- WITH cte1 AS ( SELECT txn_id AS t_start FROM ts_portfolio WHERE company = 'DB' AND txn_ts <= '09/11/21 00:00:00' ORDER BY txn_ts DESC LIMIT 1 ), cte2 AS ( SELECT txn_id AS t_end FROM ts_portfolio WHERE company = 'DB' AND txn_ts <= '09/11/21 23:59:59.999999' ORDER BY txn_ts DESC LIMIT 1 ) SELECT t.txn_id, t.company, t.price FROM ts_portfolio t WHERE txn_id BETWEEN (SELECT t_start FROM cte1) AND (SELECT t_end FROM cte2); txn_id company price 2345 DB 20 3456 DB 30 which is correct - from 08/11 midday till 09/11 midday, the price was 20 and then at midday 09/11 it rose to 30! Becoming complex. However, a Temporal (versioned) table would only require one extra field: txn_t_from TIMESTAMP NOT NULL, txn_t_to TIMESTAMP NOT NULL -- extra field - see fiddle for data inserts and updates! Each INSERT (behind the scenes) is an INSERT and an UPDATE. INSERT INTO tdb_portfolio VALUES (1000, 'DB', 5, '01/01/1900', '07/11/21 00:12:00'), -- inserted in the past, updated 07/11 @ 12:00 (1234, 'DB', 10, '07/11/21 00:12:00', '08/11/21 00:12:00'), -- record inserted on 07/11/21 - updated 08/11 @ 12:00 (2345, 'DB', 20, '08/11/21 00:12:00', '09/11/21 00:12:00'), -- record inserted on 08/11/21 - and so on... (3456, 'DB', 30, '09/11/21 00:12:00', '10/11/21 00:12:00'), -- record inserted on 09/11/21 (4567, 'DB', 40, '10/11/21 00:12:00', '11/11/21 00:12:00'), -- record inserted on 10/11/21 (5678, 'DB', 50, '11/11/21 00:12:00', '19/01/2038'); -- record inserted on 11/11/21 - never updated (effectively INFINITYuntil next update) The queries become simpler: SELECT txn_id, company, price FROM tdb_portfolio AS OF '09/11/2021 00:00:00'; and the complex one above collapses to: SELECT txn_id, company, price FROM tdb_portfolio FOR SYSTEM_TIME BETWEEN ('09/11/2021 00:00:00' AND '09/11/2021 23:59:59.999999'); or maybe if there was a DATE(SYSTEM_TIME) function, ... FOR DATE(SYSTEM_TIME) = '09/11/2021'; I'm assuming that (when properly introduced into PostgreSQL), there will be optimisations for these sorts of query. Do temporal tables bring anything else "to the party" - the augmented functionality is a nice-to-have, but hardly earth-shattering? Have I missed out on anything important? Thx, SQLP!