Re: Temporal extensions

Поиск
Список
Период
Сортировка
От Jim Nasby
Тема Re: Temporal extensions
Дата
Msg-id 553EF53E.2070007@BlueTreble.com
обсуждение исходный текст
Ответ на Re: Temporal extensions  (Dave Jones <dave@waveform.org.uk>)
Ответы Re: Temporal extensions  (Dave Jones <dave@waveform.org.uk>)
Список pgsql-hackers
On 4/27/15 6:08 PM, Dave Jones wrote:
>> (Though, I dislike using timestamps to do change/history tracking, but
>> >that's just me...)
> I've been playing around with history tracking (in the context of BI,
> typically with batch loaded reporting databases) for about 7-8 years now
> and always found timestamps perfect for the purpose, but are you perhaps
> referring to using it for audit purposes? If that's the case I'd agree
> entirely - this is absolutely the wrong tool for such things (which is
> something I need to put a bit more prominently in the docs - it's buried
> in the design section at the moment).

Most warehouses dumb things down to a day level, so it's probably OK there.

What I specifically don't like is that using a timestamp to try and 
determine the order in which something happened is just fraught with 
gotchas. For starters, now() is locked in when you do a BEGIN, but maybe 
a newer transaction modifies a table before an older one does. Now the 
ordering is *backwards*. You have the same problem with using an XID. 
The only way I've thought of to make this guaranteed safe is to somehow 
serialize the logging with something like

CREATE TABLE customer_history(  customer_hid serial primary key -- hid == history_id  , previous_customer_hid int
referencescustomer_history  , customer_id int NOT NULL references customer
 
...
);
CREATE UNIQUE INDEX ... ON customer_history(previous_customer_hid) WHERE 
previous_customer_hid IS NOT NULL;
CREATE UNIQUE INDEX ... ON customer_history(customer_hid) WHERE 
previous_customer_hid IS NULL;

and then have a before trigger enforce
NEW.previous_customer_hid := customer_history__get_latest(customer_id)

where customer_history__get_latest() will 'walk the chain' starting with 
the first link customer_id = blah AND previous_customer_id = NULL

Because of the indexes that will serialize inserts on a per-customer 
basis. You could still run into problems with a newer snapshot creating 
a history record before a transaction with an older snapshot does 
though. :( Though, if you included txid_current_snapshot() with each 
record you could probably detect when that happens.

> Or did you mean ranges would be better? They certainly looked intriguing
> when I started moving this stuff to postgres, and I'd like to re-visit
> them in the near future as they offer capabilities I don't have with
> timestamps (such as guaranteeing no overlapping ranges via exclusion
> constraints) but my initial tests suggested some rather major
> performance degradation so I put it on the back-burner at first.

If you're going to keep both a start and end for each record you'd 
definitely want to do it with a range. If you're only keeping the change 
time then you can handle it differently.
-- 
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Sawada Masahiko
Дата:
Сообщение: Re: Proposal: knowing detail of config files via SQL
Следующее
От: Andrew Dunstan
Дата:
Сообщение: Re: Allow SQL/plpgsql functions to accept record