Temporal database design with a twist (hard question)

Поиск
Список
Период
Сортировка
От Denis de Bernardy
Тема Temporal database design with a twist (hard question)
Дата
Msg-id 765219.58515.qm@web112406.mail.gq1.yahoo.com
обсуждение исходный текст
Список pgsql-general
I'm looking into implementing object-versioning with the added twist of needing to have both live and draft objects,
andcould use the insights from someone experience in this, as I'm beginning to wonder if it's even possible without
potentiallyhorrific hacks. 


I'll break it down to posts with tags for the sake of the example, but my use-case is a bit more general (involving
slowlychanging dimensions - http://en.wikipedia.org/wiki/Slowly_changing_dimension). 

Suppose you've a posts table, a tags table, and a post2tag table:

posts (
 id
)

tags (
 id
)

post2tag (
 post_id fkey posts(id),
 tag_id fkey tags(id)
)

I'm in need of a couple of things:

1. Being able to show exactly how a post looked like at an arbitrary datetime, including for deleted rows.
2. Keep track of who is editing what, for a complete audit trail.
3. Needs a set of materialized views ("live" tables) for the sake of keeping referential integrity (i.e. logging should
betransparent to the developers). 

4. Needs to be appropriately fast for live *and* the latest draft rows.
5. Being able to have a draft post coexist with a live post.

I've been investigating various options. So far, the best I've come up with (without points #4/#5) looks a bit like the
SCDtype6-hybrid setup, but instead of having a current boolean there's a materialized view for the current row. For all
intentsand purposes, it looks like this: 

posts (
 id pkey,
 public,
 created_at,

 updated_at,
 updated_by
)

post_revs (
 id,
 rev pkey,
 public,
 created_at,

 created_by,
 deleted_at
)

tags (
 id pkey,
 public,
 created_at,

 updated_at,
 updated_by
)


tag_revs (
 id,
 public,
 rev pkey,
 created_at,

 created_by,
 deleted_at
)

post2tag (
 post_id fkey posts(id),
 tag_id fkey tags(id),
 public,
 created_at,
 updated_at,
 updated_by
)

post2tag_revs (
 post_id,
 tag_id,
 post_rev fkey post_revs(rev), -- the rev when the relation started
 tag_rev fkey tag_revs(rev), -- the rev when the relation started
 public,
 created_at,
 created_by,
 deleted_at,
 pkey (post_rev, tag_rev)
)


I'm using pg_temporal to maintain indexes on period(created_at, deleted_at). And I keep the various tables in sync
usingtriggers. Yada yada yada... I created the triggers that allow to cancel an edit to posts/tags in such a way that
thedraft gets stored into the revs without being published. It works great. 

*Except* when I need to worry about draft-row related relations on post2tag. In that case, all hell breaks loose, and
thishints to me that I've some kind of design problem in there. But I'm running out of ideas... 

I've considered introducing lot of needless data duplication (i.e. n post2tag rows introduced for each draft revision).
Thiskind of works, but tends to be a lot slower than I'd like it to be. 


I've considered introducing drafts tables for the "last draft", but this quickly tends to become very very ugly.

I've considered all sorts of flags...

So question: is there a generally accepted means of managing live vs non-live rows in a row-version controlled
environment?And if not, what have you tried and been reasonably successful with? 

D.

В списке pgsql-general по дате отправления:

Предыдущее
От: Dmitriy Igrishin
Дата:
Сообщение: Re: PostgreSQL 9.0 users
Следующее
От: Ivan Sergio Borgonovo
Дата:
Сообщение: benchmark comparing different postgresql versions