SQL:2011 application time
От | Paul A Jungwirth |
---|---|
Тема | SQL:2011 application time |
Дата | |
Msg-id | CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com обсуждение исходный текст |
Ответы |
Re: SQL:2011 application time
(Paul A Jungwirth <pj@illuminatedcomputing.com>)
|
Список | pgsql-hackers |
Hello, Here is a set of patches to add SQL:2011 application-time support (aka valid-time). Previous discussion was on https://www.postgresql.org/message-id/20200930073908.GQ1996@paquier.xyz but I thought I should update the email subject. There are four patches here: - Add PERIODs. - Add temporal PRIMARY KEY and UNIQUE constraints. - Add UPDATE/DELETE FOR PORTION OF. - Add temporal FOREIGN KEYs. The PERIOD patch is mostly Vik Fearing's work (submitted here a few years ago), so he should get credit for that! All patches have tests & documentation. I do have a few more tests I plan to write, and there are some questions for reviewers embedded in patches (mostly about when to lock and/or copy data structures). I've tried to format these as C++ comments to indicate they should be removed before committing. Throughout I've made sure that wherever SQL:2011 accepts a PERIOD, we also accept a range column. So in all these examples valid_at could be either one: PRIMARY KEY (id, valid_at WITHOUT OVERLAPS) FOREIGN KEY (id, PERIOD valid_at) REFERENCES too (id, PERIOD valid_at) FOR PORTION OF valid_at FROM t1 TO t2 Range types are superior to PERIODs in many ways, so I think we should support both. For example you can SELECT them, WHERE them, GROUP BY them, pass them to functions, return them from functions, do arithmetic on them, index them, etc. In fact whether you use a PERIOD or a range, the implementation uses ranges a lot, since they are such a good fit. A temporal PK is really an exclusion constraint, etc. When you define a PERIOD, we find a matching range type and store its oid on the period record. If there are more than one range type we raise an error, but you can give a rangetype option to remove the ambiguity. This means we support PERIODs of any type (basically), not just dates & timestamps. According to SQL:2011 we should automatically set any columns used by a PERIOD to NOT NULL. I've ignored that requirement, since permitting nullable columns is strictly greater functionality: you can always make the columns NOT NULL if you like. Interpreting NULLs as unbounded fits better with our range types, and it means you don't have to use sentinels. (Timestamp has +-Infinity, but many types don't.) Oracle also accepts null PERIOD columns and treats them the same way. I don't think it would break anything though to force PERIOD columns to NOT NULL. If you hate sentinels you can just use range columns. But still I see no reason to force this on our users. In the FOR PORTION OF bounds I accept MINVALUE and MAXVALUE as special tokens. I chose the names to be consistent with partition syntax. This isn't part of the standard but seems nice. Here are a few other things to discuss: - My patch only adds application time. There is a separate patch to add system time: https://commitfest.postgresql.org/33/2316/ I don't foresee any serious conflicts between our work, and in general I think each patch implements its functionality at an appropriate (but different) level of abstraction. But I haven't looked at that patch recently. I'll try to give some comments during this commitfest. The one place they probably overlap is with defining PERIODs. Since system-time periods *must* be named SYSTEM_TIME, even that overlap should be slight, but it still might be worth accepting the PERIOD patch here before adopting either. Even SYSTEM_TIME ought to be recorded in information_schema.periods IIRC. - The biggest thing remaining to do is to add support for partitioned tables. I would love some help with that if anyone is interested. - Since temporal PKs are implemented with exclusion constraints they use GiST indexes, so you can't really use them without the btree_gist extension (unless *all* your key parts are ranges---which is how we test exclusion constraints). Personally I'm okay with this, since even exclusion constraints are pretty useless without that extension. But it seems like something to talk about. - At PgCon 2020 Vik suggested a different way of querying for FK checks, which he used in his own temporal tables extension. It is more complicated but he thinks it may be faster. I plan to try both and run some benchmarks. I'm not sure whether his approach will work with CASCADE/SET NULL/SET DEFAULT---but I haven't looked at it in a while. - It is hard to avoid a shift/reduce conflict in FOR PORTION OF <period_or_range> FROM <expr> TO <expr> because expressions may contain date INTERVALs that also may contain TO. So this is an error: FOR PORTION OF valid_at FROM '2018-03-01' AT TIME ZONE INTERVAL '2' HOUR TO '2019-01-01' but this works: FOR PORTION OF valid_at FROM ('2018-03-01' AT TIME ZONE INTERVAL '2' HOUR) TO '2019-01-01' I'm personally satisfied with that, but if anyone thinks it can be improved please let me know. It would be nice if the parser were smart enough to see that without a second TO, it must belong to FOR PORTION OF, not the interval. But *I'm* not smart enough to teach it that. :-) If only it could have a greater lookahead. . . . - Normally we return the number of rows affected by an UPDATE/DELETE. What do you think we should do when a FOR PORTION OF causes extra rows to be inserted? I'm not doing anything special here today. After all foreign keys don't do anything extra when they CASCADE/SET (to my knowledge). Also I think adding info about the inserted rows might be annoying, since I'd have to communicate it from within the trigger function. I'm really hoping no one asks for this. - Since PERIODs are a weird neither-fish-nor-foul thing (parsed a lot like a column, but also behaving like a constraint), they add a lot of tedious if-statements when they are used by an index or constraint. In many places I've used a zero attnum to signal that a component is really a PERIOD. (Range columns are easy since they really are a column.) I feel this approach is pretty ugly, so I will probably experiment a bit with a different way. If anyone else wants to take this on though, I'm grateful for the help. - It would be really cool if ON CONFLICT DO UPDATE had a temporal variant so it would INSERT the missing durations and UPDATE the existing ones. That's what Tom Johnston said the standard should have required in *Bitemporal Data*, and it does make things a lot easier on the client side. But that is something to do in a later patch. . . . Yours, Paul
Вложения
В списке pgsql-hackers по дате отправления: