Обсуждение: SQL:2011 application time

Поиск

Список

Период

Сортировка

SQL:2011 application time

От

Paul A Jungwirth

Дата:

30 июня 2021 г., 20:39:00

Hello,

Here is a set of patches to add SQL:2011 application-time support (aka
valid-time).
Previous discussion was on
https://www.postgresql.org/message-id/20200930073908.GQ1996@paquier.xyz
but I thought I should update the email subject.

There are four patches here:

- Add PERIODs.
- Add temporal PRIMARY KEY and UNIQUE constraints.
- Add UPDATE/DELETE FOR PORTION OF.
- Add temporal FOREIGN KEYs.

The PERIOD patch is mostly Vik Fearing's work (submitted here a few
years ago), so he should get credit for that!

All patches have tests & documentation. I do have a few more tests I
plan to write, and there are some questions for reviewers embedded in
patches (mostly about when to lock and/or copy data structures). I've
tried to format these as C++ comments to indicate they should be
removed before committing.

Throughout I've made sure that wherever SQL:2011 accepts a PERIOD, we
also accept a range column. So in all these examples valid_at could be
either one:

    PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
    FOREIGN KEY (id, PERIOD valid_at)
    REFERENCES too (id, PERIOD valid_at)
    FOR PORTION OF valid_at FROM t1 TO t2

Range types are superior to PERIODs in many ways, so I think we should
support both. For example you can SELECT them, WHERE them, GROUP BY
them, pass them to functions, return them from functions, do
arithmetic on them, index them, etc.

In fact whether you use a PERIOD or a range, the implementation uses
ranges a lot, since they are such a good fit. A temporal PK is really
an exclusion constraint, etc. When you define a PERIOD, we find a
matching range type and store its oid on the period record. If there
are more than one range type we raise an error, but you can give a
rangetype option to remove the ambiguity. This means we support
PERIODs of any type (basically), not just dates & timestamps.

According to SQL:2011 we should automatically set any columns used by
a PERIOD to NOT NULL. I've ignored that requirement, since permitting
nullable columns is strictly greater functionality: you can always
make the columns NOT NULL if you like. Interpreting NULLs as unbounded
fits better with our range types, and it means you don't have to use
sentinels. (Timestamp has +-Infinity, but many types don't.) Oracle
also accepts null PERIOD columns and treats them the same way. I don't
think it would break anything though to force PERIOD columns to NOT
NULL. If you hate sentinels you can just use range columns. But still
I see no reason to force this on our users.

In the FOR PORTION OF bounds I accept MINVALUE and MAXVALUE as special
tokens. I chose the names to be consistent with partition syntax. This
isn't part of the standard but seems nice.

Here are a few other things to discuss:

- My patch only adds application time. There is a separate patch to
add system time: https://commitfest.postgresql.org/33/2316/ I don't
foresee any serious conflicts between our work, and in general I think
each patch implements its functionality at an appropriate (but
different) level of abstraction. But I haven't looked at that patch
recently. I'll try to give some comments during this commitfest. The
one place they probably overlap is with defining PERIODs. Since
system-time periods *must* be named SYSTEM_TIME, even that overlap
should be slight, but it still might be worth accepting the PERIOD
patch here before adopting either. Even SYSTEM_TIME ought to be
recorded in information_schema.periods IIRC.

- The biggest thing remaining to do is to add support for partitioned
tables. I would love some help with that if anyone is interested.

- Since temporal PKs are implemented with exclusion constraints they
use GiST indexes, so you can't really use them without the btree_gist
extension (unless *all* your key parts are ranges---which is how we
test exclusion constraints). Personally I'm okay with this, since even
exclusion constraints are pretty useless without that extension. But
it seems like something to talk about.

- At PgCon 2020 Vik suggested a different way of querying for FK
checks, which he used in his own temporal tables extension. It is more
complicated but he thinks it may be faster. I plan to try both and run
some benchmarks. I'm not sure whether his approach will work with
CASCADE/SET NULL/SET DEFAULT---but I haven't looked at it in a while.

- It is hard to avoid a shift/reduce conflict in FOR PORTION OF
<period_or_range> FROM <expr> TO <expr> because expressions may
contain date INTERVALs that also may contain TO. So this is an error:

    FOR PORTION OF valid_at
      FROM '2018-03-01' AT TIME ZONE INTERVAL '2' HOUR
      TO   '2019-01-01'

but this works:

   FOR PORTION OF valid_at
     FROM ('2018-03-01' AT TIME ZONE INTERVAL '2' HOUR)
     TO   '2019-01-01'

I'm personally satisfied with that, but if anyone thinks it can be
improved please let me know. It would be nice if the parser were smart
enough to see that without a second TO, it must belong to FOR PORTION
OF, not the interval. But *I'm* not smart enough to teach it that. :-)
If only it could have a greater lookahead. . . .

- Normally we return the number of rows affected by an UPDATE/DELETE.
What do you think we should do when a FOR PORTION OF causes extra rows
to be inserted? I'm not doing anything special here today. After all
foreign keys don't do anything extra when they CASCADE/SET (to my
knowledge). Also I think adding info about the inserted rows might be
annoying, since I'd have to communicate it from within the trigger
function. I'm really hoping no one asks for this.

- Since PERIODs are a weird neither-fish-nor-foul thing (parsed a lot
like a column, but also behaving like a constraint), they add a lot of
tedious if-statements when they are used by an index or constraint. In
many places I've used a zero attnum to signal that a component is
really a PERIOD. (Range columns are easy since they really are a
column.) I feel this approach is pretty ugly, so I will probably
experiment a bit with a different way. If anyone else wants to take
this on though, I'm grateful for the help.

- It would be really cool if ON CONFLICT DO UPDATE had a temporal
variant so it would INSERT the missing durations and UPDATE the
existing ones. That's what Tom Johnston said the standard should have
required in *Bitemporal Data*, and it does make things a lot easier on
the client side. But that is something to do in a later patch. . . .

Yours,
Paul

Вложения

Re: SQL:2011 application time

От

Paul A Jungwirth

Дата:

03 июля 2021 г., 00:39:50

On Wed, Jun 30, 2021 at 10:39 AM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
> Here is a set of patches to add SQL:2011 application-time support (aka
> valid-time).

Here is a small fix to prevent `FOR PORTION OF valid_at FROM MAXVALUE
TO foo` or `FROM foo TO MINVALUE`. I rebased on latest master too.

Yours,
Paul

Вложения

Re: SQL:2011 application time

От

Paul A Jungwirth

Дата:

03 июля 2021 г., 20:46:55

On Fri, Jul 2, 2021 at 2:39 PM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
>
> On Wed, Jun 30, 2021 at 10:39 AM Paul A Jungwirth
> <pj@illuminatedcomputing.com> wrote:
> > Here is a set of patches to add SQL:2011 application-time support (aka
> > valid-time).
>
> Here is a small fix to prevent `FOR PORTION OF valid_at FROM MAXVALUE
> TO foo` or `FROM foo TO MINVALUE`. I rebased on latest master too.

Here is a patch set that cleans up the catalog docs for pg_period. The
columns have changed since that was written, and also we use a
different sgml structure on those pages now. Note pg_period still
contains a couple essentially-unused columns, perislocal and
perinhcount. Those are intended for supporting table inheritance, so
I've left them in.

Paul

Вложения

Re: SQL:2011 application time

От

Paul A Jungwirth

Дата:

06 сентября 2021 г., 22:52:37

On Sat, Sep 4, 2021 at 12:56 PM Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
>
> patch 01: does apply but doesn't compile, attached the compile errors.
> patch 04: does not apply clean.

Thanks for taking a look! I've rebased & made it compile again. v7 attached.

Yours,
Paul

Вложения

Re: SQL:2011 application time

От

Zhihong Yu

Дата:

06 сентября 2021 г., 23:40:13

On Mon, Sep 6, 2021 at 12:53 PM Paul A Jungwirth <pj@illuminatedcomputing.com> wrote:

On Sat, Sep 4, 2021 at 12:56 PM Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
>
> patch 01: does apply but doesn't compile, attached the compile errors.
> patch 04: does not apply clean.

Thanks for taking a look! I've rebased & made it compile again. v7 attached.

Yours,
Paul

Hi,

For v7-0001-Add-PERIODs.patch :

It seems the year (2018) should be updated to 2021.

For RemovePeriodById(), it seems table_open() can be called after SearchSysCache1(). This way, if HeapTupleIsValid(tup) is true, table_open() can be skipped.

For tablecmds.c, AT_PASS_ADD_PERIOD is defined as 5 with AT_PASS_ADD_CONSTR etc moved upward. Do we need to consider compatibility ?

There are a few TODO's such as:

+ * TODO: What about periods?

Are they going to be addressed in the next round of patches ?

There seems to be some overlap between ATExecAddPeriod() and AddRelationNewPeriod().

Is it possible to reduce code duplication ?

Cheers

Re: SQL:2011 application time

От

Jaime Casanova

Дата:

11 сентября 2021 г., 04:50:17

On Mon, Sep 06, 2021 at 12:52:37PM -0700, Paul A Jungwirth wrote:
> On Sat, Sep 4, 2021 at 12:56 PM Jaime Casanova
> <jcasanov@systemguards.com.ec> wrote:
> >
> > patch 01: does apply but doesn't compile, attached the compile errors.
> > patch 04: does not apply clean.
> 
> Thanks for taking a look! I've rebased & made it compile again. v7 attached.
> 

patch 01: does apply but gives a compile warning (which is fixed by patch
02)
"""
parse_utilcmd.c: In function ‘generateClonedIndexStmt’:
parse_utilcmd.c:1730:2: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
  Period *p = makeNode(Period);
  ^~~~~~
"""

patch 03: produces these compile errors.  

analyze.c: In function ‘transformForPortionOfBound’:
analyze.c:1171:3: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement]
   A_Const    *n2 = makeNode(A_Const);
   ^~~~~~~
analyze.c:1172:10: error: ‘union ValUnion’ has no member named ‘type’
   n2->val.type = T_Null;
          ^
analyze.c:1172:18: error: ‘T_Null’ undeclared (first use in this function)
   n2->val.type = T_Null;
                  ^~~~~~
analyze.c:1172:18: note: each undeclared identifier is reported only once for each function it appears in



-- 
Jaime Casanova
Director de Servicios Profesionales
SystemGuards - Consultores de PostgreSQL

Re: SQL:2011 application time

От

Paul A Jungwirth

Дата:

13 сентября 2021 г., 07:12:19

On Fri, Sep 10, 2021 at 6:50 PM Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
>
> patch 01: does apply but gives a compile warning (which is fixed by patch
> 02)
> [snip]
> patch 03: produces these compile errors.

I did a rebase and fixed this new error, as well as the warnings.

On Mon, Sep 6, 2021 at 1:40 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> + * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
>
> It seems the year (2018) should be updated to 2021.

Done.

> For RemovePeriodById(), it seems table_open() can be called after SearchSysCache1(). This way, if
HeapTupleIsValid(tup)is true, table_open() can be skipped.

This seems like it permits a race condition when two connections both
try to drop the period, right?

> For tablecmds.c, AT_PASS_ADD_PERIOD is defined as 5 with AT_PASS_ADD_CONSTR etc moved upward. Do we need to consider
compatibility?

I don't think there is a compatibility problem---can you explain?
These symbols aren't used outside tablecmds.c and the values aren't
saved anywhere AFAIK.

> There are a few TODO's such as:
> Are they going to be addressed in the next round of patches ?

These are mostly questions I'm hoping a reviewer can help me answer,
but I'll take a pass through them and see which I can remove myself.
Several are for adding support for partitioned tables, where I would
definitely appreciate help.

> There seems to be some overlap between ATExecAddPeriod() and AddRelationNewPeriod().
> Is it possible to reduce code duplication ?

I've refactored those functions to remove some duplication, but I
think I prefer the old version---let me know if you have suggestions
to avoid the duplication in a nicer way.

Oh also I realized fp_triggers.c wasn't included in the last few patch
files---I'm sorry about that!

Latest files attached. Thanks for the reviews!

Paul

Вложения

Re: SQL:2011 application time

От

Corey Huinker

Дата:

13 сентября 2021 г., 09:05:41

So I've been eagerly watching this thread and hoping to have time to devote to it. I've also been looking at the thread at https://www.postgresql.org/message-id/CALAY4q8Pp699qv-pJZc4toS-e2NzRJKrvaX-xqG1aqj2Q+Ww-w@mail.gmail.com that covers system versioning, and per our conversation far too long ago (again, my bad) it's obvious that the two efforts shouldn't do anything that would be in conflict with one another, as we eventually have to support bitemporal [1] tables: tables that have both system versioning and an application period.

Below is a list of observations and questions about this proposed patch of itself in isolation, but mostly about how it relates to the work being done for system versioning.

1. This patch creates a pg_period catalog table, whereas the system versioning relies on additions to pg_attribute to identify the start/end columns. Initially I thought this was because it was somehow possible to have multiple application periods defined on a table, but in reading [1] I see that there are some design suppositions that would make a second application period impossible[2]. I can also see where having this table would facilitate the easy creation of INFORMATION_SCHEMA.PERIODS. I was previously unaware that this info schema table was a thing, but I have found references to it, though I'm unclear as to whether it's supposed to have information about system versioned tables in it as well.

Q 1.1. Would a bitemporal table have two entries in that view?
Q 1.2. Could you see being able to implement this without pg_period, using only additions to pg_attribute (start/end for system temporal, start/end for application, plus an addition for period name)?
Q 1.3. Can you see a way to represent the system versioning in pg_period such that bitemporal tables were possible?

2. The system versioning effort has chosen 'infinity' as their end-time value, whereas you have chosen NULL as that makes sense for an unbounded range. Other databases seem to leverage '9999-12-31 23:59:59' (SQLServer, IIRC) whereas some others seem to used '2999-12-31 23:59:59' but those might have been home-rolled temporal implementations. To further add to the confusion, the syntax seems to specify the keyword of MAXVALUE, which further muddies things. The system versioning people went with 'infinity' seemingly because it prescribe and end to the world like SQLServer did, but also because it allowed for a primary key based on (id, endtime) and that's just not possible with NULL endtime values.

Q 2.1. Do you have any thoughts about how to resolve this notational logjam?

3. I noticed some inconsistency in the results from various "SELECT * FROM portion_of_test" examples. In some, the "valid_at" range is shown but not columns that make it up, and in some others, the "valid_from" and "valid_to" columns are shown, with no mention of the period. From what I've seen, the period column should be invisible unless invoked, like ctid or xmin.

4. The syntax '2018-03-04' AT TIME ZONE INTERVAL '2' HOUR TO MINUTE simply confounded me. I googled around for it, but could find no matches for postgres exception in mailing list discussions circa 2003. I tried it out myself and, lo and behold

# SELECT '2018-03-04' AT TIME ZONE INTERVAL '2' HOUR TO MINUTE;
timezone
---------------------
2018-03-04 05:02:00
(1 row)

I really didn't expect that to work, or even "work". I can see that it added 2 minutes to UTC's perspective on my local concept of midnight, but I don't understand what it's supposed to mean.

Q 4.1. What does it mean?

5. I haven't seen any actual syntax conflicts between this patch and the system versioning patch. Both teams added basically the same keywords, though I haven't dove more deeply into any bison incompatibilities. Still, it's a great start.

6. Overall, I'm really excited about what this will mean for data governance in postgres.

[1] https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf

[2] In the bitemporal table example in [1] - the application period get the defined primary key, and the system_time period would be merely unique

On Mon, Sep 13, 2021 at 12:12 AM Paul A Jungwirth <pj@illuminatedcomputing.com> wrote:

On Fri, Sep 10, 2021 at 6:50 PM Jaime Casanova
<jcasanov@systemguards.com.ec> wrote:
>
> patch 01: does apply but gives a compile warning (which is fixed by patch
> 02)
> [snip]
> patch 03: produces these compile errors.

I did a rebase and fixed this new error, as well as the warnings.

On Mon, Sep 6, 2021 at 1:40 PM Zhihong Yu <zyu@yugabyte.com> wrote:
>
> + * Portions Copyright (c) 1996-2018, PostgreSQL Global Development Group
>
> It seems the year (2018) should be updated to 2021.

Done.

> For RemovePeriodById(), it seems table_open() can be called after SearchSysCache1(). This way, if HeapTupleIsValid(tup) is true, table_open() can be skipped.

This seems like it permits a race condition when two connections both
try to drop the period, right?

> For tablecmds.c, AT_PASS_ADD_PERIOD is defined as 5 with AT_PASS_ADD_CONSTR etc moved upward. Do we need to consider compatibility ?

I don't think there is a compatibility problem---can you explain?
These symbols aren't used outside tablecmds.c and the values aren't
saved anywhere AFAIK.

> There are a few TODO's such as:
> Are they going to be addressed in the next round of patches ?

These are mostly questions I'm hoping a reviewer can help me answer,
but I'll take a pass through them and see which I can remove myself.
Several are for adding support for partitioned tables, where I would
definitely appreciate help.

> There seems to be some overlap between ATExecAddPeriod() and AddRelationNewPeriod().
> Is it possible to reduce code duplication ?

I've refactored those functions to remove some duplication, but I
think I prefer the old version---let me know if you have suggestions
to avoid the duplication in a nicer way.

Oh also I realized fp_triggers.c wasn't included in the last few patch
files---I'm sorry about that!

Latest files attached. Thanks for the reviews!

Paul

Re: SQL:2011 application time

От

Paul A Jungwirth

Дата:

14 сентября 2021 г., 06:56:47

Hi Corey,

Thanks for all the good questions!

> 1. This patch creates a pg_period catalog table, whereas the system versioning relies on additions to pg_attribute to
identifythe start/end columns. Initially I thought this was because it was somehow possible to have multiple
applicationperiods defined on a table, but in reading [1] I see that there are some design suppositions that would make
asecond application period impossible[2]. I can also see where having this table would facilitate the easy creation of
INFORMATION_SCHEMA.PERIODS.I was previously unaware that this info schema table was a thing, but I have found
referencesto it, though I'm unclear as to whether it's supposed to have information about system versioned tables in it
aswell. 

Yes, information_schema.periods is given by the standard. Having
pg_period seems like a natural place to store periods, since they are
separate entities. I think that is a better design than just storing
them as extra fields in pg_attribute. It follows normal normalization
rules.

The standard forbids multiple application-time periods per table. From
SQL:2011 in the SQL/Foundation section
(7IWD2-02-Foundation-2011-12.pdf available from
http://www.wiscorp.com/sql20nn.zip) under 11.27 <add table period
definition>:

> 5) If <table period definition> contains <application time period specification> ATPS, then:
>    b) The table descriptor of T shall not include a period descriptor other than a system-time period descriptor.

In other words you can add both a SYSTEM TIME period and one other
application-time period (whose name is your choice), but if you
already have an application-time period, you can't add another one.

I also checked other RDBMSes and none of them allow it either:

In Mariadb 10.6.4 (the latest) I get "ERROR 4154 (HY000); Cannot
specify more than one application-time period".

Oracle disallows it with a vague error:

      SQL> create table t2 (id int, valid_from date, valid_til date,
period for valid_at (valid_from, valid_til), period for valid_at2
valid_from, valid_til));
      create table t2 (id int, valid_from date, valid_til date, period
for valid_at (valid_from, valid_til), period for valid_at2
(valid_from, valid_til))

                         *
      ERROR at line 1:
      ORA-55603: invalid flashback archive or valid time period command

(Using different start/end columns for each period doesn't change the result.)

In IBM DB2 you can only have one because application-time periods must
be named "business_time" (not joking).

Mssql (2019) doesn't support application periods.

Personally I feel like it's a weird limitation and I wouldn't mind
supporting more, but my current implementation only allows for one,
and I'd have to rethink some things to do it differently.

Also: I think information_schema.periods *should* include SYSTEM_TIME
periods. The spec says (in SQL/Schemata, file
7IWD2-11-Schemata-2011-12.pdf at the link above), "The PERIODS base
table has one row for each period defined for a table. It effectively
contains a representation of the period descriptors." It doesn't say
anything about excluding system-time periods.

I checked mariadb, mssql, oracle, and db2, and I could only find this
table in db2, as syscat.periods. It includes both application-time and
system-time periods.

The spec calls for the columns table_catalog, table_schema,
table_name, period_name, start_column_name, and end_column_name. There
isn't a column to distinguish the period type, but since a period is a
system-time period iff its name is "SYSTEM_TIME", technically such a
column isn't needed.

The db2 columns are periodname, tabschema, tabname, begincolname,
endcolname, periodtype, historytabschema, and historytabname. The
periodtype column is either A or S (for application-time or
system-time).

> Q 1.1. Would a bitemporal table have two entries in that view?

Yes.

> Q 1.2. Could you see being able to implement this without pg_period, using only additions to pg_attribute (start/end
forsystem temporal, start/end for application, plus an addition for period name)? 

Not just period name, but also the range type associated with the
period (which should be determined at period creation, so that you can
pass an option to disambiguate if there are two ranges defined for the
same base type), the constraint oid (to prevent end <= start), and
some more data for inherited tables (not really used yet). It seems
ugly to hang all these extra values on a pg_attribute record.

> Q 1.3. Can you see a way to represent the system versioning in pg_period such that bitemporal tables were possible?

Yes. Even though the name "SYSTEM_TIME" is technically enough, I'd
still include a pertype column to make distinguishing system vs
application periods easier and more obvious.

> 2. The system versioning effort has chosen 'infinity' as their end-time value, whereas you have chosen NULL as that
makessense for an unbounded range. Other databases seem to leverage '9999-12-31 23:59:59' (SQLServer, IIRC) whereas
someothers seem to used '2999-12-31 23:59:59' but those might have been home-rolled temporal implementations. To
furtheradd to the confusion, the syntax seems to specify the keyword of MAXVALUE, which further muddies things. The
systemversioning people went with 'infinity' seemingly because it prescribe and end to the world like SQLServer did,
butalso because it allowed for a primary key based on (id, endtime) and that's just not possible with NULL endtime
values.

I think it's a little weird that our system-time patch mutates your
primary key. None of the other RDMBSes do that. I don't think it's
incompatible (as long as the system time patch knows how to preserve
the extra period/range data in an application-time temporal key), but
it feels messy to me.

I would prefer if system-time and application-time used the same value
to mean "unbounded". Using null means we can support any type (not
just types with +-Infinity). And it pairs nicely with range types. If
the only reason for system-time to use Infinity is the primary key, I
think it would be better not to mutate the primary key (and store the
historical records in a separate table as other RDMSes do).

Btw Oracle also uses NULL to mean "unbounded".

We presently forbid PKs from including expressions, but my patch lifts
that exception so it can index a rangetype expression built from the
period start & end columns. So even if we must include the system-time
end column in a PK, perhaps it can use a COALESCE expression to store
Infinity even while using NULL to signify "currently true" from a user
perspective.

> 3. I noticed some inconsistency in the results from various "SELECT * FROM portion_of_test" examples. In some, the
"valid_at"range is shown but not columns that make it up, and in some others, the "valid_from" and "valid_to" columns
areshown, with no mention of the period. From what I've seen, the period column should be invisible unless invoked,
likectid or xmin. 

In most cases the tests test the same functionality with both PERIODs
and rangetype columns. For FKs they test all four combinations of
PERIOD-referencing-PERIOD, PERIOD-referencing-range,
range-referencing-PERIOD, and range-referencing-range. If valid_at is
a genuine column, it is included in SELECT *, but not if it is a
PERIOD.

> 4. The syntax '2018-03-04' AT TIME ZONE INTERVAL '2'  HOUR TO MINUTE simply confounded me.

Me too! I have no idea what that is supposed to mean. But that
behavior predates my patch. I only had to deal with it because it
creates a shift-reduce conflict with `FOR PORTION OF valid_at FROM x
TO y`, where x & y are expressions. I asked about this syntax at my
PgCon 2020 talk, but I haven't ever received an answer. Perhaps
someone else knows what this kind of INTERVAL means (as a modifier of
a time value).

> 5. I haven't seen any actual syntax conflicts between this patch and the system versioning patch. Both teams added
basicallythe same keywords, though I haven't dove more deeply into any bison incompatibilities. Still, it's a great
start.

I think that's right. Early on the other patch used `FOR PERIOD SYSTEM
TIME (x, y)` instead of the standard `FOR PERIOD SYSTEM_TIME (x, y)`
but I believe that was fixed, so that the period name is an identifier
and not two keywords.

> 6. Overall, I'm really excited about what this will mean for data governance in postgres.

Me too, and thank you for the detailed review!

Yours,
Paul

Re: SQL:2011 application time

От

Corey Huinker

Дата:

19 сентября 2021 г., 03:46:16

In IBM DB2 you can only have one because application-time periods must
be named "business_time" (not joking).

I saw that as well, and it made me think that someone at IBM is a fan of Flight Of The Conchords.

Personally I feel like it's a weird limitation and I wouldn't mind
supporting more, but my current implementation only allows for one,
and I'd have to rethink some things to do it differently.

I'm satisfied that it's not something we need to do in the first MVP.

Yes. Even though the name "SYSTEM_TIME" is technically enough, I'd
still include a pertype column to make distinguishing system vs
application periods easier and more obvious.

SYSTEM_TIME seems to allow for DATE values in the start_time and end_time fields, though I cannot imagine how that would ever be practical, unless it were somehow desirable to reject subsequent updates within a 24 hour timeframe. I have seen instances where home-rolled application periods used date values, which had similar problems where certain intermediate updates would simply have to be discarded in favor of the one that was still standing at midnight.

> 2. The system versioning effort has chosen 'infinity' as their end-time value, whereas you have chosen NULL as that makes sense for an unbounded range. Other databases seem to leverage '9999-12-31 23:59:59' (SQLServer, IIRC) whereas some others seem to used '2999-12-31 23:59:59' but those might have been home-rolled temporal implementations. To further add to the confusion, the syntax seems to specify the keyword of MAXVALUE, which further muddies things. The system versioning people went with 'infinity' seemingly because it prescribe and end to the world like SQLServer did, but also because it allowed for a primary key based on (id, endtime) and that's just not possible with NULL endtime values.

I think it's a little weird that our system-time patch mutates your
primary key. None of the other RDMBSes do that. I don't think it's
incompatible (as long as the system time patch knows how to preserve
the extra period/range data in an application-time temporal key), but
it feels messy to me.

Per outline below, I'm proposing an alternate SYSTEM_TIME implementation that would leave the PK as-is.

I would prefer if system-time and application-time used the same value
to mean "unbounded". Using null means we can support any type (not
just types with +-Infinity). And it pairs nicely with range types. If
the only reason for system-time to use Infinity is the primary key, I
think it would be better not to mutate the primary key (and store the
historical records in a separate table as other RDMSes do).

The two "big wins" of infinity seemed (to me) to be:

1. the ability to add "AND end_time = 'infinity'" as a cheap way to get current rows
2. clauses like "WHERE CURRENT_DATE - 3 BETWEEN start_time AND end_time" would work. Granted, there's very specific new syntax to do that properly, but you know somebody's gonna see the columns and try to do it that way.

Btw Oracle also uses NULL to mean "unbounded".

Huh, I missed that one. That is good in that it gives some precedence to how you've approached it.

We presently forbid PKs from including expressions, but my patch lifts
that exception so it can index a rangetype expression built from the
period start & end columns. So even if we must include the system-time
end column in a PK, perhaps it can use a COALESCE expression to store
Infinity even while using NULL to signify "currently true" from a user
perspective.

Either way seems viable, but I understand why you want to leverage ranges in this way.

> 3. I noticed some inconsistency in the results from various "SELECT * FROM portion_of_test" examples. In some, the "valid_at" range is shown but not columns that make it up, and in some others, the "valid_from" and "valid_to" columns are shown, with no mention of the period. From what I've seen, the period column should be invisible unless invoked, like ctid or xmin.

In most cases the tests test the same functionality with both PERIODs
and rangetype columns. For FKs they test all four combinations of
PERIOD-referencing-PERIOD, PERIOD-referencing-range,
range-referencing-PERIOD, and range-referencing-range. If valid_at is
a genuine column, it is included in SELECT *, but not if it is a
PERIOD.

Ok, I'll have to look back over the test coverage to make sure that I understand the behavior now.

> 4. The syntax '2018-03-04' AT TIME ZONE INTERVAL '2' HOUR TO MINUTE simply confounded me.

Me too! I have no idea what that is supposed to mean. But that
behavior predates my patch. I only had to deal with it because it
creates a shift-reduce conflict with `FOR PORTION OF valid_at FROM x
TO y`, where x & y are expressions. I asked about this syntax at my
PgCon 2020 talk, but I haven't ever received an answer. Perhaps
someone else knows what this kind of INTERVAL means (as a modifier of
a time value).

I think I'll open this as a separate thread, because it would simplify matters if we can reject this nonsense syntax.

This was the alternative method of system versioning I proposed recently in the system versioning thread

1. The regular table remains unchanged, but a pg_class attribute named "relissystemversioned" would be set to true

2. I'm unsure if the standard allows dropping a column from a table while it is system versioned, and the purpose behind system versioning makes me believe the answer is a strong "no" and requiring DROP COLUMN to fail on relissystemversioned = 't' seems pretty straightforward.
3. The history table would be given a default name of $FOO_history (space permitting), but could be overridden with the history_table option.
4. The history table would have relkind = 'h'
5. The history table will only have rows that are not current, so it is created empty.
6. As such, the table is effectively append-only, in a way that vacuum can actually leverage, and likewise the fill factor of such a table should never be less than 100.
7. The history table could only be updated only via system defined triggers (insert,update,delete, alter to add columns), or row migration similar to that found in partitioning. It seems like this would work as the two tables working as partitions of the same table, but presently we can't have multi-parent partitions.
8. The history table would be indexed the same as the base table, except that all unique indexes would be made non-unique, and an index of pk + start_time + end_time would be added
9. The primary key of the base table would remain the existing pk vals, and would basically function normally, with triggers to carry forth changes to the history table. The net effect of this is that the end_time value of all rows in the main table would always be the chosen "current" value (infinity, null, 9999-12-31, etc) and as such might not actually _need_ to be stored.
10. Queries that omit the FOR SYSTEM_TIME clause, as well as ones that use FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP, would simply use the base table directly with no quals to add.
11. Queries that use FOR SYSTEM_TIME and not FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP, then the query would do a union of the base table and the history table with quals applied to both.
12. It's a fair question whether the history table would be something that could be queried directly. I'm inclined to say no, because that allows for things like SELECT FOR UPDATE, which of course we'd have to reject.
13. If a history table is directly referenceable, then SELECT permission can be granted or revoked as normal, but all insert/update/delete/truncate options would raise an error.
14. DROP SYSTEM VERSIONING from a table would be quite straightforward - the history table would be dropped along with the triggers that reference it, setting relissystemversioned = 'f' on the base table.

The benefits to your effort here would be:

1. No change to the primary key except for the ones dictated by application period

2. The INFORMATION_SCHEMA view need merely take into account The pg_class.relkind = 'h' entries

3. system versioning is no longer mutating (trigger on X updates X), which eliminates the possibility that application period triggers get into a loop

4. DROP SYSTEM VERSIONING would be entirely transparent to application versioning.

Thoughts?

Re: SQL:2011 application time

От

Paul A Jungwirth

Дата:

01 октября 2021 г., 23:47:57

On Sat, Sep 18, 2021 at 5:46 PM Corey Huinker <corey.huinker@gmail.com> wrote:

SYSTEM_TIME seems to allow for DATE values in the start_time and end_time fields,
though I cannot imagine how that would ever be practical, unless it were somehow
desirable to reject subsequent updates within a 24 hour timeframe.

I agree that for SYSTEM_TIME it doesn't make much sense to use anything but the smallest time granularity.

The two "big wins" of infinity seemed (to me) to be:

1. the ability to add "AND end_time = 'infinity'" as a cheap way to get current rows
2. clauses like "WHERE CURRENT_DATE - 3 BETWEEN start_time AND end_time" would work.

Yes. OTOH there is equivalent syntax for ranges, e.g. `valid_at @> now()`. But if you had a real PERIOD then that wouldn't be available, since you can't use a PERIOD as an expression. Personally I think that's a shame, and I wonder if PERIODs should be another kind of expression (much like a column value) that evaluates to an equivalent range. Then you'd get all kinds of operators & functions that work with them, you could `SELECT` them, `GROUP BY` them, pass them to functions, etc.

The spec doesn't say anything about using PERIODs in those places, but it *does* have a section on period *predicates*, which seem to be allowed anywhere you can put an expression. The spec's discussion of this is in 4.14.2 ("Operations involving periods") and 8.20 ("<period predicate>"), and says there should be predicates for overlap, equals, contains, precedes, succeeds, immediately precedes, and immediately succeeds. So in the spec, the smallest possible "element" is not a bare PERIOD, but rather these predicates. My patch doesn't include these (it's a lot of new syntax), and no other RDBMS seems to have implemented them. I'm inclined to just treat PERIODs like ranges, or at least maybe let you cast from one to another. (Casting is weird though since if a bare PERIOD isn't a valid expression, what are you casting from/to?)

I should add that using +-Infinity for application-time bounds is completely acceptable under my patch; you just have the option to use NULL instead. So your examples of filtering above are fine. There aren't any operations where we have to set a bounded rangepart to unbounded, so we never pass a NULL; only the user would do that. We do bless NULLs by translating MINVALUE/MAXVALUE to NULL, but that is necessary to support arbitrary types. Even that could be refined so that we use +-Infinity when available but NULL elsewhere. Or we could just drop MINVALUE/MAXVALUE entirely. It's my own addition to make sentinels less arbitrary; it's not in the standard.

One of my design goals was to let people favor ranges over PERIODs if they like. Forcing people to use +-Infinity doesn't completely eliminate that goal, but it does mean your ranges are different than you're used to seeing (`[2020-01-01, Infinity)' vs [2020-01-01,)`. More importantly you can only use {date,ts,tstz}range for application-time periods, not other rangetypes. So I'd prefer to keep NULL bounds *possible*, even if MINVALUE/MAXVALUE aren't giving it a sanction.

This was the alternative method of system versioning I proposed recently in the system versioning thread
1. The regular table remains unchanged, but a pg_class attribute named "relissystemversioned" would be set to true
2. I'm unsure if the standard allows dropping a column from a table while it is system versioned, and the purpose behind system versioning makes me believe the answer is a strong "no" and requiring DROP COLUMN to fail on relissystemversioned = 't' seems pretty straightforward.
3. The history table would be given a default name of $FOO_history (space permitting), but could be overridden with the history_table option.
4. The history table would have relkind = 'h'

+1 so far. Behavior of DDL in temporal tables is almost untouched even in the academic literature I've read. (My bibliography mentions a few places that at least mention that it's a hard problem.) Forbidding to drop a column seems pretty harsh---but on the other hand that's just the tip of the iceberg, so failing is probably the practical choice. For example what happens to old rows if you add a NOT NULL constraint? For application-time we can make the user responsible for figuring out the most sensible thing, but for SYSTEM_TIME we have to figure that out ourselves. But what about column type changes, or domains? What about removing an enum option? Or adding a CHECK constraint? With SYSTEM_TIME the user is supposed to be unable to change the history data, so they can't accommodate it to future requirements.

5. The history table will only have rows that are not current, so it is created empty.
6. As such, the table is effectively append-only, in a way that vacuum can actually leverage, and likewise the fill factor of such a table should never be less than 100.
7. The history table could only be updated only via system defined triggers (insert,update,delete, alter to add columns), or row migration similar to that found in partitioning. It seems like this would work as the two tables working as partitions of the same table, but presently we can't have multi-parent partitions.

I don't think they should be sibling partitions, but I do think it would be cool if you could ask for the history table to be partitioned. Mariadb offers a way to do this (see my blog post comparing SQL:2011 implementations). It doesn't have to be in the first patch though, and it's not part of the standard.

8. The history table would be indexed the same as the base table, except that all unique indexes would be made non-unique, and an index of pk + start_time + end_time would be added

Is there any value to indexing both start_time and end_time? Just one already takes you to a single row.

The system-time code would need to know how to handle application-time PKs since they are a little different, but that's not hard. And it still is just adding a column (or two if you think they should both be there).

The history table also should not have any FKs, and no FKs should reference it.

9. The primary key of the base table would remain the existing pk vals, and would basically function normally, with triggers to carry forth changes to the history table. The net effect of this is that the end_time value of all rows in the main table would always be the chosen "current" value (infinity, null, 9999-12-31, etc) and as such might not actually _need_ to be stored.

Interesting thought that we wouldn't really even need to store the end time. I don't have an opinion about whether the optimization is worth the complexity, but yeah it seems possible.

10. Queries that omit the FOR SYSTEM_TIME clause, as well as ones that use FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP, would simply use the base table directly with no quals to add.
11. Queries that use FOR SYSTEM_TIME and not FOR SYSTEM_TIME AS OF CURRENT_TIMESTAMP, then the query would do a union of the base table and the history table with quals applied to both.

I like this, but it means people can't filter directly on the columns themselves as you suggest above. Can we detect when they're doing that? Keep in mind it might be happening inside a user-defined function, etc. So perhaps it is safer to always use the UNION.

12. It's a fair question whether the history table would be something that could be queried directly. I'm inclined to say no, because that allows for things like SELECT FOR UPDATE, which of course we'd have to reject.
13. If a history table is directly referenceable, then SELECT permission can be granted or revoked as normal, but all insert/update/delete/truncate options would raise an error.

It seems to break the abstraction to let people query the history table directly. OTOH sometimes it's helpful to see behind the curtain. I could go either way here, but I slightly favor letting people do it.

14. DROP SYSTEM VERSIONING from a table would be quite straightforward - the history table would be dropped along with the triggers that reference it, setting relissystemversioned = 'f' on the base table.

I like this approach a lot, and I think it's a better design than carrying all the history inside the main table. I also like how bitemporal will Just Work^TM. One is in user-space and the other is controlled by Postgres---which fits the intention.

Yours,

Paul

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

17 ноября 2021 г., 02:55:26

Here are some new patches rebased on the latest master.

I haven't made any substantive changes, but I should have time soon to 
take a stab at supporting partitioned tables and removing some of my own 
TODOs (things like making sure I'm locking things correctly). I don't 
think there is any outstanding feedback other than that.

But in the meantime here are some up-to-date patches.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Вложения

Re: SQL:2011 application time

От

Paul A Jungwirth

Дата:

21 ноября 2021 г., 04:51:16

On Tue, Nov 16, 2021 at 3:55 PM Paul Jungwirth <pj@illuminatedcomputing.com> wrote:

I haven't made any substantive changes, but I should have time soon to
take a stab at supporting partitioned tables and removing some of my own
TODOs (things like making sure I'm locking things correctly).

Hello,

Here are updated patches. They are rebased and clean up some of my TODOs. Here is what remains:

- Various TODOs asking for advice about concurrency things: where to lock, when to copy structs, etc. I'd appreciate some review on these from someone more experienced than me.

- Supporting FOR PORTION OF against updateable views. I'll keep working on this, but I thought there was enough progress to pass along new patches in the meantime.

- Support partitioned tables. I think this is a medium-size effort, and I'm not sure whether it's really needed for pg 15 or something we can add later. I'm going to do my best to get it done though. (I should have more time for this project now: having a sixth baby recently made side projects challenging for a while, but lately things have been getting easier.) Partitioning could use some design discussion though, both for application time alone and for bitemporal tables (so overlapping with the system time work). Here are some thoughts so far:

- Creating a PERIOD on a partitioned table should automatically create the PERIOD (and associated constraints) on the child tables. This one seems easy and I'll try to get it done soon.

- Sort of related, but not strictly partitioning: CREATE TABLE LIKE should have a new INCLUDING PERIODS option. (I'm tempted to include this under INCLUDING CONSTRAINTS, but I think a separate option is nicer since it gives more control.)

- If you partition by something in the scalar part of the temporal PK, that's easy. I don't think we have to do anything special there. I'd like to add some tests about it though.

- We should allow temporal primary keys on the top-level partitioned table, even though they are essentially exclusion constraints. Whereas in the general case an exclusion constraint cannot prove its validity across all the tables, a temporal PK *can* prove its validity so long the partition key includes at least one scalar part of the temporal PK (so that all records for one "entity" get routed to the same table).

- If you partition by the temporal part of the temporal PK, things are harder. I'm inclined to forbid this, at least for v15. Suppose you partition by the start time. Then you wind up with the same entity spread across several tables, so you can't validate the overall exclusion constraint anymore.

- OTOH you *could* partition by application-time itself (not start time alone nor end time alone) where each partition has application-time ranges/periods that are trimmed to fit within that partition's limits. Then since each partition is responsible for a non-overlapping time period, you could validate the overall exclusion constraint. You'd just have to add some logic to tuple re-routing that could transform single records into multiple records. For example if each partition holds a different year and you INSERT a record that is valid for a decade, you'd have to insert one row into ten partitions, and change the application-time range/period of each row appropriately. This is a special kind of range partitioning. I don't have any ideas how to make hash or list partitioning work on the temporal part of the PK. I don't think we should allow it.

- Partitioning by application time requires no special syntax. Partitioning by system time (if that's desired) would probably require extra (non-standard) syntax. Mariadb has this: https://mariadb.com/kb/en/system-versioned-tables/#storing-the-history-separately Perhaps that is orthogonal to application-time partitioning though. It sounds like people think we should store non-current system time in a separate table (I agree), and in that case I think a bitemporal table that is partitioned by scalar keys or application-time would just have a separate system-time history table for each partition, and that would Just Work. And if we *do* want to partition by system time too, then it would be transparent to the application-time logic.

- Since system time doesn't add anything to your PK (or at least it shouldn't), there is no extra complexity around dealing with exclusion constraints. We should just guarantee that all *current* rows land in the same partition, because for a bitemporal table that's the only one that needs a temporal PK. I guess that means you could partition by end system-time but not start system-time. This would be an exception to the rule that a PK must include the partition keys. Instead we'd say that all current (i.e. non-historical) records stay together (at the system-time level of partitioning).

- I don't think system-time partitioning needs to be in v15. It seems more complicated than ordinary partitioning.

Yours,

Paul

Вложения

Re: SQL:2011 application time

От

Peter Eisentraut

Дата:

05 января 2022 г., 19:07:53

On 21.11.21 02:51, Paul A Jungwirth wrote:
> Here are updated patches. They are rebased and clean up some of my 
> TODOs.

This patch set looks very interesting.  It's also very big, so it's
difficult to see how to get a handle on it.  I did a pass through it
to see if there were any obvious architectural or coding style
problems.  I also looked at some of your TODO comments to see if I had
something to contribute there.

I'm confused about how to query tables based on application time
periods.  Online, I see examples using AS OF, but in the SQL standard
I only see this used for system time, which we are not doing here.
What is your understanding of that?

v10-0001-Add-PERIODs.patch

src/backend/commands/tablecmds.c

Might be worth explaining somewhere why AT_PASS_ADD_PERIOD needs to be
its own pass. -- Ah, this is explained in ATPrepCmd().  Maybe that is
okay, but I would tend to prefer a comprehensive explanation here
rather than sprinkled around.

make_period_not_backward(): Hardcoding the name of the operator as "<"
is not good.  You should perhaps lookup the less-than operator in the
type cache.  Look around for TYPECACHE_LT_OPR for how this is usually done.

validate_period(): Could use an explanatory comment.  There are a
bunch of output arguments, and it's not clear what all of this is
supposed to do, and what "validating" is in this context.

MergeAttributes(): I would perhaps initially just prohibit inheritance
situations that involve periods on either side.  (It should work for
partitioning, IMO, but that should be easier to arrange.)

AlterTableGetLockLevel(): The choice of AccessExclusiveLock looks
correct.  I think the whole thing can also be grouped with some of the
other "affects concurrent SELECTs" cases?

Maybe the node type Period could have a slightly more specific name,
perhaps PeriodDef, analogous to ColumnDef?

I didn't follow why indexes would have periods, for example, the new
period field in IndexStmt.  Is that explained anywhere?

While reading this patch I kept wondering whether it would be possible
to fold periods into pg_attribute, perhaps with negative attribute
numbers.  Have you looked into something like that?  No doubt it's
also complicated, but it might simplify some things, like the name
conflict checking.

v10-0002-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch

src/backend/catalog/Catalog.pm: I see you use this change in the
subsequent patches, but I would recommend skipping all this.  The
comments added are kind of redundant with the descr fields anyway.

transformIndexConstraint(): As above, we can't look up the && operator
by name.  In this case, I suppose we should look it up through the
index AM support operators.

Further, the additions to this function are very complicated and not
fully explained.  I'm suspicious about things like
findNewOrOldColumn() -- generally we should look up columns by number
not name.  Perhaps you can add a header comment or split out the code
further into smaller functions.

pg_dump.c getIndexes() has been refactored since to make
version-specific additions easier.  But your patch is now failing to
apply because of this.

Of course, the main problem in this patch is that for most uses it
requires btree_gist.  I think we should consider moving that into
core, or at least the support for types that are most relevant to this
functionality, specifically the date/time types.  Aside from user
convenience, this would also allow writing more realistic test cases.

v10-0003-Add-UPDATE-DELETE-FOR-PORTION-OF.patch

Use of MINVALUE and MAXVALUE for unbounded seems problematic to me.
(If it is some value, it is not really larger than any value.)  We
have the keyword UNBOUNDED, which seems better suited.

src/backend/access/brin/brin_minmax_multi.c

These renaming changes seem unrelated (but still seem like a good
idea).  Should they be progressed separately?

Again, some hardcoded operator name lookup in this patch.

I don't understand why a temporal primary key is required for doing
UPDATE FOR PORTION OF.  I don't see this in the standard.

v10-0004-Add-temporal-FOREIGN-KEYs.patch

Do we really need different trigger names depending on whether the
foreign key is temporal?

range_as_string() doesn't appear to be used anywhere.

I ran out of steam on this patch, it's very big.  But it seems sound
in general.

How to proceed.  I suppose we could focus on committing 0001 and 0002
first.  That would be a sensible feature set even if the remaining
patches did not make a release.  I do feel we need to get btree_gist
into core.  That might be a big job by itself.  I'm also bemused why
btree_gist is so bloated compared to btree_gin.  btree_gin uses macros
to eliminate duplicate code where btree_gist is full of
copy-and-paste.  So there are some opportunities there to make things
more compact.  Is there anything else you think we can do as
preparatory work to make the main patches more manageable?

Re: SQL:2011 application time

От

Corey Huinker

Дата:

06 января 2022 г., 01:03:23

On Wed, Jan 5, 2022 at 11:07 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:

On 21.11.21 02:51, Paul A Jungwirth wrote:
> Here are updated patches. They are rebased and clean up some of my
> TODOs.

This patch set looks very interesting. It's also very big, so it's
difficult to see how to get a handle on it. I did a pass through it
to see if there were any obvious architectural or coding style
problems. I also looked at some of your TODO comments to see if I had
something to contribute there.

I'm confused about how to query tables based on application time
periods. Online, I see examples using AS OF, but in the SQL standard
I only see this used for system time, which we are not doing here.
What is your understanding of that?

Paul has previously supplied me with this document https://cs.ulb.ac.be/public/_media/teaching/infoh415/tempfeaturessql2011.pdf and that formed the basis of a lot of my questions a few months earlier.

There was similar work being done for system periods, which are a bit simpler but require a side (history) table to be created. I was picking people's brains about some aspects of system versioning to see if I could help bringing that into this already very large patchset, but haven't yet felt like I had done enough research to post it.

It is my hope that we can at least get the syntax for both application and system versioning committed, even if it's just stubbed in with not-yet-supported errors.

Re: SQL:2011 application time

От

Paul A Jungwirth

Дата:

06 января 2022 г., 08:44:54

On Wed, Jan 5, 2022 at 8:07 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> This patch set looks very interesting.

Thank you for the review!

I'll work on your feedback but in the meantime here are replies to
your questions:

> I'm confused about how to query tables based on application time
> periods.  Online, I see examples using AS OF, but in the SQL standard
> I only see this used for system time, which we are not doing here.

Correct, the standard only gives it for system time. I think
application time is intended to be more "in user space" so it's fine
to use regular operators in your WHERE condition against the time
columns, whereas system time is more of a managed thing---automatic,
read-only, possibly stored in a separate table. Having a special
syntax cue lets the RDBMS know it needs to involve the historical
records.

> validate_period(): Could use an explanatory comment.  There are a
> bunch of output arguments, and it's not clear what all of this is
> supposed to do, and what "validating" is in this context.

I'm not too happy with that function, but a previous reviewer asked me
to factor out what was shared between the CREATE TABLE and ALTER TABLE
cases. It does some sanity checks on the columns you've chosen, and
along the way it collects info about those columns that we'll need
later. But yeah all those out parameters are pretty ugly. I'll see if
I can come up with a stronger abstraction for it, and at the very
least I'll add some comments.

> MergeAttributes(): I would perhaps initially just prohibit inheritance
> situations that involve periods on either side.  (It should work for
> partitioning, IMO, but that should be easier to arrange.)

Okay. I'm glad to hear you think partitioning won't be too hard. It is
one of the last things, but to me it's a bit intimidating.

> I didn't follow why indexes would have periods, for example, the new
> period field in IndexStmt.  Is that explained anywhere?

When you create a primary key or a unique constraint (which are backed
by a unique index), you can give a period name to make it a temporal
constraint. We create the index first and then create the constraint
as a side-effect of that (e.g. index_create calls
index_constraint_create). The analysis phase generates an IndexStmt.
So I think this was mostly a way to pass the period info down to the
constraint. It probably doesn't actually need to be stored on pg_index
though. Maybe it does for index_concurrently_create_copy. I'll add
some comments, but if you think it's the wrong approach let me know.

> While reading this patch I kept wondering whether it would be possible
> to fold periods into pg_attribute, perhaps with negative attribute
> numbers.  Have you looked into something like that?  No doubt it's
> also complicated, but it might simplify some things, like the name
> conflict checking.

Hmm, I thought that sort of thing would be frowned upon. :-) But also
it seems like periods really do have a bunch of details they need
beyond what other attributes have (e.g. the two source attributes, the
matching range type, the period type (application-vs-system), maybe
some extra things for table inheritance.

Also are you sure we aren't already using negative attnums somewhere
already? I thought I saw something like that.

> Of course, the main problem in this patch is that for most uses it
> requires btree_gist.  I think we should consider moving that into
> core, or at least the support for types that are most relevant to this
> functionality, specifically the date/time types.  Aside from user
> convenience, this would also allow writing more realistic test cases.

I think this would be great too. How realistic do you think it is? I
figured since exclusion constraints are also pretty useless without
btree_gist, it wasn't asking too much to have people install the
extension, but still it'd be better if it were all built in.

> src/backend/access/brin/brin_minmax_multi.c
>
> These renaming changes seem unrelated (but still seem like a good
> idea).  Should they be progressed separately?

I can pull this out into a separate patch. I needed to do it because
when I added an `#include <rangetypes.h>` somewhere, these conflicted
with the range_{de,}serialize functions declared there.

> I don't understand why a temporal primary key is required for doing
> UPDATE FOR PORTION OF.  I don't see this in the standard.

You're right, it's not in the standard. I'm doing that because
creating the PK is when we add the triggers to implement UPDATE FOR
PORTION OF. I thought it was acceptable since we also require a
PK/unique constraint as the referent of a foreign key. But we could
avoid it if I went back to the executor-based FOR PORTION OF
implementation, since that doesn't depend on triggers. What do you
think?

Also: I noticed recently that you can't use FOR PORTION OF against an
updatable view. I'm working on a new patch set to fix that. But the
main reason is this PK check. So that's maybe another reason to go
back to the executor implementation.

> How to proceed.  I suppose we could focus on committing 0001 and 0002
> first.

That would be great! I don't think either is likely to conflict with
future system-time work.

> Is there anything else you think we can do as
> preparatory work to make the main patches more manageable?

I think it would be smart to have a rough plan for how this work will
be compatible with system-time support. Corey & I have talked about
that a lot, and In general they are orthogonal, but it would be nice
to have details written down somewhere.

Yours,
Paul

Re: SQL:2011 application time

От

Vik Fearing

Дата:

06 января 2022 г., 17:44:58

On 1/5/22 11:03 PM, Corey Huinker wrote:
> 
> There was similar work being done for system periods, which are a bit
> simpler but require a side (history) table to be created.

This is false.  SYSTEM_TIME periods do not need any kind of history.
This was one of the problems I had with Surafel's attempt because it was
confusing the period with SYSTEM VERSIONING.  Versioning needs the
period but the inverse is not true.
-- 
Vik Fearing

Re: SQL:2011 application time

От

Paul A Jungwirth

Дата:

06 января 2022 г., 21:08:54

On Thu, Jan 6, 2022 at 6:45 AM Vik Fearing <vik@postgresfriends.org> wrote:
>
> On 1/5/22 11:03 PM, Corey Huinker wrote:
> >
> > There was similar work being done for system periods, which are a bit
> > simpler but require a side (history) table to be created.
>
> This is false.  SYSTEM_TIME periods do not need any kind of history.
> This was one of the problems I had with Surafel's attempt because it was
> confusing the period with SYSTEM VERSIONING.  Versioning needs the
> period but the inverse is not true.

This is an interesting point. Syntactically, there are three different
things: the generated started/end columns, the period declaration, and
the WITH SYSTEM VERSIONING modifier to the table. You could declare a
system period without making the table versioned. Practically speaking
I don't know why you'd ever create a system period without a versioned
table (do you know of any uses Vik?), but perhaps we can exploit the
separation to add system periods in the same patch that adds
application periods.

The first two bits of syntax *are* tied together: you need columns
with GENERATED ALWAYS AS ROW START/END to declare the system period,
and less intuitively the standard says you can't use AS ROW START/END
unless those columns appear in a system period (2.e.v.2 under Part 2:
Foundation, 11.3 <table definition>). Personally I'd be willing to
ignore that latter requirement. For one thing, what does Postgres do
with the columns if you drop the period? Dropping the columns
altogether seems very harsh, so I guess you'd just remove the
GENERATED clause.

Another weird thing is that you don't (can't) say STORED for those
columns. But they are certainly stored somewhere. I would store the
values just like any other column (even if non-current rows get moved
to a separate table). Also then you don't have to do anything extra
when the GENERATED clause is dropped.

If we wanted to support system-time periods without building all of
system versioning, what would that look like? At first I thought it
would be a trivial addition to part-1 of the patch here, but the more
I think about it the more it seems to deserve its own patch.

One rule I think we should follow is that using a non-system-versioned
table (with a system period) should get you to the same place as using
a system-versioned table and then removing the system versioning. But
the standard says that dropping system versioning should automatically
drop all historical records (2 under Part 2: Foundation, 11.30 <drop
system versioning clause>). That actually makes sense though: when you
do DML we automatically update the start/end columns, but we don't
save copies of the previous data (and incidentally the end column will
always be the max value.) So there is a use case, albeit a thin one:
you get a Rails-like updated_at column that is maintained
automatically by your RDBMS. That is pretty easy, but I think I'd
still break it out into a separate patch. I'm happy to work on that as
something that builds on top of my part-1 patch here.

Yours,
Paul

Re: SQL:2011 application time

От

Corey Huinker

Дата:

06 января 2022 г., 21:52:30

But
the standard says that dropping system versioning should automatically
drop all historical records (2 under Part 2: Foundation, 11.30 <drop
system versioning clause>). That actually makes sense though: when you
do DML we automatically update the start/end columns, but we don't
save copies of the previous data (and incidentally the end column will
always be the max value.)

This is what I was referring to when I mentioned a side-table. deleting history would be an O(1) operation. Any other misunderstandings are all mine.

Re: SQL:2011 application time

От

Peter Eisentraut

Дата:

10 января 2022 г., 11:53:48

On 06.01.22 06:44, Paul A Jungwirth wrote:
>> I didn't follow why indexes would have periods, for example, the new
>> period field in IndexStmt.  Is that explained anywhere?
> 
> When you create a primary key or a unique constraint (which are backed
> by a unique index), you can give a period name to make it a temporal
> constraint. We create the index first and then create the constraint
> as a side-effect of that (e.g. index_create calls
> index_constraint_create). The analysis phase generates an IndexStmt.
> So I think this was mostly a way to pass the period info down to the
> constraint. It probably doesn't actually need to be stored on pg_index
> though. Maybe it does for index_concurrently_create_copy. I'll add
> some comments, but if you think it's the wrong approach let me know.

This seems backwards.  Currently, when you create a constraint, the 
index is created as a side effect and is owned, so to speak, by the 
constraint.  What you are describing here sounds like the index owns the 
constraint.  This needs to be reconsidered, I think.

>> Of course, the main problem in this patch is that for most uses it
>> requires btree_gist.  I think we should consider moving that into
>> core, or at least the support for types that are most relevant to this
>> functionality, specifically the date/time types.  Aside from user
>> convenience, this would also allow writing more realistic test cases.
> 
> I think this would be great too. How realistic do you think it is? I
> figured since exclusion constraints are also pretty useless without
> btree_gist, it wasn't asking too much to have people install the
> extension, but still it'd be better if it were all built in.

IMO, if this temporal feature is to happen, btree_gist needs to be moved 
into core first.  Having to install an extension in order to use an 
in-core feature like this isn't going to be an acceptable experience.

>> src/backend/access/brin/brin_minmax_multi.c
>>
>> These renaming changes seem unrelated (but still seem like a good
>> idea).  Should they be progressed separately?
> 
> I can pull this out into a separate patch. I needed to do it because
> when I added an `#include <rangetypes.h>` somewhere, these conflicted
> with the range_{de,}serialize functions declared there.

OK, I have committed this separately.

>> I don't understand why a temporal primary key is required for doing
>> UPDATE FOR PORTION OF.  I don't see this in the standard.
> 
> You're right, it's not in the standard. I'm doing that because
> creating the PK is when we add the triggers to implement UPDATE FOR
> PORTION OF. I thought it was acceptable since we also require a
> PK/unique constraint as the referent of a foreign key.

That part *is* in the standard.

> But we could
> avoid it if I went back to the executor-based FOR PORTION OF
> implementation, since that doesn't depend on triggers. What do you
> think?

I think it's worth trying to do this without triggers.

But if you are just looking for a way to create the triggers, why are 
they not just created when the table is created?

> I think it would be smart to have a rough plan for how this work will
> be compatible with system-time support. Corey & I have talked about
> that a lot, and In general they are orthogonal, but it would be nice
> to have details written down somewhere.

I personally don't see why we need to worry about system time now. 
System time seems quite a complicated feature, since you have to figure 
out a system to store and clean the old data, whereas this application 
time feature is ultimately mostly syntax sugar around ranges and 
exclusion constraints.  As long as we keep the standard syntax for 
system time available for future use (which is what your patch does), I 
don't see a need to go deeper right now.

Re: SQL:2011 application time

От

Julien Rouhaud

Дата:

15 января 2022 г., 08:58:49

Hi,

On Sat, Nov 20, 2021 at 05:51:16PM -0800, Paul A Jungwirth wrote:
> 
> Here are updated patches. They are rebased and clean up some of my TODOs.

The cfbot reports that the patchset doesn't apply anymore:
http://cfbot.cputube.org/patch_36_2048.log
=== Applying patches on top of PostgreSQL commit ID 5513dc6a304d8bda114004a3b906cc6fde5d6274 ===
=== applying patch ./v10-0001-Add-PERIODs.patch
patching file src/backend/commands/tablecmds.c
Hunk #1 FAILED at 40.
[...]
1 out of 21 hunks FAILED -- saving rejects to file src/backend/commands/tablecmds.c.rej
patching file src/bin/pg_dump/pg_dump.c
Hunk #1 succeeded at 5906 with fuzz 2 (offset -454 lines).
Hunk #2 FAILED at 6425.
Hunk #3 succeeded at 6121 with fuzz 2 (offset -566 lines).
Hunk #4 succeeded at 6203 (offset -561 lines).
Hunk #5 succeeded at 8015 with fuzz 2 (offset -539 lines).
Hunk #6 FAILED at 8862.
Hunk #7 FAILED at 8875.
Hunk #8 FAILED at 8917.
[...]
4 out of 15 hunks FAILED -- saving rejects to file src/bin/pg_dump/pg_dump.c.rej
patching file src/bin/pg_dump/pg_dump.h
Hunk #2 FAILED at 284.
Hunk #3 FAILED at 329.
Hunk #4 succeeded at 484 (offset 15 lines).
2 out of 4 hunks FAILED -- saving rejects to file src/bin/pg_dump/pg_dump.h.rej

I also see that there were multiple reviews with unanswered comments, so I will
switch the cf entry to Waiting on Author.

Re: SQL:2011 application time

От

Peter Eisentraut

Дата:

19 января 2022 г., 11:32:36

On 10.01.22 09:53, Peter Eisentraut wrote:
>>> Of course, the main problem in this patch is that for most uses it
>>> requires btree_gist.  I think we should consider moving that into
>>> core, or at least the support for types that are most relevant to this
>>> functionality, specifically the date/time types.  Aside from user
>>> convenience, this would also allow writing more realistic test cases.
>>
>> I think this would be great too. How realistic do you think it is? I
>> figured since exclusion constraints are also pretty useless without
>> btree_gist, it wasn't asking too much to have people install the
>> extension, but still it'd be better if it were all built in.
> 
> IMO, if this temporal feature is to happen, btree_gist needs to be moved 
> into core first.  Having to install an extension in order to use an 
> in-core feature like this isn't going to be an acceptable experience.

I have started a separate thread about this question.

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

04 мая 2023 г., 00:02:47

Hello,

Thank you again for the review. Here is a patch with most of your 
feedback addressed. Sorry it has taken so long! These patches are 
rebased up to 1ab763fc22adc88e5d779817e7b42b25a9dd7c9e
(May 3).

The big change is switching from implementing FOR PORTION OF with 
triggers back to an executor node implementation. I think this is a lot 
simpler and means we don't have to be so "premeditated" (for example you 
just need a PERIOD/range, not a temporal PK).

I've also made some progress on partitioning temporal tables. It still 
needs some work though, and also it depends on my separate commitfest 
entry (https://commitfest.postgresql.org/43/4065/). So I've left it out 
of the patches attached here.

A few more details below:

Back in January 2022, Peter Eisentraut wrote:
> make_period_not_backward(): Hardcoding the name of the operator as "<"
> is not good.  You should perhaps lookup the less-than operator in the
> type cache.  Look around for TYPECACHE_LT_OPR for how this is usually done. 
> ...
> transformIndexConstraint(): As above, we can't look up the && operator
> by name.  In this case, I suppose we should look it up through the
> index AM support operators.

I've changed most locations to look up the operators we need using 
strategy number. But in some places I need the range intersects operator 
(`*`) and we don't have a strategy number for that. I don't really 
understand the purpose of not hardcoding operator names here. Can you 
give me the reasons for that? Do you have any suggestions what I can do 
to use `*`? Also, when I'm doing these operator lookups, do I need 
permission checks similar to what I see in ComputeIndexAttrs?

> Further, the additions to this function are very complicated and not
> fully explained.  I'm suspicious about things like
> findNewOrOldColumn() -- generally we should look up columns by number
> not name.  Perhaps you can add a header comment or split out the code
> further into smaller functions.

I still have some work to do on this. I agree it's very complicated, so 
I'm going to see what kind of refactoring I can do.

>>> I didn't follow why indexes would have periods, for example, the new
>>> period field in IndexStmt.  Is that explained anywhere?
>>
>> When you create a primary key or a unique constraint (which are backed
>> by a unique index), you can give a period name to make it a temporal
>> constraint. We create the index first and then create the constraint
>> as a side-effect of that (e.g. index_create calls
>> index_constraint_create). The analysis phase generates an IndexStmt.
>> So I think this was mostly a way to pass the period info down to the
>> constraint. It probably doesn't actually need to be stored on pg_index
>> though. Maybe it does for index_concurrently_create_copy. I'll add
>> some comments, but if you think it's the wrong approach let me know.
> 
> This seems backwards.  Currently, when you create a constraint, the index is created as a side effect and is owned,
soto speak, by the constraint.  What you are describing here sounds like the index owns the constraint.  This needs to
bereconsidered, I think.

After looking at this again I do think to reference the period from the 
index, not vice versa. The period is basically one of the index elements 
(e.g. `PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)`). You can define a 
`PERIOD` without an index, but you can't define a WITHOUT OVERLAPS index 
without a period. In addition you could have multiple indexes using the 
same period (though this is probably unusual and technically disallowed 
by the standard, although in principal you could do it), but not 
multiple periods within the same index. I understand what you're saying 
about how constraints cause indexes as a by-product, but here the 
constraint isn't the PERIOD; it's the PRIMARY KEY or UNIQUE constraint. 
The PERIOD is just something the constraint & index refer to (like an 
expression indexElem). The dependency direction also suggests the period 
should be referenced by the index: you can drop the index without 
dropping the period, but dropping the period would cascade to dropping 
the index (or fail). I hope that makes sense. But let me know if you 
still disagree.

> Do we really need different trigger names depending on whether the
> foreign key is temporal? 

They don't have to be different. I used separate C functions because I 
didn't want standard FKs to be slowed/complicated by the temporal ones, 
and also I wanted to avoid merge conflicts with the work on avoiding SPI 
in RI checks. But you're just asking about the trigger names, right? I 
haven't changed those yet but it shouldn't take long.

> IMO, if this temporal feature is to happen, btree_gist needs to be moved 
> into core first.  Having to install an extension in order to use an 
> in-core feature like this isn't going to be an acceptable experience.

As far as I can tell the conversation about moving this into core hasn't 
gone anywhere. Do you still think this is a prerequisite to this patch? 
Is there anything I can do to help move `btree_gist` forward? It seems 
like a large backwards compatibility challenge. I imagine that getting 
agreement on how to approach it is actually more work than doing the 
development. I'd be very happy for any suggestions here!

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Вложения

Re: SQL:2011 application time

От

Peter Eisentraut

Дата:

08 мая 2023 г., 10:10:09

On 03.05.23 23:02, Paul Jungwirth wrote:
> Thank you again for the review. Here is a patch with most of your 
> feedback addressed. Sorry it has taken so long! These patches are 
> rebased up to 1ab763fc22adc88e5d779817e7b42b25a9dd7c9e
> (May 3).

Here are a few small fixup patches to get your patch set compiling cleanly.

Also, it looks like the patches 0002, 0003, and 0004 are not split up 
correctly.  0002 contains tests using the FOR PORTION OF syntax 
introduced in 0003, and 0003 uses the function build_period_range() from 
0004.

On Thu, Jul 6, 2023 at 1:13 AM Peter Eisentraut
<peter.eisentraut@enterprisedb.com> wrote:
>
> I had talked to Paul about this offline a while ago.  btree_gist to core
> is no longer considered a prerequisite.  But Paul was planning to
> produce a new patch set that is arranged and sequenced a bit
> differently.  Apparently, that new version is not done yet, so it would
> make sense to either close this entry as returned with feedback, or move
> it to the next commit fest as waiting on author.

Here are some new patch files based on discussions from PGCon. The
patches are reorganized a bit to hopefully make them easier to review:

Initially I implement all functionality on just range columns, without
supporting PERIODs yet. There are patches for temporal PRIMARY
KEY/UNIQUE constraints, for simple foreign keys (without CASCADE/SET
NULL/SET DEFAULT), for UPDATE/DELETE FOR PORTION OF, and then for the
rest of the FK support (which depends on FOR PORTION OF). If you
compare these patches to the v11 ones, you'll see that a ton of
clutter disappears by not supporting PERIODs as a separate "thing".

Finally there is a patch adding PERIOD syntax, but with a new
implementation where a PERIOD causes us to just define a GENERATED
range column. That means we can support all the same things as before
but without adding the clutter. This patch isn't quite working yet
(especially ALTER TABLE), but I thought I'd send where I'm at so far,
since it sounds like folks are interested in doing a review. Also it
was a little tricky dealing with the dependency between the PERIOD and
the GENERATED column. (See the comments in the patch.) If anyone has a
suggestion there I'd be happy to hear it.

My goal is to include another patch soon to support hidden columns, so
that the period's GENERATED column can be hidden. I read the
conversation about a recent patch attempt for something similar, and I
think I can use most of that (but cut some of the things the community
was worried about).

All these patches need some polishing, but I think there is enough new
here for them to be worth reading for anyone interested in temporal
progress.

I'll set this commitfest entry back to Needs Review. Thanks for taking a look!

Paul

Вложения

Re: SQL:2011 application time

От

Peter Eisentraut

Дата:

12 июля 2023 г., 11:24:45

On 07.07.23 03:03, Paul A Jungwirth wrote:
> Here are some new patch files based on discussions from PGCon.

Here are a few fixup patches to get things building without warnings and 
errors.

The last patch (your 0005) fails the regression test for me and it 
didn't appear to be a trivial problem, so please take another look at 
that sometime.  (Since it's the last patch, it's obviously lower priority.)

Вложения

Re: SQL:2011 application time

От

jian he

Дата:

15 июля 2023 г., 12:04:07

On Fri, Jul 7, 2023 at 9:04 AM Paul A Jungwirth
<pj@illuminatedcomputing.com> wrote:
>
> On Thu, Jul 6, 2023 at 1:13 AM Peter Eisentraut
> <peter.eisentraut@enterprisedb.com> wrote:
> >
> > I had talked to Paul about this offline a while ago.  btree_gist to core
> > is no longer considered a prerequisite.  But Paul was planning to
> > produce a new patch set that is arranged and sequenced a bit
> > differently.  Apparently, that new version is not done yet, so it would
> > make sense to either close this entry as returned with feedback, or move
> > it to the next commit fest as waiting on author.
>
> Here are some new patch files based on discussions from PGCon. The
> patches are reorganized a bit to hopefully make them easier to review:
>
> Initially I implement all functionality on just range columns, without
> supporting PERIODs yet. There are patches for temporal PRIMARY
> KEY/UNIQUE constraints, for simple foreign keys (without CASCADE/SET
> NULL/SET DEFAULT), for UPDATE/DELETE FOR PORTION OF, and then for the
> rest of the FK support (which depends on FOR PORTION OF). If you
> compare these patches to the v11 ones, you'll see that a ton of
> clutter disappears by not supporting PERIODs as a separate "thing".
>
> Finally there is a patch adding PERIOD syntax, but with a new
> implementation where a PERIOD causes us to just define a GENERATED
> range column. That means we can support all the same things as before
> but without adding the clutter. This patch isn't quite working yet
> (especially ALTER TABLE), but I thought I'd send where I'm at so far,
> since it sounds like folks are interested in doing a review. Also it
> was a little tricky dealing with the dependency between the PERIOD and
> the GENERATED column. (See the comments in the patch.) If anyone has a
> suggestion there I'd be happy to hear it.
>
> My goal is to include another patch soon to support hidden columns, so
> that the period's GENERATED column can be hidden. I read the
> conversation about a recent patch attempt for something similar, and I
> think I can use most of that (but cut some of the things the community
> was worried about).
>
> All these patches need some polishing, but I think there is enough new
> here for them to be worth reading for anyone interested in temporal
> progress.
>
> I'll set this commitfest entry back to Needs Review. Thanks for taking a look!
>
> Paul

due to change in:
https://www.postgresql.org/message-id/flat/ec8b1d9b-502e-d1f8-e909-1bf9dffe6fa5@illuminatedcomputing.com

git apply $DOWNLOADS/patches/v12-0001-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch
error: patch failed: src/backend/commands/indexcmds.c:940
error: src/backend/commands/indexcmds.c: patch does not apply

probably need some adjustment.

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

01 сентября 2023 г., 00:26:31

On 7/12/23 01:24, Peter Eisentraut wrote:
> On 07.07.23 03:03, Paul A Jungwirth wrote:
>> Here are some new patch files based on discussions from PGCon.
> 
> Here are a few fixup patches to get things building without warnings and 
> errors.
> 
> The last patch (your 0005) fails the regression test for me and it 
> didn't appear to be a trivial problem, so please take another look at 
> that sometime.  (Since it's the last patch, it's obviously lower priority.)

Hello,

Here are the latest patches for my temporal tables work. They are 
rebased on e8d74ad6 from Aug 31.

I incorporated Peter's edits mentioned above and have made various other 
improvements.

The most significant change is support for partitioned tables. Those now 
work with temporal PRIMARY KEY and UNIQUE constraints, FOR PORTION OF 
commands, and FOREIGN KEYs.

I've tried to clean up the first four patches to get them ready for 
committing, since they could get committed before the PERIOD patch. I 
think there is a little more cleanup needed but they should be ready for 
a review.

The PERIOD patch is not finished and includes some deliberately-failing 
tests. I did make some progress here finishing ALTER TABLE ADD PERIOD.

I could use help handling the INTERNAL depenency from the PERIOD to its 
(hidden) GENERATED column. The problem is in findDependentObjects: if 
you drop the PERIOD, then Postgres automatically tries to drop the 
column (correctly), but then it seems to think it needs to drop the 
whole table. I think this is because a column's object address is the 
table's object address plus a subaddress equaling the attno, and 
findDependentObjects thinks it should drop the whole thing. I'm sure I 
can sort this out, but if anyone has a suggestion it might save me some 
time.

Thanks for taking a look!

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Вложения

Re: SQL:2011 application time

От

Corey Huinker

Дата:

01 сентября 2023 г., 07:02:09

The PERIOD patch is not finished and includes some deliberately-failing
tests. I did make some progress here finishing ALTER TABLE ADD PERIOD.

If it's ok with you, I need PERIODs for System Versioning, and planned on developing a highly similar version, albeit closer to the standard. It shouldn't interfere with your work as you're heavily leveraging range types.

Re: SQL:2011 application time

От

Peter Eisentraut

Дата:

01 сентября 2023 г., 12:30:57

On 31.08.23 23:26, Paul Jungwirth wrote:
> I've tried to clean up the first four patches to get them ready for
> committing, since they could get committed before the PERIOD patch. I
> think there is a little more cleanup needed but they should be ready for
> a review.

Looking at the patch 0001 "Add temporal PRIMARY KEY and UNIQUE constraints":

Generally, this looks like a good direction. The patch looks
comprehensive, with documentation and tests, and appears to cover all
the required pieces (client programs, ruleutils, etc.).

I have two conceptual questions that should be clarified before we go
much further:

1) If I write UNIQUE (a, b, c WITHOUT OVERLAPS), does the WITHOUT
OVERLAPS clause attach to the last column, or to the whole column list?
In the SQL standard, you can only have one period and it has to be
listed last, so this question does not arise. But here we are building
a more general facility to then build the SQL facility on top of. So I
think it doesn't make sense that the range column must be last or that
there can only be one. Also, your implementation requires at least one
non-overlaps column, which also seems like a confusing restriction.

I think the WITHOUT OVERLAPS clause should be per-column, so that
something like UNIQUE (a WITHOUT OVERLAPS, b, c WITHOUT OVERLAPS) would
be possible. Then the WITHOUT OVERLAPS clause would directly correspond
to the choice between equality or overlaps operator per column.

An alternative interpretation would be that WITHOUT OVERLAPS applies to
the whole column list, and we would take it to mean, for any range
column, use the overlaps operator, for any non-range column, use the
equals operator. But I think this would be confusing and would prevent
the case of using the equality operator for some ranges and the overlaps
operator for some other ranges in the same key.

2) The logic hinges on get_index_attr_temporal_operator(), to pick the
equality and overlaps operator for each column. For btree indexes, the
strategy numbers are fixed, so this is straightforward. But for gist
indexes, the strategy numbers are more like recommendations. Are we
comfortable with how this works? I mean, we could say, if you want to
be able to take advantage of the WITHOUT OVERLAPS syntax, you have to
use these numbers, otherwise you're on your own. It looks like the gist
strategy numbers are already hardcoded in a number of places, so maybe
that's all okay, but I feel we should be more explicit about this
somewhere, maybe in the documentation, or at least in code comments.

Besides that, some stylistic comments:

* There is a lot of talk about "temporal" in this patch, but this
functionality is more general than temporal. I would prefer to change
this to more neutral terms like "overlaps".

* The field ii_Temporal in IndexInfo doesn't seem necessary and could be
handled via local variables. See [0] for a similar discussion:

[0]:
https://www.postgresql.org/message-id/flat/f84640e3-00d3-5abd-3f41-e6a19d33c40b@eisentraut.org

* In gram.y, change withoutOverlapsClause -> without_overlaps_clause for
consistency with the surrounding code.

* No-op assignments like n->without_overlaps = NULL; can be omitted.
(Or you should put them everywhere. But only in some places seems
inconsistent and confusing.)

Re: SQL:2011 application time

От

Vik Fearing

Дата:

01 сентября 2023 г., 13:50:08

On 9/1/23 11:30, Peter Eisentraut wrote:
> 1) If I write UNIQUE (a, b, c WITHOUT OVERLAPS), does the WITHOUT 
> OVERLAPS clause attach to the last column, or to the whole column list? 
> In the SQL standard, you can only have one period and it has to be 
> listed last, so this question does not arise.  But here we are building 
> a more general facility to then build the SQL facility on top of.  So I 
> think it doesn't make sense that the range column must be last or that 
> there can only be one.  Also, your implementation requires at least one 
> non-overlaps column, which also seems like a confusing restriction.
> 
> I think the WITHOUT OVERLAPS clause should be per-column, so that 
> something like UNIQUE (a WITHOUT OVERLAPS, b, c WITHOUT OVERLAPS) would 
> be possible.  Then the WITHOUT OVERLAPS clause would directly correspond 
> to the choice between equality or overlaps operator per column.
> 
> An alternative interpretation would be that WITHOUT OVERLAPS applies to 
> the whole column list, and we would take it to mean, for any range 
> column, use the overlaps operator, for any non-range column, use the 
> equals operator.  But I think this would be confusing and would prevent 
> the case of using the equality operator for some ranges and the overlaps 
> operator for some other ranges in the same key.

I prefer the first option.  That is: WITHOUT OVERLAPS applies only to 
the column or expression it is attached to, and need not be last in line.
-- 
Vik Fearing

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

01 сентября 2023 г., 22:56:40

On 9/1/23 03:50, Vik Fearing wrote:
> On 9/1/23 11:30, Peter Eisentraut wrote:
>> 1) If I write UNIQUE (a, b, c WITHOUT OVERLAPS), does the WITHOUT 
>> OVERLAPS clause attach to the last column, or to the whole column 
>> list? In the SQL standard, you can only have one period and it has to 
>> be listed last, so this question does not arise.  But here we are 
>> building a more general facility to then build the SQL facility on top 
>> of.  So I think it doesn't make sense that the range column must be 
>> last or that there can only be one.  Also, your implementation 
>> requires at least one non-overlaps column, which also seems like a 
>> confusing restriction.
>>
>> I think the WITHOUT OVERLAPS clause should be per-column, so that 
>> something like UNIQUE (a WITHOUT OVERLAPS, b, c WITHOUT OVERLAPS) 
>> would be possible.  Then the WITHOUT OVERLAPS clause would directly 
>> correspond to the choice between equality or overlaps operator per 
>> column.
>>
>> An alternative interpretation would be that WITHOUT OVERLAPS applies 
>> to the whole column list, and we would take it to mean, for any range 
>> column, use the overlaps operator, for any non-range column, use the 
>> equals operator.  But I think this would be confusing and would 
>> prevent the case of using the equality operator for some ranges and 
>> the overlaps operator for some other ranges in the same key.
> 
> I prefer the first option.  That is: WITHOUT OVERLAPS applies only to 
> the column or expression it is attached to, and need not be last in line.

I agree. The second option seems confusing and is more restrictive.

I think allowing multiple uses of `WITHOUT OVERLAPS` (and in any 
position) is a great recommendation that enables a lot of new 
functionality. Several books[1,2] about temporal databases describe a 
multi-dimensional temporal space (even beyond application time vs. 
system time), and the standard is pretty disappointing here. It's not a 
weird idea.

But I just want to be explicit that this isn't something the standard 
describes. (I think everyone in the conversation so far understands 
that.) So far I've tried to be pretty scrupulous about following 
SQL:2011, although personally I'd rather see Postgres support this 
functionality. And it's not like it goes *against* what the standard 
says. But if there are any objections, I'd love to hear them before 
putting in the work. :-)

If we allow multiple+anywhere WITHOUT OVERLAPS in PRIMARY KEY & UNIQUE 
constraints, then surely we also allow multiple+anywhere PERIOD in 
FOREIGN KEY constraints too. (I guess the standard switched keywords 
because a FK is more like "MUST OVERLAPS". :-)

Also if you have multiple application-time dimensions we probably need 
to allow multiple FOR PORTION OF clauses. I think the syntax would be:

UPDATE t
   FOR PORTION OF valid_at FROM ... TO ...
   FOR PORTION OF asserted_at FROM ... TO ...
   [...]
   SET foo = bar

Does that sound okay?

I don't quite understand this part:

 >> Also, your implementation
 >> requires at least one non-overlaps column, which also seems like a
 >> confusing restriction.

That's just a regular non-temporal constraint. Right? If I'm missing 
something let me know.

[1] C. J. Date, Hugh Darwen, Nikos Lorentzos. Time and Relational 
Theory, Second Edition: Temporal Databases in the Relational Model and 
SQL. 2nd edition, 2014.
[2] Tom Johnston. Bitemporal Data: Theory and Practice. 2014.

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Re: SQL:2011 application time

От

Vik Fearing

Дата:

02 сентября 2023 г., 01:41:13

On 9/1/23 21:56, Paul Jungwirth wrote:
> On 9/1/23 03:50, Vik Fearing wrote:
>> On 9/1/23 11:30, Peter Eisentraut wrote:
>>> 1) If I write UNIQUE (a, b, c WITHOUT OVERLAPS), does the WITHOUT 
>>> OVERLAPS clause attach to the last column, or to the whole column 
>>> list? In the SQL standard, you can only have one period and it has to 
>>> be listed last, so this question does not arise.  But here we are 
>>> building a more general facility to then build the SQL facility on 
>>> top of.  So I think it doesn't make sense that the range column must 
>>> be last or that there can only be one.  Also, your implementation 
>>> requires at least one non-overlaps column, which also seems like a 
>>> confusing restriction.
>>>
>>> I think the WITHOUT OVERLAPS clause should be per-column, so that 
>>> something like UNIQUE (a WITHOUT OVERLAPS, b, c WITHOUT OVERLAPS) 
>>> would be possible.  Then the WITHOUT OVERLAPS clause would directly 
>>> correspond to the choice between equality or overlaps operator per 
>>> column.
>>>
>>> An alternative interpretation would be that WITHOUT OVERLAPS applies 
>>> to the whole column list, and we would take it to mean, for any range 
>>> column, use the overlaps operator, for any non-range column, use the 
>>> equals operator.  But I think this would be confusing and would 
>>> prevent the case of using the equality operator for some ranges and 
>>> the overlaps operator for some other ranges in the same key.
>>
>> I prefer the first option.  That is: WITHOUT OVERLAPS applies only to 
>> the column or expression it is attached to, and need not be last in line.
> 
> I agree. The second option seems confusing and is more restrictive.
> 
> I think allowing multiple uses of `WITHOUT OVERLAPS` (and in any 
> position) is a great recommendation that enables a lot of new 
> functionality. Several books[1,2] about temporal databases describe a 
> multi-dimensional temporal space (even beyond application time vs. 
> system time), and the standard is pretty disappointing here. It's not a 
> weird idea.
> 
> But I just want to be explicit that this isn't something the standard 
> describes. (I think everyone in the conversation so far understands 
> that.) So far I've tried to be pretty scrupulous about following 
> SQL:2011, although personally I'd rather see Postgres support this 
> functionality. And it's not like it goes *against* what the standard 
> says. But if there are any objections, I'd love to hear them before 
> putting in the work. :-)


I have no problem with a first version doing exactly what the standard 
says and expanding it later.


> If we allow multiple+anywhere WITHOUT OVERLAPS in PRIMARY KEY & UNIQUE 
> constraints, then surely we also allow multiple+anywhere PERIOD in 
> FOREIGN KEY constraints too. (I guess the standard switched keywords 
> because a FK is more like "MUST OVERLAPS". :-)


Seems reasonable.


> Also if you have multiple application-time dimensions we probably need 
> to allow multiple FOR PORTION OF clauses. I think the syntax would be:
> 
> UPDATE t
>    FOR PORTION OF valid_at FROM ... TO ...
>    FOR PORTION OF asserted_at FROM ... TO ...
>    [...]
>    SET foo = bar
> 
> Does that sound okay?


That sounds really cool.


> [1] C. J. Date, Hugh Darwen, Nikos Lorentzos. Time and Relational 
> Theory, Second Edition: Temporal Databases in the Relational Model and 
> SQL. 2nd edition, 2014.
> [2] Tom Johnston. Bitemporal Data: Theory and Practice. 2014.


Thanks!  I have ordered these books.
-- 
Vik Fearing

Re: SQL:2011 application time

От

jian he

Дата:

08 сентября 2023 г., 04:24:22

On Sat, Sep 2, 2023 at 5:58 AM Paul Jungwirth
<pj@illuminatedcomputing.com> wrote:
>
>
> I don't quite understand this part:
>
>  >> Also, your implementation
>  >> requires at least one non-overlaps column, which also seems like a
>  >> confusing restriction.
>
> That's just a regular non-temporal constraint. Right? If I'm missing
> something let me know.
>

for a range primary key, is it fine to expect it to be unique, not
null and also not overlap? (i am not sure how hard to implement it).

-----------------------------------------------------------------
quote from 7IWD2-02-Foundation-2011-12.pdf. 4.18.3.2 Unique
constraints, page 97 of 1483.

4.18.3.2 Unique constraints In addition to the components of every
table constraint descriptor, a unique constraint descriptor includes:
— An indication of whether it was defined with PRIMARY KEY or UNIQUE.
— The names and positions of the unique columns specified in the
<unique column list>
 — If <without overlap specification> is specified, then the name of
the period specified.

If the table descriptor for base table T includes a unique constraint
descriptor indicating that the unique constraint was defined with
PRIMARY KEY, then the columns of that unique constraint constitute the
primary key of T. A table that has a primary key cannot have a proper
supertable.
A unique constraint that does not include a <without overlap
specification> on a table T is satisfied if and only if there do not
exist two rows R1 and R2 of T such that R1 and R2 have the same
non-null values in the unique columns. If a unique constraint UC on a
table T includes a <without overlap specification> WOS, then let
<application time period name> ATPN be the contained in WOS. UC is
satisfied if and only if there do not exist two rows R1 and R2 of T
such that R1 and R2 have the same non-null values in the unique
columns and the ATPN period values of R1 and R2 overlap. In addition,
if the unique constraint was defined with PRIMARY KEY, then it
requires that none of the values in the specified column or columns be
a null value.
-----------------------------------------------------------------
based on the above, the unique constraint does not specify that the
column list must be range type. UNIQUE (a, c WITHOUT OVERLAPS).
Here column "a" can be a range type (that have overlap property) and
can be not.
In fact, many of your primary key, foreign key regess test using
something like '[11,11]' (which make it more easy to understand),
which in logic is a non-range usage.
So UNIQUE (a, c WITHOUT OVERLAPS), column "a" be a non-range data type
does make sense?

Re: SQL:2011 application time

От

jian he

Дата:

08 сентября 2023 г., 12:35:10

hi.
the following script makes the server crash (Segmentation fault).

create schema test;
set search_path to test;
DROP TABLE  IF EXISTS temporal_rng;
CREATE TABLE temporal_rng (id int4range, valid_at daterange);
ALTER TABLE temporal_rng
ADD CONSTRAINT temporal_rng_pk
PRIMARY KEY (id, valid_at WITHOUT OVERLAPS);

INSERT INTO temporal_rng VALUES
    ('[11,11]', daterange('2018-01-01', '2020-01-01')),
    ('[11,11]', daterange('2020-01-01', '2021-01-01')),
    ('[20,20]', daterange('2018-01-01', '2020-01-01')),
    ('[20,20]', daterange('2020-01-01', '2021-01-01'));

DROP TABLE  IF EXISTS temporal_fk_rng2rng;

CREATE TABLE temporal_fk_rng2rng (
id int4range,
valid_at tsrange,
parent_id int4range,
CONSTRAINT temporal_fk_rng2rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
);
---------------------------------------------------------------
BEGIN;
ALTER TABLE temporal_fk_rng2rng DROP CONSTRAINT IF EXISTS
temporal_fk_rng2rng_fk;
ALTER TABLE temporal_fk_rng2rng
  ALTER COLUMN parent_id SET DEFAULT '[-1,-1]',
  ALTER COLUMN valid_at SET DEFAULT tsrange('2018-01-01', '2019-11-11');

ALTER TABLE temporal_fk_rng2rng
ADD CONSTRAINT temporal_fk_rng2rng_fk
FOREIGN KEY (parent_id, PERIOD valid_at)
REFERENCES temporal_rng
        on update set DEFAULT
        on delete set DEFAULT;
---------------------------------------------------------gdb related
info:---------------
(gdb) continue
Continuing.

Program received signal SIGSEGV, Segmentation fault.
FindFKComparisonOperators (fkconstraint=0x556450100bd8,
tab=0x55644ff8f570, i=1, fkattnum=0x7ffeb3286ba0,
old_check_ok=0x7ffeb3286b11, old_pfeqop_item=0x7ffeb3286b28,
pktype=3912, fktype=3908, opclass=10078, is_temporal=true,
for_overlaps=true, pfeqopOut=0x7ffeb3286da4, ppeqopOut=0x7ffeb3286e24,
ffeqopOut=0x7ffeb3286ea4) at
../../Desktop/pg_sources/main/postgres/src/backend/commands/tablecmds.c:11582
11582                           pkattr_name = strVal(fkconstraint->pk_period);
(gdb) where
#0  FindFKComparisonOperators (fkconstraint=0x556450100bd8,
tab=0x55644ff8f570, i=1,
    fkattnum=0x7ffeb3286ba0, old_check_ok=0x7ffeb3286b11,
old_pfeqop_item=0x7ffeb3286b28, pktype=3912,
    fktype=3908, opclass=10078, is_temporal=true, for_overlaps=true,
pfeqopOut=0x7ffeb3286da4,
    ppeqopOut=0x7ffeb3286e24, ffeqopOut=0x7ffeb3286ea4)
    at ../../Desktop/pg_sources/main/postgres/src/backend/commands/tablecmds.c:11582
#1  0x000055644e53875a in ATAddForeignKeyConstraint
(wqueue=0x7ffeb3287118, tab=0x55644ff8f570,
    rel=0x7fb2dc124430, fkconstraint=0x556450100bd8, recurse=true,
recursing=false, lockmode=6)
    at ../../Desktop/pg_sources/main/postgres/src/backend/commands/tablecmds.c:10395
#2  0x000055644e536cc2 in ATExecAddConstraint (wqueue=0x7ffeb3287118,
tab=0x55644ff8f570,
    rel=0x7fb2dc124430, newConstraint=0x556450100bd8, recurse=true,
is_readd=false, lockmode=6)
    at ../../Desktop/pg_sources/main/postgres/src/backend/commands/tablecmds.c:9948
#3  0x000055644e528eaa in ATExecCmd (wqueue=0x7ffeb3287118,
tab=0x55644ff8f570, cmd=0x5564500fae48,
    lockmode=6, cur_pass=10, context=0x7ffeb3287310)
    at ../../Desktop/pg_sources/main/postgres/src/backend/commands/tablecmds.c:5711
#4  0x000055644e5283f6 in ATRewriteCatalogs (wqueue=0x7ffeb3287118,
lockmode=6, context=0x7ffeb3287310)
    at ../../Desktop/pg_sources/main/postgres/src/backend/commands/tablecmds.c:5569
#5  0x000055644e527031 in ATController (parsetree=0x55645000e228,
rel=0x7fb2dc124430,
    cmds=0x55645000e1d8, recurse=true, lockmode=6, context=0x7ffeb3287310)
    at ../../Desktop/pg_sources/main/postgres/src/backend/commands/tablecmds.c:5136
#6  0x000055644e526a9d in AlterTable (stmt=0x55645000e228, lockmode=6,
context=0x7ffeb3287310)
    at ../../Desktop/pg_sources/main/postgres/src/backend/commands/tablecmds.c:4789
#7  0x000055644e92eb65 in ProcessUtilitySlow (pstate=0x55644ff8f460,
pstmt=0x55645000e2d8,
--Type <RET> for more, q to quit, c to continue without paging--
    55645000d330 "ALTER TABLE temporal_fk_rng2rng\n\tADD CONSTRAINT
temporal_fk_rng2rng_fk\n\tFOREIGN KEY (parent_id, PERIOD
valid_at)\n\tREFERENCES temporal_rng\n        on update set DEFAULT \n
       on delete set DEFAULT;", context=PROCESS_UTILITY_TOPLEVEL,
params=0x0, queryEnv=0x0,
    dest=0x55645000e698, qc=0x7ffeb3287970)
    at ../../Desktop/pg_sources/main/postgres/src/backend/tcop/utility.c:1329
#8  0x000055644e92e24c in standard_ProcessUtility (pstmt=0x55645000e2d8,
    queryString=0x55645000d330 "ALTER TABLE temporal_fk_rng2rng\n\tADD
CONSTRAINT temporal_fk_rng2rng_fk\n\tFOREIGN KEY (parent_id, PERIOD
valid_at)\n\tREFERENCES temporal_rng\n        on update set DEFAULT \n
       on delete set DEFAULT;", readOnlyTree=false,
context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
    queryEnv=0x0, dest=0x55645000e698, qc=0x7ffeb3287970)
    at ../../Desktop/pg_sources/main/postgres/src/backend/tcop/utility.c:1078
#9  0x000055644e92c921 in ProcessUtility (pstmt=0x55645000e2d8,
    queryString=0x55645000d330 "ALTER TABLE temporal_fk_rng2rng\n\tADD
CONSTRAINT temporal_fk_rng2rng_fk\n\tFOREIGN KEY (parent_id, PERIOD
valid_at)\n\tREFERENCES temporal_rng\n        on update set DEFAULT \n
       on delete set DEFAULT;", readOnlyTree=false,
context=PROCESS_UTILITY_TOPLEVEL, params=0x0,
    queryEnv=0x0, dest=0x55645000e698, qc=0x7ffeb3287970)
    at ../../Desktop/pg_sources/main/postgres/src/backend/tcop/utility.c:530
#10 0x000055644e92a83e in PortalRunUtility (portal=0x5564500a9840,
pstmt=0x55645000e2d8,
    isTopLevel=true, setHoldSnapshot=false, dest=0x55645000e698,
qc=0x7ffeb3287970)
    at ../../Desktop/pg_sources/main/postgres/src/backend/tcop/pquery.c:1158
#11 0x000055644e92abdb in PortalRunMulti (portal=0x5564500a9840,
isTopLevel=true,
    setHoldSnapshot=false, dest=0x55645000e698,
altdest=0x55645000e698, qc=0x7ffeb3287970)
    at ../../Desktop/pg_sources/main/postgres/src/backend/tcop/pquery.c:1315
#12 0x000055644e929b53 in PortalRun (portal=0x5564500a9840,
count=9223372036854775807, isTopLevel=true,
    run_once=true, dest=0x55645000e698, altdest=0x55645000e698,
qc=0x7ffeb3287970)
    at ../../Desktop/pg_sources/main/postgres/src/backend/tcop/pquery.c:791
#13 0x000055644e91f206 in exec_simple_query (
    query_string=0x55645000d330 "ALTER TABLE
temporal_fk_rng2rng\n\tADD CONSTRAINT
temporal_fk_rng2rng_fk\n\tFOREIGN KEY (parent_id, PERIOD
valid_at)\n\tREFERENCES temporal_rng\n        on update set DEFAULT \n
       on delete set DEFAULT;")
    at ../../Desktop/pg_sources/main/postgres/src/backend/tcop/postgres.c:1274
--Type <RET> for more, q to quit, c to continue without paging--
#14 0x000055644e926c49 in PostgresMain (dbname=0x556450045610 "regression",
    username=0x5564500455f8 "jian")
    at ../../Desktop/pg_sources/main/postgres/src/backend/tcop/postgres.c:4637
#15 0x000055644e7ff0e9 in BackendRun (port=0x5564500394f0)
    at ../../Desktop/pg_sources/main/postgres/src/backend/postmaster/postmaster.c:4438
#16 0x000055644e7fe6a1 in BackendStartup (port=0x5564500394f0)
    at ../../Desktop/pg_sources/main/postgres/src/backend/postmaster/postmaster.c:4166
#17 0x000055644e7f8aa0 in ServerLoop ()
    at ../../Desktop/pg_sources/main/postgres/src/backend/postmaster/postmaster.c:1780
#18 0x000055644e7f8042 in PostmasterMain (argc=3, argv=0x55644ff77e60)
    at ../../Desktop/pg_sources/main/postgres/src/backend/postmaster/postmaster.c:1464
#19 0x000055644e67f884 in main (argc=3, argv=0x55644ff77e60)
    at ../../Desktop/pg_sources/main/postgres/src/backend/main/main.c:198

Re: SQL:2011 application time

От

Paul A Jungwirth

Дата:

08 сентября 2023 г., 20:26:59

On Fri, Sep 8, 2023 at 2:35 AM jian he <jian.universality@gmail.com> wrote:
>
> hi.
> the following script makes the server crash (Segmentation fault).
> [snip]
>
> ALTER TABLE temporal_fk_rng2rng
> ADD CONSTRAINT temporal_fk_rng2rng_fk
> FOREIGN KEY (parent_id, PERIOD valid_at)
> REFERENCES temporal_rng
>         on update set DEFAULT
>         on delete set DEFAULT;

Thank you for the report! It looks like I forgot to handle implicit
column names after REFERENCES. The PERIOD part needs to get looked up
from the PK as we do for normal FK attrs. I'll add that to the next
patch.

Yours,
Paul

Re: SQL:2011 application time

От

jian he

Дата:

09 сентября 2023 г., 10:54:37

hi
I am confused by (pk,fk) on delete on update (restriction and no
action) result based on v13.
related post: https://stackoverflow.com/questions/14921668/difference-between-restrict-and-no-action
Please check the following test and comments.

---common setup for test0, test1,test2,test3
BEGIN;
DROP TABLE IF EXISTS temporal_rng, temporal_fk_rng2rng;
CREATE TABLE temporal_rng ( id int4range,valid_at tsrange);
ALTER TABLE temporal_rng
ADD CONSTRAINT temporal_rng_pk
PRIMARY KEY (id, valid_at WITHOUT OVERLAPS);
CREATE TABLE temporal_fk_rng2rng (
id int4range,
valid_at tsrange,
parent_id int4range
);
commit;

----------------no_action_vs_restriction test0
BEGIN;
ALTER TABLE temporal_fk_rng2rng
DROP CONSTRAINT IF EXISTS temporal_fk_rng2rng_fk;
ALTER TABLE temporal_fk_rng2rng
ADD CONSTRAINT temporal_fk_rng2rng_fk
FOREIGN KEY (parent_id, PERIOD valid_at)
REFERENCES temporal_rng
    ON DELETE NO ACTION
ON UPDATE NO ACTION;
INSERT INTO temporal_rng VALUES ('[5,5]', tsrange('2018-01-01', '2018-02-01')),
                                 ('[5,5]', tsrange('2018-02-01', '2018-03-01'));
INSERT INTO temporal_fk_rng2rng VALUES ('[3,3]', tsrange('2018-01-05',
'2018-01-10'), '[5,5]');

/*
expect below to fail.
since to be deleted range is being referenced (in temporal_fk_rng2rng)
but the v13 patch won't fail.
*/
delete from temporal_rng
    FOR PORTION OF valid_at FROM '2018-01-06' TO '2018-01-11'
WHERE   id = '[5,5]'
AND     valid_at @> '2018-01-05'::timestamp;
TABLE temporal_rng \; table temporal_fk_rng2rng;
ROLLBACK;


----------------no_action_vs_restriction test1
BEGIN;
ALTER TABLE temporal_fk_rng2rng
DROP CONSTRAINT IF EXISTS temporal_fk_rng2rng_fk;
ALTER TABLE temporal_fk_rng2rng
ADD CONSTRAINT temporal_fk_rng2rng_fk
FOREIGN KEY (parent_id, PERIOD valid_at)
REFERENCES temporal_rng
ON DELETE RESTRICT
ON UPDATE RESTRICT;
INSERT INTO temporal_rng VALUES ('[5,5]', tsrange('2018-01-01', '2018-02-01')),
                                 ('[5,5]', tsrange('2018-02-01', '2018-03-01'));
INSERT INTO temporal_fk_rng2rng VALUES ('[3,3]', tsrange('2018-01-05',
'2018-01-10'), '[5,5]');

/*
expect the below command not to fail.
since to be deleted range is not being referenced in temporal_fk_rng2rng)
but the v13 patch will fail.
*/
delete from temporal_rng
    FOR PORTION OF valid_at FROM '2018-01-12' TO '2018-01-20'
WHERE   id = '[5,5]'
AND     valid_at @> '2018-01-05'::timestamp;
ROLLBACK;


----------------no_action_vs_restriction test2
BEGIN;
ALTER TABLE temporal_fk_rng2rng
DROP CONSTRAINT IF EXISTS temporal_fk_rng2rng_fk;
ALTER TABLE temporal_fk_rng2rng
ADD CONSTRAINT temporal_fk_rng2rng_fk
FOREIGN KEY (parent_id, PERIOD valid_at)
REFERENCES temporal_rng
ON DELETE no action
ON UPDATE no action;
INSERT INTO temporal_rng VALUES ('[5,5]', tsrange('2018-01-01', '2018-02-01')),
                                 ('[5,5]', tsrange('2018-02-01', '2018-03-01'));
INSERT INTO temporal_fk_rng2rng VALUES ('[3,3]', tsrange('2018-01-05',
'2018-01-10'), '[5,5]');
/*
expect below command fail.
since to be deleted range is being referenced (in temporal_fk_rng2rng)
*/
UPDATE temporal_rng FOR PORTION OF valid_at FROM '2018-01-06' TO '2018-01-08'
SET     id = '[7,7]'
WHERE   id = '[5,5]'
AND     valid_at @> '2018-01-05'::timestamp;
TABLE temporal_rng \; table temporal_fk_rng2rng;

ROLLBACK;


----------------no_action_vs_restriction test3
BEGIN;
ALTER TABLE temporal_fk_rng2rng
DROP CONSTRAINT IF EXISTS temporal_fk_rng2rng_fk;
ALTER TABLE temporal_fk_rng2rng
ADD CONSTRAINT temporal_fk_rng2rng_fk
FOREIGN KEY (parent_id, PERIOD valid_at)
REFERENCES temporal_rng
ON DELETE RESTRICT
ON UPDATE RESTRICT;
INSERT INTO temporal_rng VALUES ('[5,5]', tsrange('2018-01-01', '2018-02-01')),
                                 ('[5,5]', tsrange('2018-02-01', '2018-03-01'));
INSERT INTO temporal_fk_rng2rng VALUES ('[3,3]', tsrange('2018-01-05',
'2018-01-10'), '[5,5]');

/*
expect the below command not to fail.
since to be deleted range is not being referenced in temporal_fk_rng2rng)
but the v13 patch will fail.
*/
UPDATE temporal_rng FOR PORTION OF valid_at FROM '2018-01-12' TO '2018-01-20'
SET     id = '[7,7]'
WHERE   id = '[5,5]'
AND     valid_at @> '2018-01-05'::timestamp;
ROLLBACK;

Re: SQL:2011 application time

От

jian he

Дата:

12 сентября 2023 г., 12:01:36

hi. some trivial issue:

in src/backend/catalog/index.c
/* * System attributes are never null, so no need to check. */
if (attnum <= 0)

since you already checked attnum == 0
so here you can just attnum < 0?
-------------------------------------------------
ERROR:  column "valid_at" named in WITHOUT OVERLAPS is not a range type

IMHO, "named" is unnecessary.
-------------------------------------------------
doc/src/sgml/catalogs.sgml
pg_constraint adds another attribute (column): contemporal, seems no doc entry.

also the temporal in oxford definition is "relating to time", here we
can deal with range.
So maybe  "temporal" is not that accurate?

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

14 сентября 2023 г., 19:09:19

On 9/7/23 18:24, jian he wrote:
> for a range primary key, is it fine to expect it to be unique, not
> null and also not overlap? (i am not sure how hard to implement it).
> 
> -----------------------------------------------------------------
> quote from 7IWD2-02-Foundation-2011-12.pdf. 4.18.3.2 Unique
> constraints, page 97 of 1483.
> 
> ...
> -----------------------------------------------------------------
> based on the above, the unique constraint does not specify that the
> column list must be range type. UNIQUE (a, c WITHOUT OVERLAPS).
> Here column "a" can be a range type (that have overlap property) and
> can be not.
> In fact, many of your primary key, foreign key regess test using
> something like '[11,11]' (which make it more easy to understand),
> which in logic is a non-range usage.
> So UNIQUE (a, c WITHOUT OVERLAPS), column "a" be a non-range data type
> does make sense?

I'm not sure I understand this question, but here are a few things that 
might help clarify things:

In SQL:2011, a temporal primary key, unique constraint, or foreign key 
may have one or more "scalar" parts (just like a regular key) followed 
by one "PERIOD" part, which is denoted with "WITHOUT OVERLAPS" (in 
PKs/UNIQUEs) or "PERIOD" (in FKs). Except for this last key part, 
everything is still compared for equality, just as in a traditional key. 
But this last part is compared for overlaps. It's exactly the same as 
`EXCLUDE (id WITH =, valid_at WITH &&)`. The overlap part must come last 
and you can have only one (but you may have more than one scalar part if 
you like).

In the patch, I have followed that pattern, except I also allow a 
regular range column anywhere I allow a PERIOD. In fact PERIODs are 
mostly implemented on top of range types. (Until recently PERIOD support 
was in the first patch, not the last, and there was code all throughout 
for handling both, e.g. within indexes, etc. But at pgcon Peter 
suggested building everything on just range columns, and then having 
PERIODs create an "internal" GENERATED column, and that cleaned up the 
code considerably.)

One possible source of confusion is that in the tests I'm using range 
columns *also* for the scalar key part. So valid_at is a tsrange, and 
int is an int4range. This is not normally how you'd use the feature, but 
you need the btree_gist extension to mix int & tsrange (e.g.), and 
that's not available in the regress tests. We are still comparing the 
int4range for regular equality and the tsrange for overlaps. If you 
search this thread there was some discussion about bringing btree_gist 
into core, but it sounds like it doesn't need to happen. (It might be 
still desirable independently. EXCLUDE constraints are also not really 
something you can use practically without it, and their tests use the 
same trick of comparing ranges for plain equality.)

The piece of discussion you're replying to is about allowing *multiple* 
WITHOUT OVERLAPS modifiers on a PK/UNIQUE constraint, and in any 
position. I think that's a good idea, so I've started adapting the code 
to support it. (In fact there is a lot of code that assumes the overlaps 
key part will be in the last position, and I've never really been happy 
with that, so it's an excuse to make that more robust.) Here I'm saying 
(1) you will still need at least one scalar key part, (2) if there are 
no WITHOUT OVERLAPS parts then you just have a regular key, not a 
temporal one, (3) changing this obliges us to do the same for foreign 
keys and FOR PORTION OF.

I hope that helps! I apologize if I've completely missed the point. If 
so please try again. :-)

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

14 сентября 2023 г., 19:11:02

Thanks for the thorough review and testing!

Here is a v14 patch with the segfault and incorrect handling of NO 
ACTION and RESTRICT fixed (and reproductions added to the test suite).

A few more comments below on feedback from you and Peter:

On 9/12/23 02:01, jian he wrote:
> hi. some trivial issue:
> 
> in src/backend/catalog/index.c
> /* * System attributes are never null, so no need to check. */
> if (attnum <= 0)
> 
> since you already checked attnum == 0
> so here you can just attnum < 0?

I fixed the "/* *" typo here. I'm reluctant to change the attnum 
comparison since that's not a line I touched. (It was just part of the 
context around the updated comment.) Your suggestion does make sense 
though, so perhaps it should be a separate commit?

> ERROR:  column "valid_at" named in WITHOUT OVERLAPS is not a range type
> 
> IMHO, "named" is unnecessary.

Changed.

> doc/src/sgml/catalogs.sgml
> pg_constraint adds another attribute (column): contemporal, seems no doc entry.

Added.

> also the temporal in oxford definition is "relating to time", here we
> can deal with range.
> So maybe  "temporal" is not that accurate?

I agree if we allow multiple WITHOUT OVERLAPS/etc clauses, we should 
change the terminology. I'll include that with the multiple-range-keys 
change discussed upthread.

On 9/1/23 02:30, Peter Eisentraut wrote:
 > * There is a lot of talk about "temporal" in this patch, but this
 > functionality is more general than temporal.  I would prefer to change
 > this to more neutral terms like "overlaps".

Okay, sounds like several of us agree on this.

 > * The field ii_Temporal in IndexInfo doesn't seem necessary and could
 > be  handled via local variables.  See [0] for a similar discussion:
 >
 > [0]:
 > 
https://www.postgresql.org/message-id/flat/f84640e3-00d3-5abd-3f41-e6a19d33c40b@eisentraut.org

Done.

 > * In gram.y, change withoutOverlapsClause -> without_overlaps_clause
 > for consistency with the surrounding code.

Done.

 > * No-op assignments like n->without_overlaps = NULL; can be omitted.
 > (Or you should put them everywhere.  But only in some places seems
 > inconsistent and confusing.)

Changed. That makes sense since newNode uses palloc0fast. FWIW there is 
quite a lot of other code in gram.y that sets NULL fields though, 
including in ConstraintElem, and it seems like it does improve the 
clarity a little. By "everywhere" I think you mean wherever the file 
calls makeNode(Constraint)? I might go back and do it that way later.

I'll keep working on a patch to support multiple range keys, but I 
wanted to work through the rest of the feedback first. Also there is 
some fixing to do with partitions I believe, and then I'll finish the 
PERIOD support. So this v14 patch is just some minor fixes & tweaks from 
September feedback.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

On 9/17/23 20:11, jian he wrote:
> small issues so far I found, v14.

Thank you again for the review! v15 is attached.

> IndexInfo struct definition comment still has Temporal related
> comment, should be removed.

Fixed.

> catalog-pg-index.html, no indperiod doc entry, also in table pg_index,
> column indperiod  is junk value now.

You're right, it is just unneeded now that PERIODs are implemented by 
GENERATED columns. I've removed it.

> I think in UpdateIndexRelation, you need an add indperiod to build a
> pg_index tuple, similar to what you did in CreateConstraintEntry.

It's gone now.

> seems to make the following query works, we need to bring btree_gist
> related code to core?
> CREATE TABLE temporal_fk_rng2rng22 (id int8, valid_at int4range, > unique (id, valid_at WITHOUT OVERLAPS));

It doesn't need to be brought into core, but you would need to say 
`CREATE EXTENSION btree_gist` first. Since the regression tests don't 
assume we've built contrib, we have to use a workaround there.

> /* ----------------
>   * pg_period definition.  cpp turns this into
>   * typedef struct FormData_pg_period
>   * ----------------
>   */
> CATALOG(pg_period,8000,PeriodRelationId)
> {
> Oid oid; /* OID of the period */
> NameData pername; /* name of period */
> Oid perrelid; /* OID of relation containing this period */
> int16 perstart; /* column for start value */
> int16 perend; /* column for end value */
> int16 perrange; /* column for range value */
> Oid perconstraint; /* OID of (start < end) constraint */
> } FormData_pg_period;
> 
> no idea what the above comment "cpp'' refers to.

I believe cpp = C Pre-Processor. This comment is at the top of all the 
catalog/pg_*.h files. The next line is part of the same sentence (which 
took me a while to notice :-).

> The sixth field in
> FormData_pg_period: perrange, the comment conflict with catalogs.sgml
>>> perrngtype oid (references pg_type.oid)
>>> The OID of the range type associated with this period

You're right, fixed! More cruft from the old PERIOD implementation.

> create table pt (id integer, ds date, de date, period for p (ds, de));
> SELECT table_name, column_name, column_default, is_nullable,
> is_generated, generation_expression
> FROM    information_schema.columns
> WHERE table_name = 'pt' ORDER BY 1, 2;
> 
> the hidden generated column  (p)  is_nullable return NO. but ds, de
> is_nullable both return YES. so column p is_nullable should return
> YES?

The is_nullable behavior is correct I believe. In a range if the 
lower/upper value is NULL, it signifies the range has no lower/upper 
bound. So it's fine for ds or de to be NULL, but not the range itself (p).

Technically the SQL spec says that the PERIOD start & end columns should 
be NOT NULL, but that forces people to use ugly sentinel values like 
'3999-01-01'. It's a shame to make people do that when NULL works so 
well instead. Our time-related types do have Infinity and -Infinity 
which is not as ugly, but many other types do not. Plus those values 
interact badly with ranges. For example `select '(,)'::daterange - 
'(,Infinity)'::daterange` gives the infinitesimal result `[infinity,)`. 
I've heard at least one report of that make a mess in a user's database. 
If a user wants to make the start/end columns NOT NULL they can, so I 
prefer not to force them.

Continuing to your other email:

On 9/18/23 05:49, jian he wrote:
 > BEGIN;
 > ...
 > ALTER TABLE temporal_fk_rng2rng ALTER CONSTRAINT
 > temporal_fk_rng2rng_fk  DEFERRABLE INITIALLY DEFERRED;
 >
 > delete from temporal_rng; ---should not fail.
 > commit; ---fail in here.

Great catch! This is fixed also.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Вложения

Re: SQL:2011 application time

От

jian he

Дата:

25 сентября 2023 г., 07:52:40

On Wed, Sep 20, 2023 at 10:50 AM Paul Jungwirth
<pj@illuminatedcomputing.com> wrote:
>
> On 9/17/23 20:11, jian he wrote:
> > small issues so far I found, v14.
>
> Thank you again for the review! v15 is attached.
>

hi. some tiny issues.
IN src/backend/utils/adt/ri_triggers.c

else {
appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
pk_only, pkrelname);
}
should change to

else
{
appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
pk_only, pkrelname);
}

----
It would be better, we mention it somewhere:
by default, you can only have a primary key(range_type[...],
range_type WITHOUT OVERLAPS).

preceding without overlaps, all columns (in primary key) data types
only allowed range types.
-------------------------------
The WITHOUT OVERLAPS value must be a range type and is used to
constrain the record's applicability to just that interval (usually a
range of dates or timestamps).

"interval", I think "period" or "range" would be better. I am not sure
we need to mention " must be a range type, not a multi range type".
---------------------------------------------
I just `git apply`, then ran the test, and one test failed. Some minor
changes need to make the test pass.

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

25 сентября 2023 г., 23:20:59

On 9/24/23 21:52, jian he wrote:
> On Wed, Sep 20, 2023 at 10:50 AM Paul Jungwirth
> <pj@illuminatedcomputing.com> wrote:
>>
>> On 9/17/23 20:11, jian he wrote:
>>> small issues so far I found, v14.
>>
>> Thank you again for the review! v15 is attached.
>>
> 
> hi. some tiny issues.

Rebased v16 patches attached.

> IN src/backend/utils/adt/ri_triggers.c
> 
> else {
> appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
> pk_only, pkrelname);
> }
> should change to
> 
> else
> {
> appendStringInfo(&querybuf, "SELECT 1 FROM %s%s x",
> pk_only, pkrelname);
> }

Fixed.

> It would be better, we mention it somewhere:
> by default, you can only have a primary key(range_type[...],
> range_type WITHOUT OVERLAPS).
> 
> preceding without overlaps, all columns (in primary key) data types
> only allowed range types.
> -------------------------------
> The WITHOUT OVERLAPS value must be a range type and is used to
> constrain the record's applicability to just that interval (usually a
> range of dates or timestamps).
> 
> "interval", I think "period" or "range" would be better. I am not sure
> we need to mention " must be a range type, not a multi range type".

I reworked those two paragraphs to incorporate those suggestions and 
hopefully clarify the idea bit further. (I'll revise these again once I 
support multiple WITHOUT OVERLAPS columns.)

> I just `git apply`, then ran the test, and one test failed. Some minor
> changes need to make the test pass.

I couldn't reproduce this. If you're still seeing a failure please let 
me know what you're seeing.

These patches also fix a problem I found with FKs when used with 
btree_gist. Privately I'm using the script below [1] to re-run all my 
tests with that extension and int+range columns. I'd like to add 
something similar to contrib/btree_gist. I'm open to advice how best to 
do that if anyone has any!

[1] #!/bin/bash
set -eu

# without_overlaps

cat ../src/test/regress/sql/without_overlaps.sql | \
   sed -E 's/int4range/integer/g' | \
   sed -E 's/valid_at integer/valid_at int4range/' | \
   sed -E 's/int8range/bigint/g' | \
   sed -E 's/'"'"'\[(-?[[:digit:]]+),\1\]'"'"'/\1/g' | \
   cat > ./sql/without_overlaps.sql

cat ../src/test/regress/expected/without_overlaps.out | \
   sed -E 's/int4range/integer/g' | \
   sed -E 's/valid_at integer/valid_at int4range/' | \
   sed -E 's/incompatible types: integer and tsrange/incompatible types: 
int4range and tsrange/' | \
   sed -E 's/int8range/bigint/g' | \
   sed -E 's/'"'"'\[(-?[[:digit:]]+),\1\]'"'"'/\1/g' | \
   sed -E 's/'"'"'\[(-?[[:digit:]]+),-?[[:digit:]]+\)'"'"'/\1/g' | \
   sed -E 's/\[(-?[[:digit:]]+),\1\]/\1/g' | \
   sed -E 's/\[(-?[[:digit:]]+),-?[[:digit:]]+\)/\1/g' | \
   sed -E 'N;P;s/^ +id [^\n]+\n-+(\+.*)$/----\1/p;D' | \
   sed -E 

's/^----------\+-----------\+-----------\+----------\+---------$/----------+---------+-----------+----------+---------/'

| \
   sed -E 
's/^----\+-------------------------\+--------\+-------$/----+-------------------------+-----+-------/' 
| \
   cat > ./expected/without_overlaps.out

# for_portion_of

cat ../src/test/regress/sql/for_portion_of.sql | \
   sed -E 's/int4range/integer/g' | \
   sed -E 's/valid_at integer/valid_at int4range/' | \
   sed -E 's/'"'"'\[(-?[[:digit:]]+),\1\]'"'"'/\1/g' | \
   cat > ./sql/for_portion_of.sql

cat ../src/test/regress/expected/for_portion_of.out | \
   sed -E 's/int4range/integer/g' | \
   sed -E 's/valid_at integer/valid_at int4range/' | \
   sed -E 's/'"'"'\[(-?[[:digit:]]+),\1\]'"'"'/\1/g' | \
   sed -E 's/'"'"'\[(-?[[:digit:]]+),-?[[:digit:]]+\)'"'"'/\1/g' | \
   sed -E 's/\[(-?[[:digit:]]+),\1\]/\1/g' | \
   sed -E 's/\[(-?[[:digit:]]+),-?[[:digit:]]+\)/\1/g' | \
   sed -E 'N;P;s/^ +id [^\n]+\n-+(\+.*)$/----\1/p;D' | \
   cat > ./expected/for_portion_of.out

Regards,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

hi. also based on v16.
-----------------tests.
drop table if exists for_portion_of_test1;
CREATE unlogged TABLE for_portion_of_test1 (id int4range, valid_at
tsrange,name text );
INSERT INTO for_portion_of_test1 VALUES  ('[1,1]', NULL,
'[1,1]_NULL'),('[1,1]', '(,)', '()_[1,]')
,('[1,1]', 'empty', '[1,1]_empty'),(NULL,NULL, NULL), (nuLL,
'(2018-01-01,2019-01-01)','misc');
--1
UPDATE for_portion_of_test1 FOR PORTION OF valid_at FROM NULL TO NULL
SET name = 'for_portition_NULLtoNULL';
select * from for_portion_of_test1;
--2
UPDATE for_portion_of_test1 FOR PORTION OF valid_at FROM null TO
UNBOUNDED SET name = 'NULL_TO_UNBOUNDED';
select * from for_portion_of_test1;
--3
UPDATE for_portion_of_test1 FOR PORTION OF valid_at FROM UNBOUNDED TO
null SET name = 'UNBOUNDED__TO_NULL';
select * from for_portion_of_test1;
--4
UPDATE for_portion_of_test1 FOR PORTION OF valid_at FROM UNBOUNDED TO
UNBOUNDED SET name = 'UNBOUNDED__TO_UNBOUNDED';
select * from for_portion_of_test1;
------------------------
File: /src/backend/executor/nodeModifyTable.c
1277: oldRange = slot_getattr(oldtupleSlot,
forPortionOf->rangeVar->varattno, &isNull);
1278:
1279: if (isNull)
1280: elog(ERROR, "found a NULL range in a temporal table");
1281: oldRangeType = DatumGetRangeTypeP(oldRange);

I wonder when this isNull will be invoked. the above tests won't
invoke the error.
also the above test, NULL seems equivalent to unbounded. FOR PORTION
OF "from" and "to" both bound should not be null?

which means the following code does not work as intended? I also
cannot find a way to invoke the following elog error branch.
File:src/backend/executor/nodeModifyTable.c
4458: exprState = ExecPrepareExpr((Expr *) forPortionOf->targetRange, estate);
4459: targetRange = ExecEvalExpr(exprState, econtext, &isNull);
4460: if (isNull)
4461: elog(ERROR, "Got a NULL FOR PORTION OF target range");

---------------------------
i also made some changes in the function range_leftover_internal,
ExecForPortionOfLeftovers.
please see the attached patch.

Вложения

Re: SQL:2011 application time

От

jian he

Дата:

25 октября 2023 г., 09:14:34

On Wed, Oct 11, 2023 at 12:47 PM Paul Jungwirth
<pj@illuminatedcomputing.com> wrote:
>
> On 9/25/23 14:00, Peter Eisentraut wrote:
> > Looking through the tests in v16-0001:
> >
> > +-- PK with no columns just WITHOUT OVERLAPS:
> > +CREATE TABLE temporal_rng (
> > +       valid_at tsrange,
> > +       CONSTRAINT temporal_rng_pk PRIMARY KEY (valid_at WITHOUT OVERLAPS)
> > +);
> > +ERROR:  syntax error at or near "WITHOUT"
> > +LINE 3:  CONSTRAINT temporal_rng_pk PRIMARY KEY (valid_at WITHOUT OV...
> > +                                                          ^
> >
> > I think this error is confusing.  The SQL standard requires at least one
> > non-period column in a PK.  I don't know why that is or why we should
> > implement it.  But if we want to implement it, maybe we should enforce
> > that in parse analysis rather than directly in the parser, to be able to
> > produce a more friendly error message.
>
> Okay.
>
> (I think the reason the standard requires one non-period column is to
> identify the "entity". If philosophically the row is an Aristotelian
> proposition about that thing, the period qualifies it as true just
> during some time span. So the scalar part is doing the work that a PK
> conventionally does, and the period part does something else. Perhaps a
> PK/UNIQUE constraint with no scalar part would still be useful, but not
> very often I think, and I'm not sure it makes sense to call it PK/UNIQUE.)
>
> > +-- PK with a range column/PERIOD that isn't there:
> > +CREATE TABLE temporal_rng (
> > +       id INTEGER,
> > +       CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT
> > OVERLAPS)
> > +);
> > +ERROR:  range or PERIOD "valid_at" in WITHOUT OVERLAPS does not exist
> >
> > I think here we should just produce a "column doesn't exist" error
> > message, the same as if the "id" column was invalid.  We don't need to
> > get into the details of what kind of column it should be.  That is done
> > in the next test
>
> I'll change it. The reason for the different wording is that it might
> not be a column at all. It might be a PERIOD. So what about just "column
> or PERIOD doesn't exist"? (Your suggestion is fine too though.)
>
> > +ERROR:  column "valid_at" in WITHOUT OVERLAPS is not a range type
> >
> > Also, in any case it would be nice to have a location pointer here (for
> > both cases).
>
> Agreed.
>

I refactored findNeworOldColumn to better handle error reports.
please check the attached.

Вложения

v1-0001-refactor-findNewOrOldColumn.patch

Re: SQL:2011 application time

От

jian he

Дата:

28 октября 2023 г., 11:25:57

V16 patch  doc/src/sgml/html/sql-createtable.html doc SET NULL description:
`
SET NULL [ ( column_name [, ... ] ) ]
Set all of the referencing columns, or a specified subset of the
referencing columns, to null. A subset of columns can only be
specified for ON DELETE actions.
In a temporal foreign key, the change will use FOR PORTION OF
semantics to constrain the effect to the bounds of the referenced row.
`

I think it means, if the foreign key has PERIOD column[s], then the
PERIOD column[s] will not be set to NULL in {ON DELETE|ON UPDATE}. We
can also use FOR PORTION OF semantics to constrain the effect to the
bounds of the referenced row.
see below demo:


BEGIN;
drop table if exists temporal_rng CASCADE;
drop table if exists temporal_fk_rng2rng CASCADE;
CREATE unlogged TABLE temporal_rng (id int4range,valid_at tsrange);
ALTER TABLE temporal_rng ADD CONSTRAINT temporal_rng_pk PRIMARY KEY
(id, valid_at WITHOUT OVERLAPS);
CREATE unlogged TABLE temporal_fk_rng2rng (id int4range,valid_at
tsrange,parent_id int4range,
CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD valid_at)
REFERENCES temporal_rng (id, PERIOD valid_at) on update set null ON
DELETE SET NULL);

INSERT INTO temporal_rng VALUES ('[11,11]', tsrange('2018-01-01',
'2021-01-01'));
INSERT INTO temporal_fk_rng2rng VALUES ('[7,7]', tsrange('2018-01-01',
'2021-01-01'), '[11,11]');
DELETE FROM temporal_rng WHERE id = '[11,11]';
table temporal_fk_rng2rng;
commit;
-----------------------------------------------------
also
"REFERENCES temporal_rng (id, PERIOD valid_at) ON UPDATE SET NULL ON
DELETE SET NULL)"
is the same as
"REFERENCES temporal_rng (id, PERIOD valid_at) ON UPDATE SET NULL ON
DELETE SET NULL (parent_id)"
in the current implementation.
we might need to change the pg_constraint column "confdelsetcols" description.
-------
the above also applies to SET DEFAULT.


--------------------------------------------------------------------------------------------------------------------------
can you add the following for the sake of code coverage. I think
src/test/regress/sql/without_overlaps.sql can be simplified.

--- common template for test foreign key constraint.
CREATE OR REPLACE PROCEDURE overlap_template()
LANGUAGE SQL
AS $$
DROP TABLE IF EXISTS temporal_rng CASCADE;
DROP TABLE IF EXISTS temporal_fk_rng2rng CASCADE;
CREATE UNLOGGED TABLE temporal_rng (id int4range,valid_at tsrange);
ALTER TABLE temporal_rng
ADD CONSTRAINT temporal_rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS);
CREATE UNLOGGED TABLE temporal_fk_rng2rng (
id int4range,
valid_at tsrange,
parent_id int4range,
CONSTRAINT temporal_fk_rng2rng_fk
FOREIGN KEY (parent_id, PERIOD valid_at)
REFERENCES temporal_rng (id, PERIOD valid_at)
ON UPDATE no action ON DELETE no action
DEFERRABLE
);
$$;
call overlap_template();

--- on update/delete restrict
-- coverage for TRI_FKey_restrict_upd,TRI_FKey_restrict_del.
BEGIN;
ALTER TABLE temporal_fk_rng2rng
DROP CONSTRAINT temporal_fk_rng2rng_fk,
ADD CONSTRAINT  temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD valid_at)
REFERENCES temporal_rng(id,PERIOD valid_at) ON UPDATE RESTRICT ON
DELETE RESTRICT;

INSERT INTO temporal_rng VALUES ('[11,11]', tsrange('2018-01-01',
'2021-01-01'));
INSERT INTO temporal_fk_rng2rng VALUES ('[7,7]', tsrange('2018-01-01',
'2020-01-01'), '[11,11]');
savepoint s;

UPDATE temporal_rng FOR PORTION OF valid_at FROM '2018-01-01' TO '2018-01-03'
SET id = '[9,9]' WHERE id = '[11,11]';
ROLLBACK to s;
delete from  temporal_rng FOR PORTION OF valid_at FROM '2018-01-01' TO
'2020-01-01';
ROLLBACK to s;
--this one should not have error.
delete from  temporal_rng FOR PORTION OF valid_at FROM '2020-01-01' TO
'2021-01-01';
table temporal_rng;
ROLLBACK;

-------------
--- on delete set column list coverage for function tri_set. branch
{if (riinfo->ndelsetcols != 0)}
BEGIN;
ALTER TABLE temporal_fk_rng2rng
DROP CONSTRAINT temporal_fk_rng2rng_fk,
ADD CONSTRAINT  temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD valid_at)
REFERENCES temporal_rng(id,PERIOD valid_at) ON DELETE set default(parent_id);

ALTER TABLE temporal_fk_rng2rng  ALTER COLUMN parent_id SET DEFAULT '[2,2]';
ALTER TABLE temporal_fk_rng2rng  ALTER COLUMN valid_at SET DEFAULT tsrange'(,)';
INSERT INTO temporal_rng VALUES ('[11,11]', tsrange('2018-01-01',
'2021-01-01'));
INSERT INTO temporal_fk_rng2rng VALUES ('[7,7]', tsrange('2018-01-01',
'2020-01-01'), '[11,11]');
insert into temporal_rng values('[2,2]','(,)');
savepoint s;
delete from  temporal_rng FOR PORTION OF valid_at FROM '2018-01-01' TO
'2019-01-01' where id = '[11,11]';
-- delete from  temporal_rng where id = '[11,11]';
table temporal_fk_rng2rng;
rollback;

Re: SQL:2011 application time

От

jian he

Дата:

30 октября 2023 г., 03:00:00

hi.

* The attached patch makes foreign keys with PERIOD fail if any of the
foreign key columns is "generated columns".

* The following queries will cause segmentation fault. not sure the
best way to fix it. the reason
in LINE: numpks = transformColumnNameList(RelationGetRelid(pkrel),
fkconstraint->pk_attrs, pkattnum, pktypoid);
begin;
drop table if exists temporal3,temporal_fk_rng2rng;
CREATE TABLE temporal3 (id int4range,valid_at tsrange,
CONSTRAINT temporal3_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS));
CREATE TABLE temporal_fk_rng2rng (
id int4range,valid_at tsrange,parent_id int4range,
CONSTRAINT temporal_fk_rng2rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS),
CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD valid_at)
REFERENCES temporal3 (id, valid_at)
);

* change the function FindFKComparisonOperators's "eqstrategy"  to
make pg_constraint record correct {conpfeqop,conppeqop,conffeqop}.

* fix the ON DELETE SET NULL/DEFAULT (columnlist). Now the following
queries error will be more consistent.
ALTER TABLE temporal_fk_rng2rng DROP CONSTRAINT temporal_fk_rng2rng_fk,
ADD CONSTRAINT temporal_fk_rng2rng_fk
FOREIGN KEY (parent_id, PERIOD valid_at) REFERENCES temporal_rng
ON DELETE SET DEFAULT(valid_at);
--ON DELETE SET NULL(valid_at);

* refactor restrict_cascading_range function.

* you did if (numfks != numpks) before if (is_temporal) {numfks +=
1;}, So I changed the code order to make the error report more
consistent.

anyway, I put it in one patch. please check the attached.

Вложения

v1-0001-various-small-fixes.patch

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

02 ноября 2023 г., 23:21:37

Thanks for all the feedback! Consolidating several emails below:

 > On Fri, Oct 20, 2023 at 5:45 AM jian he <jian.universality@gmail.com> 
wrote:
 > I don't think we need to check attr->attisdropped here

Changed.

 > "return *typnamespace;" should be "return true"?

No, but I added a comment to clarify.

 > Maybe name it get_typname_and_typnamespace?

I could go either way on this but I left it as-is since it seems 
redundant, and there are other functions here that don't repeat the 
three-letter prefix.

 > you can just `elog(ERROR, "missing range type %s", range_type_name);` ?

No, because this failure happens trying to look up the name.

 > Also, this should be placed just below if 
(!type_is_range(attr->atttypid))?

We ereport there (not elog) because it's a user error (using a 
non-rangetype for the option), not an internal error.

 > periods are always associated with the table, is the above else 
branch correct?

True but I'm following the code just above for OCLASS_CONSTRAINT. Even 
if this case is unexpected, it seems better to handle it gracefully than 
have a harder failure.

 > XXX: Does this hold for periods?
 > Yes. we can add the following 2 sql for code coverage.
 > alter table pt add period for tableoid (ds, de);
 > alter table pt add period for "........pg.dropped.4........" (ds, de);

Added, thanks!

 > On Sun, Oct 22, 2023 at 5:01 PM jian he <jian.universality@gmail.com> 
wrote:
 > drop table if exists for_portion_of_test1;
 > CREATE unlogged TABLE for_portion_of_test1 (id int4range, valid_at
 > tsrange,name text );
 > ...

These are good tests, thanks! Originally FOR PORTION OF required a 
PRIMARY KEY or UNIQUE constraint, so we couldn't find NULLs here, but we 
changed that a while back, so it's good to verify it handles that case.

 > 1279: if (isNull)
 > 1280: elog(ERROR, "found a NULL range in a temporal table");
 > 1281: oldRangeType = DatumGetRangeTypeP(oldRange);
 >
 > I wonder when this isNull will be invoked. The above tests won't
 > invoke the error.

As far as I can tell it shouldn't happen, which is why it's elog. The 
new tests don't hit it because a NULL range should never match the range 
in the FROM+TO of the FOR PORTION OF clause. Maybe this should even be 
an assert, but I think I prefer elog for the nicer error message and 
less-local condition.

 > also the above test, NULL seems equivalent to unbounded. FOR PORTION
 > OF "from and "to" both bound should not be null?

Correct, NULL and UNBOUNDED mean the same thing. This matches the 
meaning of NULL in ranges.

 > which means the following code does not work as intended? I also
 > cannot find a way to invoke the following elog error branch.
 > File:src/backend/executor/nodeModifyTable.c
 > 4458: exprState = ExecPrepareExpr((Expr *) forPortionOf->targetRange, 
estate);
 > 4459: targetRange = ExecEvalExpr(exprState, econtext, &isNull);
 > 4460: if (isNull)
 > 4461: elog(ERROR, "Got a NULL FOR PORTION OF target range");

Here we're checking the "target range", in other words the range built 
from the FROM+TO of the FOR PORTION OF clause---not a range from a 
tuple. Finding a NULL here *for the range itself* would indeed be an 
error. A NULL *bound* means "unbounded", but a NULL *range* should not 
be possible to construct.

 > I also made some changes in the function range_leftover_internal,

I'm not really comfortable with these changes. "Leftover" doesn't refer 
to "left" vs "right" but to what *remains* (what is "left behind") after 
the UPDATE/DELETE. Also r1 and r2 are common parameter names throughout 
the rangetypes.c file, and they are more general than the names you've 
suggested. We shouldn't assume we will only ever call this function from 
the FOR PORTION OF context.

 > ExecForPortionOfLeftovers

Thanks! I've made these code changes (with slight modifications, e.g. no 
need to call ExecFetchSlotHeapTuple if there are no leftovers).

I'm not sure about the comment change though---I want to verify that 
myself (particularly the case when the partition key is updated so we 
have already been routed to a different partition than the old tuple).

 > On Tue, Oct 24, 2023 at 11:14 PM jian he 
<jian.universality@gmail.com> wrote:
 > I refactored findNewOrOldColumn to better handle error reports.

Thanks, I like your changes here. Applied with some small adjustments.

 > On Sat, Oct 28, 2023 at 1:26 AM jian he <jian.universality@gmail.com> 
wrote:
 > I think it means, if the foreign key has PERIOD column[s], then the
 > PERIOD column[s] will not be set to NULL in {ON DELETE|ON UPDATE}. . . .

I reworded this to explain that the PERIOD element will not be set to 
NULL (or the default value).

 > can you add the following for the sake of code coverage. I think
 > src/test/regress/sql/without_overlaps.sql can be simplified.
 > ...
 > call overlaps_template();

I'm not sure I want to add indirection like this to the tests, which I 
think makes them harder to read (and update). But there is indeed a 
tough combinatorial explosion, especially in the foreign key tests. We 
want to cover {ON DELETE,ON UPDATE} {NO ACTION,RESTRICT,CASCADE,SET 
NULL,SET DEFAULT} when {child inserts,child updates,parent 
updates,parent deletes} with {one,two} scalar columns and {,not} 
partitioned. Also ON DELETE SET {NULL,DEFAULT} against only a subset of 
columns. I updated the test cases to delete and re-use the same id 
values, so at least they are more isolated and thus easier to edit. I 
also added tests for `(parent_id1, parent2, PERIOD valid_at)` cases as 
well as `ON DELETE SET {NULL,DEFAULT} (parent_id1)`. (I think that last 
case covers what you are trying to do here, but if I misunderstood 
please let me know.)

I haven't worked through your last email yet, but this seemed like 
enough changes to warrant an update.

New patches attached (rebased to 0bc726d9).

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Вложения

Re: SQL:2011 application time

От

jian he

Дата:

07 ноября 2023 г., 10:07:15

hi. based on v17. I found several doc related issues. previously I
didn't look closely....

+         </para>
+          In a temporal foreign key, the delete/update will use
+          <literal>FOR PORTION OF</literal> semantics to constrain the
+          effect to the bounds being deleted/updated in the referenced row.
+         </para>

The first "para" should be <para> ?
---
There are many warnings after #define WRITE_READ_PARSE_PLAN_TREES
see: http://cfbot.cputube.org/highlights/all.html#4308
Does that mean oue new change in gram.y is somehow wrong?
------
sgml/html/sql-update.html:
"range_or_period_name
The range column or period to use when performing a temporal update.
This must match the range or period used in the table's temporal
primary key."

Is the second sentence unnecessary? since no primary can still do "for
portion of update".

sgml/html/sql-update.html:
"start_time
The earliest time (inclusive) to change in a temporal update. This
must be a value matching the base type of the range or period from
range_or_period_name. It may also be the special value MINVALUE to
indicate an update whose beginning is unbounded."

probably something like the following:
"lower_bound"
The lower bound (inclusive) to change in an overlap update. This must
be a value matching the base type of the range or period from
range_or_period_name. It may also be the special value UNBOUNDED to
indicate an update whose beginning is unbounded."

Obviously the "start_time" reference also needs to change, and the
sql-delete.html reference also needs to change.
----------------------------------
UPDATE for_portion_of_test FOR PORTION OF valid_at  FROM NULL TO
"unbounded" SET name = 'NULL to NULL';
should fail, but not. double quoted unbounded is a column reference, I assume.

That's why I am confused with the function transformForPortionOfBound.
"if (nodeTag(n) == T_ColumnRef)" part.
-----------------------------------
in create_table.sgml. you also need to add  WITHOUT OVERLAPS related
info into <varlistentry id="sql-createtable-parms-unique">

Re: SQL:2011 application time

От

Peter Eisentraut

Дата:

09 ноября 2023 г., 16:47:12

On 02.11.23 21:21, Paul Jungwirth wrote:
> New patches attached (rebased to 0bc726d9).

I went over the patch 
v17-0001-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch in more 
detail.  Attached is a fixup patch that addresses a variety of cosmetic 
issues.

Some details:

- Renamed contemporal to conwithoutoverlaps, as previously discussed. 
Also renamed various variables and function arguments similarly.

- Rearranged text in CREATE TABLE reference page so there are no forward 
references.  (Describe WITHOUT OVERLAPS under UNIQUE and then PRIMARY 
KEy says "see above", rather than describe it under PRIMARY KEY and have 
UNIQUE say "see below.)

- Removed various bits that related to temporal foreign keys, which 
belong in a later patch.

- Reverted some apparently unrelated changes in src/backend/catalog/index.c.

- Removed the "temporal UNIQUE" constraint_type assignment in 
DefineIndex().  This is meant to be used in error messages and should 
refer to actual syntax.  I think it's fine without it this change.

- Field contemporal in NewConstraint struct is not used by this patch.

- Rearrange the grammar so that the rule with WITHOUT OVERLAPS is just a 
Boolean attribute rather than column name plus keywords.  This was kind 
of confusing earlier and led to weird error messages for invalid syntax. 
  I kept the restriction that you need at least one non-overlaps column, 
but that is now enforced in parse analysis, not in the grammar.  (But 
maybe we don't need it?)

(After your earlier explanation, I'm content to just allow one WITHOUT 
OVERLAPS column for now.)

- Some places looked at conexclop to check whether something is a 
WITHOUT OVERLAPS constraint, instead of looking at conwithoutoverlaps 
directly.

- Removed some redundant "unlike" entries in the pg_dump tests.  (This 
caused cfbot tests to fail.)

- Moved the "without_overlaps" test later in the schedule.  It should at 
least be after "constraints" so that normal constraints are tested first.


Two areas that could be improved:

1) In src/backend/commands/indexcmds.c, 
get_index_attr_temporal_operator() has this comment:

+    * This seems like a hack
+    * but I can't find any existing lookup function
+    * that knows about pseudotypes.

This doesn't see very confident. ;-)  I don't quite understand this.  Is 
this a gap in the currently available APIs, do we need to improve 
something here, or does this need more research?

2) In src/backend/parser/parse_utilcmd.c, transformIndexConstraint(), 
there is too much duplication between the normal and the if 
(constraint->without_overlaps) case, like the whole not-null constraints 
stuff at the end.  This should be one code block with a few conditionals 
inside.  Also, the normal case deals with things like table inheritance, 
which the added parts do not.  Is this all complete?

I'm not sure the validateWithoutOverlaps() function is needed at this 
point in the code.  We just need to check that the column exists, which 
the normal code path already does, and then have the index creation code 
later check that an appropriate overlaps operator exists.  We don't even 
need to restrict this to range types.  Consider for example, it's 
possible that a type does not have a btree equality operator.  We don't 
check that here either, but let the index code later check it.


Overall, with these fixes, I think this patch is structurally okay.  We 
just need to make sure we have all the weird corner cases covered.

Вложения

0001-fixup-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraint.patch.nocfbot

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

09 ноября 2023 г., 20:15:31

On 11/9/23 05:47, Peter Eisentraut wrote:
> I went over the patch 
> v17-0001-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch in more 
> detail

Thanks Peter!

I'm about halfway through jian he's last two emails. I'll address your 
feedback also. I wanted to reply to this without waiting though:

> Overall, with these fixes, I think this patch is structurally okay.  We 
> just need to make sure we have all the weird corner cases covered.

One remaining issue I know about is with table partitions whose column 
order has changed. I've got an in-progress fix for that, but I've been 
prioritizing reviewer feedback the last few months. Just want to make 
sure you know about it for now.

Thanks!

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Re: SQL:2011 application time

От

jian he

Дата:

17 ноября 2023 г., 10:12:30

based on v17.

begin;
drop table if exists s1;
CREATE TABLE s1 (id numrange, misc int, misc1 text);
create role  test101 login;
grant update, select  on s1 to test101;
insert into s1 VALUES ('[1,1000]',2);
set session authorization test101;
update s1 set id = '[1,1000]';
savepoint sp1;
update s1 FOR PORTION OF id from 10 to 100 set misc1 = 'test';
table s1;
savepoint sp2;
insert into s1 VALUES ('[2,1000]',12);
rollback;

In UPDATE FOR PORTION OF from x to y, if range [x,y) overlaps with the
"source" range
then the UPDATE action would be UPDATE and INSERT.
The above UPDATE FOR PORTION OF query should fail?
UPDATE FOR PORTION OF, may need insert privilege. We also need to document this.
Similarly, we also need to apply the above logic to DELETE FOR PORTION OF.
-------------------------------------------------------
+  <para>
+   If the table has a <link
linkend="ddl-periods-application-periods">range column
+   or <literal>PERIOD</literal></link>, you may supply a

should be

+ <para>
+  If the table has a range column or  <link
linkend="ddl-periods-application-periods">
+  <literal>PERIOD</literal></link>, you may supply a

similarly the doc/src/sgml/ref/delete.sgml the link reference also broken.
--------------------------------------------------------
  <para>
   If the table has a range column or  <link
linkend="ddl-periods-application-periods">
  <literal>PERIOD</literal></link>, you may supply a
   <literal>FOR PORTION OF</literal> clause, and your update will only
affect rows
   that overlap the given interval. Furthermore, if a row's span extends outside
   the <literal>FOR PORTION OF</literal> bounds, then it will be
truncated to fit
   within the bounds, and new rows spanning the "cut off" duration will be
   inserted to preserve the old values.
  </para>

 "given interval", "cut off" these words,  imho, feel not so clear.
We also need a document that:
 "UPDATE FOR PORTION OF" is UPDATE and INSERT (if overlaps).
If the "UPDATE FOR PORTION OF" range overlaps then
It will invoke triggers in the following order: before row update,
before row insert, after row insert. after row update.
---------------------------------------
src/test/regress/sql/for_portion_of.sql
You only need to create two triggers?
since for_portion_of_trigger only raises notice to output the triggers
meta info.

CREATE TRIGGER trg_for_portion_of_before
  BEFORE INSERT OR UPDATE OR DELETE ON for_portion_of_test
  FOR EACH ROW
  EXECUTE FUNCTION for_portion_of_trigger();
CREATE TRIGGER trg_for_portion_of_after
AFTER INSERT OR UPDATE OR DELETE ON for_portion_of_test
FOR EACH ROW
EXECUTE FUNCTION for_portion_of_trigger();

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

17 ноября 2023 г., 21:39:58

On 11/9/23 05:47, Peter Eisentraut wrote:
> I went over the patch v17-0001-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch in more 
> detail.  Attached is a fixup patch that addresses a variety of cosmetic issues.

Thanks for the review! This all looks great to me, and it's applied in the attached patch (with one 
typo correction in a C comment). The patch addresses some of jian he's feedback too but I'll reply 
to those emails separately.

> Two areas that could be improved:
> 
> 1) In src/backend/commands/indexcmds.c, get_index_attr_temporal_operator() has this comment:
> 
> +    * This seems like a hack
> +    * but I can't find any existing lookup function
> +    * that knows about pseudotypes.
> 
> This doesn't see very confident. ;-)  I don't quite understand this.  Is this a gap in the currently 
> available APIs, do we need to improve something here, or does this need more research?

I've improved this a bit but I'm still concerned about part of it.

First the improved part: I realized I should be calling get_opclass_opfamily_and_input_type first 
and passing the opcintype to get_opfamily_member, which solves the problem of having a concrete 
rangetype but needing an operator that targets anyrange. We do the same thing with partition keys.

But I feel the overall approach is wrong: originally I used hardcoded "=" and "&&" operators, and 
you asked me to look them up by strategy number instead. But that leads to trouble with core gist 
types vs btree_gist types. The core gist opclasses use RT*StrategyNumbers, but btree_gist creates 
opclasses with BT*StrategyNumbers. I don't see any way to ask ahead of time which class of strategy 
numbers are used by a given opclass. So I have code like this:

     *strat = RTEqualStrategyNumber;
     opname = "equality";
     *opid = get_opfamily_member(opfamily, opcintype, opcintype, *strat);

     /*
      * For the non-overlaps key elements,
      * try both RTEqualStrategyNumber and BTEqualStrategyNumber.
      * If you're using btree_gist then you'll need the latter.
      */
     if (!OidIsValid(*opid))
     {
         *strat = BTEqualStrategyNumber;
         *opid = get_opfamily_member(opfamily, opcintype, opcintype, *strat);
     }

I do a similar thing for foreign keys.

But that can't be right. I added a scary comment there in this patch, but I'll explain here too:

It's only by luck that RTEqualStrategyNumber (18) is bigger than any BT*StrategyNumber. If I checked 
in the reverse order, I would always find an operator---it would just sometimes be the wrong one! 
And what if someone has defined a new type+opclass with totally different strategy numbers? As far 
as I can tell, the gist AM doesn't require an opclass have any particular operators, only support 
functions, so the strategy numbers are "private" and can vary between opclasses.

What we want is a way to ask which operators mean equality & overlaps for a given opclass. But the 
strategy numbers aren't meaningful terms to ask the question.

So I think asking for "=" and "&&" is actually better here. Those will be correct for both core & 
btree_gist, and they should also match user expectations for custom types. They are what you'd use 
in a roll-your-own temporal constraint via EXCLUDE. We can also document that we implement WITHOUT 
OVERLAPS with those operator names, so people can get the right behavior from custom types.

(This also maybe lets us implement WITHOUT OVERLAPS for more than rangetypes, as you suggested. See 
below for more about that.)

It's taken me a while to grok the am/opclass/opfamily/amop interaction, and maybe I'm still missing 
something here. Let me know if that's the case!

> 2) In src/backend/parser/parse_utilcmd.c, transformIndexConstraint(), there is too much duplication 
> between the normal and the if (constraint->without_overlaps) case, like the whole not-null 
> constraints stuff at the end.  This should be one code block with a few conditionals inside.  Also, 
> the normal case deals with things like table inheritance, which the added parts do not.  Is this all 
> complete?

Cleaned things up here. I agree it's much better now.

And you're right, now you should be able to use an inherited column in a temporal PK/UQ constraint. 
I think I need a lot more test coverage for how this feature combines with inherited tables, so I'll 
work on that.

> I'm not sure the validateWithoutOverlaps() function is needed at this point in the code.

Agreed, I removed it and moved the is-it-a-rangetype check into the caller.

> We don't even need to 
> restrict this to range types.  Consider for example, it's possible that a type does not have a btree 
> equality operator.  We don't check that here either, but let the index code later check it.

That is very interesting. Perhaps we allow anything with equals and overlaps then?

Note that we need more for FOR PORTION OF, foreign keys, and foreign keys with CASCADE/SET. So it 
might be confusing if a type works with temporal PKs but not those other things. But if we 
documented what operators you need for each feature then you could implement as much as you liked.

I like this direction a lot. It matches what I suggested in the conversation about multiple WITHOUT 
OVERLAPS/PERIOD columns: rather than having foreign keys and FOR PORTION OF know how to find 
n-dimensional "leftovers" we could leave it up the type, and just call a documented operator. (We 
would need to add that operator for rangetypes btw, one that calls range_leftover_internal. It 
should return an array (not a multirange!) of the untouched parts of the record.) This makes it easy 
to support bi/tri/n-temporal, spatial, multiranges, etc.

(For spatial you probably want PostGIS instead, and I'm wary of over-abstracting here, but I like 
how this "leaves the door open" for PostGIS to eventually support spatial PKs/FKs.)

Please let me know what you think!

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Thank you again for such thorough reviews!

On Thu, Nov 16, 2023 at 11:12 PM jian he <jian.universality@gmail.com> wrote:
 > UPDATE FOR PORTION OF, may need insert privilege. We also need to document this.
 > Similarly, we also need to apply the above logic to DELETE FOR PORTION OF.

I don't think UPDATE/DELETE FOR PORTION OF is supposed to require INSERT permission.

Notionally the INSERTs are just to preserve what was there already, not to add new data.
The idea is that a temporal table is equivalent to a table with one row for every "instant",
i.e. one row per microsecond/second/day/whatever-time-resolution. Of course that would be too slow,
so we use PERIODs/ranges instead, but the behavior should be the same. Date's book has a good 
discussion of this idea.

I also checked the SQL:2011 draft standard, and there is a section called Access Rules in Part 2: 
SQL/Foundation for UPDATE and DELETE statements. Those sections say you need UPDATE/DELETE 
privileges, but say nothing about needing INSERT privileges. That is on page 949 and 972 of the PDFs 
from the "SQL:20nn Working Draft Documents" link at [1]. If someone has a copy of SQL:2016 maybe 
something was changed, but I would be surprised.

I also checked MariaDB and IBM DB2, the only two RDBMSes that implement FOR PORTION OF to my 
knowledge. (It is not in Oracle or MSSQL.) I created a table with one row, then gave another user 
privileges to SELECT & UPDATE, but not INSERT. In both cases, that user could execute an UPDATE FOR 
PORTION OF that resulted in new rows, but could not INSERT genuinely new rows. [2,3]

So instead of changing this I've updated the documentation to make it explicit that you do not need 
INSERT privilege to use FOR PORTION OF. I also documented which triggers will fire and in which order.

 > +  <para>
 > +   If the table has a <link
 > linkend="ddl-periods-application-periods">range column
 > +   or <literal>PERIOD</literal></link>, you may supply a
 >
 > should be
 >
 > + <para>
 > +  If the table has a range column or  <link
 > linkend="ddl-periods-application-periods">
 > +  <literal>PERIOD</literal></link>, you may supply a
 >
 > similarly the doc/src/sgml/ref/delete.sgml the link reference also broken.

Okay, changed.

 >  "given interval", "cut off" these words,  imho, feel not so clear.
 > We also need a document that:
 >  "UPDATE FOR PORTION OF" is UPDATE and INSERT (if overlaps).
 > If the "UPDATE FOR PORTION OF" range overlaps then
 > It will invoke triggers in the following order: before row update,
 > before row insert, after row insert. after row update.

Okay, reworked the docs for this.

 > src/test/regress/sql/for_portion_of.sql
 > You only need to create two triggers?
 > since for_portion_of_trigger only raises notice to output the triggers
 > meta info.

Changed.

v19 patch series attached, rebased to a11c9c42ea.

[1] https://web.archive.org/web/20230923221106/https://www.wiscorp.com/SQLStandards.html

[2] MariaDB test:

First create a table as the root user:

```
create table t (id int, ds date, de date, name text, period for valid_at (ds, de));
insert into t values (1, '2000-01-01', '2001-01-01', 'foo');
```

and give another user select & update privlege (but not insert):

```
create database paul;
use paul;
create user 'update_only'@'localhost' identified by 'test';
grant select, update on paul.t to 'update_only'@'localhost';
flush privileges;
```

Now as that user:

```
mysql -uupdate_only -p
use paul;
-- We can update the whole record:
update t for portion of valid_at from '2000-01-01' to '2001-01-01' set name = 'bar';
-- We can update a part of the record:
update t for portion of valid_at from '2000-01-01' to '2000-07-01' set name = 'baz';
select * from t;
+------+------------+------------+------+
| id   | ds         | de         | name |
+------+------------+------------+------+
|    1 | 2000-01-01 | 2000-07-01 | baz  |
|    1 | 2000-07-01 | 2001-01-01 | bar  |
+------+------------+------------+------+
-- We cannot insert:
insert into t values (2, '2000-01-01', '2001-01-01' 'another');
ERROR 1142 (42000): INSERT command denied to user 'update_only'@'localhost' for table `paul`.`t`
```

[3] IBM DB2 test:

```
mkdir ~/local/db2
cd ~/local/db2
tar xzvf ~/Downloads/v11.5.9_linuxx64_server_dec.tar.gz
cd server_dev
./db2_install # should put something at ~/sqllib
source ~/sqllib/db2profile
db2start  # but I got "The database manager is already active."
db2
create database paul -- first time only, note no semicolon
connect to paul
create table t (id integer, ds date not null, de date not null, name varchar(4000), period 
business_time (ds, de));
insert into t values (1, '2000-01-01', '2001-01-01', 'foo');
grant connect on database to user james;
grant select, update on t to user james;
```

Now as james:

```
source ~paul/sqllib/db2profile
db2
connect to paul
select * from paul.t;
update paul.t for portion of business_time from '2000-01-01' to '2000-06-01' set name = 'bar';
DB20000I  The SQL command completed successfully.
select * from paul.t;
insert into paul.t values (2, '2000-01-01', '2001-01-01', 'bar');
DB21034E  The command was processed as an SQL statement because it was not a
valid Command Line Processor command.  During SQL processing it returned:
SQL0551N  The statement failed because the authorization ID does not have the
required authorization or privilege to perform the operation.  Authorization
ID: "JAMES".  Operation: "INSERT". Object: "PAUL.T".  SQLSTATE=42501
```

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

On Sun, Dec 3, 2023 at 2:11 AM Paul Jungwirth
<pj@illuminatedcomputing.com> wrote:
>
> v19 patch series attached, rebased to a11c9c42ea.
>

this TODO:
 * TODO: It sounds like FOR PORTION OF might need to do something here too?
based on comments on ExprContext. I refactor a bit, and solved this TODO.

tring to the following TODO:
// TODO: Need to save context->mtstate->mt_transition_capture? (See
comment on ExecInsert)

but failed.
I also attached the trial, and also added the related test.

You can also use the test to check portion update with insert trigger
with "referencing old table as old_table new table as new_table"
situation.

Вложения

Re: SQL:2011 application time

От

jian he

Дата:

11 декабря 2023 г., 11:31:24

hi. some small issues....

diff --git a/src/backend/tcop/utility.c b/src/backend/tcop/utility.c
index e3ccf6c7f7..6781e55020 100644
--- a/src/backend/tcop/utility.c
+++ b/src/backend/tcop/utility.c
@@ -1560,7 +1560,7 @@ ProcessUtilitySlow(ParseState *pstate,
  true, /* check_rights */
  true, /* check_not_in_use */
  false, /* skip_build */
- false); /* quiet */
+ false); /* quiet */

Is the above part unnecessary?

diff --git a/src/backend/utils/adt/Makefile b/src/backend/utils/adt/Makefile
index 199eae525d..d04c75b398 100644
--- a/src/backend/utils/adt/Makefile
+++ b/src/backend/utils/adt/Makefile
@@ -78,6 +78,7 @@ OBJS = \
  oracle_compat.o \
  orderedsetaggs.o \
  partitionfuncs.o \
+ period.o \
  pg_locale.o \
  pg_lsn.o \
  pg_upgrade_support.o \
diff --git a/src/backend/utils/adt/period.c b/src/backend/utils/adt/period.c
new file mode 100644
index 0000000000..0ed4304e16
--- /dev/null
+++ b/src/backend/utils/adt/period.c
@@ -0,0 +1,56 @@
+/*-------------------------------------------------------------------------
+ *
+ * period.c
+ *   Functions to support periods.
+ *
+ *
+ * Portions Copyright (c) 1996-2022, PostgreSQL Global Development Group
+ * Portions Copyright (c) 1994, Regents of the University of California
+ *
+ *
+ * IDENTIFICATION
+ *   src/backend/utils/adt/period.c
+ *
+ *-------------------------------------------------------------------------
+ */
+#include "postgres.h"
+
+#include "executor/tuptable.h"
+#include "fmgr.h"
+#include "nodes/primnodes.h"
+#include "utils/fmgrprotos.h"
+#include "utils/period.h"
+#include "utils/rangetypes.h"
+
+Datum period_to_range(TupleTableSlot *slot, int startattno, int
endattno, Oid rangetype)
+{
+ Datum startvalue;
+ Datum endvalue;
+ Datum result;
+ bool startisnull;
+ bool endisnull;
+ LOCAL_FCINFO(fcinfo, 2);
+ FmgrInfo flinfo;
+ FuncExpr   *f;
+
+ InitFunctionCallInfoData(*fcinfo, &flinfo, 2, InvalidOid, NULL, NULL);
+ f = makeNode(FuncExpr);
+ f->funcresulttype = rangetype;
+ flinfo.fn_expr = (Node *) f;
+ flinfo.fn_extra = NULL;
+
+ /* compute oldvalue */
+ startvalue = slot_getattr(slot, startattno, &startisnull);
+ endvalue = slot_getattr(slot, endattno, &endisnull);
+
+ fcinfo->args[0].value = startvalue;
+ fcinfo->args[0].isnull = startisnull;
+ fcinfo->args[1].value = endvalue;
+ fcinfo->args[1].isnull = endisnull;
+
+ result = range_constructor2(fcinfo);
+ if (fcinfo->isnull)
+ elog(ERROR, "function %u returned NULL", flinfo.fn_oid);
+
+ return result;
+}

I am confused. so now I only apply v19, 0001 to 0003.
period_to_range function never used. maybe we can move this part to
0005-Add PERIODs.patch?
Also you add change in Makefile in 0003, meson.build change in 0005,
better put it on in 0005?

diff --git a/src/backend/parser/gram.y b/src/backend/parser/gram.y
index 5b110ca7fe..d54d84adf6 100644
--- a/src/backend/parser/gram.y
+++ b/src/backend/parser/gram.y

+/*
+ * We need to handle this shift/reduce conflict:
+ * FOR PORTION OF valid_at FROM INTERVAL YEAR TO MONTH TO foo.
+ * This is basically the classic "dangling else" problem, and we want a
+ * similar resolution: treat the TO as part of the INTERVAL, not as part of
+ * the FROM ... TO .... Users can add parentheses if that's a problem.
+ * TO just needs to be higher precedence than YEAR_P etc.
+ * TODO: I need to figure out a %prec solution before this gets committed!
+ */
+%nonassoc YEAR_P MONTH_P DAY_P HOUR_P MINUTE_P
+%nonassoc TO

this part will never happen?
since "FROM INTERVAL YEAR TO MONTH TO"
means "valid_at" will be interval range data type, which does not exist now.

  ri_PerformCheck(riinfo, &qkey, qplan,
  fk_rel, pk_rel,
  oldslot, NULL,
+ targetRangeParam, targetRange,
  true, /* must detect new rows */
  SPI_OK_SELECT);

@@ -905,6 +922,7 @@ RI_FKey_cascade_del(PG_FUNCTION_ARGS)
  ri_PerformCheck(riinfo, &qkey, qplan,
  fk_rel, pk_rel,
  oldslot, NULL,
+ -1, 0,
  true, /* must detect new rows */
  SPI_OK_DELETE);

@@ -1026,6 +1044,7 @@ RI_FKey_cascade_upd(PG_FUNCTION_ARGS)
  ri_PerformCheck(riinfo, &qkey, qplan,
  fk_rel, pk_rel,
  oldslot, newslot,
+ -1, 0,
  true, /* must detect new rows */
  SPI_OK_UPDATE);

@@ -1258,6 +1277,7 @@ ri_set(TriggerData *trigdata, bool is_set_null,
int tgkind)
  ri_PerformCheck(riinfo, &qkey, qplan,
  fk_rel, pk_rel,
  oldslot, NULL,
+ -1, 0,
  true, /* must detect new rows */
  SPI_OK_UPDATE);

@@ -2520,6 +2540,7 @@ ri_PerformCheck(const RI_ConstraintInfo *riinfo,
  RI_QueryKey *qkey, SPIPlanPtr qplan,
  Relation fk_rel, Relation pk_rel,
  TupleTableSlot *oldslot, TupleTableSlot *newslot,
+ int forPortionOfParam, Datum forPortionOf,
  bool detectNewRows, int expect_OK)

for all the refactor related to ri_PerformCheck, do you need (Datum) 0
instead of plain 0?

+  <para>
+   If the table has a range column or
+   <link linkend="ddl-periods-application-periods"><literal>PERIOD</literal></link>,
+   you may supply a <literal>FOR PORTION OF</literal> clause, and
your delete will
+   only affect rows that overlap the given interval. Furthermore, if
a row's span


https://influentialpoints.com/Training/basic_statistics_ranges.htm#:~:text=A%20range%20is%20two%20numbers,or%20the%20difference%20between%20them
So "range" is more accurate than "interval"?

+/* ----------
+ * ForPortionOfState()
+ *
+ * Copies a ForPortionOfState into the current memory context.
+ */
+static ForPortionOfState *
+CopyForPortionOfState(ForPortionOfState *src)
+{
+ ForPortionOfState *dst = NULL;
+ if (src) {
+ MemoryContext oldctx;
+ RangeType *r;
+ TypeCacheEntry *typcache;
+
+ /*
+ * Need to lift the FOR PORTION OF details into a higher memory context
+ * because cascading foreign key update/deletes can cause triggers to fire
+ * triggers, and the AfterTriggerEvents will outlive the FPO
+ * details of the original query.
+ */
+ oldctx = MemoryContextSwitchTo(TopTransactionContext);

should it be "Copy a ForPortionOfState into the TopTransactionContext"?

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

31 декабря 2023 г., 11:51:49

On Wed, Dec 6, 2023 at 12:59 AM Peter Eisentraut <peter@eisentraut.org> wrote:
 >
 > On 02.12.23 19:41, Paul Jungwirth wrote:
 > > So what do you think of this idea instead?:
 > >
 > > We could add a new (optional) support function to GiST that translates
 > > "well-known" strategy numbers into the opclass's own strategy numbers.
 >
 > I had some conversations about this behind the scenes.  I think this
 > idea makes sense.

Here is a patch series with the GiST stratnum support function added. I put this into a separate 
patch (before all the temporal ones), so it's easier to review. Then in the PK patch (now #2) we 
call that function to figure out the = and && operators. I think this is a big improvement.

I provide a couple "example" implementations:

- An identity function that returns whatever you give it. The core gist opclasses use this since 
they use the RT* constants. Even though not all opclasses support all strategies, it is okay to 
return a stratnum with no amop entry. You will just get an error when you try to make a temporal PK 
with that type as the WITHOUT OVERLAPS part (which is natural for the types we're talking about).

- A function that translates RT*StrategyNumbers to BT*StrategyNumbers when possible (just 
=/</<=/>/>=, and we really only need =). This is what the btree_gist opclasses use. (No other 
RT*StrategyNumber can be translated, which means you can only use these types for the non-WIHOUT 
OVERLAPS part, but again that is natural.)

I didn't add a similar function to GIN. It's not possible to use GIN for temporal PKs, so I don't 
think it makes sense.

## Foreign Keys

For FKs, I need a couple similar things:

- The ContainedBy operator (<@ for rangetypes).
- An aggregate function to combine referenced rows (instead of hardcoding range_agg as before).

I look up ContainedBy just as I'm doing with Equal & Overlap for PKs. The aggregate function is 
another optional support function.

I broke out that support function into another independent patch here. Then I updated by FKs patch 
to use it (and the ContainedBy operator).

## FOR PORTION OF

Then for FOR PORTION OF I need an intersect operator (*) and a new "leftovers" operator.

We have an intersect operator (for range & multirange at least), but no strategy number for it, thus 
no amop entry. My patch adds that, **but** it is neither a search operator ('s') nor ordering ('o'), 
so I've added a "portion" option ('p'). I'm not sure this is completely valid, since `FOR PORTION 
OF` is not really an *index* operation, but it does seem index-adjacent: you would only/usually use 
it on something with a temporal PK (which is an index). And it is an analogous situation, where 
pg_amop entries tell us how to implement the extensible parts. But if all this seems like the wrong 
approach please let me know.

The new leftovers operator similarly has 'p' for amoppurpose and another amop entry.

The leftovers operator returns an array of T, where T is the type of the valid_at column. Then we'll 
insert a new "leftovers" row for each array entry. So we aren't assuming only "before" and "after" 
(which doesn't work for multiranges or two-dimensional spaces as you'd have with bitemporal or spatial).

But now that "leftovers" are becoming more of an external-facing part of Postgres, I feel we should 
have a less silly name. (That's too bad, because "leftovers" is memorable and non-ambiguous, and 
computer pioneers used all kinds of silly names, so if you tell me I don't have to be quite so 
professional, maybe I'll go back to it.) I considered things like "without" or "multi-subtract" or 
"except". I went with "without portion", which is nice because it evokes FOR PORTION OF and doesn't 
limit the scope to rangetypes.

For the symbol I like `@-`. It conveys the similarity to subtraction, and "@" can be a mnemonic for 
"array". (Too bad we can't use `--`, ha ha.) I also thought about `@-@`, but that is used already by 
path_length and lseg_length, and maybe a non-commutative operator deserves a non-palindromic name.

The {multi,}range_without_portion procs & operators are broken out into a separate commit, and the 
FPO patch now uses them in the exec node. It always made me a little uneasy to have rangetype code 
in nodeModifyTable.c, and now it's gone.

Then the last thing I need for FOR PORTION OF is a "constructor". In SQL:2011 you use `FOR PORTION 
OF valid_at FROM '2000-01-01' TO '2010-01-01'`. But FROM and TO don't really work for non-range 
types. So I added an alternate syntax that is `FOR PORTION OF valid_at 
(tsmultirange(tsrange('2001-01-01', '2002-02-02'), tsrange('2003-03-03', '2004-04-04')))`. In other 
words parens wrapping a value of the type you're using. I still support FROM & TO for building a 
range type, so we follow the standard.

That's it for now. Multiranges should be fully supported (but need lots more tests), as well as 
custom types. I've updated some of the docs, but I need to go through them and clarify where things 
don't necessarily have to be ranges.

Rebased to cb44a8345e.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Вложения

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

02 января, 04:59:47

On 12/31/23 00:51, Paul Jungwirth wrote:
> That's it for now.

Here is another update. I fixed FOR PORTION OF on partitioned tables, in particular when the attnums 
are different from the root partition.

Rebased to cea89c93a1.

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Вложения

Re: SQL:2011 application time

От

jian he

Дата:

05 января, 08:06:28

On Tue, Jan 2, 2024 at 9:59 AM Paul Jungwirth
<pj@illuminatedcomputing.com> wrote:
>
> On 12/31/23 00:51, Paul Jungwirth wrote:
> > That's it for now.
>
> Here is another update. I fixed FOR PORTION OF on partitioned tables, in particular when the attnums
> are different from the root partition.
>
> Rebased to cea89c93a1.
>

Hi.

+/*
+ * range_without_portion_internal - Sets outputs and outputn to the ranges
+ * remaining and their count (respectively) after subtracting r2 from r1.
+ * The array should never contain empty ranges.
+ * The outputs will be ordered. We expect that outputs is an array of
+ * RangeType pointers, already allocated with two slots.
+ */
+void
+range_without_portion_internal(TypeCacheEntry *typcache, RangeType *r1,
+   RangeType *r2, RangeType **outputs, int *outputn)
+{
+ int cmp_l1l2,
+ cmp_l1u2,
+ cmp_u1l2,
+ cmp_u1u2;
+ RangeBound lower1,
+ lower2;
+ RangeBound upper1,
+ upper2;
+ bool empty1,
+ empty2;
+
+ range_deserialize(typcache, r1, &lower1, &upper1, &empty1);
+ range_deserialize(typcache, r2, &lower2, &upper2, &empty2);
+
+ if (empty1)
+ {
+ /* if r1 is empty then r1 - r2 is empty, so return zero results */
+ *outputn = 0;
+ return;
+ }
+ else if (empty2)
+ {
+ /* r2 is empty so the result is just r1 (which we know is not empty) */
+ outputs[0] = r1;
+ *outputn = 1;
+ return;
+ }
+
+ /*
+ * Use the same logic as range_minus_internal,
+ * but support the split case
+ */
+ cmp_l1l2 = range_cmp_bounds(typcache, &lower1, &lower2);
+ cmp_l1u2 = range_cmp_bounds(typcache, &lower1, &upper2);
+ cmp_u1l2 = range_cmp_bounds(typcache, &upper1, &lower2);
+ cmp_u1u2 = range_cmp_bounds(typcache, &upper1, &upper2);
+
+ if (cmp_l1l2 < 0 && cmp_u1u2 > 0)
+ {
+ lower2.inclusive = !lower2.inclusive;
+ lower2.lower = false; /* it will become the upper bound */
+ outputs[0] = make_range(typcache, &lower1, &lower2, false, NULL);
+
+ upper2.inclusive = !upper2.inclusive;
+ upper2.lower = true; /* it will become the lower bound */
+ outputs[1] = make_range(typcache, &upper2, &upper1, false, NULL);
+
+ *outputn = 2;
+ }
+ else if (cmp_l1u2 > 0 || cmp_u1l2 < 0)
+ {
+ outputs[0] = r1;
+ *outputn = 1;
+ }
+ else if (cmp_l1l2 >= 0 && cmp_u1u2 <= 0)
+ {
+ *outputn = 0;
+ }
+ else if (cmp_l1l2 <= 0 && cmp_u1l2 >= 0 && cmp_u1u2 <= 0)
+ {
+ lower2.inclusive = !lower2.inclusive;
+ lower2.lower = false; /* it will become the upper bound */
+ outputs[0] = make_range(typcache, &lower1, &lower2, false, NULL);
+ *outputn = 1;
+ }
+ else if (cmp_l1l2 >= 0 && cmp_u1u2 >= 0 && cmp_l1u2 <= 0)
+ {
+ upper2.inclusive = !upper2.inclusive;
+ upper2.lower = true; /* it will become the lower bound */
+ outputs[0] = make_range(typcache, &upper2, &upper1, false, NULL);
+ *outputn = 1;
+ }
+ else
+ {
+ elog(ERROR, "unexpected case in range_without_portion");
+ }
+}

I am confused.
say condition: " (cmp_l1l2 <= 0 && cmp_u1l2 >= 0 && cmp_u1u2 <= 0)"
the following code will only run PartA, never run PartB?

`
else if (cmp_l1l2 >= 0 && cmp_u1u2 <= 0)
    PartA
else if (cmp_l1l2 <= 0 && cmp_u1l2 >= 0 && cmp_u1u2 <= 0)
    PartB
`

minimum example:
#include<stdio.h>
#include<string.h>
#include<stdlib.h>
#include<assert.h>
int
main(void)
{
    int cmp_l1l2;
    int cmp_u1u2;
    int cmp_u1l2;
    int cmp_l1u2;
    cmp_l1u2 = -1;
    cmp_l1l2 = 0;
    cmp_u1u2 = 0;
    cmp_u1l2 = 0;
    assert(cmp_u1l2 == 0);
if (cmp_l1u2 > 0 || cmp_u1l2 < 0)
        printf("calling partA\n");
    else if (cmp_l1l2 >= 0 && cmp_u1u2 <= 0)
        printf("calling partB\n");
    else if (cmp_l1l2 <= 0 && cmp_u1l2 >= 0 && cmp_u1u2 <= 0)
        printf("calling partC\n");
}

I am confused with the name "range_without_portion", I think
"range_not_overlap" would be better.

select numrange(1.1, 2.2) @- numrange(2.0, 3.0);
the result is not the same as
select numrange(2.0, 3.0) @- numrange(1.1, 2.2);

So your categorize oprkind as 'b' for operator "@-" is wrong?
select oprname,oprkind,oprcanhash,oprcanmerge,oprleft,oprright,oprresult,oprcode
from pg_operator
where oprname = '@-';

aslo
select count(*), oprkind from pg_operator group by oprkind;
there are only 5% are prefix operators.
maybe we should design it as:
1. if both inputs are empty range, the result array is empty.
2. if both inputs are non-empty and never overlaps, put both of them
to the result array.
3. if one input is empty another one is not, then put the non-empty
one into the result array.

after applying the patch: now the catalog data seems not correct to me.
SELECT  a1.amopfamily
        ,a1.amoplefttype::regtype
        ,a1.amoprighttype
        ,a1.amopstrategy
        ,amoppurpose
        ,amopsortfamily
        ,amopopr
        ,op.oprname
        ,am.amname
FROM    pg_amop as a1 join pg_operator op on op.oid = a1.amopopr
join    pg_am   am on am.oid = a1.amopmethod
where   amoppurpose = 'p';
output:
 amopfamily | amoplefttype  | amoprighttype | amopstrategy |
amoppurpose | amopsortfamily | amopopr | oprname | amname
------------+---------------+---------------+--------------+-------------+----------------+---------+---------+--------
       2593 | box           |           603 |           31 | p
  |              0 |     803 | #       | gist
       3919 | anyrange      |          3831 |           31 | p
  |              0 |    3900 | *       | gist
       6158 | anymultirange |          4537 |           31 | p
  |              0 |    4394 | *       | gist
       3919 | anyrange      |          3831 |           32 | p
  |              0 |    8747 | @-      | gist
       6158 | anymultirange |          4537 |           32 | p
  |              0 |    8407 | @-      | gist
(5 rows)

select  oprcode, oprname, oprleft::regtype
from    pg_operator opr
where   opr.oprname in ('#','*','@-')
and     oprleft = oprright
and     oprleft in (603,3831,4537);
output:

          oprcode           | oprname |    oprleft
----------------------------+---------+---------------
 box_intersect              | #       | box
 range_intersect            | *       | anyrange
 multirange_intersect       | *       | anymultirange
 range_without_portion      | @-      | anyrange
 multirange_without_portion | @-      | anymultirange
(5 rows)

should amoppurpose = 'p' is true apply to ' @-' operator?

catalog-pg-amop.html:
`
amopsortfamily oid (references pg_opfamily.oid):
The B-tree operator family this entry sorts according to, if an
ordering operator; zero if a search operator
`
you should also update the above entry, the amopsortfamily is also
zero for "portion operator" for the newly implemented "portion
operator".


v21-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch
 create mode 100644 src/backend/utils/adt/period.c
 create mode 100644 src/include/utils/period.h
you should put these two files to v21-0008-Add-PERIODs.patch.
it's not related to that patch, it also makes people easy to review.

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

06 января, 03:19:53

Getting caught up on reviews from November and December:

On 11/19/23 22:57, jian he wrote:
 >
 > I believe the following part should fail. Similar tests on
 > src/test/regress/sql/generated.sql. line begin 347.
 >
 > drop table if exists gtest23a,gtest23x cascade;
 > CREATE TABLE gtest23a (x int4range, y int4range,
 > CONSTRAINT gtest23a_pk PRIMARY KEY (x, y WITHOUT OVERLAPS));
 > CREATE TABLE gtest23x (a int4range, b int4range GENERATED ALWAYS AS
 > ('empty') STORED,
 > FOREIGN KEY (a, PERIOD b ) REFERENCES gtest23a(x, PERIOD y) ON UPDATE
 > CASCADE);  -- should be error?

Okay, I've added a restriction for temporal FKs too. But note this will
change once the PERIODs patch (the last one here) is finished. When the
generated column is for a PERIOD, there will be logic to "reroute" the
updates to the constituent start/end columns instead.

 > begin;
 > drop table if exists fk, pk cascade;
 > CREATE TABLE pk (id int4range, valid_at int4range,
 > CONSTRAINT pk_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
 > );
 > CREATE TABLE fk (
 > id int4range,valid_at tsrange, parent_id int4range,
 > CONSTRAINT fk FOREIGN KEY (parent_id, valid_at)
 >      REFERENCES pk
 > );
 > rollback;
 > --
 > the above query will return an error: number of referencing and
 > referenced columns for foreign key disagree.
 > but if you look at it closely, primary key and foreign key columns both are two!
 > The error should be saying valid_at should be specified with "PERIOD".

Ah okay, thanks for the clarification! This is tricky because the user
left out the PERIOD on the fk side, and left out the entire pk side, so
those columns are just implicit. So there is no PERIOD anywhere.
But I agree that if the pk has WITHOUT OVERLAPS, we should expect a
corresponding PERIOD modifier on the fk side and explain that that's
what's missing. The attached patches include that.

 > I found out other issues in v18.
 > I first do `git apply` then  `git diff --check`, there is a white
 > space error in v18-0005.

Fixed, thanks!

 > You also need to change update.sgml and delete.sgml <title>Outputs</title> part.
 > Since at most, it can return 'UPDATE 3' or 'DELETE 3'.

This doesn't sound correct to me. An UPDATE or DELETE can target many
rows. Also I don't think the inserted "leftovers" should be included in
these counts. They represent the rows updated/deleted.

 > --the following query should work?
 > drop table pk;
 > CREATE table pk(a numrange PRIMARY key,b text);
 > insert into pk values('[1,10]');
 > create or replace function demo1() returns void as $$
 > declare lb numeric default 1; up numeric default 3;
 > begin
 >      update pk for portion of a from lb to up set b = 'lb_to_up';
 >      return;
 > end
 > $$ language plpgsql;
 > select * from demo1();

Hmm this is a tough one. It is correct that the `FROM __ TO __` values cannot be column references. 
They are computed up front, not per row. One reason is they are used to search the table. In fact 
the standard basically allows nothing but literal strings here. See section 14.14, page 971 then 
look up <point in time> on page 348 and <datetime value expression> on page 308. The most 
flexibility you get is you can add/subtract an interval to the datetime literal. We are already well 
past that by allowing expressions, (certain) functions, parameters, etc.

OTOH in your plpgsql example they are not really columns. They just get represented as ColumnRefs 
and then passed to transformColumnRef. I'm surprised plpgsql does it that way. As a workaround you 
could use `EXECUTE format(...)`, but I'd love to make that work as you show instead. I'll keep 
working on this one but it's not done yet. Perhaps I can move the restriction into 
analysis/planning. If anyone has any advice it'd be welcome.

On 12/6/23 05:22, jian he wrote:
 > this TODO:
 >   * TODO: It sounds like FOR PORTION OF might need to do something here too?
 > based on comments on ExprContext. I refactor a bit, and solved this TODO.

The patch looks wrong to me. We need to range targeted by `FROM __
TO __` to live for the whole statement, not just one tuple (see just
above). That's why it gets computed in the Init function node.

I don't think that TODO is needed anymore at all. Older versions of the
patch had more expressions besides this one, and I think it was those I
was concerned about. So I've removed the comment here.

 > tring to the following TODO:
 > // TODO: Need to save context->mtstate->mt_transition_capture? (See
 > comment on ExecInsert)
 >
 > but failed.
 > I also attached the trial, and also added the related test.
 >
 > You can also use the test to check portion update with insert trigger
 > with "referencing old table as old_table new table as new_table"
 > situation.

Thank you for the test case! This is very helpful. So the problem is
`referencing new table as new_table` gets lost. I don't have a fix yet
but I'll work on it.

On 12/11/23 00:31, jian he wrote:
 > - false); /* quiet */
 > + false); /* quiet */
 >
 > Is the above part unnecessary?

Good catch! Fixed.

 > I am confused. so now I only apply v19, 0001 to 0003.
 > period_to_range function never used. maybe we can move this part to
 > 0005-Add PERIODs.patch?
 > Also you add change in Makefile in 0003, meson.build change in 0005,
 > better put it on in 0005?

You're right, those changes should have been in the PERIODs patch. Moved.

 > +/*
 > + * We need to handle this shift/reduce conflict:
 > + * FOR PORTION OF valid_at FROM INTERVAL YEAR TO MONTH TO foo.
 > + * This is basically the classic "dangling else" problem, and we want a
 > + * similar resolution: treat the TO as part of the INTERVAL, not as part of
 > + * the FROM ... TO .... Users can add parentheses if that's a problem.
 > + * TO just needs to be higher precedence than YEAR_P etc.
 > + * TODO: I need to figure out a %prec solution before this gets committed!
 > + */
 > +%nonassoc YEAR_P MONTH_P DAY_P HOUR_P MINUTE_P
 > +%nonassoc TO
 >
 > this part will never happen?
 > since "FROM INTERVAL YEAR TO MONTH TO"
 > means "valid_at" will be interval range data type, which does not exist now.

It appears still needed to me. Without those lines I get 4 shift/reduce
conflicts. Are you seeing something different? Or if you have a better
solution I'd love to add it. I definitely need to fix this before that
patch gets applied.

 > for all the refactor related to ri_PerformCheck, do you need (Datum) 0
 > instead of plain 0?

Casts added.

 > +  <para>
 > +   If the table has a range column or
 > +   <link linkend="ddl-periods-application-periods"><literal>PERIOD</literal></link>,
 > +   you may supply a <literal>FOR PORTION OF</literal> clause, and
 > your delete will
 > +   only affect rows that overlap the given interval. Furthermore, if
 > a row's span
 >
 > 

https://influentialpoints.com/Training/basic_statistics_ranges.htm#:~:text=A%20range%20is%20two%20numbers,or%20the%20difference%20between%20them
 > So "range" is more accurate than "interval"?

I don't think we should be using R to define the terms "range" and
"interval", which both already have meanings in Postgres, SQL, and the
literature for temporal databases. But I'm planning to revise the docs'
terminology here anyway. Some temporal database texts use "interval"
in this sense, and I thought it was a decent term to mean "range or
PERIOD". But now we need something to mean "range or multirange or
custom type or PERIOD". Actually "portion" seems like maybe the best
term, since the SQL syntax `FOR PORTION OF` reinforces that term. If you
have suggestions I'm happy for ideas.

 > +/* ----------
 > + * ForPortionOfState()
 > + *
 > + * Copies a ForPortionOfState into the current memory context.
 > + */
 > +static ForPortionOfState *
 > +CopyForPortionOfState(ForPortionOfState *src)
 > +{
 > + ForPortionOfState *dst = NULL;
 > + if (src) {
 > + MemoryContext oldctx;
 > + RangeType *r;
 > + TypeCacheEntry *typcache;
 > +
 > + /*
 > + * Need to lift the FOR PORTION OF details into a higher memory context
 > + * because cascading foreign key update/deletes can cause triggers to fire
 > + * triggers, and the AfterTriggerEvents will outlive the FPO
 > + * details of the original query.
 > + */
 > + oldctx = MemoryContextSwitchTo(TopTransactionContext);
 >
 > should it be "Copy a ForPortionOfState into the TopTransactionContext"?

You're right, the other function comments here use imperative mood. Changed.

New patches attached, rebased to 43b46aae12. I'll work on your feedback from Jan 4 next. Thanks!

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

On Sat, Jan 6, 2024 at 8:20 AM Paul Jungwirth
<pj@illuminatedcomputing.com> wrote:
>
> Getting caught up on reviews from November and December:
>
>
> New patches attached, rebased to 43b46aae12. I'll work on your feedback from Jan 4 next. Thanks!
>

+/*
+ * ForPortionOfClause
+ * representation of FOR PORTION OF <period-name> FROM <ts> TO <te>
+ * or FOR PORTION OF <period-name> (<target>)
+ */
+typedef struct ForPortionOfClause
+{
+ NodeTag type;
+ char   *range_name;
+ int range_name_location;
+ Node   *target;
+ Node   *target_start;
+ Node   *target_end;
+} ForPortionOfClause;

"range_name_location" can be just "location"?
generally most of the struct put the "location" to the last field in the struct.
(that's the pattern I found all over other code)

+ if (isUpdate)
+ {
+ /*
+ * Now make sure we update the start/end time of the record.
+ * For a range col (r) this is `r = r * targetRange`.
+ */
+ Expr *rangeSetExpr;
+ TargetEntry *tle;
+
+ strat = RTIntersectStrategyNumber;
+ GetOperatorFromCanonicalStrategy(opclass, InvalidOid, "intersects",
"FOR PORTION OF", &opid, &strat);
+ rangeSetExpr = (Expr *) makeSimpleA_Expr(AEXPR_OP, get_opname(opid),
+ (Node *) copyObject(rangeVar), targetExpr,
+ forPortionOf->range_name_location);
+ rangeSetExpr = (Expr *) transformExpr(pstate, (Node *) rangeSetExpr,
EXPR_KIND_UPDATE_PORTION);
+
+ /* Make a TLE to set the range column */
+ result->rangeSet = NIL;
+ tle = makeTargetEntry(rangeSetExpr, range_attno, range_name, false);
+ result->rangeSet = lappend(result->rangeSet, tle);
+
+ /* Mark the range column as requiring update permissions */
+ target_perminfo->updatedCols = bms_add_member(target_perminfo->updatedCols,
+  range_attno - FirstLowInvalidHeapAttributeNumber);
+ }
+ else
+ result->rangeSet = NIL;
I think the name "rangeSet" is misleading, since "set" is generally
related to a set of records.
but here it's more about the "range intersect".

in ExecDelete
we have following code pattern:
ExecDeleteEpilogue(context, resultRelInfo, tupleid, oldtuple, changingPart);
if (processReturning && resultRelInfo->ri_projectReturning)
{
....
if (!table_tuple_fetch_row_version(resultRelationDesc, tupleid,
   SnapshotAny, slot))
elog(ERROR, "failed to fetch deleted tuple for DELETE RETURNING");
}
}

but the ExecForPortionOfLeftovers is inside ExecDeleteEpilogue.
meaning even without ExecForPortionOfLeftovers, we can still call
table_tuple_fetch_row_version
also if it was *not* concurrently updated, then our current process
holds the lock until the ending of the transaction, i think.
So the following TODO is unnecessary?

+ /*
+ * Get the range of the old pre-UPDATE/DELETE tuple,
+ * so we can intersect it with the FOR PORTION OF target
+ * and see if there are any "leftovers" to insert.
+ *
+ * We have already locked the tuple in ExecUpdate/ExecDelete
+ * (TODO: if it was *not* concurrently updated, does
table_tuple_update lock the tuple itself?
+ * I don't found the code for that yet, and maybe it depends on the AM?)
+ * and it has passed EvalPlanQual.
+ * Make sure we're looking at the most recent version.
+ * Otherwise concurrent updates of the same tuple in READ COMMITTED
+ * could insert conflicting "leftovers".
+ */
+ if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
tupleid, SnapshotAny, oldtupleSlot))
+ elog(ERROR, "failed to fetch tuple for FOR PORTION OF");
+

+/* ----------------------------------------------------------------
+ * ExecForPortionOfLeftovers
+ *
+ * Insert tuples for the untouched timestamp of a row in a FOR
+ * PORTION OF UPDATE/DELETE
+ * ----------------------------------------------------------------
+ */
+static void
+ExecForPortionOfLeftovers(ModifyTableContext *context,
+   EState *estate,
+   ResultRelInfo *resultRelInfo,
+   ItemPointer tupleid)

maybe change the comment to
"Insert tuples for the not intersection of a row in a FOR PORTION OF
UPDATE/DELETE."

+ deconstruct_array(DatumGetArrayTypeP(allLeftovers),
typcache->type_id, typcache->typlen,
+   typcache->typbyval, typcache->typalign, &leftovers, NULL, &nleftovers);
+
+ if (nleftovers > 0)
+ {
I think add something like assert nleftovers >=0 && nleftovers <= 2
(assume only range not multirange) would improve readability.


+  <para>
+   If the table has a range column or
+   <link linkend="ddl-periods-application-periods"><literal>PERIOD</literal></link>,
+   you may supply a <literal>FOR PORTION OF</literal> clause, and
your delete will
+   only affect rows that overlap the given interval. Furthermore, if
a row's span
+   extends outside the <literal>FOR PORTION OF</literal> bounds, then
your delete
+   will only change the span within those bounds. In effect you are
deleting any
+   moment targeted by <literal>FOR PORTION OF</literal> and no moments outside.
+  </para>
+
+  <para>
+   Specifically, after <productname>PostgreSQL</productname> deletes
the existing row,
+   it will <literal>INSERT</literal>
+   new rows whose range or start/end column(s) receive the remaining
span outside
+   the targeted bounds, containing the original values in other columns.
+   There will be zero to two inserted records,
+   depending on whether the original span extended before the targeted
+   <literal>FROM</literal>, after the targeted <literal>TO</literal>,
both, or neither.
+  </para>
+
+  <para>
+   These secondary inserts fire <literal>INSERT</literal> triggers. First
+   <literal>BEFORE DELETE</literal> triggers first, then
+   <literal>BEFORE INSERT</literal>, then <literal>AFTER INSERT</literal>,
+   then <literal>AFTER DELETE</literal>.
+  </para>
+
+  <para>
+   These secondary inserts do not require <literal>INSERT</literal>
privilege on the table.
+   This is because conceptually no new information has been added.
The inserted rows only preserve
+   existing data about the untargeted time period. Note this may
result in users firing <literal>INSERT</literal>
+   triggers who don't have insert privileges, so be careful about
<literal>SECURITY DEFINER</literal> trigger functions!
+  </para>

I think you need to wrap them into a big paragraph, otherwise they
lose the context?
please see the attached build sql-update.html.

also I think
+   <link linkend="ddl-periods-application-periods"><literal>PERIOD</literal></link>,
should shove into Add-PERIODs.patch.

otherwise you cannot build  Add-UPDATE-DELETE-FOR-PORTION-OF.patch
without all the patches.
I think the "FOR-PORTION-OF" feature is kind of independ?
Because, IMHO, "for portion" is a range datum interacting with another
single range datum, but the primary key with  "WITHOUT OVERLAPS", is
range datum interacting with a set of range datums.
now I cannot  just git apply v22-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch.
That maybe would make it more difficult to get commited?

Вложения

Screenshot from 2024-01-08 11-09-50.png

Re: SQL:2011 application time

От

jian he

Дата:

09 января, 08:33:33

On Tue, Jan 9, 2024 at 2:54 AM Paul Jungwirth
<pj@illuminatedcomputing.com> wrote:
>
> On 1/8/24 06:54, jian he wrote:
>  > On Fri, Jan 5, 2024 at 1:06 PM jian he <jian.universality@gmail.com> wrote:
>  >
>  > range_intersect returns the intersection of two ranges.
>  > I think here we are doing the opposite.
>  > names the main SQL function "range_not_intersect" and the internal
>  > function as "range_not_intersect_internal" should be fine.
>  > so people don't need to understand the meaning of "portion".
>
> Thank you for helping me figure out a name here! I realize that can be a bike-sheddy kind of
> discussion, so let me share some of my principles.
>
> Range and multirange are highly mathematically "pure", and that's something I value in them. It
> makes them more general-purpose, less encumbered by edge cases, easier to combine, and easier to
> reason about. Preserving that close connection to math is a big goal.
>
> What I've called `without_portion` is (like) a closed form of minus (hence `@-` for the operator).
> Minus isn't closed under everything (e.g. ranges), so `without_portion` adds arrays---much as to
> close subtraction we add negative numbers and to close division we add rationals). We get the same
> effect from multiranges, but that only buys us range support. It would be awesome to support
> arbitrary types: ranges, multiranges, mdranges, boxes, polygons, inets, etc., so I think an array is
> the way to go here. And then each array element is a "leftover". What do we call a closed form of
> minus that returns arrays?
>
> Of course "minus" is already taken (and you wouldn't expect it to return arrays anyway), which is
> why I'm thinking about names like "without" or "except". Or maybe "multi-minus". I still think
> "without portion" is the closest to capturing everything above (and avoids ambiguity with other SQL
> operations). And the "portion" ties the operator to `FOR PORTION OF`, which is its purpose. But I
> wouldn't be surprised if there were something better.
>

Thanks for the deep explanation. I think the name
range_without_portion is better than my range_not_intersect.
I learned a lot.
I also googled " bike-sheddy". haha.

src5=# select range_without_portion(numrange(1.0,3.0,'[]'),
numrange(1.5,2.0,'(]'));
   range_without_portion
---------------------------
 {"[1.0,1.5]","(2.0,3.0]"}
(1 row)

src5=# \gdesc
        Column         |   Type
-----------------------+-----------
 range_without_portion | numeric[]
(1 row)

src5=# \df range_without_portion
                                 List of functions
   Schema   |         Name          | Result data type | Argument data
types | Type
------------+-----------------------+------------------+---------------------+------
 pg_catalog | range_without_portion | anyarray         | anyrange,
anyrange  | func
(1 row)

so apparently, you cannot from (anyrange, anyrange) get anyarray the
element type is anyrange.
I cannot find the documented explanation in
https://www.postgresql.org/docs/current/extend-type-system.html#EXTEND-TYPES-POLYMORPHIC

anyrange is POLYMORPHIC, anyarray is POLYMORPHIC,
but I suppose, getting an anyarray the element type is anyrange would be hard.

Re: SQL:2011 application time

От

vignesh C

Дата:

09 января, 09:46:34

On Sat, 6 Jan 2024 at 05:50, Paul Jungwirth <pj@illuminatedcomputing.com> wrote:
>
> Getting caught up on reviews from November and December:
>
> On 11/19/23 22:57, jian he wrote:
>  >
>  > I believe the following part should fail. Similar tests on
>  > src/test/regress/sql/generated.sql. line begin 347.
>  >
>  > drop table if exists gtest23a,gtest23x cascade;
>  > CREATE TABLE gtest23a (x int4range, y int4range,
>  > CONSTRAINT gtest23a_pk PRIMARY KEY (x, y WITHOUT OVERLAPS));
>  > CREATE TABLE gtest23x (a int4range, b int4range GENERATED ALWAYS AS
>  > ('empty') STORED,
>  > FOREIGN KEY (a, PERIOD b ) REFERENCES gtest23a(x, PERIOD y) ON UPDATE
>  > CASCADE);  -- should be error?
>
> Okay, I've added a restriction for temporal FKs too. But note this will
> change once the PERIODs patch (the last one here) is finished. When the
> generated column is for a PERIOD, there will be logic to "reroute" the
> updates to the constituent start/end columns instead.
>
>  > begin;
>  > drop table if exists fk, pk cascade;
>  > CREATE TABLE pk (id int4range, valid_at int4range,
>  > CONSTRAINT pk_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
>  > );
>  > CREATE TABLE fk (
>  > id int4range,valid_at tsrange, parent_id int4range,
>  > CONSTRAINT fk FOREIGN KEY (parent_id, valid_at)
>  >      REFERENCES pk
>  > );
>  > rollback;
>  > --
>  > the above query will return an error: number of referencing and
>  > referenced columns for foreign key disagree.
>  > but if you look at it closely, primary key and foreign key columns both are two!
>  > The error should be saying valid_at should be specified with "PERIOD".
>
> Ah okay, thanks for the clarification! This is tricky because the user
> left out the PERIOD on the fk side, and left out the entire pk side, so
> those columns are just implicit. So there is no PERIOD anywhere.
> But I agree that if the pk has WITHOUT OVERLAPS, we should expect a
> corresponding PERIOD modifier on the fk side and explain that that's
> what's missing. The attached patches include that.
>
>  > I found out other issues in v18.
>  > I first do `git apply` then  `git diff --check`, there is a white
>  > space error in v18-0005.
>
> Fixed, thanks!
>
>  > You also need to change update.sgml and delete.sgml <title>Outputs</title> part.
>  > Since at most, it can return 'UPDATE 3' or 'DELETE 3'.
>
> This doesn't sound correct to me. An UPDATE or DELETE can target many
> rows. Also I don't think the inserted "leftovers" should be included in
> these counts. They represent the rows updated/deleted.
>
>  > --the following query should work?
>  > drop table pk;
>  > CREATE table pk(a numrange PRIMARY key,b text);
>  > insert into pk values('[1,10]');
>  > create or replace function demo1() returns void as $$
>  > declare lb numeric default 1; up numeric default 3;
>  > begin
>  >      update pk for portion of a from lb to up set b = 'lb_to_up';
>  >      return;
>  > end
>  > $$ language plpgsql;
>  > select * from demo1();
>
> Hmm this is a tough one. It is correct that the `FROM __ TO __` values cannot be column references.
> They are computed up front, not per row. One reason is they are used to search the table. In fact
> the standard basically allows nothing but literal strings here. See section 14.14, page 971 then
> look up <point in time> on page 348 and <datetime value expression> on page 308. The most
> flexibility you get is you can add/subtract an interval to the datetime literal. We are already well
> past that by allowing expressions, (certain) functions, parameters, etc.
>
> OTOH in your plpgsql example they are not really columns. They just get represented as ColumnRefs
> and then passed to transformColumnRef. I'm surprised plpgsql does it that way. As a workaround you
> could use `EXECUTE format(...)`, but I'd love to make that work as you show instead. I'll keep
> working on this one but it's not done yet. Perhaps I can move the restriction into
> analysis/planning. If anyone has any advice it'd be welcome.
>
> On 12/6/23 05:22, jian he wrote:
>  > this TODO:
>  >   * TODO: It sounds like FOR PORTION OF might need to do something here too?
>  > based on comments on ExprContext. I refactor a bit, and solved this TODO.
>
> The patch looks wrong to me. We need to range targeted by `FROM __
> TO __` to live for the whole statement, not just one tuple (see just
> above). That's why it gets computed in the Init function node.
>
> I don't think that TODO is needed anymore at all. Older versions of the
> patch had more expressions besides this one, and I think it was those I
> was concerned about. So I've removed the comment here.
>
>  > tring to the following TODO:
>  > // TODO: Need to save context->mtstate->mt_transition_capture? (See
>  > comment on ExecInsert)
>  >
>  > but failed.
>  > I also attached the trial, and also added the related test.
>  >
>  > You can also use the test to check portion update with insert trigger
>  > with "referencing old table as old_table new table as new_table"
>  > situation.
>
> Thank you for the test case! This is very helpful. So the problem is
> `referencing new table as new_table` gets lost. I don't have a fix yet
> but I'll work on it.
>
> On 12/11/23 00:31, jian he wrote:
>  > - false); /* quiet */
>  > + false); /* quiet */
>  >
>  > Is the above part unnecessary?
>
> Good catch! Fixed.
>
>  > I am confused. so now I only apply v19, 0001 to 0003.
>  > period_to_range function never used. maybe we can move this part to
>  > 0005-Add PERIODs.patch?
>  > Also you add change in Makefile in 0003, meson.build change in 0005,
>  > better put it on in 0005?
>
> You're right, those changes should have been in the PERIODs patch. Moved.
>
>  > +/*
>  > + * We need to handle this shift/reduce conflict:
>  > + * FOR PORTION OF valid_at FROM INTERVAL YEAR TO MONTH TO foo.
>  > + * This is basically the classic "dangling else" problem, and we want a
>  > + * similar resolution: treat the TO as part of the INTERVAL, not as part of
>  > + * the FROM ... TO .... Users can add parentheses if that's a problem.
>  > + * TO just needs to be higher precedence than YEAR_P etc.
>  > + * TODO: I need to figure out a %prec solution before this gets committed!
>  > + */
>  > +%nonassoc YEAR_P MONTH_P DAY_P HOUR_P MINUTE_P
>  > +%nonassoc TO
>  >
>  > this part will never happen?
>  > since "FROM INTERVAL YEAR TO MONTH TO"
>  > means "valid_at" will be interval range data type, which does not exist now.
>
> It appears still needed to me. Without those lines I get 4 shift/reduce
> conflicts. Are you seeing something different? Or if you have a better
> solution I'd love to add it. I definitely need to fix this before that
> patch gets applied.
>
>  > for all the refactor related to ri_PerformCheck, do you need (Datum) 0
>  > instead of plain 0?
>
> Casts added.
>
>  > +  <para>
>  > +   If the table has a range column or
>  > +   <link linkend="ddl-periods-application-periods"><literal>PERIOD</literal></link>,
>  > +   you may supply a <literal>FOR PORTION OF</literal> clause, and
>  > your delete will
>  > +   only affect rows that overlap the given interval. Furthermore, if
>  > a row's span
>  >
>  >
>
https://influentialpoints.com/Training/basic_statistics_ranges.htm#:~:text=A%20range%20is%20two%20numbers,or%20the%20difference%20between%20them
>  > So "range" is more accurate than "interval"?
>
> I don't think we should be using R to define the terms "range" and
> "interval", which both already have meanings in Postgres, SQL, and the
> literature for temporal databases. But I'm planning to revise the docs'
> terminology here anyway. Some temporal database texts use "interval"
> in this sense, and I thought it was a decent term to mean "range or
> PERIOD". But now we need something to mean "range or multirange or
> custom type or PERIOD". Actually "portion" seems like maybe the best
> term, since the SQL syntax `FOR PORTION OF` reinforces that term. If you
> have suggestions I'm happy for ideas.
>
>  > +/* ----------
>  > + * ForPortionOfState()
>  > + *
>  > + * Copies a ForPortionOfState into the current memory context.
>  > + */
>  > +static ForPortionOfState *
>  > +CopyForPortionOfState(ForPortionOfState *src)
>  > +{
>  > + ForPortionOfState *dst = NULL;
>  > + if (src) {
>  > + MemoryContext oldctx;
>  > + RangeType *r;
>  > + TypeCacheEntry *typcache;
>  > +
>  > + /*
>  > + * Need to lift the FOR PORTION OF details into a higher memory context
>  > + * because cascading foreign key update/deletes can cause triggers to fire
>  > + * triggers, and the AfterTriggerEvents will outlive the FPO
>  > + * details of the original query.
>  > + */
>  > + oldctx = MemoryContextSwitchTo(TopTransactionContext);
>  >
>  > should it be "Copy a ForPortionOfState into the TopTransactionContext"?
>
> You're right, the other function comments here use imperative mood. Changed.
>
> New patches attached, rebased to 43b46aae12. I'll work on your feedback from Jan 4 next. Thanks!

One of the test has failed in CFBot at [1] with:

diff -U3 /tmp/cirrus-ci-build/src/test/regress/expected/generated.out
/tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/generated.out
--- /tmp/cirrus-ci-build/src/test/regress/expected/generated.out
2024-01-06 00:34:48.078691251 +0000
+++ /tmp/cirrus-ci-build/src/test/recovery/tmp_check/results/generated.out
2024-01-06 00:42:08.782292390 +0000
@@ -19,7 +19,9 @@
  table_name | column_name | dependent_column
 ------------+-------------+------------------
  gtest1     | a           | b
-(1 row)
+ pt         | de          | p
+ pt         | ds          | p
+(3 rows)

More details of the failure is available at [2].

[1] - https://cirrus-ci.com/task/5739983420522496
[2] -
https://api.cirrus-ci.com/v1/artifact/task/5739983420522496/log/src/test/recovery/tmp_check/log/regress_log_027_stream_regress

Regards,
Vignesh

Re: SQL:2011 application time

От

Peter Eisentraut

Дата:

11 января, 17:44:50

On 31.12.23 09:51, Paul Jungwirth wrote:
> On Wed, Dec 6, 2023 at 12:59 AM Peter Eisentraut <peter@eisentraut.org> 
> wrote:
>  >
>  > On 02.12.23 19:41, Paul Jungwirth wrote:
>  > > So what do you think of this idea instead?:
>  > >
>  > > We could add a new (optional) support function to GiST that translates
>  > > "well-known" strategy numbers into the opclass's own strategy numbers.
>  >
>  > I had some conversations about this behind the scenes.  I think this
>  > idea makes sense.
> 
> Here is a patch series with the GiST stratnum support function added. I 
> put this into a separate patch (before all the temporal ones), so it's 
> easier to review. Then in the PK patch (now #2) we call that function to 
> figure out the = and && operators. I think this is a big improvement.

I like this solution.

Here is some more detailed review of the first two patches.  (I reviewed 
v20; I see you have also posted v21, but they don't appear very 
different for this purpose.)

v20-0001-Add-stratnum-GiST-support-function.patch

* contrib/btree_gist/Makefile

Needs corresponding meson.build updates.

* contrib/btree_gist/btree_gist--1.7--1.8.sql

Should gist_stratnum_btree() live in contrib/btree_gist/ or in core?
Are there other extensions that use the btree strategy numbers for
gist?

+ALTER OPERATOR FAMILY gist_vbit_ops USING gist ADD
+   FUNCTION  12 (varbit, varbit) gist_stratnum_btree (int2) ;

Is there a reason for the extra space after FUNCTION here (repeated
throughout the file)?

+-- added in 1.4:

What is the purpose of these "added in" comments?

v20-0002-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch

* contrib/btree_gist/Makefile

Also update meson.build.

* contrib/btree_gist/sql/without_overlaps.sql

Maybe also insert a few values, to verify that the constraint actually
does something?

* doc/src/sgml/ref/create_table.sgml

Is "must have a range type" still true?  With the changes to the
strategy number mapping, any type with a supported operator class
should work?

* src/backend/utils/adt/ruleutils.c

Is it actually useful to add an argument to
decompile_column_index_array()?  Wouldn't it be easier to just print
the " WITHOUT OVERLAPS" in the caller after returning from it?

* src/include/access/gist_private.h

The added function gistTranslateStratnum() isn't really "private" to
gist.  So access/gist.h would be a better place for it.

Also, most other functions there appear to be named "GistSomething",
so a more consistent name might be GistTranslateStratnum.

* src/include/access/stratnum.h

The added StrategyIsValid() doesn't seem that useful?  Plenty of
existing code just compares against InvalidStrategy, and there is only
one caller for the new function.  I suggest to do without it.

* src/include/commands/defrem.h

We are using two terms here, well-known strategy number and canonical
strategy number, to mean the same thing (I think?).  Let's try to
stick with one.  Or explain the relationship?

If these points are addressed, and maybe with another round of checking 
that all corner cases are covered, I think these patches (0001 and 0002) 
are close to ready.

Re: SQL:2011 application time

От

jian he

Дата:

14 января, 03:00:00

On Thu, Jan 11, 2024 at 10:44 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 31.12.23 09:51, Paul Jungwirth wrote:
> > On Wed, Dec 6, 2023 at 12:59 AM Peter Eisentraut <peter@eisentraut.org>
> > wrote:
> >  >
> >  > On 02.12.23 19:41, Paul Jungwirth wrote:
> >  > > So what do you think of this idea instead?:
> >  > >
> >  > > We could add a new (optional) support function to GiST that translates
> >  > > "well-known" strategy numbers into the opclass's own strategy numbers.
> >  >
> >  > I had some conversations about this behind the scenes.  I think this
> >  > idea makes sense.
> >
> > Here is a patch series with the GiST stratnum support function added. I
> > put this into a separate patch (before all the temporal ones), so it's
> > easier to review. Then in the PK patch (now #2) we call that function to
> > figure out the = and && operators. I think this is a big improvement.
>
> I like this solution.
>
> Here is some more detailed review of the first two patches.  (I reviewed
> v20; I see you have also posted v21, but they don't appear very
> different for this purpose.)
>
> v20-0001-Add-stratnum-GiST-support-function.patch
>
> * contrib/btree_gist/Makefile
>
> Needs corresponding meson.build updates.

fixed

>
> * contrib/btree_gist/btree_gist--1.7--1.8.sql
>
> Should gist_stratnum_btree() live in contrib/btree_gist/ or in core?
> Are there other extensions that use the btree strategy numbers for
> gist?
>
> +ALTER OPERATOR FAMILY gist_vbit_ops USING gist ADD
> +   FUNCTION  12 (varbit, varbit) gist_stratnum_btree (int2) ;
>
> Is there a reason for the extra space after FUNCTION here (repeated
> throughout the file)?
>

fixed.

> +-- added in 1.4:
>
> What is the purpose of these "added in" comments?
>
>
> v20-0002-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch
>
> * contrib/btree_gist/Makefile
>
> Also update meson.build.

fixed.

> * contrib/btree_gist/sql/without_overlaps.sql
>
> Maybe also insert a few values, to verify that the constraint actually
> does something?
>

I added an ok and failed INSERT.

> * doc/src/sgml/ref/create_table.sgml
>
> Is "must have a range type" still true?  With the changes to the
> strategy number mapping, any type with a supported operator class
> should work?
>
> * src/backend/utils/adt/ruleutils.c
>
> Is it actually useful to add an argument to
> decompile_column_index_array()?  Wouldn't it be easier to just print
> the " WITHOUT OVERLAPS" in the caller after returning from it?

fixed. i just print it right after decompile_column_index_array.

> * src/include/access/gist_private.h
>
> The added function gistTranslateStratnum() isn't really "private" to
> gist.  So access/gist.h would be a better place for it.
>
> Also, most other functions there appear to be named "GistSomething",
> so a more consistent name might be GistTranslateStratnum.
>
> * src/include/access/stratnum.h
>
> The added StrategyIsValid() doesn't seem that useful?  Plenty of
> existing code just compares against InvalidStrategy, and there is only
> one caller for the new function.  I suggest to do without it.
>

If more StrategyNumber are used in the future, will StrategyIsValid()
make sense?

> * src/include/commands/defrem.h
>
> We are using two terms here, well-known strategy number and canonical
> strategy number, to mean the same thing (I think?).  Let's try to
> stick with one.  Or explain the relationship?
>

In my words:
for range type, well-known strategy number and canonical strategy
number are the same thing.
For types Gist does not natively support equality, like int4,
GetOperatorFromCanonicalStrategy will pass RTEqualStrategyNumber from
ComputeIndexAttrs
and return BTEqualStrategyNumber.

> If these points are addressed, and maybe with another round of checking
> that all corner cases are covered, I think these patches (0001 and 0002)
> are close to ready.
>

the following are my review:

+ /* exclusionOpNames can be non-NIL if we are creating a partition */
+ if (iswithoutoverlaps && exclusionOpNames == NIL)
+ {
+ indexInfo->ii_ExclusionOps = palloc_array(Oid, nkeycols);
+ indexInfo->ii_ExclusionProcs = palloc_array(Oid, nkeycols);
+ indexInfo->ii_ExclusionStrats = palloc_array(uint16, nkeycols);
+ }
I am not sure the above comment is related to the code

+/*
+ * Returns the btree number for equals, otherwise invalid.
+ *
+ * This is for GiST opclasses in btree_gist (and maybe elsewhere)
+ * that use the BT*StrategyNumber constants.
+ */
+Datum
+gist_stratnum_btree(PG_FUNCTION_ARGS)
+{
+ StrategyNumber strat = PG_GETARG_UINT16(0);
+
+ switch (strat)
+ {
+ case RTEqualStrategyNumber:
+ PG_RETURN_UINT16(BTEqualStrategyNumber);
+ case RTLessStrategyNumber:
+ PG_RETURN_UINT16(BTLessStrategyNumber);
+ case RTLessEqualStrategyNumber:
+ PG_RETURN_UINT16(BTLessEqualStrategyNumber);
+ case RTGreaterStrategyNumber:
+ PG_RETURN_UINT16(BTGreaterStrategyNumber);
+ case RTGreaterEqualStrategyNumber:
+ PG_RETURN_UINT16(BTGreaterEqualStrategyNumber);
+ default:
+ PG_RETURN_UINT16(InvalidStrategy);
+ }
the above comment seems not right?
even though currently strat will only be RTEqualStrategyNumber.

+void
+GetOperatorFromCanonicalStrategy(Oid opclass,
+ Oid atttype,
+ const char *opname,
+ Oid *opid,
+ StrategyNumber *strat)
+{
+ Oid opfamily;
+ Oid opcintype;
+ StrategyNumber opstrat = *strat;
+
+ *opid = InvalidOid;
+
+ if (get_opclass_opfamily_and_input_type(opclass,
+ &opfamily,
+ &opcintype))
+ {
+ /*
+ * Ask the opclass to translate to its internal stratnum
+ *
+ * For now we only need GiST support, but this could support
+ * other indexams if we wanted.
+ */
+ *strat = gistTranslateStratnum(opclass, opstrat);
+ if (!StrategyIsValid(*strat))
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("no %s operator found for WITHOUT OVERLAPS constraint", opname),
+ errdetail("Could not translate strategy number %u for opclass %d.",
+ opstrat, opclass),
+ errhint("Define a stratnum support function for your GiST opclass.")));
+
+ *opid = get_opfamily_member(opfamily, opcintype, opcintype, *strat);
+ }
+
+ if (!OidIsValid(*opid))
+ {
+ HeapTuple opftuple;
+ Form_pg_opfamily opfform;
+
+ /*
+ * attribute->opclass might not explicitly name the opfamily,
+ * so fetch the name of the selected opfamily for use in the
+ * error message.
+ */
+ opftuple = SearchSysCache1(OPFAMILYOID,
+   ObjectIdGetDatum(opfamily));
+ if (!HeapTupleIsValid(opftuple))
+ elog(ERROR, "cache lookup failed for opfamily %u",
+ opfamily);
+ opfform = (Form_pg_opfamily) GETSTRUCT(opftuple);
+
+ ereport(ERROR,
+ (errcode(ERRCODE_UNDEFINED_OBJECT),
+ errmsg("no %s operator found for WITHOUT OVERLAPS constraint", opname),
+ errdetail("There must be an %s operator within opfamily \"%s\" for
type \"%s\".",
+   opname,
+   NameStr(opfform->opfname),
+   format_type_be(atttype))));
+ }
+}
I refactored this function.
GetOperatorFromCanonicalStrategy called both for normal and WITHOUT OVERLAPS.
so errmsg("no %s operator found for WITHOUT OVERLAPS constraint",
opname) would be misleading
for columns without "WITHOUT OVERLAPS".
Also since that error part was deemed unreachable, it would make the
error verbose, I guess.

--- a/src/bin/psql/describe.c
+++ b/src/bin/psql/describe.c
@@ -2379,6 +2379,10 @@ describeOneTableDetails(const char *schemaname,
  else
  appendPQExpBufferStr(&buf, ", false AS indisreplident");
  appendPQExpBufferStr(&buf, ", c2.reltablespace");
+ if (pset.sversion >= 170000)
+ appendPQExpBufferStr(&buf, ", con.conwithoutoverlaps");
+ else
+ appendPQExpBufferStr(&buf, ", false AS conwithoutoverlaps");

I don't know how to verify it.
I think it should be:
+ if (pset.sversion >= 170000)
+      appendPQExpBufferStr(&buf, ", con.conwithoutoverlaps");

I refactored the 0002 commit message.
The original commit message seems outdated.
I put all the related changes into one attachment.

Вложения

v1-0001-minor-refactor.no-cfbot

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

18 января, 06:59:06

Hello,

Here are new patches consolidating feedback from several emails.
I haven't addressed everything but I think I'm overdue for a reply:

On 1/4/24 21:06, jian he wrote:
 >
 > I am confused.
 > say condition: " (cmp_l1l2 <= 0 && cmp_u1l2 >= 0 && cmp_u1u2 <= 0)"
 > the following code will only run PartA, never run PartB?
 >
 > `
 > else if (cmp_l1l2 >= 0 && cmp_u1u2 <= 0)
 >      PartA
 > else if (cmp_l1l2 <= 0 && cmp_u1l2 >= 0 && cmp_u1u2 <= 0)
 >      PartB
 > `
 >
 > minimum example:
 > #include<stdio.h>
 > #include<string.h>
 > #include<stdlib.h>
 > #include<assert.h>
 > int
 > main(void)
 > {
 >      int cmp_l1l2;
 >      int cmp_u1u2;
 >      int cmp_u1l2;
 >      int cmp_l1u2;
 >      cmp_l1u2 = -1;
 >      cmp_l1l2 = 0;
 >      cmp_u1u2 = 0;
 >      cmp_u1l2 = 0;
 >      assert(cmp_u1l2 == 0);
 > if (cmp_l1u2 > 0 || cmp_u1l2 < 0)
 >          printf("calling partA\n");
 >      else if (cmp_l1l2 >= 0 && cmp_u1u2 <= 0)
 >          printf("calling partB\n");
 >      else if (cmp_l1l2 <= 0 && cmp_u1l2 >= 0 && cmp_u1u2 <= 0)
 >          printf("calling partC\n");
 > }

All of the branches are used. I've attached a `without_portion.c` minimal example showing different 
cases. For ranges it helps to go through the Allen relationships 
(https://en.wikipedia.org/wiki/Allen%27s_interval_algebra) to make a comprehensive check. (But note 
that our operators don't exactly match that terminology, and it's important to consider 
closed-vs-open and unbounded cases.)

 > I am confused with the name "range_without_portion", I think
 > "range_not_overlap" would be better.

I think I covered this in my other reply and we are now in agreement, but if that's mistaken let 
know me.

 > select numrange(1.1, 2.2) @- numrange(2.0, 3.0);
 > the result is not the same as
 > select numrange(2.0, 3.0) @- numrange(1.1, 2.2);

Correct, @- is not commutative.

 > So your categorize oprkind as 'b' for operator "@-" is wrong?
 > select oprname,oprkind,oprcanhash,oprcanmerge,oprleft,oprright,oprresult,oprcode
 > from pg_operator
 > where oprname = '@-';

'b' is the correct oprkind. It is a binary (infix) operator.

 > aslo
 > select count(*), oprkind from pg_operator group by oprkind;
 > there are only 5% are prefix operators.
 > maybe we should design it as:
 > 1. if both inputs are empty range, the result array is empty.
 > 2. if both inputs are non-empty and never overlaps, put both of them
 > to the result array.
 > 3. if one input is empty another one is not, then put the non-empty
 > one into the result array.

Also covered before, but if any of this still applies please let me know.

 > after applying the patch: now the catalog data seems not correct to me.
 > SELECT  a1.amopfamily
 >          ,a1.amoplefttype::regtype
 >          ,a1.amoprighttype
 >          ,a1.amopstrategy
 >          ,amoppurpose
 >          ,amopsortfamily
 >          ,amopopr
 >          ,op.oprname
 >          ,am.amname
 > FROM    pg_amop as a1 join pg_operator op on op.oid = a1.amopopr
 > join    pg_am   am on am.oid = a1.amopmethod
 > where   amoppurpose = 'p';
 > output:
 >   amopfamily | amoplefttype  | amoprighttype | amopstrategy |
 > amoppurpose | amopsortfamily | amopopr | oprname | amname
 > 

------------+---------------+---------------+--------------+-------------+----------------+---------+---------+--------
 >         2593 | box           |           603 |           31 | p
 >    |              0 |     803 | #       | gist
 >         3919 | anyrange      |          3831 |           31 | p
 >    |              0 |    3900 | *       | gist
 >         6158 | anymultirange |          4537 |           31 | p
 >    |              0 |    4394 | *       | gist
 >         3919 | anyrange      |          3831 |           32 | p
 >    |              0 |    8747 | @-      | gist
 >         6158 | anymultirange |          4537 |           32 | p
 >    |              0 |    8407 | @-      | gist
 > (5 rows)
 >
 > select  oprcode, oprname, oprleft::regtype
 > from    pg_operator opr
 > where   opr.oprname in ('#','*','@-')
 > and     oprleft = oprright
 > and     oprleft in (603,3831,4537);
 > output:
 >
 >            oprcode           | oprname |    oprleft
 > ----------------------------+---------+---------------
 >   box_intersect              | #       | box
 >   range_intersect            | *       | anyrange
 >   multirange_intersect       | *       | anymultirange
 >   range_without_portion      | @-      | anyrange
 >   multirange_without_portion | @-      | anymultirange
 > (5 rows)

This seems correct. '#' is the name of the box overlaps operator. Probably I should add a box @- 
operator too. But see below. . . .

 > should amoppurpose = 'p' is true apply to ' @-' operator?

Yes.

 > catalog-pg-amop.html:
 > `
 > amopsortfamily oid (references pg_opfamily.oid):
 > The B-tree operator family this entry sorts according to, if an
 > ordering operator; zero if a search operator
 > `
 > you should also update the above entry, the amopsortfamily is also
 > zero for "portion operator" for the newly implemented "portion
 > operator".

Okay, done.

 > v21-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch
 >   create mode 100644 src/backend/utils/adt/period.c
 >   create mode 100644 src/include/utils/period.h
 > you should put these two files to v21-0008-Add-PERIODs.patch.
 > it's not related to that patch, it also makes people easy to review.

You're right, sorry!

On 1/8/24 16:00, jian he wrote:
 >
 > +/*
 > + * ForPortionOfClause
 > + * representation of FOR PORTION OF <period-name> FROM <ts> TO <te>
 > + * or FOR PORTION OF <period-name> (<target>)
 > + */
 > +typedef struct ForPortionOfClause
 > +{
 > + NodeTag type;
 > + char   *range_name;
 > + int range_name_location;
 > + Node   *target;
 > + Node   *target_start;
 > + Node   *target_end;
 > +} ForPortionOfClause;
 >
 > "range_name_location" can be just "location"?
 > generally most of the struct put the "location" to the last field in the struct.
 > (that's the pattern I found all over other code)

Agreed, done.

 > + if (isUpdate)
 > + {
 > + /*
 > + * Now make sure we update the start/end time of the record.
 > + * For a range col (r) this is `r = r * targetRange`.
 > + */
 > + Expr *rangeSetExpr;
 > + TargetEntry *tle;
 > +
 > + strat = RTIntersectStrategyNumber;
 > + GetOperatorFromCanonicalStrategy(opclass, InvalidOid, "intersects",
 > "FOR PORTION OF", &opid, &strat);
 > + rangeSetExpr = (Expr *) makeSimpleA_Expr(AEXPR_OP, get_opname(opid),
 > + (Node *) copyObject(rangeVar), targetExpr,
 > + forPortionOf->range_name_location);
 > + rangeSetExpr = (Expr *) transformExpr(pstate, (Node *) rangeSetExpr,
 > EXPR_KIND_UPDATE_PORTION);
 > +
 > + /* Make a TLE to set the range column */
 > + result->rangeSet = NIL;
 > + tle = makeTargetEntry(rangeSetExpr, range_attno, range_name, false);
 > + result->rangeSet = lappend(result->rangeSet, tle);
 > +
 > + /* Mark the range column as requiring update permissions */
 > + target_perminfo->updatedCols = bms_add_member(target_perminfo->updatedCols,
 > +  range_attno - FirstLowInvalidHeapAttributeNumber);
 > + }
 > + else
 > + result->rangeSet = NIL;
 > I think the name "rangeSet" is misleading, since "set" is generally
 > related to a set of records.
 > but here it's more about the "range intersect".

Okay, I can see that. I used "rangeSet" because we add it to the SET clause of the UPDATE command. 
Here I've changed it to rangeTargetList. I think this matches other code and better indicates what 
it holds. Any objections?

In the PERIOD patch we will need two TLEs here (that's why it's a List): one for the start column 
and one for the end column.

 > in ExecDelete
 > we have following code pattern:
 > ExecDeleteEpilogue(context, resultRelInfo, tupleid, oldtuple, changingPart);
 > if (processReturning && resultRelInfo->ri_projectReturning)
 > {
 > ....
 > if (!table_tuple_fetch_row_version(resultRelationDesc, tupleid,
 >     SnapshotAny, slot))
 > elog(ERROR, "failed to fetch deleted tuple for DELETE RETURNING");
 > }
 > }
 >
 > but the ExecForPortionOfLeftovers is inside ExecDeleteEpilogue.
 > meaning even without ExecForPortionOfLeftovers, we can still call
 > table_tuple_fetch_row_version
 > also if it was *not* concurrently updated, then our current process
 > holds the lock until the ending of the transaction, i think.
 > So the following TODO is unnecessary?
 >
 > + /*
 > + * Get the range of the old pre-UPDATE/DELETE tuple,
 > + * so we can intersect it with the FOR PORTION OF target
 > + * and see if there are any "leftovers" to insert.
 > + *
 > + * We have already locked the tuple in ExecUpdate/ExecDelete
 > + * (TODO: if it was *not* concurrently updated, does
 > table_tuple_update lock the tuple itself?
 > + * I don't found the code for that yet, and maybe it depends on the AM?)
 > + * and it has passed EvalPlanQual.
 > + * Make sure we're looking at the most recent version.
 > + * Otherwise concurrent updates of the same tuple in READ COMMITTED
 > + * could insert conflicting "leftovers".
 > + */
 > + if (!table_tuple_fetch_row_version(resultRelInfo->ri_RelationDesc,
 > tupleid, SnapshotAny, oldtupleSlot))
 > + elog(ERROR, "failed to fetch tuple for FOR PORTION OF");

I think you're right. According to the comments on TM_Result (returned by table_tuple_update), a 
TM_Ok indicates that the lock was acquired.

 > +/* ----------------------------------------------------------------
 > + * ExecForPortionOfLeftovers
 > + *
 > + * Insert tuples for the untouched timestamp of a row in a FOR
 > + * PORTION OF UPDATE/DELETE
 > + * ----------------------------------------------------------------
 > + */
 > +static void
 > +ExecForPortionOfLeftovers(ModifyTableContext *context,
 > +   EState *estate,
 > +   ResultRelInfo *resultRelInfo,
 > +   ItemPointer tupleid)
 >
 > maybe change the comment to
 > "Insert tuples for the not intersection of a row in a FOR PORTION OF
 > UPDATE/DELETE."

Changed to "untouched portion".

 > + deconstruct_array(DatumGetArrayTypeP(allLeftovers),
 > typcache->type_id, typcache->typlen,
 > +   typcache->typbyval, typcache->typalign, &leftovers, NULL, &nleftovers);
 > +
 > + if (nleftovers > 0)
 > + {
 > I think add something like assert nleftovers >=0 && nleftovers <= 2
 > (assume only range not multirange) would improve readability.

I added the first assert. The second is not true for non-range types.

 > +  <para>
 > +   If the table has a range column or
 > +   <link linkend="ddl-periods-application-periods"><literal>PERIOD</literal></link>,
 > +   you may supply a <literal>FOR PORTION OF</literal> clause, and
 > your delete will
 > +   only affect rows that overlap the given interval. Furthermore, if
 > a row's span
 > +   extends outside the <literal>FOR PORTION OF</literal> bounds, then
 > your delete
 > +   will only change the span within those bounds. In effect you are
 > deleting any
 > +   moment targeted by <literal>FOR PORTION OF</literal> and no moments outside.
 > +  </para>
 > +
 > +  <para>
 > +   Specifically, after <productname>PostgreSQL</productname> deletes
 > the existing row,
 > +   it will <literal>INSERT</literal>
 > +   new rows whose range or start/end column(s) receive the remaining
 > span outside
 > +   the targeted bounds, containing the original values in other columns.
 > +   There will be zero to two inserted records,
 > +   depending on whether the original span extended before the targeted
 > +   <literal>FROM</literal>, after the targeted <literal>TO</literal>,
 > both, or neither.
 > +  </para>
 > +
 > +  <para>
 > +   These secondary inserts fire <literal>INSERT</literal> triggers. First
 > +   <literal>BEFORE DELETE</literal> triggers first, then
 > +   <literal>BEFORE INSERT</literal>, then <literal>AFTER INSERT</literal>,
 > +   then <literal>AFTER DELETE</literal>.
 > +  </para>
 > +
 > +  <para>
 > +   These secondary inserts do not require <literal>INSERT</literal>
 > privilege on the table.
 > +   This is because conceptually no new information has been added.
 > The inserted rows only preserve
 > +   existing data about the untargeted time period. Note this may
 > result in users firing <literal>INSERT</literal>
 > +   triggers who don't have insert privileges, so be careful about
 > <literal>SECURITY DEFINER</literal> trigger functions!
 > +  </para>
 >
 > I think you need to wrap them into a big paragraph, otherwise they
 > lose the context?
 > please see the attached build sql-update.html.

Still TODO.

 > also I think
 > +   <link linkend="ddl-periods-application-periods"><literal>PERIOD</literal></link>,
 > should shove into Add-PERIODs.patch.
 >
 > otherwise you cannot build  Add-UPDATE-DELETE-FOR-PORTION-OF.patch
 > without all the patches.

Fixed.

 > I think the "FOR-PORTION-OF" feature is kind of independ?
 > Because, IMHO, "for portion" is a range datum interacting with another
 > single range datum, but the primary key with  "WITHOUT OVERLAPS", is
 > range datum interacting with a set of range datums.
 > now I cannot  just git apply v22-0006-Add-UPDATE-DELETE-FOR-PORTION-OF.patch.
 > That maybe would make it more difficult to get commited?

Still TODO.

On 1/8/24 21:33, jian he wrote:
 >
 > src5=# select range_without_portion(numrange(1.0,3.0,'[]'),
 > numrange(1.5,2.0,'(]'));
 >     range_without_portion
 > ---------------------------
 >   {"[1.0,1.5]","(2.0,3.0]"}
 > (1 row)
 >
 > src5=# \gdesc
 >          Column         |   Type
 > -----------------------+-----------
 >   range_without_portion | numeric[]
 > (1 row)
 >
 > src5=# \df range_without_portion
 >                                   List of functions
 >     Schema   |         Name          | Result data type | Argument data
 > types | Type
 > ------------+-----------------------+------------------+---------------------+------
 >   pg_catalog | range_without_portion | anyarray         | anyrange,
 > anyrange  | func
 > (1 row)
 >
 > so apparently, you cannot from (anyrange, anyrange) get anyarray the
 > element type is anyrange.
 > I cannot find the documented explanation in
 > https://www.postgresql.org/docs/current/extend-type-system.html#EXTEND-TYPES-POLYMORPHIC
 >
 > anyrange is POLYMORPHIC, anyarray is POLYMORPHIC,
 > but I suppose, getting an anyarray the element type is anyrange would be hard.

You're right, that is a problem.

I think the right approach is to make intersect and without_portion just be support functions, not 
operators. Then I don't need to introduce the new 'p' amop strategy at all, which seemed like a 
dubious idea anyway. Then the without_portion function can return a SETOF instead of an array.

Another idea is to add more polymorphic types, anyrangearray and anymultirangearray, but maybe that 
is too big a thing. OTOH I have wanted those same types before. I will take a stab at it.

On 1/11/24 06:44, Peter Eisentraut wrote:
 > Here is some more detailed review of the first two patches.  (I reviewed v20; I see you have also
 > posted v21, but they don't appear very different for this purpose.)
 >
 > v20-0001-Add-stratnum-GiST-support-function.patch
 >
 > * contrib/btree_gist/Makefile
 >
 > Needs corresponding meson.build updates.

Fixed.

 > * contrib/btree_gist/btree_gist--1.7--1.8.sql
 >
 > Should gist_stratnum_btree() live in contrib/btree_gist/ or in core?
 > Are there other extensions that use the btree strategy numbers for
 > gist?

Moved. None of our other contrib extensions use it. I thought it would be friendly to offer it to 
outside extensions, but maybe that is too speculative.

 > +ALTER OPERATOR FAMILY gist_vbit_ops USING gist ADD
 > +   FUNCTION  12 (varbit, varbit) gist_stratnum_btree (int2) ;
 >
 > Is there a reason for the extra space after FUNCTION here (repeated
 > throughout the file)?

Fixed.

 > +-- added in 1.4:
 >
 > What is the purpose of these "added in" comments?

I added those to help me make sure I was including every type in the extension, but I've taken them 
out here.

 > v20-0002-Add-temporal-PRIMARY-KEY-and-UNIQUE-constraints.patch
 >
 > * contrib/btree_gist/Makefile
 >
 > Also update meson.build.

Done.

 > * contrib/btree_gist/sql/without_overlaps.sql
 >
 > Maybe also insert a few values, to verify that the constraint actually
 > does something?

Done.

 > * doc/src/sgml/ref/create_table.sgml
 >
 > Is "must have a range type" still true?  With the changes to the
 > strategy number mapping, any type with a supported operator class
 > should work?

Updated. Probably more docs to come; I want to go through them all now that we support more types.

 > * src/backend/utils/adt/ruleutils.c
 >
 > Is it actually useful to add an argument to
 > decompile_column_index_array()?  Wouldn't it be easier to just print
 > the " WITHOUT OVERLAPS" in the caller after returning from it?

Okay, done.

 > * src/include/access/gist_private.h
 >
 > The added function gistTranslateStratnum() isn't really "private" to
 > gist.  So access/gist.h would be a better place for it.

Moved.

 > Also, most other functions there appear to be named "GistSomething",
 > so a more consistent name might be GistTranslateStratnum.
 >
 > * src/include/access/stratnum.h

Changed.

 > The added StrategyIsValid() doesn't seem that useful?  Plenty of
 > existing code just compares against InvalidStrategy, and there is only
 > one caller for the new function.  I suggest to do without it.
 >
 > * src/include/commands/defrem.h

Okay, removed.

 > We are using two terms here, well-known strategy number and canonical
 > strategy number, to mean the same thing (I think?).  Let's try to
 > stick with one.  Or explain the relationship?

True. Changed everything to "well-known" which seems like a better match for what's going on.

I haven't gone through jian he's Jan 13 patch yet, but since he was also implementing Peter's 
requests I thought I should share what I have. I did this work a while ago, but I was hoping to 
finish the TODOs above first, and then we got hit with a winter storm that knocked out power. Sorry 
to cause duplicate work!

Rebased to 2f35c14cfb.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

On 1/24/24 08:32, Peter Eisentraut wrote:
 > On 18.01.24 04:59, Paul Jungwirth wrote:
 >> Here are new patches consolidating feedback from several emails.
 >
 > I have committed 0001 and 0002 (the primary key support).

Thanks Peter! I noticed the comment on gist_stratnum_btree was out-of-date, so here is a tiny patch 
correcting it.

Also the remaining patches with some updates:

I fixed the dependency issues with PERIODs and their (hidden) GENERATED range columns. This has been 
causing test failures and bugging me since I reordered the patches at PgCon, so I'm glad to finally 
clean it up. The PERIOD should have an INTERNAL dependency on the range column, but then when you 
dropped the table the dependency code thought the whole table was part of the INTERNAL dependency, 
so the drop would fail. The PERIOD patch here fixes the dependency logic. (I guess this is the first 
time a column has been an internal dependency of something.)

I also fixed an error message when you try to change the type of a start/end column used by a 
PERIOD. Previously the error message would complain about the GENERATED column, not the PERIOD, 
which seems confusing. In fact it was non-deterministic, depending on which pg_depend record the 
index returned first.

On 12/6/23 05:22, jian he wrote:
 > tring to the following TODO:
 > // TODO: Need to save context->mtstate->mt_transition_capture? (See
 > comment on ExecInsert)
 >
 > but failed.
 > I also attached the trial, and also added the related test.
 >
 > You can also use the test to check portion update with insert trigger
 > with "referencing old table as old_table new table as new_table"
 > situation.

Thank you for the very helpful test case here. I fixed the issue of not passing along the transition 
table. But there is still more work to do here I think:

- The AFTER INSERT FOR EACH ROW triggers have *both* leftover rows in the NEW table. Now the docs do 
say that for AFTER triggers, a named transition table can see all the changes from the *statement* 
(although that seems pretty weird to me), but the inserts are two *separate* statements. I think the 
SQL:2011 standard is fairly clear about that. So each time the trigger fires we should still get 
just one row in the transition table.

- The AFTER INSERT FOR EACH STATEMENT triggers never fire. That happens outside ExecInsert (in 
ExecModifyTable). In fact there is a bunch of stuff in ExecModifyTable that maybe we need to do when 
we insert leftovers. Do we even need a separate exec node, perhaps wrapping ExecModifyTable? I'm not 
sure that would give us the correct trigger ordering for the triggers on the implicit insert 
statement(s) vs the explicit update/delete statement, so maybe it does all need to be part of the 
single node. But still I think we need to be more careful about memory, especially the per-tuple 
context.

I'll keep working on that, but at least in this round of patches the transition tables aren't 
missing completely.

My plan is still to replace the 'p' amoppurpose operators with just support functions. I want to do 
that next, although as Peter requested I'll also start focusing more narrowly on the foreign key 
patches.

Rebased to 46a0cd4cef.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Вложения

Re: SQL:2011 application time

От

Peter Eisentraut

Дата:

25 января, 09:31:44

On 24.01.24 23:06, Paul Jungwirth wrote:
> On 1/24/24 08:32, Peter Eisentraut wrote:
>  > On 18.01.24 04:59, Paul Jungwirth wrote:
>  >> Here are new patches consolidating feedback from several emails.
>  >
>  > I have committed 0001 and 0002 (the primary key support).
> 
> Thanks Peter! I noticed the comment on gist_stratnum_btree was 
> out-of-date, so here is a tiny patch correcting it.

committed that

Re: SQL:2011 application time

От

jian he

Дата:

29 января, 03:00:00

I fixed your tests, some of your tests can be simplified, (mainly
primary key constraint is unnecessary for the failed tests)
also your foreign key patch test table, temporal_rng is created at
line 141, and we use it at around line 320.
it's hard to get the definition of temporal_rng.  I drop the table
and recreate it.
So people can view the patch with tests more easily.


+         <para>
+          In a temporal foreign key, the delete/update will use
+          <literal>FOR PORTION OF</literal> semantics to constrain the
+          effect to the bounds being deleted/updated in the referenced row.
+         </para>

in v24-0003-Add-temporal-FOREIGN-KEYs.patch
 <literal>FOR PORTION OF</literal> not yet implemented, so we should
not mention it.

+     <para>
+      If the last column is marked with <literal>PERIOD</literal>,
+      it must be a period or range column, and the referenced table
+      must have a temporal primary key.
can we change "it must be a period or range column" to "it must be a
range column", maybe we can add it on another patch.

Вложения

v1-0001-refactor-temporal-FOREIGN-KEYs-test.based_on_v24

Re: SQL:2011 application time

От

jian he

Дата:

02 февраля, 08:53:52

On Mon, Jan 29, 2024 at 8:00 AM jian he <jian.universality@gmail.com> wrote:
>
> I fixed your tests, some of your tests can be simplified, (mainly
> primary key constraint is unnecessary for the failed tests)
> also your foreign key patch test table, temporal_rng is created at
> line 141, and we use it at around line 320.
> it's hard to get the definition of temporal_rng. I drop the table
> and recreate it.
> So people can view the patch with tests more easily.
>
I've attached a new patch that further simplified the tests. (scope
v24 patch's 0002 and 0003)
Please ignore previous email attachments.

I've only applied the v24, 0002, 0003.
seems in doc/src/sgml/ref/create_table.sgml
lack the explanation of `<replaceable
class="parameter">temporal_interval</replaceable>`

since foreign key ON {UPDATE | DELETE} {CASCADE,SET NULL,SET DEFAULT}
not yet supported,
v24-0003 create_table.sgml should reflect that.

+ /*
+ * For FKs with PERIOD we need an operator and aggregate function
+ * to check whether the referencing row's range is contained
+ * by the aggregated ranges of the referenced row(s).
+ * For rangetypes this is fk.periodatt <@ range_agg(pk.periodatt).
+ * FKs will look these up at "runtime", but we should make sure
+ * the lookup works here.
+ */
+ if (is_temporal)
+ FindFKPeriodOpersAndProcs(opclasses[numpks - 1], &periodoperoid,
&periodprocoid);

within the function ATAddForeignKeyConstraint, you called
FindFKPeriodOpersAndProcs,
but never used the computed outputs: periodoperoid, periodprocoid, opclasses.
We validate these(periodoperoid, periodprocoid) at
lookupTRIOperAndProc, FindFKPeriodOpersAndProcs.
I'm not sure whether FindFKPeriodOpersAndProcs in
ATAddForeignKeyConstraint is necessary.

+ * Check if all key values in OLD and NEW are "equivalent":
+ * For normal FKs we check for equality.
+ * For temporal FKs we check that the PK side is a superset of its old value,
+ * or the FK side is a subset.
"or the FK side is a subset." is misleading, should it be something
like "or the FK side is a subset of X"?

+ if (indexStruct->indisexclusion) return i - 1;
+ else return i;

I believe our style should be (with proper indent)
if (indexStruct->indisexclusion)
return i - 1;
else
return i;

in transformFkeyCheckAttrs
+ if (found && is_temporal)
+ {
+ found = false;
+ for (j = 0; j < numattrs + 1; j++)
+ {
+ if (periodattnum == indexStruct->indkey.values[j])
+ {
+ opclasses[numattrs] = indclass->values[j];
+ found = true;
+ break;
+ }
+ }
+ }

can be simplified:
{
found = false;
if (periodattnum == indexStruct->indkey.values[numattrs])
{
opclasses[numattrs] = indclass->values[numattrs];
found = true;
}
}

Also wondering, at the end of the function transformFkeyCheckAttrs `if
(!found)` part:
do we need another error message handle is_temporal is true?

@@ -212,8 +213,11 @@ typedef struct NewConstraint
ConstrType contype; /* CHECK or FOREIGN */
Oid refrelid; /* PK rel, if FOREIGN */
Oid refindid; /* OID of PK's index, if FOREIGN */
+ bool conwithperiod; /* Whether the new FOREIGN KEY uses PERIOD */
Oid conid; /* OID of pg_constraint entry, if FOREIGN */
Node *qual; /* Check expr or CONSTR_FOREIGN Constraint */
+ Oid *operoids; /* oper oids for FOREIGN KEY with PERIOD */
+ Oid *procoids; /* proc oids for FOREIGN KEY with PERIOD */
ExprState *qualstate; /* Execution state for CHECK expr */
} NewConstraint;
primary key can only one WITHOUT OVERLAPS,
so *operoids and *procoids
can be replaced with just
`operoids, procoids`.
Also these two elements in struct NewConstraint not used in v24, 0002, 0003.

Вложения

v1-0001-refactor-temporal-FOREIGN-KEYs-test.patch

Re: SQL:2011 application time

От

Peter Eisentraut

Дата:

12 февраля, 12:55:09

I have done a review of the temporal foreign key patches in this patch
series (0002 and 0003, v24).

The patch set needs a rebase across c85977d8fef.  I was able to do it
manually, but it's a bit tricky, so perhaps you can post a new set to
help future reviews.

(Also, the last (0007) patch has some compiler warnings and also
causes the pg_upgrade test to fail.  I didn't check this further, but
that's why the cfbot is all red.)

In summary, in principle, this all looks more or less correct to me.

As a general comment, we need to figure out the right terminology
"period" vs. "temporal", especially if we are going to commit these
features incrementally.  But I didn't look at this too hard here yet.


* v24-0002-Add-GiST-referencedagg-support-func.patch

Do we really need this level of generality?  Are there examples not
using ranges that would need a different aggregate function?  Maybe
something with geometry (points and lines)?  But it seems to me that
then we'd also need some equivalent to "without portion" support for
those types and a multirange equivalent (basically another gist
support function wrapped around the 0004 patch).


* v24-0003-Add-temporal-FOREIGN-KEYs.patch

- contrib/btree_gist/expected/without_overlaps.out
- contrib/btree_gist/sql/without_overlaps.sql

typo "exusts"


- doc/src/sgml/ref/create_table.sgml

This mentions FOR PORTION OF from a later patch.

It is not documented that SET NULL and SET DEFAULT are not supported,
even though that is added in a later patch.  (So this patch should say
that it's not supported, and then the later patch should remove that.)


- src/backend/commands/indexcmds.c

The changes to GetOperatorFromWellKnownStrategy() don't work for
message translations.  We had discussed a similar issue for this
function previously.  I think it's ok to leave the function as it was.
The additional context could be added with location pointers or
errcontext() maybe, but it doesn't seem that important for now.


- src/backend/commands/tablecmds.c

The changes in ATAddForeignKeyConstraint(), which are the meat of the
changes in this file, are very difficult to review in detail.  I tried
different git-diff options to get a sensible view, but it wasn't
helpful.  Do we need to do some separate refactoring here first?

The error message "action not supported for temporal foreign keys"
could be more detailed, mention the action.  Look for example how the
error for the generated columns is phrased.  (But note that for
generated columns, the actions are impossible to support, whereas here
it is just something not done yet.  So there should probably still be
different error codes.)


- src/backend/nodes/outfuncs.c
- src/backend/nodes/readfuncs.c

Perhaps you would like to review my patch 0001 in
<https://www.postgresql.org/message-id/859d6155-e361-4a05-8db3-4aa1f007ff28@eisentraut.org>,
which removes the custom out/read functions for the Constraint node.
Then you could get rid of these changes.


- src/backend/utils/adt/ri_triggers.c

The added #include "catalog/pg_range.h" doesn't appear to be used for
anything.

Maybe we can avoid the added #include "commands/tablecmds.h" by
putting the common function in some appropriate lower-level module.

typo "PEROID"

Renaming of ri_KeysEqual() to ri_KeysStable() doesn't improve clarity,
I think.  I think we can leave the old name and add a comment (as you
have done).  There is a general understanding around this feature set
that "equal" sometimes means "contained" or something like that.

The function ri_RangeAttributeNeedsCheck() could be documented better.
It's bit terse and unclear.  From the code, it looks like it is used
instead of row equality checks.  Maybe a different function name would
be suitable.

Various unnecessary reformatting in RI_FKey_check().

When assembling the SQL commands, you need to be very careful about
fully quoting and schema-qualifying everything.  See for example
ri_GenerateQual().

Have you checked that the generated queries can use indexes and have
suitable performance?  Do you have example execution plans maybe?


- src/backend/utils/adt/ruleutils.c

This seems ok in principle, but it's kind of weird that the new
argument of decompile_column_index_array() is called "withPeriod"
(which seems appropriate seeing what it does), but what we are passing
in is conwithoutoverlaps.  Maybe we need to reconsider the naming of
the constraint column?  Sorry, I made you change it from "contemporal"
or something, didn't I?  Maybe "conperiod" would cover both meanings
better?


- src/backend/utils/cache/lsyscache.c

get_func_name_and_namespace(): This function would at least need some
identifier quoting.  There is only one caller (lookupTRIOperAndProc),
so let's just put this code inline there; it's not worth a separate
global function.  (Also, you could use psprintf() here to simplify
palloc() + snprintf().)


- src/include/catalog/pg_constraint.h

You are changing in several comments "equality" to "comparison".  I
suspect you effectively mean "equality or containment"?  Maybe
"comparison" is too subtle to convey that meaning?  Maybe be more
explicit.

You are changing a foreign key from DECLARE_ARRAY_FOREIGN_KEY to
DECLARE_ARRAY_FOREIGN_KEY_OPT.  Add a comment about it, like the one
just above has.


- src/include/catalog/pg_proc.dat

For the names of the trigger functions, maybe instead of

     TRI_FKey_check_ins

something like

     RI_FKey_period_check_ins

so that all RI trigger functions group under a common prefix.

On second thought, do we even need separate functions for this?
Looking at ri_triggers.c, the temporal and non-temporal functions are
the same, and all the differences are handled in the underlying
implementation functions.


- src/include/nodes/parsenodes.h

The constants FKCONSTR_PERIOD_OP_CONTAINED_BY and
FKCONSTR_PERIOD_PROC_REFERENCED_AGG could use more documentation here.

For the Constraint struct, don't we just need a bool field saying
"this is a period FK", and then we'd know that the last column is the
period?  Like we did for the primary keys (bool without_overlaps).


- src/include/parser/kwlist.h

For this patch, the keyword PERIOD can be unreserved.  But it
apparently will need to be reserved later for the patch that
introduces PERIOD columns.  Maybe it would make sense to leave it
unreserved for this patch and upgrade it in the later one.

Re: SQL:2011 application time

От

jian he

Дата:

14 февраля, 08:00:00

Hi
more minor issues.

+ FindFKComparisonOperators(
+ fkconstraint, tab, i, fkattnum,
+ &old_check_ok, &old_pfeqop_item,
+ pktypoid[i], fktypoid[i], opclasses[i],
+ is_temporal, false,
+ &pfeqoperators[i], &ppeqoperators[i], &ffeqoperators[i]);
+ }
+ if (is_temporal) {
+ pkattnum[numpks] = pkperiodattnum;
+ pktypoid[numpks] = pkperiodtypoid;
+ fkattnum[numpks] = fkperiodattnum;
+ fktypoid[numpks] = fkperiodtypoid;

- pfeqop = get_opfamily_member(opfamily, opcintype, fktyped,
- eqstrategy);
- if (OidIsValid(pfeqop))
- {
- pfeqop_right = fktyped;
- ffeqop = get_opfamily_member(opfamily, fktyped, fktyped,
- eqstrategy);
- }
- else
- {
- /* keep compiler quiet */
- pfeqop_right = InvalidOid;
- ffeqop = InvalidOid;
- }
+ FindFKComparisonOperators(
+ fkconstraint, tab, numpks, fkattnum,
+ &old_check_ok, &old_pfeqop_item,
+ pkperiodtypoid, fkperiodtypoid, opclasses[numpks],
+ is_temporal, true,
+ &pfeqoperators[numpks], &ppeqoperators[numpks], &ffeqoperators[numpks]);
+ numfks += 1;
+ numpks += 1;
+ }

opening curly brace should be the next line, also do you think it's
good idea to add following in the `if (is_temporal)` branch
`
Assert(OidIsValid(fkperiodtypoid) && OidIsValid(pkperiodtypoid));
Assert(OidIsValid(pkperiodattnum > 0 && fkperiodattnum > 0));
`

` if (is_temporal)` branch, you can set the FindFKComparisonOperators
10th argument (is_temporal)
to true, since you are already in the ` if (is_temporal)` branch.

maybe we need some extra comments on
`
+ numfks += 1;
+ numpks += 1;
`
since it might not be that evident?

Do you think it's a good idea to list arguments line by line (with
good indentation) is good format? like:
FindFKComparisonOperators(fkconstraint,
tab,
i,
fkattnum,
&old_check_ok,
&old_pfeqop_item,
pktypoid[i],
fktypoid[i],
opclasses[i],
false,
false,
&pfeqoperators[i],
&ppeqoperators[i],
&ffeqoperators[i]);

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

01 марта, 00:16:56

Hello,

Here is another patch series for application time. It addresses the feedback from the last few 
emails. Details below:

On 1/28/24 16:00, jian he wrote:
 > +         <para>
 > +          In a temporal foreign key, the delete/update will use
 > +          <literal>FOR PORTION OF</literal> semantics to constrain the
 > +          effect to the bounds being deleted/updated in the referenced row.
 > +         </para>
 >
 > in v24-0003-Add-temporal-FOREIGN-KEYs.patch
 >   <literal>FOR PORTION OF</literal> not yet implemented, so we should
 > not mention it.

Fixed.

 > +     <para>
 > +      If the last column is marked with <literal>PERIOD</literal>,
 > +      it must be a period or range column, and the referenced table
 > +      must have a temporal primary key.
 > can we change "it must be a period or range column" to "it must be a
 > range column", maybe we can add it on another patch.

Rewrote this section to be clearer.

On 2/1/24 21:53, jian he wrote:
 > I've attached a new patch that further simplified the tests. (scope
 > v24 patch's 0002 and 0003)
 > Please ignore previous email attachments.

Thanks, I've pulled in most of these changes to the tests.

 > I've only applied the v24, 0002, 0003.
 > seems in doc/src/sgml/ref/create_table.sgml
 > lack the explanation of `<replaceable
 > class="parameter">temporal_interval</replaceable>`

You're right. Actually I think it is clearer without adding a separate name here, so I've updated 
the docs to use `column_name | period_name`.

 > since foreign key ON {UPDATE | DELETE} {CASCADE,SET NULL,SET DEFAULT}
 > not yet supported,
 > v24-0003 create_table.sgml should reflect that.

Updated.

 > within the function ATAddForeignKeyConstraint, you called
 > FindFKPeriodOpersAndProcs,
 > but never used the computed outputs: periodoperoid, periodprocoid, opclasses.
 > We validate these(periodoperoid, periodprocoid) at
 > lookupTRIOperAndProc, FindFKPeriodOpersAndProcs.
 > I'm not sure whether FindFKPeriodOpersAndProcs in
 > ATAddForeignKeyConstraint is necessary.

This is explained in the comment above: we will do the same lookup when the foreign key is checked, 
but we should make sure it works now so we can report the problem to the user.

 > + * Check if all key values in OLD and NEW are "equivalent":
 > + * For normal FKs we check for equality.
 > + * For temporal FKs we check that the PK side is a superset of its old value,
 > + * or the FK side is a subset.
 > "or the FK side is a subset."  is misleading, should it be something
 > like "or the FK side is a subset of X"?

Okay, changed.

 > + if (indexStruct->indisexclusion) return i - 1;
 > + else return i;
 >
 > I believe our style should be (with proper indent)
 > if (indexStruct->indisexclusion)
 > return i - 1;
 > else
 > return i;

Fixed.

 > in transformFkeyCheckAttrs
 > + if (found && is_temporal)
 > + {
 > + found = false;
 > + for (j = 0; j < numattrs + 1; j++)
 > + {
 > + if (periodattnum == indexStruct->indkey.values[j])
 > + {
 > + opclasses[numattrs] = indclass->values[j];
 > + found = true;
 > + break;
 > + }
 > + }
 > + }
 >
 > can be simplified:
 > {
 > found = false;
 > if (periodattnum == indexStruct->indkey.values[numattrs])
 > {
 > opclasses[numattrs] = indclass->values[numattrs];
 > found = true;
 > }
 > }

Changed.

 > Also wondering, at the end of the function transformFkeyCheckAttrs `if
 > (!found)` part:
 > do we need another error message handle is_temporal is true?

I think the existing error message works well for both temporal and non-temporal cases.

 > @@ -212,8 +213,11 @@ typedef struct NewConstraint
 >    ConstrType contype; /* CHECK or FOREIGN */
 >    Oid refrelid; /* PK rel, if FOREIGN */
 >    Oid refindid; /* OID of PK's index, if FOREIGN */
 > + bool conwithperiod; /* Whether the new FOREIGN KEY uses PERIOD */
 >    Oid conid; /* OID of pg_constraint entry, if FOREIGN */
 >    Node   *qual; /* Check expr or CONSTR_FOREIGN Constraint */
 > + Oid   *operoids; /* oper oids for FOREIGN KEY with PERIOD */
 > + Oid   *procoids; /* proc oids for FOREIGN KEY with PERIOD */
 >    ExprState  *qualstate; /* Execution state for CHECK expr */
 >   } NewConstraint;
 > primary key can only one WITHOUT OVERLAPS,
 > so *operoids and *procoids
 > can be replaced with just
 > `operoids, procoids`.
 > Also these two elements in struct NewConstraint not used in v24, 0002, 0003.

I've removed these entirely. Sorry, they were leftover from an earlier revision.

On 2/12/24 01:55, Peter Eisentraut wrote:
 > (Also, the last (0007) patch has some compiler warnings and also
 > causes the pg_upgrade test to fail.  I didn't check this further, but
 > that's why the cfbot is all red.)

Fixed the pg_upgrade problem. I'm not seeing compiler warnings. If they still exist can you point me 
to those?

 > As a general comment, we need to figure out the right terminology
 > "period" vs. "temporal", especially if we are going to commit these
 > features incrementally.  But I didn't look at this too hard here yet.

Agreed. I think it is okay to use "temporal" in the docs for the feature in general, if we clarify 
that non-temporal values are also supported. That is what the rest of the world calls this kind of 
thing.

The word "period" is confusing because it can be the `PERIOD` keyword used in temporal FKs, or also 
the SQL:2011 `PERIOD` object that is like our range types. And then we also have ranges, etc. In the 
past I was using "interval" to mean "range or PERIOD" (and "interval" is used by Date in his 
temporal book), but perhaps that is too idiosyncratic. I've removed "interval" from the FK docs, and 
instead I've tried to be very explicit and avoid ambiguity. (I haven't given as much attention to 
cleaning up the later patches' docs yet.)

 > * v24-0002-Add-GiST-referencedagg-support-func.patch
 >
 > Do we really need this level of generality?  Are there examples not
 > using ranges that would need a different aggregate function?  Maybe
 > something with geometry (points and lines)?  But it seems to me that
 > then we'd also need some equivalent to "without portion" support for
 > those types and a multirange equivalent (basically another gist
 > support function wrapped around the 0004 patch).

I'm not sure how else to do it. The issue is that `range_agg` returns a multirange, so the result 
type doesn't match the inputs. But other types will likely have the same problem: to combine boxes 
you may need a multibox. The combine mdranges you may need a multimdrange.

I agree we need something to support "without portion" too. The patches here give implementations 
for ranges and multiranges. But that is for `FOR PORTION OF`, so it comes after the foreign key 
patches (part 5 here).

Btw that part changed a bit since v24 because as jian he pointed out, our type system doesn't 
support anyrange inputs and an anyrange[] output. So I changed the support funcs to use SETOF. I 
could alternately add anyrangearray and anymultirangearray pseudotypes. It's not the first time I've 
wanted those, so I'd be happy to go that way if folks are open to it. It seems like it should be a 
totally separate patch though.

 > * v24-0003-Add-temporal-FOREIGN-KEYs.patch
 >
 > - contrib/btree_gist/expected/without_overlaps.out
 > - contrib/btree_gist/sql/without_overlaps.sql
 >
 > typo "exusts"

Fixed.

 > - doc/src/sgml/ref/create_table.sgml
 >
 > This mentions FOR PORTION OF from a later patch.
 >
 > It is not documented that SET NULL and SET DEFAULT are not supported,
 > even though that is added in a later patch.  (So this patch should say
 > that it's not supported, and then the later patch should remove that.)

All fixed.

 > - src/backend/commands/indexcmds.c
 >
 > The changes to GetOperatorFromWellKnownStrategy() don't work for
 > message translations.  We had discussed a similar issue for this
 > function previously.  I think it's ok to leave the function as it was.
 > The additional context could be added with location pointers or
 > errcontext() maybe, but it doesn't seem that important for now.

Okay I've tried a different approach here that should fit better with t9n. Let me know if it still 
needs work.

 > - src/backend/commands/tablecmds.c
 >
 > The changes in ATAddForeignKeyConstraint(), which are the meat of the
 > changes in this file, are very difficult to review in detail.  I tried
 > different git-diff options to get a sensible view, but it wasn't
 > helpful.  Do we need to do some separate refactoring here first?

I moved the FindFKComparisonOperators refactor into a separate patch, and that seems to confuse git 
less. Your suggestion to group the PERIOD attribute with the others (below) also helped a lot to cut 
down the diff here. In fact it means I only call FindFKComparisonOperators once, so pulling it into 
a separate method is not even necessary anymore. But I do think it helps simplify what's already a 
very long function, so I've left it in. Let me know if more work is needed here.

 > The error message "action not supported for temporal foreign keys"
 > could be more detailed, mention the action.  Look for example how the
 > error for the generated columns is phrased.  (But note that for
 > generated columns, the actions are impossible to support, whereas here
 > it is just something not done yet.  So there should probably still be
 > different error codes.)

Fixed.

 > - src/backend/nodes/outfuncs.c
 > - src/backend/nodes/readfuncs.c
 >
 > Perhaps you would like to review my patch 0001 in
 > <https://www.postgresql.org/message-id/859d6155-e361-4a05-8db3-4aa1f007ff28@eisentraut.org>,
 > which removes the custom out/read functions for the Constraint node.
 > Then you could get rid of these changes.

That is a nice improvement!

 > - src/backend/utils/adt/ri_triggers.c
 >
 > The added #include "catalog/pg_range.h" doesn't appear to be used for
 > anything.

Removed.

 > Maybe we can avoid the added #include "commands/tablecmds.h" by
 > putting the common function in some appropriate lower-level module.

Moved to pg_constraint.{c,h}.

 > typo "PEROID"

Fixed.

 > Renaming of ri_KeysEqual() to ri_KeysStable() doesn't improve clarity,
 > I think.  I think we can leave the old name and add a comment (as you
 > have done).  There is a general understanding around this feature set
 > that "equal" sometimes means "contained" or something like that.

Okay.

 > The function ri_RangeAttributeNeedsCheck() could be documented better.
 > It's bit terse and unclear.  From the code, it looks like it is used
 > instead of row equality checks.  Maybe a different function name would
 > be suitable.

I realized I could simplify this a lot and reuse ri_AttributesEqual, so the whole method is gone now.

 > Various unnecessary reformatting in RI_FKey_check().

Fixed, sorry about that.

 > When assembling the SQL commands, you need to be very careful about
 > fully quoting and schema-qualifying everything.  See for example
 > ri_GenerateQual().

Went through everything and added quoting & schemes to a few places that were missing it.

 > Have you checked that the generated queries can use indexes and have
 > suitable performance?  Do you have example execution plans maybe?

The plans look good to me. Here are some tests:

-- test when inserting/updating the FK side:

regression=# explain analyze select 1
from (
select valid_at as r
from only temporal_rng x
where id = '[8,8]'
and valid_at && '[2010-01-01,2012-01-01)'
for key share of x
) x1
having '[2010-01-01,2012-01-01)'::tsrange <@ range_agg(x1.r);
                                                                     QUERY PLAN 


---------------------------------------------------------------------------------------------------------------------------------------------------
  Aggregate  (cost=8.19..8.20 rows=1 width=4) (actual time=0.165..0.167 rows=0 loops=1)
    Filter: ('["2010-01-01 00:00:00","2012-01-01 00:00:00")'::tsrange <@ range_agg(x1.r))
    Rows Removed by Filter: 1
    ->  Subquery Scan on x1  (cost=0.14..8.18 rows=1 width=32) (actual time=0.152..0.153 rows=0 loops=1)
          ->  LockRows  (cost=0.14..8.17 rows=1 width=38) (actual time=0.151..0.151 rows=0 loops=1)
                ->  Index Scan using temporal_rng_pk on temporal_rng x  (cost=0.14..8.16 rows=1 
width=38) (actual time=0.150..0.150 rows=0 loops=1)
                      Index Cond: ((id = '[8,9)'::int4range) AND (valid_at && '["2010-01-01 
00:00:00","2012-01-01 00:00:00")'::tsrange))
  Planning Time: 0.369 ms
  Execution Time: 0.289 ms
(9 rows)

-- test when deleting/updating from the PK side:

regression=# explain analyze select 1 from only temporal_rng x where id = '[8,8]' and valid_at && 
'[2010-01-01,2012-01-01)'
for key share of x;
                                                               QUERY PLAN 


---------------------------------------------------------------------------------------------------------------------------------------
  LockRows  (cost=0.14..8.17 rows=1 width=10) (actual time=0.079..0.079 rows=0 loops=1)
    ->  Index Scan using temporal_rng_pk on temporal_rng x  (cost=0.14..8.16 rows=1 width=10) 
(actual time=0.078..0.078 rows=0 loops=1)
          Index Cond: ((id = '[8,9)'::int4range) AND (valid_at && '["2010-01-01 
00:00:00","2012-01-01 00:00:00")'::tsrange))
  Planning Time: 0.249 ms
  Execution Time: 0.123 ms
(5 rows)

I will do some further tests with more rows, but I haven't yet.

 > - src/backend/utils/adt/ruleutils.c
 >
 > This seems ok in principle, but it's kind of weird that the new
 > argument of decompile_column_index_array() is called "withPeriod"
 > (which seems appropriate seeing what it does), but what we are passing
 > in is conwithoutoverlaps.  Maybe we need to reconsider the naming of
 > the constraint column?  Sorry, I made you change it from "contemporal"
 > or something, didn't I?  Maybe "conperiod" would cover both meanings
 > better?

Certainly conperiod is easier to read. Since we are using it for PK/UNIQUE/FKs, conperiod also seems 
like a better match. FKs don't use WITHOUT OVERLAPS syntax, and OTOH PK/UNIQUEs will still accept a 
PERIOD (eventually, also a range/etc now). I've renamed it, but since the old name was already 
committed with the PK patch, I've broken the renaming into a separate patch that could be committed 
without anything else.

 > - src/backend/utils/cache/lsyscache.c
 >
 > get_func_name_and_namespace(): This function would at least need some
 > identifier quoting.  There is only one caller (lookupTRIOperAndProc),
 > so let's just put this code inline there; it's not worth a separate
 > global function.  (Also, you could use psprintf() here to simplify
 > palloc() + snprintf().)

Removed.

 > - src/include/catalog/pg_constraint.h
 >
 > You are changing in several comments "equality" to "comparison".  I
 > suspect you effectively mean "equality or containment"?  Maybe
 > "comparison" is too subtle to convey that meaning?  Maybe be more
 > explicit.

Okay, changed.

 > You are changing a foreign key from DECLARE_ARRAY_FOREIGN_KEY to
 > DECLARE_ARRAY_FOREIGN_KEY_OPT.  Add a comment about it, like the one
 > just above has.

I don't need this change at all now that we're using GENERATED columns for PERIODs, so I've taken it 
out.

 > - src/include/catalog/pg_proc.dat
 >
 > For the names of the trigger functions, maybe instead of
 >
 >      TRI_FKey_check_ins
 >
 > something like
 >
 >      RI_FKey_period_check_ins
 >
 > so that all RI trigger functions group under a common prefix.

Renamed.

 > On second thought, do we even need separate functions for this?
 > Looking at ri_triggers.c, the temporal and non-temporal functions are
 > the same, and all the differences are handled in the underlying
 > implementation functions.

My thinking was to avoid making the non-temporal functions suffer in performance and complexity. 
What do you think? I've kept the separate functions here but I can combine them if you like.

 > - src/include/nodes/parsenodes.h
 >
 > The constants FKCONSTR_PERIOD_OP_CONTAINED_BY and
 > FKCONSTR_PERIOD_PROC_REFERENCED_AGG could use more documentation here.

Removed. They are obsolete now (and were already in v24---sorry!).

 > For the Constraint struct, don't we just need a bool field saying
 > "this is a period FK", and then we'd know that the last column is the
 > period?  Like we did for the primary keys (bool without_overlaps).

Okay, changed. Also in ATExecAddConstraint we can treat the PERIOD element like any other FK 
element, which simplifies the changes there a lot.

 > - src/include/parser/kwlist.h
 >
 > For this patch, the keyword PERIOD can be unreserved.  But it
 > apparently will need to be reserved later for the patch that
 > introduces PERIOD columns.  Maybe it would make sense to leave it
 > unreserved for this patch and upgrade it in the later one.

I tried doing this but got a shift/reduce conflict, so it's still reserved here.

Thanks,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Вложения

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

01 марта, 01:10:49

On 2/13/24 21:00, jian he wrote:
> Hi
> more minor issues.
> 
> + FindFKComparisonOperators(
> + fkconstraint, tab, i, fkattnum,
> + &old_check_ok, &old_pfeqop_item,
> + pktypoid[i], fktypoid[i], opclasses[i],
> + is_temporal, false,
> + &pfeqoperators[i], &ppeqoperators[i], &ffeqoperators[i]);
> + }
> + if (is_temporal) {
> + pkattnum[numpks] = pkperiodattnum;
> + pktypoid[numpks] = pkperiodtypoid;
> + fkattnum[numpks] = fkperiodattnum;
> + fktypoid[numpks] = fkperiodtypoid;
> 
> - pfeqop = get_opfamily_member(opfamily, opcintype, fktyped,
> - eqstrategy);
> - if (OidIsValid(pfeqop))
> - {
> - pfeqop_right = fktyped;
> - ffeqop = get_opfamily_member(opfamily, fktyped, fktyped,
> - eqstrategy);
> - }
> - else
> - {
> - /* keep compiler quiet */
> - pfeqop_right = InvalidOid;
> - ffeqop = InvalidOid;
> - }
> + FindFKComparisonOperators(
> + fkconstraint, tab, numpks, fkattnum,
> + &old_check_ok, &old_pfeqop_item,
> + pkperiodtypoid, fkperiodtypoid, opclasses[numpks],
> + is_temporal, true,
> + &pfeqoperators[numpks], &ppeqoperators[numpks], &ffeqoperators[numpks]);
> + numfks += 1;
> + numpks += 1;
> + }
> 
> opening curly brace should be the next line,

Fixed in v25 (submitted in my other email).

> also do you think it's
> good idea to add following in the `if (is_temporal)` branch
> `
> Assert(OidIsValid(fkperiodtypoid) && OidIsValid(pkperiodtypoid));
> Assert(OidIsValid(pkperiodattnum > 0 && fkperiodattnum > 0));
> `
> 
> ` if (is_temporal)` branch, you can set the FindFKComparisonOperators
> 10th argument (is_temporal)
> to true, since you are already in the ` if (is_temporal)` branch.
> 
> maybe we need some extra comments on
> `
> + numfks += 1;
> + numpks += 1;
> `
> since it might not be that evident?

That branch doesn't exist anymore. Same with the increments.

> Do you think it's a good idea to list arguments line by line (with
> good indentation) is good format? like:
> FindFKComparisonOperators(fkconstraint,
> tab,
> i,
> fkattnum,
> &old_check_ok,
> &old_pfeqop_item,
> pktypoid[i],
> fktypoid[i],
> opclasses[i],
> false,
> false,
> &pfeqoperators[i],
> &ppeqoperators[i],
> &ffeqoperators[i]);

There are places we do that, but most code I've seen tries to fill the line. I haven't followed that 
strictly here, but I'm trying to get better at doing what pg_indent wants.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

01 марта, 23:38:27

On 2/29/24 13:16, Paul Jungwirth wrote:
> Hello,
> 
> Here is another patch series for application time.
Here is a v26 patch series to fix a cfbot failure in sepgsql. Rebased to 655dc31046.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Вложения

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

02 марта, 00:56:16

On 3/1/24 12:38, Paul Jungwirth wrote:
> On 2/29/24 13:16, Paul Jungwirth wrote:
> Here is a v26 patch series to fix a cfbot failure in sepgsql. Rebased to 655dc31046.

v27 attached, fixing some cfbot failures from headerscheck+cpluspluscheck. Sorry for the noise!

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Вложения

Re: SQL:2011 application time

От

Peter Eisentraut

Дата:

11 марта, 10:46:16

On 01.03.24 22:56, Paul Jungwirth wrote:
> On 3/1/24 12:38, Paul Jungwirth wrote:
>> On 2/29/24 13:16, Paul Jungwirth wrote:
>> Here is a v26 patch series to fix a cfbot failure in sepgsql. Rebased 
>> to 655dc31046.
> 
> v27 attached, fixing some cfbot failures from 
> headerscheck+cpluspluscheck. Sorry for the noise!

I had committed v27-0001-Rename-conwithoutoverlaps-to-conperiod.patch a 
little while ago.

I have reviewed v27-0002 through 0004 now.  I have one semantic question 
below, and there are a few places where more clarification of the 
interfaces could help.  Other than that, I think this is pretty good.

Attached is a small patch that changes the PERIOD keyword to unreserved 
for this patch.  You had said earlier that this didn't work for you. 
The attached patch works for me when applied on top of 0003.

* v27-0002-Add-GiST-referencedagg-support-func.patch

You wrote:

 > I'm not sure how else to do it. The issue is that `range_agg` returns 
 > a multirange, so the result
 > type doesn't match the inputs. But other types will likely have the
 > same problem: to combine boxes
 > you may need a multibox. The combine mdranges you may need a
 > multimdrange.

Can we just hardcode the use of range_agg for this release?  Might be 
easier.  I don't see all this generality being useful in the near future.

 > Btw that part changed a bit since v24 because as jian he pointed out, 
 > our type system doesn't
 > support anyrange inputs and an anyrange[] output. So I changed the
 > support funcs to use SETOF.

I didn't see any SETOF stuff in the patch, or I didn't know where to look.

I'm not sure I follow all the details here.  So more explanations of any 
kind could be helpful.

* v27-0003-Refactor-FK-operator-lookup.patch

I suggest to skip this refactoring patch.  I don't think the way this is 
sliced up is all that great, and it doesn't actually help with the 
subsequent patches.

* v27-0004-Add-temporal-FOREIGN-KEYs.patch

- src/backend/catalog/pg_constraint.c

FindFKPeriodOpersAndProcs() could use a bit more top-level
documentation.  Where does the input opclass come from?  What are the
three output values?  What is the business with "symmetric types"?

- src/backend/commands/indexcmds.c

GetOperatorFromWellKnownStrategy() is apparently changed to accept
InvalidOid for rhstype, but the meaning of this is not explained in
the function header.  It's also not clear to me why an existing caller
is changed.  This should be explained more thoroughly.

- src/backend/commands/tablecmds.c

is_temporal and similar should be renamed to with_period or similar 
throughout this patch.

In transformFkeyGetPrimaryKey():

      * Now build the list of PK attributes from the indkey definition (we
-    * assume a primary key cannot have expressional elements)
+    * assume a primary key cannot have expressional elements, unless it
+    * has a PERIOD)

I think the original statement is still true even with PERIOD.  The 
expressional elements refer to expression indexes.  I don't think we can 
have a PERIOD marker on an expression?

- src/backend/utils/adt/ri_triggers.c

Please remove the separate trigger functions for the period case.  They 
are the same as the non-period ones, so we don't need separate ones. 
The difference is handled lower in the call stack, which I think is a 
good setup.  Removing the separate functions also removes a lot of extra 
code in other parts of the patch.

- src/include/catalog/pg_constraint.h

Should also update catalogs.sgml accordingly.

- src/test/regress/expected/without_overlaps.out
- src/test/regress/sql/without_overlaps.sql

A few general comments on the tests:

- In the INSERT commands, specify the column names explicitly.  This 
makes the tests easier to read (especially since the column order 
between the PK and the FK table is sometimes different).

- Let's try to make it so that the inserted literals match the values 
shown in the various error messages, so it's easier to match them up. 
So, change the int4range literals to half-open notation.  And also maybe 
change the date output format to ISO.

- In various comments, instead of test FK "child", maybe use 
"referencing table"?  Instead of "parent", use "referenced table" (or 
primary key table).  When I read child and parent I was looking for 
inheritance.

- Consider truncating the test tables before each major block of tests 
and refilling them with fresh data.  So it's easier to eyeball the 
tests.  Otherwise, there is too much dependency on what earlier tests 
left behind.

A specific question:

In this test, a PERIOD marker on the referenced site is automatically 
inferred from the primary key:

+-- with inferred PK on the referenced table:
+CREATE TABLE temporal_fk_rng2rng (
+   id int4range,
+   valid_at tsrange,
+   parent_id int4range,
+   CONSTRAINT temporal_fk_rng2rng_pk PRIMARY KEY (id, valid_at WITHOUT 
OVERLAPS),
+   CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD 
valid_at)
+       REFERENCES temporal_rng
+);

In your patch, this succeeds.  According to the SQL standard, it should 
not.  In subclause 11.8, syntax rule 4b:

"""
Otherwise, the table descriptor of the referenced table shall include a 
unique constraint UC that specifies PRIMARY KEY. The table constraint 
descriptor of UC shall not include an application time period name.
"""

So this case is apparently explicitly ruled out.

(It might be ok to make an extension here, but then we should be 
explicit about it.)

Вложения

0001-fixup-Add-temporal-FOREIGN-KEYs.patch.nocfbot

Re: SQL:2011 application time

От

jian he

Дата:

12 марта, 05:45:21

+     <para>
+      If the last column is marked with <literal>PERIOD</literal>,
+      it is treated in a special way.
+      While the non-<literal>PERIOD</literal> columns are treated normally
+      (and there must be at least one of them),
+      the <literal>PERIOD</literal> column is not compared for equality.
+      Instead the constraint is considered satisfied
+      if the referenced table has matching records
+      (based on the non-<literal>PERIOD</literal> parts of the key)
+      whose combined <literal>PERIOD</literal> values completely cover
+      the referencing record's.
+      In other words, the reference must have a referent for its
entire duration.
+      Normally this column would be a range or multirange type,
+      although any type whose GiST opclass has a "contained by" operator
+      and a <literal>referenced_agg</literal> support function is allowed.
+      (See <xref linkend="gist-extensibility"/>.)
+      In addition the referenced table must have a primary key
+      or unique constraint declared with <literal>WITHOUT PORTION</literal>.
+     </para>

typo "referenced_agg", in the gist-extensibility.html page is "referencedagg"
<literal>WITHOUT PORTION</literal> should be <literal>WITHOUT OVERLAPS</literal>

+      While the non-<literal>PERIOD</literal> columns are treated normally
+      (and there must be at least one of them),
+      the <literal>PERIOD</literal> column is not compared for equality.
the above sentence didn't say what is "normally"?
maybe we can do the following:
+      While the non-<literal>PERIOD</literal> columns are treated
+ normally for equality
+      (and there must be at least one of them),
+      the <literal>PERIOD</literal> column is not compared for equality.



+<programlisting>
+Datum
+my_range_agg_transfn(PG_FUNCTION_ARGS)
+{
+    MemoryContext    aggContext;
+    Oid              rngtypoid;
+    ArrayBuildState *state;
+
+    if (!AggCheckCallContext(fcinfo, &aggContext))
+        elog(ERROR, "range_agg_transfn called in non-aggregate context");
+
+    rngtypoid = get_fn_expr_argtype(fcinfo->flinfo, 1);
+    if (!type_is_range(rngtypoid))
+        elog(ERROR, "range_agg must be called with a range");
+
+    if (PG_ARGISNULL(0))
+        state = initArrayResult(rngtypoid, aggContext, false);
+    else
+        state = (ArrayBuildState *) PG_GETARG_POINTER(0);
+
+    /* skip NULLs */
+    if (!PG_ARGISNULL(1))
+        accumArrayResult(state, PG_GETARG_DATUM(1), false, rngtypoid,
aggContext);
+
+    PG_RETURN_POINTER(state);
+}
+
+Datum
+my_range_agg_finalfn(PG_FUNCTION_ARGS)
+{
+    MemoryContext    aggContext;
+    Oid              mltrngtypoid;
+    TypeCacheEntry  *typcache;
+    ArrayBuildState *state;
+    int32            range_count;
+    RangeType      **ranges;
+    int              i;
+
+    if (!AggCheckCallContext(fcinfo, &aggContext))
+        elog(ERROR, "range_agg_finalfn called in non-aggregate context");
+
+    state = PG_ARGISNULL(0) ? NULL : (ArrayBuildState *) PG_GETARG_POINTER(0);
+    if (state == NULL)
+        /* This shouldn't be possible, but just in case.... */
+        PG_RETURN_NULL();
+
+    /* Also return NULL if we had zero inputs, like other aggregates */
+    range_count = state->nelems;
+    if (range_count == 0)
+        PG_RETURN_NULL();
+
+    mltrngtypoid = get_fn_expr_rettype(fcinfo->flinfo);
+    typcache = multirange_get_typcache(fcinfo, mltrngtypoid);
+
+    ranges = palloc0(range_count * sizeof(RangeType *));
+    for (i = 0; i < range_count; i++)
+        ranges[i] = DatumGetRangeTypeP(state->dvalues[i]);
+
+    PG_RETURN_MULTIRANGE_P(make_multirange(mltrngtypoid,
typcache->rngtype, range_count, ranges));
+}

my_range_agg_transfn error message is inconsistent?
 `elog(ERROR, "range_agg_transfn called in non-aggregate context");`
`elog(ERROR, "range_agg must be called with a range");`
maybe just `my_range_agg_transfn`, instead of mention
{range_agg_transfn|range_agg}
similarly my_range_agg_finalfn error is also inconsistent.

my_range_agg_finalfn need  `type_is_multirange(mltrngtypoid)`?

Re: SQL:2011 application time

От

jian he

Дата:

12 марта, 05:47:02

On Mon, Mar 11, 2024 at 3:46 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> A few general comments on the tests:
>
> - In the INSERT commands, specify the column names explicitly.  This
> makes the tests easier to read (especially since the column order
> between the PK and the FK table is sometimes different).
>
> - Let's try to make it so that the inserted literals match the values
> shown in the various error messages, so it's easier to match them up.
> So, change the int4range literals to half-open notation.  And also maybe
> change the date output format to ISO.
>
maybe just change the tsrange type to daterange, then the dot out file
will be far less verbose.

minor issues while reviewing v27, 0001 to 0004.
transformFkeyGetPrimaryKey comments need to update,
since bool pk_period also returned.

+/*
+ * FindFKComparisonOperators -
+ *
+ * Gets the operators for pfeqopOut, ppeqopOut, and ffeqopOut.
+ * Sets old_check_ok if we can avoid re-validating the constraint.
+ * Sets old_pfeqop_item to the old pfeqop values.
+ */
+static void
+FindFKComparisonOperators(Constraint *fkconstraint,
+  AlteredTableInfo *tab,
+  int i,
+  int16 *fkattnum,
+  bool *old_check_ok,
+  ListCell **old_pfeqop_item,
+  Oid pktype, Oid fktype, Oid opclass,
+  Oid *pfeqopOut, Oid *ppeqopOut, Oid *ffeqopOut)

I think the above comments is
`Sets the operators for pfeqopOut, ppeqopOut, and ffeqopOut.`.


+ if (is_temporal)
+ {
+ if (!fkconstraint->fk_with_period)
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_FOREIGN_KEY),
+ errmsg("foreign key uses PERIOD on the referenced table but not the
referencing table")));
+ }
can be
if (is_temporal && !fkconstraint->fk_with_period)
ereport(ERROR,
(errcode(ERRCODE_INVALID_FOREIGN_KEY),
errmsg("foreign key uses PERIOD on the referenced table but not the
referencing table")));

+
+ if (is_temporal)
+ {
+ if (!fkconstraint->pk_with_period)
+ /* Since we got pk_attrs, one should be a period. */
+ ereport(ERROR,
+ (errcode(ERRCODE_INVALID_FOREIGN_KEY),
+ errmsg("foreign key uses PERIOD on the referencing table but not the
referenced table")));
+ }
can be
if (is_temporal && !fkconstraint->pk_with_period)
/* Since we got pk_attrs, one should be a period. */
ereport(ERROR,
(errcode(ERRCODE_INVALID_FOREIGN_KEY),
errmsg("foreign key uses PERIOD on the referencing table but not the
referenced table")));

refactor decompile_column_index_array seems unnecessary.
Peter already mentioned it at [1], I have tried to fix it at [2].


@@ -12141,7 +12245,8 @@ transformFkeyGetPrimaryKey(Relation pkrel, Oid
*indexOid,
  /*
  * Now build the list of PK attributes from the indkey definition (we
- * assume a primary key cannot have expressional elements)
+ * assume a primary key cannot have expressional elements, unless it
+ * has a PERIOD)
  */
  *attnamelist = NIL;
  for (i = 0; i < indexStruct->indnkeyatts; i++)
@@ -12155,6 +12260,8 @@ transformFkeyGetPrimaryKey(Relation pkrel, Oid
*indexOid,
    makeString(pstrdup(NameStr(*attnumAttName(pkrel, pkattno)))));
  }
+ *pk_period = (indexStruct->indisexclusion);

I  don't understand the "expression elements" in the comments, most of
the tests case is like
`
PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
`
+ *pk_period = (indexStruct->indisexclusion);
can be
`+ *pk_period = indexStruct->indisexclusion;`


[1] https://postgr.es/m/7be8724a-5c25-46d7-8325-1bd8be6fa523@eisentraut.org
[2] https://postgr.es/m/CACJufxHVg65raNhG2zBwXgjrD6jqace4NZbePyMhP8-_Q=iT8w@mail.gmail.com

Re: SQL:2011 application time

От

jian he

Дата:

14 марта, 03:00:00

in GetOperatorFromWellKnownStrategy:
*strat = GistTranslateStratnum(opclass, instrat);
if (*strat == InvalidStrategy)
{
HeapTuple tuple;
tuple = SearchSysCache1(CLAOID, ObjectIdGetDatum(opclass));
if (!HeapTupleIsValid(tuple))
elog(ERROR, "cache lookup failed for operator class %u", opclass);
ereport(ERROR,
errcode(ERRCODE_UNDEFINED_OBJECT),
errmsg(errstr, format_type_be(opcintype)),
errdetail("Could not translate strategy number %d for operator class
\"%s\" for access method \"%s\".",
  instrat, NameStr(((Form_pg_opclass) GETSTRUCT(tuple))->opcname), "gist"));
ReleaseSysCache(tuple);
}

last `ReleaseSysCache(tuple);` is unreachable?


@@ -118,12 +120,17 @@ typedef struct RI_ConstraintInfo
  int16 confdelsetcols[RI_MAX_NUMKEYS]; /* attnums of cols to set on
  * delete */
  char confmatchtype; /* foreign key's match type */
+ bool temporal; /* if the foreign key is temporal */
  int nkeys; /* number of key columns */
  int16 pk_attnums[RI_MAX_NUMKEYS]; /* attnums of referenced cols */
  int16 fk_attnums[RI_MAX_NUMKEYS]; /* attnums of referencing cols */
  Oid pf_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (PK = FK) */
  Oid pp_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (PK = PK) */
  Oid ff_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (FK = FK) */
+ Oid period_contained_by_oper; /* operator for PERIOD SQL */
+ Oid agged_period_contained_by_oper; /* operator for PERIOD SQL */
+ Oid period_referenced_agg_proc; /* proc for PERIOD SQL */
+ Oid period_referenced_agg_rettype; /* rettype for previous */

the comment seems not clear to me. Here is my understanding about it:
period_contained_by_oper is the operator where a single period/range
contained by a single period/range.
agged_period_contained_by_oper is the operator oid where a period
contained by a bound of periods
period_referenced_agg_proc is the oprcode of the agged_period_contained_by_oper.
period_referenced_agg_rettype is the function
period_referenced_agg_proc returning data type.

Re: SQL:2011 application time

От

Paul A Jungwirth

Дата:

17 марта, 00:37:10

Hello,

Here is a new patch series addressing the last few feedback emails
from Peter & Jian He. It mostly focuses on the FKs patch, trying to
get it really ready to commit, but it also finishes restoring all the
functionality to the PERIODs patch (that I removed temporarily when we
changed PERIODs to GENERATED columns). I still want to restore a few
more tests there, but all the functionality is back (e.g. PERIODs with
foreign keys and FOR PORTION OF), so it proves the GENERATED idea
works in principle. Specific feedback below:

On Mon, Mar 11, 2024 at 12:46 AM Peter Eisentraut <peter@eisentraut.org> wrote:
> I had committed v27-0001-Rename-conwithoutoverlaps-to-conperiod.patch a
> little while ago.

Thanks! It looks like you also fixed the pg_catalog docs which I missed.

> Attached is a small patch that changes the PERIOD keyword to unreserved
> for this patch.  You had said earlier that this didn't work for you.
> The attached patch works for me when applied on top of 0003.

Applied and included here.

> You wrote:
>
>  > I'm not sure how else to do it. The issue is that `range_agg` returns
>  > a multirange, so the result
>  > type doesn't match the inputs. But other types will likely have the
>  > same problem: to combine boxes
>  > you may need a multibox. The combine mdranges you may need a
>  > multimdrange.
>
> Can we just hardcode the use of range_agg for this release?  Might be
> easier.  I don't see all this generality being useful in the near future.

Okay, I've hard-coded range_agg in the main patch and separated the
support for multirange/etc in the next two patches. But there isn't
much code there (mostly tests and docs). Since we can't hard-code the
*operators*, most of the infrastructure is already there not to
hard-code the aggregate function. Supporting multiranges is already a
nice improvement. E.g. it should cut down on disk usage when a record
gets updated frequently. Supporting arbitrary types also seems very
powerful, and we already do that for PKs.

>  > Btw that part changed a bit since v24 because as jian he pointed out,
>  > our type system doesn't
>  > support anyrange inputs and an anyrange[] output. So I changed the
>  > support funcs to use SETOF.
>
> I didn't see any SETOF stuff in the patch, or I didn't know where to look.
>
> I'm not sure I follow all the details here.  So more explanations of any
> kind could be helpful.

This is talking about the FOR PORTION OF patch, not the FKs patch. It
is the function that gives the "leftovers" after a temporal
UPDATE/DELETE. There is explanation in the preliminary patch (adding
the support function) and the actual FOR PORTION OF patch, but if you
think they need more let me know.

But I'd love to talk more about this here: The reason for using a
SETOF function is because you can't return an anyarray from a function
that takes anyrange or anymultirange. Or rather if you do, the array
elements match the rangetype's bounds' type, not the rangetype itself:
`T[] f(rangetype<T>)`, not `rangetype<T>[] f(rangetype<T>)`, and we
need the latter. So to get a list of rangetype objects we do a SETOF
function that is `anyrange f(anyrange)`. Personally I think an
improvement would be to add a broken-out patch to add pseudotypes
called anyrangearray and anymultirangearray, but using SETOF works
now, and I don't know if anyone is interested in such a patch. But
it's not the first time I've hit this shortcoming in the pg type
system, so I think it's worthwhile. And since FOR PORTION OF isn't
getting into v17, there is time to do it. What do you think? If it's
an acceptable idea I will get started. It should be a separate
commitfest entry I think.

> * v27-0003-Refactor-FK-operator-lookup.patch
>
> I suggest to skip this refactoring patch.  I don't think the way this is
> sliced up is all that great, and it doesn't actually help with the
> subsequent patches.

Okay.

> - src/backend/catalog/pg_constraint.c
>
> FindFKPeriodOpersAndProcs() could use a bit more top-level
> documentation.  Where does the input opclass come from?  What are the
> three output values?  What is the business with "symmetric types"?

Added and tried to clarify about the types.

> - src/backend/commands/indexcmds.c
>
> GetOperatorFromWellKnownStrategy() is apparently changed to accept
> InvalidOid for rhstype, but the meaning of this is not explained in
> the function header.  It's also not clear to me why an existing caller
> is changed.  This should be explained more thoroughly.

It's not so much changing a param as removing one and adding another.
The old param was unneeded because it's just the opclass's opcintype,
and we're already passing the opclass. Then the new param lets you
optionally ask for an operator that is not `opcintype op opcintype`
but `opcintype op rhstype`. We need this because FKs compare fkattr <@
range_agg(pkattr)`, and range_agg returns a multirange, not a range.
Even if we hard-code range_agg, the easiest way to get the operator is
to use this function, passing ANYMULTIRANGEOID (but better is to pass
whatever the referencedagg support func returns, as the now-separate
multirange/custom type patch does).

> - src/backend/commands/tablecmds.c
>
> is_temporal and similar should be renamed to with_period or similar
> throughout this patch.

Done.

> In transformFkeyGetPrimaryKey():
>
>       * Now build the list of PK attributes from the indkey definition (we
> -    * assume a primary key cannot have expressional elements)
> +    * assume a primary key cannot have expressional elements, unless it
> +    * has a PERIOD)
>
> I think the original statement is still true even with PERIOD.  The
> expressional elements refer to expression indexes.  I don't think we can
> have a PERIOD marker on an expression?

You're right: I wrote this back before PERIODs became GENERATED
columns. Updated now.

> - src/backend/utils/adt/ri_triggers.c
>
> Please remove the separate trigger functions for the period case.  They
> are the same as the non-period ones, so we don't need separate ones.
> The difference is handled lower in the call stack, which I think is a
> good setup.  Removing the separate functions also removes a lot of extra
> code in other parts of the patch.

Done. The later patch for FKs with CASCADE/SET NULL/SET DEFAULT still
has separate functions (since they call actually-different
implementations), but I will see if I can unify things a bit more
there.

> - src/include/catalog/pg_constraint.h
>
> Should also update catalogs.sgml accordingly.

Looks like you did this already in 030e10ff1a.

> - src/test/regress/expected/without_overlaps.out
> - src/test/regress/sql/without_overlaps.sql
>
> A few general comments on the tests:
>
> - In the INSERT commands, specify the column names explicitly.  This
> makes the tests easier to read (especially since the column order
> between the PK and the FK table is sometimes different).

Okay.

> - Let's try to make it so that the inserted literals match the values
> shown in the various error messages, so it's easier to match them up.
> So, change the int4range literals to half-open notation.  And also maybe
> change the date output format to ISO.

Done. Also changed the tsrange cols to daterange and made them
YYYY-MM-DD. This is much easier to read IMO.

Note there were already a few tsrange columns in the PK tests, so I
changed those separately in the very first patch here.

> - In various comments, instead of test FK "child", maybe use
> "referencing table"?  Instead of "parent", use "referenced table" (or
> primary key table).  When I read child and parent I was looking for
> inheritance.

Done.

> - Consider truncating the test tables before each major block of tests
> and refilling them with fresh data.  So it's easier to eyeball the
> tests.  Otherwise, there is too much dependency on what earlier tests
> left behind.

Done. This will also let me reuse ids in the FOR PORTION OF
partitioned table tests, but that's not done yet.

> A specific question:
>
> In this test, a PERIOD marker on the referenced site is automatically
> inferred from the primary key:
>
> +-- with inferred PK on the referenced table:
> +CREATE TABLE temporal_fk_rng2rng (
> +   id int4range,
> +   valid_at tsrange,
> +   parent_id int4range,
> +   CONSTRAINT temporal_fk_rng2rng_pk PRIMARY KEY (id, valid_at WITHOUT
> OVERLAPS),
> +   CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD
> valid_at)
> +       REFERENCES temporal_rng
> +);
>
> In your patch, this succeeds.  According to the SQL standard, it should
> not.  In subclause 11.8, syntax rule 4b:
>
> """
> Otherwise, the table descriptor of the referenced table shall include a
> unique constraint UC that specifies PRIMARY KEY. The table constraint
> descriptor of UC shall not include an application time period name.
> """
>
> So this case is apparently explicitly ruled out.
>
> (It might be ok to make an extension here, but then we should be
> explicit about it.)

Okay, I agree it doesn't match the standard. IMO our behavior is
better, but the patches here should let you go either way. The main FK
patch keeps the old behavior, but there is a follow-up patch doing
what the standard says. There are some interesting implications, which
you can see by looking at the test changes in that patch. Basically
you can never give an inferred REFERENCES against a temporal table.
Either your FK has a PERIOD element, and it fails because we exclude
the PK's WITHOUT OVERLAPS in the inferred attributes, or your FK does
not have a PERIOD element, and it fails because you want a PK side
that is genuinely unique, but the PK index has a temporal definition
of "unique" (and is not B-tree but GiST).

I don't see any drawbacks from supporting inferred REFERENCES with
temporal tables, so my vote is to break from the standard here, and
*not* apply that follow-up patch. Should I add some docs about that?
Also skipping the patch will cause some annoying merge conflicts, so
let me know if that's what you choose and I'll handle them right away.

Btw I tried checking what other vendors do here, but no one supports
temporal FKs yet! MS SQL Server doesn't support application time at
all. Oracle and MariaDB don't support temporal PKs or FKs. And IBM DB2
only supports temporal PKs. Actually DB2's docs in 2019 were
*claiming* they supported temporal FKs, but it didn't work for me or
at least one other person posting in their forums. And the latest docs
no longer mention it.[1] I wrote about trying to make it work in my
survey of other vendors.[2] The old docs are now a 404,[3] as is the
forums post.[4] My DB2 test code is below in case anyone else wants to
try.[5] So there is no precedent here for us to follow.

Incidentally, here are two non-standard things I would like to add "some day":

1. FKs from non-temporal tables to temporal tables. Right now temporal
tables are "contagious", which can be annoying. Maybe a non-temporal
record is valid as long as a referenced temporal row exists at *any
time*. You can't do that today. You can't even add an additional
UNIQUE constraint, because there are surely duplicates that invalidate
it. This kind of FK would be satisfied if *at least one* reference
exists.

2. FKs from a single-timestamp table to a temporal table. Maybe the
referring table is an "event" with no duration, but it is valid as
long as the referenced table contains it. A workaround is to have a
range that is `[t,t]`, but that's annoying.

Anyway that's not important for these patches. As far as I can tell,
whatever we choose re inferred PERIOD in REFERENCES keeps our options
open for those ideas.

One more thought: if we wanted to be cheekily compatible with the
standard, we could infer *range types* that are WITHOUT OVERLAPs but
not true PERIOD objects. "The table constraint descriptor of UC shall
not include an application time period name." If it's a rangetype
column, then it doesn't include a period name. :-P. So then we would
skip the follow-up patch here but I could work it into the final patch
for PERIOD support. This is probably not the wisest choice, although I
guess it does let us defer deciding what to do.

On Mon, Mar 11, 2024 at 7:45 PM jian he <jian.universality@gmail.com> wrote:
> typo "referenced_agg", in the gist-extensibility.html page is "referencedagg"
> <literal>WITHOUT PORTION</literal> should be <literal>WITHOUT OVERLAPS</literal>

Good catch! Fixed.

> +      While the non-<literal>PERIOD</literal> columns are treated normally
> +      (and there must be at least one of them),
> +      the <literal>PERIOD</literal> column is not compared for equality.
> the above sentence didn't say what is "normally"?
> maybe we can do the following:
> +      While the non-<literal>PERIOD</literal> columns are treated
> + normally for equality
> +      (and there must be at least one of them),
> +      the <literal>PERIOD</literal> column is not compared for equality.

Reworked the language here.

> my_range_agg_transfn error message is inconsistent?
>  `elog(ERROR, "range_agg_transfn called in non-aggregate context");`
> `elog(ERROR, "range_agg must be called with a range");`
> maybe just `my_range_agg_transfn`, instead of mention
> {range_agg_transfn|range_agg}
> similarly my_range_agg_finalfn error is also inconsistent.

This matches what other aggs do (e.g. array_agg, json_agg, etc.) as
well as the actual core range_agg code. And I think it is an
appropriate difference. You only hit the first error if you are
invoking the transfn directly, so that's what we should say. OTOH you
hit the second error by calling the aggregate function, but with the
wrong type. So the error message should mention the aggregate
function.

> my_range_agg_finalfn need  `type_is_multirange(mltrngtypoid)`?

This isn't part of the core range_agg_finalfn, so I'd rather not
include it here. And I don't think it is needed. You would only get a
non-multirange if the transfn does something wrong, and even if it
does, the error will be caught and reported in
multirange_get_typcache.

On Mon, Mar 11, 2024 at 7:47 PM jian he <jian.universality@gmail.com> wrote:
> maybe just change the tsrange type to daterange, then the dot out file
> will be far less verbose.

Agreed, done.

> minor issues while reviewing v27, 0001 to 0004.
> transformFkeyGetPrimaryKey comments need to update,
> since bool pk_period also returned.

pk_period is no longer returned in this latest patch.

> +/*
> + * FindFKComparisonOperators -
> + *
> + * Gets the operators for pfeqopOut, ppeqopOut, and ffeqopOut.
> + * Sets old_check_ok if we can avoid re-validating the constraint.
> + * Sets old_pfeqop_item to the old pfeqop values.
> + */
> +static void
> +FindFKComparisonOperators(Constraint *fkconstraint,
> +  AlteredTableInfo *tab,
> +  int i,
> +  int16 *fkattnum,
> +  bool *old_check_ok,
> +  ListCell **old_pfeqop_item,
> +  Oid pktype, Oid fktype, Oid opclass,
> +  Oid *pfeqopOut, Oid *ppeqopOut, Oid *ffeqopOut)
>
> I think the above comments is
> `Sets the operators for pfeqopOut, ppeqopOut, and ffeqopOut.`.

This whole function is removed.

> + if (is_temporal)
> + {
> + if (!fkconstraint->fk_with_period)
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> + errmsg("foreign key uses PERIOD on the referenced table but not the
> referencing table")));
> + }
> can be
> if (is_temporal && !fkconstraint->fk_with_period)
> ereport(ERROR,
> (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> errmsg("foreign key uses PERIOD on the referenced table but not the
> referencing table")));

The patch about inferred REFERENCES moves things around a bit, so this
no longer applies.

> + if (is_temporal)
> + {
> + if (!fkconstraint->pk_with_period)
> + /* Since we got pk_attrs, one should be a period. */
> + ereport(ERROR,
> + (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> + errmsg("foreign key uses PERIOD on the referencing table but not the
> referenced table")));
> + }
> can be
> if (is_temporal && !fkconstraint->pk_with_period)
> /* Since we got pk_attrs, one should be a period. */
> ereport(ERROR,
> (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> errmsg("foreign key uses PERIOD on the referencing table but not the
> referenced table")));

Likewise.

> refactor decompile_column_index_array seems unnecessary.
> Peter already mentioned it at [1], I have tried to fix it at [2].

No, that conversation is about handling WITHOUT OVERLAPS, not PERIOD.
Because the syntax is `valid_at WITHOUT OVERLAPS` but `PERIOD
valid_at` (post vs pre), we must handle PERIOD inside the function.

> I  don't understand the "expression elements" in the comments, most of
> the tests case is like

Covered above in Peter's feedback.

> `
> PRIMARY KEY (id, valid_at WITHOUT OVERLAPS)
> `
> + *pk_period = (indexStruct->indisexclusion);
> can be
> `+ *pk_period = indexStruct->indisexclusion;`

No longer included here.

On Wed, Mar 13, 2024 at 5:00 PM jian he <jian.universality@gmail.com> wrote:
> @@ -118,12 +120,17 @@ typedef struct RI_ConstraintInfo
>   int16 confdelsetcols[RI_MAX_NUMKEYS]; /* attnums of cols to set on
>   * delete */
>   char confmatchtype; /* foreign key's match type */
> + bool temporal; /* if the foreign key is temporal */
>   int nkeys; /* number of key columns */
>   int16 pk_attnums[RI_MAX_NUMKEYS]; /* attnums of referenced cols */
>   int16 fk_attnums[RI_MAX_NUMKEYS]; /* attnums of referencing cols */
>   Oid pf_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (PK = FK) */
>   Oid pp_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (PK = PK) */
>   Oid ff_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (FK = FK) */
> + Oid period_contained_by_oper; /* operator for PERIOD SQL */
> + Oid agged_period_contained_by_oper; /* operator for PERIOD SQL */
> + Oid period_referenced_agg_proc; /* proc for PERIOD SQL */
> + Oid period_referenced_agg_rettype; /* rettype for previous */
>
> the comment seems not clear to me. Here is my understanding about it:
> period_contained_by_oper is the operator where a single period/range
> contained by a single period/range.
> agged_period_contained_by_oper is the operator oid where a period
> contained by a bound of periods
> period_referenced_agg_proc is the oprcode of the agged_period_contained_by_oper.
> period_referenced_agg_rettype is the function
> period_referenced_agg_proc returning data type.

Expanded these comments a bit.

Thanks to you both for such detailed, careful feedback!

Rebased to 605062227f.

If anything else comes up re FKs I'll tackle that first, but otherwise
I think I will work on some of the outstanding issues in the FOR
PORTION OF patch (e.g. trigger transition table names). I may
experiment with handling the leftover inserts as a separate executor
node. If anyone has advice there I'm happy to hear it!

Yours,
Paul

[1] https://www.ibm.com/docs/en/db2/11.5?topic=statements-alter-table
[2] https://illuminatedcomputing.com/posts/2019/08/sql2011-survey/
[3] https://www.ibm.com/support/knowledgecenter/en/SSEPEK_10.0.0/intro/src/tpc/db2z_integrity.html
[4] https://www.ibm.com/developerworks/community/forums/html/topic?id=440e07ad-23ee-4b0a-ae23-8c747abca819
[5] Here is DB2 test code showing temporal FKs don't work. (Note they
disobey the standard re declaring `PERIOD p (s, e)` not `PERIOD FOR p
(s, e)`, and it must be named `business_time`.)

```
create table t (id integer not null, ds date not null, de date not
null, name varchar(4000), period business_time (ds, de));
alter table t add constraint tpk primary key (id, business_time
without overlaps)
insert into t values (1, '2000-01-01', '2001-01-01', 'foo');
create table fk (id integer, ds date not null, de date not null,
period business_time (ds, de));

-- all this fails:
alter table fk add constraint fkfk foreign key (id, period
business_time) references t (id, period business_time);
alter table fk add constraint fkfk foreign key (id, business_time)
references t (id, business_time);
alter table fk add constraint fkfk foreign key (id, period
business_time) references t;
alter table fk add constraint fkfk foreign key (id, business_time) references t;
alter table fk add constraint fkfk foreign key (id, period for
business_time) references t;
alter table fk add constraint fkfk foreign key (id, period for
business_time) references t (id, period for business_time);
alter table fk add constraint fkfk foreign key (id, business_time
without overlaps) references t;
alter table fk add constraint fkfk foreign key (id, business_time
without overlaps) references t (id, business_time without overlaps);
alter table fk add constraint fkfk foreign key (id) references t;
alter table fk add constraint fkfk foreign key (id) references t (id);
```

Вложения

Re: SQL:2011 application time

От

jian he

Дата:

18 марта, 02:30:00

Hi, minor issues from 00001 to 0005.
+      <row>
+       <entry><function>referencedagg</function></entry>
+       <entry>aggregates referenced rows' <literal>WITHOUT OVERLAPS</literal>
+        part</entry>
+       <entry>13</entry>
+      </row>
comparing with surrounding items, maybe need to add `(optional)`?
I think the explanation is not good as explained in referencedagg entry below:
      <para>
       An aggregate function. Given values of this opclass,
       it returns a value combining them all. The return value
       need not be the same type as the input, but it must be a
       type that can appear on the right hand side of the "contained by"
       operator. For example the built-in <literal>range_ops</literal>
       opclass uses <literal>range_agg</literal> here, so that foreign
       keys can check <literal>fkperiod @> range_agg(pkperiod)</literal>.
      </para>


+      In other words, the reference must have a referent for its
entire duration.
+      This column must be a column with a range type.
+      In addition the referenced table must have a primary key
+      or unique constraint declared with <literal>WITHOUT PORTION</literal>.
+     </para>
seems you missed replacing this one.


in v28-0002, the function name is FindFKPeriodOpers,
then in v28-0005 rename it to FindFKPeriodOpersAndProcs?
renaming the function name in a set of patches seems not a good idea?


+      <para>
+       This is used for temporal foreign key constraints.
+       If you omit this support function, your type cannot be used
+       as the <literal>PERIOD</literal> part of a foreign key.
+      </para>
in v28-0004, I think here "your type"  should change to "your opclass"?

+bool
+check_amproc_is_aggregate(Oid funcid)
+{
+ bool result;
+ HeapTuple tp;
+ Form_pg_proc procform;
+
+ tp = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
+ if (!HeapTupleIsValid(tp))
+ elog(ERROR, "cache lookup failed for function %u", funcid);
+ procform = (Form_pg_proc) GETSTRUCT(tp);
+ result = procform->prokind == 'a';
+ ReleaseSysCache(tp);
+ return result;
+}
maybe
`
change procform->prokind == 'a';
`
to
`
procform->prokind == PROKIND_AGGREGATE;
`
or we can put the whole function to cache/lsyscache.c
name it just as proc_is_aggregate.


-  Added pg_dump support.
- Show the correct syntax in psql \d output for foreign keys.
in 28-0002, seems there is no work to correspond to these 2 items in
the commit message?


@@ -12335,7 +12448,8 @@ validateForeignKeyConstraint(char *conname,
  Relation rel,
  Relation pkrel,
  Oid pkindOid,
- Oid constraintOid)
+ Oid constraintOid,
+ bool temporal)
do you need to change the last argument of this function to "is_period"?


+ sprintf(paramname, "$%d", riinfo->nkeys);
+ sprintf(paramname, "$%d", riinfo->nkeys);
do you think it worth the trouble to change to snprintf, I found
related post on [1].

[1] https://stackoverflow.com/a/7316500/15603477

Re: SQL:2011 application time

От

jian he

Дата:

18 марта, 07:47:18

one more minor issue related to error reporting.
I've only applied v28, 0001 to 0005.

-- (parent_id, valid_at) REFERENCES [implicit]
-- FOREIGN KEY part should specify PERIOD
CREATE TABLE temporal_fk_rng2rng (
id int4range,
valid_at daterange,
parent_id int4range,
CONSTRAINT temporal_fk_rng2rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS),
CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, valid_at)
REFERENCES temporal_rng
);
ERROR:  number of referencing and referenced columns for foreign key disagree

-- (parent_id, PERIOD valid_at) REFERENCES (id)
CREATE TABLE temporal_fk_rng2rng (
id int4range,
valid_at daterange,
parent_id int4range,
CONSTRAINT temporal_fk_rng2rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS),
CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD valid_at)
REFERENCES temporal_rng (id)
);
ERROR:  foreign key uses PERIOD on the referencing table but not the
referenced table

these error messages seem somehow inconsistent with the comments above?


+ else
+ {
+ /*
+ * Check it's a btree; currently this can never fail since no other
+ * index AMs support unique indexes.  If we ever did have other types
+ * of unique indexes, we'd need a way to determine which operator
+ * strategy number is equality.  (Is it reasonable to insist that
+ * every such index AM use btree's number for equality?)
+ */
+ if (amid != BTREE_AM_OID)
+ elog(ERROR, "only b-tree indexes are supported for foreign keys");
+ eqstrategy = BTEqualStrategyNumber;
+ }

the comments say never fail.
but it actually failed. see:

+-- (parent_id) REFERENCES [implicit]
+-- This finds the PK (omitting the WITHOUT OVERLAPS element),
+-- but it's not a b-tree index, so it fails anyway.
+-- Anyway it must fail because the two sides have a different
definition of "unique".
+CREATE TABLE temporal_fk_rng2rng (
+ id int4range,
+ valid_at daterange,
+ parent_id int4range,
+ CONSTRAINT temporal_fk_rng2rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS),
+ CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id)
+ REFERENCES temporal_rng
+);
+ERROR:  only b-tree indexes are supported for foreign keys

because in transformFkeyGetPrimaryKey.
we have `if (indexStruct->indisexclusion && i == indexStruct->indnatts - 1)`
we have pk_with_period, fk_with_period in Constraint struct.

maybe we can add a bool argument to transformFkeyGetPrimaryKey
indicate, this primary key is a conperiod constraint.
then we can check condition: the primary key is a conperiod constraint
and fk_with_period or is pk_with_period is false

I've made a patch to make these error reporting more accurate.
you can further refine it.

Вложения

v28-0001-refactor-transformFkeyGetPrimaryKey-for-bette.no-cfbot

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

19 марта, 01:49:28

Hi All,

A few more changes here:

On 3/17/24 16:30, jian he wrote:
 > Hi, minor issues from 00001 to 0005.
 > +      <row>
 > +       <entry><function>referencedagg</function></entry>
 > +       <entry>aggregates referenced rows' <literal>WITHOUT OVERLAPS</literal>
 > +        part</entry>
 > +       <entry>13</entry>
 > +      </row>
 > comparing with surrounding items, maybe need to add `(optional)`?

We do say this function is optional above, in the list of support functions. That seems to be the 
normal approach. The only other support function that mentions being optional elsewhere is sortsupport.

 > I think the explanation is not good as explained in referencedagg entry below:
 >        <para>
 >         An aggregate function. Given values of this opclass,
 >         it returns a value combining them all. The return value
 >         need not be the same type as the input, but it must be a
 >         type that can appear on the right hand side of the "contained by"
 >         operator. For example the built-in <literal>range_ops</literal>
 >         opclass uses <literal>range_agg</literal> here, so that foreign
 >         keys can check <literal>fkperiod @> range_agg(pkperiod)</literal>.
 >        </para>

Can you explain what you'd like to see improved here?

 > +      In other words, the reference must have a referent for its
 > entire duration.
 > +      This column must be a column with a range type.
 > +      In addition the referenced table must have a primary key
 > +      or unique constraint declared with <literal>WITHOUT PORTION</literal>.
 > +     </para>
 > seems you missed replacing this one.

I'm not sure what this is referring to. Replaced what?

 > in v28-0002, the function name is FindFKPeriodOpers,
 > then in v28-0005 rename it to FindFKPeriodOpersAndProcs?
 > renaming the function name in a set of patches seems not a good idea?

We'll only apply part 5 if we support more than range types (though I think that would be great). It 
doesn't make sense to name this function FindFKPeriodOpersAndProcs when it isn't yet finding a proc. 
If it's a problem to rename it in part 5 perhaps the commits should be squashed by the committer? 
But I don't see the problem really.

 > +      <para>
 > +       This is used for temporal foreign key constraints.
 > +       If you omit this support function, your type cannot be used
 > +       as the <literal>PERIOD</literal> part of a foreign key.
 > +      </para>
 > in v28-0004, I think here "your type"  should change to "your opclass"?

I think "your type" addresses what the user is more likely to care about, but I added some 
clarification here.

 > +bool
 > +check_amproc_is_aggregate(Oid funcid)
 > +{
 > + bool result;
 > + HeapTuple tp;
 > + Form_pg_proc procform;
 > +
 > + tp = SearchSysCache1(PROCOID, ObjectIdGetDatum(funcid));
 > + if (!HeapTupleIsValid(tp))
 > + elog(ERROR, "cache lookup failed for function %u", funcid);
 > + procform = (Form_pg_proc) GETSTRUCT(tp);
 > + result = procform->prokind == 'a';
 > + ReleaseSysCache(tp);
 > + return result;
 > +}
 > maybe
 > `
 > change procform->prokind == 'a';
 > `
 > to
 > `
 > procform->prokind == PROKIND_AGGREGATE;
 > `
 > or we can put the whole function to cache/lsyscache.c
 > name it just as proc_is_aggregate.

Added the constant reference. Since lsyscache.c already has get_func_prokind, I changed the gist 
validation function to call that directly.

 > -  Added pg_dump support.
 > - Show the correct syntax in psql \d output for foreign keys.
 > in 28-0002, seems there is no work to correspond to these 2 items in
 > the commit message?

The changes to psql and pg_dump happen in pg_get_constraintdef_worker and 
decompile_column_index_array (both in ruleutils.c).

 > @@ -12335,7 +12448,8 @@ validateForeignKeyConstraint(char *conname,
 >    Relation rel,
 >    Relation pkrel,
 >    Oid pkindOid,
 > - Oid constraintOid)
 > + Oid constraintOid,
 > + bool temporal)
 > do you need to change the last argument of this function to "is_period"?

Changed to hasperiod.

 > + sprintf(paramname, "$%d", riinfo->nkeys);
 > + sprintf(paramname, "$%d", riinfo->nkeys);
 > do you think it worth the trouble to change to snprintf, I found
 > related post on [1].
 >
 > [1] https://stackoverflow.com/a/7316500/15603477

paramname holds 16 chars so I don't think there is any risk of an int overflowing here. The existing 
foreign key code already uses sprintf, so I don't think it makes sense to be inconsistent here. And 
if we want to change it it should probably be in a separate commit, not buried in a commit about 
adding temporal foreign keys.

On 3/17/24 21:47, jian he wrote:
 > one more minor issue related to error reporting.
 > I've only applied v28, 0001 to 0005.
 >
 > -- (parent_id, valid_at) REFERENCES [implicit]
 > -- FOREIGN KEY part should specify PERIOD
 > CREATE TABLE temporal_fk_rng2rng (
 > id int4range,
 > valid_at daterange,
 > parent_id int4range,
 > CONSTRAINT temporal_fk_rng2rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS),
 > CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, valid_at)
 > REFERENCES temporal_rng
 > );
 > ERROR:  number of referencing and referenced columns for foreign key disagree
 >
 > -- (parent_id, PERIOD valid_at) REFERENCES (id)
 > CREATE TABLE temporal_fk_rng2rng (
 > id int4range,
 > valid_at daterange,
 > parent_id int4range,
 > CONSTRAINT temporal_fk_rng2rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS),
 > CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id, PERIOD valid_at)
 > REFERENCES temporal_rng (id)
 > );
 > ERROR:  foreign key uses PERIOD on the referencing table but not the
 > referenced table
 >
 > these error messages seem somehow inconsistent with the comments above?

Clarified the comments.

 > + else
 > + {
 > + /*
 > + * Check it's a btree; currently this can never fail since no other
 > + * index AMs support unique indexes.  If we ever did have other types
 > + * of unique indexes, we'd need a way to determine which operator
 > + * strategy number is equality.  (Is it reasonable to insist that
 > + * every such index AM use btree's number for equality?)
 > + */
 > + if (amid != BTREE_AM_OID)
 > + elog(ERROR, "only b-tree indexes are supported for foreign keys");
 > + eqstrategy = BTEqualStrategyNumber;
 > + }
 >
 > the comments say never fail.
 > but it actually failed. see:
 >
 > +-- (parent_id) REFERENCES [implicit]
 > +-- This finds the PK (omitting the WITHOUT OVERLAPS element),
 > +-- but it's not a b-tree index, so it fails anyway.
 > +-- Anyway it must fail because the two sides have a different
 > definition of "unique".
 > +CREATE TABLE temporal_fk_rng2rng (
 > + id int4range,
 > + valid_at daterange,
 > + parent_id int4range,
 > + CONSTRAINT temporal_fk_rng2rng_pk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS),
 > + CONSTRAINT temporal_fk_rng2rng_fk FOREIGN KEY (parent_id)
 > + REFERENCES temporal_rng
 > +);
 > +ERROR:  only b-tree indexes are supported for foreign keys

You're right, now that we have temporal primary keys the comment is out-of-date.
You can reach that error message by creating a regular foreign key against a temporal primary key.

Perhaps we should update the comment separately, although I haven't added a new patch for that here.
I did update the comment as part of this FK patch. I also added "non-PERIOD" to the error message
(which only makes sense in the FK patch). Since the error message was impossible before, I assume 
that is no problem. I think this is a simpler fix than what you have in your attached patch. In 
addition your patch doesn't work if we include part 3 here: see Peter's feedback about the SQL 
standard and my reply.

Rebased to 846311051e.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

On 3/19/24 04:02, Peter Eisentraut wrote:
 > On 16.03.24 22:37, Paul A Jungwirth wrote:
 >> Here is a new patch series addressing the last few feedback emails
 >> from Peter & Jian He. It mostly focuses on the FKs patch, trying to
 >> get it really ready to commit,
 >
 > I have committed the test changes (range and date format etc.).
 >
 > The FOREIGN KEY patch looks okay to me now.  Maybe check if any of the subsequent comments from jian
 > should be applied.

Okay, specifics below.

 > I think we could also handle multiranges in a hardcoded way?  Ranges and multiranges are hardcoded
 > concepts anyway.  It's just when we move to arbitrary types supporting containment, then it gets a
 > bit more complicated.
 >
 > What would a patch that adds just multiranges on the FK side, but without the full pluggable gist
 > support, look like?

Attached a separate patch extending FKs to multiranges only. I'd still love to support arbitrary 
types eventually but it's not part of the patches here now.

 >> I don't see any drawbacks from supporting inferred REFERENCES with
 >> temporal tables, so my vote is to break from the standard here, and
 >> *not* apply that follow-up patch. Should I add some docs about that?
 >> Also skipping the patch will cause some annoying merge conflicts, so
 >> let me know if that's what you choose and I'll handle them right away.
 >
 > I agree we can allow this.

Great, thanks! Took out those changes.

On 3/19/24 02:01, jian he wrote:
 > + * types matching the PERIOD element. periodprocoid is a GiST support
 > function to
 > + * aggregate multiple PERIOD element values into a single value
 > + * (whose return type need not match its inputs,
 > + * e.g. many ranges can be aggregated into a multirange).
 >    * And aggedperiodoperoid is also a ContainedBy operator,
 > - * but one whose rhs is anymultirange.
 > + * but one whose rhs matches the type returned by aggedperiodoperoid.
 >    * That way foreign keys can compare fkattr <@ range_agg(pkattr).
 >    */
 >   void
 > -FindFKPeriodOpers(Oid opclass,
 > -  Oid *periodoperoid,
 > -  Oid *aggedperiodoperoid)
 > +FindFKPeriodOpersAndProcs(Oid opclass,
 > +  Oid *periodoperoid,
 > +  Oid *aggedperiodoperoid,
 > +  Oid *periodprocoid)
 >
 > I think, aggedperiodoperoid is more descriptive than periodprocoid, in
 > 0005, you don't need to rename it.
 > aslo do we need to squash v29 0001 to 0005 together?

I changed the operator names to {,agged}containedbyoperoid. The proc names are not included now 
because we only need them for supporting more than ranges + multiranges.

 > --- a/doc/src/sgml/ref/create_table.sgml
 > +++ b/doc/src/sgml/ref/create_table.sgml
 > @@ -1167,7 +1167,8 @@ WITH ( MODULUS <replaceable
 > class="parameter">numeric_literal</replaceable>, REM
 >         column(s) of some row of the referenced table.  If the <replaceable
 >         class="parameter">refcolumn</replaceable> list is omitted, the
 >         primary key of the <replaceable class="parameter">reftable</replaceable>
 > -      is used.  Otherwise, the <replaceable
 > class="parameter">refcolumn</replaceable>
 > +      is used (omitting any part declared with <literal>WITHOUT
 > OVERLAPS</literal>).
 > +      Otherwise, the <replaceable class="parameter">refcolumn</replaceable>
 >         list must refer to the columns of a non-deferrable unique or primary key
 >         constraint or be the columns of a non-partial unique index.
 >        </para>
 > I think this does not express that
 > foreign key is PERIOD, then the last column of refcolumn must specify PERIOD?

Okay, added a sentence about that (and adjusted some other things re allowing implicit REFERENCES 
and only supporting ranges + multiranges).

 > +     <para>
 > +      If the last column is marked with <literal>PERIOD</literal>,
 > +      it is treated in a special way.
 > +      While the non-<literal>PERIOD</literal> columns are compared for equality
 > +      (and there must be at least one of them),
 > +      the <literal>PERIOD</literal> column is not.
 > +      Instead the constraint is considered satisfied
 > +      if the referenced table has matching records
 > +      (based on the non-<literal>PERIOD</literal> parts of the key)
 > +      whose combined <literal>PERIOD</literal> values completely cover
 > +      the referencing record's.
 > +      In other words, the reference must have a referent for its
 > entire duration.
 > +      This column must be a column with a range type.
 > +      In addition the referenced table must have a primary key
 > +      or unique constraint declared with <literal>WITHOUT PORTION</literal>.
 > +     </para>
 > you forgot to change  <literal>WITHOUT PORTION</literal> to
 > <literal>WITHOUT OVERLAPS</literal>

Oh! Thanks, I guess I was just blind.

 > Oid pf_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (PK = FK) */
 > Oid pp_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (PK = PK) */
 > Oid ff_eq_oprs[RI_MAX_NUMKEYS]; /* equality operators (FK = FK) */
 > in struct RI_ConstraintInfo, these comments need to be updated?

In earlier feedback Peter advised not changing the "equals" language (e.g. in KeysEqual). But I 
added a comment at the top of the struct to clarify.

Rebased to 605721f819.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Вложения

Re: SQL:2011 application time

От

jian he

Дата:

20 марта, 13:55:14

hi.
minor cosmetic issues, other than that, looks good.

*pk_period = (indexStruct->indisexclusion);
to
*pk_period = indexStruct->indisexclusion;


if (with_period)
{
if (!fkconstraint->fk_with_period)
ereport(ERROR,
(errcode(ERRCODE_INVALID_FOREIGN_KEY),
errmsg("foreign key uses PERIOD on the referenced table but not the
referencing table")));
}

change to

if (with_period && !fkconstraint->fk_with_period)
ereport(ERROR,
(errcode(ERRCODE_INVALID_FOREIGN_KEY),
errmsg("foreign key uses PERIOD on the referenced table but not the
referencing table")));

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

20 марта, 19:21:44

On 3/20/24 03:55, jian he wrote:
> hi.
> minor cosmetic issues, other than that, looks good.
> 
> *pk_period = (indexStruct->indisexclusion);
> to
> *pk_period = indexStruct->indisexclusion;
> 
> ... >
> if (with_period && !fkconstraint->fk_with_period)
> ereport(ERROR,
> (errcode(ERRCODE_INVALID_FOREIGN_KEY),
> errmsg("foreign key uses PERIOD on the referenced table but not the
> referencing table")));

Both included in the new patches here.

Rebased to a0390f6ca6.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Вложения

Re: SQL:2011 application time

От

jian he

Дата:

21 марта, 11:25:15

with foreign key "no action",
in a transaction, we can first insert foreign key data, then primary key data.
also the update/delete can fail at the end of transaction.

based on [1] explanation about the difference between "no action" and
"restrict".
I only refactor the v31-0002-Support-multiranges-in-temporal-FKs.patch test.


[1 https://stackoverflow.com/questions/14921668/difference-between-restrict-and-no-action

Вложения

v31-0001-refactor-temporal-FKs-with-multiranges-regressio.no-cfbot

Re: SQL:2011 application time

От

Peter Eisentraut

Дата:

21 марта, 17:57:20

On 20.03.24 17:21, Paul Jungwirth wrote:
> On 3/20/24 03:55, jian he wrote:
>> hi.
>> minor cosmetic issues, other than that, looks good.
>>
>> *pk_period = (indexStruct->indisexclusion);
>> to
>> *pk_period = indexStruct->indisexclusion;
>>
>> ... >
>> if (with_period && !fkconstraint->fk_with_period)
>> ereport(ERROR,
>> (errcode(ERRCODE_INVALID_FOREIGN_KEY),
>> errmsg("foreign key uses PERIOD on the referenced table but not the
>> referencing table")));
> 
> Both included in the new patches here.
> 
> Rebased to a0390f6ca6.

Two more questions:

1. In ri_triggers.c ri_KeysEqual, you swap the order of arguments to 
ri_AttributesEqual():

-           if (!ri_AttributesEqual(riinfo->ff_eq_oprs[i], 
RIAttType(rel, attnums[i]),
-                                   oldvalue, newvalue))
+           if (!ri_AttributesEqual(eq_opr, RIAttType(rel, attnums[i]),
+                                   newvalue, oldvalue))

But the declared arguments of ri_AttributesEqual() are oldvalue and 
newvalue, so passing them backwards is really confusing.  And the change 
does matter in the tests.

Can we organize this better?

2. There are some tests that error with

ERROR:  only b-tree indexes are supported for non-PERIOD foreign keys

But this is an elog() error, so should not normally be visible.  I 
suspect some other error should really show here, and the order of 
checks is a bit wrong or something?

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

22 марта, 03:35:53

v32 attached.

On 3/21/24 07:57, Peter Eisentraut wrote:
 > Two more questions:
 >
 > 1. In ri_triggers.c ri_KeysEqual, you swap the order of arguments to ri_AttributesEqual():
 >
 > -           if (!ri_AttributesEqual(riinfo->ff_eq_oprs[i], RIAttType(rel, attnums[i]),
 > -                                   oldvalue, newvalue))
 > +           if (!ri_AttributesEqual(eq_opr, RIAttType(rel, attnums[i]),
 > +                                   newvalue, oldvalue))
 >
 > But the declared arguments of ri_AttributesEqual() are oldvalue and newvalue, so passing them
 > backwards is really confusing.  And the change does matter in the tests.
 >
 > Can we organize this better?

I renamed the params and actually the whole function. All it's doing is execute `oldvalue op 
newvalue`, casting if necessary. So I changed it to ri_CompareWithCast and added some documentation. 
In an earlier version of this patch I had a separate function for the PERIOD comparison, but it's 
just doing the same thing, so I think the best thing is to give the function a more accurate name 
and use it.

 > 2. There are some tests that error with
 >
 > ERROR:  only b-tree indexes are supported for non-PERIOD foreign keys
 >
 > But this is an elog() error, so should not normally be visible.  I suspect some other error should
 > really show here, and the order of checks is a bit wrong or something?

At first I thought I should just make this ereport, because it is reachable now, but I didn't like 
the error message or where we were reaching it. The high-level problem is defining a non-temporal FK
against a temporal PK, and we should check for that in those terms, not when looking at individual 
attribute opclasses. So I added a check prior to this and gave it a more descriptive error message.

On 3/21/24 01:25, jian he wrote:
 > with foreign key "no action",
 > in a transaction, we can first insert foreign key data, then primary key data.
 > also the update/delete can fail at the end of transaction.
 >
 > based on [1] explanation about the difference between "no action" and
 > "restrict".
 > I only refactor the v31-0002-Support-multiranges-in-temporal-FKs.patch test.

I added some tests for deferred NO ACTION checks. I added them for all of range/multirange/PERIOD. I 
also adopted your change ALTERing the constraint for NO ACTION (even though it's already that), to
make each test section more independent. Your patch had a lot of other noisy changes, e.g. 
whitespace and reordering lines. If there are other things you intended to add to the tests, can you 
describe them?

Rebased to 7e65ad197f.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Вложения

Re: SQL:2011 application time

От

jian he

Дата:

22 марта, 08:33:37

On Fri, Mar 22, 2024 at 8:35 AM Paul Jungwirth
<pj@illuminatedcomputing.com> wrote:
>
> Your patch had a lot of other noisy changes, e.g.
> whitespace and reordering lines. If there are other things you intended to add to the tests, can you
> describe them?

i think on update restrict, on delete restrict cannot be deferred,
even if you set it DEFERRABLE INITIALLY DEFERRED.
based on this idea, I made minor change on
v32-0002-Support-multiranges-in-temporal-FKs.patch

other than that, v32, 0002 looks good.

Вложения

v32-0001-minor-refactor-temporal-FKs-with-multiranges-.no-cfbot

Re: SQL:2011 application time

От

Peter Eisentraut

Дата:

22 марта, 18:49:29

On 22.03.24 01:35, Paul Jungwirth wrote:
>  > 1. In ri_triggers.c ri_KeysEqual, you swap the order of arguments to 
> ri_AttributesEqual():
>  >
>  > -           if (!ri_AttributesEqual(riinfo->ff_eq_oprs[i], 
> RIAttType(rel, attnums[i]),
>  > -                                   oldvalue, newvalue))
>  > +           if (!ri_AttributesEqual(eq_opr, RIAttType(rel, attnums[i]),
>  > +                                   newvalue, oldvalue))
>  >
>  > But the declared arguments of ri_AttributesEqual() are oldvalue and 
> newvalue, so passing them
>  > backwards is really confusing.  And the change does matter in the tests.
>  >
>  > Can we organize this better?
> 
> I renamed the params and actually the whole function. All it's doing is 
> execute `oldvalue op newvalue`, casting if necessary. So I changed it to 
> ri_CompareWithCast and added some documentation. In an earlier version 
> of this patch I had a separate function for the PERIOD comparison, but 
> it's just doing the same thing, so I think the best thing is to give the 
> function a more accurate name and use it.

Ok, I see now, and the new explanation is better.

But after reading the comment in the function about collations, I think 
there could be trouble.  As long as we are only comparing for equality 
(and we don't support nondeterministic global collations), then we can 
use any collation to compare for equality.  But if we are doing 
contained-by, then the collation does matter, so we would need to get 
the actual collation somehow.  So as written, this might not always work 
correctly.

I think it would be safer for now if we just kept using the equality 
operation even for temporal foreign keys.  If we did that, then in the 
case that you update a key to a new value that is contained by the old 
value, this function would say "not equal" and fire all the checks, even 
though it wouldn't need to.  This is kind of similar to the "false 
negatives" that the comment already talks about.

What do you think?

Re: SQL:2011 application time

От

jian he

Дата:

23 марта, 06:02:28

On Fri, Mar 22, 2024 at 11:49 PM Peter Eisentraut <peter@eisentraut.org> wrote:
>
> On 22.03.24 01:35, Paul Jungwirth wrote:
> >  > 1. In ri_triggers.c ri_KeysEqual, you swap the order of arguments to
> > ri_AttributesEqual():
> >  >
> >  > -           if (!ri_AttributesEqual(riinfo->ff_eq_oprs[i],
> > RIAttType(rel, attnums[i]),
> >  > -                                   oldvalue, newvalue))
> >  > +           if (!ri_AttributesEqual(eq_opr, RIAttType(rel, attnums[i]),
> >  > +                                   newvalue, oldvalue))
> >  >
> >  > But the declared arguments of ri_AttributesEqual() are oldvalue and
> > newvalue, so passing them
> >  > backwards is really confusing.  And the change does matter in the tests.
> >  >
> >  > Can we organize this better?
> >
> > I renamed the params and actually the whole function. All it's doing is
> > execute `oldvalue op newvalue`, casting if necessary. So I changed it to
> > ri_CompareWithCast and added some documentation. In an earlier version
> > of this patch I had a separate function for the PERIOD comparison, but
> > it's just doing the same thing, so I think the best thing is to give the
> > function a more accurate name and use it.
>
> Ok, I see now, and the new explanation is better.
>
> But after reading the comment in the function about collations, I think
> there could be trouble.  As long as we are only comparing for equality
> (and we don't support nondeterministic global collations), then we can
> use any collation to compare for equality.  But if we are doing
> contained-by, then the collation does matter, so we would need to get
> the actual collation somehow.  So as written, this might not always work
> correctly.
>
> I think it would be safer for now if we just kept using the equality
> operation even for temporal foreign keys.  If we did that, then in the
> case that you update a key to a new value that is contained by the old
> value, this function would say "not equal" and fire all the checks, even
> though it wouldn't need to.  This is kind of similar to the "false
> negatives" that the comment already talks about.
>
> What do you think?
>

we don't need to worry about primary key and foreign key with
different collation.
because it will be error out as incompatible data type,
foreign key constraint will not be created.

if there are the same collation, when we build the query string, we
don't need to worry about collation.
because at runtime, the operator associated oprcode
will fetch collation information later.

main operator and the main oprcode related to this patch(0001, 0002) are:
range_contained_by_multirange
range_eq
range_overlaps
range_contained_by
the first 3 functions will fetch collation information within range_cmp_bounds.
range_contained_by will fetch collation information in
range_contains_elem_internal.

demo:
CREATE COLLATION case_insensitive (provider = icu, locale =
'und-u-ks-level2', deterministic = false);
DROP TABLE IF exists temporal_fk_rng2rng;
DROP TABLE IF exists temporal_rng;
DROP TYPE textrange_case_insensitive;
create type textrange_case_insensitive as range(subtype=text,
collation=case_insensitive);
CREATE TABLE temporal_rng (id int4range, valid_at textrange_case_insensitive);
ALTER TABLE temporal_rng
ADD CONSTRAINT temporal_rng_pk
PRIMARY KEY (id, valid_at WITHOUT OVERLAPS);
CREATE TABLE temporal_fk_rng2rng (
id int4range,
valid_at  textrange_case_insensitive,
parent_id int4range,
CONSTRAINT temporal_fk_rng2rng_fk2 FOREIGN KEY (parent_id, PERIOD valid_at)
REFERENCES temporal_rng (id, PERIOD valid_at)
);
INSERT INTO temporal_rng (id, valid_at) VALUES ('[1,2)',
textrange_case_insensitive('c', 'h','[]'));

--fail
INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id)
VALUES ('[1,2)', textrange_case_insensitive('B', 'B','[]'), '[1,2)');

--fail.
INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id)
VALUES ('[1,2)', textrange_case_insensitive('a', 'F','[]'), '[1,2)');

--fail.
INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id)
VALUES ('[1,2)', textrange_case_insensitive('e', 'Z','[]'), '[1,2)');

--ok
INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id)
VALUES ('[1,2)', textrange_case_insensitive('d', 'F','[]'), '[1,2)');

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

23 марта, 20:42:47

v33 attached with minor changes.

On 3/22/24 20:02, jian he wrote:
 > On Fri, Mar 22, 2024 at 11:49 PM Peter Eisentraut <peter@eisentraut.org> wrote:
 >> But after reading the comment in the function about collations, I think
 >> there could be trouble.  As long as we are only comparing for equality
 >> (and we don't support nondeterministic global collations), then we can
 >> use any collation to compare for equality.  But if we are doing
 >> contained-by, then the collation does matter, so we would need to get
 >> the actual collation somehow.  So as written, this might not always work
 >> correctly.
 >>
 >> I think it would be safer for now if we just kept using the equality
 >> operation even for temporal foreign keys.  If we did that, then in the
 >> case that you update a key to a new value that is contained by the old
 >> value, this function would say "not equal" and fire all the checks, even
 >> though it wouldn't need to.  This is kind of similar to the "false
 >> negatives" that the comment already talks about.
 >>
 >> What do you think?
 >>
 >
 > we don't need to worry about primary key and foreign key with
 > different collation.
 > because it will be error out as incompatible data type,
 > foreign key constraint will not be created.

I agree with jian he here. Here is my own investigation:

Rangetypes themselves are never collatable (see DefineRange in commands/typecmds.c).
But rangetypes do store a collation for their base type. So you can say:

paul=# create type textrange as range (subtype = text, collation = "C");
CREATE TYPE

That is stored in pg_range.rngcollation, but pg_type.typcollation is always zero.

So putting a collection on a rangetype column is an error:

paul=# create table t (r1 textrange collate "en-US-x-icu");
ERROR:  collations are not supported by type textrange

And so is using an ad hoc collation with an operator:

paul=# select '[J,J]'::textrange <@ '[a,z]'::textrange collate "en-US-x-icu";
ERROR:  collations are not supported by type textrange
LINE 1: select '[J,J]'::textrange <@ '[a,z]'::textrange collate "en-...

Almost everything ranges do is built on range_cmp_bounds, which uses the base type's collation. 
There is no way to use a different one.
So when ri_CompareWithCast calls `lhs <@ rhs`, it is using the collation for that range's base type.
Indexes will use the same collation.

You also can't mix different range types.
Our textrange puts (English) lowercase after uppercase:

paul=# select '[j,j]'::textrange <@ '[a,z]'::textrange;
  ?column?
----------
  t
(1 row)

paul=# select '[J,J]'::textrange <@ '[a,z]'::textrange;
  ?column?
----------
  f
(1 row)

We could create a rangetype that intermingles uppercase & lower:

paul=# create type itextrange as range (subtype = text, collation = "en-US-x-icu");
CREATE TYPE
paul=# select '[J,J]'::itextrange <@ '[a,z]'::itextrange;
  ?column?
----------
  t
(1 row)

But you can't mix them:

paul=# select '[J,J]'::itextrange <@ '[a,z]'::textrange;
ERROR:  operator does not exist: itextrange <@ textrange
LINE 1: select '[J,J]'::itextrange <@ '[a,z]'::textrange;
                                    ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

Even if I create casts, mixing still fails:

paul=# create cast (textrange as itextrange) without function as implicit;
CREATE CAST
paul=# create cast (itextrange as textrange) without function as implicit;
CREATE CAST
paul=# select '[J,J]'::itextrange <@ '[a,z]'::textrange;
ERROR:  operator does not exist: itextrange <@ textrange
LINE 1: select '[J,J]'::itextrange <@ '[a,z]'::textrange;
                                    ^
HINT:  No operator matches the given name and argument types. You might need to add explicit type casts.

That's because the operator parameters are anyrange, and in can_coerce_type we call 
check_generic_type_consistency which doesn't use casts.
It just asks if all the concrete range types are the same (as with other polymorphic types).

Adding a foreign key runs the same check:

paul=# create table pk (id int4range, valid_at textrange, constraint pkpk primary key (id, valid_at 
without overlaps));
CREATE TABLE
paul=# create table fk (id int4range, valid_at itextrange, parent_id int4range);
CREATE TABLE
paul=# alter table fk add constraint fkfk foreign key (parent_id, period valid_at) references pk;
ERROR:  foreign key constraint "fkfk" cannot be implemented
DETAIL:  Key columns "valid_at" and "valid_at" are of incompatible types: itextrange and textrange.

I guess the user could define their own `textrange <@ itextrange` operator, using the lhs collation.
We would choose that operator for pfeqop but not ppeqop or ffeqop.
And we use ffeqop here, which would allow us to skip a check that pfeqop would fail.
Is that an issue? It feels like the user is doing their best to get nonsense results at that point,
and it's not really about the collation per se.

Incidentally here is another separate issue with foreign keys and collations I noticed this morning: 
https://www.postgresql.org/message-id/78d824e0-b21e-480d-a252-e4b84bc2c24b%40illuminatedcomputing.com
That comes from nondeterministic collations, which feel like a troublesome thing here.
Probably foreign keys just weren't fully re-thought when we added them.

But we avoid the issue from 59a85cb4 (discussion at 
https://www.postgresql.org/message-id/flat/3326fc2e-bc02-d4c5-e3e5-e54da466e89a@2ndquadrant.com) 
about cascading changes when a PK experiences a not-binary-identical change that the collation 
considers equal. These days we only call ri_CompareWithCast for changes on the FK side.

Now this is a long chain of reasoning to say rangetypes are safe. I added a comment. Note it doesn't 
apply to arbitrary types, so if we support those eventually we should just require a recheck always, 
or alternately use equals, not containedby. (That would require storing equals somewhere. It could 
go in ffeqop, but that feels like a footgun since pfeqop and ppeqop need overlaps.)

On 3/21/24 22:33, jian he wrote:
 > i think on update restrict, on delete restrict cannot be deferred,
 > even if you set it DEFERRABLE INITIALLY DEFERRED.
 > based on this idea, I made minor change on
 > v32-0002-Support-multiranges-in-temporal-FKs.patch

Okay, added those tests too. Thanks!

Rebased to 697f8d266c.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

On 3/24/24 00:38, Peter Eisentraut wrote:> I have committed the patches
> v33-0001-Add-temporal-FOREIGN-KEYs.patch and v33-0002-Support-multiranges-in-temporal-FKs.patch 
> (together).

Hi Hackers,

I found some problems with temporal primary keys and the idea of uniqueness, especially around the 
indisunique column. Here are some small fixes and a proposal for a larger fix, which I think we need 
but I'd like some feedback on.

The first patch fixes problems with ON CONFLICT DO NOTHING/UPDATE.

DO NOTHING fails because it doesn't expect a non-btree unique index. It's fine to make it accept a 
temporal PRIMARY KEY/UNIQUE index though (i.e. an index with both indisunique and indisexclusion).
This is no different than an exclusion constraint. So I skip BuildSpeculativeIndexInfo for WITHOUT 
OVERLAPS indexes. (Incidentally, AFAICT ii_UniqueOps is never used, only ii_UniqueProcs. Right?)

We should still forbid temporally-unique indexes for ON CONFLICT DO UPDATE, since there may be more 
than one row that conflicts. Ideally we would find and update all the conflicting rows, but we don't 
now, and we don't want to update just one:

     postgres=# create table t (id int4range, valid_at daterange, name text, constraint tpk primary 
key (id, valid_at without overlaps));
     CREATE TABLE
     postgres=# insert into t values ('[1,2)', '[2000-01-01,2001-01-01)', 'a'), ('[1,2)', 
'[2001-01-01,2002-01-01)', 'b');
     INSERT 0 2
     postgres=# insert into t values ('[1,2)', '[2000-01-01,2002-01-01)', 'c') on conflict (id, 
valid_at) do update set name = excluded.name;
     INSERT 0 1
     postgres=# select * from t;
       id   |        valid_at         | name
     -------+-------------------------+------
      [1,2) | [2001-01-01,2002-01-01) | b
      [1,2) | [2000-01-01,2001-01-01) | c
     (2 rows)

So I also added code to prevent that. This is just preserving the old behavior for exclusion 
constraints, which was bypassed because of indisunique. All this is in the first patch.

That got me thinking about indisunique and where else it could cause problems. Perhaps there are 
other places that assume only b-trees are unique. I couldn't find anywhere that just gives an error 
like ON CONFLICT, but I can imagine more subtle problems.

A temporal PRIMARY KEY or UNIQUE constraint is unique in at least three ways: It is *metaphorically* 
unique: the conceit is that the scalar part is unique at every moment in time. You may have id 5 in 
your table more than once, as long as the records' application times don't overlap.

And it is *officially* unique: the standard calls these constraints unique. I think it is correct 
for us to report them as unique in pg_index.

But is it *literally* unique? Well two identical keys, e.g. (5, '[Jan24,Mar24)') and (5, 
'[Jan24,Mar24)'), do have overlapping ranges, so the second is excluded. Normally a temporal unique 
index is *more* restrictive than a standard one, since it forbids other values too (e.g. (5, 
'[Jan24,Feb24)')). But sadly there is one exception: the ranges in these keys do not overlap: (5, 
'empty'), (5, 'empty'). With ranges/multiranges, `'empty' && x` is false for all x. You can add that 
key as many times as you like, despite a PK/UQ constraint:

     postgres=# insert into t values
     ('[1,2)', 'empty', 'foo'),
     ('[1,2)', 'empty', 'bar');
     INSERT 0 2
     postgres=# select * from t;
       id   | valid_at | name
     -------+----------+------
      [1,2) | empty    | foo
      [1,2) | empty    | bar
     (2 rows)

Cases like this shouldn't actually happen for temporal tables, since empty is not a meaningful 
value. An UPDATE/DELETE FOR PORTION OF would never cause an empty. But we should still make sure 
they don't cause problems.

One place we should avoid temporally-unique indexes is REPLICA IDENTITY. Fortunately we already do 
that, but patch 2 adds a test to keep it that way.

Uniqueness is an important property to the planner, too.

We consider indisunique often for estimates, where it needn't be 100% true. Even if there are 
nullable columns or a non-indimmediate index, it still gives useful stats. Duplicates from 'empty' 
shouldn't cause any new problems there.

In proof code we must be more careful. Patch 3 updates relation_has_unique_index_ext and 
rel_supports_distinctness to disqualify WITHOUT OVERLAPS indexes. Maybe that's more cautious than 
needed, but better safe than sorry. This patch has no new test though. I had trouble writing SQL 
that was wrong before its change. I'd be happy for help here!

Another problem is GROUP BY and functional dependencies. This is wrong:

     postgres=# create table a (id int4range, valid_at daterange, name text, constraint apk primary 
key (id, valid_at without overlaps));
     CREATE TABLE
     postgres=# insert into a values ('[1,2)', 'empty', 'foo'), ('[1,2)', 'empty', 'bar');
     INSERT 0 2
     postgres=# select * from a group by id, valid_at;
       id   | valid_at | name
     -------+----------+------
      [1,2) | empty    | foo
     (1 row)

One fix is to return false from check_functional_grouping for WITHOUT OVERLAPS primary keys. But I 
think there is a better fix that is less ad hoc.

We should give temporal primary keys an internal CHECK constraint saying `NOT isempty(valid_at)`. 
The problem is analogous to NULLs in parts of a primary key. NULLs prevent two identical keys from 
ever comparing as equal. And just as a regular primary key cannot contain NULLs, so a temporal 
primary key should not contain empties.

The standard effectively prevents this with PERIODs, because a PERIOD adds a constraint saying start 
< end. But our ranges enforce only start <= end. If you say `int4range(4,4)` you get `empty`. If we 
constrain primary keys as I'm suggesting, then they are literally unique, and indisunique seems safer.

Should we add the same CHECK constraint to temporal UNIQUE indexes? I'm inclined toward no, just as 
we don't forbid NULLs in parts of a UNIQUE key. We should try to pick what gives users more options, 
when possible. Even if it is questionably meaningful, I can see use cases for allowing empty ranges 
in a temporal table. For example it lets you "disable" a row, preserving its values but marking it 
as never true.

Also it gives you a way to make a non-temporal foreign key reference to a temporal table. Normally 
temporal tables are "contagious", which is annoying. But if the referencing table had 'empty' for 
its temporal part, then references should succeed. For example this is true: 'empty'::daterange <@ 
'[2000-01-01,2001-01-01)'. (Technically this would require a small change to our FK SQL, because we 
do `pkperiod && fkperiod` as an optimization (to use the index more fully), and we would need to 
skip that when fkperiod is empty.)

Finally, if we have a not-empty constraint on our primary keys, then the GROUP BY problem above goes 
away. And we can still use temporal primary keys in proofs (but maybe not other temporally-unique 
indexes). We can allow them in relation_has_unique_index_ext/rel_supports_distinctness.

The drawback to putting a CHECK constraint on just PKs and not UNIQUEs is that indisunique may not 
be literally unique for them, if they have empty ranges. But even for traditional UNIQUE 
constraints, indisunique can be misleading: If they have nullable parts, identical keys are still 
"unique", so the code is already careful about them. Do note though the problems come from 'empty' 
values, not nullable values, so there might still be some planner rules we need to correct.

Another drawback is that by using isempty we're limiting temporal PKs to just ranges and 
multiranges, whereas currently any type with appropriate operators is allowed. But since we decided 
to limit FKs already, I think this is okay. We can open it back up again later if we like (e.g. by 
adding a support function for the isempty concept).

I'll start working on a patch for this too, but I'd be happy for early feedback/objections/etc.

I guess an alternative would be to add a new operator, say &&&, that is the same as overlaps, except
'empty' overlaps everything instead of nothing. In a way that seems more consistent with <@. (How 
can a range contain something if it doesn't overlap it?) I don't love that a key like (5, 'empty') 
would conflict with every other 5, but you as I said it's not a meaningful value in a temporal table 
anyway. Or you could have 'empty' overlap nothing except itself. Maybe I prefer this solution to an 
internal CHECK constraint, but it feels like it has more unknown unknowns. Thoughts?

Also I suspect there are still places where indisunique causes problems. I'll keep looking for them, 
but if others have thoughts please let me know.

Patches here are generated against c627d944e6.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

On 4/30/24 09:24, Robert Haas wrote:
> Peter, could you have a look at
> http://postgr.es/m/47550967-260b-4180-9791-b224859fe63e@illuminatedcomputing.com
> and express an opinion about whether each of those proposals are (a)
> good or bad ideas and (b) whether they need to be fixed for the
> current release?

Here are the same patches but rebased. I've added a fourth which is my progress on adding the CHECK 
constraint. I don't really consider it finished though, because it has these problems:

- The CHECK constraint should be marked as an internal dependency of the PK, so that you can't drop 
it, and it gets dropped when you drop the PK. I don't see a good way to tie the two together though, 
so I'd appreciate any advice there. They are separate AlterTableCmds, so how do I get the 
ObjectAddress of both constraints at the same time? I wanted to store the PK's ObjectAddress on the 
Constraint node, but since ObjectAddress isn't a Node it doesn't work.

- The CHECK constraint should maybe be hidden when you say `\d foo`? Or maybe not, but that's what 
we do with FK triggers.

- When you create partitions you get a warning about the constraint already existing, because it 
gets created via the PK and then also the partitioning code tries to copy it. Solving the first 
issue here should solve this nicely though.

Alternately we could just fix the GROUP BY functional dependency code to only accept b-tree indexes. 
But I think the CHECK constraint approach is a better solution.

Thanks,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Вложения

Re: SQL:2011 application time

От

jian he

Дата:

01 мая, 14:27:56

On Wed, May 1, 2024 at 12:39 AM Paul Jungwirth
<pj@illuminatedcomputing.com> wrote:
>
> On 4/30/24 09:24, Robert Haas wrote:
> > Peter, could you have a look at
> > http://postgr.es/m/47550967-260b-4180-9791-b224859fe63e@illuminatedcomputing.com
> > and express an opinion about whether each of those proposals are (a)
> > good or bad ideas and (b) whether they need to be fixed for the
> > current release?
>
> Here are the same patches but rebased. I've added a fourth which is my progress on adding the CHECK
> constraint. I don't really consider it finished though, because it has these problems:
>
> - The CHECK constraint should be marked as an internal dependency of the PK, so that you can't drop
> it, and it gets dropped when you drop the PK. I don't see a good way to tie the two together though,
> so I'd appreciate any advice there. They are separate AlterTableCmds, so how do I get the
> ObjectAddress of both constraints at the same time? I wanted to store the PK's ObjectAddress on the
> Constraint node, but since ObjectAddress isn't a Node it doesn't work.
>
> - The CHECK constraint should maybe be hidden when you say `\d foo`? Or maybe not, but that's what
> we do with FK triggers.
>
> - When you create partitions you get a warning about the constraint already existing, because it
> gets created via the PK and then also the partitioning code tries to copy it. Solving the first
> issue here should solve this nicely though.
>
> Alternately we could just fix the GROUP BY functional dependency code to only accept b-tree indexes.
> But I think the CHECK constraint approach is a better solution.
>

I will consider these issues later.
The following are general ideas after applying your patches.

CREATE TABLE temporal_rng1(
id int4range,
valid_at daterange,
CONSTRAINT temporal_rng1_pk unique (id, valid_at WITHOUT OVERLAPS)
);
insert into temporal_rng1(id, valid_at) values (int4range '[1,1]',
'empty'::daterange), ('[1,1]', 'empty');
table temporal_rng1;
  id   | valid_at
-------+----------
 [1,2) | empty
 [1,2) | empty
(2 rows)

i hope i didn't miss something:
exclude the 'empty' special value, WITHOUT OVERLAP constraint will be
unique and is more restrictive?

if so,
then adding a check constraint to make the WITHOUT OVERLAP not include
the special value 'empty'
is better than
writing a doc explaining that on some special occasion, a unique
constraint is not meant to be unique
?

in here
https://www.postgresql.org/docs/devel/ddl-constraints.html#DDL-CONSTRAINTS-UNIQUE-CONSTRAINTS
says:
<<
Unique constraints ensure that the data contained in a column, or a
group of columns, is unique among all the rows in the table.
<<

+ /*
+ * The WITHOUT OVERLAPS part (if any) must be
+ * a range or multirange type.
+ */
+ if (constraint->without_overlaps && lc == list_last_cell(constraint->keys))
+ {
+ Oid typid = InvalidOid;
+
+ if (!found && cxt->isalter)
+ {
+ /*
+ * Look up the column type on existing table.
+ * If we can't find it, let things fail in DefineIndex.
+ */
+ Relation rel = cxt->rel;
+ for (int i = 0; i < rel->rd_att->natts; i++)
+ {
+ Form_pg_attribute attr = TupleDescAttr(rel->rd_att, i);
+ const char *attname;
+
+ if (attr->attisdropped)
+ break;
+
+ attname = NameStr(attr->attname);
+ if (strcmp(attname, key) == 0)
+ {
+ typid = attr->atttypid;
+ break;
+ }
+ }
+ }
+ else
+ typid = typenameTypeId(NULL, column->typeName);
+
+ if (OidIsValid(typid) && !type_is_range(typid) && !type_is_multirange(typid))
+ ereport(ERROR,
+ (errcode(ERRCODE_DATATYPE_MISMATCH),
+ errmsg("column \"%s\" in WITHOUT OVERLAPS is not a range or
multirange type", key),
+ parser_errposition(cxt->pstate, constraint->location)));
+ }

+ if (attr->attisdropped)
+ break;
it will break the loop?
but here you want to continue the loop?

+ if (OidIsValid(typid) && !type_is_range(typid) && !type_is_multirange(typid))
didn't consider the case where typid is InvalidOid,
maybe we can simplify to
+ if (!type_is_range(typid) && !type_is_multirange(typid))


+ notnullcmds = lappend(notnullcmds, notemptycmd);
seems weird.
we can imitate notnullcmds related logic for notemptycmd,
not associated notnullcmds in any way.

Re: SQL:2011 application time

От

jian he

Дата:

06 мая, 06:01:30

On Wed, May 1, 2024 at 12:39 AM Paul Jungwirth
<pj@illuminatedcomputing.com> wrote:
>
> On 4/30/24 09:24, Robert Haas wrote:
> > Peter, could you have a look at
> > http://postgr.es/m/47550967-260b-4180-9791-b224859fe63e@illuminatedcomputing.com
> > and express an opinion about whether each of those proposals are (a)
> > good or bad ideas and (b) whether they need to be fixed for the
> > current release?
>
> Here are the same patches but rebased. I've added a fourth which is my progress on adding the CHECK
> constraint. I don't really consider it finished though, because it has these problems:
>
> - The CHECK constraint should be marked as an internal dependency of the PK, so that you can't drop
> it, and it gets dropped when you drop the PK. I don't see a good way to tie the two together though,
> so I'd appreciate any advice there. They are separate AlterTableCmds, so how do I get the
> ObjectAddress of both constraints at the same time? I wanted to store the PK's ObjectAddress on the
> Constraint node, but since ObjectAddress isn't a Node it doesn't work.
>

hi.
I hope I understand the problem correctly.
my understanding is that we are trying to solve a corner case:
create table t(a int4range, b int4range, primary key(a, b WITHOUT OVERLAPS));
insert into t values ('[1,2]','empty'), ('[1,2]','empty');

I think the entry point is ATAddCheckNNConstraint and index_create.
in a chain of DDL commands, you cannot be sure which one
(primary key constraint or check constraint) is being created first,
you just want to make sure that after both constraints are created,
then add a dependency between primary key and check constraint.

so you need to validate at different functions
(ATAddCheckNNConstraint, index_create)
that these two constraints are indeed created,
only after that we have a dependency linking these two constraints.

I've attached a patch trying to solve this problem.
the patch is not totally polished, but works as expected, and also has
lots of comments.

Вложения

v4-0001-add-a-special-check-constrint-for-PERIOD-primary-.patch

Re: SQL:2011 application time

От

Peter Eisentraut

Дата:

08 мая, 16:51:28

On 30.04.24 18:39, Paul Jungwirth wrote:
> On 4/30/24 09:24, Robert Haas wrote:
>> Peter, could you have a look at
>> http://postgr.es/m/47550967-260b-4180-9791-b224859fe63e@illuminatedcomputing.com
>> and express an opinion about whether each of those proposals are (a)
>> good or bad ideas and (b) whether they need to be fixed for the
>> current release?
> 
> Here are the same patches but rebased.

I have committed v2-0002-Add-test-for-REPLICA-IDENTITY-with-a-temporal-key.patch.

About v2-0001-Fix-ON-CONFLICT-DO-NOTHING-UPDATE-for-temporal-in.patch, I think the
ideas are right, but I wonder if we can fine-tune the new conditionals a bit.

--- a/src/backend/executor/execIndexing.c
+++ b/src/backend/executor/execIndexing.c
@@ -210,7 +210,7 @@ ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative)
                  * If the indexes are to be used for speculative insertion, add extra
                  * information required by unique index entries.
                  */
-               if (speculative && ii->ii_Unique)
+               if (speculative && ii->ii_Unique && !ii->ii_HasWithoutOverlaps)
                         BuildSpeculativeIndexInfo(indexDesc, ii);

Here, I think we could check !indexDesc->rd_index->indisexclusion instead.  So we
wouldn't need ii_HasWithoutOverlaps.

Or we could push this into BuildSpeculativeIndexInfo(); it could just skip the rest
if an exclusion constraint is passed, on the theory that all the speculative index
info is already present in that case.

--- a/src/backend/optimizer/util/plancat.c
+++ b/src/backend/optimizer/util/plancat.c
@@ -815,7 +815,7 @@ infer_arbiter_indexes(PlannerInfo *root)
          */
         if (indexOidFromConstraint == idxForm->indexrelid)
         {
-           if (!idxForm->indisunique && onconflict->action == ONCONFLICT_UPDATE)
+           if ((!idxForm->indisunique || idxForm->indisexclusion) && onconflict->action == ONCONFLICT_UPDATE)
                 ereport(ERROR,
                         (errcode(ERRCODE_WRONG_OBJECT_TYPE),
                          errmsg("ON CONFLICT DO UPDATE not supported with exclusion constraints")));

Shouldn't this use only idxForm->indisexclusion anyway?  Like

+           if (idxForm->indisexclusion && onconflict->action == ONCONFLICT_UPDATE)

That matches what the error message is reporting afterwards.

          * constraints), so index under consideration can be immediately
          * skipped if it's not unique
          */
-       if (!idxForm->indisunique)
+       if (!idxForm->indisunique || idxForm->indisexclusion)
             goto next;

Maybe here we need a comment.  Or make that a separate statement, like

         /* not supported yet etc. */
         if (idxForm->indixexclusion)
             next;

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

09 мая, 07:24:09

Here are a couple new patches, rebased to e305f715, addressing Peter's feedback. I'm still working 
on integrating jian he's suggestions for the last patch, so I've omitted that one here.

On 5/8/24 06:51, Peter Eisentraut wrote:
> About v2-0001-Fix-ON-CONFLICT-DO-NOTHING-UPDATE-for-temporal-in.patch, I think the
> ideas are right, but I wonder if we can fine-tune the new conditionals a bit.
> 
> --- a/src/backend/executor/execIndexing.c
> +++ b/src/backend/executor/execIndexing.c
> @@ -210,7 +210,7 @@ ExecOpenIndices(ResultRelInfo *resultRelInfo, bool speculative)
>                   * If the indexes are to be used for speculative insertion, add extra
>                   * information required by unique index entries.
>                   */
> -               if (speculative && ii->ii_Unique)
> +               if (speculative && ii->ii_Unique && !ii->ii_HasWithoutOverlaps)
>                          BuildSpeculativeIndexInfo(indexDesc, ii);
> 
> Here, I think we could check !indexDesc->rd_index->indisexclusion instead.  So we
> wouldn't need ii_HasWithoutOverlaps.

Okay.

> Or we could push this into BuildSpeculativeIndexInfo(); it could just skip the rest
> if an exclusion constraint is passed, on the theory that all the speculative index
> info is already present in that case.

I like how BuildSpeculativeIndexInfo starts with an Assert that it's given a unique index, so I've 
left the check outside the function. This seems cleaner anyway: the function stays more focused.

> --- a/src/backend/optimizer/util/plancat.c
> +++ b/src/backend/optimizer/util/plancat.c
> @@ -815,7 +815,7 @@ infer_arbiter_indexes(PlannerInfo *root)
>           */
>          if (indexOidFromConstraint == idxForm->indexrelid)
>          {
> -           if (!idxForm->indisunique && onconflict->action == ONCONFLICT_UPDATE)
> +           if ((!idxForm->indisunique || idxForm->indisexclusion) && onconflict->action == 
> ONCONFLICT_UPDATE)
>                  ereport(ERROR,
>                          (errcode(ERRCODE_WRONG_OBJECT_TYPE),
>                           errmsg("ON CONFLICT DO UPDATE not supported with exclusion constraints")));
> 
> Shouldn't this use only idxForm->indisexclusion anyway?  Like
> 
> +           if (idxForm->indisexclusion && onconflict->action == ONCONFLICT_UPDATE)
> 
> That matches what the error message is reporting afterwards.

Agreed.

>           * constraints), so index under consideration can be immediately
>           * skipped if it's not unique
>           */
> -       if (!idxForm->indisunique)
> +       if (!idxForm->indisunique || idxForm->indisexclusion)
>              goto next;
> 
> Maybe here we need a comment.  Or make that a separate statement, like

Yes, that is nice. Done.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

On 5/5/24 20:01, jian he wrote:
> hi.
> I hope I understand the problem correctly.
> my understanding is that we are trying to solve a corner case:
> create table t(a int4range, b int4range, primary key(a, b WITHOUT OVERLAPS));
> insert into t values ('[1,2]','empty'), ('[1,2]','empty');
> 
> 
> I think the entry point is ATAddCheckNNConstraint and index_create.
> in a chain of DDL commands, you cannot be sure which one
> (primary key constraint or check constraint) is being created first,
> you just want to make sure that after both constraints are created,
> then add a dependency between primary key and check constraint.
> 
> so you need to validate at different functions
> (ATAddCheckNNConstraint, index_create)
> that these two constraints are indeed created,
> only after that we have a dependency linking these two constraints.
> 
> 
> I've attached a patch trying to solve this problem.
> the patch is not totally polished, but works as expected, and also has
> lots of comments.

Thanks for this! I've incorporated it into the CHECK constraint patch with some changes. In 
particular I thought index_create was a strange place to change the conperiod value of a 
pg_constraint record, and it is not actually needed if we are copying that value correctly.

Some other comments on the patch file:

 > N.B. we also need to have special care for case
 > where check constraint was readded, e.g. ALTER TYPE.
 > if ALTER TYPE is altering the PERIOD column of the primary key,
 > alter column of primary key makes the index recreate, check constraint recreate,
 > however, former interally also including add a check constraint.
 > so we need to take care of merging two check constraint.

This is a good point. I've included tests for this based on your patch.

 > N.B. the check constraint name is hard-wired, so if you create the constraint
 > with the same name, PERIOD primary key cannot be created.

Yes, it may be worth doing something like other auto-named constraints and trying to avoid 
duplicates. I haven't taken that on yet; I'm curious what others have to say about it.

 > N.B. what about UNIQUE constraint?

See my previous posts on this thread about allowing 'empty' in UNIQUE constraints.

 > N.B. seems ok to not care about FOREIGN KEY regarding this corner case?

Agreed.

v3 patches attached, rebased to 3ca43dbbb6.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

On Tue, May 14, 2024 at 01:33:46PM +0800, jian he wrote:
> thanks for the idea, I roughly played around with it, seems doable.
> but the timing seems not good, reverting is a good idea.

Please note that this is still an open item, and that time is running
short until beta1.  A revert seems to be the consensus reached, so,
Peter, are you planning to do so?
--
Michael

Вложения

signature.asc

Re: SQL:2011 application time

От

Peter Eisentraut

Дата:

15 мая, 12:39:47

On 15.05.24 08:13, Michael Paquier wrote:
> On Tue, May 14, 2024 at 01:33:46PM +0800, jian he wrote:
>> thanks for the idea, I roughly played around with it, seems doable.
>> but the timing seems not good, reverting is a good idea.
> 
> Please note that this is still an open item, and that time is running
> short until beta1.  A revert seems to be the consensus reached, so,
> Peter, are you planning to do so?

I'm on it.

Here is the list of patches I have identified to revert:

git show --oneline --no-patch 144c2ce0cc7 c3db1f30cba 482e108cd38 
34768ee3616 5577a71fb0c a88c800deb6 030e10ff1a3 86232a49a43 46a0cd4cefb 
6db4598fcb8

144c2ce0cc7 Fix ON CONFLICT DO NOTHING/UPDATE for temporal indexes
c3db1f30cba doc:  clarify PERIOD and WITHOUT OVERLAPS in CREATE TABLE
482e108cd38 Add test for REPLICA IDENTITY with a temporal key
34768ee3616 Add temporal FOREIGN KEY contraints
5577a71fb0c Use half-open interval notation in without_overlaps tests
a88c800deb6 Use daterange and YMD in without_overlaps tests instead of 
tsrange.
030e10ff1a3 Rename pg_constraint.conwithoutoverlaps to conperiod
86232a49a43 Fix comment on gist_stratnum_btree
46a0cd4cefb Add temporal PRIMARY KEY and UNIQUE constraints
6db4598fcb8 Add stratnum GiST support function

Attached are the individual revert patches.  I'm supplying these here 
mainly so that future efforts can use those instead of the original 
patches, since that would have to redo all the conflict resolution and 
also miss various typo fixes etc. that were applied in the meantime.  I 
will commit this as one squashed patch.

On Tue, 21 May 2024 at 13:57, Robert Haas <robertmhaas@gmail.com> wrote:

What I think is less clear is what that means for temporal primary
keys. As Paul pointed out upthread, in every other case, a temporal
primary key is at least as unique as a regular primary key, but in
this case, it isn't. And someone might reasonably think that a
temporal primary key should exclude empty ranges just as all primary
keys exclude nulls. Or they might think the opposite.

Fascinating. I think you're absolutely right that it's clear that two empty intervals don't conflict. If somebody wants to claim two intervals conflict, they need to point to at least one instant in time that is common between them.

But a major point of a primary key, it seems to me, is that it uniquely identifies a row. If items are identified by a time range, non-overlapping or not, then the empty range can only identify one item (per value of whatever other columns are in the primary key). I think for a unique key the non-overlapping restriction has to be considered an additional restriction on top of the usual uniqueness restriction.

I suspect in many applications there will be a non-empty constraint; for example, it seems quite reasonable to me for a meeting booking system to forbid empty meetings. But when they are allowed they should behave in the mathematically appropriate way.

Re: SQL:2011 application time

От

Jeff Davis

Дата:

21 мая, 22:54:47

On Tue, 2024-05-21 at 13:57 -0400, Robert Haas wrote:
> What I think is less clear is what that means for temporal primary
> keys.

Right.

My message was specifically a response to the concern that there was
some kind of design flaw in the range types or exclusion constraints
mechanisms.

I don't believe that empty ranges represent a design flaw. If they
don't make sense for temporal constraints, then temporal constraints
should forbid them.

Regards,
    Jeff Davis

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

05 июня, 23:56:15

On 5/21/24 11:27, Isaac Morland wrote:
> On Tue, 21 May 2024 at 13:57, Robert Haas <robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>> wrote:
> 
>     What I think is less clear is what that means for temporal primary
>     keys. As Paul pointed out upthread, in every other case, a temporal
>     primary key is at least as unique as a regular primary key, but in
>     this case, it isn't. And someone might reasonably think that a
>     temporal primary key should exclude empty ranges just as all primary
>     keys exclude nulls. Or they might think the opposite.
> 
> 
> Fascinating. I think you're absolutely right that it's clear that two empty intervals don't 
> conflict. If somebody wants to claim two intervals conflict, they need to point to at least one 
> instant in time that is common between them.
> 
> But a major point of a primary key, it seems to me, is that it uniquely identifies a row. If items 
> are identified by a time range, non-overlapping or not, then the empty range can only identify one 
> item (per value of whatever other columns are in the primary key). I think for a unique key the 
> non-overlapping restriction has to be considered an additional restriction on top of the usual 
> uniqueness restriction.
> 
> I suspect in many applications there will be a non-empty constraint; for example, it seems quite 
> reasonable to me for a meeting booking system to forbid empty meetings. But when they are allowed 
> they should behave in the mathematically appropriate way.

Finding a way forward for temporal PKs got a lot of discussion at pgconf.dev (thanks especially to 
Peter Eisentraut and Jeff Davis!), so I wanted to summarize some options and describe what I think 
is the best approach.

First the problem: empty ranges! A temporal PK/UNIQUE constraint is basically an exclusion 
constraint that is `(id WITH =, valid_at WITH &&)`. But the special 'empty' value never overlaps 
anything, *including itself*. (Note it has no "position": [3,3) is the same as [4,4).) Since the 
exclusion constraint forbids overlapping ranges, and empties never overlap, your table can have 
duplicates. (I'm talking about "literal uniqueness" as discussed in [1].) For instance:

     CREATE EXTENSION btree_gist;
     CREATE TABLE t (id int, valid_at daterange, name text);
     ALTER TABLE t ADD CONSTRAINT tpk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS);
     INSERT INTO t VALUES (1, 'empty', 'foo');
     INSERT INTO t VALUES (1, 'empty', 'bar');

Multiranges have the same problem. So what do we do about that?

**Option 0**: Allow it but document it. It shouldn't happen in practice: there is no reason for an 
empty range to get into a temporal table, and it arguably doesn't mean anything. The record is true 
at no time? But of course it will happen anyway. It's a footgun and will break expectations for at 
least some.

It causes problems for us too. If you say `SELECT name FROM t GROUP BY id, valid_at`, we recognize 
that `name` is a functional dependency on the PK, so we allow it and give you the first row matching 
each key. You might get "foo" or you might get "bar". Also the planner uses not-nullable uniqueness 
to take many shortcuts. I couldn't create any concrete breakage there, but I bet someone else could. 
PKs that are not literally unique seems like something that would cause headaches for years.

**Option 1**: Temporal PKs should automatically create a CHECK constraint that forbids empty ranges. 
Should UNIQUE constraints too? I'm tempted to say no, since sometimes users surprise us by coming up 
with new ways to use things. For instance one way to use empty ranges is to reference a temporal 
table from a non-temporal table, since `'empty' <@ anything` is always true (though this has 
questionable meaning or practical use). But probably we should forbid empties for UNIQUE constraints 
too. Forbidding them is more aligned with the SQL standard, which says that when you have a PERIOD, 
startcol < endcol (not <=). And it feels more consistent to treat both constraints the same way. 
Finally, if UNIQUEs do allow empties, we still risk confusing our planner.

My last patch created these CHECK constraints for PKs (but not UNIQUEs) as INTERNAL dependencies. 
It's pretty clunky. There are lots of cases to handle, e.g. `ALTER COLUMN c TYPE` may reuse the PK 
index or may generate a new one. And what if the user already created the same constraint? Seeing 
all the trouble giving PKs automatic (cataloged) NOT NULL constraints makes me wary about this 
approach. It's not as bad, since there is no legacy, but it's still more annoying than I expected.

Finally, hanging the CHECK constraint off the PK sets us up for problems when we add true PERIODs. 
Under 11.27 of SQL/Foundation, General Rules 2b says that defining a PERIOD should automatically add 
a CHECK constraint that startcol < endcol. That is already part of my last patch in this series. But 
that would be redundant with the constraint from the PK. And attaching the constraint to the PERIOD 
is a lot simpler than attaching it to the PK.

**Option 2**: Add a new operator, called &&&, that works like && except an empty range *does* 
overlap another empty range. Empty ranges should still not overlap anything else. This would fix the 
exclusion constraint. You could add `(5, 'empty')` once but not twice. This would allow empties to 
people who want to use them. (We would still forbid them if you define a PERIOD, because those come 
with the CHECK constraint mentioned above.)
And there is almost nothing to code. But it is mathematically suspect to say an empty range overlaps 
something small (something with zero width) but not something big. Surely if a && b and b <@ c, then 
a && c? So this feels like the kind of elegant hack that you eventually regret.

**Option 3**: Forbid empties, not as a reified CHECK constraint, but just with some code in the 
executor. Again we could do just PKs or PKs and UNIQUEs. Let's do both, for all the reasons above. 
Not creating a CHECK constraint is much less clunky. There is no catalog entry to create/drop. Users 
don't wonder where it came from when they say `\d t`. It can't conflict with constraints of their 
own. We would enforce this in ExecConstraints, where we enforce NOT NULL and CHECK constraints, for 
any table with constraints where conperiod is true. We'd also need to do this check on existing rows 
when you create a temporal PK/UQ. This option also requires a new field in pg_class: just as we have 
relchecks, relhasrules, relhastriggers, etc. to let us skip work in the relcache, I assume we'd want 
relperiods.

**Option 4**: Teach GiST indexes to enforce uniqueness. We didn't discuss this at pgconf, at least 
not in reference to the empties problem. But I was thinking about this request from Matthias for 
temporal PKs & UQs to support `USING INDEX idx`.[2] It is confusing that a temporal index has 
indisunique, but if you try to create a unique GiST index directly we say they don't support UNIQUE 
indexes! Similarly `pg_indexam_has_property(783, 'can_unique')` returns false. There is something 
muddled about all that. So how about we give the GiST AM handler amcanunique?

As I understand it, GiST indexes are capable of uniqueness,[3] and indeed today you can create an 
exclusion constraint with the same effect, but in the past the core had no way of asking an opclass 
which operator gave equality. With the stratnum support proc from 6db4598fcb (part of this patch 
series, but reverted from v17), we could get a known operator for "equals". If the index's opclasses 
had that sproc and it gave non-zero for RTEqualStrategyNumber, then CREATE UNIQUE INDEX would 
succeed. We would just ("just") need to make GiST raise an error if it found a duplicate. And if 
*that* was happening, the empty ranges wouldn't cause a problem.

I think Option 3 is good, but I like Option 4 a lot because (1) it doesn't assume ranges & 
multiranges (2) it allows empties if users have some reason for them (3) since the real problem is 
duplicates, forbidding them is a more precise solution, (4) it clears up the confusing situation of
GiST not being canunique, even though you can create an index with indisunique.

OTOH it is probably more work, and it is slower than just forbidding duplicates. (The unique check 
requires a separate index search, according to [3], as an exclusion constraint would do.) Also if we 
do it to make GiST be canunique, that can happen separately from the temporal work.

So I'm proceeding with Option 3, which at worst can eventually become an optimization for Option 4. 
I don't think forbidding empty ranges is a great loss to be honest. But if anyone has any feedback, 
please share: ojections, alternatives, advice---all is welcome.

[1] 
https://www.postgresql.org/message-id/47550967-260b-4180-9791-b224859fe63e%40illuminatedcomputing.com
[2] 
https://www.postgresql.org/message-id/CAEze2Wh21V66udM8cbvBBsAgyQ_5x9nfR0d3sWzbmZk%2B%2Bey7xw%40mail.gmail.com
[3] https://dsf.berkeley.edu/papers/sigmod97-gist.pdf

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

05 июня, 23:57:40

On Thu, May 9, 2024 at 5:44 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
 > Additionally, because I can't create my own non-constraint-backing
 > unique GIST indexes, I can't pre-create my unique constraints
 > CONCURRENTLY as one could do for the non-temporal case

We talked about this a bit at pgconf.dev. I would like to implement it, since I agree it is an 
important workflow to support. Here are some thoughts about what would need to be done.

First we could take a small step: allow non-temporal UNIQUE GiST indexes. This is possible according 
to [1], but in the past we had no way of knowing which strategy number an opclass was using for 
equality. With the stratnum support proc introduced by 6db4598fcb (reverted for v17), we could 
change amcanunique to true for the GiST AM handler. If the index's opclasses had that sproc and it 
gave non-zero for RTEqualStrategyNumber, we would have a reliable "definition of uniqueness". UNIQUE 
GiST indexes would raise an error if they detected a duplicate record.

Incidentally, this would also let us correct the error message about GiST not supporting unique, 
fixing the problem you raised here:

On Sun, May 12, 2024 at 8:51 AM Paul Jungwirth <pj@illuminatedcomputing.com> wrote:
 >
 > On 5/12/24 05:55, Matthias van de Meent wrote:
 > >>   > pg=# CREATE UNIQUE INDEX ON temporal_testing USING gist (id, valid_during);
 > >>   > ERROR:  access method "gist" does not support unique indexes
 > >>
 > >> To me that error message seems correct. The programmer hasn't said anything about the special
 > >> temporal behavior they are looking for.
 > >
 > > But I showed that I had a GIST index that does have the indisunique
 > > flag set, which shows that GIST does support indexes with unique
 > > semantics.
 > >
 > > That I can't use CREATE UNIQUE INDEX to create such an index doesn't
 > > mean the feature doesn't exist, which is what the error message
 > > implies.
 >
 > True, the error message is not really telling the truth anymore.

But that is just regular non-temporal indexes. To avoid a long table lock you'd need a way to build 
the index that is not just unique, but also does exclusion based on &&.  We could borrow syntax from 
SQL:2011 and allow `CREATE INDEX idx ON t (id, valid_at WITHOUT OVERLAPS)`. But since CREATE INDEX 
is a lower-level concept than a constraint, it'd be better to do something more general. You can 
already give opclasses for each indexed column. How about allowing operators as well? For instance 
`CREATE UNIQUE INDEX idx ON t (id WITH =, valid_at WITH &&)`? Then the index would know to enforce 
those rules. This is the same data we store today in pg_constraint.conexclops. So that would get 
moved/copied to pg_index (probably moved).

Then when you add the constraint, what is the syntax? Today when you say PRIMARY KEY/UNIQUE USING 
INDEX, you don't give the column names. So how do we know it's WITHOUT OVERLAPS? I guess if the 
underlying index has (foo WITH = [, bar WITH =], baz WITH &&) we just assume the user wants WITHOUT 
OVERLAPS, and otherwise they want a regular PK/UQ constraint?

In addition this workflow only works if you can CREATE INDEX CONCURRENTLY. I'm not sure yet if we'll 
have problems there. I noticed that for REINDEX at least, there were plans in 2012 to support 
exclusion-constraint indexes,[2] but when the patch was committed in 2019 they had been dropped, 
with plans to add support eventually.[3] Today they are still not supported. Maybe whatever caused 
problems for REINDEX isn't an issue for just INDEX, but it would take more research to find out.

[1] https://dsf.berkeley.edu/papers/sigmod97-gist.pdf
[2] Original patch thread from 2012: 

https://www.postgresql.org/message-id/flat/CAB7nPqS%2BWYN021oQHd9GPe_5dSVcVXMvEBW_E2AV9OOEwggMHw%40mail.gmail.com#e1a372074cfdf37bf9e5b4e29ddf7b2d
[3] Revised patch thread, committed in 2019: 

https://www.postgresql.org/message-id/flat/60052986-956b-4478-45ed-8bd119e9b9cf%402ndquadrant.com#74948a1044c56c5e817a5050f554ddee

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Re: SQL:2011 application time

От

Robert Haas

Дата:

12 июня, 17:31:49

On Wed, Jun 5, 2024 at 4:56 PM Paul Jungwirth
<pj@illuminatedcomputing.com> wrote:
> **Option 2**: Add a new operator, called &&&, that works like && except an empty range *does*
> overlap another empty range. Empty ranges should still not overlap anything else. This would fix the
> exclusion constraint. You could add `(5, 'empty')` once but not twice. This would allow empties to
> people who want to use them. (We would still forbid them if you define a PERIOD, because those come
> with the CHECK constraint mentioned above.)
> And there is almost nothing to code. But it is mathematically suspect to say an empty range overlaps
> something small (something with zero width) but not something big. Surely if a && b and b <@ c, then
> a && c? So this feels like the kind of elegant hack that you eventually regret.

I think this might be fine.

> **Option 3**: Forbid empties, not as a reified CHECK constraint, but just with some code in the
> executor. Again we could do just PKs or PKs and UNIQUEs. Let's do both, for all the reasons above.
> Not creating a CHECK constraint is much less clunky. There is no catalog entry to create/drop. Users
> don't wonder where it came from when they say `\d t`. It can't conflict with constraints of their
> own. We would enforce this in ExecConstraints, where we enforce NOT NULL and CHECK constraints, for
> any table with constraints where conperiod is true. We'd also need to do this check on existing rows
> when you create a temporal PK/UQ. This option also requires a new field in pg_class: just as we have
> relchecks, relhasrules, relhastriggers, etc. to let us skip work in the relcache, I assume we'd want
> relperiods.

I don't really like the existing relhasWHATEVER fields and am not very
keen about adding more of them. Maybe it will turn out to be the best
way, but finding the right times to set and unset such fields has been
challenging over the years, and we've had to fix some bugs. So, if you
go this route, I recommend looking carefully at whether there's a
reasonable way to avoid the need for such a field. Other than that,
this idea seems reasonable.

> **Option 4**: Teach GiST indexes to enforce uniqueness. We didn't discuss this at pgconf, at least
> not in reference to the empties problem. But I was thinking about this request from Matthias for
> temporal PKs & UQs to support `USING INDEX idx`.[2] It is confusing that a temporal index has
> indisunique, but if you try to create a unique GiST index directly we say they don't support UNIQUE
> indexes! Similarly `pg_indexam_has_property(783, 'can_unique')` returns false. There is something
> muddled about all that. So how about we give the GiST AM handler amcanunique?
>
> As I understand it, GiST indexes are capable of uniqueness,[3] and indeed today you can create an
> exclusion constraint with the same effect, but in the past the core had no way of asking an opclass
> which operator gave equality. With the stratnum support proc from 6db4598fcb (part of this patch
> series, but reverted from v17), we could get a known operator for "equals". If the index's opclasses
> had that sproc and it gave non-zero for RTEqualStrategyNumber, then CREATE UNIQUE INDEX would
> succeed. We would just ("just") need to make GiST raise an error if it found a duplicate. And if
> *that* was happening, the empty ranges wouldn't cause a problem.

Isn't this just a more hacky version of option (2)?

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: SQL:2011 application time

От

Matthias van de Meent

Дата:

12 июня, 18:48:59

On Wed, 5 Jun 2024 at 22:57, Paul Jungwirth <pj@illuminatedcomputing.com> wrote:
>
> On Thu, May 9, 2024 at 5:44 PM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
>  > Additionally, because I can't create my own non-constraint-backing
>  > unique GIST indexes, I can't pre-create my unique constraints
>  > CONCURRENTLY as one could do for the non-temporal case
>
> We talked about this a bit at pgconf.dev. I would like to implement it, since I agree it is an
> important workflow to support. Here are some thoughts about what would need to be done.
>
> First we could take a small step: allow non-temporal UNIQUE GiST indexes. This is possible according
> to [1], but in the past we had no way of knowing which strategy number an opclass was using for
> equality. With the stratnum support proc introduced by 6db4598fcb (reverted for v17), we could
> change amcanunique to true for the GiST AM handler. If the index's opclasses had that sproc and it
> gave non-zero for RTEqualStrategyNumber, we would have a reliable "definition of uniqueness". UNIQUE
> GiST indexes would raise an error if they detected a duplicate record.

Cool.

> But that is just regular non-temporal indexes. To avoid a long table lock you'd need a way to build
> the index that is not just unique, but also does exclusion based on &&.  We could borrow syntax from
> SQL:2011 and allow `CREATE INDEX idx ON t (id, valid_at WITHOUT OVERLAPS)`. But since CREATE INDEX
> is a lower-level concept than a constraint, it'd be better to do something more general. You can
> already give opclasses for each indexed column. How about allowing operators as well? For instance
> `CREATE UNIQUE INDEX idx ON t (id WITH =, valid_at WITH &&)`? Then the index would know to enforce
> those rules.

I think this looks fine. I'd like it even better if we could default
to the equality operator that's used by the type's default btree
opclass in this syntax; that'd make CREATE UNIQUE INDEX much less
awkward for e.g. hash indexes.

> This is the same data we store today in pg_constraint.conexclops. So that would get
> moved/copied to pg_index (probably moved).

I'd keep the pg_constraint.conexclops around: People are inevitably
going to want to keep the current exclusion constraints' handling of
duplicate empty ranges, which is different from expectations we see
for UNIQUE INDEX's handling.

> Then when you add the constraint, what is the syntax? Today when you say PRIMARY KEY/UNIQUE USING
> INDEX, you don't give the column names. So how do we know it's WITHOUT OVERLAPS? I guess if the
> underlying index has (foo WITH = [, bar WITH =], baz WITH &&) we just assume the user wants WITHOUT
> OVERLAPS, and otherwise they want a regular PK/UQ constraint?

Presumably you would know this based on the pg_index.indisunique flag?

> In addition this workflow only works if you can CREATE INDEX CONCURRENTLY. I'm not sure yet if we'll
> have problems there. I noticed that for REINDEX at least, there were plans in 2012 to support
> exclusion-constraint indexes,[2] but when the patch was committed in 2019 they had been dropped,
> with plans to add support eventually.[3] Today they are still not supported. Maybe whatever caused
> problems for REINDEX isn't an issue for just INDEX, but it would take more research to find out.

I don't quite see where exclusion constraints get into the picture?
Isn't this about unique indexes, not exclusion constraints? I
understand exclusion constraints are backed by indexes, but that
doesn't have to make it a unique index, right? I mean, currently, you
can write an exclusion constraint that makes sure that all rows with a
certain prefix have the same suffix columns (given a btree-esque index
type with <> -operator support), which seems exactly opposite of what
unique indexes should do.

Kind regards,

Matthias van de Meent
Neon (https://neon.tech)

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

28 июня, 00:56:15

On 6/12/24 07:31, Robert Haas wrote:
> On Wed, Jun 5, 2024 at 4:56 PM Paul Jungwirth
>> **Option 3**: Forbid empties, not as a reified CHECK constraint, but just with some code in the
>> executor. Again we could do just PKs or PKs and UNIQUEs. Let's do both, for all the reasons above.
>> Not creating a CHECK constraint is much less clunky. There is no catalog entry to create/drop. Users
>> don't wonder where it came from when they say `\d t`. It can't conflict with constraints of their
>> own. We would enforce this in ExecConstraints, where we enforce NOT NULL and CHECK constraints, for
>> any table with constraints where conperiod is true. We'd also need to do this check on existing rows
>> when you create a temporal PK/UQ. This option also requires a new field in pg_class: just as we have
>> relchecks, relhasrules, relhastriggers, etc. to let us skip work in the relcache, I assume we'd want
>> relperiods.
> 
> I don't really like the existing relhasWHATEVER fields and am not very
> keen about adding more of them. Maybe it will turn out to be the best
> way, but finding the right times to set and unset such fields has been
> challenging over the years, and we've had to fix some bugs. So, if you
> go this route, I recommend looking carefully at whether there's a
> reasonable way to avoid the need for such a field. Other than that,
> this idea seems reasonable.

Here is a reworked patch series following Option 3: rather than using a cataloged CHECK constraint, 
we just do the check in the executor (but in the same place we do CHECK constraints). We also make 
sure existing rows are empty-free when you add the index.

I took the reverted commits from v17, squashed the minor fixes, rebased everything, and added a new 
patch to forbid empty ranges/multiranges wherever there is a WITHOUT OVERLAPS constraint. It comes 
right after the PK patch in the series. I don't intend it to be committed separately, but I thought 
it would make review easier, since the other code has been reviewed a lot already.

I did add a relperiods column, but I have a mostly-complete branch here (not included in the 
patches) that does without. Not maintaining that new column is simpler for sure. The consequence is 
that the relcache must scan for WITHOUT OVERLAPS constraints on every table. That seems like a high 
performance cost for a feature most databases won't use. Since we try hard to avoid that kind of 
thing (e.g. [1]), I thought adding relperiods would be preferred. If that's the wrong tradeoff I can 
change it.

One idea I considered was to include WITHOUT OVERLAPS constraints in the relchecks count. But that 
feels pretty hacky, and it is harder than it sounds, since index constraints are handled pretty far 
from where we update relchecks now. It doesn't save any complexity (but rather makes it worse), so 
the only reason to do it would be to avoid expanding pg_class records.

These patches still add some if-clauses to psql and pg_dump that say `if (fout->remoteVersion >= 
170000)`. But if I change them to 180000 I get failures in e.g. the pg_dump tests. What do other 
people do here before a release is cut?

Rebased on 3e53492aa7.

[1] 

https://github.com/postgres/postgres/blob/5d6c64d290978dab76c00460ba809156874be035/src/backend/utils/cache/relcache.c#L688-L713

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Вложения

Re: SQL:2011 application time

От

Robert Haas

Дата:

28 июня, 15:18:07

On Thu, Jun 27, 2024 at 5:56 PM Paul Jungwirth
<pj@illuminatedcomputing.com> wrote:
> I did add a relperiods column, but I have a mostly-complete branch here (not included in the
> patches) that does without. Not maintaining that new column is simpler for sure. The consequence is
> that the relcache must scan for WITHOUT OVERLAPS constraints on every table. That seems like a high
> performance cost for a feature most databases won't use. Since we try hard to avoid that kind of
> thing (e.g. [1]), I thought adding relperiods would be preferred. If that's the wrong tradeoff I can
> change it.

I'm sure that you are right that nobody is going to like an extra
index scan just to find periods. So, suppose we do as you propose and
add relperiods. In the situation where we are adding the first period
(or whatever the right term is) to the table, what kind of lock are we
holding on the table? Conversely, when we drop the last period, what
kind of lock are we holding on the table? If, hypothetically, both
answers were AccessExclusiveLock, this might not be too bad, but if
you say "ShareLock" then we've got a lot of problems; that's not even
self-exclusive.

> These patches still add some if-clauses to psql and pg_dump that say `if (fout->remoteVersion >=
> 170000)`. But if I change them to 180000 I get failures in e.g. the pg_dump tests. What do other
> people do here before a release is cut?

Sometimes I make a commit that bumps the version number (update major
version in src/tools/version_stamp.pl, then run it, then run autoconf,
then commit). Then I build my patch set on top of that. Once the
actual major release bump happens, I just drop that commit from the
stack.

--
Robert Haas
EDB: http://www.enterprisedb.com

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

05 июля, 22:22:31

Here is v35 of this patch series, with a few small changes. I renamed relperiods to 
relwithoutoverlaps, since that is more accurate about what we're counting. (PERIODs come in a later 
patch and we don't need to count them.) Also I cleaned up the branches in psql/pg_dump on version 
now that we're officially on v18.

On 6/28/24 05:18, Robert Haas wrote:
> On Thu, Jun 27, 2024 at 5:56 PM Paul Jungwirth
> <pj@illuminatedcomputing.com> wrote:
>> I did add a relperiods column, but I have a mostly-complete branch here (not included in the
>> patches) that does without. Not maintaining that new column is simpler for sure. The consequence is
>> that the relcache must scan for WITHOUT OVERLAPS constraints on every table. That seems like a high
>> performance cost for a feature most databases won't use. Since we try hard to avoid that kind of
>> thing (e.g. [1]), I thought adding relperiods would be preferred. If that's the wrong tradeoff I can
>> change it.
> 
> I'm sure that you are right that nobody is going to like an extra
> index scan just to find periods. So, suppose we do as you propose and
> add relperiods. In the situation where we are adding the first period
> (or whatever the right term is) to the table, what kind of lock are we
> holding on the table? Conversely, when we drop the last period, what
> kind of lock are we holding on the table? If, hypothetically, both
> answers were AccessExclusiveLock, this might not be too bad, but if
> you say "ShareLock" then we've got a lot of problems; that's not even
> self-exclusive.

This happens when creating a PRIMARY KEY or UNIQUE constraint, so we already have an 
AccessExclusiveLock on the table (whether creating or dropping). If we ever supported CREATE INDEX 
CONCURRENTLY for this, we would need to be careful about where we update the new field, but today we 
don't support that for exclusion constraints.

Rebased to 4b211003ec.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Вложения

Re: SQL:2011 application time

От

jian he

Дата:

09 июля, 10:15:39

On Thu, Jun 6, 2024 at 4:56 AM Paul Jungwirth
<pj@illuminatedcomputing.com> wrote:
>
> On 5/21/24 11:27, Isaac Morland wrote:
> > On Tue, 21 May 2024 at 13:57, Robert Haas <robertmhaas@gmail.com <mailto:robertmhaas@gmail.com>> wrote:
> >
> >     What I think is less clear is what that means for temporal primary
> >     keys. As Paul pointed out upthread, in every other case, a temporal
> >     primary key is at least as unique as a regular primary key, but in
> >     this case, it isn't. And someone might reasonably think that a
> >     temporal primary key should exclude empty ranges just as all primary
> >     keys exclude nulls. Or they might think the opposite.
> >
> >
> > Fascinating. I think you're absolutely right that it's clear that two empty intervals don't
> > conflict. If somebody wants to claim two intervals conflict, they need to point to at least one
> > instant in time that is common between them.
> >
> > But a major point of a primary key, it seems to me, is that it uniquely identifies a row. If items
> > are identified by a time range, non-overlapping or not, then the empty range can only identify one
> > item (per value of whatever other columns are in the primary key). I think for a unique key the
> > non-overlapping restriction has to be considered an additional restriction on top of the usual
> > uniqueness restriction.
> >
> > I suspect in many applications there will be a non-empty constraint; for example, it seems quite
> > reasonable to me for a meeting booking system to forbid empty meetings. But when they are allowed
> > they should behave in the mathematically appropriate way.
>
> Finding a way forward for temporal PKs got a lot of discussion at pgconf.dev (thanks especially to
> Peter Eisentraut and Jeff Davis!), so I wanted to summarize some options and describe what I think
> is the best approach.
>
> First the problem: empty ranges! A temporal PK/UNIQUE constraint is basically an exclusion
> constraint that is `(id WITH =, valid_at WITH &&)`. But the special 'empty' value never overlaps
> anything, *including itself*. (Note it has no "position": [3,3) is the same as [4,4).) Since the
> exclusion constraint forbids overlapping ranges, and empties never overlap, your table can have
> duplicates. (I'm talking about "literal uniqueness" as discussed in [1].) For instance:
>
>      CREATE EXTENSION btree_gist;
>      CREATE TABLE t (id int, valid_at daterange, name text);
>      ALTER TABLE t ADD CONSTRAINT tpk PRIMARY KEY (id, valid_at WITHOUT OVERLAPS);
>      INSERT INTO t VALUES (1, 'empty', 'foo');
>      INSERT INTO t VALUES (1, 'empty', 'bar');
>
> Multiranges have the same problem. So what do we do about that?
>
> **Option 0**: Allow it but document it. It shouldn't happen in practice: there is no reason for an
> empty range to get into a temporal table, and it arguably doesn't mean anything. The record is true
> at no time? But of course it will happen anyway. It's a footgun and will break expectations for at
> least some.
>
> It causes problems for us too. If you say `SELECT name FROM t GROUP BY id, valid_at`, we recognize
> that `name` is a functional dependency on the PK, so we allow it and give you the first row matching
> each key. You might get "foo" or you might get "bar". Also the planner uses not-nullable uniqueness
> to take many shortcuts. I couldn't create any concrete breakage there, but I bet someone else could.
> PKs that are not literally unique seems like something that would cause headaches for years.
>
> **Option 1**: Temporal PKs should automatically create a CHECK constraint that forbids empty ranges.
> Should UNIQUE constraints too? I'm tempted to say no, since sometimes users surprise us by coming up
> with new ways to use things. For instance one way to use empty ranges is to reference a temporal
> table from a non-temporal table, since `'empty' <@ anything` is always true (though this has
> questionable meaning or practical use). But probably we should forbid empties for UNIQUE constraints
> too. Forbidding them is more aligned with the SQL standard, which says that when you have a PERIOD,
> startcol < endcol (not <=). And it feels more consistent to treat both constraints the same way.
> Finally, if UNIQUEs do allow empties, we still risk confusing our planner.
>
> My last patch created these CHECK constraints for PKs (but not UNIQUEs) as INTERNAL dependencies.
> It's pretty clunky. There are lots of cases to handle, e.g. `ALTER COLUMN c TYPE` may reuse the PK
> index or may generate a new one. And what if the user already created the same constraint? Seeing
> all the trouble giving PKs automatic (cataloged) NOT NULL constraints makes me wary about this
> approach. It's not as bad, since there is no legacy, but it's still more annoying than I expected.
>
> Finally, hanging the CHECK constraint off the PK sets us up for problems when we add true PERIODs.
> Under 11.27 of SQL/Foundation, General Rules 2b says that defining a PERIOD should automatically add
> a CHECK constraint that startcol < endcol. That is already part of my last patch in this series. But
> that would be redundant with the constraint from the PK. And attaching the constraint to the PERIOD
> is a lot simpler than attaching it to the PK.
>
> **Option 2**: Add a new operator, called &&&, that works like && except an empty range *does*
> overlap another empty range. Empty ranges should still not overlap anything else. This would fix the
> exclusion constraint. You could add `(5, 'empty')` once but not twice. This would allow empties to
> people who want to use them. (We would still forbid them if you define a PERIOD, because those come
> with the CHECK constraint mentioned above.)
> And there is almost nothing to code. But it is mathematically suspect to say an empty range overlaps
> something small (something with zero width) but not something big. Surely if a && b and b <@ c, then
> a && c? So this feels like the kind of elegant hack that you eventually regret.
>
> **Option 3**: Forbid empties, not as a reified CHECK constraint, but just with some code in the
> executor. Again we could do just PKs or PKs and UNIQUEs. Let's do both, for all the reasons above.
> Not creating a CHECK constraint is much less clunky. There is no catalog entry to create/drop. Users
> don't wonder where it came from when they say `\d t`. It can't conflict with constraints of their
> own. We would enforce this in ExecConstraints, where we enforce NOT NULL and CHECK constraints, for
> any table with constraints where conperiod is true. We'd also need to do this check on existing rows
> when you create a temporal PK/UQ. This option also requires a new field in pg_class: just as we have
> relchecks, relhasrules, relhastriggers, etc. to let us skip work in the relcache, I assume we'd want
> relperiods.
>
> **Option 4**: Teach GiST indexes to enforce uniqueness. We didn't discuss this at pgconf, at least
> not in reference to the empties problem. But I was thinking about this request from Matthias for
> temporal PKs & UQs to support `USING INDEX idx`.[2] It is confusing that a temporal index has
> indisunique, but if you try to create a unique GiST index directly we say they don't support UNIQUE
> indexes! Similarly `pg_indexam_has_property(783, 'can_unique')` returns false. There is something
> muddled about all that. So how about we give the GiST AM handler amcanunique?
>
I think we can Forbid empties,not not mess with pg_class.


to make the communication smooth, i've set the base commit to
46a0cd4cefb4d9b462d8cc4df5e7ecdd190bea92
{Add temporal PRIMARY KEY and UNIQUE constraints}
https://git.postgresql.org/cgit/postgresql.git/commit/?id=46a0cd4cefb4d9b462d8cc4df5e7ecdd190bea92
you can git reset --hard 46a0cd4cefb4d9b462d8cc4df5e7ecdd190bea92
then apply the attached patch.



I hope I understand it correctly.
previously revert is only because the special value: empty.
i tried to use the operator &&&, new gist strategy number, pg_amop
entry to solve the problem.
Now with the applied patch, if the range column is specified WITHOUT OVERLAPS,
then this column is not allowed to have any empty range value.



logic work through:
* duplicate logic of range_overlaps but disallow empty value. also
have the operator &&&, (almost equivalent to &&)
* add new gist strategy number
* thanks to  add stratnum GiST support function
(https://git.postgresql.org/cgit/postgresql.git/commit/?id=6db4598fcb82a87a683c4572707e522504830a2b)
now we can set the strategy number to the mapped new function
(equivalent to range_overlaps, but error out empty value)
* in ComputeIndexAttrs, set the strategy number to the newly created
StrategyNumber in "else if (iswithoutoverlaps)" block.
* Similarly refactor src/backend/utils/adt/rangetypes_gist.c make the
index value validation using newly created function.



function name, error message maybe not great now, but it works.
------full demo, also see the comments.
DROP TABLE if exists temporal_rng;
CREATE TABLE temporal_rng (id int4range, valid_at tsrange);
ALTER TABLE temporal_rng
ADD CONSTRAINT temporal_rng_pk
PRIMARY KEY (id, valid_at WITHOUT OVERLAPS);
--should be fine.
INSERT INTO temporal_rng VALUES ('empty', '[2022-01-01,2022-01-02]');
--will error out, period column, empty range not allowed
INSERT INTO temporal_rng VALUES ('[3,3]', 'empty');

ALTER TABLE temporal_rng DROP CONSTRAINT temporal_rng_pk;
--period constraint dropped, now should be fine.
INSERT INTO temporal_rng VALUES ('[3,3]', 'empty');

--reinstall constraint, should error out
--because existing one row has empty value.
ALTER TABLE temporal_rng
ADD CONSTRAINT temporal_rng_pk
PRIMARY KEY (id, valid_at WITHOUT OVERLAPS);
delete from temporal_rng where id = '[3,3]';

--reinstall constraint, should be fine, because empty value removed.
ALTER TABLE temporal_rng
ADD CONSTRAINT temporal_rng_pk
PRIMARY KEY (id, valid_at WITHOUT OVERLAPS);

Вложения

v1-0001-forbid-empty-range-value-for-period-column.patch

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

18 июля, 06:34:42

On 7/9/24 00:15, jian he wrote:
>> **Option 2**: Add a new operator, called &&&, that works like && except an empty range *does*
>> overlap another empty range. Empty ranges should still not overlap anything else. This would fix the
>> exclusion constraint. You could add `(5, 'empty')` once but not twice. This would allow empties to
>> people who want to use them. (We would still forbid them if you define a PERIOD, because those come
>> with the CHECK constraint mentioned above.)
>> And there is almost nothing to code. But it is mathematically suspect to say an empty range overlaps
>> something small (something with zero width) but not something big. Surely if a && b and b <@ c, then
>> a && c? So this feels like the kind of elegant hack that you eventually regret.
> I think we can Forbid empties,not not mess with pg_class.
> 
> to make the communication smooth, i've set the base commit to
> 46a0cd4cefb4d9b462d8cc4df5e7ecdd190bea92
> {Add temporal PRIMARY KEY and UNIQUE constraints}
> https://git.postgresql.org/cgit/postgresql.git/commit/?id=46a0cd4cefb4d9b462d8cc4df5e7ecdd190bea92
> you can git reset --hard 46a0cd4cefb4d9b462d8cc4df5e7ecdd190bea92
> then apply the attached patch.
> 
> I hope I understand it correctly.
> previously revert is only because the special value: empty.
> i tried to use the operator &&&, new gist strategy number, pg_amop
> entry to solve the problem.
> Now with the applied patch, if the range column is specified WITHOUT OVERLAPS,
> then this column is not allowed to have any empty range value.
> 
> logic work through:
> * duplicate logic of range_overlaps but disallow empty value. also
> have the operator &&&, (almost equivalent to &&)
> * add new gist strategy number
> * thanks to  add stratnum GiST support function
> (https://git.postgresql.org/cgit/postgresql.git/commit/?id=6db4598fcb82a87a683c4572707e522504830a2b)
> now we can set the strategy number to the mapped new function
> (equivalent to range_overlaps, but error out empty value)
> * in ComputeIndexAttrs, set the strategy number to the newly created
> StrategyNumber in "else if (iswithoutoverlaps)" block.
> * Similarly refactor src/backend/utils/adt/rangetypes_gist.c make the
> index value validation using newly created function.

I like this approach a lot, but I'd like to hear what some other people think?

Jian he's &&& operator is similar to what I proposed upthread, but when either operand is an empty 
value it simply raises an error. (It should be an ereport, not an elog, and I think 
multirange_overlaps_multirange_internal is missing the empty check, but I can clean things up when I 
integrate it into the patch series.)

This is much simpler than everything I'm doing: checking for empties in the executor phase, adding a 
field to pg_class, setting things in the relcache, and checking for empties in existing rows when 
you add an index. This patch uses existing infrastructure to do all the work. It seems like a much 
cleaner solution.

Unlike my proposed &&& operator, it doesn't have weird mathematical consequences.

At first I thought raising an error was not great, but it's the same thing you get when you divide 
by zero. It's fine for an operator to have a restricted domain of inputs. And we would only use this 
internally for primary keys and unique constraints, where indeed raising an error is just what we want.

If I don't hear objections (or think of something myself :-), I'm inclined to use this approach.

But what do people think?

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

18 июля, 21:39:09

On 7/17/24 20:34, Paul Jungwirth wrote:
> I like this approach a lot, but I'd like to hear what some other people think?
> 
> Jian he's &&& operator is similar to what I proposed upthread, but when either operand is an empty 
> value it simply raises an error. (It should be an ereport, not an elog, and I think 
> multirange_overlaps_multirange_internal is missing the empty check, but I can clean things up when I 
> integrate it into the patch series.)

I thought of a possible problem: this operator works great if there are already rows in the table, 
but what if the *first row you insert* has an empty range? Then there is nothing to compare against, 
so the operator will never be used. Right?

Except when I test it, it still works! After running `make installcheck`, I did this:

regression=# truncate temporal_rng cascade;
NOTICE:  truncate cascades to table "temporal_fk_rng2rng"
TRUNCATE TABLE
regression=# insert into temporal_rng values ('[1,2)', 'empty');
ERROR:  range cannot be empty

My mental model must be wrong. Can anyone explain what is happening there? Is it something we can 
depend on?

So I swapped in the &&& patch, cleaned it up, and added tests. But something is wrong. After I get 
one failure from an empty, I keep getting failures, even though the table is empty:

regression=# truncate temporal_rng cascade;
NOTICE:  truncate cascades to table "temporal_fk_rng2rng"
TRUNCATE TABLE
regression=# insert into temporal_rng values ('[1,2)', '[2000-01-01,2010-01-01)'); -- ok so far
INSERT 0 1
regression=# insert into temporal_rng values ('[1,2)', 'empty'); -- should fail and does
ERROR:  range cannot be empty
regression=# insert into temporal_rng values ('[1,2)', '[2010-01-01,2020-01-01)'); -- uh oh
ERROR:  range cannot be empty
regression=# truncate temporal_rng cascade;
NOTICE:  truncate cascades to table "temporal_fk_rng2rng"
TRUNCATE TABLE
regression=# insert into temporal_rng values ('[1,2)', '[2000-01-01,2010-01-01)'); -- ok so far
INSERT 0 1
regression=# insert into temporal_rng values ('[1,2)', '[2010-01-01,2020-01-01)'); -- ok now
INSERT 0 1

It looks like the index is getting corrupted. Continuing from the above:

regression=# create extension pageinspect;
CREATE EXTENSION
regression=# select gist_page_items(get_raw_page('temporal_rng_pk', 0), 'temporal_rng_pk');
                               gist_page_items
----------------------------------------------------------------------------
  (1,"(0,1)",40,f,"(id, valid_at)=(""[1,2)"", ""[2000-01-01,2010-01-01)"")")
  (2,"(0,2)",40,f,"(id, valid_at)=(""[1,2)"", ""[2010-01-01,2020-01-01)"")")
(2 rows)

regression=# insert into temporal_rng values ('[1,2)', 'empty');
ERROR:  range cannot be empty
regression=# select gist_page_items(get_raw_page('temporal_rng_pk', 0), 'temporal_rng_pk');
                               gist_page_items
----------------------------------------------------------------------------
  (1,"(0,1)",40,f,"(id, valid_at)=(""[1,2)"", ""[2000-01-01,2010-01-01)"")")
  (2,"(0,2)",40,f,"(id, valid_at)=(""[1,2)"", ""[2010-01-01,2020-01-01)"")")
  (3,"(0,3)",32,f,"(id, valid_at)=(""[1,2)"", empty)")
(3 rows)

So maybe this is a bad place to ereport? Or is this a deeper bug with GiST? Here is where we're 
doing it:

#0  range_nonempty_overlaps_internal (typcache=0x635a7fbf67f0, r1=0x635a7fc11f20, r2=0x635a7fc11f40) 
at rangetypes.c:876
#1  0x0000635a7f06175d in range_gist_consistent_leaf_range (typcache=0x635a7fbf67f0, strategy=31, 
key=0x635a7fc11f20, query=0x635a7fc11f40)
     at rangetypes_gist.c:1076
#2  0x0000635a7f05fc9a in range_gist_consistent (fcinfo=0x7ffcd20f9f60) at rangetypes_gist.c:216
#3  0x0000635a7f12d780 in FunctionCall5Coll (flinfo=0x635a7fb44eb8, collation=0, 
arg1=140723832725648, arg2=109240340727454, arg3=31, arg4=0,
     arg5=140723832725567) at fmgr.c:1242
#4  0x0000635a7e999af6 in gistindex_keytest (scan=0x635a7fb44d50, tuple=0x7d155c0a3fd0, 
page=0x7d155c0a2000 "", offset=1, recheck_p=0x7ffcd20fa129,
     recheck_distances_p=0x7ffcd20fa12a) at gistget.c:221
#5  0x0000635a7e99a109 in gistScanPage (scan=0x635a7fb44d50, pageItem=0x7ffcd20fa1e0, 
myDistances=0x0, tbm=0x0, ntids=0x0) at gistget.c:436
#6  0x0000635a7e99a797 in gistgettuple (scan=0x635a7fb44d50, dir=ForwardScanDirection) at gistget.c:637
#7  0x0000635a7e9e4d38 in index_getnext_tid (scan=0x635a7fb44d50, direction=ForwardScanDirection) at 
indexam.c:590
#8  0x0000635a7e9e4f7d in index_getnext_slot (scan=0x635a7fb44d50, direction=ForwardScanDirection, 
slot=0x635a7fb44950) at indexam.c:682
#9  0x0000635a7ec5690b in check_exclusion_or_unique_constraint (heap=0x7d1560cea348, 
index=0x7d1560cedd98, indexInfo=0x635a7fb44c40, tupleid=0x635a7fb44580,
     values=0x7ffcd20faf00, isnull=0x7ffcd20faee0, estate=0x635a7fb434a0, newIndex=false, 
waitMode=CEOUC_WAIT, violationOK=false, conflictTid=0x0)
     at execIndexing.c:780
#10 0x0000635a7ec55c58 in ExecInsertIndexTuples (resultRelInfo=0x635a7fb43930, slot=0x635a7fb44550, 
estate=0x635a7fb434a0, update=false, noDupErr=false,
     specConflict=0x0, arbiterIndexes=0x0, onlySummarizing=false) at execIndexing.c:483
#11 0x0000635a7eca38a2 in ExecInsert (context=0x7ffcd20fb1b0, resultRelInfo=0x635a7fb43930, 
slot=0x635a7fb44550, canSetTag=true, inserted_tuple=0x0,
     insert_destrel=0x0) at nodeModifyTable.c:1145

Is there anything I can do to save this &&& idea? I've attached the patches I'm working with, 
rebased to cd85ae1114.

If ereport just won't work, then I might explore other definitions of a &&& operator. It was really 
nice to have such a clean solution.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

On 7/23/24 09:08, Paul Jungwirth wrote:
> One tempting alternative though is to let exclusion constraints do the not-empty check, instead of
> putting it in the executor. It would be an extra check we do only when the constraint has
> pg_constraint.conperiod. Then we don't need to add & maintain pg_class.relwithoutoverlaps, and we don't
> need a relcache change, and we don't need so much extra code to check existing rows when you add the
> constraint. It doesn't use the existing available exclusion constraint functionality, but if we're
> willing to extend the executor to know about WITHOUT OVERLAPS, I guess we could teach exclusion
> constraints about it instead. Doing the check there does seem to have better locality with the feature.
> So I think I will try that out as well.

Here is a patch moving the not-empty check into check_exclusion_or_unique_constraint. That is a more 
logical place for it than ExecConstraints, since WITHOUT OVERLAPS is part of the index constraint 
(not a CHECK constraint). At that point we've already looked up all the information we need. So 
there is no extra cost for non-temporal tables, and no need to change pg_class or add to the 
relcache. Also putting it there means we don't need any extra code to enforce non-empties when we 
build the index or do anything else with it.

I think this is the nicest solution we can expect. It is even cleaner than the &&& ideas. So 
hopefully this gets us back to where we were when we decided to commit PKs & FKs to v17.

As before, I've left the nonempty check as a separate patch to make reviewing easier, but when 
committing I would squash it with the PK patch.

Rebased to 05faf06e9c.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Вложения

Re: SQL:2011 application time

От

Paul Jungwirth

Дата:

01 августа, 20:09:03

On 7/25/24 08:52, Paul Jungwirth wrote:
> Here is a patch moving the not-empty check into check_exclusion_or_unique_constraint. That is a more 
> logical place for it than ExecConstraints, since WITHOUT OVERLAPS is part of the index constraint 
> (not a CHECK constraint). At that point we've already looked up all the information we need. So 
> there is no extra cost for non-temporal tables, and no need to change pg_class or add to the 
> relcache. Also putting it there means we don't need any extra code to enforce non-empties when we 
> build the index or do anything else with it.
> 
> I think this is the nicest solution we can expect. It is even cleaner than the &&& ideas. So 
> hopefully this gets us back to where we were when we decided to commit PKs & FKs to v17.
> 
> As before, I've left the nonempty check as a separate patch to make reviewing easier, but when 
> committing I would squash it with the PK patch.

Hello,

Here is an updated set of patches, rebased because the old patches no longer applied.

Also I have a question about foreign key RESTRICT behavior and the SQL spec.

I added some tests for a particular condition:
there are two adjacent referenced rows (sharing a scalar key part),
and a single referencing row whose time spans the transition between the referenced rows.
So graphing the records on a timeline, they look like this:

PK:  |-----|-----|
FK:     |-----|

Now suppose you simultaneously update both referenced rows to be like so:

PK:  |---------|-|
FK:     |-----|

Note that the FK's condition is still fulfilled.

In a NO ACTION constraint, we clearly should not raise an error (and we don't).

In a RESTRICT constraint, we *do* raise an error (but maybe we shouldn't).

Here is some specific SQL (added to the tests in these patches):

-- A PK update sliding the edge between two referenced rows:
INSERT INTO temporal_rng (id, valid_at) VALUES
   ('[6,7)', daterange('2018-01-01', '2018-02-01')),
   ('[6,7)', daterange('2018-02-01', '2018-03-01'));
INSERT INTO temporal_fk_rng2rng (id, valid_at, parent_id) VALUES
   ('[4,5)', daterange('2018-01-15', '2018-02-15'), '[6,7)');
UPDATE temporal_rng
SET valid_at = CASE WHEN lower(valid_at) = '2018-01-01'
                     THEN daterange('2018-01-01', '2018-01-05')
                     WHEN lower(valid_at) = '2018-02-01'
                     THEN daterange('2018-01-05', '2018-03-01') END
WHERE id = '[6,7)';

or if you prefer PERIODs:

-- A PK update sliding the edge between two referenced rows:
INSERT INTO temporal_per (id, valid_from, valid_til) VALUES
   ('[6,7)', '2018-01-01', '2018-02-01'),
   ('[6,7)', '2018-02-01', '2018-03-01');
INSERT INTO temporal_fk_per2per (id, valid_from, valid_til, parent_id) VALUES
   ('[4,5)', '2018-01-15', '2018-02-15', '[6,7)');
UPDATE temporal_per
SET valid_from = CASE WHEN valid_from = '2018-01-01' THEN '2018-01-01'
                       WHEN valid_from = '2018-02-01' THEN '2018-01-05' END::date,
     valid_til =  CASE WHEN valid_from = '2018-01-01' THEN '2018-01-05'
                       WHEN valid_from = '2018-02-01' THEN '2018-03-01' END::date
WHERE id = '[6,7)';

Here is what the SQL:2011 spec says (section 4.18.3.3 from Part 2 Foundation):

 > ON UPDATE RESTRICT: any change to a referenced column in the referenced table is prohibited if 
there is a matching row.

So that says we should raise an error.
But it seems clearly written with only non-temporal constraints in mind.
Is it really correct in the scenario above? The reference is still valid.
Does anyone know if the text has been updated in more recent versions of the standard?

Part of me is happy the standard says this, because not raising an error is harder to implement.
Maybe a lot harder.

On the other hand, what if we have just one row in each table, and we *expand* the referenced range? 
In other words, from this:

PK:  |-----|
FK:    |-|

to this:

PK: |-------|
FK:    |-|

Should that raise an error too? Currently it does not.

But I think that is correct. As usual I go back to Date's model about "one row per millisecond".
The referenced milliseconds didn't get updated, only the unreferenced ones.
So I think what we are doing is okay.

Likewise that same principle indicates we are doing the right thing in the original case:
we did update the referenced milliseconds.
Even though we swapped in replacements, we have to raise an error.
This is no different than the non-temporal case.

So my conclusion is we are doing the right thing in all places.
But here is an opportunity for people to disagree. :-)

Rebased to f5f30c22ed.

Yours,

-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Here are some fixes based on outstanding feedback (some old some new). Details below:

On 3/25/24 17:00, jian he wrote:
 > hi.
 > minor issues I found in v33-0003.
 > there are 29 of {check_amproc_signature?.*false}
 > only one {check_amproc_signature(procform->amproc, opcintype, true}
 > is this refactoring really worth it?

I could add a separate function, for example check_amproc_retset_signature, but it would require 
duplicating almost the whole existing function, so a param seems better here.

 > We also need to refactor gistadjustmembers?

You're right, added the new support procs there.

 > +      <row>
 > +       <entry><function>intersect</function></entry>
 > +       <entry>computes intersection with <literal>FOR PORTION OF</literal>
 > +        bounds</entry>
 > +       <entry>13</entry>
 > +      </row>
 > +      <row>
 > +       <entry><function>without_portion</function></entry>
 > +       <entry>computes remaining duration(s) outside
 > +       <literal>FOR PORTION OF</literal> bounds</entry>
 > +       <entry>14</entry>
 > +      </row>
 > needs to add "(optional)".

Added.

 > +<programlisting>
 > +Datum
 > +my_range_intersect(PG_FUNCTION_ARGS)
 > +{
 > +    RangeType  *r1 = PG_GETARG_RANGE_P(0);
 > +    RangeType  *r2 = PG_GETARG_RANGE_P(1);
 > +    TypeCacheEntry *typcache;
 > +
 > +    /* Different types should be prevented by ANYRANGE matching rules */
 > +    if (RangeTypeGetOid(r1) != RangeTypeGetOid(r2))
 >                                                 elog(ERROR, "range
 > types do not match");
 > +
 > +    typcache = range_get_typcache(fcinfo, RangeTypeGetOid(r1));
 > +
 > +    PG_RETURN_RANGE_P(range_intersect_internal(typcache, r1, r2));
 > +}
 > +</programlisting>
 > the elog, ERROR indentation is wrong?

Fixed.

 > +/*
 > + * range_without_portion_internal - Sets outputs and outputn to the ranges
 > + * remaining and their count (respectively) after subtracting r2 from r1.
 > + * The array should never contain empty ranges.
 > + * The outputs will be ordered. We expect that outputs is an array of
 > + * RangeType pointers, already allocated with two slots.
 > + */
 > +void
 > +range_without_portion_internal(TypeCacheEntry *typcache, RangeType *r1,
 > +   RangeType *r2, RangeType **outputs, int *outputn)
 > the comments need to be refactored?
 > there is nothing related to "slot"?
 > not sure the "array" description is right.
 > (my understanding is compute rangetype r1 and r2, and save the result to
 > RangeType **outputs.

Changed "slots" to "elements". Everything else looks correct to me.

 > select proisstrict, proname from pg_proc where proname =
 > 'range_without_portion';
 > range_without_portion is strict.
 > but
 > select range_without_portion(NULL::int4range, int4range(11, 20,'[]'));
 > return zero rows.
 > Is this the expected behavior?

Returning zero rows is correct if the function is never called (which is what strict does).
I see other strict retset functions, e.g. json_array_elements.
That also returns zero rows if you say SELECT json_array_elements(NULL);

On 4/14/24 17:00, jian he wrote:
 > for unique index, primary key:
 > ii_ExclusionOps, ii_UniqueOps is enough to distinguish this index
 > support without overlaps,
 > we don't need another ii_HasWithoutOverlaps?
 > (i didn't test it though)

I think it is worth having something named. But also ii_Exclusion is not set in 
index_concurrently_create_copy, so inferring when we have WITHOUT OVERLAPS will not work in that case.

 > ON CONFLICT DO NOTHING
 > ON CONFLICT (id, valid_at) DO NOTHING
 > ON CONFLICT ON CONSTRAINT temporal_rng_pk DO NOTHING
 > I am confused by the test.
 > here temporal_rng only has one primary key, ON CONFLICT only deals with it.
 > I thought these three are the same thing?

They all have somewhat different code paths in infer_arbiter_indexes, and they mean different 
things. I recall when I first started dealing with empty ranges several of these test cases caught 
different bugs (as well as the DO UPDATE cases).

On 8/5/24 19:02, jian he wrote:
 > void
 > ExecWithoutOverlapsNotEmpty(Relation rel, Datum attval, Oid typtype,
 > Oid atttypid);
 >
 > should this just be a static function?
 > I am not so sure.

Changed. In a previous version I was calling this from two places, but I'm not anymore.

 > Oid typtype
 > should be
 > char typtype
 > ?

Oops, you're right! Fixed.

 >                   errmsg("new row for relation \"%s\" contains empty
 > WITHOUT OVERLAPS value",
 > we already have Form_pg_attribute via "TupleDesc tupdesc =
 > RelationGetDescr(heap);"
 > we can make the error message be:
 >                   errmsg("cannot be empty range value for WITHOUT
 > OVERLAPS column \"%s\" in relation \"%s\", colname,
 > RelationGetRelationName(rel))

Yes, it's nicer to report the column name. Changed.

 > elog(ERROR, "Got unknown type for WITHOUT OVERLAPS column: %d", atttypid);
 > people will wonder if domain over range works or not. but currently
 > not, better error message would be:
 >              elog(ERROR, "WITHOUT OVERLAPS column \"%s\" is not a range
 > or multirange type ", colname);
 > This part is unlikely to be reachable, so I don't have a strong opinion on it.

Likewise.

 > + if (!found)
 > + column = NULL;
 > this part no need?
 > because if not found, the column would be last element in ColumnDef
 > type list columns

We can later set `found` to true from inheritance (or it being a system column), and then `column` 
is set but wrong. So setting `column` to null seems generally clearer. But concretely, I use 
`column` below to give me the type (which I otherwise don't have in CREATE TABLE), so I can forbid 
types other than range and multirange.

 > also the following change also make sense:
 >
 > + if (!OidIsValid(typid) && column)
 > + typid = typenameTypeId(NULL, column->typeName);

This is because in CREATE TABLE I need to get the type from the `column` variable.

 > I am confused with this change?
 > you found out the typid,but didn't using this information, should it be
 > + if (strcmp(attname, key) == 0)
 > + {
 > + typid = attr->atttypid;
 > + found = true;
 > + break;
 > + }

Yes. Actually that is in the PERIOD patch file, but it should be in Forbid-empty-ranges. Moved.

 > so the failing error message be same for the following two cases:
 > CREATE TABLE t1 (id int4range,valid_at tsrange,b text,
 >     CONSTRAINT temporal_rng_pk PRIMARY KEY (id, b WITHOUT OVERLAPS)
 > );
 >
 > CREATE TABLE t1 (id int4range,valid_at tsrange,b text);
 > alter table t1 add CONSTRAINT temporal_rng_pk PRIMARY KEY (id, b
 > WITHOUT OVERLAPS);

I think the same error message is the right thing to do here.
It looks like that's what we're doing.
If I've misunderstand what you want, can you clarify?

On 8/6/24 07:50, jian he wrote:
 > in generateClonedIndexStmt
 > index->iswithoutoverlaps = (idxrec->indisprimary ||
 > idxrec->indisunique) && idxrec->indisexclusion;
 > this case, the index accessMethod will be "gist" only?
 >
 > do you think it's necessary to:
 > index->iswithoutoverlaps = (idxrec->indisprimary ||
 > idxrec->indisunique) && idxrec->indisexclusion
 > && strcmp(index->accessMethod, "gist") == 0);

This doesn't seem necessary, and maybe we'll support non-gist someday, when this condition would be 
misleading.

 > src/bin/pg_dump/pg_dump.c and src/bin/psql/describe.c
 > should be "if (pset.sversion >= 180000)"?

Ah, thanks. Changing these from 170000 also landed in the wrong patch file. Fixed.

 > + (This is sometimes called a
 > +      temporal key, if the column is a range of dates or timestamps, but
 > +      PostgreSQL allows ranges over any base type.)
 >
 > PostgreSQL should be decorated as
 > <productname>PostgreSQL</productname>

Done.

 > in DefineIndex we have:
 > if (stmt->unique && !stmt->iswithoutoverlaps && !amRoutine->amcanunique)
 > if (stmt->indexIncludingParams != NIL && !amRoutine->amcaninclude)
 > if (numberOfKeyAttributes > 1 && !amRoutine->amcanmulticol)
 > if (exclusion && amRoutine->amgettuple == NULL)
 >
 > maybe we can add:
 >      if (stmt->iswithoutoverlaps && strcmp(accessMethodName, "gist") != 0)
 >          ereport(ERROR,
 >                  (errcode(ERRCODE_FEATURE_NOT_SUPPORTED),
 >                   errmsg("access method \"%s\" does not support WITHOUT
 > OVERLAPS constraints",
 >                          accessMethodName)));

Okay.

 > + /* exclusionOpNames can be non-NIL if we are creating a partition */
 > + if (iswithoutoverlaps && exclusionOpNames == NIL)
 > + {
 > + indexInfo->ii_ExclusionOps = palloc_array(Oid, nkeycols);
 > + indexInfo->ii_ExclusionProcs = palloc_array(Oid, nkeycols);
 > + indexInfo->ii_ExclusionStrats = palloc_array(uint16, nkeycols);
 > + }
 > the comment is not 100% correct, i think.
 > creating a partition, "create table like INCLUDING ALL", both will go
 > through generateClonedIndexStmt.
 > generateClonedIndexStmt will produce exclusionOpNames if this index
 > supports exclusion constraint.

I think the comment is correct, but non-NIL is a confusing double negative, and it's not clear that 
the comment is giving the motivation for the second half of the condition.
I re-wrote it to be more clear. I also adjusted the `if` to avoid parsing operator names when not 
needed.

Rebased to e56ccc8e42.

Yours,


-- 
Paul              ~{:-)
pj@illuminatedcomputing.com

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Обсуждение: SQL:2011 application time

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения

Вложения