Обсуждение: UNIQUE null treatment option
The SQL standard has been ambiguous about whether null values in unique constraints should be considered equal or not. Different implementations have different behaviors. In the SQL:202x draft, this has been formalized by making this implementation-defined and adding an option on unique constraint definitions UNIQUE [ NULLS [NOT] DISTINCT ] to choose a behavior explicitly. This patch adds this option to PostgreSQL. The default behavior remains UNIQUE NULLS DISTINCT. Making this happen in the btree code is apparently pretty easy; most of the patch is just to carry the flag around to all the places that need it. The CREATE UNIQUE INDEX syntax extension is not from the standard, it's my own invention. (I named all the internal flags, catalog columns, etc. in the negative ("nulls not distinct") so that the default PostgreSQL behavior is the default if the flag is false. But perhaps the double negatives make some code harder to read.)
Вложения
On Fri, Aug 27, 2021 at 3:38 PM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote:
In the SQL:202x draft, this
has been formalized by making this implementation-defined and adding
an option on unique constraint definitions UNIQUE [ NULLS [NOT]
DISTINCT ] to choose a behavior explicitly.
The CREATE UNIQUE INDEX syntax extension is not from the standard,
it's my own invention.
For the unique index syntax, should this be selectable per column/expression, rather than for the entire index as a whole?
.m
On 27.08.21 14:44, Marko Tiikkaja wrote: > On Fri, Aug 27, 2021 at 3:38 PM Peter Eisentraut > <peter.eisentraut@enterprisedb.com > <mailto:peter.eisentraut@enterprisedb.com>> wrote: > > In the SQL:202x draft, this > has been formalized by making this implementation-defined and adding > an option on unique constraint definitions UNIQUE [ NULLS [NOT] > DISTINCT ] to choose a behavior explicitly. > > The CREATE UNIQUE INDEX syntax extension is not from the standard, > it's my own invention. > > > For the unique index syntax, should this be selectable per > column/expression, rather than for the entire index as a whole? Semantically, this would be possible, but the bookkeeping to make it work seems out of proportion with the utility. And you'd have the unique index syntax out of sync with the unique constraint syntax, which would be pretty confusing.
Here is a rebased version of this patch. On 27.08.21 14:38, Peter Eisentraut wrote: > The SQL standard has been ambiguous about whether null values in > unique constraints should be considered equal or not. Different > implementations have different behaviors. In the SQL:202x draft, this > has been formalized by making this implementation-defined and adding > an option on unique constraint definitions UNIQUE [ NULLS [NOT] > DISTINCT ] to choose a behavior explicitly. > > This patch adds this option to PostgreSQL. The default behavior > remains UNIQUE NULLS DISTINCT. Making this happen in the btree code > is apparently pretty easy; most of the patch is just to carry the flag > around to all the places that need it. > > The CREATE UNIQUE INDEX syntax extension is not from the standard, > it's my own invention. > > (I named all the internal flags, catalog columns, etc. in the > negative ("nulls not distinct") so that the default PostgreSQL > behavior is the default if the flag is false. But perhaps the double > negatives make some code harder to read.)
Вложения
Hi,
bool isunique;
+ bool nulls_not_distinct;
} BTSpool;
+ bool nulls_not_distinct;
} BTSpool;
Looking at the other fields in BTSpool, there is no underscore in field name.
I think the new field can be named nullsdistinct. This way, the double negative is avoided.
Similar comment for new fields in BTShared and BTLeader.
And the naming would be consistent with information_schema.sql where nulls_distinct is used:
+ CAST('YES' AS yes_or_no) AS enforced,
+ CAST(NULL AS yes_or_no) AS nulls_distinct
+ CAST(NULL AS yes_or_no) AS nulls_distinct
Cheers
+1 for commiting this feature. Consider this useful.
I find this patch useful. It includes changes in documentation and tests. Code itself looks reasonable to me. Since, unique constraint check is done by corresponding btree index, it makes this feature implementation elegant and lightweight.
In my view, it is sufficient that heap relation can have different nulls treatment in unique constraints for different unique columns. For example:
CREATE TABLE t (i INT UNIQUE NULLS DISTINCT, a INT UNIQUE NULLS NOT DISTINCT);
All the tests are running ok on Linux and MacOS X.
Although, patch doesn't apply with default git apply options. Only with the "three way merge" option (-3). Consider rebasing it, please. Then, in my view, it can be "Ready for committer".
-- Best regards,
Maxim Orlov.
On Wed, Dec 29, 2021 at 2:06 AM Peter Eisentraut <peter.eisentraut@enterprisedb.com> wrote: > Here is a rebased version of this patch. BTScanInsertData.anynullkeys already effectively means "if the index is a unique index, then we don't actually need to go through _bt_check_unique(), or perform any other checkingunique steps". This is really an instruction about what to do (or not do), based on the specifics of the values for the insertion scan key plus the index definition. In other words, the code in _bt_mkscankey() that sets up BTScanInsertData (an insertion scankey) was written with the exact requirements of btinsert() in mind -- nothing more. I wonder if the logic for setting BTScanInsertData.anynullkeys inside _bt_mkscankey() is the place to put your test for rel->rd_index->indnullsnotdistinct -- not inside _bt_doinsert(). That would probably necessitate renaming anynullkeys, but that's okay. This feels more natural to me because a NULL key column in a NULLS NOT DISTINCT unique constraint is very similar to a NULL non-key column in an INCLUDE index, as far as our requirements go -- and so both cases should probably be dealt with at the same point. -- Peter Geoghegan
On Thu, Jan 13, 2022 at 10:36 AM Peter Geoghegan <pg@bowt.ie> wrote: > I wonder if the logic for setting BTScanInsertData.anynullkeys inside > _bt_mkscankey() is the place to put your test for > rel->rd_index->indnullsnotdistinct -- not inside _bt_doinsert(). That > would probably necessitate renaming anynullkeys, but that's okay. This > feels more natural to me because a NULL key column in a NULLS NOT > DISTINCT unique constraint is very similar to a NULL non-key column in > an INCLUDE index, as far as our requirements go -- and so both cases > should probably be dealt with at the same point. Correction: I meant to write "...a NULL key column in a NULLS DISTINCT unique constraint is very similar...". -- Peter Geoghegan
I wonder if the logic for setting BTScanInsertData.anynullkeys inside
_bt_mkscankey() is the place to put your test for
rel->rd_index->indnullsnotdistinct -- not inside _bt_doinsert(). That
would probably necessitate renaming anynullkeys, but that's okay. This
feels more natural to me because a NULL key column in a NULLS NOT
DISTINCT unique constraint is very similar to a NULL non-key column in
an INCLUDE index, as far as our requirements go -- and so both cases
should probably be dealt with at the same point.
A good point, indeed!
On 13.01.22 19:36, Peter Geoghegan wrote: > I wonder if the logic for setting BTScanInsertData.anynullkeys inside > _bt_mkscankey() is the place to put your test for > rel->rd_index->indnullsnotdistinct -- not inside _bt_doinsert(). That > would probably necessitate renaming anynullkeys, but that's okay. This > feels more natural to me because a NULL key column in a NULLS NOT > DISTINCT unique constraint is very similar to a NULL non-key column in > an INCLUDE index, as far as our requirements go -- and so both cases > should probably be dealt with at the same point. Makes sense. Here is an updated patch with this change. I didn't end up renaming anynullkeys. I came up with names like "anyalwaysdistinctkeys", but in the end that felt too abstract, and moreover, it would require rewriting a bunch of code comments that refer to null values in this context. Since as you wrote, anynullkeys is just a local concern between two functions, this slight inaccuracy is perhaps better than some highly general but unclear terminology.
Вложения
Since cfbot did failed with error, probably, unrelated to the patch itself (see https://cirrus-ci.com/task/5330150500859904)
and repeated check did not start automatically, I reattach patch v3 to restart cfbot on this patch.
Best regards,
Maxim Orlov.
Вложения
Makes sense. Here is an updated patch with this change.
I didn't end up renaming anynullkeys. I came up with names like
"anyalwaysdistinctkeys", but in the end that felt too abstract, and
moreover, it would require rewriting a bunch of code comments that refer
to null values in this context. Since as you wrote, anynullkeys is just
a local concern between two functions, this slight inaccuracy is perhaps
better than some highly general but unclear terminology.
Agree with that. With the comment it is clear how it works.
I've looked at the patch v3. It seems good enough for me. CFbot tests have also come green.
Suggest it is RFC now.
On 28.01.22 13:56, Pavel Borisov wrote: > Makes sense. Here is an updated patch with this change. > > I didn't end up renaming anynullkeys. I came up with names like > "anyalwaysdistinctkeys", but in the end that felt too abstract, and > moreover, it would require rewriting a bunch of code comments that > refer > to null values in this context. Since as you wrote, anynullkeys is > just > a local concern between two functions, this slight inaccuracy is > perhaps > better than some highly general but unclear terminology. > > Agree with that. With the comment it is clear how it works. > > I've looked at the patch v3. It seems good enough for me. CFbot tests > have also come green. > Suggest it is RFC now. Committed. Thanks.