Обсуждение: Re: using index to speedup add not null constraints to a table

Поиск
Список
Период
Сортировка

Re: using index to speedup add not null constraints to a table

От
Álvaro Herrera
Дата:
Can you please rebase this?  It stopped applying a week ago.

Thanks!

-- 
Álvaro Herrera        Breisgau, Deutschland  —  https://www.EnterpriseDB.com/
"Sallah, I said NO camels! That's FIVE camels; can't you count?"
(Indiana Jones)



Re: using index to speedup add not null constraints to a table

От
jian he
Дата:
On Fri, Oct 17, 2025 at 3:57 AM Álvaro Herrera <alvherre@kurilemu.de> wrote:
> Can you please rebase this?  It stopped applying a week ago.
>

hi.

rebase and minor polishment.

Вложения

Re: using index to speedup add not null constraints to a table

От
Álvaro Herrera
Дата:
Hello

I rebased this patch to review it.  Overall, I think this is a great
idea.  Here are some comments on the patch, which I hope are not
anything major.

I'm not really sure that I agree with the way ATRewriteTable works now.
It seems that we scan the list of columns, and verify each one for nulls,
but abort midway if a column is generated.  Surely we should check for
generated columns first, to avoid wasting time verifying earlier
columns in case a later column is generated?

That said, we do check for notnull_virtual_attrs to be NIL.  It seems to
me that this implies that the checked columns are not generated.  In
other words, the test for whether columns are generated is unnecessary
and confusing, and in which case you don't have to change anything, just
remove the "if (attr->attgenerated == ATTRIBUTE_GENERATED_VIRTUAL)"
inner block.

However, if we do this, then I think computing notnull_attrs is
pointless.  So we should only change the order: do this new check first,
and if we find that any new not-null column is on a generated column,
then we compute both notnull_virtual_attrs and notnull_attrs.  No?


The separation of labor between index_check_notnull() and
index_check_notnull_internal() seems a bit pointless.  Why not have a
single routine that does both things?  Though, to be honest, it does
look like the former should live in tablecmds.c instead of
execIndexing.c.  (But if you do split things that way, then you need to
make index_check_notnull_internal extern).

The tests look a bit excessive.  Why do we need an isolation test for
this?  And it looks to me like the other test could be in
src/test/regress rather than be a TAP test.  After all, that's what you
have the ereport(DEBUG1) there, isn't it?

"veritify" doesn't exist.  The word is "verify".

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
Syntax error: function hell() needs an argument.
Please choose what hell you want to involve.

Вложения

Re: using index to speedup add not null constraints to a table

От
Álvaro Herrera
Дата:
On 2026-Jan-13, Álvaro Herrera wrote:

> However, if we do this, then I think computing notnull_attrs is
> pointless.  So we should only change the order: do this new check
> first, and if we find that any new not-null column is on a generated
> column, then we compute both notnull_virtual_attrs and notnull_attrs.
> No?

Oh, another thing we should do is have a first pass that verifies
whether all columns have an appropriate index, without scanning any of
them; only if we verify that they all have one (and no generated column
is involved) then we start scanning the indexes.  Otherwise we waste
time scanning one index and verify that it contains no null values, only
to realize that the next column does not have an appropriate index to
use, and thus we must scan the table.  Then the first index scan is
wasted work.

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/
"The saddest aspect of life right now is that science gathers knowledge faster
 than society gathers wisdom."  (Isaac Asimov)



Re: using index to speedup add not null constraints to a table

От
jian he
Дата:
hi.

---------------------
create unlogged table t2_copy(col1 int, col2 int, col3 int, col4 int,
col5 int) with (autovacuum_enabled = off, vacuum_index_cleanup=off);
insert into t2_copy select g, g, g, g,g from generate_series(1, 10_000_000) g;
set enable_seqscan to off;
set enable_bitmapscan to off;
set max_parallel_workers=0;
set max_parallel_workers_per_gather=0;

create index t2_copy_idx on t2_copy(col1, col2, col3, col4, col5);
explain (costs off, analyze) select from t2_copy where col1 is NULL or
col2 is NULL or col3 is NULL or col4 is null or col5 is null \watch
i=0.1 c=10

drop index t2_copy_idx;
create index t2_copy_idx1 on t2_copy(col1);
create index t2_copy_idx2 on t2_copy(col2);
create index t2_copy_idx3 on t2_copy(col3);
create index t2_copy_idx4 on t2_copy(col4);
create index t2_copy_idx5 on t2_copy(col5);
explain (costs off, analyze) select from t2_copy where col1 is NULL or
col2 is NULL or col3 is NULL or col4 is null or col5 is null \watch
i=0.1 c=10
------------------------------------------
By comparing the above two EXPLAIN, I found out that there will be regression
for using one multiple column indexes fast verify mutiple not-null constraints.

create unlogged table t3_copy(col1 int, col2 int, col3 int, col4 int);
create index t3_copy_idx on t3_copy(col1, col2, col3, col4);
alter table t3_copy add not null col1, add not null col2, add not null
col3, add not null col4;

In this case, scanning the t3_copy_idx index to check these four NOT NULL
constraints is actually slower than a full table scan.

Therefore, I further restricted the use of the index-scan mechanism for fast NOT
NULL verification to scenarios where the attribute being checked is the leading
column of the index.

So, for the above example, to let

alter table t3_copy add not null col1, add not null col2, add not null
col3, add not null col4;

utilize indexscan mechanism, we require four indexes, each index leading column
is corresponding to the not-null constraint attribute.

I have not yet addressed all of your other comments, as this
represents a more of a
major change. Therefore, I am submitting the attached patch first.

index_check_notnull function, in v8 it is:

+ /* collect index attnums while loop */
+ for (int i = 0; i < index->indnkeyatts; i++)
+ {
+ attr = TupleDescAttr(tupdesc, (index->indkey.values[i] - 1));
+
+
+ }

in v9, it's:
+ /* collect index attnums while loop */
+ attr = TupleDescAttr(tupdesc, (index->indkey.values[0] - 1));
+

I am not sure that regression tests would be helpful, since the test hinges on
whether ereport(DEBUG1, ...) is reached.

I am not sure about the concurrency implications; therefore, isolation tests are
attached.  maybe it's not necessary.



--
jian
https://www.enterprisedb.com

Вложения