Optimize cardinality estimation when unique keys are fully covered

From: "feichanghong" <feichanghong@qq.com>
Date: Nov 22, 2025, 00:54
Hi,

Recently, I ran into a cardinality estimation problem where the query
predicates fully cover a primary key or unique index. The issue can be
reproduced with the following case:

```sql
create table t1(a int, b int, primary key(a, b));

DO $$
DECLARE
    i integer;
BEGIN
    FOR i IN 1..100000 LOOP
        INSERT INTO t1 VALUES (i, 0);
    END LOOP;
END;
$$ LANGUAGE plpgsql;


DO $$
DECLARE
    i integer;
BEGIN
    FOR j IN 1..10 LOOP
        FOR i IN 1..100000 LOOP
            INSERT INTO t1 VALUES (i%10+10*(j-1), i);
        END LOOP;
    END LOOP;
END;
$$ LANGUAGE plpgsql;



create table t2(a int);
insert into t2 select 1;

analyze t1, t2;

postgres=# explain analyze select * from t1 where a = 1 and b = 0;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=4.65..2174.00 rows=833 width=8) (actual time=0.060..0.061 rows=1.00 loops=1)
   Recheck Cond: ((a = 1) AND (b = 0))
   Heap Blocks: exact=1
   Buffers: shared hit=4
   ->  Bitmap Index Scan on t1_pkey  (cost=0.00..4.44 rows=833 width=0) (actual time=0.042..0.043 rows=1.00 loops=1)
         Index Cond: ((a = 1) AND (b = 0))
         Index Searches: 1
         Buffers: shared hit=3
 Planning Time: 0.146 ms
 Execution Time: 0.105 ms
(10 rows)

postgres=# explain analyze select * from t1, t2 where t1.a = t2.a and t1.b = 0;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.43..4.81 rows=32 width=12) (actual time=0.067..0.069 rows=1.00 loops=1)
   Buffers: shared hit=5
   ->  Seq Scan on t2  (cost=0.00..1.01 rows=1 width=4) (actual time=0.024..0.025 rows=1.00 loops=1)
         Buffers: shared hit=1
   ->  Index Only Scan using t1_pkey on t1  (cost=0.43..108.23 rows=32 width=8) (actual time=0.036..0.036 rows=1.00 loops=1)
         Index Cond: ((a = t2.a) AND (b = 0))
         Heap Fetches: 0
         Index Searches: 1
         Buffers: shared hit=4
 Planning Time: 0.257 ms
 Execution Time: 0.114 ms
(11 rows)

```

It can be observed that the Index Cond fully covers the primary key index,
so the correct rows estimate should be 1. The optimizer, however, estimates
the selectivity of the two conditions independently, resulting in an
overestimated row count. This overestimation has a greater impact with
partitioned tables and multi-table joins, where it makes the planner more
inclined to choose hash or merge joins.

One option is to check, during cardinality estimation, whether the
restrictlist fully covers a unique index; if so, we can directly set the
estimated rows to 1. The attached patch provides a very early demo of this
approach.
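
Roughly, the idea looks like the following sketch (simplified and
hypothetical; in particular, it assumes relation_has_unique_index_for()
from plancat.c can be applied to the baserestrictinfo list as-is, which is
not necessarily how the attached patch does it):

```c
/* In costsize.c (sketch only, not the attached patch) */
void
set_baserel_size_estimates(PlannerInfo *root, RelOptInfo *rel)
{
    double      nrows;

    /* Existing logic: multiply out the restriction clause selectivities */
    nrows = rel->tuples *
        clauselist_selectivity(root,
                               rel->baserestrictinfo,
                               0,
                               JOIN_INNER,
                               NULL);

    /*
     * New: if equality quals fully cover some unique index, the relation
     * can yield at most one row.  Assumes relation_has_unique_index_for()
     * accepts a baserestrictinfo list here.
     */
    if (relation_has_unique_index_for(root, rel,
                                      rel->baserestrictinfo, NIL, NIL))
        nrows = Min(nrows, 1.0);

    rel->rows = clamp_row_est(nrows);

    /* cost_qual_eval() and set_rel_width() continue as in the current code */
}
```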

Best Regards,
Fei Changhong
Alibaba Cloud Computing Ltd.

Re: Optimize cardinality estimation when unique keys are fully covered

From: feichanghong


On Nov 22, 2025, at 00:54, feichanghong <feichanghong@qq.com> wrote:

[...]

Apologies for the garbled text due to my email client; the reproduction
SQL is the same as in my previous message above.


Best Regards,
Fei Changhong

Re: Optimize cardinality estimation when unique keys are fully covered

From: Tom Lane
Date: Fri, Nov 21, 2025, 13:52:07 -0500
"=?utf-8?B?ZmVpY2hhbmdob25n?=" <feichanghong@qq.com> writes:
> We may consider checking in cardinality estimation whether the restrictlist
> fully covers a unique index. If so, we can directly set the estimated rows
> to 1. The attached patch provides a very early demo of this approach.

I think this is far harder than you believe; a simplistic approach
like this will mainly result in breaking things.  The reason is that
we need to have the same estimate of the size of a join relation
regardless of how it is implemented (and, indeed, that size estimate
is made before we ever consider individual join paths).  Otherwise the
fundamental model of generating different join paths and comparing
them for merit breaks down, either at this join level or higher ones.
But the conclusion that "where t1.a = t2.a and t1.b = 0" means that
t1's rowcount is 1 only applies if the join is implemented as an inner
indexscan.  If we choose some other method, say a hash join based on
a seqscan of t1, having forced that rowcount to 1 would produce
completely false estimates.
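
For reference, the ordering in question looks roughly like this (a
simplified sketch of make_join_rel() in joinrels.c, details elided):

```c
/* Sketch: the joinrel's size is estimated when the RelOptInfo is built */
joinrel = build_join_rel(root, joinrelids, rel1, rel2,
                         sjinfo, &restrictlist);
/*
 * build_join_rel() has already called set_joinrel_size_estimates(),
 * so joinrel->rows is settled at this point ...
 */
populate_joinrel_with_paths(root, rel1, rel2, joinrel,
                            sjinfo, restrictlist);
/*
 * ... and only now are the nestloop/merge/hash paths generated and
 * compared on cost, all against that same joinrel->rows.
 */
```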

Now, what you are proposing would make the EXPLAIN output cosmetically
better, in that instead of

 Nested Loop  (cost=0.43..1.57 rows=32 width=12)
   ->  Seq Scan on t2  (cost=0.00..1.01 rows=1 width=4)
   ->  Index Only Scan using t1_pkey on t1  (cost=0.43..4.76 rows=32 width=8)
         Index Cond: ((a = t2.a) AND (b = 0))

we would get a rowcount estimate of "1" for the parameterized t1 scan.
But the estimate for the output of the nestloop would have to remain
the same.  And the t1 scan is not where the problem is: it already
made the right choice there.  To the extent that this EXPLAIN output
is problematic, it's because if this join is part of a bigger query
then we may make poor choices at upper join levels due to having a bad
estimate of this join's output size.  I don't see how this line of
work can fix that.  (Yes, I see the hack you put into
set_joinrel_size_estimates.  It's a hack not a workable solution,
because it will distort the size estimates in many cases that are
not quite what you have here.)

            regards, tom lane



Re: Optimize cardinality estimation when unique keys are fully covered

From: Nico Williams
On Fri, Nov 21, 2025 at 01:52:07PM -0500, Tom Lane wrote:
> But the conclusion that "where t1.a = t2.a and t1.b = 0" means that
> t1's rowcount is 1 only applies if the join is implemented as an inner
> indexscan.  If we choose some other method, say a hash join based on
> a seqscan of t1, having forced that rowcount to 1 would produce
> completely false estimates.

But wouldn't knowing that the cardinality is one for an inner indexscan
then drive the optimizer to choose the inner indexscan over other methods?

Nico



Re: Optimize cardinality estimation when unique keys are fully covered

From: feichanghong


On Nov 22, 2025, at 02:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"feichanghong" <feichanghong@qq.com> writes:
We may consider checking in cardinality estimation whether the restrictlist
fully covers a unique index. If so, we can directly set the estimated rows
to 1. The attached patch provides a very early demo of this approach.

I think this is far harder than you believe; a simplistic approach
like this will mainly result in breaking things.  The reason is that
we need to have the same estimate of the size of a join relation
regardless of how it is implemented (and, indeed, that size estimate
is made before we ever consider individual join paths).  Otherwise the
fundamental model of generating different join paths and comparing
them for merit breaks down, either at this join level or higher ones.
But the conclusion that "where t1.a = t2.a and t1.b = 0" means that
t1's rowcount is 1 only applies if the join is implemented as an inner
indexscan.  If we choose some other method, say a hash join based on
a seqscan of t1, having forced that rowcount to 1 would produce
completely false estimates.

It should be noted that the row estimate for a base relation will only be
adjusted to "1" when the baserestrictinfo fully covers a unique index,
for example "WHERE a = 1 AND b = 0". See set_baserel_size_estimates for
details. In contrast, "WHERE t1.a = t2.a AND t1.b = 0" will only affect
the ppi_rows of a parameterized path; see get_parameterized_baserel_size
for details.
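
Concretely, the parameterized-path adjustment might look like this (a
hypothetical sketch in the spirit of the demo, not the patch itself; the
relation_has_unique_index_for() call is an assumption):

```c
/* In costsize.c (sketch only) */
double
get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
                               List *param_clauses)
{
    List       *allclauses;
    double      nrows;

    /* Existing logic: treat pushed-down plus local clauses as filters */
    allclauses = list_concat_copy(param_clauses, rel->baserestrictinfo);
    nrows = rel->tuples *
        clauselist_selectivity(root, allclauses, rel->relid,
                               JOIN_INNER, NULL);

    /*
     * New: "t1.a = t2.a AND t1.b = 0" covers t1's primary key, so there
     * is at most one row per outer row; this affects only the ppi_rows
     * of the parameterized path, not the shared baserel estimate.
     */
    if (relation_has_unique_index_for(root, rel, allclauses, NIL, NIL))
        nrows = Min(nrows, 1.0);

    return clamp_row_est(nrows);
}
```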

In addition, for indexscan and bitmapscan, I also adjusted the
indexSelectivity in genericcostestimate.

For example, in the following case, the baserestrictinfo in the first SQL
covers the primary key, so the scan gets "rows=1". In the second SQL it
does not, so the seq scan on t1 is estimated at "rows=98045" (note that the
join's output estimate is still adjusted, through the parameterized path,
to "rows=1").

```sql
postgres=# explain analyze select * from t1 where a = 1 and b = 0;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..21367.77 rows=1 width=8) (actual time=0.027..260.106 rows=1.00 loops=1)
   Filter: ((a = 1) AND (b = 0))
   Rows Removed by Filter: 1099999
   Buffers: shared hit=4868
 Planning Time: 0.111 ms
 Execution Time: 260.136 ms
(6 rows)

postgres=# set enable_indexscan to off;
SET
postgres=# set enable_bitmapscan to off;
SET
postgres=# set max_parallel_workers_per_gather to 0;
SET
postgres=# explain analyze select * from t1, t2 where t1.a = t2.a and t1.b = 0;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.02..18986.82 rows=1 width=12) (actual time=0.042..280.086 rows=1.00 loops=1)
   Hash Cond: (t1.a = t2.a)
   Buffers: shared hit=4869
   ->  Seq Scan on t1  (cost=0.00..18617.81 rows=98045 width=8) (actual time=0.025..264.100 rows=100000.00 loops=1)
         Filter: (b = 0)
         Rows Removed by Filter: 1000000
         Buffers: shared hit=4868
   ->  Hash  (cost=1.01..1.01 rows=1 width=4) (actual time=0.010..0.012 rows=1.00 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         Buffers: shared hit=1
         ->  Seq Scan on t2  (cost=0.00..1.01 rows=1 width=4) (actual time=0.006..0.007 rows=1.00 loops=1)
               Buffers: shared hit=1
 Planning Time: 0.210 ms
 Execution Time: 280.137 ms
(14 rows)
```

> Now, what you are proposing would make the EXPLAIN output cosmetically
> better, in that instead of
>
>  Nested Loop  (cost=0.43..1.57 rows=32 width=12)
>    ->  Seq Scan on t2  (cost=0.00..1.01 rows=1 width=4)
>    ->  Index Only Scan using t1_pkey on t1  (cost=0.43..4.76 rows=32 width=8)
>          Index Cond: ((a = t2.a) AND (b = 0))
>
> we would get a rowcount estimate of "1" for the parameterized t1 scan.
> But the estimate for the output of the nestloop would have to remain
> the same.  And the t1 scan is not where the problem is: it already
> made the right choice there.  To the extent that this EXPLAIN output
> is problematic, it's because if this join is part of a bigger query
> then we may make poor choices at upper join levels due to having a bad
> estimate of this join's output size.  I don't see how this line of
> work can fix that.  (Yes, I see the hack you put into
> set_joinrel_size_estimates.  It's a hack not a workable solution,
> because it will distort the size estimates in many cases that are
> not quite what you have here.)

For join cardinality estimation we still follow the principle that the row
estimate must remain consistent regardless of which path is chosen; it is
still set in set_joinrel_size_estimates. The only adjustment is this: if we
know that pushing some join clauses down to a lower-level node yields a
more accurate row estimate (for example, the clause "t1.a = t2.a" in
"t1.a = t2.a AND t1.b = 0"), then we use the lower node's post-pushdown row
estimate and remove that join clause when computing the join selectivity,
so it is not counted twice.
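
In rough pseudocode (hypothetical names such as pushed_clauses and
inner_rows; not the demo patch verbatim):

```c
/*
 * Sketch of the join-size adjustment described above.  If the clauses
 * pushed down to the inner rel, together with its own restrictions,
 * cover a unique index, use that one-row estimate and drop the clause
 * from the join selectivity so it is not counted twice.
 */
if (relation_has_unique_index_for(root, inner_rel,
                                  list_concat_copy(pushed_clauses,
                                                   inner_rel->baserestrictinfo),
                                  NIL, NIL))
{
    inner_rows = 1.0;   /* inner rel's rows after clause pushdown */
    restrictlist = list_difference(restrictlist, pushed_clauses);
}

nrows = outer_rel->rows * inner_rows *
    clauselist_selectivity(root, restrictlist, 0, JOIN_INNER, sjinfo);
nrows = clamp_row_est(nrows);
```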

Of course, this is just a demo of an initial idea, and there are many
aspects that need improvement. Many thanks for your suggestions and
comments.

Best Regards,
Fei Changhong