Optimize cardinality estimation when unique keys are fully covered

From: "feichanghong" <feichanghong@qq.com>
Date: Nov 22, 2025, 00:54
Hi,

Recently, I ran into a cardinality estimation problem where the query
predicates fully cover a primary key or unique index. The issue can be
reproduced with the following case:

```sql
create table t1(a int, b int, primary key(a, b));

DO $$
DECLARE
    i integer;
BEGIN
    FOR i IN 1..100000 LOOP
        INSERT INTO t1 VALUES (i, 0);
    END LOOP;
END;
$$ LANGUAGE plpgsql;


DO $$
DECLARE
    i integer;
BEGIN
    FOR j IN 1..10 LOOP
        FOR i IN 1..100000 LOOP
            INSERT INTO t1 VALUES (i%10+10*(j-1), i);
        END LOOP;
    END LOOP;
END;
$$ LANGUAGE plpgsql;



create table t2(a int);
insert into t2 select 1;

analyze t1, t2;

postgres=# explain analyze select * from t1 where a = 1 and b = 0;
                                                     QUERY PLAN
---------------------------------------------------------------------------------------------------------------------
 Bitmap Heap Scan on t1  (cost=4.65..2174.00 rows=833 width=8) (actual time=0.060..0.061 rows=1.00 loops=1)
   Recheck Cond: ((a = 1) AND (b = 0))
   Heap Blocks: exact=1
   Buffers: shared hit=4
   ->  Bitmap Index Scan on t1_pkey  (cost=0.00..4.44 rows=833 width=0) (actual time=0.042..0.043 rows=1.00 loops=1)
         Index Cond: ((a = 1) AND (b = 0))
         Index Searches: 1
         Buffers: shared hit=3
 Planning Time: 0.146 ms
 Execution Time: 0.105 ms
(10 rows)

postgres=# explain analyze select * from t1, t2 where t1.a = t2.a and t1.b = 0;
                                                         QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.43..4.81 rows=32 width=12) (actual time=0.067..0.069 rows=1.00 loops=1)
   Buffers: shared hit=5
   ->  Seq Scan on t2  (cost=0.00..1.01 rows=1 width=4) (actual time=0.024..0.025 rows=1.00 loops=1)
         Buffers: shared hit=1
   ->  Index Only Scan using t1_pkey on t1  (cost=0.43..108.23 rows=32 width=8) (actual time=0.036..0.036 rows=1.00 loops=1)
         Index Cond: ((a = t2.a) AND (b = 0))
         Heap Fetches: 0
         Index Searches: 1
         Buffers: shared hit=4
 Planning Time: 0.257 ms
 Execution Time: 0.114 ms
(11 rows)

```

It can be observed that the Index Cond fully covers the primary key index,
so the correct rows estimate should be 1. The optimizer, however, estimates
the selectivity of the two conditions independently, resulting in an
overestimated row count. This overestimation has a greater impact with
partitioned tables and multi-table joins, where it makes the planner more
inclined to choose hash or merge joins.

One option is to check, during cardinality estimation, whether the
restrictlist fully covers a unique index; if so, we can directly set the
estimated rows to 1. The attached patch provides a very early demo of this
approach.
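
Roughly, the idea looks like the following sketch (simplified and
hypothetical; in particular, it assumes relation_has_unique_index_for()
from plancat.c can be applied to the baserestrictinfo list as-is, which is
not necessarily how the attached patch does it):

```c
/* In costsize.c (sketch only, not the attached patch) */
void
set_baserel_size_estimates(PlannerInfo *root, RelOptInfo *rel)
{
    double      nrows;

    /* Existing logic: multiply out the restriction clause selectivities */
    nrows = rel->tuples *
        clauselist_selectivity(root,
                               rel->baserestrictinfo,
                               0,
                               JOIN_INNER,
                               NULL);

    /*
     * New: if equality quals fully cover some unique index, the relation
     * can yield at most one row.  Assumes relation_has_unique_index_for()
     * accepts a baserestrictinfo list here.
     */
    if (relation_has_unique_index_for(root, rel,
                                      rel->baserestrictinfo, NIL, NIL))
        nrows = Min(nrows, 1.0);

    rel->rows = clamp_row_est(nrows);

    /* cost_qual_eval() and set_rel_width() continue as in the current code */
}
```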

Best Regards,
Fei Changhong
Alibaba Cloud Computing Ltd.

Re: Optimize cardinality estimation when unique keys are fully covered

From: feichanghong


On Nov 22, 2025, at 00:54, feichanghong <feichanghong@qq.com> wrote:

[...]

Apologies for the garbled text due to my email client; the reproduction
SQL is the same as in my previous message above.


Best Regards,
Fei Changhong

Re: Optimize cardinality estimation when unique keys are fully covered

From: Tom Lane
Date: Fri, Nov 21, 2025, 13:52:07 -0500
"=?utf-8?B?ZmVpY2hhbmdob25n?=" <feichanghong@qq.com> writes:
> We may consider checking in cardinality estimation whether the restrictlist
> fully covers a unique index. If so, we can directly set the estimated rows
> to 1. The attached patch provides a very early demo of this approach.

I think this is far harder than you believe; a simplistic approach
like this will mainly result in breaking things.  The reason is that
we need to have the same estimate of the size of a join relation
regardless of how it is implemented (and, indeed, that size estimate
is made before we ever consider individual join paths).  Otherwise the
fundamental model of generating different join paths and comparing
them for merit breaks down, either at this join level or higher ones.
But the conclusion that "where t1.a = t2.a and t1.b = 0" means that
t1's rowcount is 1 only applies if the join is implemented as an inner
indexscan.  If we choose some other method, say a hash join based on
a seqscan of t1, having forced that rowcount to 1 would produce
completely false estimates.
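
For reference, the ordering in question looks roughly like this (a
simplified sketch of make_join_rel() in joinrels.c, details elided):

```c
/* Sketch: the joinrel's size is estimated when the RelOptInfo is built */
joinrel = build_join_rel(root, joinrelids, rel1, rel2,
                         sjinfo, &restrictlist);
/*
 * build_join_rel() has already called set_joinrel_size_estimates(),
 * so joinrel->rows is settled at this point ...
 */
populate_joinrel_with_paths(root, rel1, rel2, joinrel,
                            sjinfo, restrictlist);
/*
 * ... and only now are the nestloop/merge/hash paths generated and
 * compared on cost, all against that same joinrel->rows.
 */
```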

Now, what you are proposing would make the EXPLAIN output cosmetically
better, in that instead of

 Nested Loop  (cost=0.43..1.57 rows=32 width=12)
   ->  Seq Scan on t2  (cost=0.00..1.01 rows=1 width=4)
   ->  Index Only Scan using t1_pkey on t1  (cost=0.43..4.76 rows=32 width=8)
         Index Cond: ((a = t2.a) AND (b = 0))

we would get a rowcount estimate of "1" for the parameterized t1 scan.
But the estimate for the output of the nestloop would have to remain
the same.  And the t1 scan is not where the problem is: it already
made the right choice there.  To the extent that this EXPLAIN output
is problematic, it's because if this join is part of a bigger query
then we may make poor choices at upper join levels due to having a bad
estimate of this join's output size.  I don't see how this line of
work can fix that.  (Yes, I see the hack you put into
set_joinrel_size_estimates.  It's a hack not a workable solution,
because it will distort the size estimates in many cases that are
not quite what you have here.)

            regards, tom lane



Re: Optimize cardinality estimation when unique keys are fully covered

From: Nico Williams
On Fri, Nov 21, 2025 at 01:52:07PM -0500, Tom Lane wrote:
> But the conclusion that "where t1.a = t2.a and t1.b = 0" means that
> t1's rowcount is 1 only applies if the join is implemented as an inner
> indexscan.  If we choose some other method, say a hash join based on
> a seqscan of t1, having forced that rowcount to 1 would produce
> completely false estimates.

But wouldn't knowing that the cardinality is one for an inner indexscan
then drive the optimizer to choose the inner indexscan over other methods?

Nico



Re: Optimize cardinality estimation when unique keys are fully covered

From: feichanghong


On Nov 22, 2025, at 02:52, Tom Lane <tgl@sss.pgh.pa.us> wrote:

"feichanghong" <feichanghong@qq.com> writes:
We may consider checking in cardinality estimation whether the restrictlist
fully covers a unique index. If so, we can directly set the estimated rows
to 1. The attached patch provides a very early demo of this approach.

I think this is far harder than you believe; a simplistic approach
like this will mainly result in breaking things.  The reason is that
we need to have the same estimate of the size of a join relation
regardless of how it is implemented (and, indeed, that size estimate
is made before we ever consider individual join paths).  Otherwise the
fundamental model of generating different join paths and comparing
them for merit breaks down, either at this join level or higher ones.
But the conclusion that "where t1.a = t2.a and t1.b = 0" means that
t1's rowcount is 1 only applies if the join is implemented as an inner
indexscan.  If we choose some other method, say a hash join based on
a seqscan of t1, having forced that rowcount to 1 would produce
completely false estimates.

It should be noted that the row estimate for a base relation will only be
adjusted to "1" when the baserestrictinfo fully covers a unique index,
for example "WHERE a = 1 AND b = 0". See set_baserel_size_estimates for
details. In contrast, "WHERE t1.a = t2.a AND t1.b = 0" will only affect
the ppi_rows of a parameterized path; see get_parameterized_baserel_size
for details.
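
Concretely, the parameterized-path adjustment might look like this (a
hypothetical sketch in the spirit of the demo, not the patch itself; the
relation_has_unique_index_for() call is an assumption):

```c
/* In costsize.c (sketch only) */
double
get_parameterized_baserel_size(PlannerInfo *root, RelOptInfo *rel,
                               List *param_clauses)
{
    List       *allclauses;
    double      nrows;

    /* Existing logic: treat pushed-down plus local clauses as filters */
    allclauses = list_concat_copy(param_clauses, rel->baserestrictinfo);
    nrows = rel->tuples *
        clauselist_selectivity(root, allclauses, rel->relid,
                               JOIN_INNER, NULL);

    /*
     * New: "t1.a = t2.a AND t1.b = 0" covers t1's primary key, so there
     * is at most one row per outer row; this affects only the ppi_rows
     * of the parameterized path, not the shared baserel estimate.
     */
    if (relation_has_unique_index_for(root, rel, allclauses, NIL, NIL))
        nrows = Min(nrows, 1.0);

    return clamp_row_est(nrows);
}
```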

In addition, for indexscan and bitmapscan, I also adjusted the
indexSelectivity in genericcostestimate.

For example, in the following case, the baserestrictinfo in the first SQL
covers the primary key, so the scan gets "rows=1". In the second SQL it
does not, so the seq scan on t1 is estimated at "rows=98045" (note that the
join's output estimate is still adjusted, through the parameterized path,
to "rows=1").

```sql
postgres=# explain analyze select * from t1 where a = 1 and b = 0;
                                             QUERY PLAN
-----------------------------------------------------------------------------------------------------
 Seq Scan on t1  (cost=0.00..21367.77 rows=1 width=8) (actual time=0.027..260.106 rows=1.00 loops=1)
   Filter: ((a = 1) AND (b = 0))
   Rows Removed by Filter: 1099999
   Buffers: shared hit=4868
 Planning Time: 0.111 ms
 Execution Time: 260.136 ms
(6 rows)

postgres=# set enable_indexscan to off;
SET
postgres=# set enable_bitmapscan to off;
SET
postgres=# set max_parallel_workers_per_gather to 0;
SET
postgres=# explain analyze select * from t1, t2 where t1.a = t2.a and t1.b = 0;
                                                     QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1.02..18986.82 rows=1 width=12) (actual time=0.042..280.086 rows=1.00 loops=1)
   Hash Cond: (t1.a = t2.a)
   Buffers: shared hit=4869
   ->  Seq Scan on t1  (cost=0.00..18617.81 rows=98045 width=8) (actual time=0.025..264.100 rows=100000.00 loops=1)
         Filter: (b = 0)
         Rows Removed by Filter: 1000000
         Buffers: shared hit=4868
   ->  Hash  (cost=1.01..1.01 rows=1 width=4) (actual time=0.010..0.012 rows=1.00 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 9kB
         Buffers: shared hit=1
         ->  Seq Scan on t2  (cost=0.00..1.01 rows=1 width=4) (actual time=0.006..0.007 rows=1.00 loops=1)
               Buffers: shared hit=1
 Planning Time: 0.210 ms
 Execution Time: 280.137 ms
(14 rows)
```

> Now, what you are proposing would make the EXPLAIN output cosmetically
> better, in that instead of
>
>  Nested Loop  (cost=0.43..1.57 rows=32 width=12)
>    ->  Seq Scan on t2  (cost=0.00..1.01 rows=1 width=4)
>    ->  Index Only Scan using t1_pkey on t1  (cost=0.43..4.76 rows=32 width=8)
>          Index Cond: ((a = t2.a) AND (b = 0))
>
> we would get a rowcount estimate of "1" for the parameterized t1 scan.
> But the estimate for the output of the nestloop would have to remain
> the same.  And the t1 scan is not where the problem is: it already
> made the right choice there.  To the extent that this EXPLAIN output
> is problematic, it's because if this join is part of a bigger query
> then we may make poor choices at upper join levels due to having a bad
> estimate of this join's output size.  I don't see how this line of
> work can fix that.  (Yes, I see the hack you put into
> set_joinrel_size_estimates.  It's a hack not a workable solution,
> because it will distort the size estimates in many cases that are
> not quite what you have here.)

For join cardinality estimation we still follow the principle that the row
estimate must remain consistent regardless of which path is chosen; it is
still set in set_joinrel_size_estimates. The only adjustment is this: if we
know that pushing some join clauses down to a lower-level node yields a
more accurate row estimate (for example, the clause "t1.a = t2.a" in
"t1.a = t2.a AND t1.b = 0"), then we use the lower node's post-pushdown row
estimate and remove that join clause when computing the join selectivity,
so it is not counted twice.
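
In rough pseudocode (hypothetical names such as pushed_clauses and
inner_rows; not the demo patch verbatim):

```c
/*
 * Sketch of the join-size adjustment described above.  If the clauses
 * pushed down to the inner rel, together with its own restrictions,
 * cover a unique index, use that one-row estimate and drop the clause
 * from the join selectivity so it is not counted twice.
 */
if (relation_has_unique_index_for(root, inner_rel,
                                  list_concat_copy(pushed_clauses,
                                                   inner_rel->baserestrictinfo),
                                  NIL, NIL))
{
    inner_rows = 1.0;   /* inner rel's rows after clause pushdown */
    restrictlist = list_difference(restrictlist, pushed_clauses);
}

nrows = outer_rel->rows * inner_rows *
    clauselist_selectivity(root, restrictlist, 0, JOIN_INNER, sjinfo);
nrows = clamp_row_est(nrows);
```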

Of course, this is just a demo of an initial idea, and there are many
aspects that need improvement. Many thanks for your suggestions and
comments.

Best Regards,
Fei Changhong