Обсуждение: wrong rows estimation by hash join

Поиск
Список
Период
Сортировка

wrong rows estimation by hash join

От
"James Pang (chaolpan)"
Дата:

  How does hash join estimation rows ?   pg v14, it make wrong rows estimation then leave nest loop lef join that make poor sql plan.  A

 

                                       ->  Nested Loop Left Join  (cost=171112.69..475856.90 rows=1 width=521)

                                             ->  Nested Loop Left Join  (cost=171111.31..474489.54 rows=1 width=423)

                                                   ->  Hash Join  (cost=171110.76..474488.93 rows=1 width=257)                     <<< here , actually the rows is 98000 ,but optimizer returns

                                                         Hash Cond: (((ccsm.xxx_id)::text = (cc.xxx_id)::text) AND ((ccsm.xxx_key)::text = (cc.account_key)::text))       <<< ccsm.xx_id and ccsm.xx_key are part of primary key.

                                                         ->  Seq Scan on cs_xxxxx ccsm  (cost=0.00..254328.08 rows=4905008 width=201)

                                                         ->  Hash  (cost=167540.92..167540.92 rows=237989 width=115)

                                                               ->  Index Scan using cs_xxxx_test on cs_contract cc  (cost=0.43..167540.92 rows=237989 width=115)

                                                                     Index Cond: ((xx_to > CURRENT_DATE) AND ((status)::text = ANY ('{Active,Inactive,Pending}'::text[])))

                                                   ->  Index Scan using cs_xxx_pk on cs_site cs  (cost=0.56..0.61 rows=1 width=203)

                                                         Index Cond: ((xxx_key)::text = (ccsm.xxx_key)::text)

 

 

Thanks,

 

James

Re: wrong rows estimation by hash join

От
Tomas Vondra
Дата:
Hi,

On 6/9/23 10:36, James Pang (chaolpan) wrote:
>   How does hash join estimation rows ?   pg v14, it make wrong rows
> estimation then leave nest loop lef join that make poor sql plan.  A
> 

I doubt this is specific to hashjoins, we estimate cardinality the same
way for all joins (or more precisely, we estimate it before picking the
particular join method).

I'm just guessing, but I'd bet the join condition is correlated with the
filter on cs_contract:

> ->  Index
> Scan using cs_xxxx_test on cs_contract cc  (cost=0.43..167540.92
> rows=237989 width=115)
> Index Cond: ((xx_to > CURRENT_DATE) AND ((status)::text = ANY
> ('{Active,Inactive,Pending}'::text[])))
> 

If you remove that condition, does the estimate improve?


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company