Обсуждение: Different row estimations on base rels

Поиск
Список
Период
Сортировка

Different row estimations on base rels

От
Donald Dong
Дата:
Hi,

I noticed the estimated rows of the base relations during the join
searching is *very* different from the estimations in the final plan.

Join search (rows of the initial_rels):
RELOPTINFO (ct): rows=1 width=4
RELOPTINFO (it): rows=1 width=4
RELOPTINFO (mc): rows=17567 width=32
RELOPTINFO (mi_idx): rows=1380035 width=8
RELOPTINFO (t): rows=2528356 width=25

The final plan:
Seq Scan on company_type ct
  (cost=0.00..1.05 rows=1 width=4)
Seq Scan on info_type it
  (cost=0.00..2.41 rows=1 width=4)
Parallel Seq Scan on movie_companies mc
  (cost=0.00..37814.90 rows=7320 width=32)
Parallel Seq Scan on movie_info_idx mi_idx
  (cost=0.00..13685.15 rows=575015 width=8)
Index Scan using title_pkey on title t
  (cost=0.43..0.58 rows=1 width=25)

By looking at the joinrel->rows, I would expect relation t to have
the largest size, however, this is not true at all. I wonder what's
causing this observation, and how to get estimations close to the
final plan?

Thank you,
Donald Dong



Re: Different row estimations on base rels

От
Robert Haas
Дата:
On Sun, May 26, 2019 at 1:00 PM Donald Dong <xdong@csumb.edu> wrote:
> I noticed the estimated rows of the base relations during the join
> searching is *very* different from the estimations in the final plan.
>
> Join search (rows of the initial_rels):
> RELOPTINFO (ct): rows=1 width=4
> RELOPTINFO (it): rows=1 width=4
> RELOPTINFO (mc): rows=17567 width=32
> RELOPTINFO (mi_idx): rows=1380035 width=8
> RELOPTINFO (t): rows=2528356 width=25
>
> The final plan:
> Seq Scan on company_type ct
>   (cost=0.00..1.05 rows=1 width=4)
> Seq Scan on info_type it
>   (cost=0.00..2.41 rows=1 width=4)
> Parallel Seq Scan on movie_companies mc
>   (cost=0.00..37814.90 rows=7320 width=32)
> Parallel Seq Scan on movie_info_idx mi_idx
>   (cost=0.00..13685.15 rows=575015 width=8)
> Index Scan using title_pkey on title t
>   (cost=0.43..0.58 rows=1 width=25)
>
> By looking at the joinrel->rows, I would expect relation t to have
> the largest size, however, this is not true at all. I wonder what's
> causing this observation, and how to get estimations close to the
> final plan?

Well, it's all there in the code.  I believe the issue is that the
final estimates are based on the number of rows that will be returned
from the relation, which is often less, and occasionally more, than
the total of the rows in the relation.  The reason it's often less is
because there might be a WHERE clause or similar which rules out some
of the rows.  The reason it might be more is because a nested loop
could return the same rows multiple times.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Different row estimations on base rels

От
Donald Dong
Дата:
On May 29, 2019, at 1:36 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Well, it's all there in the code.  I believe the issue is that the
> final estimates are based on the number of rows that will be returned
> from the relation, which is often less, and occasionally more, than
> the total of the rows in the relation.  The reason it's often less is
> because there might be a WHERE clause or similar which rules out some
> of the rows.  The reason it might be more is because a nested loop
> could return the same rows multiple times.

Yes, indeed. I was confused, and I guess I could've thought about it
about more before posting here. Thank you for answering this
question!

Regards,
Donald Dong