Sort operation displays more tuples than it contains its subnode

Поиск

Список

Период

Сортировка

От	a.rybakina
Тема	Sort operation displays more tuples than it contains its subnode
Дата	22 мая 2024 г. 20:31:21
Msg-id	9f4a159b-f527-465f-b82e-38b4b7df812f@postgrespro.ru обсуждение исходный текст
Ответы	Re: Sort operation displays more tuples than it contains its subnode Re: Sort operation displays more tuples than it contains its subnode
Список	pgsql-hackers

Дерево обсуждения

Hi!

I faced the issue, when the sorting node in the actual information shows a larger number of tuples than it actually is. And I can not understand why?

I attached the dump file with my database and run this query that consists underestimation and it works fine.

Unfortunately, I could not find the approach how I can improve statistic information here, so I did it in the code.

I needed to better cardinality in IndexScan and MergeJoin nodes. I highlighted them in bold.

postgres=# set enable_hashjoin =off;
SET
postgres=# set enable_nestloop =off;
SET

explain analyze select cname, avg(degree) from course, student,score join broke_down_course on
(score.cno=broke_down_course.cno and score.sno=broke_down_course.sno) where score.sno = student.sno group by (cname);

                                                                           QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=10000001523.70..10000001588.95 rows=10 width=250) (actual time=262.903..322.973 rows=10 loops=1)
   Group Key: course.cname
   -> Sort (cost=10000001523.70..10000001545.41 rows=8684 width=222) (actual time=256.221..283.710 rows=77540 loops=1)
         Sort Key: course.cname
         Sort Method: external merge Disk: 4656kB
         -> Nested Loop (cost=10000000614.18..10000000955.58 rows=8684 width=222) (actual time=7.518..160.518 rows=77540 loops=1)
               -> Merge Join (cost=614.18..845.96 rows=868 width=4) (actual time=7.497..126.632 rows=7754 loops=1)
                     Merge Cond: ((score.sno = broke_down_course.sno) AND (score.cno = broke_down_course.cno))
                     -> Merge Join (cost=0.70..1297.78 rows=29155 width=16) (actual time=0.099..99.329 rows=29998 loops=1)
                           Merge Cond: (score.sno = student.sno)
                           -> Index Scan using score_idx1 on score (cost=0.42..10125.41 rows=29998 width=12) (actual time=0.045..74.427 rows=29998 loops=1)
                           -> Index Only Scan using student_pkey on student (cost=0.28..89.28 rows=3000 width=4) (actual time=0.045..2.170 rows=3000 loops=1)
                                 Heap Fetches: 0
                     -> Sort (cost=613.48..632.86 rows=7754 width=8) (actual time=7.378..9.626 rows=7754 loops=1)
                           Sort Key: broke_down_course.sno, broke_down_course.cno
                           Sort Method: quicksort Memory: 374kB
                           -> Seq Scan on broke_down_course (cost=0.00..112.54 rows=7754 width=8) (actual time=0.028..1.428 rows=7754 loops=1)
               -> Materialize (cost=0.00..1.15 rows=10 width=218) (actual time=0.000..0.001 rows=10 loops=7754)
                     -> Seq Scan on course (cost=0.00..1.10 rows=10 width=218) (actual time=0.012..0.017 rows=10 loops=1)
Planning Time: 124.591 ms
Execution Time: 326.547 ms

When I run this query again I see that the Sort node shows more actual data than it was in SeqScan (I highlighted it).

postgres=# explain analyze select cname, avg(degree) from course, student,score join broke_down_course on
(score.cno=broke_down_course.cno and score.sno=broke_down_course.sno) where score.sno = student.sno group by (cname);
                                                                              QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------------------------
GroupAggregate (cost=10000001746.28..10000001811.53 rows=10 width=250) (actual time=553.428..615.028 rows=10 loops=1)
   Group Key: course.cname
   -> Sort (cost=10000001746.28..10000001767.99 rows=8684 width=222) (actual time=546.531..574.223 rows=77540 loops=1)
         Sort Key: course.cname
         Sort Method: external merge Disk: 4656kB
         -> Merge Join (cost=10000000614.18..10000001178.16 rows=8684 width=222) (actual time=7.892..448.889 rows=77540 loops=1)
               Merge Cond: ((score.sno = broke_down_course.sno) AND (score.cno = broke_down_course.cno))
               -> Merge Join (cost=10000000000.70..10000002146.57 rows=291550 width=234) (actual time=0.137..318.345 rows=299971 loops=1)
                     Merge Cond: (score.sno = student.sno)
                     -> Index Scan using score_idx1 on score (cost=0.42..10125.41 rows=29998 width=12) (actual time=0.046..76.505 rows=29998 loops=1)
                     -> Materialize (cost=10000000000.28..10000000540.41 rows=30000 width=222) (actual time=0.082..76.345 rows=299964 loops=1)
                           -> Nested Loop (cost=10000000000.28..10000000465.41 rows=30000 width=222) (actual time=0.077..16.543 rows=30000 loops=1)
                                 -> Index Only Scan using student_pkey on student (cost=0.28..89.28 rows=3000 width=4) (actual time=0.045..2.774 rows=3000 loops=1)
                                       Heap Fetches: 0
                                 -> Materialize (cost=0.00..1.15 rows=10 width=218) (actual time=0.000..0.002 rows=10 loops=3000)
                                       -> Seq Scan on course (cost=0.00..1.10 rows=10 width=218) (actual time=0.023..0.038 rows=10 loops=1)
               -> Sort (cost=613.48..632.86 rows=7754 width=8) (actual time=7.612..21.214 rows=77531 loops=1)
                     Sort Key: broke_down_course.sno, broke_down_course.cno
                     Sort Method: quicksort Memory: 374kB
                     -> Seq Scan on broke_down_course (cost=0.00..112.54 rows=7754 width=8) (actual time=0.016..1.366 rows=7754 loops=1)
Planning Time: 96.685 ms
Execution Time: 618.538 ms
(22 rows)

Maybe, my reproduction looks questionable, sorry for that, but I seriously don't understand why we have so many tuples here in Sort node.

Вложения

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Sort operation displays more tuples than it contains its subnode

Вложения