Re: TPC-H Q20 from 1 hour to 19 hours!

Поиск
Список
Период
Сортировка
От Tomas Vondra
Тема Re: TPC-H Q20 from 1 hour to 19 hours!
Дата
Msg-id f91b4a44-f739-04bd-c4b6-f135bd643669@2ndquadrant.com
обсуждение исходный текст
Ответ на Re: TPC-H Q20 from 1 hour to 19 hours!  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Ответы Re: TPC-H Q20 from 1 hour to 19 hours!
Список pgsql-hackers

On 03/30/2017 12:14 AM, Tomas Vondra wrote:
>
> I've only ran the queries on 10GB data set, but that should be enough. 
> The plans are from current master - I'll rerun the script on an older 
> release later today.
> 

So, an plans from an older release (9.4) are attached. What seems to 
matter is the join between partsupp and part, which is estimated like 
this on 9.4:

    ->  Sort  (cost=172852.06..173741.94 rows=355951 width=12)
              (actual time=321.998..334.440 rows=86836 loops=1)
          Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
          Sort Method: quicksort  Memory: 7143kB
          ->  Nested Loop  (cost=0.43..140031.03 rows=355951 width=12)
                       (actual time=0.025..303.145 rows=86836 loops=1)

and like this on current master:

   ->  Sort  (cost=146617.86..146819.89 rows=80809 width=12)
             (actual time=562.513..575.599 rows=86836 loops=1)
          Sort Key: partsupp.ps_partkey, partsupp.ps_suppkey
          Sort Method: quicksort  Memory: 7143kB
          ->  Nested Loop  (cost=0.43..140031.03 rows=80809 width=12)
                      (actual time=0.054..536.003 rows=86836 loops=1)

which however seems clearly more accurate than 9.4. So perhaps there is 
some bug in computing the mergejoin estimate, but it's also possible 
correcting the estimate (lowering it) also has some impact.

But joining to the aggregated subquery also clearly plays some role, 
because without it the estimates are higher.

Another interesting observation is that only the foreign key between 
part/partsupp seems to matter - once it gets dropped, the estimates get 
back close to 9.4 values.

What is however strange is that changing max_parallel_workers_per_gather 
affects row estimates *above* the Gather node. That seems a bit, um, 
suspicious, no? See the parallel-estimates.log.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Michael Paquier
Дата:
Сообщение: Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?
Следующее
От: Venkata B Nagothi
Дата:
Сообщение: Re: [BUGS] Bug in Physical Replication Slots (at least 9.5)?