Re: Performance Issue on Query 18 of TPC-H Benchmark
От | Andrei Lepikhov |
---|---|
Тема | Re: Performance Issue on Query 18 of TPC-H Benchmark |
Дата | |
Msg-id | 98398f84-6079-47f9-958c-c7e323228cc7@gmail.com обсуждение исходный текст |
Ответ на | Re: Performance Issue on Query 18 of TPC-H Benchmark (Ba Jinsheng <bajinsheng@u.nus.edu>) |
Ответы |
Re: Performance Issue on Query 18 of TPC-H Benchmark
|
Список | pgsql-bugs |
On 10/17/24 01:26, Ba Jinsheng wrote: > >I believe you are not allowing the optimizer to generate a different > aggregation path (Group Aggregate) because it requires a sort operation. > So I think this is not correct. > > Yes, this is what I did. I though it is what you were asking? I have not > found another way to enforce HashAggregate, so I directly modified the > code. Can you eliberate why it is incorrect? As I see, the main problem lies in the first aggregate (with HAVING clause) where hash aggregation seems preferable. To disable that, you can use some trick: SET enable_hashagg = 'on'; SET enable_sort = 'off'; SET work_mem = '1GB'; In my case the optimiser have built aggregation: -> HashAggregate (cost=202639.35..208346.45 rows=126825 width=4) (actual time=3540.761..4224.547 rows=9 loops=3) Group Key: lineitem_1.l_orderkey Filter: (sum(lineitem_1.l_quantity) > '314'::numeric) Batches: 1 Memory Usage: 638993kB Rows Removed by Filter: 1499991 Worker 0: Batches: 1 Memory Usage: 638993kB Worker 1: Batches: 1 Memory Usage: 638993kB -> Seq Scan on lineitem lineitem_1 (cost=0.00..172626.23 rows=6002623 width=9) (actual time=0.014..675.552 rows=6001215 loops=3) Not sure it is the best plan possible in this case. I know only about re-optimisation feature which can provide correct number of groups estimation to aggregates as well as cardinality and work_mem and give the optimiser all correct estimations. But it is still an enterprise feature :(. > Can I understand disabling parallelism is a good setup for finding > performance issues? I usually use such such a switch to identify problems. Parallel workers use HashJoin more frequently and it sometimes cause non-linear behaviour of execution time, compared to sequental plan. At the same time, they donn't support parameterised paths and it may end up in interesting performance jumps ... BTW, I also wonder why do you report to pgsql-bugs in presence of better fitted pgsql-performance thread? -- regards, Andrei Lepikhov
В списке pgsql-bugs по дате отправления: