Re: Performance Issue on Query 18 of TPC-H Benchmark
От | David Rowley |
---|---|
Тема | Re: Performance Issue on Query 18 of TPC-H Benchmark |
Дата | |
Msg-id | CAApHDvo7najDZTg4=KMDsBTwE0+9K8=AxAgLxQTT1_AOGkYP=A@mail.gmail.com обсуждение исходный текст |
Ответ на | Re: Performance Issue on Query 18 of TPC-H Benchmark (Ba Jinsheng <bajinsheng@u.nus.edu>) |
Список | pgsql-bugs |
On Thu, 17 Oct 2024 at 07:26, Ba Jinsheng <bajinsheng@u.nus.edu> wrote: > > >I believe you are not allowing the optimizer to generate a different aggregation path (Group Aggregate) because it requiresa sort operation. So I think this is not correct. > > Yes, this is what I did. I though it is what you were asking? I have not found another way to enforce HashAggregate, soI directly modified the code. Can you eliberate why it is incorrect? Like I mentioned earlier, GroupAggregate needs its input to be guaranteed to be sorted by the group key. Above you've hacked the planner to produce: -> GroupAggregate (cost=0.00..208346.45 rows=126825 width=4) (actual time=334.397..1549.837 rows=9 loops=3) Group Key: lineitem_1.l_orderkey Filter: (sum(lineitem_1.l_quantity) > '314'::numeric) Rows Removed by Filter: 1500388 -> Seq Scan on lineitem lineitem_1 (cost=0.00..172626.23 rows=6002623 width=9) (actual time=0.051..438.226 rows=6001215 loops=3) This is simply not a valid plan. GroupAggreate relies on the rows arriving in Group Key order as it checks if the current row is in the same group as the previous row. When it's not, that group is classes as complete and the HAVING clause can be evaluated. It's possible the plan you've ended up when happens to produce the correct results before the lineitem table is already in l_orderkey order. Try updating one of the earlier rows in a way that puts the heap out of order and you'll likely notice the "Rows Removed by Filter" change. Or try without the HAVING clause and observe the number of groups changing. I strongly suggest you experiment further before proposing changes in this area. Also, this is not a valid discussion for the pgsql-bugs mailing list. This list is about reporting newly discovered bugs. It's not a place to discuss proposing new ones. David
В списке pgsql-bugs по дате отправления: