Re: Performance of Query 60 on TPC-DS Benchmark
From:        Andrei Lepikhov
Subject:     Re: Performance of Query 60 on TPC-DS Benchmark
Date:
Msg-id:      8efe2768-bc32-47dc-9a7c-ef20861eb150@gmail.com
In reply to: Performance of Query 60 on TPC-DS Benchmark  (Ba Jinsheng <bajinsheng@u.nus.edu>)
List:        pgsql-performance
On 22/11/2024 18:12, Ba Jinsheng wrote:
> I think the key difference is that the patch disables the usage of Hash
> Join, which incurs a worse performance.

Looking into your case a little more, I found the origin of the problem: Memoize+NestLoop was not chosen because the top-level LIMIT node wasn't taken into account in the estimations at the lower levels of the query.

At first, I found that the join prediction is overestimated, which is unusual. Look at this:

->  Merge Semi Join  (cost=3611.16..3694.10 rows=2107 width=21) (actual time=28.195..30.243 rows=498 loops=2)
      Merge Cond: (item_2.i_item_id = item_3.i_item_id)
      ->  Sort  (cost=2051.70..2078.17 rows=10588 width=21) (actual time=14.113..14.625 rows=2416 loops=2)
            Sort Key: item_2.i_item_id
            Sort Method: quicksort  Memory: 938kB
            Worker 0:  Sort Method: quicksort  Memory: 247kB
            ->  Parallel Seq Scan on item item_2  (cost=0.00..1343.88 rows=10588 width=21) (actual time=0.029..5.954 rows=9000 loops=2)
      ->  Sort  (cost=1559.47..1563.93 rows=1786 width=17) (actual time=14.072..14.247 rows=950 loops=2)
            Sort Key: item_3.i_item_id
            Sort Method: quicksort  Memory: 49kB
            Worker 0:  Sort Method: quicksort  Memory: 49kB
            ->  Seq Scan on item item_3  (cost=0.00..1463.00 rows=1786 width=17) (actual time=0.018..12.638 rows=1786 loops=2)
                  Filter: (i_category = 'Children'::bpchar)
                  Rows Removed by Filter: 16214

Because of that, the Memoize node wasn't chosen. Executing this specific part of the query:

SET max_parallel_workers_per_gather = 1;
SET parallel_setup_cost = 0.001;
SET parallel_tuple_cost = 0.00005;
SET min_parallel_table_scan_size = 0;
EXPLAIN (ANALYZE)
SELECT * FROM item i1
WHERE i_item_id IN (SELECT i_item_id FROM item i2
                    WHERE i2.i_category IN ('Children'));

I found that the prediction was correct:

Merge Semi Join  (cost=3611.16..3694.10 rows=2107 width=21) (actual time=19.878..26.321 rows=1931 loops=2)

So, the top-level nodes just didn't pull more tuples than the LIMIT allowed. If you remove LIMIT 100 from the query, you can see that your plan (NestLoop+Memoize) runs for 24 s, much worse than the 3 s of the plan (with HashJoin) that Postgres created without your changes.

In toto, this example demonstrates the problem of planning queries that need only a fraction of the result. I may be wrong, but is this a problem of the Append node?

--
regards, Andrei Lepikhov
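
P.S. To make the effect reproducible outside TPC-DS, here is a minimal sketch (the tables big/small and their columns are hypothetical, not from the benchmark). With a LIMIT on top, the planner optimises for a fraction of the result and tends to prefer a fast-startup plan such as a parameterised nested loop; without the LIMIT, the total-cost winner is usually a hash join:

-- hypothetical test tables, for illustration only
CREATE TABLE big   AS SELECT g AS id, g % 100 AS grp FROM generate_series(1, 1000000) g;
CREATE TABLE small AS SELECT g AS id FROM generate_series(1, 100) g;
CREATE INDEX ON big (grp);
ANALYZE big;
ANALYZE small;

-- whole result needed: a hash join usually wins on total cost
EXPLAIN SELECT * FROM big b JOIN small s ON b.grp = s.id;

-- only a fraction needed: the LIMIT makes a fast-startup nested loop attractive
EXPLAIN SELECT * FROM big b JOIN small s ON b.grp = s.id LIMIT 100;

Note that this "fraction of the result" information influences the choice of the topmost join, but it does not propagate into the row estimates of the nodes below it, which is exactly the issue in the Query 60 plan above.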