Re: MergeAppend could consider sorting cheapest child path
От | Andrei Lepikhov |
---|---|
Тема | Re: MergeAppend could consider sorting cheapest child path |
Дата | |
Msg-id | 4ed36dd0-73f5-4c62-a4b8-3c655d256766@gmail.com обсуждение исходный текст |
Ответ на | Re: MergeAppend could consider sorting cheapest child path (Alexander Korotkov <aekorotkov@gmail.com>) |
Список | pgsql-hackers |
On 27/7/2025 00:51, Alexander Korotkov wrote: > On Tue, Jul 22, 2025 at 2:13 PM Andrei Lepikhov <lepihov@gmail.com > I've another idea. cost_tuplesort() puts 2.0 under logarithm to prefer > tuplesort over heapsort. I think we can adjust cost_gather_merge() and > cost_merge_append() to do the same. 0001 patch implements that. I > think the plan changes of 0001 might be reasonable since most cases deal > with small rowsets. One thing concerns me: 0002 still affects one of > the postgres_fdw checks. Could you, please, take a look? Thanks for the idea! I analysed your approach a little bit. Initially, I ran the test script I had created previously [1] and discovered that on a large scale (1e6 - 1e7 tuples), the plan still defaults to MergeAppend, which deviates from the execution time (7190 ms for Sort+Append and 8450 ms for MergeAppend+Sort). Attempting to find out the reason, I combined all the costs into a single formula for each strategy: MergeAppend+Sort: total_cost =CO*ntuples*(1+2*log(ntuples)) + Ccput * 0.5 * ntuples+ 2*CO*N*log(N) + A Sort+Append: total_cost = CO*ntuples*(1+2*log(ntuples))+ Ccput * 0.5 * ntuples + A Terms: - A - sum of total costs of underlying subtrees - CO - cpu_operator_cost - Ccput - cpu_tuple_cost - N - number of subpaths (streams) Given the significant gap in total execution time between these strategies, I believe it would be reasonable to introduce a coefficient to the equation's 'ntuples' variable component that will keep the gap between big quicksort and MergeAppend's heapsort out of the fuzzy factor gap. Discovering papers on the value of constant in quicksort [2] and heapsort [3], I realised that there is a difference. The constant's value varies in a wide range: 1.3-1.5 for quicksort and 2-3 for heapsort. Considering that we should change the current cost model as little as possible, not to break the balance, we may just increase the constant value for the heap sort to maintain a bare minimum gap between strategies out of the fuzzy factor. In this case, the merge append constant should be around 3.8 - 4.0. With this minor change, we see a shift in the regression tests. Most of these changes were introduced by the new append strategy. Although I haven't analysed these changes in depth yet, I believe they are all related to the small data sets and should fade out on a larger scale. See this minor correction in the attachment. postgres_fdw tests are stable now. [1] https://github.com/danolivo/conf/blob/main/Scripts/sort-vs-mergeappend-3.sql [2] https://en.wikipedia.org/wiki/Quicksort?utm_source=chatgpt.com [2] https://arxiv.org/abs/1504.01459 -- regards, Andrei Lepikhov
Вложения
В списке pgsql-hackers по дате отправления: