Re: MergeAppend could consider sorting cheapest child path
От | Andrei Lepikhov |
---|---|
Тема | Re: MergeAppend could consider sorting cheapest child path |
Дата | |
Msg-id | ef50ac7b-d04b-462d-930a-114ae77ce7b2@gmail.com обсуждение исходный текст |
Ответ на | Re: MergeAppend could consider sorting cheapest child path (Alexander Pyhalov <a.pyhalov@postgrespro.ru>) |
Ответы |
Re: MergeAppend could consider sorting cheapest child path
|
Список | pgsql-hackers |
On 3/28/25 09:19, Alexander Pyhalov wrote: > Andy Fan писал(а) 2024-10-17 03:33: > I've updated patch. One more interesting case which we found - when > fractional path is selected, it still can be more expensive than sorted > cheapest total path (as we look only on indexes whith necessary > pathkeys, not on indexes which allow efficiently fetch data). > So far couldn't find artificial example, but we've seen inadequate index > selection due to this issue - instead of using index suited for filters > in where, index, suitable for sorting was selected as one having the > cheapest fractional cost. I think it is necessary to generalise the approach a little. Each MergeAppend subpath candidate that fits pathkeys should be compared to the overall-optimal path + explicit Sort node. Let's label this two-node composition as base_path. There are three cases exist: startup-optimal, total-optimal and fractional-optimal. In the startup case, base_path should use cheapest_startup_path, the total-optimal case - cheapest_total_path and for a fractional case, we may employ the get_cheapest_fractional_path routine to detect proper base_path. It may provide a more effective plan either in full, fractional and partial scan cases: 1. The Sort node may be pushed down to subpaths under a parallel or async Append. 2. When a minor set of subpaths doesn't have a proper index, and it is profitable to sort them instead of switching to plain Append. In general, analysing the regression tests changed by this code, I see that the cost model prefers a series of small sortings instead of a single massive one. May be it will show some profit for execution time. I am not afraid of any palpable planning overhead here because we just do cheap cost estimation and comparison operations that don't need additional memory allocations. The caller is responsible for building a proper Sort node if this method is chosen as optimal. In the attachment, see the patch written according to the idea. There are I introduced two new routines: get_cheapest_path_for_pathkeys_ext get_cheapest_fractional_path_for_pathkeys_ext I have designed the code that way to reduce duplicated code in the generate_orderedappend_paths routine. But the main point is that I envision these new routines may be reused elsewhere. This feature looks сlose to the one we discussed before [1]. So, let me CC the people from the previous conversation to the discussion. [1] https://www.postgresql.org/message-id/flat/CAN-LCVPxnWB39CUBTgOQ9O7Dd8DrA_tpT1EY3LNVnUuvAX1NjA%40mail.gmail.com -- regards, Andrei Lepikhov
Вложения
В списке pgsql-hackers по дате отправления: