Summary of Sort Improvement Proposals

Поиск
Список
Период
Сортировка
От Benjamin Coutu
Тема Summary of Sort Improvement Proposals
Дата
Msg-id de7813aa148be9dcc341@zeyos.com
обсуждение исходный текст
Список pgsql-hackers
Hello,

In light of multiple threads [1-6] discussing sorting improvements, I'd like to consolidate the old (+some new) ideas
asa starting point.
 
It might make sense to brain storm on a few of these ideas and maybe even identify some that are worth implementing and
testing.

1. Simple algorithmic ideas:
    - Use single-assignment insertion-sort instead of swapping
    - Increase insertion-sort threshold to at least 8 (possibly 10+), to be determined empirically based on current
hardware
    - Make insertion-sort threshold customizable via template based on sort element size

2. More complex/speculative algorithmic ideas:
    - Try counting insertion-sort loop iterations and bail after a certain limit (include presorted check in
insertion-sortloop and continue presorted check from last position in separate loop after bailout)
 
    - Try binary search for presorted check (outside of insertion-sort-code)
    - Try binary insertion sort (if comparison costs are high)
    - Try partial insertion sort (include presorted check)
    - Try presorted check only at top-level, not on every recursive step, or if on every level than at least only for n
>some threshold
 
    - Try asymmetric quick-sort partitioning
    - Try dual pivot quick-sort
    - Try switching to heap-sort dependent on recursion depth (might allow ripping out median-of-median)

3. TupleSort ideas:
    - Use separate sort partition for NULL values to avoid null check on every comparison and to make nulls first/last
trivial
    - Pass down non-nullness info to avoid null check and/or null-partition creation (should ideally be determined by
planner)
    - Skip comparison of first sort key on subsequent full tuple tie-breaker comparison (unless abbreviated key)
    - Encode NULL directly in abbreviated key (only if no null-partitioning)

4. Planner ideas:
    - Use pg_stats.correlation to inform sort algorithm selection for sort keys that come from
sequential-scans/bitmap-heap-scans
    - Use n_distinct to inform sort algorithm selection (many tie-breaker comparisons necessary on multi-key sort)
    - Improve costing of sorts in planner considering tuple size, distribution and n_distinct

[1] https://www.postgresql.org/message-id/flat/ddc4e498740a8e411c59%40zeyos.com
[2] https://www.postgresql.org/message-id/flat/CAFBsxsHanJTsX9DNJppXJxwg3bU%2BYQ6pnmSfPM0uvYUaFdwZdQ%40mail.gmail.com
[3] https://www.postgresql.org/message-id/flat/CAApHDvoTTtoQYfp3d0kTPF6y1pjexgLwquzKmjzvjC9NCw4RGw%40mail.gmail.com
[4] https://www.postgresql.org/message-id/flat/CAEYLb_Xn4-6f1ofsf2qduf24dDCVHbQidt7JPpdL_RiT1zBJ6A%40mail.gmail.com
[5]
https://www.postgresql.org/message-id/flat/CAEYLb_W%2B%2BUhrcWprzG9TyBVF7Sn-c1s9oLbABvAvPGdeP2DFSQ%40mail.gmail.com
[6]
https://www.postgresql.org/message-id/flat/683635b8-381b-5b08-6069-d6a45de19a12%40enterprisedb.com#12683b7a6c566eb5b926369b5fd2d41f

-- 

Benjamin Coutu
http://www.zeyos.com



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Noah Misch
Дата:
Сообщение: Re: race condition in pg_class
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: Large files for relations