GSOC 2018 Project - A New Sorting Routine

Поиск
Список
Период
Сортировка
От Kefan Yang
Тема GSOC 2018 Project - A New Sorting Routine
Дата
Msg-id CAF448YLZz5cD+w5qYQGMcfFPY9zAu2mkkiCtOPSBJmwbjuPNig@mail.gmail.com
обсуждение исходный текст
Ответы Re: GSOC 2018 Project - A New Sorting Routine  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-hackers

Hello, Hackers!

 

I am working on my project in Google Summer of Code 2018. In this project, I am trying to improve the in-memory sorting routine in PostgreSQL. Now I am very excited to share my progress with you guys.

 

Originally, PostgreSQL is using the QuickSort implemented by J. L. Bentley and M. D. McIlroy in "Engineering a sort function" with some modifications. This sorting routine is very fast, yet may fall to O(n^2) time complexity in the worst case scenario. We are trying to find faster sorting algorithms with guaranteed O(nlogn) time complexity.

 

In this patch, I

  1. Use IntroSort to implement pg_qsort. IntroSort is a hybrid sorting algorithm. It uses Quicksort most of the time, but switch to insertion sort when the array is small and heapsort when the recursion exceeds depth limit.
  2. Only check if the array is preordered once on the whole array to get better overall performance. Previously the sorting routine checks if the array is preordered on every recursion.

 

After some performance test, I find the new sorting routine

  1. Slightly faster on sorting random arrays.
  2. Much faster on worst case scenario since it has O(nlogn) worst case complexity.
  3. Has nearly the same performance on mostly sorted arrays.

 

I use both standalone tests and pgbench to show the result. A more detailed report is in the attachment, along with the patch and some scripts to reproduce the result.

 

Notice: this patch may fail a test case in ‘make check’ because QuickSort and HeapSort are unstable. The two sorting routines may give different results if multiple entries in the array have the same key value.

 

Generally speaking, I think the new sorting routine seems to be an improvement in many ways. Please let me know if you have any thoughts or suggestions.

 

Regards,

Kefan Yang

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: cost_sort() improvements
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: Optimze usage of immutable functions as relation