Highly Efficient Custom Sorting

Поиск
Список
Период
Сортировка
От Eliot Gable
Тема Highly Efficient Custom Sorting
Дата
Msg-id AANLkTikrA8M3cncIHkFqsJ78TzCnQdriE0GjvK8XjB9E@mail.gmail.com
обсуждение исходный текст
Ответы Re: Highly Efficient Custom Sorting
Re: Highly Efficient Custom Sorting
Список pgsql-performance
I have a long stored procedure (over 3,000 lines). Originally, it would take about 44ms to run the whole query. After lots and lots of tweaking, Postgres now runs the entire thing and gathers my results in just 15.2ms, which is very impressive given the hardware this is running on. Now, I used to return the results unsorted to my C++ backend and then sort them there using my custom sort order which provides prioritized, weighted random ordering with 4 different priority fields and 3 different weighting fields within 3 of those 4 priority fields. Needless to say, the sorting is quite complex. I wanted to cut down on the amount of data being delivered to my C++ backend, so I am using the stored procedure to insert a summary of my results directly into the database, which is far more efficient than dumping it all to the C++ backend (including stuff that isn't really needed there) and then dumping it all back to Postgres via INSERTS later in the execution path. The problem is that I want the results sorted in this custom order before they are stored in the database. (By sorted, I mean I want to include a field that lists a numerical order for the set of results.) Thus, I used to dump everything to the C++ program, perform the sorting, then INSERT back to Postgres. This was obviously not all that efficient. Now, the sorting in C++ took <1ms to accomplish. When I re-wrote the sorting in pl/pgsql using a couple of additional stored procedures, I discovered it is taking 15.2ms to perform the sort of the records within Postgres. This almost cancels out all of the prior optimizations I previously performed:

T:20100702001841+0903010 TID:0x43945940 INFO:NOTICE:  Sorting group ...
<snip>
...
</snip>
T:20100702001841+0917683 TID:0x43945940 INFO:NOTICE:  Sorting 1 rows in priority 1... <-- Last sort item
T:20100702001841+0918292 TID:0x43945940 INFO:NOTICE:    

918,292 - 903,010 = 15,282 us = 15.282 ms

So, the bottom line is, I need a faster way to do this sorting. 

What options are at my disposal for improving the speed and efficiency of this sorting? Which is the easiest to implement? What are the drawbacks of each different method?


Thanks in advance for your insights.


--
Eliot Gable

"We do not inherit the Earth from our ancestors: we borrow it from our children." ~David Brower

"I decided the words were too conservative for me. We're not borrowing from our children, we're stealing from them--and it's not even considered to be a crime." ~David Brower

"Esse oportet ut vivas, non vivere ut edas." (Thou shouldst eat to live; not live to eat.) ~Marcus Tullius Cicero

В списке pgsql-performance по дате отправления:

Предыдущее
От: Michal Fapso
Дата:
Сообщение: big data - slow select (speech search)
Следующее
От: Bruce Momjian
Дата:
Сообщение: Re: No hash join across partitioned tables?