External Sort performance patch
From | Simon Riggs |
---|---|
Subject | External Sort performance patch |
Date | |
Msg-id | 1138235607.3363.48.camel@localhost.localdomain |
Responses |
Re: External Sort performance patch
Re: External Sort performance patch |
List | pgsql-patches |
The enclosed patch substantially improves large sort performance in the general case:

    cvstip: elapsed 5693 sec, CPU 196 sec
    patch:  elapsed 4132 sec, CPU 90 sec

The patch implements a dynamically increasing number of logical tapes when sufficient memory is available to make that efficient. cvstip runs with a static number of tapes (7), whereas the patch was able to allocate 104 tapes to the task. This has the effect of almost completely removing intermediate merge runs, hence the increased performance.

From Jeffrey W. Baker's original idea
http://archives.postgresql.org/pgsql-performance/2005-09/msg00430.php
and followup comments
http://archives.postgresql.org/pgsql-hackers/2005-10/msg00015.php

It is expected this will substantially improve performance for large ORDER BY, GROUP BY and CREATE INDEX statements.

The guesstimated default setting of OPTIMAL_MERGE_BUFFER_SIZE, 262144, means that the default setting of work_mem will still use only 7 tapes, though setting work_mem > 2MB will yield improvements.

Further testing and/or patch comments are requested.

All changes are isolated to src/backend/utils/sort/tuplesort.c

Patch applies cleanly and passes make check on cvstip (though this code path is not tested in the regression tests anyway).

Test details:
Run the following sort on my laptop (512MB RAM)

    postgres=# set work_mem=65536;
    SET
    Time: 0.801 ms
    postgres=# select * from d order by 1,2 limit 1;
     col1 |          col2
    ------+-------------------------
        1 | eeeeeeseeeeeeeeeeeeeeee
    (1 row)

    Time: 4133122.769 ms
    postgres=# \d d
          Table "public.d"
     Column |  Type   | Modifiers
    --------+---------+-----------
     col1   | integer |
     col2   | text    |

    postgres=# select count(*) from d;
       count
    -----------
     100000000
    (1 row)

    Time: 248283.128 ms
    postgres=# select pg_relation_size('d');
     pg_relation_size
    ------------------
           6450397184
    (1 row)

    Time: 98.629 ms
    postgres=#

trace_sort was enabled for both runs, and the resulting output is attached as files to this mail.
Test data was anti-sorted, but the ordering of the data is not relevant to the algorithm, except when the data is already almost perfectly sorted, in which case there is typically only one run anyway.

Best Regards, Simon Riggs
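
The claim that more tapes remove intermediate merge runs can be illustrated with a rough model. The sketch below is not taken from the patch: it assumes a simple balanced merge in which one tape receives output and the remaining tapes feed input, whereas tuplesort.c uses a polyphase merge that distributes runs differently. The function name `merge_passes` and the run count of 100 are hypothetical, chosen only for illustration, and are not read from the attached trace_sort output.

```python
def merge_passes(initial_runs: int, tapes: int) -> int:
    """Estimate how many merge passes are needed to reduce
    `initial_runs` sorted runs to a single run.

    Simplified model (an assumption, not the patch's algorithm):
    each pass merges groups of up to `tapes - 1` runs into one run,
    because one tape is reserved for output.
    """
    if initial_runs < 1:
        raise ValueError("initial_runs must be at least 1")
    if tapes < 3:
        raise ValueError("a merge needs at least 3 tapes")

    fan_in = tapes - 1
    runs = initial_runs
    passes = 0
    while runs > 1:
        # Ceiling division: every group of up to fan_in runs becomes one run.
        runs = -(-runs // fan_in)
        passes += 1
    return passes


if __name__ == "__main__":
    hypothetical_runs = 100  # made-up figure, for illustration only
    for tapes in (7, 104):   # tape counts quoted in the mail
        print(tapes, "tapes:", merge_passes(hypothetical_runs, tapes), "pass(es)")
```

Under this model, 100 runs need three passes with 7 tapes (100, then 17, then 3, then 1 run) but only one pass with 104 tapes, which is the kind of reduction the mail describes. The real numbers depend on how many initial runs the sort produces and on the polyphase distribution.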