Re: Using quicksort for every external sort run

From: Greg Stark
Subject: Re: Using quicksort for every external sort run
Date:
Msg-id: CAM-w4HM4XW3u5kVEuUrr+L+KX3WZ=5JKk0A=DJjzypkB-Hyu4w@mail.gmail.com
In reply to: Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
Responses: Re: Using quicksort for every external sort run  (Greg Stark <stark@mit.edu>)
List: pgsql-hackers

So incidentally I've been running some benchmarks myself, mostly to understand the current scaling behaviour of sorting and to better judge whether Peter's analysis of where the pain points are, and why we shouldn't worry about optimizing for the multiple-merge-pass case, is on target. I haven't actually benchmarked his patch at all, just stock HEAD so far.

The really surprising result (for me) so far is that merge passes apparently spend very little time doing I/O. I had always assumed most of the time was spent waiting on I/O, and that that's why we spend so much effort ensuring sequential I/O and trying to maximize run lengths. I was expecting to see a huge step increase in the total time whenever the number of merge passes increased. However, I see hardly any increase, and sometimes even a decrease, despite the extra pass. The time generally increases as work_mem decreases, but the slope is pretty moderate and gradual, with no big steps due to extra passes.

On further analysis I'm less surprised by this than I was at first. The larger benchmarks I'm running are on a 7GB table which only actually generates 2.6GB of sort data, so even writing all of that out and then reading it all back in on a 100MB/s disk would only take an extra ~50s. That won't make a big dent when the whole sort takes about 30 minutes. Even if you assume there's a substantial amount of random I/O, it'll only be a 10% difference or so, which is more or less in line with what I'm seeing.
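As a rough sanity check, the arithmetic works out like this (a back-of-the-envelope sketch only, using the round figures above: 2.6GB of run data, 100MB/s sequential disk, ~30 minutes total):

    -- one extra merge pass writes and then reads the run data once more
    SELECT round(2.6 * 1024 * 2 / 100)                   AS extra_seq_io_seconds,  -- ~53s
           round(100 * (2.6 * 1024 * 2 / 100) / 1800, 1) AS pct_of_total;          -- ~3%

So purely sequential it's about 3% of the total, and even a healthy amount of random I/O only stretches that to roughly the 10% mentioned above.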

I haven't actually got to benchmarking Peter's patch at all, but this reinforces his argument dramatically. If the worst case for using quicksort is that the shorter runs might push us into doing an extra merge, and that might add an extra 10% to the run-time, then that cost will be easily counterbalanced by the faster quicksort; and in any case it only affects people who for some reason can't just increase work_mem enough to allow a single merge.


                        ------------------------ work_mem ------------------------
Table Size   Sort Size   128MB     64MB      32MB      16MB      8MB       4MB
6914MB       2672 MB     3392.29   3102.13   3343.53   4081.23   4727.74   5620.77
3457MB       1336 MB     1669.16   1593.85   1444.22   1654.27   2076.74   2266.84
2765MB       1069 MB     1368.92   1250.44   1117.2    1293.45   1431.64   1772.18
1383MB        535 MB      716.48    625.06    557.14    575.67    644.2     721.68
 691MB        267 MB      301.08    295.87    266.84    256.29    283.82    292.24
 346MB        134 MB      145.48    149.48    133.23    130.69    127.67    137.74
  35MB         13 MB        3.58     16.77     11.23     11.93     13.97      3.17

The colours are to give an idea of the number of merge passes. Grey is an internal sort, white is a single merge, and yellow and red are successively more merges (though the exact boundary between yellow and red may not be exactly meaningful due to my misunderstanding of polyphase merge).

The numbers here are elapsed seconds, taken from the "elapsed" figure in log lines like the one below, produced by running queries like the following with trace_sort enabled:
LOG:  external sort ended, 342138 disk blocks used: CPU 276.04s/3173.04u sec elapsed 5620.77 sec
STATEMENT:  select count(*) from (select * from n200000000 order by r offset 99999999999) AS x;
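For anyone who wants to reproduce this, each cell was generated with something along these lines (a sketch only; the work_mem value shown is just one of the columns above, and the "elapsed" figure then comes out of the server log):

    SET trace_sort = on;
    SET work_mem = '4MB';   -- varied per column, from 128MB down to 4MB
    SELECT count(*)
      FROM (SELECT * FROM n200000000 ORDER BY r OFFSET 99999999999) AS x;
    -- then grep the log for the "external sort ended ... elapsed" line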

This was run on the smallest size VM on Google Compute Engine with 600MB of virtual RAM and a 100GB virtual network block device.
