Re: Sort optimizations: Making in-memory sort cache-aware

Поиск
Список
Период
Сортировка
От Ankit Kumar Pandey
Тема Re: Sort optimizations: Making in-memory sort cache-aware
Дата
Msg-id 15cd9c50-eb5a-2fad-1d01-1c93b836f2c0@gmail.com
обсуждение исходный текст
Ответ на Re: Sort optimizations: Making in-memory sort cache-aware  (Ankit Kumar Pandey <itsankitkp@gmail.com>)
Список pgsql-hackers
Hi Andres,


I took a stab at naive version of this but been stuck for sometime now.

I have added logic to sort on first column at first pass,

realloc all tuples and do full sort at second pass, but I am not seeing

any benefit (it is actually regressing) at all.

Tried doing above both at bulk and at chunks of data.

 > You effectively end up with a bounded number of pre-sorted blocks, so 
the most
 > obvious thing to try is to build a heap of those blocks and 
effectively do a
 > heapsort between the presorted blocks.

I am not very clear about implementation for this. How can we do 
heapsort between

  the presorted blocks? Do we keep changing state->bound=i, i+n, i+2n 
something like

this and keep calling make_bounded_heap/sort_bounded_heap?

 > A related, but separate, improvement is to reduce / remove the memory
 > allocation overhead.

This is still pending from my side.


I have attached some benchmarking results with script and POC

patches (which includes GUC switch to enable optimization for ease of 
testing) for the same.

Tested on WORK_MEM=3 GB for 1 and 10 Million rows data.

Please let me know things which I can fix and re-attempt.


Thanks,

Ankit



Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tomas Vondra
Дата:
Сообщение: Re: Add LZ4 compression in pg_dump
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Add SHELL_EXIT_CODE to psql