Sort Refinement

Поиск

Список

Период

Сортировка

От	Simon Riggs
Тема	Sort Refinement
Дата	20 марта 2008 г. 18:20:13
Msg-id	1206033442.4285.586.camel@ebony.site обсуждение исходный текст
Ответы	Re: Sort Refinement Re: Sort Refinement Re: Sort Refinement
Список	pgsql-hackers

Дерево обсуждения

Currently, our sort algorithm assumes that its input is unsorted. So if
your data is sorted on (a) and you would like it to be sorted on (a,b)
then we need to perform the full sort of (a,b).

For small sorts this doesn't matter much. For larger sorts the heap sort
algorithm will typically result in just a single run being written to
disk which must then be read back in. Number of I/Os required is twice
the total volume of data to be sorted.

If we assume we use heap sort, then if we *know* that the data is
presorted on (a) then we should be able to emit tuples directly that the
value of (a) changes and keep emitting them until the heap is empty,
since they will exit the heap in (a,b) order.

The normal pattern of a sort node is to read all of its input and then
begin emitting rows, so the sort is completed before the first row is
returned (OK, lets gloss over the final merge bit for now). So what I'm
suggesting is a sort node that will switch back and forth between
consuming and emitting rows. In some cases it *may* still need to scroll
to disk, but we already have that code.

In this way we can completely avoid writing data to disk for some types
of large sort, even when we have a data set much larger than available
memory. Of course it only applies when we are refining an existing
well-known sort order.

This is a similar concept to the way we handled dynamic buffering of
data for the merge join in 8.3, so I expect it to have an equally warm
reception ;-)

--  Simon Riggs 2ndQuadrant  http://www.2ndQuadrant.com 
 PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk

В списке pgsql-hackers по дате отправления:

Вход в личный кабинет

Восстановление пароля

Подтверждение аккаунта

Изменение пароля

Sort Refinement