Re: Using quicksort for every external sort run

Поиск
Список
Период
Сортировка
От Greg Stark
Тема Re: Using quicksort for every external sort run
Дата
Msg-id CAM-w4HNw0esnk-Pae2FvwpyEbVhJAZ4BKCxAzWXx3R9CTMyjqw@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Using quicksort for every external sort run  (Jim Nasby <Jim.Nasby@BlueTreble.com>)
Ответы Re: Using quicksort for every external sort run  (Peter Geoghegan <pg@heroku.com>)
Список pgsql-hackers
On Mon, Feb 15, 2016 at 8:43 PM, Jim Nasby <Jim.Nasby@bluetreble.com> wrote:
> On 2/7/16 8:57 PM, Peter Geoghegan wrote:
>>
>> It seems less than ideal that DBAs have to be so
>> conservative in sizing work_mem.
>
>
> +10


I was thinking about this over the past couple weeks. I'm starting to
think the quicksort runs gives at least the beginnings of a way
forward on this front. Keeping in mind that we know how many tapes we
can buffer in memory and the number is likely to be relatively large
-- on the order of 100+ is typical, what if do something like the
following rough sketch:

Give users two knobs, a lower bound "sort in memory using quicksort"
memory size and an upper bound "absolutely never use more than this"
which they can set to a substantial fraction of physical memory. Then
when we overflow the lower bound we start generating runs, the first
one being of that length. Each run we generate we double (or increase
by 50% or something) until we hit the maximum. That means that the
first few runs may be shorter than necessary but we have enough tapes
available that that doesn't hurt much and we'll eventually get to a
large enough run size that we won't run out of tapes and can still do
a single final (on the fly) merge.

In fact what's really attractive about this idea is that it might give
us a reasonable spot to do some global system resource management.
Each time we want to increase the run size we check some shared memory
counter of how much memory is in use and refuse to increase if there's
too much in use (or if we're using too large a fraction of it or some
other heuristic). The key point is that since we don't need to decide
up front at the beginning of the sort and we don't need to track it
continuously there is neither too little nor too much contention on
this shared memory variable. Also the behaviour would be not too
chaotic if there's a user-tunable minimum and the other activity in
the system only controls how more memory it can steal from the global
pool on top of that.

-- 
greg



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tatsuo Ishii
Дата:
Сообщение: planstats.sgml
Следующее
От: "David G. Johnston"
Дата:
Сообщение: Re: planstats.sgml