Re: Randomisation for ensuring nlogn complexity in quicksort

Поиск
Список
Период
Сортировка
От Robert Haas
Тема Re: Randomisation for ensuring nlogn complexity in quicksort
Дата
Msg-id CA+Tgmoa+4jZf28DQvzqVMWNA5+xeWgcqMcw+5LcDkXwdANWJ=g@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Randomisation for ensuring nlogn complexity in quicksort  (Claudio Freire <klaussfreire@gmail.com>)
Ответы Re: Randomisation for ensuring nlogn complexity in quicksort  (Claudio Freire <klaussfreire@gmail.com>)
Список pgsql-hackers
On Mon, Jul 1, 2013 at 3:54 PM, Claudio Freire <klaussfreire@gmail.com> wrote:
> On Mon, Jul 1, 2013 at 4:32 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>> This shouldn't be too complex, and should give us a fixed nlogn complexity even for wild data sets, without
affectingexisting normal data sets that are present in every day transactions. I even believe that those data sets will
alsobenefit from the above optimisation. 
>>
>> The only method of selecting a pivot for quicksort that obtain O(n lg
>> n) run time with 100% certainty is have a magical oracle inside the
>> computer that tells you in fixed time and with perfect accuracy which
>> pivot you should select.
>
> Doesn't a linear median algorithm (say median of medians) get you O(n lg n)?
>
> Granted, with a huge constant (I think 4)... but it should still be O(n lg n).

No.   Thinking about this a little more, I believe the way it works
out is that any algorithm for picking the median that guarantees that
a certain *percentage* of the tuples will be in the smaller partition
will have O(n lg n) complexity, but any algorithm that only guarantees
that a fixed *number* of tuples in the smaller partition is still
quadratic in complexity.  In the case of a median algorithm, you're
only guaranteed to have 2 elements in the smaller partition, which is
a constant.  If you take a median of medians, you're guaranteed to
have 8 elements in the smaller partition, which is bigger, but still a
constant.

The reason why this doesn't matter much in practice is because the
data distribution that causes quadratic behavior for median-of-medians
is not one which is likely to occur in the real world and will
probably only come up if chosen by an adversary who is attempting to
make your life miserable.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Joshua D. Drake"
Дата:
Сообщение: Re: Documentation/help for materialized and recursive views
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: "pg_ctl promote" exit status