Re: Randomisation for ensuring nlogn complexity in quicksort

From: Atri Sharma
Subject: Re: Randomisation for ensuring nlogn complexity in quicksort
Msg-id: CAOeZVie4yV2529e1ByW5=06By1V-00x2vzTSRxtHAP3Knpq94A@mail.gmail.com
In response to: Re: Randomisation for ensuring nlogn complexity in quicksort  (Robert Haas <robertmhaas@gmail.com>)
Responses: Re: Randomisation for ensuring nlogn complexity in quicksort  (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers
On Tue, Jul 2, 2013 at 1:02 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Sun, Jun 30, 2013 at 8:30 AM, Atri Sharma <atri.jiit@gmail.com> wrote:
>> I have been reading the recent discussion and was researching a bit, and I think that we should really go with the
>> idea of randomising the input data (if it is not completely presorted), to ensure that we do not get quadratic
>> complexity.
>
> That doesn't ensure any such thing.  It just makes it less likely.
> But we have existing guards that also make that unlikely, so I'm not
> sure what we'd be gaining.
>
>> One easy way to do that could be to take a sample of the data set, and take a pivot out of it. Still a better way
>> could be to take multiple samples which are spread over the data set, select a value from each of them, and then
>> take a cumulative pivot (median, maybe).
>
> We pretty much do that already.
>
>> This shouldn't be too complex, and should give us a fixed nlogn complexity even for wild data sets, without
>> affecting existing normal data sets that are present in every day transactions. I even believe that those data sets
>> will also benefit from the above optimisation.
>
> The only method of selecting a pivot for quicksort that obtains O(n lg
> n) run time with 100% certainty is to have a magical oracle inside the
> computer that tells you in fixed time and with perfect accuracy which
> pivot you should select.
>
> If you want to get a useful response to your emails, consider
> including a statement of what you think the problem is and why you
> think your proposed changes will help.  Consider offering a test case
> that performs badly and an analysis of the reason why.

Right, thanks for that. I will keep that in mind.

I was thinking about *mostly sorted* datasets; consider the following:

10 11 12 4 5 6 1 2

(Just off the top of my head, sorry if I missed something.)

Now, the above data set is made up of a number of rotations of a
sorted dataset, so it is mostly sorted, albeit with some disorder.

My point is that these kinds of datasets (not necessarily the above
one) can lead to a bad choice of pivot, and hence give us a complexity
that is worse than O(n log n).
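
To make the pivot problem concrete, here is a toy standalone C sketch
(illustrative only, nothing to do with our actual qsort code; median3
and count_below are just names I made up for the example). It counts
how many elements of the rotated input fall below a few candidate
pivots; a good pivot would leave roughly half on each side:

/*
 * Toy illustration: how balanced a partition do different pivot
 * choices give on the rotated input above?
 */
#include <stdio.h>

/* Median of three values, the classic cheap pivot heuristic. */
static int
median3(int a, int b, int c)
{
    if (a < b)
        return (b < c) ? b : ((a < c) ? c : a);
    else
        return (a < c) ? a : ((b < c) ? c : b);
}

/* Count how many elements of v[] are strictly below the pivot. */
static int
count_below(const int *v, int n, int pivot)
{
    int     below = 0;

    for (int i = 0; i < n; i++)
        if (v[i] < pivot)
            below++;
    return below;
}

int
main(void)
{
    int     v[] = {10, 11, 12, 4, 5, 6, 1, 2};
    int     n = sizeof(v) / sizeof(v[0]);
    int     med = median3(v[0], v[n / 2], v[n - 1]);

    printf("first element as pivot (%2d): %d of %d below\n",
           v[0], count_below(v, n, v[0]), n);
    printf("last element as pivot  (%2d): %d of %d below\n",
           v[n - 1], count_below(v, n, v[n - 1]), n);
    printf("median-of-three pivot  (%2d): %d of %d below\n",
           med, count_below(v, n, med), n);
    return 0;
}

On this input a last-element pivot splits 1 versus 7, while the
median-of-three of first, middle and last gives 3 versus 5, which is
why sampling helps on mostly sorted data but still gives no hard
guarantee.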

I know we have a check for presorted inputs, but I wasn't sure how we
deal with mostly sorted inputs, as quicksort likes disorder in its
input.
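
For clarity, the kind of check I mean is just a single linear scan
that gives up at the first out-of-order pair, roughly like this (a
sketch under my own naming, not the actual code in our qsort):

#include <stdbool.h>
#include <stddef.h>

/*
 * Sketch of a presorted-input check: one pass that stops at the first
 * inversion.  A fully sorted input is detected in O(n), but a mostly
 * sorted input such as the rotated example above fails at the first
 * "step down" and gets no benefit from the check.
 */
static bool
input_is_presorted(const int *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
    {
        if (a[i - 1] > a[i])
            return false;       /* first inversion found */
    }
    return true;
}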

I agree with Claudio's idea. One thing to keep in mind is that we
don't do quicksort for large data sets anyway, and switch to external
merge sort instead. So we could think of using the median-of-medians
algorithm for this purpose.
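
For reference, a rough sketch of median-of-medians pivot selection
(illustrative only, not proposed code; isort and median_of_medians are
names I made up, and the sketch rearranges the array while selecting,
so a real version would partition around the returned value):

#include <stddef.h>

/* Simple insertion sort, used to get the median of each small group. */
static void
isort(int *a, size_t n)
{
    for (size_t i = 1; i < n; i++)
    {
        int     key = a[i];
        size_t  j = i;

        while (j > 0 && a[j - 1] > key)
        {
            a[j] = a[j - 1];
            j--;
        }
        a[j] = key;
    }
}

/*
 * Median-of-medians pivot selection: sort groups of five, collect each
 * group's median at the front of the array, and recurse on the medians.
 * The returned value is guaranteed to be larger than roughly 30% and
 * smaller than roughly 30% of the elements, which is enough to bound
 * quicksort at O(n log n), at the price of a larger constant factor.
 */
static int
median_of_medians(int *a, size_t n)
{
    size_t  nmedians = 0;

    if (n <= 5)
    {
        isort(a, n);
        return a[n / 2];
    }

    for (size_t i = 0; i < n; i += 5)
    {
        size_t  len = (n - i < 5) ? n - i : 5;

        isort(a + i, len);
        a[nmedians++] = a[i + len / 2];     /* stash this group's median */
    }

    return median_of_medians(a, nmedians);
}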

Another thing I would like to investigate is the performance of our
quicksort implementation (and maybe of external merge sort as well) on
multiword keys.
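
What I have in mind is the cost of comparators that have to look at
several words before they can decide, along these lines (the struct
and the function name are made up for illustration, not our actual
key format):

#include <stdint.h>

/* Hypothetical multiword key: most significant word first. */
typedef struct MultiWordKey
{
    uint64_t    words[4];
} MultiWordKey;

/*
 * Compare two multiword keys word by word.  The more leading words the
 * keys share, the more work every single comparison inside quicksort or
 * external merge sort has to do, which is the effect I want to measure.
 */
static int
multiword_key_cmp(const MultiWordKey *a, const MultiWordKey *b)
{
    for (int i = 0; i < 4; i++)
    {
        if (a->words[i] != b->words[i])
            return (a->words[i] < b->words[i]) ? -1 : 1;
    }
    return 0;
}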

--
Regards,

Atri
l'apprenant


