Re: qsort, once again

Поиск
Список
Период
Сортировка
От Dann Corbit
Тема Re: qsort, once again
Дата
Msg-id D425483C2C5C9F49B5B7A41F8944154757D68B@postal.corporate.connx.com
обсуждение исходный текст
Ответ на qsort, once again  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: qsort, once again  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
> -----Original Message-----
> From: pgsql-hackers-owner@postgresql.org [mailto:pgsql-hackers-
> owner@postgresql.org] On Behalf Of Dann Corbit
> Sent: Thursday, March 16, 2006 4:42 PM
> To: Tom Lane
> Cc: Jonah H. Harris; pgsql-hackers@postgresql.org; Jerry Sievers
> Subject: Re: [HACKERS] qsort, once again
>
> > So my feeling is we should just remove the swap_cnt code and return
to
> > the original B&M algorithm.  Being much faster than expected for
> > presorted input doesn't justify being far slower than expected for
> > other inputs, IMHO.  In the context of Postgres I doubt that
perfectly
> > sorted input shows up very often anyway.
> >
> > Comments?
>
> Checking for presorted input is O(n).
> If the input is random, an average of 3 elements will be tested.
> So adding an in-order check of the data should not be too expensive.
>
> I would benchmark several approaches and see which one is best when
used
> in-place.

Even if "hunks" of the input are sorted, the test is a very good idea.

Recall that we are sorting recursively and so we divide the data into
chunks.

Consider an example...

Quicksort of a field that contains Sex as 'M' for male, 'F' for female,
or NULL for unknown.

The median selection is going to pick one of 'M', 'F', or NULL.
After pass 1 of qsort we will have two partitions.  One partition will
have all of one type and the other partition will have the other two
types.

An in-order check will tell us that the monotone partition is sorted and
we are done with it.

Imagine also a table that was clustered but for which we have not
updated statistics.  Perhaps it is 98% sorted.  Checking for order in
our partitions is probably a good idea.

I think you could also get a good optimization if you are checking for
partitions and find a big section of the partition is not ordered (even
though the whole thing is not).  If you could perk the ordered size up
the tree, you could just add another partition to the merge list and
sort the unordered part.

In "C Unleashed" I call this idea partition discovery mergesort.


В списке pgsql-hackers по дате отправления:

Предыдущее
От: "Dann Corbit"
Дата:
Сообщение: Re: qsort, once again
Следующее
От: "Qingqing Zhou"
Дата:
Сообщение: Re: Separate BLCKSZ for data and logging