Re: qsort again (was Re: [PERFORM] Strange Create Index

Поиск
Список
Период
Сортировка
От Craig A. James
Тема Re: qsort again (was Re: [PERFORM] Strange Create Index
Дата
Msg-id 43F4A7D8.4050907@modgraph-usa.com
обсуждение исходный текст
Ответ на Re: qsort again (was Re: [PERFORM] Strange Create Index  (Markus Schaber <schabi@logix-tt.com>)
Ответы Re: qsort again (was Re: [PERFORM] Strange Create Index
Список pgsql-hackers
Markus Schaber wrote:
> Ron wrote:
>>...and of course if you know enough about the data to be sorted so as to
>>constrain it appropriately, one should use a non comparison based O(N)
>>sorting algorithm rather than any of the general comparison based
>>O(NlgN) methods.
>
> Sounds interesting, could you give us some pointers (names, URLs,
> papers) to such algorithms?

Most of these techniques boil down to good ol' "bucket sort".  A simple example: suppose you have a column of integer
percentages,range zero to 100.  You know there are only 101 distinct values.  So create 101 "buckets" (e.g. linked
lists),make a single pass through your data and drop each row's ID into the right bucket, then make a second pass
throughthe buckets, and write the row ID's out in bucket order.  This is an O(N) sort technique. 

Any time you have a restricted data range, you can do this.  Say you have 100 million rows of scientific results known
tobe good to only three digits -- it can have at most 1,000 distinct values (regardless of the magnitude of the
values),so you can do this with 1,000 buckets and just two passes through the data. 

You can also use this trick when the optimizer is asked for "fastest first result."  Say you have a cursor on a column
ofnumbers with good distribution.  If you do a bucket sort on the first two or three digits only, you know the first
"page"of results will be in the first bucket.  So you only need to apply qsort to that first bucket (which is very
fast,since it's small), and you can deliver the first page of data to the application.  This can be particularly
effectivein interactive situations, where the user typically looks at a few pages of data and then abandons the search.
 

I doubt this is very relevant to Postgres.  A relational database has to be general purpose, and it's hard to give it
"hints"that would tell it when to use this particular optimization. 

Craig

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: qsort again (was Re: [PERFORM] Strange Create
Следующее
От: Ron
Дата:
Сообщение: Re: qsort again (was Re: [PERFORM] Strange Create