Optimizing DISTINCT with LIMIT

From: tmp
Subject: Optimizing DISTINCT with LIMIT
Date:
Msg-id: gh8m5v$7oj$1@news.hub.org
Replies: Re: Optimizing DISTINCT with LIMIT  (Gregory Stark <stark@enterprisedb.com>)
List: pgsql-hackers

As far as I understand, the following query

  SELECT DISTINCT foo FROM bar LIMIT baz

is executed by first sorting the input and then traversing the sorted data,
ensuring uniqueness of the output and stopping when the LIMIT threshold is
reached. Furthermore, the sort step itself has to traverse the entire input
at least once.
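
To make the cost concrete, here is a minimal Python sketch of the sort-then-unique strategy described above. It is only an illustration of the idea, not PostgreSQL's executor code; the function name distinct_limit_sorted and the in-memory list standing in for the table are assumptions made for the example.

```python
from typing import Iterable, List


def distinct_limit_sorted(rows: Iterable[int], limit: int) -> List[int]:
    """Sort-then-unique: the whole input is read and sorted before any row is emitted."""
    out: List[int] = []
    if limit <= 0:
        return out
    # sorted() must consume every input row, regardless of how small limit is.
    for value in sorted(rows):
        # In sorted order, duplicates are adjacent, so comparing with the
        # last emitted value is enough to guarantee uniqueness.
        if out and value == out[-1]:
            continue
        out.append(value)
        if len(out) >= limit:
            break  # LIMIT reached, but the sort has already paid for the full input
    return out


print(distinct_limit_sorted([3, 1, 3, 2, 1], 2))  # [1, 2]
```

Even with a limit of 2, every input row is read and sorted before the first output row can be produced.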

Now, if the input is large but the LIMIT threshold is small, this
sorting step may increase the query time unnecessarily, so here is a
suggestion for an optimization:

  If the input is "sufficiently" large and the LIMIT threshold is
  "sufficiently" small, maintain the DISTINCT output by hashing while
  traversing the input, and stop when the LIMIT threshold is reached.

No sorting is required, and the input is read *at* *most* once.
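
The proposal can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a patch for the PostgreSQL executor: the function name distinct_limit_hashed is hypothetical, a Python set stands in for the hash table, and the values are assumed to be hashable.

```python
from typing import Hashable, Iterable, List


def distinct_limit_hashed(rows: Iterable[Hashable], limit: int) -> List[Hashable]:
    """Hash-based DISTINCT with early exit: stop reading once limit distinct rows are found."""
    out: List[Hashable] = []
    if limit <= 0:
        return out
    seen = set()  # hash table of values already emitted
    for value in rows:
        if value in seen:
            continue  # duplicate, skip it
        seen.add(value)
        out.append(value)
        if len(out) >= limit:
            break  # LIMIT reached: the rest of the input is never read
    return out


print(distinct_limit_hashed([3, 1, 3, 2, 1], 2))  # [3, 1]
```

The rows come out in input order rather than sorted order, which is acceptable because the query has no ORDER BY. The hash table never holds more than limit entries, so memory use is bounded by the LIMIT threshold rather than by the input size.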

Use case: Websites that need to present small samples of huge query results fast.

