Re: intermittant performance problem

Поиск
Список
Период
Сортировка
От marcin mank
Тема Re: intermittant performance problem
Дата
Msg-id b1b9fac60903290124r31dc8342s603125bd00d535cf@mail.gmail.com
обсуждение исходный текст
Ответ на Re: intermittant performance problem  (Mike Charnoky <noky@nextbus.com>)
Ответы Re: intermittant performance problem  (marcin mank <marcin.mank@gmail.com>)
Список pgsql-general
I think (a part of) Your problem is that order by random() is O(N
logN) complexity, while You are after O(N) .

The solution (in pseudocode)

random_sample(resultset,K):
  result := first K rows from resultset
  resultset.scrollto(K+1)
  p = K+1
  while(resultset.hasMoreRows())
        row = resultset.current()
        resultset.advance()
        if(random() < K/p)
              replace a random element in result with row
        p = p+1
  return result


the invariant being that at each step result contains a random sample
of K elements.

It should be fairly easy to implement in plpgsql.

Greetings
Marcin

В списке pgsql-general по дате отправления:

Предыдущее
От: Eric Smith
Дата:
Сообщение: running two clusters on one machine
Следующее
От: marcin mank
Дата:
Сообщение: Re: intermittant performance problem