Re: gaussian distribution pgbench -- splits v4

From	Fabien COELHO
Subject	Re: gaussian distribution pgbench -- splits v4
Date	1 August 2014, 07:27:07
Msg-id	alpine.DEB.2.10.1408010905040.9457@sto
In response to	Re: gaussian distribution pgbench -- splits v4 (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: gaussian distribution pgbench -- splits v4
List	pgsql-hackers

Hello,

>> Version one is "k' = 1 + (a * k + b) modulo n" with "a" prime with 
>> respect to "n", "n" being the number of keys. This is nearly possible, 
>> but for the modulo operator which is currently missing, and that I'm 
>> planning to submit for this very reason, but probably another time.
>
> That's pretty crude,

Yep. It is very simple, it is much better than nothing, and for a database 
test it may be "good enough".
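
As an illustration only, the mapping "k' = 1 + (a * k + b) modulo n" can be 
sketched in Python as below. The function name and the values of "a", "b" 
and "n" are hypothetical; the only property taken from the discussion is 
that "a" must be coprime with "n" so that the mapping is a permutation.

```python
from math import gcd


def affine_permute(k: int, n: int, a: int, b: int) -> int:
    """Map key k in 1..n to a key in 1..n via 1 + (a*k + b) mod n."""
    if gcd(a, n) != 1:
        # Without this condition two keys can map to the same value.
        raise ValueError("a must be coprime with n")
    return 1 + (a * k + b) % n


n = 10
a, b = 7, 3  # hypothetical parameters, gcd(7, 10) == 1
mapped = [affine_permute(k, n, a, b) for k in range(1, n + 1)]

# Every key in 1..n is hit exactly once, so this is a permutation.
assert sorted(mapped) == list(range(1, n + 1))
```

The storage cost is two integers whatever the number of keys, which is the 
attraction for large scale factors; the price is that neighbouring keys 
stay regularly spaced after the mapping.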

> although I don't object to a modulo operator.  It would be nice to be 
> able to use a truly random permutation, which is not hard to generate 
> but probably requires O(n) storage, likely a problem for large scale 
> factors.

That is indeed the actual issue in my mind. I was thinking of permutations 
with a formula, which are not so easy to find and may end up looking like 
"(a*k+b)%n" anyway. I had the same issue for generating random data for a 
schema (see http://www.coelho.net/datafiller.html).
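
For comparison, here is a minimal sketch of the truly random permutation 
mentioned in the quoted text, using a Fisher-Yates shuffle. The function 
name and the seed are assumptions made for the example; the point is only 
that the whole table of n entries has to be kept in memory.

```python
import random


def random_permutation(n: int, seed: int) -> list[int]:
    """Return a shuffled list of the keys 1..n (O(n) storage)."""
    rng = random.Random(seed)  # seeded so that a run can be reproduced
    perm = list(range(1, n + 1))
    # Fisher-Yates: swap each position with a random earlier-or-equal one.
    for i in range(n - 1, 0, -1):
        j = rng.randint(0, i)
        perm[i], perm[j] = perm[j], perm[i]
    return perm


perm = random_permutation(1000, seed=42)
assert sorted(perm) == list(range(1, 1001))
```

The list holds one integer per key, so its size grows linearly with the 
number of keys, unlike a formula, which only needs its parameters.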

> Maybe somebody who knows more math than I do (like you, probably!) can 
> come up with something more clever.

I can certainly suggest other formulas, but that does not mean beautiful 
code, so they would probably be rejected. I'll see.

An alternative to this whole process may be to hash/modulo a non-uniform 
random value.
      id = 1 + hash(some-random()) % n

But the hashing changes the distribution as it adds collisions, so I have 
to think about how to be able to control the distribution in that case, 
and what hash function to use.
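
The collision issue can be seen with a small sketch. Everything named here 
is an assumption made for the example: the hash is a SplitMix64-style 
integer mixer used as a stand-in (no hash function has been chosen), and 
the non-uniform source is a clamped Gaussian rank.

```python
import random


MASK64 = 0xFFFFFFFFFFFFFFFF


def mix64(x: int) -> int:
    """SplitMix64-style finalizer, a stand-in hash for this sketch."""
    x &= MASK64
    x = ((x ^ (x >> 30)) * 0xBF58476D1CE4E5B9) & MASK64
    x = ((x ^ (x >> 27)) * 0x94D049BB133111EB) & MASK64
    return x ^ (x >> 31)


def hashed_id(value: int, n: int) -> int:
    """id = 1 + hash(value) % n, always in 1..n."""
    return 1 + mix64(value) % n


def draw_id(rng: random.Random, n: int) -> int:
    """Draw a hypothetical Gaussian rank in 1..n, then scatter it by hashing."""
    rank = int(rng.gauss(n / 2, n / 8))
    rank = min(max(rank, 1), n)
    return hashed_id(rank, n)


n = 1000
# Hash every possible rank once and count how many distinct ids come out.
distinct = len({hashed_id(k, n) for k in range(1, n + 1)})
print(f"{distinct} distinct ids out of {n} ranks")
```

Unlike a permutation, the hash followed by a modulo is not one-to-one: 
several ranks land on the same id and some ids are never reached (for an 
ideal hash, about a fraction 1/e of them), so the probabilities of 
colliding ranks add up and the resulting distribution differs from the one 
drawn.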

-- 
Fabien.
