Re: gaussian distribution pgbench -- splits v4
От | Fabien COELHO |
---|---|
Тема | Re: gaussian distribution pgbench -- splits v4 |
Дата | |
Msg-id | alpine.DEB.2.10.1408010905040.9457@sto обсуждение исходный текст |
Ответ на | Re: gaussian distribution pgbench -- splits v4 (Robert Haas <robertmhaas@gmail.com>) |
Ответы |
Re: gaussian distribution pgbench -- splits v4
|
Список | pgsql-hackers |
Hello, >> Version one is "k' = 1 + (a * k + b) modulo n" with "a" prime with >> respect to "n", "n" being the number of keys. This is nearly possible, >> but for the modulo operator which is currently missing, and that I'm >> planning to submit for this very reason, but probably another time. > > That's pretty crude, Yep. It is very simple, it is much better than nothing, and for a database test is may be "good enough". > although I don't object to a modulo operator. It would be nice to be > able to use a truly random permutation, which is not hard to generate > but probably requires O(n) storage, likely a problem for large scale > factors. That is indeed the actual issue in my mind. I was thinking of permutations with a formula, which are not so easy to find and may end-up looking like "(a*k+b)%n" anyway. I had the same issue for generating random data for a schema (see http://www.coelho.net/datafiller.html). > Maybe somebody who knows more math than I do (like you, probably!) can > come up with something more clever. I can certainly suggest other formula, but that does not mean beautiful code, thus would probably be rejected. I'll see. An alternative to this whole process may be to hash/modulo a non uniform random value. id = 1 + hash(some-random()) % n But the hashing changes the distribution as it adds collisions, so I have to think about how to be able to control the distribution in that case, and what hash function to use. -- Fabien.
В списке pgsql-hackers по дате отправления: