Re: add modulo (%) operator to pgbench
От | Fabien COELHO |
---|---|
Тема | Re: add modulo (%) operator to pgbench |
Дата | |
Msg-id | alpine.DEB.2.10.1409240916340.3672@sto обсуждение исходный текст |
Ответ на | Re: add modulo (%) operator to pgbench (Heikki Linnakangas <hlinnakangas@vmware.com>) |
Ответы |
Re: add modulo (%) operator to pgbench
|
Список | pgsql-hackers |
Hello Heikki, >> If you reject it, you can also remove the gaussian and exponential random >> distribution which is near useless without a mean to add a minimal >> pseudo-random stage, for which "(x * something) % size" is a reasonable >> approximation, hence the modulo submission. > > I'm confused. The gaussian and exponential patch was already committed. Yes. They are significant patches that really involved significant work, and which are only really useful with a complementary stupid 10 lines patch which is being rejected without understanding why it is needed. > Are you saying that it's not actually useful, and should be reverted? > That doesn't make sense to me, gaussian and exponential distributed > values seems quite useful to me in test scripts. Yes and no. Currently these distributions are achieved by mapping a continuous function onto integers, so that neighboring integers get neighboring number of draws, say with size=7: #draws 10 6 3 1 0 0 0 // some exponential distribution int drawn 0 1 2 3 4 5 6 Although having an exponential distribution of accesses on tuples is quite reasonable, the likelyhood there would be so much correlation between neighboring values is not realistic at all. You need some additional shuffling to get there. > I don't understand what that pseudo-random stage you're talking about is. Can > you elaborate? The pseudo random stage is just a way to scatter the values. A basic approach to achieve this is "i' = (i * large-prime) % size", if you have a modulo. For instance with prime=5 you may get something like: #draws 10 6 3 1 0 0 0 int drawn 0 1 2 3 4 5 6 (i) scattered 0 5 3 1 6 4 2 (i' = 5 i % 7) So the distribution becomes: #draws 10 1 0 3 0 6 0 scattered 0 1 2 3 4 5 6 Which is more interesting from a testing perspective because it removes the neighboring value correlation. A better approach is to use a hash function. "i' = hash(i) % size", although it skews the distribution some more, but the quality of the shuffling is much better than with the mult-modulo version above. Note that you need a modulo as well... I must say that I'm appaled by a decision process which leads to such results, with significant patches passed, and the tiny complement to make it really useful (I mean not on the paper or on the feature list, but in real life) is rejected... -- Fabien.
В списке pgsql-hackers по дате отправления: