Re: pgbench randomness initialization
From | Fabien COELHO |
---|---|
Subject | Re: pgbench randomness initialization |
Date | |
Msg-id | alpine.DEB.2.10.1604071242420.11001@sto |
In reply to | Re: pgbench randomness initialization (Andres Freund <andres@anarazel.de>) |
List | pgsql-hackers |
Hello Andres,

> If you run the test for longer... Or explicitly iterate over IVs. At the
> very least we need to make pgbench output the IV used, to have some
> chance of repeating tests.

Note that I'm not against providing a way to repeat tests "exactly", and I have suggested two means: an environment variable and/or an option.

> [...] That comparison pretty much invalidates any point you're making,
> it's that bad.

At least it is simple, if simplistic. Here is another one: I knew a financial institution which needed to evaluate the VaR (value at risk) of exotic financial products every night. They relied on Monte Carlo (MC) simulation for that. Alas, it was not converging quickly enough and the results were unstable, so they took your advice: they froze the seed. Day after day the results were mostly the same, the VaR was stable from one morning to the next, management was happy, the risks were under control... That was in the mid-2000s :-)

>> However, from a statistical perspective this is just heresy: you may make a
>> change which improves one given run at the expense of all possible others
>> and you would not know it. Say for instance that there are two different
>> behaviors depending on something; then you will check against only one
>> of them.
>
> Meh. That assumes that we're doing a huge number of pgbench runs;

A number of runs, not necessarily a "huge" one. Or averaging a lot of intermediate values and taking a hard look at the distribution, not just the final tps number.

> but usually people do maybe a handful. Tops. If you're trying to defend
> against scenarios like that you need to design your tests so that you'll
> encounter such problems by running longer.

People usually do a lot of things; that does not mean it is "right".

>> So I have no mathematical doubt that changing the seed is the right
>> default setting, thus I think that the current behavior is fine.
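A minimal sketch of the compromise discussed above, assuming a hypothetical environment variable named `PGBENCH_SEED` (not something pgbench is claimed to read): by default each run draws a fresh seed, an explicit seed can be supplied to repeat a run, and the seed in use is always reported so that any particular run can be replayed later. A command-line option could carry the same value; no flag name is assumed here.

```python
import os
import random
import time

# Hypothetical variable name, for illustration only.
SEED_ENV_VAR = "PGBENCH_SEED"


def choose_seed(environ=None) -> int:
    """Return the explicitly configured seed if there is one, else a fresh seed."""
    env = os.environ if environ is None else environ
    value = env.get(SEED_ENV_VAR)
    if value is not None:
        # Explicit seed: the run can be repeated "exactly".
        return int(value)
    # Default: a different seed on every run, mixing the clock and the PID.
    return time.time_ns() ^ os.getpid()


def make_rng(environ=None) -> random.Random:
    """Build the run's random generator and report the seed that was used."""
    seed = choose_seed(environ)
    # Printing the seed keeps even a "fresh seed" run reproducible after the fact.
    print(f"random seed: {seed}")
    return random.Random(seed)


if __name__ == "__main__":
    rng = make_rng()
    print([rng.randint(1, 100) for _ in range(5)])
```

Running it twice without the variable gives two different seeds. Re-running with `PGBENCH_SEED` set to a printed value replays that exact random stream. This keeps the varying seed as the default while still allowing controlled repetition.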
>> However I'm okay if someone wants to control the randomness for some
>> reason (maybe having "less sure" results, but quickly), so it could be
>> allowed somehow.
>
> There might be some statistics arguments,

Yep, there are.

> but I think they're pretty ignoring reality.

Hmmm. If reality wants to ignore mathematics, it usually loses, so this will not be with my blessing. Note that as a committer you do not need me to freeze the seed. I'm just providing an opinion backed by mathematical proofs.

--
Fabien.
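A toy illustration of the "two different behaviors" argument, not pgbench code. `simulated_run` is a hypothetical stand-in for one benchmark run. An early random draw decides whether the run takes a slow path (assumed probability 0.3) or a fast path, and the per-transaction noise is an assumed Gaussian. With a frozen seed every run lands in the same regime, so results look perfectly stable. Varying the seed exposes both regimes and the true spread.

```python
import random
import statistics


def simulated_run(seed: int, n_tx: int = 1000) -> float:
    """Hypothetical benchmark run returning a fake 'tps' figure."""
    rng = random.Random(seed)
    # An early random choice decides which of two behaviors this run exhibits.
    slow_path = rng.random() < 0.3
    base = 800.0 if slow_path else 1000.0
    # Per-transaction noise around the regime's base rate.
    samples = [base + rng.gauss(0.0, 20.0) for _ in range(n_tx)]
    return statistics.fmean(samples)


def fixed_seed_runs(seed: int, k: int) -> list[float]:
    """k runs that all reuse the same frozen seed."""
    return [simulated_run(seed) for _ in range(k)]


def varied_seed_runs(first_seed: int, k: int) -> list[float]:
    """k runs, each with its own seed."""
    return [simulated_run(first_seed + i) for i in range(k)]


if __name__ == "__main__":
    fixed = fixed_seed_runs(seed=1234, k=10)
    varied = varied_seed_runs(first_seed=0, k=10)
    # The frozen seed yields one value repeated; it hides the other regime.
    print("fixed seed, distinct results:", sorted({round(x) for x in fixed}))
    # Varying seeds shows the bimodal distribution the single run missed.
    print("varied seeds:", sorted(round(x) for x in varied))
    print("stdev over varied seeds:", round(statistics.stdev(varied), 1))
```

Any change tuned against the frozen-seed run only optimizes one of the two regimes. Looking at the distribution over several seeds, rather than one final tps number, is what reveals the trade-off.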