Re: CPU costs of random_zipfian in pgbench
From | Fabien COELHO
---|---
Subject | Re: CPU costs of random_zipfian in pgbench
Date |
Msg-id | alpine.DEB.2.21.1903242006310.9939@lancre
In reply to | Re: CPU costs of random_zipfian in pgbench (Tom Lane <tgl@sss.pgh.pa.us>)
Responses | Re: CPU costs of random_zipfian in pgbench
List | pgsql-hackers
Hello Tom,

>>> If this is done, some people with zipfian distribution that currently
>>> work might be unhappy.
>
>> After giving it some thought, I think that this cannot be fully fixed
>> for 12.
>
> Just to clarify --- my complaint about "over engineering" referred to
> the fact that a cache exists at all; fooling around with the overflow
> behavior isn't really going to answer that.

Having some cache really makes sense for s < 1, AFAICS.

> The bigger picture here is that a benchmarking tool that contains its
> own performance surprises is not a nice thing to have.

Hmmm. It seems that it depends. Sometimes self-inflicted wounds are the
soldier's responsibility; sometimes they must be prevented.

> It's not very hard to imagine somebody wasting a lot of time trying to
> explain weird results, only to find out that the cause is unstable
> performance of random_zipfian. Or worse, people might draw totally
> incorrect conclusions if they fail to drill down enough to realize that
> there's a problem in pgbench rather than on the server side.

Yep, benchmarking is tougher than it looks: it is very easy to get it
wrong without knowing, whatever tool you use.

>> Given the constraint of Jim Gray's approximated method for s in (0, 1),
>> which really does zipfian for the first two integers and then uses an
>> exponential approximation, the only approach is that the parameters must
>> be computed in a partial eval preparation phase before the bench code is
>> run. This means that only (mostly) constants would be allowed as
>> parameters when s is in (0, 1), but I think that this is acceptable
>> because anyway the method fundamentally requires it.
>
> Yeah, if we could store all the required harmonic numbers before the
> test run timing starts, that would address the concern about stable
> performance. But I have to wonder whether zipfian with s < 1 is useful
> enough to justify so much code.

I do not know about that.
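The precomputation under discussion can be illustrated with a rough Python sketch of Gray et al.'s approximation (function and variable names here are illustrative, not pgbench's actual identifiers): the generalized harmonic number `zetan` is the O(n) quantity that has to be computed, and cached, before the timed run starts.

```python
import random

def harmonic(n: int, s: float) -> float:
    """Generalized harmonic number H(n, s) = sum_{i=1..n} i^(-s).
    This O(n) sum is the expensive part: it must be computed (and
    cached) before benchmark timing starts."""
    return sum(i ** -s for i in range(1, n + 1))

def zipf_gray(n: int, s: float, zetan: float, rng: random.Random) -> int:
    """One zipfian draw in [1, n] for s in (0, 1), following Gray et
    al.'s approximation: exact probabilities for ranks 1 and 2, then
    an exponential approximation for the tail."""
    alpha = 1.0 / (1.0 - s)
    beta = 0.5 ** s
    eta = (1.0 - (2.0 / n) ** (1.0 - s)) / (1.0 - harmonic(2, s) / zetan)
    u = rng.random()
    uz = u * zetan
    if uz < 1.0:
        return 1
    if uz < 1.0 + beta:
        return 2
    return 1 + int(n * (eta * u - eta + 1.0) ** alpha)
```

In this scheme `zetan` (and the derived `alpha`, `beta`, `eta`) are fixed once per (n, s) pair; recomputing the harmonic number on every draw would put the O(n) cost back inside the timed loop, which is exactly the performance surprise being debated.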
I do not think that Jim Gray chose to invent an approximated method for
s < 1 because he thought it was fun. I think he did it because his
benchmark data required it. If you need it, you need it. If you do not
need it, you do not care.

>> The other attached patch illustrates what I call poor performance for
>> stupid parameters (no point in doing zipfian on 2 integers…):
>> ...
>> Maybe the implementation could impose that s is at least 1.001 to
>> avoid the lower performance?
>
> I was wondering about that too. It seems like it'd be a wise idea to
> further constrain s and/or n to ensure that the s > 1 code path doesn't
> do anything too awful ...

Yep. The attached version enforces s >= 1.001, which avoids the worst
cost of iterating, according to my small tests.

> unless someone comes up with a better implementation that has stable
> performance without such constraints.

Hmmm…

-- 
Fabien.
Attachments