On 2/17/19 6:33 PM, David Fetter wrote:
> On Sun, Feb 17, 2019 at 11:09:27AM -0500, Tom Lane wrote:
>> Fabien COELHO <coelho@cri.ensmp.fr> writes:
>>>> I'm trying to use random_zipfian() for benchmarking of skewed data sets,
>>>> and I ran head-first into an issue with rather excessive CPU costs.
>>
>>> If you want skewed but not especially zipfian, use exponential, which is
>>> quite cheap. Also, zipfian with a parameter > 1.0 does not have to compute
>>> the harmonic number, so the cost depends on the parameter.
>>
>> Maybe we should drop support for parameter values < 1.0, then. The idea
>> that pgbench is doing something so expensive as to require caching seems
>> flat-out insane from here. That cannot be seen as anything but a foot-gun
>> for unwary users. Under what circumstances would an informed user use
>> that random distribution rather than another far-cheaper-to-compute one?
>>
>>> ... This is why I submitted a pseudo-random permutation
>>> function, which alas does not get much momentum from committers.
>>
>> TBH, I think pgbench is now much too complex; it does not need more
>> features, especially not ones that need large caveats in the docs.
>> (What exactly is the point of having zipfian at all?)
>
> Taking a statistical perspective, Zipfian distributions violate
> assumptions that hold under uniform distributions. This matters
> because Zipf-distributed data sets are quite common in real life.
>
I don't think there's any disagreement about the value of non-uniform
distributions. The question is whether it has to be a zipfian one when
the best algorithm we know of is this expensive in some cases, or
whether an exponential distribution would be enough.
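To make the cost difference concrete, here is a minimal sketch (not pgbench's actual implementation, and the function names are mine): inverse-CDF zipfian sampling for parameter s <= 1 needs the generalized harmonic number H(n, s), an O(n) sum over the whole key range, while a truncated-exponential draw, conceptually like pgbench's random_exponential(), is O(1) per sample.

```python
import math
import random


def zipf_naive(n, s, u=None):
    """Inverse-CDF sample from Zipf(s) on 1..n.

    For s <= 1 the generalized harmonic number
    H(n, s) = sum_{k=1}^{n} 1/k^s must be computed:
    an O(n) sum, which is the cost under discussion.
    """
    if u is None:
        u = random.random()
    h = sum(1.0 / k ** s for k in range(1, n + 1))  # O(n) per call unless cached
    target = u * h
    acc = 0.0
    for k in range(1, n + 1):
        acc += 1.0 / k ** s
        if acc >= target:
            return k
    return n


def exp_skewed(n, param, u=None):
    """O(1) skewed alternative: a truncated exponential mapped to 1..n."""
    if u is None:
        u = random.random()
    # Inverse CDF of an exponential truncated to [0, 1); x is in [0, 1).
    x = -math.log(1.0 - u * (1.0 - math.exp(-param))) / param
    return 1 + int(x * n)
```

Whether that exponential skew is an acceptable substitute for a true zipfian depends on the workload being modeled, which is exactly the question above.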
regards
--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services