Re: [HACKERS] [WIP] Zipfian distribution in pgbench

Поиск
Список
Период
Сортировка
От Fabien COELHO
Тема Re: [HACKERS] [WIP] Zipfian distribution in pgbench
Дата
Msg-id alpine.DEB.2.20.1707191616140.4496@lancre
обсуждение исходный текст
Ответ на Re: [HACKERS] [WIP] Zipfian distribution in pgbench  (Alik Khilazhev <a.khilazhev@postgrespro.ru>)
Ответы Re: [HACKERS] [WIP] Zipfian distribution in pgbench  (Alik Khilazhev <a.khilazhev@postgrespro.ru>)
Список pgsql-hackers
Hello Alik,

> I am attaching patch v3.

> Among other things I fixed small typo in description of 
> random_exponential function in pgbench.sgml file.

Ok. Probably this typo should be committed separatly and independently.

A few comments about v3:

Patch applies cleanly, compiles, works.

About the maths: As already said, I'm not at ease with a random_zipfian 
function which does not display a (good) zipfian distribution. At the 
minimum the documentation should be clear about the approximations implied 
depending on the parameter value.

In the litterature the theta parameter seems to be often called alpha
or s (eg see https://en.wikipedia.org/wiki/Zipf%27s_law). I would suggest to
stick to "s" instead of "theta"?

About the code: looks simpler than the previous version, which is good.

Double constants should be used when the underlying type is a double,
instead of relying on implicit int to double promotion (0 -> 0.0, 1 -> 1.0).

Functions zipfZeta(n, theta) does not really computes the zeta(n) function,
so I think that a better name should be chosen. It seems to compute
H_{n,theta}, the generalized harmonic number. Idem "thetan" field in struct.

The handling of cache overflow by randomly removing one entry looks like
a strange idea. Rather remove the oldest entry?

ISTM that it should print a warning once if the cache array overflows as 
performance would drop heavily.

If the zipf cache is constant size, there is no point in using dynamic 
allocation, just declare an array...

Comments need updating: eg "theta parameter of previous execution" which
dates back when there was only one value.

There should be a comment to explain that the structure is in the thread
for thread safety.

There should be non regression tests somehow. If the "improve pgbench
tap test infrastructure" get through, things should be added there.

-- 
Fabien.



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alexander Kuzmenkov
Дата:
Сообщение: Re: [HACKERS] Proposal for CSN based snapshots
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [HACKERS] Bug in ExecModifyTable function and trigger issues for foreign tables