Re: General purpose hashing func in pgbench

Поиск
Список
Период
Сортировка
От Ildar Musin
Тема Re: General purpose hashing func in pgbench
Дата
Msg-id 960a8096-dde5-5175-7d7c-ef151086cbee@postgrespro.ru
обсуждение исходный текст
Ответ на Re: General purpose hashing func in pgbench  (Fabien COELHO <coelho@cri.ensmp.fr>)
Ответы Re: General purpose hashing func in pgbench
Список pgsql-hackers
Hello Fabien,


17/01/2018 10:52, Fabien COELHO пишет:

>> Here is a new version of patch. I've splitted it into two parts. The
>> first one is almost the same as v4 from [1] with some refactoring.
>> The second part introduces random_seed variable as you proposed.
>
> Patch 1 applies. Compilations fails, there are two "hash_seed"
> declared in "pgbench.c".
>
> Patch 2 applies cleanly on top of the previous patch and compiles,
> because the variable is removed...
>
> If an ":hash_seed" pgbench variable is used, ISTM that there is no
> need for a global variable at all, so the two patches are going back
> and forth, which is unhelpful. ISTM better to provide just one
> combined patch for the feature.
>
> If the hash_seed variable really needs to be kept, it should be an
> "int64" variable, like other pgbench values.
>
> The len < 1 || len > 2 is checked twice, once in the "switch", on in
> an "if" just after the "switch". Once is enough.

I totally messed up doing git rebase and didn't double check the code.
*facepalm* There shouldn't be hash_seed variable and the second 'len < 1
|| len > 2' check. Sorry for that, fixed in the attached patch.

> Calling random just usually initializes about 31 bits, so random
> should be called 2 or maybe 3 times? Or maybe use the internal getrand
> which has 48 pseudorandom bits?
Done. I used code from get_random_uint64() as an example.
>
> For me "random seed" is what is passed to srandom, so the variable
> should rather be named hash_seed because there could also be a random
> seed (actually, there is in another parallel patch:-). Moreover, this
> seed may or may not be random if set, so calling it "random_seed" is
> not desirable.
>
My intention was to introduce seed variable which potentially could be
used in different contexts, not only for hash functions. I renamed it to
'hash_seed' for now. But what do you think about naming it simply 'seed'
or choosing some other general name?
>> I didn't do the executor simplification thing yet because I'm a
>> little concerned about inventive users, who may want to change
>> random_seed variable in runtime (which is possible since pgbench
>> doesn't have read only variables aka constants AFAIK).
>
> If the user choses to overide hash_seed in their script, it is their
> decision, the documentation has only to be clear about :hash_seed
> being the default seed. I see no clear reason to work around this
> possibility by evaluating the seed at parse time, especially as the
> variable may not have its final value yet depending on the option
> order. I'd suggest to just use make_variable("hash_seed") for the
> default second argument and simplify the executor.
That is a great idea, I didn't see that possibility. Done.
>
> The seed variable is not tested explicitely in the script, you could add
> a "hash(5432) == hash(5432, :hash_seed)" for instance.
>
> It would be nice if an native English speaker could proofread the
> documentation text. I'd suggest: "*an* optional seed parameter". "In
> case *the* seed...". "<literal>:hash_seed</literal>". "shared for" ->
> "shared by". "following listing" -> "following pgbench script". "few
> accounts generates" -> "few accounts generate".
>
Done as well.
> For the document example, I'd use larger values for the random &
> modulo, eg 100000000 and 1000000. The drawback is that zipfian does a
> costly computation of the generalized harmonic number when the
> parameter is lower than 1.0. For cities, the parameter found by Zipf
> was 1.07 (says Wikipedia). Maybe use this historical value. Or maybe
> use an exponential distribution in the example.
>
Changed parameter to 1.07.

Thanks!

-- 
Ildar Musin
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company 


Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Claudio Freire
Дата:
Сообщение: Re: [HACKERS] Vacuum: allow usage of more than 1GB of work mem
Следующее
От: Geoff Winkless
Дата:
Сообщение: Re: proposal: alternative psql commands quit and exit