Re: random() generates collisions too early

Поиск
Список
Период
Сортировка
От Honza Horak
Тема Re: random() generates collisions too early
Дата
Msg-id 5266726E.5030300@redhat.com
обсуждение исходный текст
Ответ на Re: random() generates collisions too early  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: random() generates collisions too early  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Список pgsql-bugs
On 10/21/2013 05:14 PM, Tom Lane wrote:
> Heikki Linnakangas <hlinnakangas@vmware.com> writes:
>> On 18.10.2013 14:55, Honza Horak wrote:
>>> The results covered only 181383 distinct values, and 68 values
>>> repeated four or five times each. We should at least consider using a
>>> higher-entropy seed.
>
>> Interesting. PostgreSQL's random() function just calls the underlying
>> libc random() function. I assume you tested this on with Linux and glibc.
>
> I agree with the theory that this probably isn't the fault of the random()
> function as such, but with our code to reset the random seed when forking
> a postmaster child process.  Note that the test case is only examining the
> first random value created in each process.  So basically what this is
> measuring is the number of different seed values we use.
>
>             regards, tom lane
>

After playing a bit more with random() value and the seed defined in
src/backend/postmaster/postmaster.c:4038 it seems really like the seed
doesn't have very good characteristic. The PID number used there ensures
the value to be different in child processes, but when only xor-ed with
usecs, it doesn't enlarge the total data domain of the seed, which stays
1M values. Then having in mind the birthday paradox and probably not
ideal PRNG, it doesn't seem unrealistic to get two same random numbers
in only hundreds/thousands of tries.

In order to enlarge possible data domain of the seed we can include sec
count as well and shift some xor-ed variables to use the whole range or
unsigned int. The attached patch uses a way that gave me much better
results (collision found approx. after a hundred thousands of values):

-       srandom((unsigned int) (MyProcPid ^ usecs));
+       srandom((unsigned int) (MyProcPid ^ (usecs << 12) ^ secs));

I'm also thinking of a reproducer -- does the test-suite allow to
"reconnect" the client during a test?

Regards,
Honza

Вложения

В списке pgsql-bugs по дате отправления:

Предыдущее
От: Honza Horak
Дата:
Сообщение: Re: random() generates collisions too early
Следующее
От: Heikki Linnakangas
Дата:
Сообщение: Re: random() generates collisions too early