Обсуждение: BUG #12589: Poor randomness from random() with some seeds; poor resolution

Поиск
Список
Период
Сортировка

BUG #12589: Poor randomness from random() with some seeds; poor resolution

От
pgsql-004@personal.formauri.es
Дата:
The following bug has been logged on the website:

Bug reference:      12589
Logged by:          Pedro Gimeno
Email address:      pgsql-004@personal.formauri.es
PostgreSQL version: 9.1.13
Operating system:   Debian Wheezy
Description:

postgres=> select version();
                                        version

----------------------------------------------------------------------------------------
 PostgreSQL 9.1.13 on i686-pc-linux-gnu, compiled by gcc (Debian 4.7.2-5)
4.7.2, 32-bit
(1 row)

postgres=> select setseed(1);
 setseed
---------

(1 row)

postgres=> -- formatted as array for compactness of output in email
postgres=> select array(select floor(random()*40) from generate_series(1,
40));
                                                         ?column?


---------------------------------------------------------------------------------------------------------------------------


{19,39,19,19,39,19,39,39,39,39,19,19,19,19,39,39,39,39,19,39,39,39,39,19,39,19,39,19,39,19,39,19,19,19,39,19,38,38,19,38}
(1 row)

postgres=> select setseed(-1);
 setseed
---------

(1 row)

postgres=> select array(select floor(random()*40) from generate_series(1,
40));
                                              ?column?

-----------------------------------------------------------------------------------------------------

{20,0,20,20,0,20,0,0,0,0,20,20,20,20,0,0,0,0,20,0,0,0,0,20,0,20,0,20,0,20,0,20,20,20,0,20,1,1,20,1}
(1 row)

It gets somewhat better as things progress, but the start of the sequence
exhibits very clear non-randomness.

The cause seems to be GNU libc:

$ cat x.c
#include <stdio.h>
#include <stdlib.h>

main()
{
    int i;
    srandom(0x7FFFFFFF);
    for (i = 0; i < 40; i++)
        printf(i?",%d":"%d", (random()/(RAND_MAX/40)));
    puts("");
}

$ make x
cc     x.c   -o x

$ ./x
19,39,19,19,39,19,39,39,39,39,19,19,19,19,39,39,39,39,19,39,39,39,39,19,39,19,39,19,39,19,39,19,19,19,39,19,38,38,19,38

$ dpkg -l libc6 | grep libc6
ii  libc6:i386                                                  2.17-3
                        i386         Embedded GNU C Library: Shared
libraries

PostgreSQL should probably use its own pseudo-random number generator and
not depend on the system's, as there's scholar literature (e.g. Knuth's
TAOCP) indicating that many systems' default generator has poor quality. Any
replacement PRNG should preferably be cross-platform compatible, thus
providing sequence repeatability across platforms, modulo float rounding
errors. A fast and more reliable (non-crypto) PRNG such as Mersenne Twister
or WELL is not hard to implement and they tend to be fast enough for most
purposes.

On top on that, the returned value does not really have full float8
precision:

postgres=> select mod((random()*2147483648)::numeric(14,4), 1);
  mod
--------
 0.0000
(1 row)

(the latter output happens always no matter how many times it's tried).

Re: BUG #12589: Poor randomness from random() with some seeds; poor resolution

От
Tom Lane
Дата:
pgsql-004@personal.formauri.es writes:
> [ setseed(1) behaves poorly on glibc platforms ]
> PostgreSQL should probably use its own pseudo-random number generator and
> not depend on the system's,

This is not a bug; it's a feature request, and a rather poorly grounded
one at that.  If the user uses setseed(), he probably cares about getting
the same sequence every time, and would not thank us for rolling in a
different implementation with different behavior.

(You could try griping to the glibc folk for providing a poor random()
function on your platform, but I suspect they'd give largely the same
answer.)

Having said that, I think that most of your problem here stems from using
an extremal value for the argument of setseed().  I get saner-looking
behavior wih a seed of 0 or 0.1, or even 0.999999.

We could possibly redefine setseed to take a range of -1 < x < 1 rather
than -1 <= x <= 1, although that might break things unnecessarily for
users on other platforms where srandom() doesn't have this odd corner
case.

            regards, tom lane

PS: also, if you have need of something better than random(), you could
consider using pgcrypto's random number generator, or writing your own
as an extension.

Re: BUG #12589: Poor randomness from random() with some seeds; poor resolution

От
Pedro Gimeno
Дата:
Tom Lane wrote, On 2015-01-19 16:22:
> pgsql-004@personal.formauri.es writes:
>> [ setseed(1) behaves poorly on glibc platforms ]
>> PostgreSQL should probably use its own pseudo-random number generator and
>> not depend on the system's,
>
> This is not a bug; it's a feature request, and a rather poorly
> grounded one at that.

It's a purportedly random sequence that turns out not to be random, with
not even a caveat in the manual. That's a bug.

> If the user uses setseed(), he probably cares about getting
> the same sequence every time, and would not thank us for rolling in a
> different implementation with different behavior.

That's a good argument for not backporting such a change and documenting
the caveats instead in current versions. On the other hand, migrating a
database to a different platform, or sharing test data with people in
other platforms, can also bite that user by producing a different
sequence. I ran into this issue myself when I started to prepare a
reproducible test case for a problem in an external application using
randomized data with known values. Instead of sending a short file that
generates the data at random and tweaks the necessary values to trigger
the problem, I may be forced to send a ~1Gb file because of this issue.

> Having said that, I think that most of your problem here stems from using
> an extremal value for the argument of setseed().  I get saner-looking
> behavior wih a seed of 0 or 0.1, or even 0.999999.

Me too, but my choice of setseed(1) should have been just as good as
any, according to the manual. Also it doesn't solve the bigger problem
of platform dependency I'm having. Relying on the system's generator is
known to be problematic. The least-effort approach is to tweak the
manual to add the corresponding caveats, but that's a poor solution.

> PS: also, if you have need of something better than random(), you could
> consider using pgcrypto's random number generator, or writing your own
> as an extension.

pgcrypto doesn't have seeding, so neither of these solutions solves this
problem except if we force the other party to install that extension
too, which isn't really a viable solution.

Re: BUG #12589: Poor randomness from random() with some seeds; poor resolution

От
Tom Lane
Дата:
Pedro Gimeno <pgsql-004@personal.formauri.es> writes:
> Tom Lane wrote, On 2015-01-19 16:22:
>> This is not a bug; it's a feature request, and a rather poorly
>> grounded one at that.

> It's a purportedly random sequence that turns out not to be random, with
> not even a caveat in the manual. That's a bug.

It is absolutely arguable that this is a glibc bug, so I encourage you
to submit a bug report to them and see what happens.  However, Postgres'
random() function is clearly documented to be just a thin layer over the
platform's random() function:

    The characteristics of the values returned by random() depend on
    the system implementation.

That seems to me to be a sufficient "caveat".

            regards, tom lane

Re: BUG #12589: Poor randomness from random() with some seeds; poor resolution

От
Pedro Gimeno
Дата:
Tom Lane wrote, On 2015-01-20 03:59:
>     The characteristics of the values returned by random() depend on
>     the system implementation.
>
> That seems to me to be a sufficient "caveat".

That caveat is not in the 9.1 manual which is the version I reported
this issue against.

Can it be backported to the versions of the manual where it applies?

Re: BUG #12589: Poor randomness from random() with some seeds; poor resolution

От
Tom Lane
Дата:
Pedro Gimeno <pgsql-004@personal.formauri.es> writes:
> Tom Lane wrote, On 2015-01-20 03:59:
>> The characteristics of the values returned by random() depend on
>> the system implementation.
>>
>> That seems to me to be a sufficient "caveat".

> That caveat is not in the 9.1 manual which is the version I reported
> this issue against.

Good point.  It looks like Heikki added that text in 9.4, and didn't
back-patch.

> Can it be backported to the versions of the manual where it applies?

Done.

            regards, tom lane

Re: BUG #12589: Poor randomness from random() with some seeds; poor resolution

От
Tom Lane
Дата:
Pedro Gimeno <pgsql-004@personal.formauri.es> writes:
> Tom Lane wrote, On 2015-01-21 03:23:
>> Done.

> Thank you. As I said I consider that a poor solution in the long term
> (though probably a necessary one for current branches, for the reasons
> you stated). Would a patch in this area have any chance?

Personally I see no reason whatsoever to replace random(), and multiple
reasons not to.  If you want a random-number generator with different
properties from what libc provides, write an extension.

            regards, tom lane

Re: BUG #12589: Poor randomness from random() with some seeds; poor resolution

От
Pedro Gimeno
Дата:
Tom Lane wrote, On 2015-01-21 03:23:
>> Can it be backported to the versions of the manual where it applies?
>
> Done.

Thank you. As I said I consider that a poor solution in the long term
(though probably a necessary one for current branches, for the reasons
you stated). Would a patch in this area have any chance?