Re: random() (was Re: New GUC to sample log queries)
| От | Fabien COELHO |
|---|---|
| Тема | Re: random() (was Re: New GUC to sample log queries) |
| Дата | |
| Msg-id | alpine.DEB.2.21.1812280808490.32444@lancre обсуждение исходный текст |
| Ответ на | Re: random() (was Re: New GUC to sample log queries) (Tom Lane <tgl@sss.pgh.pa.us>) |
| Ответы |
Re: random() (was Re: New GUC to sample log queries)
Re: random() (was Re: New GUC to sample log queries) |
| Список | pgsql-hackers |
Hello Tom,
>>> Another idea, which would be a lot less prone to breakage by
>>> add-on code, is to change drandom() and setseed() to themselves
>>> use pg_erand48() with a private seed.
>
>> The pg_erand48 code looks like crumbs from the 70's optimized for 16 bits
>> architectures (which it is probably not, but why not going to 64 bits or
>> 128 bits directly looks like a missed opportunity), its internal state is
>> 48 bits as its name implies, and its period probably around 2**48, which
>> is 2**12 better than the previous case, not an extraordinary achievement.
>
> I can't get terribly excited about rewriting that code. You're arguing
> from a "one size should fit all" perspective,
Not exactly. I'm not claiming that distinguishing parts that require
good random from others is a bad choice. I'm arguing that:
(1) from a software engineering perspective, the PRNG implementation
should hide the underlying generator name and state size.
(2) from a numerical perspective, poor seedings practice should be avoided
when possible.
(3) from a cryptographic perspective, LCG is a poor choice of fast PRNG,
which a quick look at wikipedia (mother of all knowledge) confirms.
(4) the fact that pg historical PRNG choices are well documented is not a
good justification for not improving them.
Better alternatives exist that do not cost much (eg xorshift variants,
WELL...), some of which are optimized for 64 bits architectures.
> which is exactly not the design approach we're using.
> We've already converted security-sensitive PRNG uses to use
> pg_strong_random (or if we haven't, that's a localized bug in any such
> places we missed). What remains are places where we are not so
> concerned about cryptographic strength but rather speed.
I do agree with the speed concern. I'm arguing that better quality at
speed can be achieved with better seeding practices and not using a LCG.
About costs, not counting array accesses:
- lrand48 (48 bits state as 3 uint16) is 29 ops
(10 =, 8 *, 7 +, 4 >>)
- xorshift+ (128 bits state as 2 uint64) is 13 ops
( 5 =, 0 *, 1 +, 3 >>, 4 ^)
- xororshift128+ (idem) is 17 ops
( 6 =, 0 *, 1 +, 5 >>, 3 ^, 2 |, less if rot in hardware)
- WELL512 (512 bits state as 16 uint32) is 38 ops
(11 =, 0 *, 3 +, 7 >>, 10 ^, 4 &)
probably much better, but probably slower than the current version
I'd be of the (debatable) opinion that we could use xororshift128+,
already used by various languages, even if it fails some specialized
tests.
> It does behoove us to ensure that the seed values are unpredictable and
> that a user-controllable seed isn't used for internal operations,
Sure.
> but I don't feel a need for replacing the algorithm.
Hmmm. Does it mean that you would veto any change, even if the speed
concern is addressed (i.e. faster/not slower with better quality)?
> You might argue that the SQL function drandom should be held to a higher
> standard, but we document exactly this same tradeoff for it. Users who
> want cryptographic strength are directed to pgcrypto, leaving the audience
> for drandom being people who want speed.
My point is that speed is not necessary incompatible with better quality.
Better quality should not replace strong random when needed.
--
Fabien.
В списке pgsql-hackers по дате отправления: