Re: random() (was Re: New GUC to sample log queries)

From: Tom Lane
Subject: Re: random() (was Re: New GUC to sample log queries)
Date:
Msg-id: 10913.1546038905@sss.pgh.pa.us
In reply to: Re: random() (was Re: New GUC to sample log queries)  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses: Re: random() (was Re: New GUC to sample log queries)  (Tom Lane <tgl@sss.pgh.pa.us>)
List: pgsql-hackers
I wrote:
> Thomas Munro <thomas.munro@enterprisedb.com> writes:
>> +1, but I wonder if just separating them is enough.  Is our seeding
>> algorithm good enough for this new purpose?  The initial seed is 100%
>> predictable to a logged in user (it's made from the backend PID and
>> backend start time, which we tell you), and not even that hard to
>> guess from the outside, so I think Coverity's warning is an
>> understatement in this case.  Even if we separate the PRNG state used
>> for internal stuff so that users can't clobber its seed from SQL,
>> wouldn't it be possible to predict which statements will survive the
>> log sampling filter given easily available information and a good
>> guess at how many times random() (or whatever similar thing) has been
>> called so far?

> Yeah, that's a good point.  Maybe we should upgrade the per-process
> seed initialization to make it less predictable.  I could see expending
> a call of the strong RNG to contribute some more noise to the seeds
> selected in InitProcessGlobals().

Here's a simple patch to do so.

Looking at this, I seem to remember that we considered doing exactly this
a while ago, but refrained because there was concern about depleting the
system's reserve of entropy if we have a high backend spawn rate, and it
didn't seem like there was a security reason to insist on unpredictable
random() results.  However, the log-sampling patch destroys the latter
argument.  As for the former argument, I'm not sure how big a deal that
really is.  Presumably, the act of spawning a backend would itself
contribute some more entropy to the pool (particularly if a network
connection is involved), so the depletion problem might be fictitious
in the first place.  Also, a few references I consulted, such as the
Linux urandom(4) man page, suggest that even in a depleted-entropy
state the results of reading /dev/urandom should be random enough
for all but the very strictest security requirements.

Thoughts?

            regards, tom lane

diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index eedc617..2d5a0ac 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*************** ClosePostmasterPorts(bool am_syslogger)
*** 2520,2530 ****
  /*
   * InitProcessGlobals -- set MyProcPid, MyStartTime[stamp], random seeds
   *
!  * Called early in every backend.
   */
  void
  InitProcessGlobals(void)
  {
      MyProcPid = getpid();
      MyStartTimestamp = GetCurrentTimestamp();
      MyStartTime = timestamptz_to_time_t(MyStartTimestamp);
--- 2520,2532 ----
  /*
   * InitProcessGlobals -- set MyProcPid, MyStartTime[stamp], random seeds
   *
!  * Called early in the postmaster and every backend.
   */
  void
  InitProcessGlobals(void)
  {
+     unsigned int rseed;
+
      MyProcPid = getpid();
      MyStartTimestamp = GetCurrentTimestamp();
      MyStartTime = timestamptz_to_time_t(MyStartTimestamp);
*************** InitProcessGlobals(void)
*** 2539,2553 ****
  #endif

      /*
!      * Set a different seed for random() in every backend.  Since PIDs and
!      * timestamps tend to change more frequently in their least significant
!      * bits, shift the timestamp left to allow a larger total number of seeds
!      * in a given time period.  Since that would leave only 20 bits of the
!      * timestamp that cycle every ~1 second, also mix in some higher bits.
       */
!     srandom(((uint64) MyProcPid) ^
              ((uint64) MyStartTimestamp << 12) ^
!             ((uint64) MyStartTimestamp >> 20));
  }


--- 2541,2570 ----
  #endif

      /*
!      * Set a different seed for random() in every process.  We want something
!      * unpredictable, so if possible, use high-quality random bits for the
!      * seed.  Otherwise, fall back to a seed based on timestamp and PID.
!      *
!      * Note we can't use pg_backend_random here, since this is used in the
!      * postmaster, and even in a backend we might not be attached to shared
!      * memory yet.
       */
! #ifdef HAVE_STRONG_RANDOM
!     if (!pg_strong_random(&rseed, sizeof(rseed)))
! #endif
!     {
!         /*
!          * Since PIDs and timestamps tend to change more frequently in their
!          * least significant bits, shift the timestamp left to allow a larger
!          * total number of seeds in a given time period.  Since that would
!          * leave only 20 bits of the timestamp that cycle every ~1 second,
!          * also mix in some higher bits.
!          */
!         rseed = ((uint64) MyProcPid) ^
              ((uint64) MyStartTimestamp << 12) ^
!             ((uint64) MyStartTimestamp >> 20);
!     }
!     srandom(rseed);
  }


