Re: Issue with the PRNG used by Postgres

Поиск
Список
Период
Сортировка
От Parag Paul
Тема Re: Issue with the PRNG used by Postgres
Дата
Msg-id CAA=PXp074Aab6xjyMfQoYhLUFFsDzT+0X=-JKbxyryXa9SH0eQ@mail.gmail.com
обсуждение исходный текст
Ответ на Re: Issue with the PRNG used by Postgres  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Issue with the PRNG used by Postgres  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
hi Tom, 
 First of all thanks for you response. I did not misread it. The 0.5 is added to the result of the multiplication which then uses C integer casting, which does not round off, but just drops the decimal portion.


status->cur_delay += (int) (status->cur_delay *
                                    pg_prng_double(&pg_global_prng_state) + 0.5);


So, if RNG generated 0.0000001 and cur_delay =1000.
Result will be
1000 + int(1000*0.000001 + 5) = (int)(1000 + (0.1+.5)) = (int)1000.6 = 1000 <--  back to the same value
and there is no change after that starts happening, if the RNG is in the low hamming index state. If avalanche happens soon, then it will correct it self, but in the mean time, we have a stuck_spin_lock PANIC that could bring down a production server. 
-Parag

On Wed, Apr 10, 2024 at 8:09 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Robert Haas <robertmhaas@gmail.com> writes:
> I'm not convinced that we should try to improve the RNG, but surely we
> need to put parentheses around pg_prng_double(&pg_global_prng_state) +
> 0.5. IIUC, the current logic is making us multiply the spin delay by a
> value between 0 and 1 when what was intended was that it should be
> multiplied by a value between 0.5 and 1.5.

No, I think you are misreading it, because the assignment is += not =.
The present coding is

        /* increase delay by a random fraction between 1X and 2X */
        status->cur_delay += (int) (status->cur_delay *
                                    pg_prng_double(&pg_global_prng_state) + 0.5);

which looks fine to me.  The +0.5 is so that the conversion to integer
rounds rather than truncating.

In any case, I concur with Andres: if this behavior is anywhere near
critical then the right fix is to not be using spinlocks.

                        regards, tom lane

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Kartyshov Ivan
Дата:
Сообщение: Re: [HACKERS] make async slave to wait for lsn to be replayed
Следующее
От: Tom Lane
Дата:
Сообщение: Re: Speed up clean meson builds by ~25%