Re: Issue with the PRNG used by Postgres

From Parag Paul
Subject Re: Issue with the PRNG used by Postgres
Date
Msg-id CAA=PXp3jBDvx7HwOfeF8OFKZA7WD=ZDA+zdpTARnJaYWu2_2cw@mail.gmail.com
In reply to Re: Issue with the PRNG used by Postgres  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Issue with the PRNG used by Postgres
List pgsql-hackers
Hi Tom,
Sorry for the delayed response. I was collecting data from my production servers.

The reason why this could be a problem is a flaw in the RNG with the enlarged Hamming belt.
I attached an image here, with the RNG outputs from 2 backends. I ran our code for weeks and collected the
values generated by the RNG over many backends. The one in green (say backend id 600) stopped fluctuating and
only produced low (near 0) values for half an hour, whereas the blue one (say backend id 700) kept generating good values
across the range [0, 1).
During this period, backend 600 suffered and ended up in the spinlock stuck condition.
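
To make the concern concrete, here is a rough sketch of why a run of near-zero RNG outputs could matter for a randomized backoff. It is a simplified model, not the actual s_lock.c code: the constants and the function name total_sleep_usec are assumptions chosen for illustration, and a constant r stands in for the RNG output. It adds up how long a waiter sleeps before it exhausts a fixed number of delays.

```c
#include <stdio.h>

/* Assumed values for illustration only; not copied from the PostgreSQL source. */
#define MIN_DELAY_USEC 1000L
#define MAX_DELAY_USEC 1000000L
#define NUM_DELAYS     1000

/*
 * Hypothetical helper: total microseconds slept over NUM_DELAYS sleeps
 * when every RNG draw returns the same value r in [0, 1).
 */
static long
total_sleep_usec(double r)
{
    long cur_delay = MIN_DELAY_USEC;
    long total = 0;

    for (int i = 0; i < NUM_DELAYS; i++)
    {
        total += cur_delay;

        /* grow the delay by a random fraction of its current value */
        cur_delay += (long) (cur_delay * r + 0.5);

        /* wrap back to the minimum once the maximum is exceeded */
        if (cur_delay > MAX_DELAY_USEC)
            cur_delay = MIN_DELAY_USEC;
    }
    return total;
}
```

In this model, total_sleep_usec(0.0001) never grows the delay past the minimum, so all 1000 sleeps together add up to only about one second, while total_sleep_usec(0.5) waits far longer before the delay budget runs out. Whether that is what happened on backend 600 is a separate question.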

-Parag


On Wed, Apr 10, 2024 at 9:28 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Actually ... Parag mentioned that this was specifically about
lwlock.c's usage of spinlocks.  It doesn't really use a spinlock,
but it does use s_lock.c's delay logic, and I think it's got the
usage pattern wrong:

    while (true)
    {
        /* always try once to acquire lock directly */
        old_state = pg_atomic_fetch_or_u32(&lock->state, LW_FLAG_LOCKED);
        if (!(old_state & LW_FLAG_LOCKED))
            break;                /* got lock */

        /* and then spin without atomic operations until lock is released */
        {
            SpinDelayStatus delayStatus;

            init_local_spin_delay(&delayStatus);

            while (old_state & LW_FLAG_LOCKED)
            {
                perform_spin_delay(&delayStatus);
                old_state = pg_atomic_read_u32(&lock->state);
            }
#ifdef LWLOCK_STATS
            delays += delayStatus.delays;
#endif
            finish_spin_delay(&delayStatus);
        }

        /*
         * Retry. The lock might obviously already be re-acquired by the time
         * we're attempting to get it again.
         */
    }

I don't think it's correct to re-initialize the SpinDelayStatus each
time around the outer loop.  That state should persist through the
entire acquire operation, as it does in a regular spinlock acquire.
As this stands, it resets the delay to minimum each time around the
outer loop, and I bet it is that behavior not the RNG that's to blame
for what he's seeing.

(One should still wonder what is the LWLock usage pattern that is
causing this spot to become so heavily contended.)

                        regards, tom lane
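
The point about re-initializing the delay state can be illustrated with a small model. This is only a sketch under stated assumptions: DelayState, delay_init, delay_step, and run_acquire are hypothetical names, not PostgreSQL functions, and the growth is a fixed 1.5x step instead of a random one so the result is deterministic. It compares resetting the state on every pass of the outer loop with keeping it for the whole acquire.

```c
#include <stdio.h>

/* Assumed values for illustration only. */
#define MIN_DELAY_USEC 1000L
#define MAX_DELAY_USEC 1000000L

/* Hypothetical stand-in for the per-acquire delay state. */
typedef struct DelayState
{
    long cur_delay;             /* next sleep length, in microseconds */
    int  delays;                /* number of sleeps performed so far */
} DelayState;

static void
delay_init(DelayState *s)
{
    s->cur_delay = MIN_DELAY_USEC;
    s->delays = 0;
}

static void
delay_step(DelayState *s)
{
    s->delays++;
    s->cur_delay += s->cur_delay / 2;   /* fixed 1.5x growth, no RNG */
    if (s->cur_delay > MAX_DELAY_USEC)
        s->cur_delay = MIN_DELAY_USEC;
}

/*
 * Simulate `rounds` passes of the outer loop, each waiting
 * `waits_per_round` times.  If `persist` is zero, the state is reset at
 * the top of every pass.  Returns the final delay count and stores the
 * longest sleep seen in *max_delay.
 */
static int
run_acquire(int rounds, int waits_per_round, int persist, long *max_delay)
{
    DelayState s;

    *max_delay = 0;
    delay_init(&s);
    for (int r = 0; r < rounds; r++)
    {
        if (!persist)
            delay_init(&s);
        for (int w = 0; w < waits_per_round; w++)
        {
            if (s.cur_delay > *max_delay)
                *max_delay = s.cur_delay;
            delay_step(&s);
        }
    }
    return s.delays;
}
```

With 5 rounds of 3 waits each, the resetting variant never sleeps longer than 2250 usec and ends with a delay count of 3, while the persistent variant keeps backing off and ends with a count of 15. The model takes no position on which behavior is right for lwlock.c; it only shows what resetting does to the backoff and to the delay counter.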
