Re: Spinlocks, yet again: analysis and proposed patches

From Tom Lane
Subject Re: Spinlocks, yet again: analysis and proposed patches
Date
Msg-id 5962.1126719163@sss.pgh.pa.us
In response to Re: Spinlocks, yet again: analysis and proposed patches  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Spinlocks, yet again: analysis and proposed patches  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Spinlocks, yet again: analysis and proposed patches  (Simon Riggs <simon@2ndquadrant.com>)
List pgsql-hackers
I wrote:
> Another thought came to mind: maybe the current data layout for LWLocks
> is bad.  Right now, the spinlock that protects each LWLock data struct
> is itself part of the struct, and since the structs aren't large (circa
> 20 bytes), the whole thing is usually all in the same cache line. ...
> Maybe it'd be better to allocate the spinlocks off by themselves.

Well, this is odd.  I made up a patch to do this (attached) and found
that it pretty much sucks.  Still on the 4-way Opteron, with the previous
best patch combination (slock-no-cmpb and spin-delay-2):
    threads:   1      2      4       8
    time:     31s    42s    51s    100s
with lwlock-separate added:
    threads:   1      2      4       8
    time:     31s    52s   106s    213s

What I had expected to see was a penalty in the single-thread case (due
to instructions added to the LWLock functions), but there isn't any.
I did not expect to see a factor-of-2 penalty for four threads.

I guess what this means is that there's no real problem with losing the
cache line while manipulating the LWLock, which is what the patch was
intended to prevent.  Instead, we're paying for swapping two cache lines
(the spinlock and the LWLock) across processors instead of just one line.
But that should at worst be a 2x inflation of the time previously spent
in LWLockAcquire/Release, which is surely not yet all of the application
;-).  Why the heck is this so bad?  Should we expect that apparently
minor changes in shared data structures might be costing equivalently
huge penalties in SMP performance elsewhere?
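
For illustration only, and not part of the attached patch: a minimal sketch
of the two layouts under discussion, assuming a 64-byte cache line and using
placeholder type and field names; the real definitions are in lwlock.c below.

#include <stdbool.h>

#define CACHE_LINE_SIZE 64

typedef volatile int slock_t;       /* stand-in for the platform slock_t */

/* Old layout: the spinlock shares a cache line with the data it protects. */
typedef struct LWLockEmbedded
{
    slock_t     mutex;              /* lock word lives inside the ~20-byte struct */
    bool        releaseOK;
    char        exclusive;
    int         shared;
    /* wait-queue head/tail pointers omitted */
} LWLockEmbedded;

/* Patched layout: the struct keeps only a pointer to a spinlock stored elsewhere. */
typedef struct LWLockSeparate
{
    slock_t    *mutex;              /* points into a padded array like the one below */
    bool        releaseOK;
    char        exclusive;
    int         shared;
} LWLockSeparate;

/* Each heavily-used spinlock is padded out to a cache line of its own. */
typedef union PaddedSlock
{
    slock_t     lock;
    char        pad[CACHE_LINE_SIZE];
} PaddedSlock;

With the second layout, every LWLockAcquire/Release touches two lines (the
PaddedSlock and the LWLock) instead of one, which is where the 2x estimate
above comes from.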

Unfortunately I don't have root on the Opteron and can't run oprofile.
But I'd really like to see some oprofile stats from these two cases
so we can figure out what in the world is going on here.  Can anyone
help?

            regards, tom lane

*** src/backend/storage/lmgr/lwlock.c.orig    Sat Aug 20 19:26:24 2005
--- src/backend/storage/lmgr/lwlock.c    Wed Sep 14 11:50:09 2005
***************
*** 27,35 ****
  #include "storage/spin.h"


  typedef struct LWLock
  {
!     slock_t        mutex;            /* Protects LWLock and queue of PGPROCs */
      bool        releaseOK;        /* T if ok to release waiters */
      char        exclusive;        /* # of exclusive holders (0 or 1) */
      int            shared;            /* # of shared holders (0..MaxBackends) */
--- 27,44 ----
  #include "storage/spin.h"


+ /*
+  * Each LWLock has an associated spinlock, which we consider as locking the
+  * content of the LWLock struct as well as the associated queue of waiting
+  * PGPROCs.  To minimize cache-line contention on multiprocessors, the
+  * spinlocks are kept physically separate from their LWLocks --- we don't
+  * want a spinning processor to interfere with access to the LWLock for
+  * the processor holding the spinlock.
+  */
+
  typedef struct LWLock
  {
!     slock_t       *mutex;            /* Protects LWLock and queue of PGPROCs */
      bool        releaseOK;        /* T if ok to release waiters */
      char        exclusive;        /* # of exclusive holders (0 or 1) */
      int            shared;            /* # of shared holders (0..MaxBackends) */
***************
*** 39,44 ****
--- 48,65 ----
  } LWLock;

  /*
+  * We allocate CACHE_LINE_SIZE bytes for the fixed LWLocks' spinlocks.
+  * We assume that the fixed locks are few enough and heavily used enough
+  * that it's worth trying to put each one's spinlock in its own cache line.
+  * The dynamic locks are assumed to be many and less heavily hit, so they
+  * get just MAXALIGN space apiece.  (Packing spinlocks tightly sounds like a
+  * bad idea on machines where slock_t is just a byte ... even for lesser-used
+  * spinlocks, there would be enough locks per cache line to create a
+  * contention issue.)
+  */
+ #define CACHE_LINE_SIZE        64
+
+ /*
   * This points to the array of LWLocks in shared memory.  Backends inherit
   * the pointer by fork from the postmaster.  LWLockIds are indexes into
   * the array.
***************
*** 135,144 ****
      Size        size;
      int            numLocks = NumLWLocks();

!     /* Allocate the LWLocks plus space for shared allocation counter. */
      size = mul_size(numLocks, sizeof(LWLock));

!     size = add_size(size, 2 * sizeof(int));

      return size;
  }
--- 156,174 ----
      Size        size;
      int            numLocks = NumLWLocks();

!     /* Space for the array of LWLock structs. */
      size = mul_size(numLocks, sizeof(LWLock));

!     /* Space for dynamic allocation counter and MAXALIGN padding. */
!     size = add_size(size, 2 * sizeof(int) + MAXIMUM_ALIGNOF);
!
!     /* Space for the spinlocks --- see notes for CACHE_LINE_SIZE. */
!     size = add_size(size,
!                     mul_size((int) NumFixedLWLocks,
!                              Max(sizeof(slock_t), CACHE_LINE_SIZE)));
!     size = add_size(size,
!                     mul_size(numLocks - (int) NumFixedLWLocks,
!                              Max(sizeof(slock_t), MAXIMUM_ALIGNOF)));

      return size;
  }
***************
*** 153,182 ****
      int            numLocks = NumLWLocks();
      Size        spaceLocks = LWLockShmemSize();
      LWLock       *lock;
      int            id;

      /* Allocate space */
      LWLockArray = (LWLock *) ShmemAlloc(spaceLocks);

      /*
       * Initialize all LWLocks to "unlocked" state
       */
      for (id = 0, lock = LWLockArray; id < numLocks; id++, lock++)
      {
!         SpinLockInit(&lock->mutex);
          lock->releaseOK = true;
          lock->exclusive = 0;
          lock->shared = 0;
          lock->head = NULL;
          lock->tail = NULL;
-     }

!     /*
!      * Initialize the dynamic-allocation counter at the end of the array
!      */
!     LWLockCounter = (int *) lock;
!     LWLockCounter[0] = (int) NumFixedLWLocks;
!     LWLockCounter[1] = numLocks;
  }


--- 183,229 ----
      int            numLocks = NumLWLocks();
      Size        spaceLocks = LWLockShmemSize();
      LWLock       *lock;
+     char       *mutexptr;
      int            id;

      /* Allocate space */
      LWLockArray = (LWLock *) ShmemAlloc(spaceLocks);

      /*
+      * Set up dynamic-allocation counter at the end of the array of structs.
+      * (Since struct LWLock contains ints, we should not need any extra
+      * alignment padding here.)
+      */
+     LWLockCounter = (int *) (LWLockArray + numLocks);
+     LWLockCounter[0] = (int) NumFixedLWLocks;
+     LWLockCounter[1] = numLocks;
+
+     /*
+      * We place the spinlocks after the counter, maxaligning for safety.
+      */
+     mutexptr = (char *) (LWLockCounter + 2);
+     mutexptr = (char *) MAXALIGN(mutexptr);
+
+     /*
       * Initialize all LWLocks to "unlocked" state
       */
      for (id = 0, lock = LWLockArray; id < numLocks; id++, lock++)
      {
!         slock_t       *mutex = (slock_t *) mutexptr;
!
!         SpinLockInit(mutex);
!         lock->mutex = mutex;
          lock->releaseOK = true;
          lock->exclusive = 0;
          lock->shared = 0;
          lock->head = NULL;
          lock->tail = NULL;

!         if (id < (int) NumFixedLWLocks)
!             mutexptr += Max(sizeof(slock_t), CACHE_LINE_SIZE);
!         else
!             mutexptr += Max(sizeof(slock_t), MAXIMUM_ALIGNOF);
!     }
  }


***************
*** 207,212 ****
--- 254,260 ----
  LWLockAcquire(LWLockId lockid, LWLockMode mode)
  {
      volatile LWLock *lock = LWLockArray + lockid;
+     volatile slock_t *mutex = lock->mutex;
      PGPROC       *proc = MyProc;
      bool        retry = false;
      int            extraWaits = 0;
***************
*** 253,259 ****
          bool        mustwait;

          /* Acquire mutex.  Time spent holding mutex should be short! */
!         SpinLockAcquire_NoHoldoff(&lock->mutex);

          /* If retrying, allow LWLockRelease to release waiters again */
          if (retry)
--- 301,307 ----
          bool        mustwait;

          /* Acquire mutex.  Time spent holding mutex should be short! */
!         SpinLockAcquire_NoHoldoff(mutex);

          /* If retrying, allow LWLockRelease to release waiters again */
          if (retry)
***************
*** 304,310 ****
          lock->tail = proc;

          /* Can release the mutex now */
!         SpinLockRelease_NoHoldoff(&lock->mutex);

          /*
           * Wait until awakened.
--- 352,358 ----
          lock->tail = proc;

          /* Can release the mutex now */
!         SpinLockRelease_NoHoldoff(mutex);

          /*
           * Wait until awakened.
***************
*** 336,342 ****
      }

      /* We are done updating shared state of the lock itself. */
!     SpinLockRelease_NoHoldoff(&lock->mutex);

      /* Add lock to list of locks held by this backend */
      held_lwlocks[num_held_lwlocks++] = lockid;
--- 384,390 ----
      }

      /* We are done updating shared state of the lock itself. */
!     SpinLockRelease_NoHoldoff(mutex);

      /* Add lock to list of locks held by this backend */
      held_lwlocks[num_held_lwlocks++] = lockid;
***************
*** 359,364 ****
--- 407,413 ----
  LWLockConditionalAcquire(LWLockId lockid, LWLockMode mode)
  {
      volatile LWLock *lock = LWLockArray + lockid;
+     volatile slock_t *mutex = lock->mutex;
      bool        mustwait;

      PRINT_LWDEBUG("LWLockConditionalAcquire", lockid, lock);
***************
*** 375,381 ****
      HOLD_INTERRUPTS();

      /* Acquire mutex.  Time spent holding mutex should be short! */
!     SpinLockAcquire_NoHoldoff(&lock->mutex);

      /* If I can get the lock, do so quickly. */
      if (mode == LW_EXCLUSIVE)
--- 424,430 ----
      HOLD_INTERRUPTS();

      /* Acquire mutex.  Time spent holding mutex should be short! */
!     SpinLockAcquire_NoHoldoff(mutex);

      /* If I can get the lock, do so quickly. */
      if (mode == LW_EXCLUSIVE)
***************
*** 400,406 ****
      }

      /* We are done updating shared state of the lock itself. */
!     SpinLockRelease_NoHoldoff(&lock->mutex);

      if (mustwait)
      {
--- 449,455 ----
      }

      /* We are done updating shared state of the lock itself. */
!     SpinLockRelease_NoHoldoff(mutex);

      if (mustwait)
      {
***************
*** 424,429 ****
--- 473,479 ----
  LWLockRelease(LWLockId lockid)
  {
      volatile LWLock *lock = LWLockArray + lockid;
+     volatile slock_t *mutex = lock->mutex;
      PGPROC       *head;
      PGPROC       *proc;
      int            i;
***************
*** 446,452 ****
          held_lwlocks[i] = held_lwlocks[i + 1];

      /* Acquire mutex.  Time spent holding mutex should be short! */
!     SpinLockAcquire_NoHoldoff(&lock->mutex);

      /* Release my hold on lock */
      if (lock->exclusive > 0)
--- 496,502 ----
          held_lwlocks[i] = held_lwlocks[i + 1];

      /* Acquire mutex.  Time spent holding mutex should be short! */
!     SpinLockAcquire_NoHoldoff(mutex);

      /* Release my hold on lock */
      if (lock->exclusive > 0)
***************
*** 494,500 ****
      }

      /* We are done updating shared state of the lock itself. */
!     SpinLockRelease_NoHoldoff(&lock->mutex);

      /*
       * Awaken any waiters I removed from the queue.
--- 544,550 ----
      }

      /* We are done updating shared state of the lock itself. */
!     SpinLockRelease_NoHoldoff(mutex);

      /*
       * Awaken any waiters I removed from the queue.
