update i386 spinlock for hyperthreading

From: Manfred Spraul
Subject: update i386 spinlock for hyperthreading
Msg-id: 3FECB5A8.50708@colorfullife.com
Replies: Re: update i386 spinlock for hyperthreading
List: pgsql-patches
Hi,

Intel recommends adding a special pause instruction to spinlock busy
loops. It's necessary for hyperthreading: without it, the CPU can't
tell that a logical thread is doing no useful work and incorrectly
awards lots of execution resources to that thread. Additionally, it's
supposed to reduce the time the CPU needs to recover from the
(mispredicted) branch after the spinlock is obtained.

The attached patch adds a new platform hook, CPU_DELAY(), and implements
it for i386 and x86_64. The new instruction is backward compatible (it is
encoded as "rep; nop", which older CPUs execute as a plain nop), thus no
CPU detection is necessary.

Additionally, I've increased the number of loops from 100 to 1000 - a 3
GHz Pentium 4 might execute 100 loops faster than a single bus
transaction. I don't know if this change is appropriate for all
platforms, or if SPINS_PER_DELAY should be made platform specific.
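
For readers who want to see the idea outside the PostgreSQL tree, here is a
minimal standalone sketch of a bounded spin loop that issues the pause hint
between attempts. It is not the code from the patch: every demo_* name is
hypothetical, the test-and-set uses the GCC/Clang __sync builtins instead of
the hand-written assembly in s_lock.h, and the loop is simplified to return
to the caller rather than sleep with select().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names throughout; these are not the patch's identifiers. */
typedef volatile int demo_lock_t;

/* Spin count proposed in this mail; may not suit every platform. */
#define DEMO_SPINS_PER_DELAY 1000

/* Tell the CPU that we are busy-waiting. No-op on non-x86 platforms. */
static inline void
demo_cpu_delay(void)
{
#if defined(__i386__) || defined(__x86_64__)
    /* "rep; nop" is the encoding of PAUSE; older CPUs run it as a nop. */
    __asm__ __volatile__("rep; nop" : : : "memory");
#endif
}

/* Atomic test-and-set; returns the previous value (0 = lock acquired). */
static inline int
demo_tas(demo_lock_t *lock)
{
    return __sync_lock_test_and_set(lock, 1);
}

/*
 * Try to take the lock, making at most max_spins attempts.
 * Returns true on success, false if the caller should back off
 * (for example by sleeping) before trying again.
 */
static bool
demo_spin_acquire(demo_lock_t *lock, int max_spins)
{
    int spins;

    for (spins = 0; spins < max_spins; spins++)
    {
        if (demo_tas(lock) == 0)
            return true;
        demo_cpu_delay();   /* hint placed between failed attempts */
    }
    return false;
}

/* Release a lock that the caller currently holds. */
static void
demo_spin_release(demo_lock_t *lock)
{
    assert(*lock != 0);
    __sync_lock_release(lock);
}
```

A caller would do something like
demo_spin_acquire(&lock, DEMO_SPINS_PER_DELAY) and fall back to a sleep when
it returns false, which is roughly the role SPINS_PER_DELAY plays in s_lock().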

Mark did a test run with his dbt-2 benchmark on a 4-way Xeon with HT
enabled, and the patch resulted in a 10% performance increase:
Before:
http://developer.osdl.org/markw/dbt2-pgsql/284/
After:
http://developer.osdl.org/markw/dbt2-pgsql/300/

--
    Manfred
Index: ./src/backend/storage/lmgr/s_lock.c
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/backend/storage/lmgr/s_lock.c,v
retrieving revision 1.22
diff -c -r1.22 s_lock.c
*** ./src/backend/storage/lmgr/s_lock.c    23 Dec 2003 22:15:07 -0000    1.22
--- ./src/backend/storage/lmgr/s_lock.c    26 Dec 2003 22:24:52 -0000
***************
*** 76,82 ****
       * The select() delays are measured in centiseconds (0.01 sec) because 10
       * msec is a common resolution limit at the OS level.
       */
! #define SPINS_PER_DELAY        100
  #define NUM_DELAYS            1000
  #define MIN_DELAY_CSEC        1
  #define MAX_DELAY_CSEC        100
--- 76,82 ----
       * The select() delays are measured in centiseconds (0.01 sec) because 10
       * msec is a common resolution limit at the OS level.
       */
! #define SPINS_PER_DELAY        1000
  #define NUM_DELAYS            1000
  #define MIN_DELAY_CSEC        1
  #define MAX_DELAY_CSEC        100
***************
*** 111,116 ****
--- 111,117 ----

              spins = 0;
          }
+         CPU_DELAY();
      }
  }

Index: ./src/include/storage/s_lock.h
===================================================================
RCS file: /projects/cvsroot/pgsql-server/src/include/storage/s_lock.h,v
retrieving revision 1.123
diff -c -r1.123 s_lock.h
*** ./src/include/storage/s_lock.h    23 Dec 2003 22:15:07 -0000    1.123
--- ./src/include/storage/s_lock.h    26 Dec 2003 22:24:52 -0000
***************
*** 52,57 ****
--- 52,66 ----
   *    in assembly language to execute a hardware atomic-test-and-set
   *    instruction.  Equivalent OS-supplied mutex routines could be used too.
   *
+  *    Additionally, a platform specific delay function can be defined:
+  *
+  *    void CPU_DELAY(void)
+  *        Notification that the cpu is executing a busy loop.
+  *
+  *     Some platforms need such an indication. One example is platforms
+  *     that implement SMT, i.e. multiple logical threads that share
+  *     execution resources in one physical cpu.
+  *
   *    If no system-specific TAS() is available (ie, HAVE_SPINLOCKS is not
   *    defined), then we fall back on an emulation that uses SysV semaphores
   *    (see spin.c).  This emulation will be MUCH MUCH slower than a proper TAS()
***************
*** 115,120 ****
--- 124,140 ----
      return (int) _res;
  }

+ #define HAS_CPU_DELAY
+
+ #define CPU_DELAY()    cpu_delay()
+
+ static __inline__ void
+ cpu_delay(void)
+ {
+     __asm__ __volatile__(
+         " rep; nop            \n"
+         : : : "memory");
+ }
  #endif     /* __i386__ || __x86_64__ */


***************
*** 715,720 ****
--- 735,748 ----
  #define TAS(lock)        tas(lock)
  #endif     /* TAS */

+ #ifndef HAS_CPU_DELAY
+ #define CPU_DELAY() cpu_delay()
+
+ static __inline__ void
+ cpu_delay(void)
+ {
+ }
+ #endif

  /*
   * Platform-independent out-of-line support routines
