ohp@pyrenet.fr wrote:
>Hi Manfred,
>
>I'm using unixware 7 but couldn't compile your source with native cc, I
>had to compile it with gcc.
>
>here are the results:
>
>
Thanks. The test app compares the time needed for three short loops: one
with six empty function calls, one with six function calls and a single
nop in the middle, and one with a "rep;nop" in the middle.
Results:
- nop: 0 cycles; it executes in parallel with the surrounding code.
- rep;nop: between 24 and 60 cycles, long enough that the pipeline is
drained.
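A minimal sketch of such a timing test, for anyone who wants to repeat the
measurement. It is a reconstruction, not the actual test app: the function
names, ITERS, the use of __rdtsc() from <x86intrin.h>, and the assumption
that the third loop keeps the six calls around the rep;nop are all mine.
On non-x86 machines it falls back to nanoseconds and an empty asm, so only
the x86 numbers mean anything.

```c
#define _POSIX_C_SOURCE 199309L
#include <stdint.h>
#include <time.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h>                       /* __rdtsc() */
/* Time-stamp counter ticks; on many CPUs these are not exactly core cycles. */
static inline uint64_t ticks(void) { return __rdtsc(); }
#define PLAIN_NOP() __asm__ __volatile__("nop")
#define REP_NOP()   __asm__ __volatile__("rep; nop")   /* F3 90 = PAUSE */
#else
/* Non-x86 fallback so the sketch still builds: nanoseconds, no pause. */
static inline uint64_t ticks(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (uint64_t)ts.tv_sec * 1000000000u + (uint64_t)ts.tv_nsec;
}
#define PLAIN_NOP() __asm__ __volatile__("")
#define REP_NOP()   __asm__ __volatile__("")
#endif

#define ITERS 1000000L

/* noinline plus a volatile asm keeps the compiler from deleting the calls. */
static void __attribute__((noinline)) empty_fn(void) { __asm__ __volatile__(""); }

/* Six empty calls with `mid` placed in the middle. */
#define SIX_CALLS_AROUND(mid) do {              \
        empty_fn(); empty_fn(); empty_fn();     \
        mid;                                    \
        empty_fn(); empty_fn(); empty_fn();     \
    } while (0)

static uint64_t time_calls(void)            /* loop 1: calls only */
{
    uint64_t t0 = ticks();
    for (long i = 0; i < ITERS; i++)
        SIX_CALLS_AROUND((void)0);
    return ticks() - t0;
}

static uint64_t time_calls_nop(void)        /* loop 2: calls + nop */
{
    uint64_t t0 = ticks();
    for (long i = 0; i < ITERS; i++)
        SIX_CALLS_AROUND(PLAIN_NOP());
    return ticks() - t0;
}

static uint64_t time_calls_rep_nop(void)    /* loop 3: calls + rep;nop */
{
    uint64_t t0 = ticks();
    for (long i = 0; i < ITERS; i++)
        SIX_CALLS_AROUND(REP_NOP());
    return ticks() - t0;
}

/* Extra ticks per iteration added by the instruction in the middle. */
static double extra_per_iter(uint64_t with, uint64_t base)
{
    return ((double)with - (double)base) / ITERS;
}
```

Run each loop a few times and keep the minimum; then
extra_per_iter(time_calls_rep_nop(), time_calls()) approximates the cost
of one rep;nop, and the same with time_calls_nop() the cost of one nop.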
I've looked for more information on the recommended spinlock
algorithm:
- The optimization manual (google for "Intel 248966") contains a section
about the pause instruction. According to it, the memory ordering
violation comes from the multiple simultaneous reads that are issued
because the CPU pipelines the busy loop.
- It references the Application Note AP-949 "Using Spin-Loops on Intel
Pentium 4 Processor and Intel Xeon Processor" for further details.
Unfortunately the app notes are stored on cedar.intel.com, and that
server appears to be down :-(
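For reference, a minimal sketch of a spin-wait that follows that advice:
a test-and-test-and-set lock that spins on plain reads and executes
rep;nop (the encoding of PAUSE, a plain nop on older CPUs) in the busy
loop. The spin_t type and the spin_* names are hypothetical, and the GCC
__atomic builtins are my choice; this is not the kernel's implementation.

```c
#include <stdbool.h>

/* Hypothetical lock type for this sketch: 0 = free, 1 = held. */
typedef struct { int locked; } spin_t;

static inline void cpu_relax(void)
{
#if defined(__i386__) || defined(__x86_64__)
    /* rep;nop == PAUSE: hints that this is a spin-wait loop, so the CPU
       does not flood the pipeline with speculative reads of the lock. */
    __asm__ __volatile__("rep; nop" ::: "memory");
#else
    __asm__ __volatile__("" ::: "memory");   /* compiler barrier only */
#endif
}

/* One atomic exchange; true if we took the lock. */
static inline bool spin_trylock(spin_t *l)
{
    return __atomic_exchange_n(&l->locked, 1, __ATOMIC_ACQUIRE) == 0;
}

static inline void spin_lock(spin_t *l)
{
    while (!spin_trylock(l)) {
        /* Wait with plain loads (no bus-locked writes) until the lock
           looks free, pausing on every iteration, then retry. */
        while (__atomic_load_n(&l->locked, __ATOMIC_RELAXED))
            cpu_relax();
    }
}

static inline void spin_unlock(spin_t *l)
{
    __atomic_store_n(&l->locked, 0, __ATOMIC_RELEASE);
}
```

The inner read-only loop is what the pause is meant for: without it, the
pipelined loop keeps several reads of the lock word in flight, and the
release of the lock then costs a memory ordering violation.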
-- Manfred