Re: [HACKERS] S_LOCK reduced contention
From: Tom Lane
Subject: Re: [HACKERS] S_LOCK reduced contention
Msg-id: 22882.897662578@sss.pgh.pa.us
In reply to: Re: [PATCHES] Try again: S_LOCK reduced contention (dg@illustra.com (David Gould))
Replies: Re: [HACKERS] S_LOCK reduced contention (dg@illustra.com (David Gould))
List: pgsql-hackers
dg@illustra.com (David Gould) writes:
> [ Much careful testing snipped ]

Nice job, David!

> gprof introduces severe experimental error for small quick functions.
> Perhaps the spinlock function is so short and quick compared to the
> mcount overhead added to the function prolog that the overhead dominates
> the measurement. gprof remains a useful tool for larger functions with
> longer runtimes, but must be considered very suspect for tiny functions.

Right. gprof is a fine example of Heisenberg's Uncertainty Principle applied to software ;-). You can't measure something without affecting it.

As Bruce Momjian just pointed out in another followup, running the test function many times in a tight loop isn't a perfect answer either: you find out what happens when the function's code and referenced data are all in cache, but you can't tell much about what happens when they are not, and you can't tell whether the whole application's memory usage patterns are such that the function will remain in cache. I'm guessing that the backend uses tas() enough that it will probably stay in cache, but that is *strictly* a guess with no evidence.

> In some of the test cases there was significant timing variation from
> run to run even though the test conditions were apparently identical.
> Even more strangely, the variation in time was not random but appeared
> to represent two different modes. And, the variation was itself
> repeatable.
> [snip]
> I have no explanation for this variation. Possibly it is some interaction
> of where the program is loaded and the state of the memory hierarchy, but
> even this is hard to sustain. I would be very curious to hear of any
> plausible explanations.

After chewing on this for a while, I think your speculation is right. You were using a Pentium, you said. The Pentium has a two-way set-associative cache, which means that any given main-memory address has exactly two cache lines it could be loaded into. Main-memory addresses that are 1000H apart contend for the same pair of cache lines. Thus, if your program happens to repeatedly hit three locations that are exactly 1000H apart, it will suffer a cache miss every time. Change the address spacing, and no miss occurs. The cache miss takes forty-some clock cycles, versus one if the target location is in cache. (BTW, I'm getting this info out of Rick Booth's "Inner Loops", a fine reference book if you are into hand-optimized assembly coding for Intel processors.)

So what I think you were seeing is that on some runs, the loop involved hitting three addresses that contend for the same cache line pair, while on other runs there was no cache contention. This could be explained by varying load addresses for the program, if it touched both its own addresses (variable) and some non-varying addresses --- say, C library routines executed from a shared library that remained loaded throughout. If you can link with a non-shared C library, then you should get more consistent runtimes, because the offsets between all the locations touched by your loop would be fixed, and thus cache hits or misses ought to be consistent from run to run.

The bottom line, however, is that this behavior is too unpredictable to be worth worrying about in a production program. A cache miss in a tight loop could be created or eliminated by unrelated changes in distant parts of the code ... and in any case the behavior will be different on different CPUs. The 486, Pentium, and Pentium Pro all have radically different cache layouts, let alone non-Intel CPUs.
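For concreteness, here is a minimal C sketch (not part of the original exchange) of the effect described above: three reads spaced exactly 1000H apart all map to the same set of a two-way set-associative cache and so miss on every access, while slightly staggered offsets land in different sets and hit. The spacing, stagger, iteration count, and clock()-based timing are illustrative assumptions, and a modern CPU with higher associativity may not show the effect at all.

    /*
     * Sketch of the Pentium set-conflict effect: with an 8KB two-way
     * cache, each way spans 0x1000 bytes, so addresses exactly 0x1000
     * apart compete for the same pair of cache lines.  Cycling through
     * three such addresses evicts one of them on every pass.
     */
    #include <stdio.h>
    #include <stdlib.h>
    #include <time.h>

    #define STRIDE 0x1000          /* set-conflict spacing (assumed) */
    #define ITERS  10000000L

    static double
    time_three(volatile char *a, volatile char *b, volatile char *c)
    {
        clock_t start = clock();
        long    i;

        for (i = 0; i < ITERS; i++)
        {
            (void) *a;             /* volatile forces each read; when the */
            (void) *b;             /* three pointers are 0x1000 apart all */
            (void) *c;             /* reads contend for one cache set     */
        }
        return (double) (clock() - start) / CLOCKS_PER_SEC;
    }

    int
    main(void)
    {
        char   *buf = malloc(4 * STRIDE);

        if (buf == NULL)
            return 1;

        /* three addresses exactly STRIDE apart: same set, miss every time */
        printf("conflicting: %.3f s\n",
               time_three(buf, buf + STRIDE, buf + 2 * STRIDE));

        /* stagger the offsets so the addresses fall in different sets */
        printf("staggered:   %.3f s\n",
               time_three(buf, buf + STRIDE + 64, buf + 2 * STRIDE + 128));

        free(buf);
        return 0;
    }

On a CPU where the conflict bites, the first loop should run several times slower than the second even though both execute identical instructions --- which is exactly the kind of bimodal, load-address-dependent timing David reported.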
regards, tom lane