On 2014-09-11 10:32:24 -0300, Arthur Silva wrote:
> Unaligned memory access received a lot of attention in the Intel post-Nehalem era.
> So it may very well pay off on Intel servers. You might find this blog post
> and its comments/external links interesting:
> http://lemire.me/blog/archives/2012/05/31/data-alignment-for-speed-myth-or-reality/
FWIW, the reported results are imo pretty meaningless for postgres. It's sequential access over a larger amount of memory, i.e. a perfectly prefetchable workload where it doesn't matter if superfluous cachelines are fetched, because they're going to be needed in the next round anyway.
In many production workloads one of the busiest accesses to individual datums is the binary search on individual pages during index lookups. That's pretty much the exact opposite of the above: few accesses per page, scattered, and hard to prefetch.
Not saying that it's not going to be a benefit in many scenarios, but it's far from being as simple as saying that unaligned accesses on their own aren't penalized anymore.
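To make the access pattern concrete, here is a minimal sketch in C of a binary search over keys that start at an arbitrary byte offset inside a page-sized buffer. It is a simplification for illustration only, not PostgreSQL's actual page or index code: the names `page_bsearch`, `page_fill`, and `load_u64`, the 8192-byte `PAGE_SIZE`, and the densely packed 64-bit key layout are all assumptions of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define PAGE_SIZE 8192  /* assumed page size for this sketch */

/* Load a 64-bit key from a possibly unaligned address. memcpy keeps the
 * access well-defined in C; optimizing compilers typically emit a single
 * load for it on x86. */
static uint64_t load_u64(const unsigned char *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Hypothetical helper: store the sorted keys 0, 3, 6, ... back to back,
 * starting `offset` bytes into the page. */
static void page_fill(unsigned char *page, size_t offset, size_t nkeys)
{
    assert(offset + nkeys * sizeof(uint64_t) <= PAGE_SIZE);
    for (size_t i = 0; i < nkeys; i++) {
        uint64_t v = (uint64_t) i * 3;
        memcpy(page + offset + i * sizeof v, &v, sizeof v);
    }
}

/* Binary search for `key` among `nkeys` sorted keys starting at
 * page + offset. Returns the key's index, or -1 if it is absent.
 * Each probe touches one key, and successive probes jump across the
 * page, so there is little for the hardware prefetcher to work with. */
static long page_bsearch(const unsigned char *page, size_t offset,
                         size_t nkeys, uint64_t key)
{
    size_t lo = 0, hi = nkeys;

    assert(offset + nkeys * sizeof(uint64_t) <= PAGE_SIZE);
    while (lo < hi) {
        size_t mid = lo + (hi - lo) / 2;
        uint64_t v = load_u64(page + offset + mid * sizeof(uint64_t));

        if (v == key)
            return (long) mid;
        if (v < key)
            lo = mid + 1;
        else
            hi = mid;
    }
    return -1;
}
```

With `offset` set to 0 every key is 8-byte aligned; with `offset` set to 1 every key is misaligned and some keys straddle a cacheline boundary. Comparing the two over many different pages would be closer to the index-lookup case than a sequential scan is.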
Greetings,
Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
I modified the test code to use a completely random scan pattern, to test something that completely thrashes the cache. It's not realistic, but it still supports the hypothesis that the overhead is minimal on modern Intel.
------------------
test results compiling for 32bit
------------------

processing word of size 2
offset = 0  average time for offset 0 is 422.7
offset = 1  average time for offset 1 is 422.85

processing word of size 4
offset = 0  average time for offset 0 is 436.6
offset = 1  average time for offset 1 is 451
offset = 2  average time for offset 2 is 444.3
offset = 3  average time for offset 3 is 441.9

processing word of size 8
offset = 0  average time for offset 0 is 630.15
offset = 1  average time for offset 1 is 653
offset = 2  average time for offset 2 is 655.5
offset = 3  average time for offset 3 is 660.85
offset = 4  average time for offset 4 is 650.1
offset = 5  average time for offset 5 is 656.9
offset = 6  average time for offset 6 is 656.6
offset = 7  average time for offset 7 is 656.9

------------------
test results compiling for 64bit
------------------

processing word of size 2
offset = 0  average time for offset 0 is 402.55
offset = 1  average time for offset 1 is 406.9

processing word of size 4
offset = 0  average time for offset 0 is 424.05
offset = 1  average time for offset 1 is 436.55
offset = 2  average time for offset 2 is 435.1
offset = 3  average time for offset 3 is 435.3

processing word of size 8
offset = 0  average time for offset 0 is 444.9
offset = 1  average time for offset 1 is 470.25
offset = 2  average time for offset 2 is 468.95
offset = 3  average time for offset 3 is 476.75
offset = 4  average time for offset 4 is 474.9
offset = 5  average time for offset 5 is 468.25
offset = 6  average time for offset 6 is 469.8
offset = 7  average time for offset 7 is 469.1
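
For readers who want to try something similar, the random-scan variant described above could look roughly like the C sketch below. This is not the modified test code that produced the numbers in this message, which was not posted. The function names, the xorshift generator, and the fixed 32-bit word size are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Small deterministic PRNG (xorshift32); the state must be nonzero. */
static uint32_t xorshift32(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    return *state = x;
}

/* Fill order[0..n-1] with a random permutation of 0..n-1
 * (Fisher-Yates shuffle), so that every word is visited exactly once
 * but in an order the prefetcher cannot predict. */
static void make_random_order(size_t *order, size_t n, uint32_t seed)
{
    assert(seed != 0);
    for (size_t i = 0; i < n; i++)
        order[i] = i;
    for (size_t i = n; i > 1; i--) {
        size_t j = xorshift32(&seed) % i;
        size_t tmp = order[i - 1];

        order[i - 1] = order[j];
        order[j] = tmp;
    }
}

/* Sum n 32-bit words stored back to back starting at buf + offset,
 * visiting them in the given order. A nonzero offset that is not a
 * multiple of 4 makes every load unaligned. memcpy keeps the unaligned
 * access well-defined in C. */
static uint64_t random_scan_u32(const unsigned char *buf, size_t offset,
                                const size_t *order, size_t n)
{
    uint64_t sum = 0;

    for (size_t i = 0; i < n; i++) {
        uint32_t v;

        memcpy(&v, buf + offset + order[i] * sizeof v, sizeof v);
        sum += v;
    }
    return sum;
}
```

To turn this into a benchmark, allocate a buffer of at least `offset + n * 4` bytes that is much larger than the last-level cache, build the order once, and time repeated calls to `random_scan_u32` for each offset, for example with `clock()` from `<time.h>`. Returning the sum keeps the compiler from discarding the loads.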