I tested crctest on two machines with two versions of gcc.
UltraSPARC III, gcc 2.95.3:
  gcc -O1 crctest.c      1.321517 s
  gcc -O2 crctest.c      1.099186 s
  gcc -O3 crctest.c      1.099330 s
  gcc -O1 crctest64.c    1.651599 s
  gcc -O2 crctest64.c    1.429089 s
  gcc -O3 crctest64.c    1.434296 s

UltraSPARC III, gcc 3.4.3:
  gcc -O1 crctest.c      1.209168 s
  gcc -O2 crctest.c      1.206253 s
  gcc -O3 crctest.c      1.209762 s
  gcc -O1 crctest64.c    1.545899 s
  gcc -O2 crctest64.c    1.545290 s
  gcc -O3 crctest64.c    1.540993 s

Pentium III, gcc 2.95.3:
  gcc -O1 crctest.c      1.548432 s
  gcc -O2 crctest.c      1.226873 s
  gcc -O3 crctest.c      1.227699 s
  gcc -O1 crctest64.c    1.362152 s
  gcc -O2 crctest64.c    1.259324 s
  gcc -O3 crctest64.c    1.259608 s

Pentium III, gcc 3.4.3:
  gcc -O1 crctest.c      1.084822 s
  gcc -O2 crctest.c      0.921594 s
  gcc -O3 crctest.c      0.921910 s
  gcc -O1 crctest64.c    1.188287 s
  gcc -O2 crctest64.c    1.242013 s
  gcc -O3 crctest64.c    1.638812 s
I think performance can be improved further by loop unrolling.
I measured it with the loop unrolled either by the -funroll-loops
option or by hand. (The hand-tuned version is attached.)
UltraSPARC III, gcc 2.95.3:
  gcc -O2 crctest.c                    1.098880 s
  gcc -O2 -funroll-loops crctest.c     0.874165 s
  gcc -O2 crctest_unroll.c             0.808208 s

UltraSPARC III, gcc 3.4.3:
  gcc -O2 crctest.c                    1.209168 s
  gcc -O2 -funroll-loops crctest.c     1.127973 s
  gcc -O2 crctest_unroll.c             1.017485 s

Pentium III, gcc 2.95.3:
  gcc -O2 crctest.c                    1.226873 s
  gcc -O2 -funroll-loops crctest.c     1.077475 s
  gcc -O2 crctest_unroll.c             1.051375 s

Pentium III, gcc 3.4.3:
  gcc -O2 crctest.c                    0.921594 s
  gcc -O2 -funroll-loops crctest.c     0.873614 s
  gcc -O2 crctest_unroll.c             0.839384 s
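
For readers without the attachment, here is a rough sketch of what
hand-unrolling a table-driven CRC loop can look like. It is not the
attached crctest_unroll.c. It assumes the standard reflected CRC-32
(polynomial 0xEDB88320) and an unroll factor of 4, and the names
crc_init, crc_plain, crc_unrolled and CRC_STEP are made up for this
illustration. It uses C99 declarations, so it needs a newer compiler
mode than gcc 2.95.3 offers by default.

```c
#include <stddef.h>
#include <stdint.h>

/* Lookup table, filled by crc_init(). Hypothetical name. */
static uint32_t crc_table[256];

/* Build the table for the reflected CRC-32 polynomial 0xEDB88320
 * (an assumption; the CRC variant used by crctest may differ). */
static void crc_init(void)
{
    for (uint32_t i = 0; i < 256; i++) {
        uint32_t c = i;
        for (int k = 0; k < 8; k++)
            c = (c & 1) ? (c >> 1) ^ 0xEDB88320u : c >> 1;
        crc_table[i] = c;
    }
}

/* One table lookup: fold the next input byte into the running CRC. */
#define CRC_STEP(crc, p) \
    ((crc) = crc_table[((crc) ^ *(p)++) & 0xFF] ^ ((crc) >> 8))

/* Baseline: one byte per loop iteration. */
static uint32_t crc_plain(const unsigned char *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    while (len-- > 0)
        CRC_STEP(crc, p);
    return crc ^ 0xFFFFFFFFu;
}

/* Hand-unrolled: four bytes per iteration, so the loop counter is
 * tested and updated a quarter as often. A tail loop handles the
 * remaining 0..3 bytes. */
static uint32_t crc_unrolled(const unsigned char *p, size_t len)
{
    uint32_t crc = 0xFFFFFFFFu;

    while (len >= 4) {
        CRC_STEP(crc, p);
        CRC_STEP(crc, p);
        CRC_STEP(crc, p);
        CRC_STEP(crc, p);
        len -= 4;
    }
    while (len-- > 0)
        CRC_STEP(crc, p);
    return crc ^ 0xFFFFFFFFu;
}
```

Call crc_init() once before either function. Both must return the
same value for any input. Each step still depends on the previous
CRC value, so unrolling only removes loop overhead and does not make
the lookups independent of each other.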
regards,
---
Atsushi Ogawa