OK, I tried two optimisations:
1. Creating a special version of comparetup_index for single-key
integer indexes, together with an index_get_attr that takes byval and
len args. With fetch_att and those values specified at compile time,
gcc optimises the whole call to about 12 instructions of assembly
rather than the usual mess.
2. Specifying -Winline -finline-limit-1500 (only on tuplesort.c).
This causes inlineApplySortFunction() to be inlined, as the code
obviously expects it to be.
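
To illustrate the idea in (1), here is a minimal standalone sketch. It is
not the attached patch and does not use the real PostgreSQL macros: every
name starting with sketch_ is made up for this example, and the fetch
helper is only a stand-in for what fetch_att does for by-value types. The
point is that when byval and len are literal constants at the call site,
the compiler can drop the branches and reduce the fetch to a plain load.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for a by-value attribute fetch: read a 4- or
 * 8-byte signed integer stored at p.  Only by-value types are handled.
 */
static inline int64_t
sketch_fetch_att(const char *p, bool byval, int len)
{
    assert(byval && (len == 4 || len == 8));

    if (len == 4)
    {
        int32_t v;

        memcpy(&v, p, sizeof(v));
        return v;
    }
    else
    {
        int64_t v;

        memcpy(&v, p, sizeof(v));
        return v;
    }
}

/*
 * Generic comparator: byval and len are only known at run time, so the
 * checks and the branch on len stay in the generated code.
 */
int
sketch_compare_generic(const char *a, const char *b, bool byval, int len)
{
    int64_t x = sketch_fetch_att(a, byval, len);
    int64_t y = sketch_fetch_att(b, byval, len);

    return (x > y) - (x < y);
}

/*
 * Specialised comparator: byval and len are compile-time constants, so
 * once the helper is inlined the compiler can fold the branches away.
 */
int
sketch_compare_fastbyval4(const char *a, const char *b)
{
    int64_t x = sketch_fetch_att(a, true, 4);
    int64_t y = sketch_fetch_att(b, true, 4);

    return (x > y) - (x < y);
}
```

Both functions return the same result for 4-byte integers; the only
difference is how much the compiler knows when it generates the code.
How many instructions you end up with will depend on the compiler
version and flags.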
default build (baseline)                     235 seconds
-finline only                                217 seconds (7% better)
comparetup_index_fastbyval4 only             221 seconds (6% better)
comparetup_index_fastbyval4 and -finline     203 seconds (13.5% better)
This is indexing the integer sequence column on a 2.7 million row
table. The times are as given by gprof and so exclude system call time.
Basically, I recommend adding "-Winline -finline-limit-1500" to the
default build while we discuss other options.
comparetup_index_fastbyval4 patch attached as an example.
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Patent. n. Genius is 5% inspiration and 95% perspiration. A patent is a
> tool for doing 5% of the work and then sitting around waiting for someone
> else to do the other 95% so you can sue them.