Hi,
On 2020-01-03 18:49:18 -0500, Tom Lane wrote:
> On some older RISC architectures, integer division is really slow, like
> slower than floating-point. I'm not sure if that's true on any platform
> people still care about though. In recent years, CPU architects have been
> able to throw all the transistors they needed at such problems. On a
> machine with single-cycle divide, it's likely that the extra
> compare-and-branch is a net loss.
Which architecture has single-cycle division? I think it's way above
that, based on profiles I've seen. And Agner seems to back me up:
https://www.agner.org/optimize/instruction_tables.pdf
That lists 32-bit/64-bit idiv with a latency of ~26/~42-95 cycles, even
on a modern uarch like Skylake-X.
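
For anyone wanting to sanity-check numbers like these locally, here is a rough sketch of a dependent-division microbenchmark. It is only an illustration under assumptions: the names div_chain and ns_per_div are hypothetical, it measures wall-clock nanoseconds rather than cycles, and it is not how the profiles or Agner's tables were produced. The divisor is read through a volatile so the compiler cannot replace the division with a multiply-by-reciprocal.

```c
#define _POSIX_C_SOURCE 199309L   /* for clock_gettime() */

#include <assert.h>
#include <stdint.h>
#include <time.h>

/*
 * Hypothetical helper: performs n divisions where each dividend depends on
 * the previous quotient, so the divisions cannot overlap and the loop time
 * is dominated by divide latency rather than throughput.
 */
uint64_t
div_chain(uint64_t start, uint64_t divisor, int n)
{
    /* volatile read each iteration: forces a real divide instruction */
    volatile uint64_t d = divisor;
    uint64_t    x = start;

    assert(divisor != 0);

    for (int i = 0; i < n; i++)
        x = x / d + start;      /* quotient feeds the next dividend */

    return x;
}

/*
 * Hypothetical helper: average wall-clock nanoseconds per iteration of the
 * chain above. This includes the add and loop overhead, so treat it as an
 * upper bound on divide latency, not an exact figure.
 */
double
ns_per_div(int n)
{
    struct timespec t0, t1;
    volatile uint64_t sink;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    sink = div_chain(UINT64_MAX / 2, 3, n);
    clock_gettime(CLOCK_MONOTONIC, &t1);
    (void) sink;

    return ((double) (t1.tv_sec - t0.tv_sec) * 1e9 +
            (double) (t1.tv_nsec - t0.tv_nsec)) / n;
}
```

Calling ns_per_div() with a few million iterations and multiplying by the clock frequency gives an approximate cycles-per-divide figure. On several microarchitectures the latency also depends on the operand values, so a single number from this should be read loosely.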
Greetings,
Andres Freund