> I was definitely hand-waving the additional implementation needed
> here for non-native 128-bit support; the modulus algorithm as
> presented requires four times the space of the divisor, so a uint16
> implementation should work for all 64-bit machines. Certainly open
> to other ideas or implementations; this was the one I was able to
> find initially. If the 16-bit approach is all that is needed in
> practice, we can also see about narrowing the domain and not worry
> about making this a general-purpose function.
Here's a patch atop the series which converts to 16-bit uints and
passes the regression tests, but I don't consider it well-vetted at
this point.
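
For anyone following along, here is a minimal sketch of the general
idea of taking a modulus over 16-bit pieces. It is not the patch
itself: the function name, the most-significant-first limb layout, and
the restriction to a 16-bit divisor are all assumptions made for
illustration. It shows only why small limbs keep the intermediate
values within a native integer width.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper (not from the patch): remainder of a wide unsigned
 * value divided by a small divisor.  limbs[] holds base-2^16 digits,
 * most significant first.  divisor must be nonzero.
 */
static uint16_t
limbs_mod_u16(const uint16_t *limbs, size_t nlimbs, uint16_t divisor)
{
	uint32_t	rem = 0;
	size_t		i;

	for (i = 0; i < nlimbs; i++)
	{
		/*
		 * rem < divisor <= 0xFFFF here, so (rem << 16) | limb always
		 * fits in 32 bits and no wider native type is required.
		 */
		rem = ((rem << 16) | limbs[i]) % divisor;
	}

	return (uint16_t) rem;
}
```

A 128-bit value would be passed as eight limbs. Whether the real code
needs a wider divisor than this is an open question that the sketch
does not address.
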
David