Thread: [PATCH] Custom code int(32|64) => text conversions out of performance reasons
[PATCH] Custom code int(32|64) => text conversions out of performance reasons
From
Andres Freund
Date:
Hi,

While looking at binary COPY performance I forgot to add BINARY and was a bit
shocked to see printf that high in the profile...

Setup:

CREATE TABLE convtest AS
    SELECT a.i ai, b.i bi, a.i*b.i aibi, (a.i*b.i)::text aibit
    FROM generate_series(1,1000) a(i), generate_series(1, 10000) b(i);

Profile with an unmodified pg:

speedtest=# COPY convtest(ai,bi,aibi) TO '/dev/null';
COPY 10000000
Time: 9192.476 ms

Profile:
# Events: 9K cycles
#
# Overhead  Command          Shared Object    Symbol
# ........  ...............  ...............  ............................
#
    18.24%  postgres_oldint  libc-2.12.1.so   [.] __GI_vfprintf
     8.90%  postgres_oldint  libc-2.12.1.so   [.] _itoa_word
     8.77%  postgres_oldint  postgres_oldint  [.] CopyOneRowTo
     8.19%  postgres_oldint  libc-2.12.1.so   [.] _IO_default_xsputn_internal
     3.67%  postgres_oldint  postgres_oldint  [.] AllocSetAlloc
     3.38%  postgres_oldint  libc-2.12.1.so   [.] __strchrnul
     3.24%  postgres_oldint  libc-2.12.1.so   [.] __GI___vsprintf_chk
     2.87%  postgres_oldint  postgres_oldint  [.] heap_deform_tuple
     2.49%  postgres_oldint  libc-2.12.1.so   [.] _IO_old_init
     2.25%  postgres_oldint  libc-2.12.1.so   [.] _IO_new_file_xsputn
     2.03%  postgres_oldint  postgres_oldint  [.] appendBinaryStringInfo
     1.89%  postgres_oldint  postgres_oldint  [.] heapgettup_pagemode
     1.86%  postgres_oldint  postgres_oldint  [.] FunctionCall1
     1.85%  postgres_oldint  postgres_oldint  [.] AllocSetCheck
     1.79%  postgres_oldint  postgres_oldint  [.] enlargeStringInfo

Timing after replacing those sprintf("%li", ...) calls with a quickly coded
hand-rolled itoa:

speedtest=# COPY convtest(ai,bi,aibi) TO '/dev/null';
COPY 10000000
Time: 5309.928 ms

Profile:
# Events: 5K cycles
#
# Overhead  Command   Shared Object      Symbol
# ........  ........  .................  ...........................
#
    14.96%  postgres  postgres           [.] pg_s32toa
    14.75%  postgres  postgres           [.] CopyOneRowTo
     5.97%  postgres  postgres           [.] AllocSetAlloc
     4.73%  postgres  postgres           [.] heap_deform_tuple
     4.54%  postgres  postgres           [.] AllocSetCheck
     4.01%  postgres  libc-2.12.1.so     [.] _IO_new_file_xsputn
     3.59%  postgres  postgres           [.] heapgettup_pagemode
     3.32%  postgres  postgres           [.] enlargeStringInfo
     3.25%  postgres  postgres           [.] appendBinaryStringInfo
     2.87%  postgres  postgres           [.] CopySendChar
     2.65%  postgres  postgres           [.] FunctionCall1
     2.44%  postgres  postgres           [.] int4out
     2.38%  postgres  [kernel.kallsyms]  [k] copy_user_generic_string
     2.30%  postgres  postgres           [.] AllocSetReset
     2.06%  postgres  postgres           [.] pg_server_to_client
     1.89%  postgres  libc-2.12.1.so     [.] __GI_memset
     1.87%  postgres  libc-2.12.1.so     [.] memcpy

A change from 9192.476 ms to 5309.928 ms seems to be a pretty good indication
that a change in that area is warranted, given that integer columns are quite
ubiquitous...

While at it:
* I removed the outdated
    -- NOTE: int[24] operators never check for over/underflow!
    -- Some of these answers are consequently numerically incorrect.
  warnings in the regression tests.
* I renamed pg_[il]toa to pg_s(16|32|64)toa - I found the names confusing.
  Not sure if it's worth it.
* I added some tests for the border cases of 2^31-1 / -2^31

The 'after' profile shows obvious room for further improvement, but on a quick
look I couldn't think of anything. Any ideas?

Andres

PS: Oh, that's with assertions, but the results are comparable without them
(8765.796 ms vs 4561.673 ms)
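For readers without the attachment, here is a minimal sketch of the kind of hand-rolled conversion the message describes: emit digits least-significant first, then reverse them in place. It is not the code from the patch. The name `sketch_int32_to_text` and the 12-byte buffer contract are assumptions made for illustration, and the sketch avoids a special case for the most negative value by working on an unsigned magnitude, which may differ from what the patch does.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not from the patch): write the decimal form of
 * value into buf. The caller must supply at least 12 bytes:
 * optional sign + up to 10 digits + terminating NUL.
 */
static void
sketch_int32_to_text(int32_t value, char *buf)
{
    char       *start;
    /* Unsigned magnitude, so that INT32_MIN needs no special case. */
    uint32_t    mag = (value < 0) ? (0u - (uint32_t) value) : (uint32_t) value;

    assert(buf != NULL);

    if (value < 0)
        *buf++ = '-';
    start = buf;

    /* Emit digits in reverse order (least significant first). */
    do
    {
        *buf++ = (char) ('0' + mag % 10);
        mag /= 10;
    } while (mag != 0);

    /* Terminate, then step back onto the last digit. */
    *buf-- = '\0';

    /* Reverse the digits in place; the NUL byte stays where it is. */
    while (start < buf)
    {
        char        swap = *start;

        *start++ = *buf;
        *buf-- = swap;
    }
}
```

The saving over sprintf comes from skipping format-string parsing and the stdio buffering layers that dominate the 'before' profile; the division loop itself is what remains in the 'after' profile.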
Re: [PATCH] Custom code int(32|64) => text conversions out of performance reasons
From
Itagaki Takahiro
Date:
On Mon, Nov 1, 2010 at 6:41 AM, Andres Freund <andres@anarazel.de> wrote:
> While looking at binary COPY performance I forgot to add BINARY and was a bit
> shocked to see printf that high in the profile...
>
> A change from 9192.476 ms to 5309.928 ms seems to be a pretty good indication
> that a change in that area is warranted, given that integer columns are quite
> ubiquitous...

Good optimization. Here is the result on my machine:
* before: 13057.190 ms, 12429.092 ms, 12622.374 ms
* after: 8261.688 ms, 8427.024 ms, 8622.370 ms

> * I renamed pg_[il]toa to pg_s(16|32|64)toa - I found the names confusing.
> Not sure if it's worth it.

Agreed, but how about pg_i(16|32|64)toa? 'i' might be more popular than 's'.
See also http://msdn.microsoft.com/en-US/library/yakksftt(VS.100).aspx

I have a couple of questions and comments:

* Why did you change "MAXINT8LEN + 1" to "+ 2"?
  Is there a possibility of buffer overflow in the current code?

    @@ -158,12 +159,9 @@ int8out(PG_FUNCTION_ARGS)
    -   char        buf[MAXINT8LEN + 1];
    +   char        buf[MAXINT8LEN + 2];

* The buffer reordering seems a bit messy.
    //have to reorder the string, but not 0byte.
  I'd suggest filling a fixed-size local buffer from right to left
  and copying it to the actual output.

* C++-style comments should be cleaned up.

--
Itagaki Takahiro
Re: [PATCH] Custom code int(32|64) => text conversions out of performance reasons
From
Robert Haas
Date:
On Sun, Oct 31, 2010 at 11:04 PM, Itagaki Takahiro <itagaki.takahiro@gmail.com> wrote:
> On Mon, Nov 1, 2010 at 6:41 AM, Andres Freund <andres@anarazel.de> wrote:
>> While looking at binary COPY performance I forgot to add BINARY and was a bit
>> shocked to see printf that high in the profile...
>>
>> A change from 9192.476 ms to 5309.928 ms seems to be a pretty good indication
>> that a change in that area is warranted, given that integer columns are quite
>> ubiquitous...
>
> Good optimization. Here is the result on my machine:
> * before: 13057.190 ms, 12429.092 ms, 12622.374 ms
> * after: 8261.688 ms, 8427.024 ms, 8622.370 ms

Wow. Nice stuff, Andres!

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PATCH] Custom code int(32|64) => text conversions out of performance reasons
From
Andres Freund
Date:
On Monday 01 November 2010 04:04:51 Itagaki Takahiro wrote:
> On Mon, Nov 1, 2010 at 6:41 AM, Andres Freund <andres@anarazel.de> wrote:
> > While looking at binary COPY performance I forgot to add BINARY and was a
> > bit shocked to see printf that high in the profile...
> >
> > A change from 9192.476 ms to 5309.928 ms seems to be a pretty good
> > indication that a change in that area is warranted, given that integer
> > columns are quite ubiquitous...
>
> Good optimization. Here is the result on my machine:
> * before: 13057.190 ms, 12429.092 ms, 12622.374 ms
> * after: 8261.688 ms, 8427.024 ms, 8622.370 ms

Thanks.

> > * I renamed pg_[il]toa to pg_s(16|32|64)toa - I found the names
> > confusing. Not sure if it's worth it.
>
> Agreed, but how about pg_i(16|32|64)toa? 'i' might be more popular than
> 's'. See also
> http://msdn.microsoft.com/en-US/library/yakksftt(VS.100).aspx

I find itoa not as clear about signedness as stoa, but if you insist, I
don't feel strongly about it.

> I have a couple of questions and comments:
>
> * Why did you change "MAXINT8LEN + 1" to "+ 2"?
> Is there a possibility of buffer overflow in the current code?
> @@ -158,12 +159,9 @@ int8out(PG_FUNCTION_ARGS)
> - char buf[MAXINT8LEN + 1];
> + char buf[MAXINT8LEN + 2];

Argh. That should never have gotten into the patch. I was playing around with
another optimization which would have needed more buffer space (but was quite
a bit slower).

> * The buffer reordering seems a bit messy.
> //have to reorder the string, but not 0byte.
> I'd suggest filling a fixed-size local buffer from right to left
> and copying it to the actual output.

Hm.

    while (bufstart < buf)
    {
        char swap = *bufstart;
        *bufstart++ = *buf;
        *buf-- = swap;
    }

is a bit cleaner maybe, but I don't see much point in putting it into its own
function... But again, I don't feel strongly.

> * C++-style comments should be cleaned up.

Will clean up.

Andres
Re: [PATCH] Custom code int(32|64) => text conversions out of performance reasons
From
Andres Freund
Date:
Hi,

On Monday 01 November 2010 10:15:01 Andres Freund wrote:
> On Monday 01 November 2010 04:04:51 Itagaki Takahiro wrote:
> > On Mon, Nov 1, 2010 at 6:41 AM, Andres Freund <andres@anarazel.de> wrote:
> > > While looking at binary COPY performance I forgot to add BINARY and was
> > > a bit shocked to see printf that high in the profile...
> > >
> > > A change from 9192.476 ms to 5309.928 ms seems to be a pretty good
> > > indication that a change in that area is warranted, given that integer
> > > columns are quite ubiquitous...
> > > * I renamed pg_[il]toa to pg_s(16|32|64)toa - I found the names
> > > confusing. Not sure if it's worth it.
> > Agreed, but how about pg_i(16|32|64)toa? 'i' might be more popular than
> > 's'. See also
> > http://msdn.microsoft.com/en-US/library/yakksftt(VS.100).aspx
> I find itoa not as clear about signedness as stoa, but if you insist, I
> don't feel strongly about it.

Let whoever commits it decide...

> > * The buffer reordering seems a bit messy.
> > //have to reorder the string, but not 0byte.
> > I'd suggest filling a fixed-size local buffer from right to left
> > and copying it to the actual output.
> is a bit cleaner maybe, but I don't see much point in putting it into its
> own function... But again, I don't feel strongly.

Using a separate buffer cost nearly 500 ms... So I only changed the comments
there.

The only way I could think of to make it faster was to fill the buffer from
the end and then return a pointer to the starting point in the buffer. The
speed benefits are small (around 80 ms) and it makes the interface more
cumbersome...

Revised version attached - I will submit this to the next commitfest now.

Andres
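The "fill the buffer from the end and return a pointer" alternative mentioned above can be sketched as follows. This is an illustration only, not code from any version of the patch; `sketch_int32_to_text_end` and `SKETCH_INT32_BUFLEN` are hypothetical names, and the unsigned-magnitude trick is an assumption of this sketch.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_INT32_BUFLEN 12  /* sign + 10 digits + NUL (assumed contract) */

/*
 * Hypothetical helper: build the text right-aligned in buf and return a
 * pointer to its first character. The result generally does NOT start at
 * buf, which is what makes this interface more awkward for callers.
 */
static char *
sketch_int32_to_text_end(int32_t value, char *buf)
{
    char       *p = buf + SKETCH_INT32_BUFLEN - 1;
    uint32_t    mag = (value < 0) ? (0u - (uint32_t) value) : (uint32_t) value;

    assert(buf != NULL);

    *p = '\0';

    /* Digits come out least-significant first, so write them right to left. */
    do
    {
        *--p = (char) ('0' + mag % 10);
        mag /= 10;
    } while (mag != 0);

    if (value < 0)
        *--p = '-';

    return p;
}
```

No reversal pass is needed, but every caller has to use the returned pointer rather than the buffer it passed in, which matches the "more cumbersome interface" concern in the message.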
Re: [PATCH] Custom code int(32|64) => text conversions out of performance reasons
From
Andres Freund
Date:
On Tuesday 02 November 2010 01:37:43 Andres Freund wrote:
> Revised version attached - I will submit this to the next commitfest now.

Context diff attached this time...
Re: [PATCH] Custom code int(32|64) => text conversions out of performance reasons
From
Peter Eisentraut
Date:
On sön, 2010-10-31 at 22:41 +0100, Andres Freund wrote:
> * I renamed pg_[il]toa to pg_s(16|32|64)toa - I found the names
> confusing. Not sure if it's worth it.

Given that there are widely established functions atoi() and atol(),
naming the reverse itoa() and ltoa() makes a lot of sense. The changed
versions read like "string to ASCII".
Peter Eisentraut <peter_e@gmx.net> writes:
> On sön, 2010-10-31 at 22:41 +0100, Andres Freund wrote:
>> * I renamed pg_[il]toa to pg_s(16|32|64)toa - I found the names
>> confusing. Not sure if it's worth it.

> Given that there are widely established functions atoi() and atol(),
> naming the reverse itoa() and ltoa() makes a lot of sense. The changed
> versions read like "string to ASCII".

Yeah, and "s32" makes no sense at all. I think we should either leave
well enough alone (to avoid introducing a cross-version backpatch
hazard) or use pg_i32toa etc.

regards, tom lane
Re: [PATCH] Custom code int(32|64) => text conversions out of performance reasons
From
Robert Haas
Date:
On Sun, Oct 31, 2010 at 5:41 PM, Andres Freund <andres@anarazel.de> wrote:
> While at it:

These words always make me a bit frightened when reviewing a patch,
since it's generally simpler if a single patch only does one thing.
However, in this case...

> * I removed the outdated
> -- NOTE: int[24] operators never check for over/underflow!
> -- Some of these answers are consequently numerically incorrect.
> warnings in the regression tests.

...this part looks obviously OK, so I have committed it. The rest is
attached as a residual patch, except that I reverted this change:

> * I renamed pg_[il]toa to pg_s(16|32|64)toa - I found the names confusing.
> Not sure if it's worth it.

I notice that int8out isn't terribly consistent with int2out and
int4out, in that it does an extra copy. Maybe that's justified given
the greater potential memory wastage, but I'm not certain. One
approach might be to pick some threshold value and allocate a buffer
in one of two sizes based on how large the value is relative to that
cutoff. But that might also be a stupid idea, not sure.

It would speed things up for me if you or someone else could take a
quick pass over what remains here and fix the formatting and
whitespace to be consistent with our general project style, and make
the comment headers more consistent among the functions being
added/modified.

I think the new regression tests look good.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Attachments
Re: [PATCH] Custom code int(32|64) => text conversions out of performance reasons
From
Andres Freund
Date:
On Monday 15 November 2010 17:12:25 Robert Haas wrote:
> It would speed things up for me if you or someone else could take a
> quick pass over what remains here and fix the formatting and
> whitespace to be consistent with our general project style, and make
> the comment headers more consistent among the functions being
> added/modified.

Will do.

Andres
Re: [PATCH] Custom code int(32|64) => text conversions out of performance reasons
From
Andres Freund
Date:
On Monday 15 November 2010 17:12:25 Robert Haas wrote:
> I notice that int8out isn't terribly consistent with int2out and
> int4out, in that it does an extra copy. Maybe that's justified given
> the greater potential memory wastage, but I'm not certain. One
> approach might be to pick some threshold value and allocate a buffer
> in one of two sizes based on how large the value is relative to that
> cutoff. But that might also be a stupid idea, not sure.

I removed the extra buffer - it's actually a tiny bit faster without it (I
guess the allocation pattern is a bit nicer during copy, as it will always
take the same paths and eventually the same address).
I couldn't measure any difference memory-usage wise.

The code was that way before, btw.

> It would speed things up for me if you or someone else could take a
> quick pass over what remains here and fix the formatting and
> whitespace to be consistent with our general project style, and make
> the comment headers more consistent among the functions being
> added/modified.

I think I did most of those - the function comments in numutils weren't
consistent before - now they are consistent with the unchanged pg_atoi.

Thanks for reviewing/applying the first part,

Andres
Re: [PATCH] Custom code int(32|64) => text conversions out of performance reasons
From
Robert Haas
Date:
On Fri, Nov 19, 2010 at 4:16 PM, Andres Freund <andres@anarazel.de> wrote:
> On Monday 15 November 2010 17:12:25 Robert Haas wrote:
>> I notice that int8out isn't terribly consistent with int2out and
>> int4out, in that it does an extra copy. Maybe that's justified given
>> the greater potential memory wastage, but I'm not certain. One
>> approach might be to pick some threshold value and allocate a buffer
>> in one of two sizes based on how large the value is relative to that
>> cutoff. But that might also be a stupid idea, not sure.
> I removed the extra buffer - it's actually a tiny bit faster without it (I
> guess the allocation pattern is a bit nicer during copy, as it will always
> take the same paths and eventually the same address).
> I couldn't measure any difference memory-usage wise.
>
> The code was that way before, btw.

Yeah, I know. After further thought I decided not to commit this
part, because using 32 bytes when you only need 8 is sort of sucky.
I'm not sure if it matters in real life, but if it's only a tiny
speedup I guess I might as well play it safe.

>> It would speed things up for me if you or someone else could take a
>> quick pass over what remains here and fix the formatting and
>> whitespace to be consistent with our general project style, and make
>> the comment headers more consistent among the functions being
>> added/modified.
> I think I did most of those - the function comments in numutils weren't
> consistent before - now they are consistent with the unchanged pg_atoi.
>
> Thanks for reviewing/applying the first part,

Sure thing. Thanks for taking time to do this - very nice speedup.
This part now committed, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PATCH] Custom code int(32|64) => text conversions out of performance reasons
From
Robert Haas
Date:
On Fri, Nov 19, 2010 at 10:18 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Sure thing. Thanks for taking time to do this - very nice speedup.
> This part now committed, too.

It occurs to me belatedly that there might be a better way to do this.
Instead of flipping value from negative to positive, with a special
case for the smallest possible integer, we could do it the other
way round. And actually, I think we can get rid of neg, too.

    if (value < 0)
        *a++ = '-';
    else
        value = -value;
    start = a;

Then we could just adjust the calculation of the actual digit.

    *a++ = '0' + (-remainder);

Good idea? Bad idea? Seems cleaner to me, assuming it'll actually work...

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> It occurs to me belatedly that there might be a better way to do this.
> Instead of flipping value from negative to positive, with a special
> case for the smallest possible integer, we could do it the other
> way round. And actually, I think we can get rid of neg, too.

The trouble with that approach is that you have to depend on the
direction of rounding for negative quotients. Which was unspecified
before C99, and it's precisely pre-C99 compilers that are posing a
hazard to the current coding.

FWIW, I find the code still pretty darn unsightly. I think this change
is just wrong:

         * Avoid problems with the most negative integer not being representable
         * as a positive integer.
         */
    -   if (value == INT32_MIN)
    +   if (value == INT_MIN)
        {
            memcpy(a, "-2147483648", 12);

and even with INT32_MIN it was pretty silly, because there is exactly 0
hope of the code behaving sanely for some other value of the symbolic
constant. I think it'd be much better to abandon the macros altogether
and write

        if (value == (-2147483647-1))
        {
            memcpy(a, "-2147483648", 12);

Likewise for the int64 case, which BTW is no safer for pre-C99 compilers
than it was yesterday: LL is not the portable way to write int64
constants.

regards, tom lane
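To make the trade-off concrete, here is a minimal sketch of the "stay in the negative domain" idea from the message being replied to. It is only correct under the C99 rule that integer division truncates toward zero (so the remainder of a non-positive dividend lies in -9..0); that assumption is exactly the portability hazard raised above for pre-C99 compilers. The name `sketch_int32_to_text_neg` is hypothetical and this is not committed code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper: converts via non-positive values so that the most
 * negative integer needs no special case. ASSUMES C99 division semantics
 * (truncation toward zero). buf must hold at least 12 bytes.
 */
static void
sketch_int32_to_text_neg(int32_t value, char *buf)
{
    char       *start;

    assert(buf != NULL);

    if (value < 0)
        *buf++ = '-';
    else
        value = -value;         /* value is now <= 0; this cannot overflow */
    start = buf;

    do
    {
        int32_t     remainder = value % 10; /* in -9..0 under C99 */

        value /= 10;
        *buf++ = (char) ('0' - remainder);
    } while (value != 0);

    *buf-- = '\0';

    /* Reverse the digits in place, as in the in-place approach. */
    while (start < buf)
    {
        char        swap = *start;

        *start++ = *buf;
        *buf-- = swap;
    }
}
```

On a compiler that rounds negative quotients toward negative infinity, `value % 10` could be positive and the digits would come out wrong, which is why the special-case test written as `(-2147483647-1)` is the more conservative choice.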
Re: [PATCH] Custom code int(32|64) => text conversions out of performance reasons
From
Robert Haas
Date:
On Sat, Nov 20, 2010 at 10:38 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> The trouble with that approach is that you have to depend on the
> direction of rounding for negative quotients. Which was unspecified
> before C99, and it's precisely pre-C99 compilers that are posing a
> hazard to the current coding.

Interesting. I wondered whether there might be compilers out there
that handled that inconsistently, but then I thought I was probably
being paranoid.

> Likewise for the int64 case, which BTW is no safer for pre-C99 compilers
> than it was yesterday: LL is not the portable way to write int64
> constants.

Gah. I wish we had some documentation of this stuff.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
BTW, while we're thinking about marginal improvements: instead of
constructing the string backwards and then reversing it in-place,
what about building it working backwards from the end of the buffer
and then memmove'ing it down to the start of the buffer?

I haven't tested this but it seems likely to be roughly a wash
speed-wise. The reason I find the idea attractive is that it will
immediately expose any caller that is providing a buffer shorter
than the required length, whereas now such callers will appear to
work fine if they're only tested on small values.

A small downside is that pg_itoa would then need its own implementation
instead of just punting to pg_ltoa.

regards, tom lane
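A minimal sketch of the build-at-the-end-then-memmove variant proposed here, under stated assumptions: the names `sketch_int32_to_text_memmove` and `SKETCH_INT32_BUFLEN` are hypothetical, the buffer contract of 12 bytes is assumed, and this is not code from the patch or from PostgreSQL.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_INT32_BUFLEN 12  /* sign + 10 digits + NUL (assumed contract) */

/*
 * Hypothetical helper: always writes starting at the LAST byte of the
 * required buffer, then slides the finished string down to buf[0].
 * Because the last byte is touched on every call, a caller passing a
 * too-short buffer is exposed even when only small values are tested
 * (for example under a memory checker).
 */
static void
sketch_int32_to_text_memmove(int32_t value, char *buf)
{
    char       *end = buf + SKETCH_INT32_BUFLEN;
    char       *p = end - 1;
    uint32_t    mag = (value < 0) ? (0u - (uint32_t) value) : (uint32_t) value;

    assert(buf != NULL);

    *p = '\0';
    do
    {
        *--p = (char) ('0' + mag % 10);
        mag /= 10;
    } while (mag != 0);

    if (value < 0)
        *--p = '-';

    /* Source and destination may overlap, hence memmove rather than memcpy. */
    memmove(buf, p, (size_t) (end - p));
}
```

Compared with reversing in place, this keeps the result at the start of the buffer, at the cost of one extra copy of at most 12 bytes.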
Re: [PATCH] Custom code int(32|64) => text conversions out of performance reasons
From
Robert Haas
Date:
On Sat, Nov 20, 2010 at 12:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> BTW, while we're thinking about marginal improvements: instead of
> constructing the string backwards and then reversing it in-place,
> what about building it working backwards from the end of the buffer
> and then memmove'ing it down to the start of the buffer?
>
> I haven't tested this but it seems likely to be roughly a wash
> speed-wise. The reason I find the idea attractive is that it will
> immediately expose any caller that is providing a buffer shorter
> than the required length, whereas now such callers will appear to
> work fine if they're only tested on small values.
>
> A small downside is that pg_itoa would then need its own implementation
> instead of just punting to pg_ltoa.

I think that might be more clever than is really warranted. I get
your point about buffer overrun, but I don't think it's that hard for
callers to do the right thing, so I'm inclined to think that's not
worth much in this case. Of course, if memmove() can be implemented
as a single assembler instruction or something, that might be
appealing from a speed standpoint, but otherwise I think we may as
well stick with this. There's less chance of needlessly touching an
extra cache line, less chance of being confused by leftover garbage in
memory after the end of the output string, and less duplicate code.

I had given some thought to whether it might make sense to try to
figure out how long the string will be before we actually start
generating it, so that we can just start in the exactly right space
and have to clean up afterward. But the obvious implementation seems
like it could be more expensive than just doing the copy.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Nov 20, 2010 at 12:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> what about building it working backwards from the end of the buffer
>> and then memmove'ing it down to the start of the buffer?

> I think that might be more clever than is really warranted. I get
> your point about buffer overrun, but I don't think it's that hard for
> callers to do the right thing, so I'm inclined to think that's not
> worth much in this case.

Fair enough --- it was just a passing thought.

> I had given some thought to whether it might make sense to try to
> figure out how long the string will be before we actually start
> generating it, so that we can just start in the exactly right space
> and have to clean up afterward. But the obvious implementation seems
> like it could be more expensive than just doing the copy.

Yeah. You certainly don't want to do the division sequence twice,
and a log() call wouldn't be cheap either, and there don't seem to
be many other alternatives.

If we were going to get picky about avoiding the reverse step,
I'd go with Andres' idea of changing the API to pass back an address
instead of guaranteeing that the result begins at the start of the
buffer. But I think that's much more complicated for callers than
it's worth.

regards, tom lane
Re: [PATCH] Custom code int(32|64) => text conversions out of performance reasons
From
Andres Freund
Date:
On Saturday 20 November 2010 18:34:04 Tom Lane wrote:
> BTW, while we're thinking about marginal improvements: instead of
> constructing the string backwards and then reversing it in-place,
> what about building it working backwards from the end of the buffer
> and then memmove'ing it down to the start of the buffer?
>
> I haven't tested this but it seems likely to be roughly a wash
> speed-wise. The reason I find the idea attractive is that it will
> immediately expose any caller that is providing a buffer shorter
> than the required length, whereas now such callers will appear to
> work fine if they're only tested on small values.

Tried that, the cost was measurable although not big (~3-5%)...

Greetings,

Andres
Re: [PATCH] Custom code int(32|64) => text conversions out of performance reasons
From
Andres Freund
Date:
On Saturday 20 November 2010 18:18:32 Robert Haas wrote:
> > Likewise for the int64 case, which BTW is no safer for pre-C99 compilers
> > than it was yesterday: LL is not the portable way to write int64
> > constants.
> Gah. I wish we had some documentation of this stuff.

Ditto. I started doing C-ish stuff quite a bit *after* C99 was mostly
available in gcc...

Sorry, btw, for not realizing those points (and the regression-expectation
file) myself...

Andres
On Sat, Nov 20, 2010 at 6:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Sat, Nov 20, 2010 at 12:34 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> what about building it working backwards from the end of the buffer
>>> and then memmove'ing it down to the start of the buffer?
>
>> I think that might be more clever than is really warranted. I get
>> your point about buffer overrun, but I don't think it's that hard for
>> callers to do the right thing, so I'm inclined to think that's not
>> worth much in this case.

It also seems wrong that a caller might happen to know that their
argument will never be more than n digits but still has to allocate a
buffer large enough to hold 2^64.

> Fair enough --- it was just a passing thought.
>
>> I had given some thought to whether it might make sense to try to
>> figure out how long the string will be before we actually start
>> generating it, so that we can just start in the exactly right space
>> and have to clean up afterward. But the obvious implementation seems
>> like it could be more expensive than just doing the copy.
>
> Yeah. You certainly don't want to do the division sequence twice,
> and a log() call wouldn't be cheap either, and there don't seem to
> be many other alternatives.

There are bit-twiddling hacks for computing log base 2. I'm not sure
it's worth worrying about to this degree though.

--
greg
Greg Stark <gsstark@mit.edu> writes:
> On Sat, Nov 20, 2010 at 6:31 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> I had given some thought to whether it might make sense to try to
>>> figure out how long the string will be before we actually start
>>> generating it, so that we can just start in the exactly right space
>>> and have to clean up afterward. But the obvious implementation seems
>>> like it could be more expensive than just doing the copy.

>> Yeah. You certainly don't want to do the division sequence twice,
>> and a log() call wouldn't be cheap either, and there don't seem to
>> be many other alternatives.

> There are bit-twiddling hacks for computing log base 2. I'm not sure
> it's worth worrying about to this degree though.

I think converting log2 to log10 *exactly* would end up being not so
cheap, anyhow.

regards, tom lane
Re: [PATCH] Custom code int(32|64) => text conversions out of performance reasons
From
Florian Weimer
Date:
* Tom Lane:

> Yeah. You certainly don't want to do the division sequence twice,
> and a log() call wouldn't be cheap either, and there don't seem to
> be many other alternatives.

What about a sequence of comparisons, and unrolling the loop? That
could avoid the final division, too. It might also be helpful to
break down the dependency chain for large input values.

The int8 version should probably work in 1e9 chunks and use a
zero-padding variant of the 32-bit code.

--
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Kriegsstraße 100 tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99
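One way to read the "sequence of comparisons" suggestion is sketched below: determine the digit count with plain comparisons, then write each digit straight into its final position so that neither a reversal nor a memmove is needed. The names `sketch_decimal_len` and `sketch_int32_to_text_sized` are hypothetical, and this sketch does not unroll the digit loop, avoid the final division, or handle the int8 case in 1e9 chunks; it only illustrates the length computation.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of decimal digits in v, via comparisons only. */
static int
sketch_decimal_len(uint32_t v)
{
    if (v < 10u) return 1;
    if (v < 100u) return 2;
    if (v < 1000u) return 3;
    if (v < 10000u) return 4;
    if (v < 100000u) return 5;
    if (v < 1000000u) return 6;
    if (v < 10000000u) return 7;
    if (v < 100000000u) return 8;
    if (v < 1000000000u) return 9;
    return 10;
}

/*
 * Hypothetical helper: knows the length up front, so digits are written
 * right to left directly into their final slots. buf must hold at least
 * 12 bytes (assumed contract).
 */
static void
sketch_int32_to_text_sized(int32_t value, char *buf)
{
    uint32_t    mag = (value < 0) ? (0u - (uint32_t) value) : (uint32_t) value;
    int         len = sketch_decimal_len(mag);

    assert(buf != NULL);

    if (value < 0)
        *buf++ = '-';

    buf[len] = '\0';
    while (len > 0)
    {
        buf[--len] = (char) ('0' + mag % 10);
        mag /= 10;
    }
}
```

Whether the comparison chain is cheaper than the reversal it replaces is not something the thread measures; it would need to be benchmarked.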