Re: Efficient output for integer types

Поиск
Список
Период
Сортировка
От David Fetter
Тема Re: Efficient output for integer types
Дата
Msg-id 20190920191450.GD31596@fetter.org
обсуждение исходный текст
Ответ на Re: Efficient output for integer types  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
Ответы Re: Efficient output for integer types  (David Fetter <david@fetter.org>)
Список pgsql-hackers
On Wed, Sep 18, 2019 at 04:27:46PM +0900, Kyotaro Horiguchi wrote:
> Hello.
> 
> At Wed, 18 Sep 2019 05:42:01 +0200, David Fetter <david@fetter.org> wrote in <20190918034201.GX31596@fetter.org>
> > On Tue, Sep 17, 2019 at 09:01:57AM +0200, David Fetter wrote:
> > > On Tue, Sep 17, 2019 at 08:55:05AM +0200, David Fetter wrote:
> > > > On Sun, Sep 15, 2019 at 09:18:49AM +0200, David Fetter wrote:
> > > > > Folks,
> > > > > 
> > > > > Please find attached a couple of patches intended to $subject.
> > > > > 
> > > > > This patch set cut the time to copy ten million rows of randomly sized
> > > > > int8s (10 of them) by about a third, so at least for that case, it's
> > > > > pretty decent.
> > > > 
> > > > Added int4 output, removed the sprintf stuff, as it didn't seem to
> > > > help in any cases I was testing.
> > > 
> > > Found a couple of "whiles" that should have been "ifs."
> > 
> > Factored out some inefficient functions and made the guts use the more
> > efficient function.
> 
> I'm not sure this is on the KISS principle, but looked it and
> have several random comments.
> 
> +numutils.o: CFLAGS += $(PERMIT_DECLARATION_AFTER_STATEMENT)
> 
> I don't think that we are allowing that as project coding
> policy. It seems to have been introduced only to accept external
> code as-is.

Changed to fit current policy.

> -                    char        str[23];    /* sign, 21 digits and '\0' */
> +                    char        str[MAXINT8LEN];
> 
> It's uneasy that MAXINT8LEN contains tailling NUL. MAXINT8BUFLEN
> can be so. I think MAXINT8LEN should be 20 and the definition
> should be str[MAXINT8LEN + 1].

Done.

> +static const char DIGIT_TABLE[200] = {
> +    '0', '0', '0', '1', '0', '2', '0', '3', '0', '4', '0', '5', '0', '6', '0', '7', '0', '8', '0', '9',
> 
> Wouldn't it be simpler if it were defined as a constant string?
> 
> static const char DIGIT_TABLE[201] =
>         "000102030405....19"
>         "202122232425....39"
> ..

I thought this might be even clearer:

"00" "01" "02" "03" "04" "05" "06" "07" "08" "09"
"10" "11" "12" "13" "14" "15" "16" "17" "18" "19"
"20" "21" "22" "23" "24" "25" "26" "27" "28" "29"
"30" "31" "32" "33" "34" "35" "36" "37" "38" "39"
"40" "41" "42" "43" "44" "45" "46" "47" "48" "49"
"50" "51" "52" "53" "54" "55" "56" "57" "58" "59"
"60" "61" "62" "63" "64" "65" "66" "67" "68" "69"
"70" "71" "72" "73" "74" "75" "76" "77" "78" "79"
"80" "81" "82" "83" "84" "85" "86" "87" "88" "89"
"90" "91" "92" "93" "94" "95" "96" "97" "98" "99";

> +pg_ltoa_n(int32 value, char *a)
> ...
> +    /* Compute the result string. */
> +    while (value >= 100000000)
> 
> We have only two degits above the value. Isn't the stuff inside
> the while a waste of cycles?

Changed the while to an if.

> +        /* Expensive 64-bit division. Optimize? */
> 
> I believe compiler treats such trivial optimizations. (concretely
> converts into shifts and subtractons if faster.)

Comments removed.

> +        memcpy(a + olength - i - 2, DIGIT_TABLE + c0, 2);
> 
> Maybe it'd be easy to read if 'a + olength - i' is a single variable.

Done.

> +    i += adjust;
> +    return i;
> 
> If 'a + olength - i' is replaced with a variable, the return
> statement is replacable with "return olength + adjust;".

I'm not sure I understand this.

> +    return t + (v >= PowersOfTen[t]);
> 
> I think it's better that if it were 't - (v < POT[t]) + 1; /*
> log10(v) + 1 */'. At least we need an explanation of the
> difference.  (I'didn't checked the algorithm is truely right,
> though.)

Comments added.

> > void
> > pg_lltoa(int64 value, char *a)
> > {
> ..
> >         memcpy(a, "-9223372036854775808", 21);
> ..
> >        memcpy(a, "0", 2);
> 
> The lines need a comment like "/* length contains trailing '\0'
> */"

Comments added.

> +    if (value >= 0)
> ...
> +    else
> + {
> +        if (value == PG_INT32_MIN)
> +            memcpy(str, "-2147483648", 11);
> +            return str + 11;
> >         }
> +        *str++ = '-';
> +        return pg_ltostr_zeropad(str, -value, minwidth - 1);
> 
> If then block of the if statement were (values < 0), we won't
> need to reenter the functaion.

This is a tail-call recursion, so it's probably optimized already.

> +        len = pg_ltoa_n(value, str);
> +        if (minwidth <= len)
> +            return str + len;
> +
> +        memmove(str + minwidth - len, str, len);
> 
> If the function had the parameters str with the room only for two
> digits and a NUL, 2 as minwidth but 1000 as value, the function
> would overrun the buffer. The original function just ignores
> overflowing digits.

I believe the original was incorrect.

Best,
David.
-- 
David Fetter <david(at)fetter(dot)org> http://fetter.org/
Phone: +1 415 235 3778

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate

Вложения

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Fabien COELHO
Дата:
Сообщение: Re: pgbench - allow to create partitioned tables
Следующее
От: Alvaro Herrera
Дата:
Сообщение: Re: log bind parameter values on error