Re: refactoring - share str2*int64 functions

Поиск
Список
Период
Сортировка
От Andres Freund
Тема Re: refactoring - share str2*int64 functions
Дата
Msg-id 20190904090839.stp3madovtynq3px@alap3.anarazel.de
обсуждение исходный текст
Ответ на Re: refactoring - share str2*int64 functions  (Fabien COELHO <coelho@cri.ensmp.fr>)
Ответы Re: refactoring - share str2*int64 functions  (Michael Paquier <michael@paquier.xyz>)
Список pgsql-hackers
Hi,

On 2019-09-03 20:10:37 +0200, Fabien COELHO wrote:
> @@ -113,7 +113,7 @@ parse_output_parameters(List *options, uint32 *protocol_version,
>                           errmsg("conflicting or redundant options")));
>              protocol_version_given = true;
>
> -            if (!scanint8(strVal(defel->arg), true, &parsed))
> +            if (unlikely(pg_strtoint64(strVal(defel->arg), &parsed) != PG_STRTOINT_OK))
>                  ereport(ERROR,
>                          (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>                           errmsg("invalid proto_version")));

Unexcited about adding unlikely() to any place that's not a hot path -
which this certainly is not.

But I also wonder if we shouldn't just put the branch likelihood of
pg_strtoint64 not failing into one central location. E.g. something like

static inline pg_strtoint_status
pg_strtoint64(const char *str, int64 *result)
{
        pg_strtoint_status status;

        status = pg_strtoint64_impl(str, result);

        likely(status == PG_STRTOINT_OK);

        return status;
}

I've not tested this, but IIRC that should be sufficient to propagate
that knowledge up to callers.


> +    if (likely(stat == PG_STRTOINT_OK))
> +        return true;
> +    else if (stat == PG_STRTOINT_RANGE_ERROR)
>      {
> -        /* could fail if input is most negative number */
> -        if (unlikely(tmp == PG_INT64_MIN))
> -            goto out_of_range;
> -        tmp = -tmp;
> -    }
> -
> -    *result = tmp;
> -    return true;
> -
> -out_of_range:
> -    if (!errorOK)
>          ereport(ERROR,
>                  (errcode(ERRCODE_NUMERIC_VALUE_OUT_OF_RANGE),
>                   errmsg("value \"%s\" is out of range for type %s",
>                          str, "bigint")));
> -    return false;
> -
> -invalid_syntax:
> -    if (!errorOK)
> +        return false;
> +    }
> +    else if (stat == PG_STRTOINT_SYNTAX_ERROR)
> +    {
>          ereport(ERROR,
>                  (errcode(ERRCODE_INVALID_TEXT_REPRESENTATION),
>                   errmsg("invalid input syntax for type %s: \"%s\"",
>                          "bigint", str)));
> -    return false;
> +        return false;
> +    }
> +    else
> +        /* cannot get here */
> +        Assert(0);

I'd write this as a switch over the enum - that way we'll get a
compile-time warning if we're not handling a value.


> +static bool
> +str2int64(const char *str, int64 *val)
> +{
> +    pg_strtoint_status        stat = pg_strtoint64(str, val);
> +

I find it weird to have a wrapper that's named 'str2...' that then calls
'strto' to implement itself.


> +/*
> + * pg_strtoint64 -- convert a string to 64-bit integer
> + *
> + * The function returns if the conversion failed, or
> + * "*result" is set to the result.
> + */
> +pg_strtoint_status
> +pg_strtoint64(const char *str, int64 *result)
> +{
> +    const char *ptr = str;
> +    int64        tmp = 0;
> +    bool        neg = false;
> +
> +    /*
> +     * Do our own scan, rather than relying on sscanf which might be broken
> +     * for long long.
> +     *
> +     * As INT64_MIN can't be stored as a positive 64 bit integer, accumulate
> +     * value as a negative number.
> +     */
> +
> +    /* skip leading spaces */
> +    while (isspace((unsigned char) *ptr))
> +        ptr++;
> +
> +    /* handle sign */
> +    if (*ptr == '-')
> +    {
> +        ptr++;
> +        neg = true;
> +    }
> +    else if (*ptr == '+')
> +        ptr++;
> +
> +    /* require at least one digit */
> +    if (unlikely(!isdigit((unsigned char) *ptr)))
> +        return PG_STRTOINT_SYNTAX_ERROR;
> +
> +    /* process digits, we know that we have one ahead */
> +    do
> +    {
> +        int64        digit = (*ptr++ - '0');
> +
> +        if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
> +            unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
> +            return PG_STRTOINT_RANGE_ERROR;
> +    }
> +    while (isdigit((unsigned char) *ptr));

Hm. If we're concerned about the cost of isdigit etc - and I think
that's reasonable, after looking at their implementation in glibc (it
performs a lookup in a global table to handle potential locale changes)
- I think we ought to just not use the provided isdigit, and probably
not isspace either.  This code effectively relies on the input being
ascii anyway, so we don't need any locale specific behaviour afaict
(we'd e.g. get wrong results if isdigit() returned true for something
other than the ascii chars).

To me the generated code looks considerably better if I use something
like

static inline bool
pg_isdigit(char c)
{
    return c >= '0' && c <= '9';
}

static inline bool
pg_isspace(char c)
{
    return c == ' ';
}

(if we were to actually go for this, we'd probably want to add some docs
that we don't expect EOF, or make the code safe against that).

I've not benchmarked that, but I'd be surprised if it didn't improve
matters.

And once coded using the above, there's no point in the do/while
conversion imo, as any compiler can trivially optimize redundant checks.


> +/*
> + * pg_strtouint64 -- convert a string to unsigned 64-bit integer
> + *
> + * The function returns if the conversion failed, or
> + * "*result" is set to the result.
> + */
> +pg_strtoint_status
> +pg_strtouint64(const char *str, uint64 *result)
> +{
> +    const char *ptr = str;
> +    uint64        tmp = 0;
> +
> +    /* skip leading spaces */
> +    while (isspace((unsigned char) *ptr))
> +        ptr++;
> +
> +    /* handle sign */
> +    if (*ptr == '+')
> +        ptr++;
> +    else if (unlikely(*ptr == '-'))
> +        return PG_STRTOINT_SYNTAX_ERROR;

Hm. Seems like that should return PG_STRTOINT_RANGE_ERROR?


> +typedef enum {

Please don't define anonyous types (the enum itself, which now is only
reachable via the typedef). Also, there's a missing newline here).


Greetings,

Andres Freund



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Richard Guo
Дата:
Сообщение: Re: d25ea01275 and partitionwise join
Следующее
От: Peter Eisentraut
Дата:
Сообщение: Re: pglz performance