Обсуждение: Use strtoi64() in pgbench, replacing its open-coded implementation

Поиск
Список
Период
Сортировка

Use strtoi64() in pgbench, replacing its open-coded implementation

От
Heikki Linnakangas
Дата:
Here's a small patch to replace the int64 parsing code in pgbench with a 
call to strtoi64(). Makes it a little simpler.

Spotted this while grepping for all the different integer parsing 
functions we have. We could probably consolidate them some more, we 
still have quite a different integer-parsing routines in the backend and 
in the frontend. But this is one small, straightforward step in that 
direction.

- Heikki

Вложения

Re: Use strtoi64() in pgbench, replacing its open-coded implementation

От
Yuefei Shi
Дата:
Hi Heikki:

On Thu, Nov 20, 2025 at 8:45 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Here's a small patch to replace the int64 parsing code in pgbench with a
call to strtoi64(). Makes it a little simpler.

Spotted this while grepping for all the different integer parsing
functions we have. We could probably consolidate them some more, we
still have quite a different integer-parsing routines in the backend and
in the frontend. But this is one small, straightforward step in that
direction.

I wrote a small program to test your patch and found that for strings like " 12 ", it does not handle the trailing spaces and considers the input invalid. However, the original "strtoint64" function processes the trailing spaces correctly. Below is the small program I used: 

#include <stdio.h>
#include <errno.h>

typedef unsigned long long int64;
#define strtoi64(str, endptr, base) ((int64) strtol(str, endptr, base))

int main(int argc, char **argv)
{
    unsigned long long result;
    char *end;
    char *str = argv[1];

    result = strtoi64(str, &end, 10);
    if (errno != 0)
    {
        printf("%s\n", strerror(errno));
        return 1;
    }

    if (end == str || *end != '\0')
    {
        printf("invalid input syntax for type bigint: \"%s\"\n", str);
        return 1;
    }

    return 0;
}

When running ./test " 12 ", the output is:
invalid input syntax for type bigint: " 12 "

$ uname -a
Linux dev 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux

Regards,
Shi Yuefei

Re: Use strtoi64() in pgbench, replacing its open-coded implementation

От
Neil Chen
Дата:
Hi,

On Thu, Nov 20, 2025 at 8:57 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Here's a small patch to replace the int64 parsing code in pgbench with a
call to strtoi64(). Makes it a little simpler.


+1 on this simplification – it definitely makes the code cleaner.
One small note: the updated code doesn’t handle trailing spaces in the input string. Should we consider this a concern?
 

Re: Use strtoi64() in pgbench, replacing its open-coded implementation

От
Tom Lane
Дата:
Neil Chen <carpenter.nail.cz@gmail.com> writes:
> +1 on this simplification – it definitely makes the code cleaner.
> One small note: the updated code doesn’t handle trailing spaces in the
> input string. Should we consider this a concern?

Heikki's draft commit message addresses that point:

    The old implementation accepted trailing whitespace, but that seemed
    unnecessary. Firstly, its sibling function for parsing decimals,
    strtodouble(), does not accept trailing whitespace. Secondly, none of
    the callers can pass a string with trailing whitespace to it.

I didn't try to verify the latter assertion, but if it's true,
we don't need the extra complication.

            regards, tom lane



Re: Use strtoi64() in pgbench, replacing its open-coded implementation

От
Neil Chen
Дата:

On Thu, Nov 20, 2025 at 9:27 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:

Heikki's draft commit message addresses that point:

    The old implementation accepted trailing whitespace, but that seemed
    unnecessary. Firstly, its sibling function for parsing decimals,
    strtodouble(), does not accept trailing whitespace. Secondly, none of
    the callers can pass a string with trailing whitespace to it.


My mistake – Heikki is absolutely right.
Looking at the two call sites of the function: one filters out trailing spaces within the 'is_an_int' function, and the other in exprscan.l won’t pass strings with trailing spaces either.
 
I didn't try to verify the latter assertion, but if it's true,
we don't need the extra complication.


make sense

Re: Use strtoi64() in pgbench, replacing its open-coded implementation

От
Chao Li
Дата:

> On Nov 19, 2025, at 23:37, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> Here's a small patch to replace the int64 parsing code in pgbench with a call to strtoi64(). Makes it a little
simpler.
>
> Spotted this while grepping for all the different integer parsing functions we have. We could probably consolidate
themsome more, we still have quite a different integer-parsing routines in the backend and in the frontend. But this is
onesmall, straightforward step in that direction. 
>
> - Heikki
> <0001-Use-strtoi64-in-pgbench-replacing-its-open-coded-imp.patch>

I have no concern if we decide to no longer support tailing spaces, while I still got a couple of small comments:

1
```
-/* return whether str matches "^\s*[-+]?[0-9]+$" */
+/*
+ * Return whether str matches "^\s*[-+]?[0-9]+$"
+ *
+ * This should agree with strtoint64() on what's accepted, ignoring overflows.
+ */
 static bool
 is_an_int(const char *str)
```

Here you added a comment saying "ignoring overflows”, yes, is_an_int() doesn’t check if the integer overflows.

But looking at where the function is called:
```
    else if (is_an_int(var->svalue))
    {
        /* if it looks like an int, it must be an int without overflow */
        int64        iv;

        if (!strtoint64(var->svalue, false, &iv))
            return false;

        setIntValue(&var->value, iv);
    }
```

The comment says “it must be an int without overflow”, so this comment should be updated as well.

2
```
+    if (unlikely(errno != 0))
     {
-        int8        digit = (*ptr++ - '0');
-
-        if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
-            unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
-            goto out_of_range;
+        if (!errorOK)
+            pg_log_error("value \"%s\" is out of range for type bigint", str);
+        return false;
     }
```

Here we log an “out out range” error when errno is not 0, which is inaccurate, we should check ERANGE.

strtoi64() maps to strtol()/strtoll(), the functions could return more errors than ERANGE.

Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/







Re: Use strtoi64() in pgbench, replacing its open-coded implementation

От
Heikki Linnakangas
Дата:
On 20/11/2025 05:36, Chao Li wrote:
> I have no concern if we decide to no longer support tailing spaces, while I still got a couple of small comments:
>
> 1
> ```
> -/* return whether str matches "^\s*[-+]?[0-9]+$" */
> +/*
> + * Return whether str matches "^\s*[-+]?[0-9]+$"
> + *
> + * This should agree with strtoint64() on what's accepted, ignoring overflows.
> + */
>   static bool
>   is_an_int(const char *str)
> ```
>
> Here you added a comment saying "ignoring overflows”, yes, is_an_int() doesn’t check if the integer overflows.
>
> But looking at where the function is called:
> ```
>     else if (is_an_int(var->svalue))
>     {
>         /* if it looks like an int, it must be an int without overflow */
>         int64        iv;
>
>         if (!strtoint64(var->svalue, false, &iv))
>             return false;
>
>         setIntValue(&var->value, iv);
>     }
> ```
>
> The comment says “it must be an int without overflow”, so this comment should be updated as well.

Hmm, I don't think it's wrong as it is, and this patch doesn't change
that behavior. That comment is a little vague though.

How about the following phrasing:

/*
  * If it looks like an integer, treat it as such.  If it turns out to be
  * too large for 'int64', return failure rather than fall back to 'double'.
  */

I don't feel the urge to refactor this myself right now, but we probably
could simplify this further. For example, I wonder if we should remove
is_an_int() altogether and rely on strtoi64() to return failure if the
input does't look like a integer. Also, strtodouble() is never called
with "errorOk != false".

> 2
> ```
> +    if (unlikely(errno != 0))
>       {
> -        int8        digit = (*ptr++ - '0');
> -
> -        if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
> -            unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
> -            goto out_of_range;
> +        if (!errorOK)
> +            pg_log_error("value \"%s\" is out of range for type bigint", str);
> +        return false;
>       }
> ```
>
> Here we log an “out out range” error when errno is not 0, which is inaccurate, we should check ERANGE.
>
> strtoi64() maps to strtol()/strtoll(), the functions could return more errors than ERANGE.

Good point. The existing strtodouble() function, which uses strtod(),
has the same issue (per POSIX spec at
https://pubs.opengroup.org/onlinepubs/000095399/functions/strtod.html):

> These functions may fail if:
>
> [EINVAL]
>     [CX] [Option Start] No conversion could be performed. [Option End]

Fixed that and committed. Thanks for the review!

- Heikki




Re: Use strtoi64() in pgbench, replacing its open-coded implementation

От
Álvaro Herrera
Дата:
On 2025-Nov-21, Heikki Linnakangas wrote:

> I don't feel the urge to refactor this myself right now, but we probably
> could simplify this further. For example, I wonder if we should remove
> is_an_int() altogether and rely on strtoi64() to return failure if the input
> does't look like a integer.

I had the same thought -- is_an_int() is not doing anything useful and
it would be better to get rid of it.  If we do have an integer-looking
that doesn't fit in int64, then maybe treating it as a double is not
wrong.  (I suppose if we wanted to have numeric values beyond int64
range and not lose precision, we would have to add separate support for
that.)

-- 
Álvaro Herrera               48°01'N 7°57'E  —  https://www.EnterpriseDB.com/