Обсуждение: Use strtoi64() in pgbench, replacing its open-coded implementation
Here's a small patch to replace the int64 parsing code in pgbench with a call to strtoi64(). Makes it a little simpler. Spotted this while grepping for all the different integer parsing functions we have. We could probably consolidate them some more, we still have quite a different integer-parsing routines in the backend and in the frontend. But this is one small, straightforward step in that direction. - Heikki
Вложения
Hi Heikki:
On Thu, Nov 20, 2025 at 8:45 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Here's a small patch to replace the int64 parsing code in pgbench with a
call to strtoi64(). Makes it a little simpler.
Spotted this while grepping for all the different integer parsing
functions we have. We could probably consolidate them some more, we
still have quite a different integer-parsing routines in the backend and
in the frontend. But this is one small, straightforward step in that
direction.
I wrote a small program to test your patch and found that for strings like
" 12 ", it does not handle the trailing spaces and considers the input invalid. However, the original "strtoint64" function processes the trailing spaces correctly. Below is the small program I used: #include <stdio.h>
#include <errno.h>
#include <errno.h>
typedef unsigned long long int64;
#define strtoi64(str, endptr, base) ((int64) strtol(str, endptr, base))
#define strtoi64(str, endptr, base) ((int64) strtol(str, endptr, base))
int main(int argc, char **argv)
{
unsigned long long result;
char *end;
char *str = argv[1];
result = strtoi64(str, &end, 10);
if (errno != 0)
{
printf("%s\n", strerror(errno));
return 1;
}
if (end == str || *end != '\0')
{
printf("invalid input syntax for type bigint: \"%s\"\n", str);
return 1;
}
return 0;
}
When running
./test " 12 ", the output is:invalid input syntax for type bigint: " 12 "
$ uname -a
Linux dev 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Linux dev 3.10.0-957.el7.x86_64 #1 SMP Thu Nov 8 23:39:32 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Regards,
Shi Yuefei
Hi,
On Thu, Nov 20, 2025 at 8:57 AM Heikki Linnakangas <hlinnaka@iki.fi> wrote:
Here's a small patch to replace the int64 parsing code in pgbench with a
call to strtoi64(). Makes it a little simpler.
+1 on this simplification – it definitely makes the code cleaner.
One small note: the updated code doesn’t handle trailing spaces in the input string. Should we consider this a concern?
One small note: the updated code doesn’t handle trailing spaces in the input string. Should we consider this a concern?
Neil Chen <carpenter.nail.cz@gmail.com> writes:
> +1 on this simplification – it definitely makes the code cleaner.
> One small note: the updated code doesn’t handle trailing spaces in the
> input string. Should we consider this a concern?
Heikki's draft commit message addresses that point:
The old implementation accepted trailing whitespace, but that seemed
unnecessary. Firstly, its sibling function for parsing decimals,
strtodouble(), does not accept trailing whitespace. Secondly, none of
the callers can pass a string with trailing whitespace to it.
I didn't try to verify the latter assertion, but if it's true,
we don't need the extra complication.
regards, tom lane
On Thu, Nov 20, 2025 at 9:27 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Heikki's draft commit message addresses that point:
The old implementation accepted trailing whitespace, but that seemed
unnecessary. Firstly, its sibling function for parsing decimals,
strtodouble(), does not accept trailing whitespace. Secondly, none of
the callers can pass a string with trailing whitespace to it.
My mistake – Heikki is absolutely right.
Looking at the two call sites of the function: one filters out trailing spaces within the 'is_an_int' function, and the other in exprscan.l won’t pass strings with trailing spaces either.
I didn't try to verify the latter assertion, but if it's true,
we don't need the extra complication.
make sense
> On Nov 19, 2025, at 23:37, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
>
> Here's a small patch to replace the int64 parsing code in pgbench with a call to strtoi64(). Makes it a little
simpler.
>
> Spotted this while grepping for all the different integer parsing functions we have. We could probably consolidate
themsome more, we still have quite a different integer-parsing routines in the backend and in the frontend. But this is
onesmall, straightforward step in that direction.
>
> - Heikki
> <0001-Use-strtoi64-in-pgbench-replacing-its-open-coded-imp.patch>
I have no concern if we decide to no longer support tailing spaces, while I still got a couple of small comments:
1
```
-/* return whether str matches "^\s*[-+]?[0-9]+$" */
+/*
+ * Return whether str matches "^\s*[-+]?[0-9]+$"
+ *
+ * This should agree with strtoint64() on what's accepted, ignoring overflows.
+ */
static bool
is_an_int(const char *str)
```
Here you added a comment saying "ignoring overflows”, yes, is_an_int() doesn’t check if the integer overflows.
But looking at where the function is called:
```
else if (is_an_int(var->svalue))
{
/* if it looks like an int, it must be an int without overflow */
int64 iv;
if (!strtoint64(var->svalue, false, &iv))
return false;
setIntValue(&var->value, iv);
}
```
The comment says “it must be an int without overflow”, so this comment should be updated as well.
2
```
+ if (unlikely(errno != 0))
{
- int8 digit = (*ptr++ - '0');
-
- if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
- unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
- goto out_of_range;
+ if (!errorOK)
+ pg_log_error("value \"%s\" is out of range for type bigint", str);
+ return false;
}
```
Here we log an “out out range” error when errno is not 0, which is inaccurate, we should check ERANGE.
strtoi64() maps to strtol()/strtoll(), the functions could return more errors than ERANGE.
Best regards,
--
Chao Li (Evan)
HighGo Software Co., Ltd.
https://www.highgo.com/
On 20/11/2025 05:36, Chao Li wrote:
> I have no concern if we decide to no longer support tailing spaces, while I still got a couple of small comments:
>
> 1
> ```
> -/* return whether str matches "^\s*[-+]?[0-9]+$" */
> +/*
> + * Return whether str matches "^\s*[-+]?[0-9]+$"
> + *
> + * This should agree with strtoint64() on what's accepted, ignoring overflows.
> + */
> static bool
> is_an_int(const char *str)
> ```
>
> Here you added a comment saying "ignoring overflows”, yes, is_an_int() doesn’t check if the integer overflows.
>
> But looking at where the function is called:
> ```
> else if (is_an_int(var->svalue))
> {
> /* if it looks like an int, it must be an int without overflow */
> int64 iv;
>
> if (!strtoint64(var->svalue, false, &iv))
> return false;
>
> setIntValue(&var->value, iv);
> }
> ```
>
> The comment says “it must be an int without overflow”, so this comment should be updated as well.
Hmm, I don't think it's wrong as it is, and this patch doesn't change
that behavior. That comment is a little vague though.
How about the following phrasing:
/*
* If it looks like an integer, treat it as such. If it turns out to be
* too large for 'int64', return failure rather than fall back to 'double'.
*/
I don't feel the urge to refactor this myself right now, but we probably
could simplify this further. For example, I wonder if we should remove
is_an_int() altogether and rely on strtoi64() to return failure if the
input does't look like a integer. Also, strtodouble() is never called
with "errorOk != false".
> 2
> ```
> + if (unlikely(errno != 0))
> {
> - int8 digit = (*ptr++ - '0');
> -
> - if (unlikely(pg_mul_s64_overflow(tmp, 10, &tmp)) ||
> - unlikely(pg_sub_s64_overflow(tmp, digit, &tmp)))
> - goto out_of_range;
> + if (!errorOK)
> + pg_log_error("value \"%s\" is out of range for type bigint", str);
> + return false;
> }
> ```
>
> Here we log an “out out range” error when errno is not 0, which is inaccurate, we should check ERANGE.
>
> strtoi64() maps to strtol()/strtoll(), the functions could return more errors than ERANGE.
Good point. The existing strtodouble() function, which uses strtod(),
has the same issue (per POSIX spec at
https://pubs.opengroup.org/onlinepubs/000095399/functions/strtod.html):
> These functions may fail if:
>
> [EINVAL]
> [CX] [Option Start] No conversion could be performed. [Option End]
Fixed that and committed. Thanks for the review!
- Heikki
On 2025-Nov-21, Heikki Linnakangas wrote: > I don't feel the urge to refactor this myself right now, but we probably > could simplify this further. For example, I wonder if we should remove > is_an_int() altogether and rely on strtoi64() to return failure if the input > does't look like a integer. I had the same thought -- is_an_int() is not doing anything useful and it would be better to get rid of it. If we do have an integer-looking that doesn't fit in int64, then maybe treating it as a double is not wrong. (I suppose if we wanted to have numeric values beyond int64 range and not lose precision, we would have to add separate support for that.) -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/