Обсуждение: Overflow hazard in pgbench
moonjelly just reported an interesting failure [1]. It seems that
with the latest bleeding-edge gcc, this code is misoptimized:
/* check random range */
if (imin > imax)
{
pg_log_error("empty range given to random");
return false;
}
else if (imax - imin < 0 || (imax - imin) + 1 < 0)
{
/* prevent int overflows in random functions */
pg_log_error("random range is too large");
return false;
}
such that the second if-test doesn't fire. Now, according to the C99
spec this code is broken, because the compiler is allowed to assume
that signed integer overflow doesn't happen, whereupon the second
if-block is provably unreachable. The failure still represents a gcc
bug, because we're using -fwrapv which should disable that assumption.
However, not all compilers have that switch, so it'd be better to code
this in a spec-compliant way. I suggest applying the attached in
branches that have the required functions.
[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=moonjelly&dt=2021-06-26%2007%3A03%3A17
regards, tom lane
diff --git a/src/bin/pgbench/pgbench.c b/src/bin/pgbench/pgbench.c
index e61055b6b7..c4023bfa27 100644
--- a/src/bin/pgbench/pgbench.c
+++ b/src/bin/pgbench/pgbench.c
@@ -2450,7 +2450,8 @@ evalStandardFunc(CState *st,
case PGBENCH_RANDOM_ZIPFIAN:
{
int64 imin,
- imax;
+ imax,
+ delta;
Assert(nargs >= 2);
@@ -2464,7 +2465,8 @@ evalStandardFunc(CState *st,
pg_log_error("empty range given to random");
return false;
}
- else if (imax - imin < 0 || (imax - imin) + 1 < 0)
+ else if (pg_sub_s64_overflow(imax, imin, &delta) ||
+ pg_add_s64_overflow(delta, 1, &delta))
{
/* prevent int overflows in random functions */
pg_log_error("random range is too large");
I wrote:
> ... according to the C99
> spec this code is broken, because the compiler is allowed to assume
> that signed integer overflow doesn't happen, whereupon the second
> if-block is provably unreachable. The failure still represents a gcc
> bug, because we're using -fwrapv which should disable that assumption.
> However, not all compilers have that switch, so it'd be better to code
> this in a spec-compliant way.
BTW, for grins I tried building today's HEAD without -fwrapv, using
gcc version 11.1.1 20210531 (Red Hat 11.1.1-3) (GCC)
which is the newest version I have at hand. Not very surprisingly,
that reproduced the failure shown on moonjelly. However, after adding
the patch I proposed, "make check-world" passed! I was not expecting
that result; I supposed we still had lots of lurking assumptions of
traditional C overflow handling.
I'm not in any hurry to remove -fwrapv, because (a) this result doesn't
show that we have no such assumptions, only that they must be lurking
in darker, poorly-tested corners, and (b) I'm not aware of any reason
to think that removing -fwrapv would provide benefits worth taking any
risks for. But we may be closer to being able to do without that
switch than I thought.
regards, tom lane
Hello Tom,
> moonjelly just reported an interesting failure [1].
I noticed. I was planning to have a look at it, thanks for digging!
> It seems that with the latest bleeding-edge gcc, this code is
> misoptimized:
> else if (imax - imin < 0 || (imax - imin) + 1 < 0)
> {
> /* prevent int overflows in random functions */
> pg_log_error("random range is too large");
> return false;
> }
>
> such that the second if-test doesn't fire. Now, according to the C99
> spec this code is broken, because the compiler is allowed to assume
> that signed integer overflow doesn't happen,
Hmmm, so it is not possible to detect these with standard arithmetic as
done by this code. Note that the code was written in 2016, ISTM pre C99
Postgres. I'm unsure about what a C compiler could assume on the previous
standard wrt integer arithmetic.
> whereupon the second if-block is provably unreachable.
Indeed.
> The failure still represents a gcc bug, because we're using -fwrapv
> which should disable that assumption.
Ok, I'll report it.
I also see a good point with pgbench tests exercising edge cases.
> However, not all compilers have that switch, so it'd be better to code
> this in a spec-compliant way.
Ok.
> I suggest applying the attached in branches that have the required
> functions.
The latest gcc, recompiled yesterday, is still wrong, as shown by
moonjelly current status.
The proposed patch does fix the issue in pgbench TAP test. I'd suggest to
add unlikely() on all these conditions, as already done elsewhere. See
attached version.
I confirm that check-world passed with gcc head and its broken -fwrapv.
I also recompiled after removing manually -fwrapv: Make check, pgbench TAP
tests and check-world all passed. I'm not sure that edge case are well
enough tested in postgres to be sure that all is ok just from these runs
though.
--
Fabien.
Вложения
Fabien COELHO <coelho@cri.ensmp.fr> writes:
>> I suggest applying the attached in branches that have the required
>> functions.
> The proposed patch does fix the issue in pgbench TAP test. I'd suggest to
> add unlikely() on all these conditions, as already done elsewhere. See
> attached version.
Done that way, though I'm skeptical that it makes any measurable
difference.
> I also recompiled after removing manually -fwrapv: Make check, pgbench TAP
> tests and check-world all passed. I'm not sure that edge case are well
> enough tested in postgres to be sure that all is ok just from these runs
> though.
Yeah, I'm afraid that in most places it'd take a specifically-designed
test case to expose a problem, if there is one.
regards, tom lane
>> The failure still represents a gcc bug, because we're using -fwrapv which >> should disable that assumption. > > Ok, I'll report it. Done at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101254 -- Fabien.
Hello Tom, >>> The failure still represents a gcc bug, because we're using -fwrapv which >>> should disable that assumption. >> >> Ok, I'll report it. > > Done at https://gcc.gnu.org/bugzilla/show_bug.cgi?id=101254 Fixed at r12-1916-ga96d8d67d0073a7031c0712bc3fb7759417b2125 https://gcc.gnu.org/git/gitweb.cgi?p=gcc.git;h=a96d8d67d0073a7031c0712bc3fb7759417b2125 Just under 10 hours from the bug report… -- Fabien.
Hi, On 2021-06-27 16:21:46 -0400, Tom Lane wrote: > BTW, for grins I tried building today's HEAD without -fwrapv, using > gcc version 11.1.1 20210531 (Red Hat 11.1.1-3) (GCC) > which is the newest version I have at hand. Not very surprisingly, > that reproduced the failure shown on moonjelly. However, after adding > the patch I proposed, "make check-world" passed! I was not expecting > that result; I supposed we still had lots of lurking assumptions of > traditional C overflow handling. We did fix a lot of them a few years back... > I'm not in any hurry to remove -fwrapv, because (a) this result doesn't > show that we have no such assumptions, only that they must be lurking > in darker, poorly-tested corners, and (b) I'm not aware of any reason > to think that removing -fwrapv would provide benefits worth taking any > risks for. But we may be closer to being able to do without that > switch than I thought. Lack of failures after removing frwapv itself doesn't prove that much - very commonly the compiler won't optimize based on the improved knowledge about value range. Additionally we probably don't exercise all effected places in our tests. ubsan is able to catch all signed overflows. The last time I played around with that, tests still were hitting quite a few cases of overflows. But most not in particularly interesting places (e.g. cash_out, RIGHTMOST_ONE()) but also a few where it might be worth being careful about it in case a compiler disregards -fwrapv or doesn't implement it (e.g. _dorand48()). It might be worth setting up a bf animal with ubsan and enabled overflow checking... Greetings, Andres Freund