Discussion: Reliance on undefined behaviour in << operator
Hi all
Our implementation of << is a direct wrapper around the C operator. It
does not check the right-hand side's value.

Datum
int8shl(PG_FUNCTION_ARGS)
{
    int64       arg1 = PG_GETARG_INT64(0);
    int32       arg2 = PG_GETARG_INT32(1);

    PG_RETURN_INT64(arg1 << arg2);
}

This means that an operation like:
1::bigint << 65
directly relies on the compiler and platforms' handling of the
undefined shift. On x64 intel gcc linux it does a rotation but that's
not AFAIK guaranteed by anything, and we should probably not be
relying on this or exposing it at the user level.
Pg returns:

test=> SELECT BIGINT '1' << 66;
 ?column?
----------
        4
(1 row)

A test program:

#include "stdio.h"
int main(int argc, char * argv[])
{
    printf("Result is %ld", 1l << 66);
    return 0;
}

returns zero when the compiler constant-folds, but when done at runtime:

#include <stdio.h>
#include <stdlib.h>
int main(int argc, char * argv[])
{
    const char * num = "66";
    printf("Result is %ld", 1l << atoi(num));
    return 0;
}

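
The 4 that PostgreSQL returns above is consistent with only the low six bits of the shift count being used (66 mod 64 = 2), which is how the x86-64 shift instructions treat a 64-bit operand. For an operand of 1 that is indistinguishable from a rotation, but the two differ once high bits are set. A minimal sketch of the difference, with hypothetical helper names and written so that every shift is well-defined C:

```c
#include <stdint.h>

/*
 * Hypothetical helper: left shift with the count reduced modulo 64.
 * The count is always in 0..63 and the operand is unsigned, so the
 * shift itself is well-defined.
 */
uint64_t
masked_shl(uint64_t value, int count)
{
    return value << ((unsigned) count & 63u);
}

/* Hypothetical helper: a true 64-bit left rotation, for comparison. */
uint64_t
rotate_left(uint64_t value, int count)
{
    unsigned    n = (unsigned) count & 63u;

    if (n == 0)
        return value;           /* avoid an undefined shift by 64 below */
    return (value << n) | (value >> (64 - n));
}
```

Both helpers return 4 for a value of 1 and a count of 66. For a value with only the top bit set and a count of 1, the masked shift gives 0 while the rotation gives 1.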
IMO we should specify the behaviour in this case. Then issue a WARNING
that gets promoted to an ERROR in a few versions.
Consideration of << with a negative right-operand, and of
out-of-bounds >>, is probably also needed.
Thoughts?
--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
Hello, I don't think many people have used the shift operators
with a too-large or negative shift amount while relying on the
undefined behavior.
But if an explicit definition is required, I would prefer that a
shift by a too-large amount simply return zero, and that a
left shift by a negative amount perform a right
shift. In addition, no error or warning would be needed.
Like this,

Datum
int8shl(PG_FUNCTION_ARGS)
{
    int64       arg1 = PG_GETARG_INT64(0);
    int32       arg2 = PG_GETARG_INT32(1);

    if (arg2 > 63 || arg2 < -63)
        PG_RETURN_INT64(0L);
    if (arg2 < 0)
        PG_RETURN_INT64(arg1 >> (-arg2));

    PG_RETURN_INT64(arg1 << arg2);
}

The obvious problem with this is the lack of compatibility with
existing behavior :(
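
For experimenting with these semantics outside the backend, here is a standalone sketch. The function name is hypothetical, and it differs from the snippet above in one respect: the left shift is done on an unsigned value, because left-shifting a negative signed value is itself undefined in C.

```c
#include <stdint.h>

/*
 * Hypothetical standalone version of the proposed semantics:
 *   - a count outside -63..63 yields zero,
 *   - a negative count shifts right instead,
 *   - anything else is an ordinary left shift.
 */
int64_t
shl_defined(int64_t value, int32_t count)
{
    if (count > 63 || count < -63)
        return 0;
    if (count < 0)
        return value >> (-count);   /* implementation-defined if value < 0 */

    /*
     * Shift in the unsigned domain; converting back to int64_t assumes
     * the usual two's-complement behaviour.
     */
    return (int64_t) ((uint64_t) value << count);
}
```

With this definition a value of 1 shifted by 66 gives 0 rather than 4, which is exactly the incompatibility with the current result.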
Thoughts? Opinions?
regards,
At Wed, 16 Sep 2015 15:16:27 +0800, Craig Ringer <craig@2ndquadrant.com> wrote in
<CAMsr+YE+0KJuOJfbB2nLVfU+14R50Yi90e_8DewLV9jX+ro1zg@mail.gmail.com>
> Hi all
>
> Our implementation of << is a direct wrapper around the C operator. It
> does not check the right-hand side's value.
>
>
> Datum
> int8shl(PG_FUNCTION_ARGS)
> {
> int64 arg1 = PG_GETARG_INT64(0);
> int32 arg2 = PG_GETARG_INT32(1);
>
> PG_RETURN_INT64(arg1 << arg2);
> }
>
> This means that an operation like:
>
> 1::bigint << 65
>
> directly relies on the compiler and platforms' handling of the
> undefined shift. On x64 intel gcc linux it does a rotation but that's
> not AFAIK guaranteed by anything, and we should probably not be
> relying on this or exposing it at the user level.
>
>
>
> Pg returns:
>
> test=> SELECT BIGINT '1' << 66;
> ?column?
> ----------
> 4
> (1 row)
>
> A test program:
>
> #include "stdio.h"
> int main(int argc, char * argv[])
> {
> printf("Result is %ld", 1l << 66);
> return 0;
> }
>
> returns zero when the compiler constant-folds, but when done at runtime:
>
> #include <stdio.h>
> #include <stdlib.h>
> int main(int argc, char * argv[])
> {
> const char * num = "66";
> printf("Result is %ld", 1l << atoi(num));
> return 0;
> }
>
>
> IMO we should specify the behaviour in this case. Then issue a WARNING
> that gets promoted to an ERROR in a few versions.
>
> Consideration of << with a negative right-operand, and of
> out-of-bounds >>, is probably also needed.
>
> Thoughts?
--
Kyotaro Horiguchi
NTT Open Source Software Center
On Wed, Sep 16, 2015 at 3:16 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> Our implementation of << is a direct wrapper around the C operator. It
> does not check the right-hand side's value.
>
> Datum
> int8shl(PG_FUNCTION_ARGS)
> {
> int64 arg1 = PG_GETARG_INT64(0);
> int32 arg2 = PG_GETARG_INT32(1);
>
> PG_RETURN_INT64(arg1 << arg2);
> }
>
> This means that an operation like:
>
> 1::bigint << 65
>
> directly relies on the compiler and platforms' handling of the
> undefined shift. On x64 intel gcc linux it does a rotation but that's
> not AFAIK guaranteed by anything, and we should probably not be
> relying on this or exposing it at the user level.
I agree.
> Pg returns:
>
> test=> SELECT BIGINT '1' << 66;
> ?column?
> ----------
> 4
> (1 row)
>
> A test program:
>
> #include "stdio.h"
> int main(int argc, char * argv[])
> {
> printf("Result is %ld", 1l << 66);
> return 0;
> }
>
> returns zero when the compiler constant-folds, but when done at runtime:
>
> #include <stdio.h>
> #include <stdlib.h>
> int main(int argc, char * argv[])
> {
> const char * num = "66";
> printf("Result is %ld", 1l << atoi(num));
> return 0;
> }
>
>
> IMO we should specify the behaviour in this case. Then issue a WARNING
> that gets promoted to an ERROR in a few versions.
I disagree. Such warnings are prone to be really annoying, e.g. by
generating massive log spam. I'd say that we should either make a
large shift return 0 - which seems like the intuitively right behavior
to me: if you shift in N zeros where N is greater than the word
size, then you should end up with all zeroes - or throw an error right
away. I'd vote for the former.
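
For comparison, the second option (throw an error right away) could be sketched as a range check that refuses the shift. This is only an illustration with a hypothetical function name; in the backend, the false return would correspond to raising an error instead.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical checked shift: returns false, leaving *result untouched,
 * when the count is outside 0..63; otherwise stores the shifted value
 * and returns true.
 */
bool
shl_checked(int64_t value, int32_t count, int64_t *result)
{
    if (count < 0 || count > 63)
        return false;

    /* Shift as unsigned so that no signed-shift undefined behaviour occurs. */
    *result = (int64_t) ((uint64_t) value << count);
    return true;
}
```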
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Sep 16, 2015 at 3:16 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>> Our implementation of << is a direct wrapper around the C operator. It
>> does not check the right-hand side's value.
>> ... On x64 intel gcc linux it does a rotation but that's
>> not AFAIK guaranteed by anything, and we should probably not be
>> relying on this or exposing it at the user level.
> I agree.
As far as I'm concerned, what those operators mean is "whatever your
compiler makes them mean". This is hardly the only place where we expose
platform-dependent behavior --- see also locale dependencies, timezones,
floating point, yadda yadda --- and I do not find it the most compelling
place to start reversing that general approach.
regards, tom lane
On Wed, Sep 16, 2015 at 3:57 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Wed, Sep 16, 2015 at 3:16 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
>>> Our implementation of << is a direct wrapper around the C operator. It
>>> does not check the right-hand side's value.
>>> ... On x64 intel gcc linux it does a rotation but that's
>>> not AFAIK guaranteed by anything, and we should probably not be
>>> relying on this or exposing it at the user level.
>
>> I agree.
>
> As far as I'm concerned, what those operators mean is "whatever your
> compiler makes them mean". This is hardly the only place where we expose
> platform-dependent behavior --- see also locale dependencies, timezones,
> floating point, yadda yadda --- and I do not find it the most compelling
> place to start reversing that general approach.

No, I don't agree, in this case. You could say that int32 + int32 is
platform-dependent behavior too, when it overflows. But we don't say
that. This case seems more like an overflow condition than it does
like platform-dependent behavior that we should just pass through.

Also, I think it's worth noting that, AFAICT, our users %@#! HATE the
places where we expose platform-dependent behavior. That's why people
keep proposing ICU for collations, and cursing their fate when Red Hat
rolls out a glibc fix that changes some collation's ordering thus
leaving all their indexes "corrupted".

Of course, we're not in a position to eliminate all of those platform
dependencies, because it would require us to integrate with - or
develop - platform-dependent libraries for all of those things. And
we don't really have the bandwidth for that, or at least it's not
likely the best use of our time. But all things being equal, I don't
think the fact that we have platform dependencies that are hard to
eliminate means we should keep around the ones that are easy to
eliminate.

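
To make the int32 + int32 comparison concrete, here is a minimal sketch of an addition that detects overflow before it can happen instead of passing the platform's result through. The function name is hypothetical and this is not PostgreSQL's actual code, only an illustration of the kind of check meant.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical overflow-checked addition: returns false, leaving *sum
 * untouched, if a + b does not fit in int32_t. The test is done before
 * adding, so no signed overflow ever occurs.
 */
bool
add_int32_checked(int32_t a, int32_t b, int32_t *sum)
{
    if ((b > 0 && a > INT32_MAX - b) ||
        (b < 0 && a < INT32_MIN - b))
        return false;

    *sum = a + b;
    return true;
}
```

An out-of-range shift count can be rejected with the same sort of up-front test.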
--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
On 2015-09-16 15:57:04 -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Wed, Sep 16, 2015 at 3:16 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> >> Our implementation of << is a direct wrapper around the C operator. It
> >> does not check the right-hand side's value.
> >> ... On x64 intel gcc linux it does a rotation but that's
> >> not AFAIK guaranteed by anything, and we should probably not be
> >> relying on this or exposing it at the user level.
>
> > I agree.
>
> As far as I'm concerned, what those operators mean is "whatever your
> compiler makes them mean".

According to C that's undefined behaviour. So in the extreme sense that
could mean that the instruction could trigger a SIGBUS or something.

> This is hardly the only place where we expose
> platform-dependent behavior --- see also locale dependencies, timezones,
> floating point, yadda yadda --- and I do not find it the most compelling
> place to start reversing that general approach.

But in other places we do overflow checks, so I don't think that'd be a
reversal of a general approach.

Greetings,

Andres Freund
On Wed, Sep 16, 2015 at 03:57:04PM -0400, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > On Wed, Sep 16, 2015 at 3:16 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> >> Our implementation of << is a direct wrapper around the C operator. It
> >> does not check the right-hand side's value.
> >> ... On x64 intel gcc linux it does a rotation but that's
> >> not AFAIK guaranteed by anything, and we should probably not be
> >> relying on this or exposing it at the user level.
>
> > I agree.
>
> As far as I'm concerned, what those operators mean is "whatever your
> compiler makes them mean". This is hardly the only place where we expose
> platform-dependent behavior --- see also locale dependencies, timezones,
> floating point, yadda yadda --- and I do not find it the most compelling
> place to start reversing that general approach.
>
> regards, tom lane
>

+1

I tend to agree. Unless the behavior is mandated by the SQL standard, I
have always expected the behavior of those apps to follow that defined
by the compiler.

Regards,
Ken