Discussion: BUG #17725: Segfault when seg_in() called with a large argument
The following bug has been logged on the website:
Bug reference: 17725
Logged by: Robins Tharakan
Email address: tharakan@gmail.com
PostgreSQL version: 15.1
Operating system: Ubuntu 20.04
Description:
Hi,
The following SQL Segfaults on master (tested on b3bb7d12af).
SQL: SELECT seg_in(numeric_out(round(31, 10000)))
Backtrace on ea5ae4cae6@REL_14_STABLE:
=====================================
#0 __strcpy_avx2 () at ../sysdeps/x86_64/multiarch/strcpy-avx2.S:578
#1 0x00007f31c421f4aa in restore (
result=0x55009893ace0 <error: Cannot access memory at address
0x55009893ace0>, val=31, n=-46) at seg.c:1009
#2 0x00007f31c421dab9 in seg_out (fcinfo=0x7ffe3ddff6c0) at seg.c:135
#3 0x000055d296a40aa9 in FunctionCall1Coll (flinfo=0x55d298735478,
collation=0, arg1=94362989160448) at fmgr.c:1138
#4 0x000055d296a42004 in OutputFunctionCall (flinfo=0x55d298735478,
val=94362989160448) at fmgr.c:1575
#5 0x000055d29634a8b4 in printtup (slot=0x55d2987344b8,
self=0x55d298936cc0)
at printtup.c:357
#6 0x000055d2966196c6 in ExecutePlan (estate=0x55d298733f80,
planstate=0x55d2987341b8, use_parallel_mode=false, operation=CMD_SELECT,
sendTuples=true, numberTuples=0, direction=ForwardScanDirection,
dest=0x55d298936cc0, execute_once=true) at execMain.c:1582
#7 0x000055d2966172fd in standard_ExecutorRun (queryDesc=0x55d2987289d0,
direction=ForwardScanDirection, count=0, execute_once=true)
at execMain.c:361
#8 0x00007f31dbea134d in pgss_ExecutorRun (queryDesc=0x55d2987289d0,
direction=ForwardScanDirection, count=0, execute_once=true)
at pg_stat_statements.c:1003
#9 0x000055d2966170f3 in ExecutorRun (queryDesc=0x55d2987289d0,
direction=ForwardScanDirection, count=0, execute_once=true)
at execMain.c:303
Backtrace Full excerpt:
======================
#0 __strcpy_avx2 () at ../sysdeps/x86_64/multiarch/strcpy-avx2.S:578
No locals.
#1 0x00007f31c421f4aa in restore (
result=0x55009893ace0 <error: Cannot access memory at address
0x55009893ace0>, val=31, n=-46) at seg.c:1009
buf = "00000000003e1\000\060\060\060\060\060\060\060\060\060\060"
p = 0x55d29893ace8 "e+01"
exp = 48
i = 17
dp = 11
sign = 0
#2 0x00007f31c421dab9 in seg_out (fcinfo=0x7ffe3ddff6c0) at seg.c:135
seg = 0x55d29872e800
result = 0x55d29893ace0 "3.100000e+01"
p = 0x55d29893ace0 "3.100000e+01"
#3 0x000055d296a40aa9 in FunctionCall1Coll (flinfo=0x55d298735478,
collation=0, arg1=94362989160448) at fmgr.c:1138
fcinfodata = {fcinfo = {flinfo = 0x55d298735478, context = 0x0,
resultinfo = 0x0, fncollation = 0, isnull = false, nargs = 1,
args = 0x7ffe3ddff6e0},
fcinfo_data = "xTs\230\322U", '\000' <repeats 23 times>,
"U\001\000\000\350r\230\322U\000\000\000m\223\230\322U\000"}
fcinfo = 0x7ffe3ddff6c0
result = 94362958816336
__func__ = "FunctionCall1Coll"
#4 0x000055d296a42004 in OutputFunctionCall (flinfo=0x55d298735478,
val=94362989160448) at fmgr.c:1575
No locals.
#5 0x000055d29634a8b4 in printtup (slot=0x55d2987344b8,
self=0x55d298936cc0)
at printtup.c:357
outputstr = 0x55d296882235 <check_stack_depth+13> "\204\300td\276"
thisState = 0x55d298735468
attr = 94362989160448
typeinfo = 0x55d2987343a0
myState = 0x55d298936cc0
oldcontext = 0x55d298733e60
buf = 0x55d298936d10
natts = 1
i = 0
Error Log:
=========
2022-12-20 02:44:43.728 UTC [633388] LOG: server process (PID 783919) was
terminated by signal 11: Segmentation fault
2022-12-20 02:44:43.728 UTC [633388] DETAIL: Failed process was running:
SELECT seg_in(numeric_out(round(31,1000000)));
2022-12-20 02:44:43.728 UTC [633388] LOG: terminating any other active
server processes
Thanks to SQLSmith / SQLReduce for helping with the find.
-
Robins Tharakan
Amazon Web Services
On Tue, Dec 20, 2022 at 4:28 PM PG Bug reporting form <noreply@postgresql.org> wrote:
> PostgreSQL version: 15.1
> The following SQL Segfaults on master (tested on b3bb7d12af).
> Backtrace on ea5ae4cae6@REL_14_STABLE:
> SQL: SELECT seg_in(numeric_out(round(31, 10000)))
> 2022-12-20 02:44:43.728 UTC [633388] DETAIL: Failed process was running:
> SELECT seg_in(numeric_out(round(31,1000000)));
Neither query shows the reported problem in my environment on master (as of today) or v14, so not sure
=# SELECT seg_in(numeric_out(round(31, 10000)));
seg_in
--------
3e1
(1 row)
=# SELECT seg_in(numeric_out(round(31,1000000)));
seg_in
--------
3e1
(1 row)
It's possibly relevant that this result is different from the "3.100000e+01" which was shown in your backtrace. Since a few details of this report don't agree with each other, I'm starting to wonder if some other relevant details got lost along the way.
--
John Naylor
EDB: http://www.enterprisedb.com
Hi John,
On Tue, 20 Dec 2022 at 20:44, John Naylor <john.naylor@enterprisedb.com> wrote:
> Neither query shows the reported problem in my environment on master (as of today) or v14, so not sure
> It's possibly relevant that this result is different from the "3.100000e+01" which was shown in your backtrace. Since a few details of this report don't agree with each other, I'm starting to wonder if some other relevant details got lost along the way.
Thanks for taking a look and you're possibly correct.
After trying a few combinations, I see that passing
CFLAGS="-Wuninitialized" (default for my test setup) causes this failure.
Removing the flag gives the result you mention, which is possibly why this
may not be easy to reproduce on a production system (unsure).
$ gcc --version
gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0
# How I trigger compilation
cd ${sourcepth} && git clean -xdf && \
  ./configure CFLAGS="-Wuninitialized" --prefix=${installpth} && \
  make -j`nproc` install ...
This is a recent crash on 69f75bf825@REL_12_STABLE
2022-12-20 10:24:53.361 UTC [3087004] LOG: server process (PID
3182365) was terminated by signal 11: Segmentation fault
2022-12-20 10:24:53.361 UTC [3087004] DETAIL: Failed process was
running: SELECT seg_in(numeric_out(round(31, 10000)));
2022-12-20 10:24:53.361 UTC [3087004] LOG: terminating any other
active server processes
2022-12-20 10:24:53.366 UTC [3087004] LOG: all server processes
terminated; reinitializing
I created this bug-report since I am able to reproduce this at will. But let
me know if this is uninteresting, or if I can provide any other detail to
help in triaging.
-
robins
Robins Tharakan <tharakan@gmail.com> writes:
> On Tue, 20 Dec 2022 at 20:44, John Naylor <john.naylor@enterprisedb.com> wrote:
>> Neither query shows the reported problem in my environment on master (as of today) or v14, so not sure
> After trying a few combinations, I see that passing
> CFLAGS="-Wuninitialized" (default for my test setup) causes this failure.
> Removing the flag gives the error you mention, and possibly why this
> may not be easy to reproduce on a production system (unsure).
I don't see a crash either, but I can't help observing that this
input leads to a "seg" struct with "-46" significant digits:
(gdb) p *seg
$3 = {lower = 31, upper = 31, l_sigd = -46 '\322', u_sigd = -46 '\322',
l_ext = 0 '\000', u_ext = 0 '\000'}
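One way a negative digit count can end up in a field like this is plain integer narrowing. The following is a minimal sketch, assuming an 8-bit signed field as the gdb output suggests. The helper names and the count 210 are hypothetical, chosen only because 210 - 256 = -46; the thread does not state the actual count computed by the parser.

```c
#include <stdint.h>

/* Hypothetical helper: store a digit count in an 8-bit signed field the
 * way an unchecked assignment would. Converting an out-of-range value is
 * implementation-defined; common compilers reduce it modulo 256. */
static int8_t store_unchecked(int digits)
{
    return (int8_t) digits;
}

/* Hypothetical clamped variant: limit the count to what the field holds. */
static int8_t store_clamped(int digits)
{
    if (digits < 0)
        digits = 0;
    if (digits > INT8_MAX)
        digits = INT8_MAX;
    return (int8_t) digits;
}
```

On a typical gcc build, store_unchecked(210) yields -46, while store_clamped(210) yields 127.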
So we're invoking sprintf with a fairly insane precision spec:
939 sprintf(result, "%.*e", n - 1, val);
(gdb) p n
$4 = -46
(gdb) p val
$5 = 31
POSIX says "a negative precision is taken as if the precision were
omitted", and our code seems to do that, but I wonder if this is
managing to overrun the output buffer on your platform.
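The POSIX rule quoted above can be checked in isolation. This is a small sketch, not the seg.c code: format_exp is a hypothetical wrapper, and it uses snprintf with an explicit buffer size rather than sprintf.

```c
#include <stdio.h>

/* Format val in %e notation with the given precision into out.
 * A negative precision passed via '*' is treated as if the precision
 * were omitted, so %e falls back to its default of 6 digits. */
static int format_exp(char *out, size_t outsz, int precision, double val)
{
    return snprintf(out, outsz, "%.*e", precision, val);
}
```

Calling format_exp with precision -47 (that is, n - 1 for n = -46) and val 31 produces "3.100000e+01", the same string that appears as "result" in the reported backtrace.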
IMO:
1. The seg grammar needs to constrain the result of significant_digits()
to something that will fit in the allocated "char" field width.
It looks like some code paths there have clamps, but not all.
2. Because we might already have stored "seg" values with bogus
sigd values, restore() had better clamp the "n" value it's given
to something sane. I see it clamps large positive values, but
it's not worrying about zero-or-negative.
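The clamp suggested in point 2 could look roughly like the sketch below. The function name is hypothetical, and using FLT_DIG as the upper bound and 1 as the lower bound is an assumption for illustration; the limits actually appropriate for seg.c may differ.

```c
#include <float.h>

/* Hypothetical helper: force a stored significant-digit count into a
 * sane range before it is used for formatting or buffer indexing.
 * Bounds are assumptions for this sketch: at least 1, at most FLT_DIG. */
static int clamp_sigd(int n)
{
    if (n < 1)
        return 1;
    if (n > FLT_DIG)
        return FLT_DIG;
    return n;
}
```

With this in place, a bogus stored value such as -46 would be treated as 1 rather than being passed through unchanged.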
regards, tom lane
I wrote:
> I don't see a crash either, but I can't help observing that this
> input leads to a "seg" struct with "-46" significant digits:
> ...
> So we're invoking sprintf with a fairly insane precision spec:
Actually, it looks like sprintf is not the problem. This is:
(gdb)
984 buf[10 + n] = '\0';
(gdb) p n
$9 = -46
So first off, we're stomping on something we shouldn't, and
secondly we're failing to nul-terminate buf[], which easily
explains your observed crash at the strcpy a little further
down. On most platforms strcpy would find a nul byte not
too much further on, which might prevent the worst sorts
of damage, but this is still very ugly.
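To make the arithmetic concrete: with n = -46, the index 10 + n is -36, so the store lands 36 bytes before the start of buf and the intended terminator is never written inside it. A minimal sketch of a bounds check follows; the helper name is hypothetical, and the buffer length passed in is whatever the caller allocated.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical check: would writing a terminator at buf[offset + n]
 * stay inside a buffer of buflen bytes? */
static bool terminator_in_bounds(size_t buflen, int offset, int n)
{
    long idx = (long) offset + n;   /* 10 + (-46) == -36 */

    return idx >= 0 && (size_t) idx < buflen;
}
```

For example, terminator_in_bounds(25, 10, -46) is false, while terminator_in_bounds(25, 10, 6) is true; the length 25 here is only an illustrative value.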
regards, tom lane