Обсуждение: BUG #17372: Altering statistics during concurrent drop can lead to a server crash

Поиск
Список
Период
Сортировка

BUG #17372: Altering statistics during concurrent drop can lead to a server crash

От
PG Bug reporting form
Дата:
The following bug has been logged on the website:

Bug reference:      17372
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 14.1
Operating system:   Ubuntu 20.04
Description:

The following script (based on the stats_ext test):
psql -c "CREATE TABLE ab1 (a INTEGER, b INTEGER, c INTEGER);"
for i in `seq 100`; do
echo "iteration $i"
( { for n in `seq 20`; do echo "CREATE STATISTICS ab1_a_b_stats ON a, b FROM
ab1; DROP STATISTICS ab1_a_b_stats;"; done } | psql ) >psql1.log 2>&1 &
( { for n in `seq 20`; do echo "ALTER STATISTICS ab1_a_b_stats SET
STATISTICS -1;"; done } | psql ) >psql2.log 2>&1 &

wait
sleep 2
coredumpctl --no-pager && break
done

causes the server crash with the following stacktrace:
Core was generated by `postgres: law regression [local] ALTER STATISTICS
                        '.
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x000055c5f56ff405 in heap_deform_tuple (tuple=0x0,
tupleDesc=0x55c5f6c96a40, values=0x55c5f6c4a1a0, 
    isnull=0x55c5f6c45e08) at heaptuple.c:1252
1252            HeapTupleHeader tup = tuple->t_data;
(gdb) bt
#0  0x000055c5f56ff405 in heap_deform_tuple (tuple=0x0,
tupleDesc=0x55c5f6c96a40, values=0x55c5f6c4a1a0, 
    isnull=0x55c5f6c45e08) at heaptuple.c:1252
#1  0x000055c5f56ff127 in heap_modify_tuple (tuple=0x0,
tupleDesc=0x55c5f6c96a40, replValues=0x7ffe4d9f8ee0, 
    replIsnull=0x7ffe4d9f8ece, doReplace=0x7ffe4d9f8ed7) at
heaptuple.c:1139
#2  0x000055c5f597293a in AlterStatistics (stmt=0x55c5f6c22360) at
statscmds.c:695
#3  0x000055c5f5c679ff in ProcessUtilitySlow (pstate=0x55c5f6c431b0,
pstmt=0x55c5f6c226b0, 
    queryString=0x55c5f6c218a0 "ALTER STATISTICS ab1_a_b_stats SET
STATISTICS -1;", context=PROCESS_UTILITY_TOPLEVEL, 
    params=0x0, queryEnv=0x0, dest=0x55c5f6c227a0, qc=0x7ffe4d9f9700) at
utility.c:1883
#4  0x000055c5f5c65c13 in standard_ProcessUtility (pstmt=0x55c5f6c226b0, 
    queryString=0x55c5f6c218a0 "ALTER STATISTICS ab1_a_b_stats SET
STATISTICS -1;", readOnlyTree=false, 
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x55c5f6c227a0, qc=0x7ffe4d9f9700)
    at utility.c:1066
#5  0x000055c5f5c64b83 in ProcessUtility (pstmt=0x55c5f6c226b0, 
    queryString=0x55c5f6c218a0 "ALTER STATISTICS ab1_a_b_stats SET
STATISTICS -1;", readOnlyTree=false, 
    context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0,
dest=0x55c5f6c227a0, qc=0x7ffe4d9f9700) at utility.c:527
#6  0x000055c5f5c63422 in PortalRunUtility (portal=0x55c5f6c84210,
pstmt=0x55c5f6c226b0, isTopLevel=true, 
    setHoldSnapshot=false, dest=0x55c5f6c227a0, qc=0x7ffe4d9f9700) at
pquery.c:1155
#7  0x000055c5f5c636ad in PortalRunMulti (portal=0x55c5f6c84210,
isTopLevel=true, setHoldSnapshot=false, 
    dest=0x55c5f6c227a0, altdest=0x55c5f6c227a0, qc=0x7ffe4d9f9700) at
pquery.c:1312
#8  0x000055c5f5c62adc in PortalRun (portal=0x55c5f6c84210,
count=9223372036854775807, isTopLevel=true, run_once=true, 
    dest=0x55c5f6c227a0, altdest=0x55c5f6c227a0, qc=0x7ffe4d9f9700) at
pquery.c:788
#9  0x000055c5f5c5b918 in exec_simple_query (
    query_string=0x55c5f6c218a0 "ALTER STATISTICS ab1_a_b_stats SET
STATISTICS -1;") at postgres.c:1214
#10 0x000055c5f5c60807 in PostgresMain (argc=1, argv=0x7ffe4d9f9920,
dbname=0x55c5f6c4d9f8 "regression", 
    username=0x55c5f6c4d9d8 "law") at postgres.c:4486
#11 0x000055c5f5b84fd6 in BackendRun (port=0x55c5f6c46300) at
postmaster.c:4530
#12 0x000055c5f5b84831 in BackendStartup (port=0x55c5f6c46300) at
postmaster.c:4252
#13 0x000055c5f5b80626 in ServerLoop () at postmaster.c:1745
#14 0x000055c5f5b7fd83 in PostmasterMain (argc=3, argv=0x55c5f6c1b910) at
postmaster.c:1417
#15 0x000055c5f5a6f2e0 in main (argc=3, argv=0x55c5f6c1b910) at main.c:209


Re: BUG #17372: Altering statistics during concurrent drop can lead to a server crash

От
Tomas Vondra
Дата:
On 1/20/22 05:00, PG Bug reporting form wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      17372
> Logged by:          Alexander Lakhin
> Email address:      exclusion@gmail.com
> PostgreSQL version: 14.1
> Operating system:   Ubuntu 20.04
> Description:        
> 
> The following script (based on the stats_ext test):
> psql -c "CREATE TABLE ab1 (a INTEGER, b INTEGER, c INTEGER);"
> for i in `seq 100`; do
> echo "iteration $i"
> ( { for n in `seq 20`; do echo "CREATE STATISTICS ab1_a_b_stats ON a, b FROM
> ab1; DROP STATISTICS ab1_a_b_stats;"; done } | psql ) >psql1.log 2>&1 &
> ( { for n in `seq 20`; do echo "ALTER STATISTICS ab1_a_b_stats SET
> STATISTICS -1;"; done } | psql ) >psql2.log 2>&1 &
> 
> wait
> sleep 2
> coredumpctl --no-pager && break
> done
> 

Thanks for the report - I've reproduced it using your script (I had to
add a short wait before the syscache lookup). The issue is fairly
simple, we never check we actually found a tuple in the syscache, i.e.
there's no HeapTupleIsValid() call. We check the OID earlier, but the
tuple may have disappeared since then.

IMO the usual "cache lookup failed" error that we throw in other places
in similar cases is good enough. We already fail with other errors
(tuple concurrently updated/deleted) here.

Patch attached.


regards

-- 
Tomas Vondra
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Вложения

Re: BUG #17372: Altering statistics during concurrent drop can lead to a server crash

От
Alexander Lakhin
Дата:
Hello Tomas,
20.01.2022 20:57, Tomas Vondra wrote:
> Thanks for the report - I've reproduced it using your script (I had to
> add a short wait before the syscache lookup). The issue is fairly
> simple, we never check we actually found a tuple in the syscache, i.e.
> there's no HeapTupleIsValid() call. We check the OID earlier, but the
> tuple may have disappeared since then.
>
> IMO the usual "cache lookup failed" error that we throw in other places
> in similar cases is good enough. We already fail with other errors
> (tuple concurrently updated/deleted) here.
Yes, the rough, but simple search:
grep -Porz
'.*=\s*SearchSysCache1.*\n(?!(.*|.*\n.*|.*\n.*\n.*|.*\n.*\n.*\n.*)HeapTupleIsValid)'
./
finds only this place.
Your fix works for me.

Thanks!

Best regards,
Alexander