BUG #16673: Stack depth limit exceeded error while running sysbench TPC-C

Поиск
Список
Период
Сортировка
От PG Bug reporting form
Тема BUG #16673: Stack depth limit exceeded error while running sysbench TPC-C
Дата
Msg-id 16673-d278c604f8e34ec0@postgresql.org
обсуждение исходный текст
Ответы Re: BUG #16673: Stack depth limit exceeded error while running sysbench TPC-C  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-bugs
The following bug has been logged on the website:

Bug reference:      16673
Logged by:          David Geier
Email address:      david@swarm64.com
PostgreSQL version: 12.4
Operating system:   Ubuntu 18
Description:

Hi hackers,

I came across a "stack depth limit exceeded" error which I believe is a bug.
It occurs in parallel workers when running Percona's sysbench TPC-C
benchmark with a fairly large number of parallel workers and sysbench
threads on a beefy machine (e.g. two sockets with 32 cores each). I ran with
PostgreSQL 12.4. Here are the reproduction steps:

1. Apply PostgreSQL configuration attached to this bug report.

2. Install sysbench (>= 1.0.14) from Peronca repos
(https://www.percona.com/doc/percona-server/5.7/installation/apt_repo.html)

3. Install sysbench-tpcc by cloning
https://github.com/Percona-Lab/sysbench-tpcc.git

4. Checkout commit f9f581e7a2ee9b5e9ea8db621c07c03540d9fb79

5. Startup server and create a database called 'tpcc'

6. Run the prepare step: ./tpcc.lua --pgsql-db=tpcc --pgsql-user=postgres
--threads=64 --tables=1 --scale=100 --db-driver=pgsql --rand-seed=1
prepare

7. Set the parallel workers storage parameter of all tables in the 'tpcc'
database to 52.

ALTER TABLE customer1 SET (parallel_workers = 52);
ALTER TABLE district1 SET (parallel_workers = 52);
ALTER TABLE history1 SET (parallel_workers = 52);
ALTER TABLE item1 SET (parallel_workers = 52);
ALTER TABLE new_orders1 SET (parallel_workers = 52);
ALTER TABLE order_line1 SET (parallel_workers = 52);
ALTER TABLE orders1 SET (parallel_workers = 52);
ALTER TABLE stock1 SET (parallel_workers = 52);
ALTER TABLE warehouse1 SET (parallel_workers = 52);

8. Run the benchmark:  ./tpcc.lua --pgsql-db=tpcc --pgsql-user=postgres
--time=1800 --threads=64 --report-interval=1 --tables=1 --scale=100
--db-driver=pgsql --rand-seed=1 run

After some time (can be many seconds to minutes) the benchmark fails with
the following error:

FATAL: PQexec() failed: 7 stack depth limit exceeded
FATAL: failed query was: SELECT COUNT(DISTINCT (s_i_id))
                        FROM order_line1, stock1
                       WHERE ol_w_id = 82 
                         AND ol_d_id = 1
                         AND ol_o_id < 3495 
                         AND ol_o_id >= 3475
                         AND s_w_id= 82
                         AND s_i_id=ol_i_id 
                         AND s_quantity < 17

The corresponding EXPLAIN output for a similar query looks like:

                                                                  QUERY PLAN
                                                                  

-----------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2038.32..2038.33 rows=1 width=8) (actual
time=47.873..50.475 rows=1 loops=1)
   ->  Gather  (cost=757.14..2036.86 rows=583 width=4) (actual
time=13.292..50.434 rows=18 loops=1)
         Workers Planned: 52
         Workers Launched: 52
         ->  Nested Loop  (cost=257.14..1531.03 rows=11 width=4) (actual
time=0.080..0.168 rows=0 loops=52)
               ->  Parallel Bitmap Heap Scan on order_line1
(cost=128.67..1022.01 rows=1 width=4) (actual time=0.058..0.066 rows=4
loops=52)
                     Recheck Cond: ((ol_w_id = 64) AND (ol_d_id = 1) AND
(ol_o_id < 5707) AND (ol_o_id >= 5687))
                     ->  Bitmap Index Scan on order_line1_pkey
(cost=0.00..128.67 rows=7 width=0) (actual time=0.545..0.546 rows=213
loops=1)
                           Index Cond: ((ol_w_id = 64) AND (ol_d_id = 1) AND
(ol_o_id < 5707) AND (ol_o_id >= 5687))
               ->  Bitmap Heap Scan on stock1  (cost=128.47..31403.58
rows=83 width=4) (actual time=0.022..0.022 rows=0 loops=213)
                     Recheck Cond: ((s_w_id = 64) AND (s_i_id =
order_line1.ol_i_id))
                     Filter: (s_quantity < 10)
                     Rows Removed by Filter: 1
                     ->  Bitmap Index Scan on stock1_pkey
(cost=0.00..128.44 rows=250 width=0) (actual time=0.016..0.016 rows=1
loops=213)
                           Index Cond: ((s_w_id = 64) AND (s_i_id =
order_line1.ol_i_id))
 Planning Time: 0.664 ms
 Execution Time: 50.565 ms


The error occurs already with the default stack depth of 7 MiB but it occurs
quicker with a lower stack depth (e.g. 100 KiB). The reproduction steps are
as follows:

I also obtained a stack trace. Because gdb always hung up I ended up loading
an extension which uses the 'emit_log_hook' to sleep so that I could attach
a debugger when "stack depth limited exceeded" occurs. The stack trace is
also attached to this bug report.

Cheers,
David




PostgreSQL configuration:

autovacuum = off
min_parallel_table_scan_size = 0
cluster_name = '11/main'
datestyle = 'iso, mdy'
default_statistics_target = 2500
default_text_search_config = 'pg_catalog.english'
dynamic_shared_memory_type = posix
force_parallel_mode = False
lc_messages = 'en_US.UTF-8'
lc_monetary = 'en_US.UTF-8'
lc_numeric = 'en_US.UTF-8'
lc_time = 'en_US.UTF-8'
listen_addresses = '*'
log_line_prefix = '%t [%p]: [%l-1] user=%u,db=%d,app=%a,client=%h '
log_min_messages = WARNING
log_timezone = UTC
maintenance_work_mem = 12GB
max_stack_depth = 100
max_wal_senders = 0
max_wal_size = 12GB
port = 5432
#shared_preload_libraries = 'swarm64da.so'
ssl = False
synchronous_commit = False
timezone = UTC
unix_socket_directories = '/var/run/postgresql'
wal_level = minimal
max_connections = 2000
parallel_leader_participation = False
parallel_setup_cost = 500
parallel_tuple_cost = 0.01
random_page_cost = 128.0
seq_page_cost = 4.0
effective_io_concurrency = 128
enable_partition_pruning = True
enable_partitionwise_aggregate = True
enable_partitionwise_join = True
jit = true
max_worker_processes = 1024
max_parallel_workers = 1024
max_parallel_workers_per_gather = 52
shared_buffers = 48GB
temp_buffers = 2GB
work_mem = 2GB
effective_cache_size = 512GB


Stack trace:

#0  0x00007f63a79ccc41 in __GI___nanosleep
(requested_time=requested_time@entry=0x7ffd0382faf0,
remaining=remaining@entry=0x7ffd0382faf0) at
../sysdeps/unix/sysv/linux/nanosleep.c:28
#1  0x00007f639d78de89 in std::this_thread::sleep_for<long, std::ratio<1l,
1l> > (__rtime=..., __rtime=...) at /usr/include/c++/10/thread:405
#2  my_emit_log_hook (edata=<optimized out>) at
/opt/swarm64/dev/db/src/plugin/public_functions.cpp:167
#3  my_emit_log_hook (edata=<optimized out>) at
/opt/swarm64/dev/db/src/plugin/public_functions.cpp:158
#4  0x000055ff8d717c25 in EmitErrorReport () at
./build/../src/backend/utils/error/elog.c:1452
#5  0x000055ff8d54161b in StartBackgroundWorker () at
./build/../src/backend/postmaster/bgworker.c:778
#6  0x000055ff8d5507c5 in do_start_bgworker (rw=rw@entry=0x55ff8f1d4940) at
./build/../src/backend/postmaster/postmaster.c:5781
#7  0x000055ff8d55093c in maybe_start_bgworkers () at
./build/../src/backend/postmaster/postmaster.c:6007
#8  0x000055ff8d551299 in sigusr1_handler (postgres_signal_arg=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:5178
#9  <signal handler called>
#10 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#11 0x000055ff8d55135e in sigusr1_handler (postgres_signal_arg=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:5258
#12 <signal handler called>
#13 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#14 0x000055ff8d55135e in sigusr1_handler (postgres_signal_arg=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:5258
#15 <signal handler called>
#16 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#17 0x000055ff8d55135e in sigusr1_handler (postgres_signal_arg=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:5258
#18 <signal handler called>
#19 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#20 0x000055ff8d5511e9 in reaper (postgres_signal_arg=<optimized out>) at
./build/../src/backend/postmaster/postmaster.c:3198
#21 <signal handler called>
#22 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#23 0x000055ff8d55135e in sigusr1_handler (postgres_signal_arg=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:5258
#24 <signal handler called>
#25 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#26 0x000055ff8d55135e in sigusr1_handler (postgres_signal_arg=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:5258
#27 <signal handler called>
#28 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#29 0x000055ff8d55135e in sigusr1_handler (postgres_signal_arg=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:5258
#30 <signal handler called>
#31 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#32 0x000055ff8d5511e9 in reaper (postgres_signal_arg=<optimized out>) at
./build/../src/backend/postmaster/postmaster.c:3198
#33 <signal handler called>
#34 sigusr1_handler (postgres_signal_arg=10) at
./build/../src/backend/postmaster/postmaster.c:5097
#35 <signal handler called>
#36 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#37 0x000055ff8d55135e in sigusr1_handler (postgres_signal_arg=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:5258
#38 <signal handler called>
#39 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#40 0x000055ff8d55135e in sigusr1_handler (postgres_signal_arg=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:5258
#41 <signal handler called>
#42 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#43 0x000055ff8d55135e in sigusr1_handler (postgres_signal_arg=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:5258
#44 <signal handler called>
#45 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#46 0x000055ff8d55135e in sigusr1_handler (postgres_signal_arg=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:5258
#47 <signal handler called>
#48 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#49 0x000055ff8d55135e in sigusr1_handler (postgres_signal_arg=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:5258
#50 <signal handler called>
#51 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#52 0x000055ff8d55135e in sigusr1_handler (postgres_signal_arg=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:5258
#53 <signal handler called>
#54 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#55 0x000055ff8d55135e in sigusr1_handler (postgres_signal_arg=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:5258
#56 <signal handler called>
#57 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#58 0x000055ff8d55135e in sigusr1_handler (postgres_signal_arg=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:5258
#59 <signal handler called>
#60 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#61 0x000055ff8d55135e in sigusr1_handler (postgres_signal_arg=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:5258
--Type <RET> for more, q to quit, c to continue without paging--
#62 <signal handler called>
#63 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#64 0x000055ff8d55135e in sigusr1_handler (postgres_signal_arg=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:5258
#65 <signal handler called>
#66 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#67 0x000055ff8d55135e in sigusr1_handler (postgres_signal_arg=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:5258
#68 <signal handler called>
#69 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#70 0x000055ff8d5511e9 in reaper (postgres_signal_arg=<optimized out>) at
./build/../src/backend/postmaster/postmaster.c:3198
#71 <signal handler called>
#72 sigusr1_handler (postgres_signal_arg=10) at
./build/../src/backend/postmaster/postmaster.c:5097
#73 <signal handler called>
#74 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#75 0x000055ff8d55135e in sigusr1_handler (postgres_signal_arg=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:5258
#76 <signal handler called>
#77 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#78 0x000055ff8d55135e in sigusr1_handler (postgres_signal_arg=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:5258
#79 <signal handler called>
#80 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#81 0x000055ff8d55135e in sigusr1_handler (postgres_signal_arg=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:5258
#82 <signal handler called>
#83 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#84 0x000055ff8d5511e9 in reaper (postgres_signal_arg=<optimized out>) at
./build/../src/backend/postmaster/postmaster.c:3198
#85 <signal handler called>
#86 sigusr1_handler (postgres_signal_arg=10) at
./build/../src/backend/postmaster/postmaster.c:5097
#87 <signal handler called>
#88 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#89 0x000055ff8d55135e in sigusr1_handler (postgres_signal_arg=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:5258
#90 <signal handler called>
#91 0x00007f63a51be1fd in __GI___sigprocmask (how=how@entry=2,
set=0x55ff8dc74aa0 <UnBlockSig>, oset=oset@entry=0x0) at
../sysdeps/unix/sysv/linux/x86_64/sigprocmask.c:36
#92 0x000055ff8d55135e in sigusr1_handler (postgres_signal_arg=<optimized
out>) at ./build/../src/backend/postmaster/postmaster.c:5258
#93 <signal handler called>
#94 0x00007f63a52960f7 in __GI___select (nfds=nfds@entry=8,
readfds=readfds@entry=0x7ffd03848490, writefds=writefds@entry=0x0,
exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7ffd03848400)
    at ../sysdeps/unix/sysv/linux/select.c:41
#95 0x000055ff8d5519d9 in ServerLoop () at
./build/../src/backend/postmaster/postmaster.c:1668
#96 0x000055ff8d552efa in PostmasterMain (argc=5, argv=<optimized out>) at
./build/../src/backend/postmaster/postmaster.c:1377
#97 0x000055ff8d4a9aba in main (argc=5, argv=0x55ff8f171060) at
./build/../src/backend/main/main.c:228


В списке pgsql-bugs по дате отправления:

Предыдущее
От: "denni.pat"
Дата:
Сообщение: Re: BUG #16663: DROP INDEX did not free up disk space: idle connection hold file marked as deleted
Следующее
От: Tom Lane
Дата:
Сообщение: Re: BUG #16673: Stack depth limit exceeded error while running sysbench TPC-C