Обсуждение: database crash during pgbench run
Hi,
I have a very strong machine with 64GB of ram and 19 cpu but it seems that when I'm running the next benchmark test with pg_bench the database is crashing :
createdb -U postgres bench
pgbench -i -s 50 -U postgres -d bench
pgbench -U postgres -d bench -c 10 -t 10000
output :
client 8 receiving
FATAL: terminating connection due to administrator command
client 8 sending UPDATE pgbench_accounts SET abalance = abalance + -1542 WHERE aid = 1142155;
client 8 could not send UPDATE pgbench_accounts SET abalance = abalance + :delta WHERE aid = :aid;
invalid socket: server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
transaction type: <builtin: TPC-B (sort of)>
scaling factor: 50
query mode: simple
number of clients: 10
number of threads: 1
number of transactions per client: 10000
number of transactions actually processed: 89241/100000
latency average = 27.944 ms
tps = 357.864437 (including connections establishing)
tps = 357.871594 (excluding connections establishing)
it crashes after some time and not immediately.
output from logs :
2018-12-10 19:11:12 IST 505 LOG: automatic analyze of table "bench.public.pgbench_branches" system usage: CPU 0.00s/0.00u sec elapsed 0.01 sec
2018-12-10 19:11:12 IST 505 LOG: automatic analyze of table "bench.public.pgbench_history" system usage: CPU 0.00s/0.05u sec elapsed 0.10 sec
2018-12-10 19:11:14 IST bench 25045 LOG: duration: 1451.819 ms statement: UPDATE pgbench_branches SET bbalance = bbalance + -4059 WHERE bid = 14;
2018-12-10 19:11:40 IST bench 25049 LOG: duration: 1039.710 ms statement: UPDATE pgbench_tellers SET tbalance = tbalance + 3596 WHERE tid = 403;
2018-12-10 19:11:56 IST 23647 LOG: received fast shutdown request
2018-12-10 19:11:56 IST 23647 LOG: aborting any active transactions
2018-12-10 19:11:56 IST bench 25051 FATAL: terminating connection due to administrator command
2018-12-10 19:11:56 IST bench 25049 FATAL: terminating connection due to administrator command
2018-12-10 19:11:56 IST sadas 27765 FATAL: terminating connection due to administrator command
2018-12-10 19:11:56 IST bench 25050 FATAL: terminating connection due to administrator command
2018-12-10 19:11:56 IST 23654 LOG: autovacuum launcher shutting down
2018-12-10 19:11:56 IST sadas 24821 FATAL: terminating connection due to administrator command
2018-12-10 19:11:56 IST hadr 24814 FATAL: terminating connection due to administrator command
2018-12-10 19:11:56 IST bench 25047 FATAL: terminating connection due to administrator command
2018-12-10 19:11:56 IST bench 25048 FATAL: terminating connection due to administrator command
2018-12-10 19:11:56 IST hadr 24065 FATAL: terminating connection due to administrator command
2018-12-10 19:11:56 IST postgres 24818 FATAL: terminating connection due to administrator command
2018-12-10 19:11:56 IST postgres 24819 FATAL: terminating connection due to administrator command
2018-12-10 19:11:56 IST hadr 24812 FATAL: terminating connection due to administrator command
2018-12-10 19:11:56 IST postgres 24817 FATAL: terminating connection due to administrator command
2018-12-10 19:11:56 IST bench 25046 FATAL: terminating connection due to administrator command
2018-12-10 19:11:56 IST hadr 24813 FATAL: terminating connection due to administrator command
2018-12-10 19:11:56 IST postgres 24816 FATAL: terminating connection due to administrator command
2018-12-10 19:11:56 IST 23651 LOG: shutting down
2018-12-10 19:11:56 IST 23651 LOG: checkpoint starting: shutdown immediate
2018-12-10 19:11:59 IST 23651 LOG: checkpoint complete: wrote 69557 buffers (4.1%); 0 transaction log file(s) added, 0 removed, 0 recycled; write=2.800 s, sync=0.045 s, total=2.877 s; sync files=23, longest=0.045 s, average=0.001 s; distance=573364 kB, estimate=573364 kB
2018-12-10 19:11:59 IST 23647 LOG: database system is shut down
2018-12-10 19:12:11 IST 2641 LOG: database system was shut down at 2018-12-10 19:11:59 IST
2018-12-10 19:12:11 IST 2641 LOG: MultiXact member wraparound protections are now enabled
2018-12-10 19:12:11 IST 2638 LOG: database system is ready to accept connections
2018-12-10 19:12:11 IST 2645 LOG: autovacuum launcher started
2018-12-10 19:12:17 IST 2692 LOG: automatic vacuum of table "bench.public.pgbench_tellers": index scans: 0
pages: 0 removed, 13 remain, 0 skipped due to pins, 0 skipped frozen
some conf parameters :
listen_addresses = '*'
maintenance_work_mem = 128MB
work_mem = 53MB
shared_buffers = 13411MB
effective_cache_size = 32278MB
max_wal_size = 1440MB
wal_buffers = 16MB
checkpoint_completion_target = 0.9
standard_conforming_strings = off
max_locks_per_transaction = 5000
max_connections = 1200
checkpoint_timeout = 30min
random_page_cost = 2.0
any idea what can cause it ?
Mariel Cherkassky <mariel.cherkassky@gmail.com> writes: > 2018-12-10 19:11:56 IST 23647 LOG: received fast shutdown request > any idea what can cause it ? Something sent SIGINT to the postmaster. regards, tom lane
> > 2018-12-10 19:11:56 IST 23647 LOG: received fast shutdown request > > any idea what can cause it ? > >Something sent SIGINT to the postmaster. My money is on the OoM (Out of Memory) killer. The standard PDGD install on CentOS should disable that, but I'm not surewhat OS you're on or how PostgreSQL was installed. Greg Clough. ________________________________ This e-mail, including accompanying communications and attachments, is strictly confidential and only for the intended recipient.Any retention, use or disclosure not expressly authorised by IHSMarkit is prohibited. This email is subject toall waivers and other terms at the following link: https://ihsmarkit.com/Legal/EmailDisclaimer.html Please visit www.ihsmarkit.com/about/contact-us.html for contact information on our offices worldwide.
Greg Clough <Greg.Clough@ihsmarkit.com> writes: >>> 2018-12-10 19:11:56 IST 23647 LOG: received fast shutdown request >>> any idea what can cause it ? >> Something sent SIGINT to the postmaster. > My money is on the OoM (Out of Memory) killer. That usually uses SIGKILL. If I had to guess, I'd wonder whether the postmaster was manually started, and if so whether it was properly dissociated from the user's terminal (with nohup or the like). If it wasn't, then a control-C typed at the terminal would SIGINT the postmaster as well as whatever it was meant to terminate. regards, tom lane
On Tue, Dec 11, 2018 at 10:01 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Greg Clough <Greg.Clough@ihsmarkit.com> writes: > >>> 2018-12-10 19:11:56 IST 23647 LOG: received fast shutdown request > >>> any idea what can cause it ? > > >> Something sent SIGINT to the postmaster. > > > My money is on the OoM (Out of Memory) killer. > > That usually uses SIGKILL. If I had to guess, I'd wonder whether the > postmaster was manually started, and if so whether it was properly > dissociated from the user's terminal (with nohup or the like). > If it wasn't, then a control-C typed at the terminal would SIGINT the > postmaster as well as whatever it was meant to terminate. Yeah. To add to this, pgbench runs are extremely unlikely to cause the kind of memory consumption issues that would trigger an OOM. This is definitely not a database crash, just some kind of administrative problem. Some things that might be helpful to help figure this out: *) What o/s *) how was the database installed *) how exactly did the database start *) are we looking at something exotic here (cloud managed postgres, exotic storage, etc) merlin