Hello hackers,
I've recently hit the message "WARNING: pgstat wait timeout" with PG
8.4.2.
I saw some reports about that message in the -bugs mailing list
http://archives.postgresql.org/pgsql-bugs/2009-12/msg00175.php
http://archives.postgresql.org/pgsql-bugs/2009-07/msg00081.php
where the backtrace from the statisctic collector was requested.
Although I don't have any other bad sympthoms in the system, I still
obtained a backtrace from the statistics collector process. Since I'm not
100% sure that the message is really a bug, feel free to ignore. But
if needed I have PG still running, so I can check something else if
needed.
Here is the (rather innocent IMHO) backtrace of the statistic collector
process:
(gdb) bt
#0 0x00007f31ddfc4b1f in poll () from /lib/libc.so.6
#1 0x00000000005bf7da in PgstatCollectorMain (argc=<value optimized out>,
# argv=<value optimized out>) at pgstat.c:2718
#2 0x00000000005c0131 in pgstat_start () at pgstat.c:631
#3 0x00000000005c474d in reaper (postgres_signal_arg=<value optimized
out>)
#at postmaster.c:2322
#4 <signal handler called>
#5 0x00007f31ddfc6c83 in select () from /lib/libc.so.6
#6 0x00000000005c20fc in ServerLoop () at postmaster.c:1347
#7 0x00000000005c34a7 in PostmasterMain (argc=3, argv=0x144fd20) at
#postmaster.c:1040
#8 0x000000000056cdc8 in main (argc=3, argv=0x144fd20) at main.c:188
(gdb) quit
The program is running. Quit anyway (and detach it)? (y or n) y
Detaching from program: /opt/pgsql/bin/postgres, process 24677
-------------
Bt full:
(gdb) bt full
#0 0x00007f31ddfc4b1f in poll () from /lib/libc.so.6
No symbol table info available.
#1 0x00000000005bf7da in PgstatCollectorMain (argc=<value optimized out>, argv=<value optimized out>) at
pgstat.c:2718 len = 64 msg = {msg_hdr = {m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, msg_dummy = {m_hdr
={ m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}}, msg_inquiry = {m_hdr = {m_type = PGSTAT_MTYPE_BGWRITER,
m_size= 64}, inquiry_time = 0}, msg_tabstat = {m_hdr = {m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid =
0,m_nentries = 0, m_xact_commit = 0, m_xact_rollback = 0, m_entry = {{t_id = 0, t_counts = { t_numscans = 0,
t_tuples_returned= 0, t_tuples_fetched = 138, t_tuples_inserted = 138, t_tuples_updated = 0, t_tuples_deleted
=0, t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0, t_blocks_fetched = 2, t_blocks_hit
=2}}, {t_id = 2672, t_counts = {t_numscans = 1, t_tuples_returned = 1, t_tuples_fetched = 1, t_tuples_inserted
=0, t_tuples_updated = 0, t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0,
t_new_dead_tuples= 0, t_blocks_fetched = 2, t_blocks_hit = 2}}, {t_id = 1259, t_counts = {t_numscans = 4,
t_tuples_returned = 553, t_tuples_fetched = 0, t_tuples_inserted = 0, t_tuples_updated = 0,
t_tuples_deleted= 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0, t_blocks_fetched
=24, t_blocks_hit = 24}}, {t_id = 2615, t_counts = {t_numscans = 0, t_tuples_returned = 0, t_tuples_fetched =
0,t_tuples_inserted = 0, t_tuples_updated = 0, t_tuples_deleted = 0, t_tuples_hot_updated = 0,
t_new_live_tuples= 0, t_new_dead_tuples = 0, t_blocks_fetched = 1, t_blocks_hit = 1}}, {t_id = 2685, t_counts
={t_numscans = 1, t_tuples_returned = 1, t_tuples_fetched = 1, t_tuples_inserted = 0, t_tuples_updated = 0,
t_tuples_deleted= 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0, t_blocks_fetched
=2, t_blocks_hit = 2}}, {t_id = 1172815, t_counts = {t_numscans = 0, t_tuples_returned = 0,
t_tuples_fetched= 0, t_tuples_inserted = 500000, t_tuples_updated = 0, t_tuples_deleted = 0,
t_tuples_hot_updated= 0, t_new_live_tuples = 500000, t_new_dead_tuples = 0, t_blocks_fetched = 23077,
t_blocks_hit= 15381}}, {t_id = 1172684, t_counts = {t_numscans = 0, t_tuples_returned = 0, msg = {msg_hdr =
{m_type= PGSTAT_MTYPE_BGWRITER, m_size = 64}, msg_dummy = {m_hdr = { m_type = PGSTAT_MTYPE_BGWRITER, m_size =
64}},msg_inquiry = {m_hdr = {m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, inquiry_time = 0}, msg_tabstat = {m_hdr
={m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0, m_nentries = 0, m_xact_commit = 0, m_xact_rollback
=0, m_entry = {{t_id = 0, t_counts = { t_numscans = 0, t_tuples_returned = 0, t_tuples_fetched = 138,
t_tuples_inserted= 138, t_tuples_updated = 0, t_tuples_deleted = 0, t_tuples_hot_updated = 0,
t_new_live_tuples= 0, t_new_dead_tuples = 0, t_blocks_fetched = 2, t_blocks_hit = 2}}, {t_id = 2672, t_counts
={t_numscans = 1, t_tuples_returned = 1, t_tuples_fetched = 1, t_tuples_inserted = 0, t_tuples_updated = 0,
t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0,
t_blocks_fetched= 2, t_blocks_hit = 2}}, {t_id = 1259, t_counts = {t_numscans = 4, t_tuples_returned = 553,
t_tuples_fetched= 0, t_tuples_inserted = 0, t_tuples_updated = 0, t_tuples_deleted = 0, t_tuples_hot_updated =
0,t_new_live_tuples = 0, t_new_dead_tuples = 0, t_blocks_fetched = 24, t_blocks_hit = 24}}, {t_id = 2615,
t_counts= {t_numscans = 0, t_tuples_returned = 0, t_tuples_fetched = 0, t_tuples_inserted = 0,
t_tuples_updated= 0, t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples
=0, t_blocks_fetched = 1, t_blocks_hit = 1}}, {t_id = 2685, t_counts = {t_numscans = 1, t_tuples_returned = 1,
t_tuples_fetched = 1, t_tuples_inserted = 0, t_tuples_updated = 0, t_tuples_deleted = 0,
t_tuples_hot_updated= 0, t_new_live_tuples = 0, t_new_dead_tuples = 0, t_blocks_fetched = 2, t_blocks_hit =
2}},{t_id = 1172815, t_counts = {t_numscans = 0, t_tuples_returned = 0, t_tuples_fetched = 0,
t_tuples_inserted= 500000, t_tuples_updated = 0, t_tuples_deleted = 0, t_tuples_hot_updated = 0,
t_new_live_tuples= 500000, t_new_dead_tuples = 0, t_blocks_fetched = 23077, t_blocks_hit = 15381}}, {t_id =
1172684,t_counts = {t_numscans = 0, t_tuples_returned = 0, t_tuples_fetched = 0, t_tuples_inserted = 0,
t_tuples_updated= 0, t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples
=0, t_blocks_fetched = 1018, t_blocks_hit = 1017}}, {t_id = 2678, t_counts = {t_numscans = 5,
t_tuples_returned= 1, t_tuples_fetched = 1, t_tuples_inserted = 0, t_tuples_updated = 0, t_tuples_deleted = 0,
t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0, t_blocks_fetched = 5,
t_blocks_hit= 5}}, {t_id = 2619, t_counts = {t_numscans = 0, t_tuples_returned = 0, t_tuples_fetched = 0,
t_tuples_inserted= 0, t_tuples_updated = 24, t_tuples_deleted = 0, t_tuples_hot_updated = 24,
t_new_live_tuples= 0, t_new_dead_tuples = 9, t_blocks_fetched = 72, t_blocks_hit = 72}}, {t_id = 2696, t_counts
={t_numscans = 48, t_tuples_returned = 48, t_tuples_fetched = 48, t_tuples_inserted = 0, t_tuples_updated = 0,
t_tuples_deleted= 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0, t_blocks_fetched
=96, t_blocks_hit = 96}}}}, msg_tabpurge = {m_hdr = { m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid =
0,m_nentries = 0, m_tableid = {0, 0, 0, 0, 0, 0, 0, 0, 138, 0, 138, 0 <repeats 11 times>, 2, 0, 2, 0, 2672, 32767,
1,0, 1, 0, 1, 0 <repeats 13 times>, 2, 0, 2, 0, 1259, 0, 4, 0, 553, 0 <repeats 15 times>, 24, 0, 24, 0, 2615, 0
<repeats19 times>, 1, 0, 1, 0, 2685, 0, 1, 0, 1, 0, 1, 0 <repeats 13 times>, 2, 0, 2, 0, 1172815, 32561, 0, 0, 0,
0,0, 0, 500000, 0, 0, 0, 0, 0, 0, 0, 500000, 0, 0, 0, 23077, 0, 15381, 0, 1172684, 0 <repeats 19 times>, 1018, 0,
1017,0, 2678, 0, 5, 0, 1, 0, 1, 0 <repeats 13 times>, 5, 0, 5, 0, 2619, 3, 0, 0, 0, 0, 0, 0, 0, 0, 24, 0, 0, 0,
24,0, 0, 0, 9, 0, 72, 0,
---Type <return> to continue, or q <return> to quit--- 72, 0, 2696, 32767, 48, 0, 48, 0, 48, 0 <repeats 13
times>...}},msg_dropdb = {m_hdr = { m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0},
msg_resetcounter= {m_hdr = { m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0}, msg_autovacuum =
{m_hdr= { m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0, m_start_time = 0}, msg_vacuum = {m_hdr =
{ m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0, m_tableoid = 0, m_analyze = 0 '\0',
m_autovacuum= 0 '\0', m_scanned_all = 0 '\0', m_vacuumtime = 0, m_tuples = 0}, msg_analyze = {m_hdr = { m_type =
PGSTAT_MTYPE_BGWRITER,m_size = 64}, m_databaseid = 0, m_tableoid = 0, m_autovacuum = 0 '\0', m_analyzetime = 0,
m_live_tuples= 0, m_dead_tuples = 0}, msg_bgwriter = {m_hdr = { m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64},
m_timed_checkpoints= 0, m_requested_checkpoints = 0, m_buf_written_checkpoints = 0, m_buf_written_clean = 0,
m_maxwritten_clean= 0, m_buf_written_backend = 138, m_buf_alloc = 138}, msg_funcstat = {m_hdr = {m_type =
PGSTAT_MTYPE_BGWRITER,m_size = 64}, m_databaseid = 0, m_nentries = 0, m_entry = {{f_id = 0, f_numcalls = 0, f_time =
0,f_time_self = 0}, {f_id = 138, f_numcalls = 138, f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 0,
f_time= 0, f_time_self = 2}, { f_id = 2, f_numcalls = 140733193390704, f_time = 1, f_time_self = 1}, {f_id = 1,
f_numcalls= 0, f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 2}, {f_id = 2,
f_numcalls= 1259, f_time = 4, f_time_self = 553}, {f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 0}, {f_id
=0, f_numcalls = 0, f_time = 0, f_time_self = 24}, {f_id = 24, f_numcalls = 2615, f_time = 0, f_time_self = 0},
{ f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time = 0,
f_time_self= 1}, {f_id = 1, f_numcalls = 2685, f_time = 1, f_time_self = 1}, {f_id = 1, f_numcalls = 0, f_time =
0,f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 2}, {f_id = 2, f_numcalls =
139848431297871,f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 500000, f_time = 0, f_time_self = 0},
{f_id= 0, f_numcalls = 500000, f_time = 0, f_time_self = 23077}, {f_id = 15381, f_numcalls = 1172684, f_time =
0,f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time =
0,f_time_self = 1018}, {f_id = 1017, f_numcalls = 2678, f_time = 5, f_time_self = 1}, {f_id = 1, f_numcalls = 0,
f_time= 0, f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 5}, {f_id = 5, f_numcalls =
12884904507,f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time = 24, f_time_self = 0}, {f_id = 24,
f_numcalls= 0, f_time = 9, f_time_self = 72}, { f_id = 72, f_numcalls = 140733193390728, f_time = 48,
f_time_self= 48}, {f_id = 48, f_numcalls = 0, f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time =
0,f_time_self = 96}}}, msg_funcpurge = { m_hdr = {m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0,
m_nentries= 0, m_functionid = {0, 0, 0, 0, 0, 0, 0, 0, 138, 0, 138, 0 <repeats 11 times>, 2, 0, 2, 0, 2672, 32767,
1,0, 1, 0, 1, 0 <repeats 13 times>, 2, 0, 2, 0, 1259, 0, 4, 0, 553, 0 <repeats 15 times>, 24, 0, 24, 0, 2615,
0<repeats 19 times>, 1, 0, 1, 0, 2685, 0, 1, 0, 1, 0, 1, 0 <repeats 13 times>, 2, 0, 2, 0, 1172815, 32561, 0, 0,
0,0, 0, 0, 500000, 0, 0, 0, 0, 0, 0, 0, 500000, 0, 0, 0, 23077, 0, 15381, 0, 1172684, 0 <repeats 19 times>, 1018,
0,1017, 0, 2678, 0, 5, 0, 1, 0, 1, 0 <repeats 13 times>, 5, 0, 5, 0, 2619, 3, 0, 0, 0, 0, 0, 0, 0, 0, 24, 0, 0, 0,
24,0, 0, 0, 9, 0, 72, 0, 72, 0, 2696, 32767, 48, 0, 48, 0, 48, 0 <repeats 13 times>...}}} input_fd = {fd =
8,events = 9, revents = 0} __func__ = "PgstatCollectorMain"
#2 0x00000000005c0131 in pgstat_start () at pgstat.c:631 curtime = <value optimized out> pgStatPid = 0
__func__ = "pgstat_start"
#3 0x00000000005c474d in reaper (postgres_signal_arg=<value optimized out>) at postmaster.c:2322
---Type <return> to continue, or q <return> to quit--- save_errno = 17 pid = 24673 exitstatus = 0
status = 0 __func__ = "reaper"
#4 <signal handler called>
#4 <signal handler called>
No symbol table info available.
#5 0x00007f31ddfc6c83 in select () from /lib/libc.so.6
No symbol table info available.
#6 0x00000000005c20fc in ServerLoop () at postmaster.c:1347 timeout = {tv_sec = 59, tv_usec = 916269}
rmask= {fds_bits = {120, 0 <repeats 15 times>}} selres = <value optimized out> readmask = {fds_bits =
{120,0 <repeats 15 times>}} nSockets = 7 now = 64 last_touch_time = 1264024914 __func__ =
"ServerLoop"
#7 0x00000000005c34a7 in PostmasterMain (argc=3, argv=0x144fd20) at postmaster.c:1040 opt = <value optimized
out> status = <value optimized out> userDoption = <value optimized out> __func__ =
"PostmasterMain"
#8 0x000000000056cdc8 in main (argc=3, argv=0x144fd20) at main.c:188
----------
The message occurs with PG 8.4.2 on a powerful 16-CPU, 12Gb RAM machine
with RAID50 during heavy COPY/INSERTs/UPDATEs from one Java client (some
of the INSERTS actually take up to 100 sec). The OS is Linux Debian 5.0 with
2.6.30 kernel.
I see these WARNINGs appearing for the second time and as before I
saw them staring after I did a manual vacuum of one of the tables in which
the data is inserted. But I have no idea whether that's really a cause or
just a coincidence.
Regards, S.
*******************************************************************
Sergey E. Koposov, PhD
Institute for Astronomy, Cambridge, UK
Tel: +44-1223-337-551
Web: http://lnfm1.sai.msu.ru/~math
E-mail: math@sai.msu.ru