WARNING: pgstat wait timeout

Поиск
Список
Период
Сортировка
От Sergey E. Koposov
Тема WARNING: pgstat wait timeout
Дата
Msg-id alpine.LRH.2.00.1001210139500.15783@lnfm1.sai.msu.ru
обсуждение исходный текст
Ответы Re: WARNING: pgstat wait timeout  (Jaime Casanova <jcasanov@systemguards.com.ec>)
Список pgsql-hackers
Hello hackers,

I've recently hit the message "WARNING:  pgstat wait timeout" with PG 
8.4.2.
I saw some reports about that message in the -bugs mailing list
http://archives.postgresql.org/pgsql-bugs/2009-12/msg00175.php
http://archives.postgresql.org/pgsql-bugs/2009-07/msg00081.php
where the backtrace from the statisctic collector was requested.
Although I don't have any other bad sympthoms in the system, I still 
obtained a backtrace from the statistics collector process. Since I'm not 
100% sure that the message is really a bug, feel free to ignore. But 
if needed I have PG still running, so I can check something else if 
needed.

Here is the (rather innocent IMHO) backtrace of the statistic collector 
process:
(gdb) bt
#0  0x00007f31ddfc4b1f in poll () from /lib/libc.so.6
#1  0x00000000005bf7da in PgstatCollectorMain (argc=<value optimized out>,
# argv=<value optimized out>)    at pgstat.c:2718
#2  0x00000000005c0131 in pgstat_start () at pgstat.c:631
#3  0x00000000005c474d in reaper (postgres_signal_arg=<value optimized 
out>)
#at postmaster.c:2322
#4  <signal handler called>
#5  0x00007f31ddfc6c83 in select () from /lib/libc.so.6
#6  0x00000000005c20fc in ServerLoop () at postmaster.c:1347
#7  0x00000000005c34a7 in PostmasterMain (argc=3, argv=0x144fd20) at
#postmaster.c:1040
#8  0x000000000056cdc8 in main (argc=3, argv=0x144fd20) at main.c:188
(gdb) quit
The program is running.  Quit anyway (and detach it)? (y or n) y
Detaching from program: /opt/pgsql/bin/postgres, process 24677
-------------

Bt full:

(gdb) bt full 
#0  0x00007f31ddfc4b1f in poll () from /lib/libc.so.6
No symbol table info available. 
#1  0x00000000005bf7da in PgstatCollectorMain (argc=<value optimized out>, argv=<value optimized out>)    at
pgstat.c:2718       len = 64        msg = {msg_hdr = {m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, msg_dummy = {m_hdr
={      m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}}, msg_inquiry = {m_hdr = {m_type = PGSTAT_MTYPE_BGWRITER,
m_size= 64}, inquiry_time = 0}, msg_tabstat = {m_hdr = {m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64},    m_databaseid =
0,m_nentries = 0, m_xact_commit = 0, m_xact_rollback = 0, m_entry = {{t_id = 0, t_counts = {          t_numscans = 0,
t_tuples_returned= 0, t_tuples_fetched = 138, t_tuples_inserted = 138,          t_tuples_updated = 0, t_tuples_deleted
=0, t_tuples_hot_updated = 0, t_new_live_tuples = 0,          t_new_dead_tuples = 0, t_blocks_fetched = 2, t_blocks_hit
=2}}, {t_id = 2672, t_counts = {t_numscans = 1,          t_tuples_returned = 1, t_tuples_fetched = 1, t_tuples_inserted
=0, t_tuples_updated = 0,          t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0,
t_new_dead_tuples= 0,          t_blocks_fetched = 2, t_blocks_hit = 2}}, {t_id = 1259, t_counts = {t_numscans = 4,
   t_tuples_returned = 553, t_tuples_fetched = 0, t_tuples_inserted = 0, t_tuples_updated = 0,
t_tuples_deleted= 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0,          t_blocks_fetched
=24, t_blocks_hit = 24}}, {t_id = 2615, t_counts = {t_numscans = 0,          t_tuples_returned = 0, t_tuples_fetched =
0,t_tuples_inserted = 0, t_tuples_updated = 0,          t_tuples_deleted = 0, t_tuples_hot_updated = 0,
t_new_live_tuples= 0, t_new_dead_tuples = 0,          t_blocks_fetched = 1, t_blocks_hit = 1}}, {t_id = 2685, t_counts
={t_numscans = 1, t_tuples_returned = 1,          t_tuples_fetched = 1, t_tuples_inserted = 0, t_tuples_updated = 0,
t_tuples_deleted= 0,          t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0, t_blocks_fetched
=2,          t_blocks_hit = 2}}, {t_id = 1172815, t_counts = {t_numscans = 0, t_tuples_returned = 0,
t_tuples_fetched= 0, t_tuples_inserted = 500000, t_tuples_updated = 0, t_tuples_deleted = 0,
t_tuples_hot_updated= 0, t_new_live_tuples = 500000, t_new_dead_tuples = 0, t_blocks_fetched = 23077,
t_blocks_hit= 15381}}, {t_id = 1172684, t_counts = {t_numscans = 0, t_tuples_returned = 0,        msg = {msg_hdr =
{m_type= PGSTAT_MTYPE_BGWRITER, m_size = 64}, msg_dummy = {m_hdr = {      m_type = PGSTAT_MTYPE_BGWRITER, m_size =
64}},msg_inquiry = {m_hdr = {m_type = PGSTAT_MTYPE_BGWRITER,      m_size = 64}, inquiry_time = 0}, msg_tabstat = {m_hdr
={m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64},    m_databaseid = 0, m_nentries = 0, m_xact_commit = 0, m_xact_rollback
=0, m_entry = {{t_id = 0, t_counts = {          t_numscans = 0, t_tuples_returned = 0, t_tuples_fetched = 138,
t_tuples_inserted= 138,          t_tuples_updated = 0, t_tuples_deleted = 0, t_tuples_hot_updated = 0,
t_new_live_tuples= 0,          t_new_dead_tuples = 0, t_blocks_fetched = 2, t_blocks_hit = 2}}, {t_id = 2672, t_counts
={t_numscans = 1,          t_tuples_returned = 1, t_tuples_fetched = 1, t_tuples_inserted = 0, t_tuples_updated = 0,
     t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0,
t_blocks_fetched= 2, t_blocks_hit = 2}}, {t_id = 1259, t_counts = {t_numscans = 4,          t_tuples_returned = 553,
t_tuples_fetched= 0, t_tuples_inserted = 0, t_tuples_updated = 0,          t_tuples_deleted = 0, t_tuples_hot_updated =
0,t_new_live_tuples = 0, t_new_dead_tuples = 0,          t_blocks_fetched = 24, t_blocks_hit = 24}}, {t_id = 2615,
t_counts= {t_numscans = 0,          t_tuples_returned = 0, t_tuples_fetched = 0, t_tuples_inserted = 0,
t_tuples_updated= 0,          t_tuples_deleted = 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples
=0,          t_blocks_fetched = 1, t_blocks_hit = 1}}, {t_id = 2685, t_counts = {t_numscans = 1, t_tuples_returned = 1,
        t_tuples_fetched = 1, t_tuples_inserted = 0, t_tuples_updated = 0, t_tuples_deleted = 0,
t_tuples_hot_updated= 0, t_new_live_tuples = 0, t_new_dead_tuples = 0, t_blocks_fetched = 2,          t_blocks_hit =
2}},{t_id = 1172815, t_counts = {t_numscans = 0, t_tuples_returned = 0,          t_tuples_fetched = 0,
t_tuples_inserted= 500000, t_tuples_updated = 0, t_tuples_deleted = 0,          t_tuples_hot_updated = 0,
t_new_live_tuples= 500000, t_new_dead_tuples = 0, t_blocks_fetched = 23077,          t_blocks_hit = 15381}}, {t_id =
1172684,t_counts = {t_numscans = 0, t_tuples_returned = 0,          t_tuples_fetched = 0, t_tuples_inserted = 0,
t_tuples_updated= 0, t_tuples_deleted = 0,          t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples
=0, t_blocks_fetched = 1018,          t_blocks_hit = 1017}}, {t_id = 2678, t_counts = {t_numscans = 5,
t_tuples_returned= 1,          t_tuples_fetched = 1, t_tuples_inserted = 0, t_tuples_updated = 0, t_tuples_deleted = 0,
        t_tuples_hot_updated = 0, t_new_live_tuples = 0, t_new_dead_tuples = 0, t_blocks_fetched = 5,
t_blocks_hit= 5}}, {t_id = 2619, t_counts = {t_numscans = 0, t_tuples_returned = 0, t_tuples_fetched = 0,
t_tuples_inserted= 0, t_tuples_updated = 24, t_tuples_deleted = 0, t_tuples_hot_updated = 24,
t_new_live_tuples= 0, t_new_dead_tuples = 9, t_blocks_fetched = 72, t_blocks_hit = 72}}, {t_id = 2696,        t_counts
={t_numscans = 48, t_tuples_returned = 48, t_tuples_fetched = 48, t_tuples_inserted = 0,          t_tuples_updated = 0,
t_tuples_deleted= 0, t_tuples_hot_updated = 0, t_new_live_tuples = 0,          t_new_dead_tuples = 0, t_blocks_fetched
=96, t_blocks_hit = 96}}}}, msg_tabpurge = {m_hdr = {      m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid =
0,m_nentries = 0, m_tableid = {0, 0, 0, 0, 0, 0,      0, 0, 138, 0, 138, 0 <repeats 11 times>, 2, 0, 2, 0, 2672, 32767,
1,0, 1, 0, 1, 0 <repeats 13 times>, 2, 0,      2, 0, 1259, 0, 4, 0, 553, 0 <repeats 15 times>, 24, 0, 24, 0, 2615, 0
<repeats19 times>, 1, 0, 1, 0, 2685, 0,      1, 0, 1, 0, 1, 0 <repeats 13 times>, 2, 0, 2, 0, 1172815, 32561, 0, 0, 0,
0,0, 0, 500000, 0, 0, 0, 0, 0, 0,      0, 500000, 0, 0, 0, 23077, 0, 15381, 0, 1172684, 0 <repeats 19 times>, 1018, 0,
1017,0, 2678, 0, 5, 0, 1, 0,      1, 0 <repeats 13 times>, 5, 0, 5, 0, 2619, 3, 0, 0, 0, 0, 0, 0, 0, 0, 24, 0, 0, 0,
24,0, 0, 0, 9, 0, 72, 0, 
 
---Type <return> to continue, or q <return> to quit---      72, 0, 2696, 32767, 48, 0, 48, 0, 48, 0 <repeats 13
times>...}},msg_dropdb = {m_hdr = {      m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0},
msg_resetcounter= {m_hdr = {      m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0}, msg_autovacuum =
{m_hdr= {      m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0, m_start_time = 0}, msg_vacuum = {m_hdr =
{     m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0, m_tableoid = 0, m_analyze = 0 '\0',
m_autovacuum= 0 '\0', m_scanned_all = 0 '\0', m_vacuumtime = 0, m_tuples = 0}, msg_analyze = {m_hdr = {      m_type =
PGSTAT_MTYPE_BGWRITER,m_size = 64}, m_databaseid = 0, m_tableoid = 0, m_autovacuum = 0 '\0',    m_analyzetime = 0,
m_live_tuples= 0, m_dead_tuples = 0}, msg_bgwriter = {m_hdr = {      m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64},
m_timed_checkpoints= 0, m_requested_checkpoints = 0,    m_buf_written_checkpoints = 0, m_buf_written_clean = 0,
m_maxwritten_clean= 0, m_buf_written_backend = 138,    m_buf_alloc = 138}, msg_funcstat = {m_hdr = {m_type =
PGSTAT_MTYPE_BGWRITER,m_size = 64}, m_databaseid = 0,    m_nentries = 0, m_entry = {{f_id = 0, f_numcalls = 0, f_time =
0,f_time_self = 0}, {f_id = 138,        f_numcalls = 138, f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 0,
f_time= 0, f_time_self = 2}, {        f_id = 2, f_numcalls = 140733193390704, f_time = 1, f_time_self = 1}, {f_id = 1,
f_numcalls= 0, f_time = 0,        f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 2}, {f_id = 2,
f_numcalls= 1259,        f_time = 4, f_time_self = 553}, {f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 0}, {f_id
=0,        f_numcalls = 0, f_time = 0, f_time_self = 24}, {f_id = 24, f_numcalls = 2615, f_time = 0, f_time_self = 0},
{       f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time = 0,
f_time_self= 1}, {f_id = 1, f_numcalls = 2685, f_time = 1, f_time_self = 1}, {f_id = 1, f_numcalls = 0,        f_time =
0,f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 2}, {f_id = 2,        f_numcalls =
139848431297871,f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 500000, f_time = 0,        f_time_self = 0},
{f_id= 0, f_numcalls = 500000, f_time = 0, f_time_self = 23077}, {f_id = 15381,        f_numcalls = 1172684, f_time =
0,f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time = 0, f_time_self = 0},      {f_id = 0, f_numcalls = 0, f_time =
0,f_time_self = 1018}, {f_id = 1017, f_numcalls = 2678, f_time = 5,        f_time_self = 1}, {f_id = 1, f_numcalls = 0,
f_time= 0, f_time_self = 0}, {f_id = 0, f_numcalls = 0,        f_time = 0, f_time_self = 5}, {f_id = 5, f_numcalls =
12884904507,f_time = 0, f_time_self = 0}, {f_id = 0,        f_numcalls = 0, f_time = 24, f_time_self = 0}, {f_id = 24,
f_numcalls= 0, f_time = 9, f_time_self = 72}, {        f_id = 72, f_numcalls = 140733193390728, f_time = 48,
f_time_self= 48}, {f_id = 48, f_numcalls = 0,       f_time = 0, f_time_self = 0}, {f_id = 0, f_numcalls = 0, f_time =
0,f_time_self = 96}}}, msg_funcpurge = {    m_hdr = {m_type = PGSTAT_MTYPE_BGWRITER, m_size = 64}, m_databaseid = 0,
m_nentries= 0, m_functionid = {0, 0,      0, 0, 0, 0, 0, 0, 138, 0, 138, 0 <repeats 11 times>, 2, 0, 2, 0, 2672, 32767,
1,0, 1, 0, 1,      0 <repeats 13 times>, 2, 0, 2, 0, 1259, 0, 4, 0, 553, 0 <repeats 15 times>, 24, 0, 24, 0, 2615,
0<repeats 19 times>, 1, 0, 1, 0, 2685, 0, 1, 0, 1, 0, 1, 0 <repeats 13 times>, 2, 0, 2, 0, 1172815, 32561, 0,      0,
0,0, 0, 0, 500000, 0, 0, 0, 0, 0, 0, 0, 500000, 0, 0, 0, 23077, 0, 15381, 0, 1172684,      0 <repeats 19 times>, 1018,
0,1017, 0, 2678, 0, 5, 0, 1, 0, 1, 0 <repeats 13 times>, 5, 0, 5, 0, 2619, 3, 0,      0, 0, 0, 0, 0, 0, 0, 24, 0, 0, 0,
24,0, 0, 0, 9, 0, 72, 0, 72, 0, 2696, 32767, 48, 0, 48, 0, 48,      0 <repeats 13 times>...}}}        input_fd = {fd =
8,events = 9, revents = 0}        __func__ = "PgstatCollectorMain" 
 
#2  0x00000000005c0131 in pgstat_start () at pgstat.c:631        curtime = <value optimized out>        pgStatPid = 0
    __func__ = "pgstat_start" 
 
#3  0x00000000005c474d in reaper (postgres_signal_arg=<value optimized out>) at postmaster.c:2322 
---Type <return> to continue, or q <return> to quit---        save_errno = 17        pid = 24673        exitstatus = 0
     status = 0        __func__ = "reaper"
 
#4  <signal handler called>
#4  <signal handler called>
No symbol table info available.
#5  0x00007f31ddfc6c83 in select () from /lib/libc.so.6
No symbol table info available.
#6  0x00000000005c20fc in ServerLoop () at postmaster.c:1347        timeout = {tv_sec = 59, tv_usec = 916269}
rmask= {fds_bits = {120, 0 <repeats 15 times>}}        selres = <value optimized out>        readmask = {fds_bits =
{120,0 <repeats 15 times>}}        nSockets = 7        now = 64        last_touch_time = 1264024914        __func__ =
"ServerLoop"
#7  0x00000000005c34a7 in PostmasterMain (argc=3, argv=0x144fd20) at postmaster.c:1040        opt = <value optimized
out>       status = <value optimized out>        userDoption = <value optimized out>        __func__ =
"PostmasterMain"
#8  0x000000000056cdc8 in main (argc=3, argv=0x144fd20) at main.c:188

----------

The message occurs with PG 8.4.2 on a powerful 16-CPU, 12Gb RAM machine 
with RAID50 during heavy COPY/INSERTs/UPDATEs from one Java client (some 
of the INSERTS actually take up to 100 sec). The OS is Linux Debian 5.0 with 
2.6.30 kernel.

I see these WARNINGs appearing for the second time and as before I 
saw them staring after I did a manual vacuum of one of the tables in which 
the data is inserted. But I have no idea whether that's really a cause or 
just a coincidence.

Regards,    S.

*******************************************************************
Sergey E. Koposov, PhD
Institute for Astronomy, Cambridge, UK 
Tel: +44-1223-337-551
Web: http://lnfm1.sai.msu.ru/~math
E-mail: math@sai.msu.ru


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Streaming Replication and archiving
Следующее
От: Simon Riggs
Дата:
Сообщение: Re: Streaming replication, retrying from archive