Re: rebellious pg stats collector (reopened case)

Поиск
Список
Период
Сортировка
От Laszlo Nagy
Тема Re: rebellious pg stats collector (reopened case)
Дата
Msg-id 494F67D2.3090406@shopzeus.com
обсуждение исходный текст
Ответ на Re: rebellious pg stats collector (reopened case)  (Alvaro Herrera <alvherre@commandprompt.com>)
Ответы Re: rebellious pg stats collector (reopened case)  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-admin
OK, here is the new test.

shopzeus# ps l -p 39766 -p 39767 -p 39769
  UID   PID  PPID CPU PRI NI   VSZ   RSS MWCHAN STAT  TT       TIME COMMAND
   70 39766 78806   0  96  0 451960 423896 select Ss    ??    8:41.85
postgres: writer process    (postgres)
   70 39767 78806   0  96  0 451960  7184 select Ss    ??    0:58.75
postgres: wal writer process    (postgres)
   70 39769 78806   0  96  0 22400  7008 select Ss    ??   14:42.39
postgres: stats collector process    (postgres)


top -otime 10


last pid:  7984;  load averages:  1.38,  1.51,
1.49
up 16+18:18:22  05:02:20
170 processes: 3 running, 161 sleeping, 1 stopped, 5 zombie
CPU states:     % user,     % nice,     % system,     % interrupt,     %
idle
Mem: 959M Active, 5947M Inact, 562M Wired, 275M Cache, 214M Buf, 164M Free
Swap: 16G Total, 1088K Used, 16G Free

  PID USERNAME       THR PRI NICE   SIZE    RES STATE  C   TIME   WCPU
COMMAND
78806 pgsql            1  98    0   441M  6960K select 4  67:14  0.39%
postgres
39769 pgsql            1  96    0 22400K  7008K select 7  14:43  0.00%
postgres
39766 pgsql            1  96    0   441M   414M select 3   8:42  0.00%
postgres
10602 root             1  96    0   149M 18916K select 4   5:30  0.00% httpd
67535 root             1   4    0  3720K  1336K kqread 0   3:54  0.00%
dovecot
  795 root             1   4    0  8188K  1452K kqread 4   3:21  0.00%
master
95791 shopzeus         3 116   20 59892K 17440K select 3   2:22  0.00%
python
32397 postfix          1   4    0  8280K  1980K kqread 6   1:23  0.00% qmgr
67536 root             1   4    0  5996K  1448K kqread 4   1:07  0.00%
dovecot-auth
  525 root             1  96    0  4684K  1156K select 3   1:03  0.00%
syslogd

In other words, the stats collector is running since 14 hours.

shopzeus# date
Mon Dec 22 05:03:24 EST 2008
shopzeus# ls -l ~pgsql/data/global/pgstat.stat
-rw-------  1 pgsql  pgsql  232084 Dec 22 05:03
/usr/local/pgsql/data/global/pgstat.stat

Looks like it is updating the stats file continuously. Now I try to trace:


shopzeus# truss -p 39769
poll({7/POLLIN|POLLERR},1,2000)                  = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-X\^C\0\0\M^T\f\r\0\n"...,1000,0x0,NULL,0x0) =
984 (0x3d8)
poll({7/POLLIN|POLLERR},1,2000)                  = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-X\^C\0\0\M^T\f\r\0\n"...,1000,0x0,NULL,0x0) =
984 (0x3d8)
poll({7/POLLIN|POLLERR},1,2000)                  = 1 (0x1)
recvfrom(7,"\^A\0\0\0\M-X\^C\0\0\M^T\f\r\0\n"...,1000,0x0,NULL,0x0) =
984 (0x3d8)
poll({7/POLLIN|POLLERR},1,2000)                  = 1 (0x1)
recvfrom(7,"\^A\0\0\0X\^B\0\0\M^T\f\r\0\^F\0"...,1000,0x0,NULL,0x0) =
600 (0x258)
poll({7/POLLIN|POLLERR},1,2000)                  ERR#4 'Interrupted
system call'
SIGNAL 14 (SIGALRM)
sigreturn(0x7fffffffca70,0x0,0x7fffffffca70,0xffffffff809fb5c0,0x7fffffffffc0,0x7fffffffca60)
ERR#4 'Interrupted system call'
getppid(0x1,0x1,0x801901120,0x4,0x7fffffffd39f,0x6) = 11204 (0x2bc4)
open("global/pgstat.tmp",O_WRONLY|O_CREAT|O_TRUNC,0666) = 4 (0x4)
fstat(4,{mode=-rw------- ,inode=73223197,size=0,blksize=4096}) = 0 (0x0)
write(4,"\M^W\M-<\M-%\^A`\^B\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"l\0\0\0\0\0\0\0\M-T\M-s.\M-`\M-r"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\t\0\0\0\0\0\0\0\^C"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\^C\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0T\^T\^N\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\M-aKo\M-s\M-a\M-0A\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\^C\0\0\0\0\0\0\0\M-&"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0T\M-A\n\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\M-_o\M-T\M-u\M-p"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\^E\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\^A\0\0\0\0\0\0\^?\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0X\M-<"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\b\^D@f\0\0\0\0\0\0\0F \0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0T\M-2\f\b\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0;\0\0\0\0\0\0\0\^R\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\n\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\r\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0T_\n\0\0\0\0\0\0\M-*R\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"r\M^W\M-`\M-p\M-a\M-0A\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\M-a\M-p\M-a\M-0A\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0c\^Q\^A\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\^F\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0Z\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\^D\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,4096) = 4096 (0x1000)
write(4,"\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"...,2708) = 2708 (0xa94)
close(4)                     = 0 (0x0)
rename("global/pgstat.tmp","global/pgstat.stat") = 0 (0x0)
process exit, rval = 0

And the stats collector exited! It cannot be by accident. Last time I
checked, it had the same problem: tried to poll something. It was in an
inifinte loop, polling something but always got POLERR. Then because I
started to trace it, it exited with rval=0 or rval=1.

I do not understand why it exited after I started to trace it? Is this
some kind of bug?

Thanks,

   Laszlo


В списке pgsql-admin по дате отправления:

Предыдущее
От: Alvaro Herrera
Дата:
Сообщение: Re: reindexdb dying with SIGPIPE on 8.2.5
Следующее
От: Tom Lane
Дата:
Сообщение: Re: rebellious pg stats collector (reopened case)