Re: sblock state on FreeBSD 6.1

From: Jim C. Nasby
Subject: Re: sblock state on FreeBSD 6.1
Msg-id: 20060510201423.GR99570@pervasive.com
In reply to: sblock state on FreeBSD 6.1  ("Jim C. Nasby" <jnasby@pervasive.com>)
Replies: Re: sblock state on FreeBSD 6.1  (Alvaro Herrera <alvherre@commandprompt.com>)
List: pgsql-hackers

We tried reproducing this on a backup server. We haven't been able to
wedge the system into a state where there are tons of processes in
sblock and nothing is getting done, but we were able to get some
processes into sblock and get stack traces:

#0  0x000000080135bd2c in recvfrom () from /lib/libc.so.6
#1  0x00000000004f9898 in secure_read ()
#2  0x00000000004fed7b in TouchSocketFile ()
#3  0x00000000004fee27 in pq_getbyte ()
#4  0x000000000055febf in PostgresMain ()
#5  0x000000000053a487 in ClosePostmasterPorts ()
#6  0x000000000053bab7 in PostmasterMain ()
#7  0x0000000000500436 in main ()

#0  0x000000080137638c in sendto () from /lib/libc.so.6
#1  0x0000000000535fb5 in pgstat_report_activity ()
#2  0x000000000055fe81 in PostgresMain ()
#3  0x000000000053a487 in ClosePostmasterPorts ()
#4  0x000000000053bab7 in PostmasterMain ()
#5  0x0000000000500436 in main ()

#0  0x000000080137638c in sendto () from /lib/libc.so.6
#1  0x00000000004f954c in secure_write ()
#2  0x00000000004ff295 in pq_getmessage ()
#3  0x00000000004ff480 in pq_flush ()
#4  0x000000000055c59a in ReadyForQuery ()
#5  0x000000000055fe8c in PostgresMain ()
#6  0x000000000053a487 in ClosePostmasterPorts ()
#7  0x000000000053bab7 in PostmasterMain ()
#8  0x0000000000500436 in main ()
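
For anyone trying to catch backends in this state, here is a rough sketch
of how one might count processes per wait channel before attaching a
debugger. It is not what we ran. It assumes the text comes from something
like `ps -axo pid,wchan,command`, with a header row and the wait channel
in the second column; the function name and the sample rows are made up
for illustration.

```python
from collections import Counter


def count_wait_channels(ps_output: str) -> Counter:
    """Count processes per wait channel.

    Assumes whitespace-separated columns with the wait channel second,
    as in the output of `ps -axo pid,wchan,command`, and a header row.
    """
    counts = Counter()
    lines = ps_output.strip().splitlines()
    for line in lines[1:]:  # skip the header row
        fields = line.split(None, 2)  # pid, wchan, rest of the command line
        if len(fields) < 2:
            continue  # ignore malformed or empty rows
        counts[fields[1]] += 1
    return counts


# Hypothetical sample rows, not captured from the real server.
SAMPLE = """\
  PID WCHAN  COMMAND
  101 sblock postgres: app db 10.0.0.5(4123) idle
  102 select postgres: stats collector process
  103 sblock postgres: app db 10.0.0.5(4124) SELECT
"""

if __name__ == "__main__":
    for wchan, n in count_wait_channels(SAMPLE).most_common():
        print(wchan, n)
```

Sampling this every second or so would show whether the number of
processes in sblock is climbing or just flickering.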

It may or may not be important that in the test environment we're not
seeing any 'statistics buffer is full' errors.

One interesting point: Tom thought that sblock probably couldn't be
happening on the client socket, since once that's established there
won't be any processes vying for time on it. But I'm wondering if
TouchSocketFile() could be throwing a wrench into the works? The first
trace shows that it can put the process into sblock, so I'm wondering
if under certain circumstances that could end up running away.

BTW, one interesting tidbit out of this is that this dual Opteron
machine is handling 2000 transactions per second while we're trying to
reproduce the problem. Granted, these are almost entirely read-only
transactions, but still...

On Tue, May 02, 2006 at 07:38:56PM -0500, Jim C. Nasby wrote:
> Just experienced a server that was spending over 50% of CPU time in the
> system, apparently dealing with postmasters that were in the sblock
> state. Looking at the FreeBSD source, this indicates that the process is
> waiting for a lock on a socket. During this time the machine was doing
> nearly 200k context switches a second.
> 
> At the same time, the server was also producing 'statistics buffer is
> full' errors.
> 
> Has anyone seen this before? I suspect that the stats buffer errors are
> a symptom and not the cause of the problem, but unfortunately I wasn't
> able to get a stack trace to verify that theory.
> 
> The machine is a dual Opteron 250 with 8G of memory, running 8.1.3.
> While this was going on there were between 10 and 250 backends running
> at once, based on vmstat.
> 
> Any ideas what areas of the code could be locking a socket?
> Theoretically it shouldn't be the stats collector, and the site is using
> pgpool as a connection pool, so this shouldn't be due to trying to
> connect to backends at a furious rate.
> -- 
> Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
> Pervasive Software      http://pervasive.com    work: 512-231-6117
> vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

-- 
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

