Re: sblock state on FreeBSD 6.1
From | Jim C. Nasby |
---|---|
Subject | Re: sblock state on FreeBSD 6.1 |
Date | |
Msg-id | 20060510201423.GR99570@pervasive.com |
In reply to | sblock state on FreeBSD 6.1 ("Jim C. Nasby" <jnasby@pervasive.com>) |
Responses | Re: sblock state on FreeBSD 6.1 (Alvaro Herrera <alvherre@commandprompt.com>) |
List | pgsql-hackers |
We tried reproducing this on a backup server. We haven't been able to wedge the system into a state where there are tons of sblock processes and nothing's getting done, but we were able to get some processes into sblock and get stack traces:

#0 0x000000080135bd2c in recvfrom () from /lib/libc.so.6
#1 0x00000000004f9898 in secure_read ()
#2 0x00000000004fed7b in TouchSocketFile ()
#3 0x00000000004fee27 in pq_getbyte ()
#4 0x000000000055febf in PostgresMain ()
#5 0x000000000053a487 in ClosePostmasterPorts ()
#6 0x000000000053bab7 in PostmasterMain ()
#7 0x0000000000500436 in main ()

#0 0x000000080137638c in sendto () from /lib/libc.so.6
#1 0x0000000000535fb5 in pgstat_report_activity ()
#2 0x000000000055fe81 in PostgresMain ()
#3 0x000000000053a487 in ClosePostmasterPorts ()
#4 0x000000000053bab7 in PostmasterMain ()
#5 0x0000000000500436 in main ()

#0 0x000000080137638c in sendto () from /lib/libc.so.6
#1 0x00000000004f954c in secure_write ()
#2 0x00000000004ff295 in pq_getmessage ()
#3 0x00000000004ff480 in pq_flush ()
#4 0x000000000055c59a in ReadyForQuery ()
#5 0x000000000055fe8c in PostgresMain ()
#6 0x000000000053a487 in ClosePostmasterPorts ()
#7 0x000000000053bab7 in PostmasterMain ()
#8 0x0000000000500436 in main ()

It may or may not be important that in the test environment we're not seeing any 'statistics buffer is full' errors.

One interesting point: Tom thought that sblock probably couldn't be happening on the client socket, since once that connection is established no other processes will be vying for time on it. But I'm wondering whether TouchSocketFile() could be throwing a wrench into the works. The first trace shows that it can put the process into sblock, so I'm wondering whether under certain circumstances that could end up running away.

BTW, one interesting tidbit out of this is that this dual Opteron machine is handling 2000 transactions per second while we're trying to reproduce the problem. Granted, these are almost entirely read-only transactions, but still...

On Tue, May 02, 2006 at 07:38:56PM -0500, Jim C. Nasby wrote:
> Just experienced a server that was spending over 50% of CPU time in the
> system, apparently dealing with postmasters that were in the sblock
> state. Looking at the FreeBSD source, this indicates that the process is
> waiting for a lock on a socket. During this time the machine was doing
> nearly 200k context switches a second.
>
> At the same time, the server was also producing 'statistics buffer is
> full' errors.
>
> Has anyone seen this before? I suspect that the stats buffer errors are
> a symptom and not the cause of the problem, but unfortunately I wasn't
> able to get a stack trace to verify that theory.
>
> The machine is a dual Opteron 250 with 8G of memory, running 8.1.3.
> While this was going on there were between 10 and 250 backends running
> at once, based on vmstat.
>
> Any ideas what areas of the code could be locking a socket?
> Theoretically it shouldn't be the stats collector, and the site is using
> pgpool as a connection pool, so this shouldn't be due to trying to
> connect to backends at a furious rate.

-- 
Jim C. Nasby, Sr. Engineering Consultant jnasby@pervasive.com
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461