Discussion: Win32 max connections bug (causing crashes)
Hello,

I had a customer call in today. They are running Win2003 with 22 GB of RAM (that may be a mistype on their end; it may be 32 GB).

They cranked up their PostgreSQL max_connections to 500.

When PostgreSQL goes above 400 connections, it dies, and I don't mean a slow-crawl type of death. I mean a death where all connections close and the database does a rollback and restart.

I was able to reproduce this with a simple pgbench run on my own win32 environment. I wasn't able to go above 300 with mine.

Any thoughts?

Joshua D. Drake

--
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Joshua D. Drake wrote:
> Hello,
>
> I had a customer call in today they are running Win2003 with 22 gig of
> ram (that may be a mistype on their end, it may be 32gigs of ram).
>
> They cranked up their postgresql max_connections to 500.
>
> When PostgreSQL hits above 400, it dies and I don't mean a slow crawl
> type death. A death where all connections close and the database does a
> rollback and restart.
>
> I was able to reproduce with a simple pgbench on my own win32 environment.
>
> I wasn't able to go above 300 with mine.

Further on this, with Debug 5:

Client:

DEBUG: name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 19299/1/0, nestlvl: 1, children: <>
DEBUG: CommitTransaction
DEBUG: name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 19299/1/0, nestlvl: 1, children: <>
DEBUG: StartTransactionCommand
DEBUG: StartTransaction
DEBUG: name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 19300/1/0, nestlvl: 1, children: <>
DEBUG: ProcessUtility
DEBUG: CommitTransactionCommand
DEBUG: CommitTransaction
DEBUG: name: unnamed; blockState: STARTED; state: INPROGR, xid/subid/cid: 19300/1/0, nestlvl: 1, children: <>
Connection to database 'bench' failed.
server closed the connection unexpectedly
This probably means the server terminated abnormally
before or while processing the request.
jd@scratch:/usr/local/pgsql/bin$

Server to follow in next message.

Joshua D. Drake

> Any thoughts?
>
> Joshua D. Drake
Server:

Event Type: Information
Event Source: PostgreSQL
Event Category: None
Event ID: 0
Date: 8/9/2006
Time: 8:36:52 PM
User: N/A
Computer: DAD
Description: 2006-08-09 20:36:52 DEBUG: InitPostgres

Event Type: Information
Event Source: PostgreSQL
Event Category: None
Event ID: 0
Date: 8/9/2006
Time: 8:36:52 PM
User: N/A
Computer: DAD
Description: 2006-08-09 20:36:52 DEBUG: StartTransaction

Event Type: Information
Event Source: PostgreSQL
Event Category: None
Event ID: 0
Date: 8/9/2006
Time: 8:36:52 PM
User: N/A
Computer: DAD
Description: 2006-08-09 20:36:52 DEBUG: name: unnamed; blockState: DEFAULT; state: INPROGR, xid/subid/cid: 19215/1/0, nestlvl:1, children: <>
What version of PostgreSQL?

merlin
I confirmed the problem on a fairly recent 8.2devel.

merlin

On 8/10/06, Merlin Moncure <mmoncure@gmail.com> wrote:
> what version postgresql?
>
> merlin
Maybe this article can help:

Windows and the ClearCase process limit: Understanding the desktop heap
http://www-128.ibm.com/developerworks/rational/library/05/1220_marechal/

"Merlin Moncure" <mmoncure@gmail.com> wrote:
> I confirmed the problem on a fairly recent 8.2devel
>
> merlin
>
> On 8/10/06, Merlin Moncure <mmoncure@gmail.com> wrote:
>> what version postgresql?
>>
>> merlin
"William ZHANG" <uniware@zedware.org> writes:
> Maybe this article can help:
> Windows and the ClearCase process limit: Understanding the desktop heap
> http://www-128.ibm.com/developerworks/rational/library/05/1220_marechal/

So the short answer is "get a real operating system"?

I'm not sure I believe that article though, since it claims that the default maximum number of noninteractive processes is only 79. I thought from what was said upthread that we could get up to a couple hundred before seeing a problem.

regards, tom lane
On 8/10/06, William ZHANG <uniware@zedware.org> wrote:
> Maybe this article can help:
>
> Windows and the ClearCase process limit: Understanding the desktop heap
> http://www-128.ibm.com/developerworks/rational/library/05/1220_marechal/

I doubled all my heap settings and was able to roughly double the -c on pgbench from ~158 (stock) to ~330 (modified). So this is definitely the problem. Windows. Meh :)

merlin
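For reference, the desktop heap sizes described in the linked article are stored in the SharedSection=... entry of the "Windows" value under HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\Session Manager\SubSystems. Below is a minimal Python sketch of what "doubling all the heap settings" does to that entry. The sample string and its sizes (1024,3072,512) are an assumption based on commonly cited defaults, not values reported in this thread, and the helper name scale_shared_section is hypothetical. The sketch only transforms text; it does not read or write the registry.

```python
import re

def scale_shared_section(windows_value, factor=2):
    """Return the string with every SharedSection size multiplied by factor."""
    def _scale(match):
        # match.group(1) is the comma-separated size list, e.g. "1024,3072,512"
        sizes = [int(size) * factor for size in match.group(1).split(",")]
        return "SharedSection=" + ",".join(str(size) for size in sizes)
    return re.sub(r"SharedSection=(\d+(?:,\d+)*)", _scale, windows_value)

# Assumed fragment of the registry value (sizes in KB); not taken from the thread.
sample = (r"%SystemRoot%\system32\csrss.exe ObjectDirectory=\Windows "
          r"SharedSection=1024,3072,512 Windows=On")

print(scale_shared_section(sample))
```

Any change to the real registry value would need a reboot to take effect, and raising the sizes trades one system-wide limit for another, so the numbers here are illustrative only.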
On 8/10/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "William ZHANG" <uniware@zedware.org> writes:
> > Maybe this article can help:
> > Windows and the ClearCase process limit: Understanding the desktop heap
> > http://www-128.ibm.com/developerworks/rational/library/05/1220_marechal/
>
> So the short answer is "get a real operating system"?

Changing a registry setting is not terrible in and of itself (it's akin to manually manipulating procfs), but the behavior in a failure condition is. Other than that, no comment. Personally, all my servers are running a mixture of Gentoo and CentOS, and I'm moving my desktop to Mac OS X.

> I'm not sure I believe that article though, since it claims that the
> default maximum number of noninteractive processes is only 79.
> I thought from what was said upthread that we could get up to a couple
> hundred before seeing a problem.

That would depend on various factors, especially exactly how many resources the IBM server software ate up for each connection. pg seems to be leaner and meaner, fwiw.

Anyway, I confirmed the fix.

merlin
"Merlin Moncure" <mmoncure@gmail.com> writes:
> On 8/10/06, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> So the short answer is "get a real operating system"?

> changing a registry setting is not terrible in and of itself, akin to
> manually manipulating procfs, but the behavior in a failure
> condition is. other than that, no comment.

Right. Nothing wrong with having an upper limit on how many processes you can run, but reaching the limit should result in "fork failed" (or local equivalent), not crashes.

Actually ... have any of the win32 hackers tested our win32 code path that's equivalent to Unix fork failure? Maybe this is just a garden-variety bug in our own code.

regards, tom lane
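The failure behavior Tom describes can be illustrated with a minimal Python sketch. This is not PostgreSQL's code: the names accept_connection, spawn and failing_spawn are hypothetical, and the spawn step is passed in as a parameter so the failure path can be exercised without actually exhausting a process limit. The idea is that a failed spawn is reported and only the new connection is rejected, while the parent and the workers already running are left alone.

```python
import logging

log = logging.getLogger("postmaster-sketch")

def accept_connection(spawn, active):
    """Try to start a worker for a new connection.

    spawn  -- callable that starts a worker and returns a handle; assumed
              (for this sketch) to raise OSError when the OS refuses.
    active -- list of handles for workers that are already running.
    Returns True if the connection was accepted, False if it was rejected.
    """
    try:
        handle = spawn()
    except OSError as exc:
        # The "fork failed" case: report it and reject this connection only.
        log.warning("could not start worker: %s", exc)
        return False
    active.append(handle)
    return True

def failing_spawn():
    """Stand-in for a spawn attempt that hits a system limit."""
    raise OSError("process limit reached")

workers = ["w1", "w2"]
accepted = accept_connection(failing_spawn, workers)
print(accepted, workers)
```

In this sketch the existing workers list is untouched after the failed attempt, which is the contrast with the reported behavior, where every connection was closed and the server restarted.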
Merlin Moncure wrote:
> what version postgresql?

8.1.4

>
> merlin
> Hello,
>
> I had a customer call in today they are running Win2003 with 22 gig
> of ram (that may be a mistype on their end, it may be 32gigs of
> ram).
>
> They cranked up their postgresql max_connections to 500.
>
> When PostgreSQL hits above 400, it dies and I don't mean a slow
> crawl type death. A death where all connections close and the
> database does a rollback and restart.
>
> I was able to reproduce with a simple pgbench on my own win32
> environment.
>
> I wasn't able to go above 300 with mine.
>
> Any thoughts?

A followup question: does this happen both when the server is started as a service and when it's started manually? Is there any difference in when it dies?

//Magnus
> > Maybe this article can help:
> >
> > Windows and the ClearCase process limit: Understanding the desktop heap
> > http://www-128.ibm.com/developerworks/rational/library/05/1220_marechal/
>
> i doubled all my heap settings and was able to roughly double the -c
> on pgbench from ~158 (stock) to ~330 (modified). so this is
> definitely the problem.

If you try decreasing max_files_per_process to a significantly lower value (say, try 100 instead of 1000), does the number of processes you can run change noticeably?

(I don't have a box around ATM that I can try to reproduce on. Will try to set up a VM for it soon.)

//Magnus
On 8/18/06, Magnus Hagander <mha@sollentuna.net> wrote:
> > i doubled all my heap settings and was able to roughly double the -c
> > on pgbench from ~158 (stock) to ~330 (modified). so this is
> > definitely the problem.
>
> If you try decreasing max_files_per_process to a significantly lower
> value (say, try 100 instead of 1000), does the number of processes you
> can run change noticeably?
>
> (I don't have a box around ATM that I can try to reproduce on. Will try
> to set up a VM for it soon.)

Per Magnus's request, I set max_files_per_process to 25 (the minimum) and saw no appreciable gain in the number of connections required to make it crash (I tested at 400).

The first time I ran it I almost hosed my machine... it was doing all kinds of irrational beeping, and all the windows were flickering and blinking. It did not do this after the max_files_per_process reduction, although I have no desire to run this test again on my development box ;)

merlin