Discussion: brk() function and performance
Hi,

We're running PostgreSQL 7.1.3 (I know, I know) on Solaris 7 on two
Sun E4500s with 8 CPUs and 16 Gig of RAM.

We have noticed that one of the machines is considerably slower than
the other. We have traced the problem to the brk() function call.

We were having some trouble with certain queries, because we were
spending a lot of time moving blocks between the OS filesystem
buffers and the Postgres shared buffer. That was on server A. So, we
played with some settings, and settled on shared_buffers = 262144.
On server B, we did not see the same problem. There,
shared_buffers = 8192. That is the only difference between the
machines, except that server A also sees a lot more interactive
traffic (server B is a replicated copy, and handles a number of
read-only queries).

We increased the shared_buffers setting on server A a few weeks ago,
and saw an immediate (albeit slight, but enough for us) improvement
in the system. But in the past few days, we have experienced
sluggish behaviour. By selecting a single line from a
frequently-accessed, relatively small table (< 300 rows), and
selecting on an indexed field, we get the following difference in
the truss output:

Server A: the query takes 700-800 ms.

    syscall      seconds   calls
    brk              .27      62

Server B: the query takes 200-300 ms.

    syscall      seconds   calls
    brk              .02      64

Everything else is the same. The backend has been running since the
shared memory change.

I was wondering if perhaps the problem is what brk() is doing. Maybe
it needs a contiguous segment, and when it goes to allocate more of
its reserved memory, it has to shift the whole thing around? (If so,
this is a clear reason not to use huge shared buffers.) The Solaris
man page doesn't make clear how this works (and in fact seems to
suggest that brk() shouldn't be used).

Any remarks, pointers, or suggestions would be welcome. I'm stumped.
This is very puzzling.
A

-- 
----
Andrew Sullivan                         87 Mowat Avenue
Liberty RMS                             Toronto, Ontario Canada
<andrew@libertyrms.info>                M6K 3E3
                                        +1 416 646 3304 x110
On Thu, Jul 11, 2002 at 12:30:12PM -0400, Andrew Sullivan wrote:
> Hi,
>
> We're running PostgreSQL 7.1.3 (I know, I know) on Solaris 7 on two
> Sun E4500s with 8 CPUs and 16 Gig of RAM.
>
> We have noticed that one of the machines is considerably slower than
> the other. We have traced the problem to the brk() function call.

My Sun-loving colleague, Sorin Iszlai, wondered why this problem was
cropping up, and remembered the qsort() debacle. So he did some
tests. Guess what? Here's what he found:

> I ran some tests with the realloc() function from the standard lib;
> If the application calls realloc() 4096 times the results are:
>
> - if linked with bsdmalloc, realloc() calls brk() 17 times only:
>     syscall      seconds   calls
>     brk              .40      17
>
> - and without bsdmalloc:
>     syscall      seconds   calls
>     brk             1.36   24527

At this rate, I'm beginning to get the feeling that maybe getting
FreeBSD to work well on 64 bit Sun machines is the most important
project we could undertake ;-)

Anyway, I'm going to do some tests with this, but in the meantime, if
anyone has any views on the subject, insights, or experience, it'd be
much appreciated.

Thanks.

A

-- 
----
Andrew Sullivan                         87 Mowat Avenue
Liberty RMS                             Toronto, Ontario Canada
<andrew@libertyrms.info>                M6K 3E3
                                        +1 416 646 3304 x110
Yow. What are those Solaris engineers doing over there?

---------------------------------------------------------------------------

Andrew Sullivan wrote:
[...]
> > I ran some tests with the realloc() function from the standard lib;
> > If the application calls realloc() 4096 times the results are:
> >
> > - if linked with bsdmalloc, realloc() calls brk() 17 times only:
> >     syscall      seconds   calls
> >     brk              .40      17
> >
> > - and without bsdmalloc:
> >     syscall      seconds   calls
> >     brk             1.36   24527
[...]

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
Andrew Sullivan sez:
} On Thu, Jul 11, 2002 at 12:30:12PM -0400, Andrew Sullivan wrote:
[...]
} > We have noticed that one of the machines is considerably slower than
} > the other. We have traced the problem to the brk() function call.
}
} My Sun-loving colleague, Sorin Iszlai, wondered why this problem was
} cropping up, and remembered the qsort() debacle. So he did some
} tests. Guess what? Here's what he found:
}
} > I ran some tests with the realloc() function from the standard lib;
} > If the application calls realloc() 4096 times the results are:
}
} > - if linked with bsdmalloc, realloc() calls brk() 17 times only:
} >     syscall      seconds   calls
} >     brk              .40      17
}
} > - and without bsdmalloc:
} >     syscall      seconds   calls
} >     brk             1.36   24527
}
} At this rate, I'm beginning to get the feeling that maybe getting
} FreeBSD to work well on 64 bit Sun machines is the most important
} project we could undertake ;-)
}
} Anyway, I'm going to do some tests with this, but in the meantime, if
} anyone has any views on the subject, insights, or experience, it'd be
} much appreciated.

Way back when I was a college freshman or sophomore, I was talking to
a professor who mentioned having had tremendous problems with the
brk() system call robbing him of system performance. His solution,
since brk() is called when malloc decides it needs another page or
so, was to allocate a *tremendous* amount of memory at the very
beginning of his run, then free it all. This meant that Solaris
mapped a whole bunch of pages to his app with just one brk() call,
and once it was released it was in malloc's free list. The pages
weren't swapped/paged or anything because until they were written to
or read from, they didn't even really exist except in the OS's
internal tables. It just took the OS out of the loop in memory
allocation.

This may or may not be a good solution. I would expect it to fail or
have bad performance characteristics on at least some flavors of
Unix, and probably Windows. Still, it might be worth looking into on
Solaris.

} Thanks.
} A

--Greg
On Tue, Jul 16, 2002 at 10:28:02AM -0400, Andrew Sullivan wrote:
> On Thu, Jul 11, 2002 at 12:30:12PM -0400, Andrew Sullivan wrote:
> >
> > We have noticed that one of the machines is considerably slower than
> > the other. We have traced the problem to the brk() function call.

More news, in case anyone is interested.

It appears, after poking around the Net, that Sun ships their
poor-performing malloc as the default on purpose, because it uses
less memory. You can set your CFLAGS="-lbsdmalloc" if you want to
use the BSD library (which is on the system by default), or even just
set LD_PRELOAD to pick up the BSD malloc instead (the latter seems to
work just fine for the postmaster, but it breaks some other things,
so I think I'd compile against it instead for any real work). The
BSD malloc uses about 4 times the memory of the Solaris version, but
it's plenty faster. Memory is cheap.

Further tests, however, seem to indicate that brk() is not our main
problem. On a test machine today, we found simple selects on a table
with only a couple hundred rows are taking > 300 milliseconds when we
set the shared buffers to some large number (like enough to allocate
a Gig of memory), more than 250 ms when running with about 512 Meg of
shared memory, but under 125 ms when running with a small shared
buffer setting (say, enough to allocate less than 200 meg -- one test
we allocated only 4 meg). The main culprit seems to be a memset()
call that happens over and over to the same address. I've no idea
why, but there it is.

The same results are _not_ found in testing with 7.2.1. In that
case, allocating a Gig of shared memory does not seem to affect the
result at all. The only question is whether they might be if we ran
a lot of updates against the 7.2.x tree. (We tarred up and copied the
data tree from production, since I had it from a recent maintenance
period; but we had to use pg_dump to put the data into the 7.2
database, obviously). We'll do a great whack of updates, and see if
that makes a difference.

A

-- 
----
Andrew Sullivan                         87 Mowat Avenue
Liberty RMS                             Toronto, Ontario Canada
<andrew@libertyrms.info>                M6K 3E3
                                        +1 416 646 3304 x110
Andrew Sullivan <andrew@libertyrms.info> writes:
> On a test machine today, we found simple selects on a table
> with only a couple hundred rows are taking > 300 milliseconds when we
> set the shared buffers to some large number (like enough to allocate
> a Gig of memory), more than 250 ms when running with about 512 Meg of
> shared memory, but under 125 ms when running with a small shared
> buffer setting (say, enough to allocate less than 200 meg -- one test
> we allocated only 4 meg). The main culprit seems to be a memset()
> call that happens over and over to the same address. I've no idea
> why, but there it is.

Hmph. There are some places in the bufmgr that do sequential scans
of the whole buffer array, which might account for a slowdown with
huge numbers of buffers. I do not think any of them are in hotspot
paths however --- at least not in any recent release. This test was
on 7.1.something, wasn't it?

Could you recompile with profiling enabled and see where the time is
really going with the large number of buffers?

> The same results are _not_ found in testing with 7.2.1.

This might mean we already fixed the bottleneck, in which case the
question becomes less interesting (at least to me ;-)).

			regards, tom lane
Any update on this?

---------------------------------------------------------------------------

Andrew Sullivan wrote:
[...]
> The same results are _not_ found in testing with 7.2.1. In that
> case, allocating a Gig of shared memory does not seem to affect the
> result at all. The only question is whether they might be if we ran
> a lot of updates against the 7.2.x tree. (We tarred up and copied the
> data tree from production, since I had it from a recent maintenance
> period; but we had to use pg_dump to put the data into the 7.2
> database, obviously). We'll do a great whack of updates, and see if
> that makes a difference.
[...]

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
On Tue, Aug 27, 2002 at 12:31:00PM -0400, Bruce Momjian wrote:
>
> Any update on this?

Sorry, yes. . .

>
> ---------------------------------------------------------------------------
>
> Andrew Sullivan wrote:
> >
> > The same results are _not_ found in testing with 7.2.1. In that
> > case, allocating a Gig of shared memory does not seem to affect the
> > result at all. The only question is whether they might be if we ran
> > a lot of updates against the 7.2.x tree. (We tarred up and copied the
> > data tree from production, since I had it from a recent maintenance
> > period; but we had to use pg_dump to put the data into the 7.2
> > database, obviously). We'll do a great whack of updates, and see if
> > that makes a difference.

We ran 100,000 updates against the same record on a table (vacuuming,
sometimes, of course), and were unable to reproduce the slowdown.

My best bet is that someone happened to fix this problem by accident.
It could have been related to any of dozens of improvements in 7.2,
of course, but whatever it was, it seems to be gone.

A

-- 
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                             Toronto, Ontario Canada
<andrew@libertyrms.info>                M2P 2A8
                                        +1 416 646 3304 x110