Discussion: [HACKERS] v6.1 buffers and performance
Here is an update on postgres backend buffer performance (and problems).
I ran the regression tests with several sizes for the "-B" option for
postmaster. The best performance was for the run with the largest value
of -B. However, smaller values of -B seem to give results indicating a
large memory leak or garbage collection problem in the backend.
(results are for "time make runtest" under tcsh)
TEST1:
gemini$ /opt/postgres/current/bin/postmaster -B 128
...
ACTUAL RESULTS OF REGRESSION TEST ARE NOW IN FILE regress.out
1.550u 4.580s 5:56.25 1.7% 0+0k 0+0io 16984pf+0w
TEST2:
gemini$ /opt/postgres/current/bin/postmaster
ACTUAL RESULTS OF REGRESSION TEST ARE NOW IN FILE regress.out
...
1.250u 4.870s 6:45.65 1.5% 0+0k 0+0io 16863pf+0w
TEST3:
gemini$ /opt/postgres/current/bin/postmaster -B 16
...
ACTUAL RESULTS OF REGRESSION TEST ARE NOW IN FILE regress.out
1.580u 4.690s 13:29.13 0.7% 0+0k 0+0io 16843pf+0w
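For anyone comparing these runs, here is a minimal sketch that pulls the elapsed time out of each timing line and expresses it relative to the "-B 128" run. It assumes the default tcsh `time` layout, in which the third field is the wall-clock time; the helper name `elapsed_seconds` and the dictionary labels are made up for illustration and are not part of the original post.

```python
def elapsed_seconds(time_line):
    """Return the wall-clock field of a tcsh `time` line, in seconds."""
    elapsed = time_line.split()[2]           # third field, e.g. "5:56.25"
    seconds = 0.0
    for part in elapsed.split(":"):          # handles ss, mm:ss and hh:mm:ss
        seconds = seconds * 60 + float(part)
    return seconds


# Timing lines copied from the three runs above, keyed by the -B setting used.
runs = {
    "-B 128":  "1.550u 4.580s 5:56.25 1.7% 0+0k 0+0io 16984pf+0w",
    "default": "1.250u 4.870s 6:45.65 1.5% 0+0k 0+0io 16863pf+0w",
    "-B 16":   "1.580u 4.690s 13:29.13 0.7% 0+0k 0+0io 16843pf+0w",
}

baseline = elapsed_seconds(runs["-B 128"])
for label, line in runs.items():
    secs = elapsed_seconds(line)
    print(f"{label:8s} {secs:8.2f} s  {secs / baseline:.2f}x")
```

Relative to "-B 128", the default run takes about 1.14 times as long and the "-B 16" run about 2.27 times as long, while the user and system times reported on each line stay nearly constant.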
In all cases, if I repeat the regression test without restarting
postmaster, the elapsed time becomes significantly higher. For runs with
"-B 16", a second run of the regression test starts failing midway
through with messages indicating that buffer space is exhausted.
I am running on a RedHat Linux box with i686 processor(s).
Vadim (and others?), can you see this feature in your regression runs? I
am still sitting on Robert Withrow's Purify runs on the regression
tests. Would those be helpful? Since the regression tests have changed
so much since v6.0, and since many features tested in the new regression
tests are not available in v6.0, I don't know if it would be useful to
try running the new regression tests with the old release. I haven't
tried distilling the problem down to a single test, because I don't
really know what to look for or what is causing the symptoms.
- Tom
------------------------------
I did a quick Purify run on the backend, and the first non-Irix leak
occurred in the buffer manager...I won't be able to actually work on this
stuff for a while but there is definitely something going on in that area.
=+=------------------------/\---------------------------------=+=
Igor Natanzon |**| E-mail: igor@sba.miami.edu
=+=------------------------\/---------------------------------=+=
On Wed, 4 Jun 1997, Thomas G. Lockhart wrote:
<snip>
------------------------------
Vadim B. Mikheev wrote:
> > Here is an update on postgres backend buffer performance (and problems).
<snip>
> > However, smaller values of -B seem to give results indicating a
> > large memory leak or garbage collection problem in the backend.
<snip>
> Tom, I posted a message on 23 May about this problem:
<snip>
> The bug is present in 1.09 too... I don't want to fix it now.
Sorry Vadim, with all of the great fixes you have been doing recently I
lost track of where this one fit in and was unclear whether it might
have been addressed in these recent changes.
It looks like Igor/Robert can make a great ongoing contribution to new
code validation. It might take a little while for us to get to the point
where a daily or weekly run of Purify helps to catch new problems, and I
don't know what Igor's schedule will allow for fixing existing leaks,
but IMHO it's an exciting development for Postgres...
- Tom
------------------------------
Thomas G. Lockhart wrote:
>
> Here is an update on postgres backend buffer performance (and problems).
>
> I ran the regression tests with several sizes for the "-B" option for
> postmaster. The best performance was for the run with the largest value
> of -B. However, smaller values of -B seem to give results indicating a
> large memory leak or garbage collection problem in the backend.
>
...skipped
>
> In all cases, if I repeat the regression test without restarting
> postmaster, the elapsed time becomes significantly higher. For runs with
> "-B 16", a second run of the regression test starts failing midway
> through with messages indicating that buffer space is exhausted.
>
> I am running on a RedHat Linux box with i686 processor(s).
>
> Vadim (and others?), can you see this feature in your regression runs? I
> am still sitting on Robert Withrow's Purify runs on the regression
> tests. Would those be helpful? Since the regression tests have changed
> so much since v6.0, and since many features tested in the new regression
> tests are not available in v6.0, I don't know if it would be useful to
> try running the new regression tests with the old release. I haven't
> tried distilling the problem down to a single test, because I don't
> really know what to look for or what is causing the symptoms.
Tom, I posted a message on 23 May about this problem:
---------------------
Thomas G. Lockhart wrote:
>
> I seem to be seeing a buffer memory leak on my Linux box which
> becomes apparent only with several consecutive runs of the regression
> test. Can someone else reproduce this? Any suggestions on how
> to collect more information to help with debugging? (I don't know
> anything about this part of the code so don't know if I can actually
> *do* the debugging, but it's a start...)
>
> I've included below a summary of my results. Notice the increasing
> elapsed time (6 minutes 41.91 seconds for the first case) for
> consecutive runs. Note also that this time goes back down to this
> nominal value if postmaster is killed (control-c) and restarted.
>
> If I decrease the buffer parameter to "-B 16" (from the claimed "-B 64"
> default value) then the regression tests fail during the second
> iteration.
There is a bug in the executor concerning SQL functions which return
SETOF results.
The buffer leak comes from queries like
SELECT p.name, p.hobbies.name, p.hobbies.equipment.name FROM person p;
                 ^^^^^^^         ^^^^^^^ ^^^^^^^^^
(these functions are created in create_function_2.sql; the query is in misc.sql)
Also, such queries return bogus results:
regression=> SELECT p.name, p.hobbies.name, p.hobbies.equipment.name
             FROM person p;
name |name       |name
-----+-----------+-------------
mike |posthacking|advil
joe  |basketball |peet's coffee
sally|basketball |hightops
(3 rows)
regression=> SELECT p.hobbies.equipment.name, p.name, p.hobbies.name
             FROM person p;
name    |name |name
--------+-----+-----------
advil   |mike |posthacking
hightops|joe  |basketball
hightops|sally|basketball
(3 rows)
Funny?
Valid result:
regression=> SELECT person.name as who, hobbies_r.name as hobby,
             equipment_r.name as need_in
             from person, hobbies_r, equipment_r
             where person.name = hobbies_r.person and equipment_r.hobby = hobbies_r.name;
who  |hobby      |need_in
-----+-----------+-------------
mike |posthacking|advil
mike |posthacking|peet's coffee
joe  |basketball |hightops
sally|basketball |hightops
(4 rows)
- just as pointed out in the comments in misc.sql
It all comes from executor/execQual.c:ExecTargetList()
    if ((IsA(expr,Iter)) && (*isDone))
        return (HeapTuple)NULL;
- what if there is more than one Iter, and they return different numbers
of results (as in the examples above)? One is DONE, but the other(s) are not:
1. the backend doesn't restore PrivateRefCount-s and so doesn't unpin
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   buffers properly;
2. JOE, who likes basketball, needs "peet's coffee"
   (from MIKE, who likes posthacking, as we do) instead of "hightops".
The bug is present in 1.09 too... I don't want to fix it now.
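The pin leak described in point 1 can be illustrated with a toy model. This is only a sketch under stated assumptions, not the PostgreSQL executor: `BufferPool`, `SetIter`, `eval_target_list` and `queries_until_exhausted` are hypothetical names, each set-returning iterator is assumed to hold exactly one pinned buffer until it is closed, and the `cleanup` flag stands for one possible fix rather than the fix that was eventually applied.

```python
class BufferPool:
    """Toy stand-in for a shared buffer pool with `size` buffers (like -B)."""

    def __init__(self, size):
        self.size = size
        self.pin_counts = {}                 # buffer id -> number of pins

    def pin(self, buf):
        if buf not in self.pin_counts and len(self.pin_counts) >= self.size:
            raise RuntimeError("no free buffers")
        self.pin_counts[buf] = self.pin_counts.get(buf, 0) + 1

    def unpin(self, buf):
        self.pin_counts[buf] -= 1
        if self.pin_counts[buf] == 0:
            del self.pin_counts[buf]


class SetIter:
    """Toy set-returning function that keeps one buffer pinned until closed."""

    def __init__(self, pool, buf, rows):
        self.pool, self.buf, self.rows = pool, buf, list(rows)
        self.is_open = True
        pool.pin(buf)

    def next(self):
        """Return (value, is_done); release the pin once the rows run out."""
        if self.rows:
            return self.rows.pop(0), False
        self.close()
        return None, True

    def close(self):
        if self.is_open:
            self.pool.unpin(self.buf)
            self.is_open = False


def eval_target_list(iters, cleanup):
    """Build rows until the first iterator is done, then stop immediately."""
    result = []
    while True:
        row = []
        for it in iters:
            value, done = it.next()
            if done:
                if cleanup:                  # hypothetical fix: close the rest
                    for other in iters:
                        other.close()
                return result
            row.append(value)
        result.append(tuple(row))


def queries_until_exhausted(pool_size, cleanup, limit=1000):
    """Count queries completed before pinning fails (`limit` if it never does)."""
    pool = BufferPool(pool_size)
    for n in range(limit):
        try:
            short = SetIter(pool, 2 * n, ["a"])           # one result
            longer = SetIter(pool, 2 * n + 1, ["x", "y"])  # two results
        except RuntimeError:
            return n
        eval_target_list([short, longer], cleanup)
    return limit


for size in (16, 128):
    print(size, queries_until_exhausted(size, cleanup=False),
          queries_until_exhausted(size, cleanup=True))
```

In this model every query leaves the longer iterator's buffer pinned, so a pool of 16 fails on the 16th query and a pool of 128 on the 128th, while the variant that closes the remaining iterators never runs out. That matches the shape of the reported symptom (smaller -B fails sooner), though the real counts depend on details this sketch ignores.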
Vadim
------------------------------
Igor wrote:
>
> I did a quick Purify run on the backend, and the first non-Irix leak
> occurred in the buffer manager...I won't be able to actually work on this
> stuff for a while but there is definitely something going on in that area.
>
Igor, can you send me the Purify output concerning the buffer manager?
Vadim
------------------------------
I've put Robert's initdb and regression purify runs into
ftp://hub.org/pub/incoming/purify/ . Files are called
purify_initdb_970530.plog.gz
purify_regress_970530.plog.gz
Robert indicated that everything is in the files in the same order as
when run normally.
- Tom
------------------------------
End of hackers-digest V1 #377
*****************************
I upgraded my Linux box to have 96MB of memory, up from 32MB ($250, who
would have believed _that_ a few years ago).
There is a significant difference in Postgres performance, with the
regression tests taking ~5:15 to complete vs. ~6:20 before the upgrade
for a 17% speed improvement. There is little or no evidence for gross
degradation of performance with repeated runs of the regression test, as
I had reported earlier. So, the conclusion is that more memory is better
(duh!) and that as the buffer gets filled or fragmented smaller memory
machines start swapping.
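As a quick check of the 17% figure, taking the two approximate times quoted above at face value, the arithmetic works out as follows. The symbols are introduced here only for the calculation.

```latex
\[
t_{\text{before}} = 6 \cdot 60 + 20 = 380\ \text{s}, \qquad
t_{\text{after}} = 5 \cdot 60 + 15 = 315\ \text{s}
\]
\[
\frac{t_{\text{before}} - t_{\text{after}}}{t_{\text{before}}}
  = \frac{65}{380} \approx 17.1\%,
\qquad
\frac{t_{\text{before}}}{t_{\text{after}}}
  = \frac{380}{315} \approx 1.21
\]
```

So the 17% is a reduction in elapsed time; expressed as throughput, the same numbers give roughly a 21% gain.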
- Tom
------------------------------
Hmm... I was running it on a machine with 192MB of RAM, and performance was
pretty good. :)
Speaking of memory, I dumped my current database out, then loaded it under
Purify-watched backend (psql -d igor < db.out)...
After it was done there were several array bounds read errors, one array
bounds write error (bad!!) and almost 300K of leaked memory...I am not
sure whether this memory stays allocated when the spawned backend
terminates, so I don't know whether it really affects anything..Array
errors are important though...Array bounds reads would read/return
garbage data, and array bounds writes could overwrite data...
=+=------------------------/\---------------------------------=+=
Igor Natanzon |**| E-mail: igor@sba.miami.edu
=+=------------------------\/---------------------------------=+=
On Sun, 8 Jun 1997, Thomas G. Lockhart wrote:
<snip>
------------------------------
At 12:21 AM 6/8/97, Igor wrote:
>After it was done there were several array bounds read errors, one array
>bounds write error (bad!!) and almost 300K of leaked memory...I am not
>sure whether this memory stays allocated when the spawned backend
>terminates, so I don't know whether it really affects anything..Array
>errors are important though...Array bounds reads would read/return
>garbage data, and array bounds writes could overwrite data...
I agree that array bounds problems are *serious*, the write especially so.
As to the fate of leaked memory: if the leak occurs in a child after the
fork() then it doesn't matter at all after the child terminates. But can we
guarantee that the amount of memory leaked is "small" even if the life of
the child is "large"?
The priorities seem obvious to me: 1) fix the array bounds problems. (If
the fix is found after the 6.1 release then *immediately* release patches
and/or version 6.1.1.) 2) Fix memory leaks in the parent PostMaster.
(Make a 6.2 release ASAP.) 3) Fix memory leaks in the child processes,
unless they can be determined to be unimportant for any conceivable
transaction.
I wouldn't normally go on at this length, except that I detect some
ambivalence in the developer's posts on the subject. I hope that
ambivalence is just an uncertainty in how to deal with the memory problems
given the immaturity of the freeware tools, and not a desire to deny their
seriousness.
Signature failed Preliminary Design Review.
Feasibility of a new signature is currently being evaluated.
h.b.hotz@jpl.nasa.gov, or hbhotz@oxy.edu
------------------------------
Henry B. Hotz wrote:
> The priorities seem obvious to me: 1) fix the array bounds problems. (If
> the fix is found after the 6.1 release then *immediately* release patches
> and/or version 6.1.1.) 2) Fix memory leaks in the parent PostMaster.
> (Make a 6.2 release ASAP.) 3) Fix memory leaks in the child processes,
> unless they can be determined to be unimportant for any conceivable
> transaction.
>
> I wouldn't normally go on at this length, except that I detect some
> ambivalence in the developer's posts on the subject. I hope that
> ambivalence is just an uncertainty in how to deal with the memory problems
> given the immaturity of the freeware tools, and not a desire to deny their
> seriousness.
Henry, my first reaction was probably pretty similar to yours but:
1) postgres is already in _successful_ use
2) the latest release is more solid than the last
3) _all_ the code is inherited, and is something of an unknown quantity
4) if the development team waited until the software were perfect, we
would have never seen it and probably never would.
btw, Henry and I work at the same place, although we've never met. It's
interesting seeing the somewhat different approach the postgres
developers must take for this to be successful.
- Tom
------------------------------
At 6:37 PM 6/8/97, Thomas G. Lockhart wrote:
>Henry B. Hotz wrote:
>> The priorities seem obvious to me: 1) fix the array bounds problems. (If
>> the fix is found after the 6.1 release then *immediately* release patches
>> and/or version 6.1.1.) 2) Fix memory leaks in the parent PostMaster.
>> (Make a 6.2 release ASAP.) 3) Fix memory leaks in the child processes,
>> unless they can be determined to be unimportant for any conceivable
>> transaction.
>
>Henry, my first reaction was probably pretty similar to yours but:
>
>1) postgres is already in _successful_ use
>2) the latest release is more solid than the last
We may not be so far apart. 1&2 are the reasons why I did not suggest
holding off on the 6.1 release.
>3) _all_ the code is inherited, and is something of an unknown quantity
OTOH 3 is why I think it important to use tools like Purify to create known
characteristics when possible. When they find serious problems we should
provide fixes as soon as possible.
>4) if the development team waited until the software were perfect, we
>would have never seen it and probably never would.
Well this is always true of hardware as well as software. In NASA we have a
bit of a problem sending out a repair crew after a spacecraft is launched,
but we do sometimes launch things anyway. Tom (I'm sure) and I could both
tell stories, but it gets a bit off topic.
>btw, Henry and I work at the same place, although we've never met. It's
>interesting seeing the somewhat different approach the postgres
>developers must take for this to be successful.
Signature failed Preliminary Design Review.
Feasibility of a new signature is currently being evaluated.
h.b.hotz@jpl.nasa.gov, or hbhotz@oxy.edu
------------------------------
End of hackers-digest V1 #380
*****************************