Discussion: zLinux Load Testing Experience
I'm currently working on a project porting an application from RedHat Linux on Intel onto IBM zLinux. Our application requires PostgreSQL at version 9.n, so the PostgreSQL binaries have been built from source using the standard build tools. Everything appears to run correctly. However, as part of performance testing, our IBM and Linux SysProgs have been "poking around" using strace and have reported the following (which they think is an error condition) when hooking up to the postmaster processes:-

read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 200) = 0 (Timeout)
read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily unavailable)
poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 10000) = 0 (Timeout)
... repeated many times

From researching the archives, I "believe" the above to be "as designed": it simply indicates that the postmaster is attempting to read data from an IP socket and the read is timing out. Could I ask:-

1. Is this "normal"?
2. If abnormal, any pointers as to where to start investigating?

The reason they latched onto the postmaster process was a perceived high CPU utilisation. For info, we are load testing with 100 connections being accessed from an IBM WebSphere hosted EJB-based application.

Many thanks,
Andrew
On Tue, Apr 30, 2013 at 8:28 AM, Andrew Hastie <andrew@ahastie.net> wrote:
> [...] our IBM and Linux SysProgs have been "poking around" using strace and have reported the following (which they think is an error condition) when hooking up to the postmaster processes:-
>
> read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 200) = 0 (Timeout)
> read(3, 0x3ffff875ee0, 16) = -1 EAGAIN (Resource temporarily unavailable)
> poll([{fd=3, events=POLLIN}, {fd=6, events=POLLIN}], 2, 10000) = 0 (Timeout)
> ... repeated many times

That does not look like the postmaster process. It is probably the background writer process.
It is normal, and doesn't explain high CPU utilization.
Cheers,
Jeff
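The pattern in the trace can be reproduced outside PostgreSQL, which may help explain why it is harmless. The following is a minimal sketch in Python, assuming the traced process is idle and is waiting on an empty non-blocking descriptor; the function name `drain_then_wait` and the use of a pipe are illustrative assumptions, not PostgreSQL code. A non-blocking read with nothing to read fails with EAGAIN, and a poll() with a timeout returns no ready descriptors while the process sleeps.

```python
import errno
import os
import select


def drain_then_wait(timeout_ms):
    """Read from an empty non-blocking pipe, then poll() it with a timeout.

    Hypothetical helper for illustration only. Returns (errno of the read,
    list of ready descriptors reported by poll).
    """
    read_fd, write_fd = os.pipe()
    os.set_blocking(read_fd, False)
    try:
        try:
            os.read(read_fd, 16)       # nothing has been written yet
            read_errno = 0
        except BlockingIOError as exc:
            read_errno = exc.errno     # EAGAIN: "try again later", not a fault
        poller = select.poll()
        poller.register(read_fd, select.POLLIN)
        ready = poller.poll(timeout_ms)  # sleeps in the kernel until timeout
        return read_errno, ready
    finally:
        os.close(read_fd)
        os.close(write_fd)


read_errno, ready = drain_then_wait(50)
print(errno.errorcode[read_errno], ready)
```

While poll() is waiting, the process is asleep rather than spinning, which is consistent with the remark that this trace does not explain high CPU utilization.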
On Tue, Apr 30, 2013 at 12:26 PM, Jeff Janes <jeff.janes@gmail.com> wrote:
> That does not look like the postmaster process. It is probably the background writer process.
>
> It is normal, and doesn't explain high CPU utilization.

yeah: deep system profiling is probably a couple of steps ahead of where we need to be. Helpful things to provide to help diagnose would be:

*) 'explain analyze' of the queries that are eating cpu
*) more details about the hardware -- how many cpu, etc.
*) better definition of 'perceived high CPU utilisation'
*) some correlating performance tests, especially cpu bound pgbench tests (pgbench -S)

merlin
On 30/04/13 20:46, Merlin Moncure wrote:
> Helpful things to provide to help diagnose would be:
>
> *) 'explain analyze' of the queries that are eating cpu
> *) more details about the hardware -- how many cpu, etc.
> *) better definition of 'perceived high CPU utilisation'
> *) some correlating performance tests, especially cpu bound pgbench tests (pgbench -S)

I'm not sure how much experience the community has on tuning PostgreSQL running on RedHat which in turn is hosted on an IBM mainframe under VM (using zLinux). So I'm happy to start posting further details and benchmark results and see where we go.

Should I be moving this thread over into the pg-performance list, or is pg-general the right place?
On Wed, May 1, 2013 at 8:01 AM, Andrew Hastie <andrew@ahastie.net> wrote:
> Should I be moving this thread over into the pg-performance list, or is pg-general the right place?

Certainly the performance list. And yes, zLinux is less well traveled. Did you compile postgres from source? Did you confirm that there is a native spinlocks implementation and it is being used?

merlin
On 01/05/13 15:34, Merlin Moncure wrote:
> Did you compile postgres from source?

Yes (I need PG v9.n, as the v8.n shipped with RedHat Ent6 lacks several v9-specific features we need).

> Did you confirm that there is a native spinlocks implementation and it is being used?

I believe so, as no errors or warnings were logged during the build. Is there a simple way to check whether spin-locks are running native?

I've started looking at several articles covering pgbench and running some initial tests, so I plan to start a new thread on pg-performance in the next day or so.

Thanks for the advice so far - Appreciated :-)

Andrew
On Wed, May 1, 2013 at 11:34 AM, Andrew Hastie <andrew@ahastie.net> wrote:
> Did you confirm that there is a native spinlocks implementation and it is being used? - I believe so as no errors or warnings logged during the build.
> Is there a simple way to check whether spin-locks are running native?

I can't remember off the top of my head if configure forces you to specifically unset spinlocks to get through a build on a non-hardware spinlock platform. Point being: the interesting stuff happens during configure, not build.

Check the contents of src/include/pg_config.h and look for this line:

#define HAVE_SPINLOCKS 1

to see if you have hardware spinlocks.

merlin
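The header check described above can also be scripted, which is handy when comparing several build trees. This is a minimal sketch in Python; the helper name `has_define` and the sample header strings are hypothetical, and the assumption that an unset macro appears as a commented-out `#undef` follows common autoconf practice rather than anything stated in this thread.

```python
import re


def has_define(header_text, name):
    """Return True if header_text contains an active '#define <name> 1' line.

    Hypothetical helper for illustration. Commented-out lines such as
    '/* #undef NAME */' do not count, because the line must start with '#'.
    """
    pattern = re.compile(
        r"^[ \t]*#[ \t]*define[ \t]+" + re.escape(name) + r"[ \t]+1\b",
        re.MULTILINE,
    )
    return pattern.search(header_text) is not None


# Sample snippets standing in for the contents of src/include/pg_config.h.
enabled_header = "#define HAVE_SPINLOCKS 1\n"
disabled_header = "/* #undef HAVE_SPINLOCKS */\n"

print(has_define(enabled_header, "HAVE_SPINLOCKS"))
print(has_define(disabled_header, "HAVE_SPINLOCKS"))
```

To run it against a real build, the file contents could be read with `open("src/include/pg_config.h").read()` from the top of the source tree and passed as `header_text`.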
On Wed, May 1, 2013 at 1:21 PM, Merlin Moncure <mmoncure@gmail.com> wrote:
> Check the contents of src/include/pg_config.h and look for this line:
> #define HAVE_SPINLOCKS 1
>
> to see if you have hardware spinlocks.

Just a follow up here since I'm about to go on vacation and will be out of pocket for the next several days.

If you do indeed find out that you are using non-TAS (test-and-set) spinlocks, and are suspicious that this is causing your load issues, and are feeling experimental, and are using gcc to compile postgres, and have determined that the HAVE_GCC_INT_ATOMICS macro is set, I'd maybe consider hacking s_lock.h to use the gcc __sync_lock_test_and_set variant of TAS (see around line 300 of s_lock.h).

merlin
Andrew Hastie <andrew@ahastie.net> writes:
> Did you confirm that there is a native spinlocks implementation and it is being used? - I believe so as no errors or warnings logged during the build. Is there a simple way to check whether spin-locks are running native?

All non-ancient versions of PG force you to say "configure --disable-spinlocks" to get a build without native spinlocks. Such builds are only considered suitable for zero-order port testing, because the performance hit is so bad.

regards, tom lane
On 01/05/13 19:21, Merlin Moncure wrote:
> Check the contents of src/include/pg_config.h and look for this line:
> #define HAVE_SPINLOCKS 1
>
> to see if you have hardware spinlocks.

I can confirm that #define HAVE_SPINLOCKS 1 is present and correct. I will move any performance-related issues I find onto the pg-performance list.

Many thanks for all the help and advice so far :-)

Andrew