On Mon, Jul 12, 2010 at 7:03 AM, Andras Fabian <Fabian@atrada.net> wrote:
> This STDOUT issue gets even weirder. Now I have set up our two new servers (identical hw/sw) as I would have needed to
> do so anyways. After having PG running, I also set up the same test scenario as I have it on our problematic servers,
> and started the COPY-to-STDOUT experiment. And you know what? Both new servers are performing well. No hanging, and the
> 3GByte test dump was written in around 3 minutes (as expected). To make things even more complicated ... I went back to
> our production servers. Now, the first one - which I froze up with oprofile this morning and needed a REBOOT - is
> performing well too! It needed 3 minutes for the test case ... WTF? BUT, the second production server, which did not
> have a reboot, is still behaving badly.

I'm gonna take a scientific wild-assed guess that your machine was
rebuilding RAID arrays when you started out, and you had massive IO
contention underneath the OS level resulting in such a slowdown.

Note that you mentioned ~5% IO Wait. That's actually fairly high if
you've got 8 to 16 cores or something like that, since the figure is
averaged over all cores. It's much better to run something like
iostat -xd 60 and look at the IO utilization (%util) at the end of
each device line.
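
To make that concrete, here is a rough Python sketch of both checks. It is only an illustration under assumptions: the function names are made up for this example, the parser assumes a sysstat-style report whose header line starts with "Device" and contains a "%util" column, and the "cores' worth of waiting" figure is a back-of-the-envelope reading of an averaged iowait percentage, not an exact measure.

```python
def iowait_core_equivalent(iowait_pct, cores):
    """Rough number of cores' worth of time spent waiting on I/O.

    Assumes iowait_pct is an average over all cores, as top/vmstat report it.
    """
    return iowait_pct / 100.0 * cores


def device_utilization(iostat_text):
    """Return {device: %util} from the last report in `iostat -xd` output.

    Hypothetical helper: assumes each report starts with a header line
    beginning with "Device" that includes a "%util" column.
    """
    util = {}
    col = None
    for line in iostat_text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0].startswith("Device"):
            # New report: find the %util column and discard the previous
            # report, so that only the most recent interval is kept.
            col = fields.index("%util") if "%util" in fields else None
            util = {}
            continue
        if col is not None and len(fields) > col:
            try:
                # Some locales print a decimal comma instead of a point.
                util[fields[0]] = float(fields[col].replace(",", "."))
            except ValueError:
                pass  # not a device line, ignore it
    return util
```

With 5% iowait on 16 cores, iowait_core_equivalent(5, 16) comes out at about 0.8, i.e. close to one full core doing nothing but waiting on disk. For the second function, capture the output of something like iostat -xd 60 2 and pass the text in; the first report iostat prints covers the time since boot, which is why only the last one is kept. A device sitting near 100 in the result is saturated.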
Again, just a guess.