Discussion: PG_DUMP very slow because of STDOUT ??


PG_DUMP very slow because of STDOUT ??

From:
Andras Fabian
Date:

Hi,

 

I am new to the mailing list (as a writer … I have been reading for a long time), but have been using PostgreSQL in our company for quite some time now.

Lately, we rolled out some new servers with vastly improved hardware (8-disk RAID 1+0 instead of the old 2-disk RAID 1, etc.), a new OS (Ubuntu 10.04 Server instead of Debian Lenny) and a newer PostgreSQL version (8.4.4 instead of 8.3.8). And everything seemed to be much better – as expected – until, well, until I saw the first full backup run times.

For example:

- One of our old machines needed approximately 2 ½ hours to make a dump (with "-Fc --compress=9") of our approximately 100 GByte DB.

- One of our new machines with a copy of the same 100 GByte DB needed 6 ½ to 7 hours to dump it.

 

A first check of our MRTG monitoring showed that, during the backup, the machine was not saturated in any way by load (IO wait was below 5%), and the CPUs were almost idling, so you got the feeling that the machine was sleeping … But of course it did the backup, only very slowly.

 

Now I investigated further. I also did some detailed monitoring with collectl (http://collectl.sourceforge.net/), seeing that the machine often was doing some slow reading and every 20-30 seconds wrote a few MBytes to the backup disk. So almost everything suggests that we are very, very far from any hardware limitation. I also checked some file writes/reads on the RAID, and could confirm that a plain dd of a multi-GByte file only took a snap.

 

OK, then I checked what PG_DUMP was doing. In the process list of PostgreSQL (viewed via pgAdmin), I see how PG_DUMP works (I didn't care much about it until now). It does a COPY to STDOUT. Now I wanted to narrow down the problem, and tried to reproduce that COPY statement on one of our 8 GByte tables (that table only has INT, BIGINT and BOOLEAN fields! So no potential BLOB problem!). And now comes the interesting part.

- When I did “COPY public.bigtable (column1, column2, column3) TO '/var/tmp/test.dump';” it is FAST. It writes the dump file of 3 GBytes in about 2 ½ minutes (as expected with this hardware)!

- When I did “time sudo -u postgres /var/software/pgsql1/bin/psql -h /tmp/pgsql1 -p 4306 -d database_name -f copy-test.sql > /var/tmp/test.dump” (and copy-test.sql has: “COPY public.cachereleaseartist (releasebyterritoryid, artistid, isdisplayartist, artistroleclusterid, artistroletypeid, iscontainedinrelease, iscontainedinresource) TO STDOUT;”) … I couldn't wait until it ended … after 20 minutes the test.dump file was merely at 1 GByte (far from the target size of 3 GBytes).

I also monitored both statements via collectl, and could confirm that the direct COPY to file produced some reasonable IO activity, while the COPY via STDOUT was idling like mad, with some occasional bursts before falling asleep again. This would also make clear why PG_DUMP is so dog slow. If it gets its data via STDOUT at that speed, then it falls asleep too …

 

And to make things even worse: I did the same test on our old hardware too. And BOTH COPY to file directly AND COPY via STDOUT were equally fast (took 5 ½ minutes … but OK, that's old iron compared to the new one)! No difference between file and STDOUT (as expected)!

 

Now I ask, what's going on here? Why is COPY via STDOUT so much slower on our new machine?

- Has anything changed between PostgreSQL 8.3.8 and PostgreSQL 8.4.4 in this respect? Maybe some STDOUT buffer flushing from COPY? Buffer handling? Buffer sizes?

- Is maybe the OS to blame? As I said, we changed from Debian Lenny (kernel 2.6.26-2) to Ubuntu 10.04 Server (kernel 2.6.32-22), both 64-bit. And if yes, would there be ways for PostgreSQL to adapt to the new OS's behavior in STDOUT (if there is such a thing)?

 

And for the Hardware:

- The old machine was an 8-core with 16 GByte RAM, BUT only one RAID 1 (software!) with two disks for everything (I know, that was a little bit meager).

- The new machine has 8 cores (16 with hyper-threading), 24 GByte RAM, a RAID 1+0 with 8 disks for data only, and XLOG/backup/OS on a second RAID 1 with 2 disks, with a hardware RAID controller and battery-backed cache (so, obviously, this machine should be faster than the old one).

 

Can someone shed some light on this STDOUT madness? Or give me some directions/hints along which I could research further?

 

Thank you very much!

 

Andras Fabian

Atrada Trading Network AG

Re: PG_DUMP very slow because of STDOUT ??

From:
Tom Lane
Date:
Andras Fabian <Fabian@atrada.net> writes:
> Now I ask, what's going on here? Why is COPY via STDOUT so much slower on our new machine?

Something weird about the network stack on the new machine, maybe.
Have you compared the transfer speeds for Unix-socket and TCP connections?

On a Red Hat box I would try using oprofile to see where the bottleneck
is ... don't know if that's available for Ubuntu.

            regards, tom lane

Re: PG_DUMP very slow because of STDOUT ??

From:
Andras Fabian
Date:
> Something weird about the network stack on the new machine, maybe.
> Have you compared the transfer speeds for Unix-socket and TCP connections?

Hmm, no ... but how do Unix-socket / TCP connections relate to STDOUT here (sorry, maybe this is obvious to some, but not to me at the moment)?
And doing some profiling came to my mind too, but I don't have much experience in that field yet (at least not with basic OS-related stuff). I might check out oprofile (yes, it seems to exist for Ubuntu) next week (now it's time to leave for the weekend :-)
And maybe others will have some more ideas until then ... (or have experienced this or similar weirdness too).

Andras Fabian


Re: PG_DUMP very slow because of STDOUT ??

From:
Tom Lane
Date:
Andras Fabian <Fabian@atrada.net> writes:
>> Something weird about the network stack on the new machine, maybe.
>> Have you compared the transfer speeds for Unix-socket and TCP connections?

> Hmm, no ... but how do Unix-socket / TCP connections relate to STDOUT here (sorry, maybe this is obvious to some, but not to me at the moment)?

Well, COPY TO STDOUT really means "send the data across the network
connection to the client".  It sounds to me like your problem is slow
data transfer between the database backend and pg_dump (or psql).
Not sure what could be causing it, unless perhaps you've got a kernel
packet filter that's performing an unreasonable amount of processing
per packet.

            regards, tom lane

Re: PG_DUMP very slow because of STDOUT ??

From:
Craig Ringer
Date:
On 09/07/10 22:08, Tom Lane wrote:
> Andras Fabian <Fabian@atrada.net> writes:
>>> Something weird about the network stack on the new machine, maybe.
>>> Have you compared the transfer speeds for Unix-socket and TCP connections?
>
>> Hmm, no ... but how do Unix-socket / TCP connections relate to STDOUT here (sorry, maybe this is obvious to some, but not to me at the moment)?
>
> Well, COPY TO STDOUT really means "send the data across the network
> connection to the client".  It sounds to me like your problem is slow
> data transfer between the database backend and pg_dump (or psql).
> Not sure what could be causing it, unless perhaps you've got a kernel
> packet filter that's performing an unreasonable amount of processing
> per packet.

... and that should be showing up as high "system" CPU time, or so it seems
on my machines with lots of TCP/IP traffic and heavy netfilter rules.
If they're using Unix sockets, packet filtering shouldn't come into play
anyway.

Andras: Are you using Unix sockets or TCP/IP to connect to PostgreSQL?
If you specify any host to connect to, even "localhost", you're using
TCP/IP. If you don't specify a host, you're using Unix sockets.

Whichever you are doing, try the other one and see if performance is
different.

Idea: One notable difference between COPY to file and COPY TO STDOUT
to a receiving process over the network is that the latter involves the
interaction of two processes. When a kernel change is involved and
you're seeing low performance with lots of apparent idleness, that
immediately makes me think "process scheduler".

Of course, that might be completely off-track, but I've seen weird
performance issues caused by scheduler changes before.

--
Craig Ringer

Re: PG_DUMP very slow because of STDOUT ??

From:
Andras Fabian
Date:
Hi guys,

well, definitely thanks for these first ideas. They at least point in a direction which I hadn't considered before (to be honest, I didn't even think about associating STDOUT with some socket stuff).
With my experiments, I think I have essentially tried both socket types. Once I connected normally via pgAdmin to my server (this would definitely qualify as TCP/IP, I think :-), and then I also ran it locally on the machine via "psql" (see the example in my original post). So, in that case, it must have been Unix sockets. And both cases behaved similarly when issuing the COPY TO STDOUT (i.e.: sloooowly).
And as for the difference between COPY TO FILE and COPY TO STDOUT ... when I did the same experiment on the old machine/kernel, there was almost no difference in execution time (almost identical, and fast as expected) - that test was done from "psql"/command line ... so Unix sockets.
So, of course, there might be something with our new kernel. But then many others could have this "problem" too! Because we are using the stock kernel of the latest Ubuntu 10.04 LTS 64-bit server (nothing self-compiled, just as it comes from Canonical after you download it). Maybe someone with an Ubuntu 10.04 (server!) machine and PostgreSQL on it could try to reproduce it too.
By the way, how could I experiment with the scheduler? I remember that there were some magic tricks via the /proc file system or something like that to "tweak" it on the fly ...

Andras Fabian


Re: PG_DUMP very slow because of STDOUT ??

From:
Andras Fabian
Date:
Hi Tom (or others),

are there some recommended settings/ways to use oprofile in a situation like this? I got it working and have seen a first profile report, but then managed to completely freeze the server on a second try with different oprofile settings (the next tests will go against the newly installed - next and identical - new servers).

Andras Fabian


Re: PG_DUMP very slow because of STDOUT ??

From:
Andras Fabian
Date:
This STDOUT issue gets even weirder. Now I have set up our two new servers (identical hw/sw), as I would have needed to do anyway. After getting PG running, I also set up the same test scenario as on our problematic servers, and started the COPY-to-STDOUT experiment. And you know what? Both new servers are performing well. No hanging, and the 3 GByte test dump was written in around 3 minutes (as expected). To make things even more complicated ... I went back to our production servers. Now the first one - which I froze up with oprofile this morning and which needed a REBOOT - is performing well too! It needed 3 minutes for the test case ... WTF? BUT the second production server, which did not have a reboot, is still behaving badly.
Now I tried to dig deeper (without killing a production server again) ... and came to comparing the outputs of ps (first with the '-fax' parameter, then '-axl'). Now I have found something interesting:
- all fast servers show the COPY process as being in the state Rs ("runnable (on run queue)")
- on the still slow server, this process is in 9 out of 10 samples in Ds ("uninterruptible sleep (usually IO)")

Now, this "Ds" state seems to be something unhealthy - especially if it is there almost all the time - as far as my
firstreeds on google show (and although it points to IO, there is seemingly only very little IO, and IO-wait is minimal
too).I have also done "-axl" with PS, which brings the following line for our process: 
F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
1  5551  2819  4201  20   0 5941068 201192 conges Ds ?          2:05 postgres: postgres musicload_cache [local] COPY"

Now, as far as I understood from my Google searches, the column WCHAN shows where in the kernel my process is hanging. Here it says "conges". Can somebody tell me what "conges" means? Or do I have other options to get more info out of the system (maybe without oprofile - as it already burned my hand :-)
And yes, now I see a reboot as a possible "fix", but that would not assure me that the problem will not resurface. So, for the time being, I will leave my current second production server as is ... so I can further narrow down the potential reasons for this strange STDOUT slowdown (especially if someone has a tip for me :-)

Andras Fabian

(In the meantime my "slow" server finished the COPY ... it took 46 minutes instead of the 3 minutes on the fast machines ... a slowdown of factor 15.)





Re: PG_DUMP very slow because of STDOUT ??

From:
Greg Smith
Date:
Andras Fabian wrote:
> - all fast servers show the COPY process as being in the state Rs ("runnable (on run queue)")
> - on the still slow server, this process is in 9 out of 10 samples in Ds ("uninterruptible sleep (usually IO)")
>

I've run into significant regressions in PostgreSQL performance due to
issues with the Linux scheduler before, specifically when running a
single really intensive client program.  You might be seeing something
similar here.  I wrote a reference-link-heavy blog entry about that at
http://notemagnet.blogspot.com/2008/05/pgbench-suffering-with-linux-2623-2626.html
which you might find useful; one of the batch scheduler tweaks alluded to
there might improve things.  Regressions in newer kernels are the
norm rather than the exception, and given the general lack of quality
control in Ubuntu 10.04 I have avoided any performance testing of it
yet.  I was going to give it six months after release before I even
thought about that, in hopes more bugs get squashed, but I'm not
optimistic about this distribution for server use at all right now.

There's more information about using oprofile at
http://wiki.postgresql.org/wiki/Profiling_with_OProfile that might help
you dig into the underlying spot it's stuck at.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: PG_DUMP very slow because of STDOUT ??

From:
Scott Marlowe
Date:
On Mon, Jul 12, 2010 at 7:03 AM, Andras Fabian <Fabian@atrada.net> wrote:
> This STDOUT issue gets even weirder. Now I have set up our two new servers (identical hw/sw), as I would have needed to do anyway. After getting PG running, I also set up the same test scenario as on our problematic servers, and started the COPY-to-STDOUT experiment. And you know what? Both new servers are performing well. No hanging, and the 3 GByte test dump was written in around 3 minutes (as expected). To make things even more complicated ... I went back to our production servers. Now the first one - which I froze up with oprofile this morning and which needed a REBOOT - is performing well too! It needed 3 minutes for the test case ... WTF? BUT the second production server, which did not have a reboot, is still behaving badly.

I'm gonna take a scientific wild-assed guess that your machine was
rebuilding RAID arrays when you started out, and you had massive IO
contention underneath the OS level, resulting in such a slowdown.
Note that you mentioned ~5% IO wait.  That's actually fairly high if
you've got 8 to 16 cores or something like that.  It's much better to
use iostat -xd 60 or something like that and look at the IO utilization
at the end of the lines.

Again, just a guess.
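Scott's iostat check can be turned into a small helper; a minimal sketch (the `extract_util` name and the `sd*` device pattern are my assumptions, adjust to your device names):

```shell
#!/bin/sh
# Pull the device name and the %util figure (last field) out of
# `iostat -xd` extended-statistics lines for sd* devices.
extract_util() {
    awk '$1 ~ /^sd/ { printf "%s util=%s%%\n", $1, $NF }'
}

# Live use: one extended report every 60 seconds, watch for devices
# whose %util sits near 100% (truly saturated) vs. near 0% (idling):
#   iostat -xd 60 | extract_util

# Canned single report line, just to show the shape of the output:
echo "sdb 0.00 12.40 0.60 3.80 150.40 129.60 63.64 0.02 4.55 1.20 0.53" \
    | extract_util
# → sdb util=0.53%
```

Unlike the aggregate IO-wait percentage from MRTG, the per-device %util makes it obvious whether a specific array is the bottleneck.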

Re: PG_DUMP very slow because of STDOUT ??

From:
Craig Ringer
Date:
On 12/07/10 21:03, Andras Fabian wrote:
> This STDOUT issue gets even weirder. Now I have set up our two new servers (identical hw/sw), as I would have needed to do anyway. After getting PG running, I also set up the same test scenario as on our problematic servers, and started the COPY-to-STDOUT experiment. And you know what? Both new servers are performing well. No hanging, and the 3 GByte test dump was written in around 3 minutes (as expected). To make things even more complicated ... I went back to our production servers. Now the first one - which I froze up with oprofile this morning and which needed a REBOOT - is performing well too! It needed 3 minutes for the test case ... WTF? BUT the second production server, which did not have a reboot, is still behaving badly.
> Now I tried to dig deeper (without killing a production server again) ... and came to comparing the outputs of ps (first with the '-fax' parameter, then '-axl'). Now I have found something interesting:
> - all fast servers show the COPY process as being in the state Rs ("runnable (on run queue)")
> - on the still slow server, this process is in 9 out of 10 samples in Ds ("uninterruptible sleep (usually IO)")
>
> Now, this "Ds" state seems to be something unhealthy - especially if it is there almost all the time - as far as my first reads on Google show (and although it points to IO, there is seemingly only very little IO, and IO wait is minimal too). I have also done "-axl" with ps, which brings up the following line for our process:
> F   UID   PID  PPID PRI  NI    VSZ   RSS WCHAN  STAT TTY        TIME COMMAND
> 1  5551  2819  4201  20   0 5941068 201192 conges Ds ?          2:05 postgres: postgres musicload_cache [local] COPY


Your wchan column isn't wide enough to show the full function name, but
I'd say it's related to some form of throttling or congestion control.

Get a wider view of that column to find out what the full function name
is. Grepping the kernel source for it can then tell you a lot about
where in the kernel it is and what might be going on.

Try:

    ps ax -O wchan:40

to get a decently wide view of that col.
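That one-shot ps can also be repeated to quantify how much of its time the backend actually spends in that wait channel; a small sketch (the `sample_wchan` helper is hypothetical; PID 2819 is the COPY backend from the earlier ps listing, substitute your own):

```shell
#!/bin/sh
# Sample a process's state (R/S/D) and full wait-channel name once a
# second for n samples, then count how often each combination was seen.
sample_wchan() {
    pid=$1
    n=$2
    i=0
    while [ "$i" -lt "$n" ]; do
        ps -o state=,wchan:40= -p "$pid"
        sleep 1
        i=$((i + 1))
    done | sort | uniq -c | sort -rn
}

# e.g.:  sample_wchan 2819 60
# A healthy COPY backend should show mostly "R" samples; a throttled one
# would show mostly "D congestion_wait" lines at the top of the tally.
```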

--
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/

Re: PG_DUMP very slow because of STDOUT ??

From:
Andras Fabian
Date:
Hi Scott,

Although I can't guarantee 100% that there was no RAID rebuild at some point, I am almost sure that it wasn't the case. Two machines - the ones which were already in production - exhibited this problem. Both of them had already been up for some weeks. And the reboot rather "fixed" one of them; instead of making it worse (as your theory would have it), the problem "disappeared" (but I don't know for how long). Now only one of the production machines has the issue ... the one which wasn't rebooted. Strange, strange. Nevertheless, thank you for your idea ... this is exactly how I try to approach the problem, by making some theories and trying to prove or disprove them :-)
Now I will try to investigate further along the tips from Craig and Greg.

Andras Fabian


Re: PG_DUMP very slow because of STDOUT ??

From:
Scott Marlowe
Date:
On Tue, Jul 13, 2010 at 12:31 AM, Andras Fabian <Fabian@atrada.net> wrote:
> Hi Scott,
>
> Although I can't guarantee 100% that there was no RAID rebuild at some point, I am almost sure that it wasn't the case. Two machines - the ones which were already in production - exhibited this problem. Both of them had already been up for some weeks. And the reboot rather "fixed" one of them; instead of making it worse (as your theory would have it), the problem "disappeared" (but I don't know for how long). Now only one of the production machines has the issue ... the one which wasn't rebooted. Strange, strange. Nevertheless, thank you for your idea ... this is exactly how I try to approach the problem, by making some theories and trying to prove or disprove them :-)
> Now I will try to investigate further along the tips from Craig and Greg.

Was there maybe a kernel update that hadn't been applied by reboot?

Re: PG_DUMP very slow because of STDOUT ??

From:
Andras Fabian
Date:
Hi Scott,

No, we didn't have a kernel update (it is still the stock Ubuntu 10.04 Server kernel ... 2.6.32.2). And in the meantime - this morning - I have discovered that the rebooted server is slowing down again! It is not at the level of the not-rebooted server (about 45 minutes for the 3 GByte file) ... it "only" needs 22 minutes, but that is already quite a bit away from the optimum of 3 minutes (or less). So, definitely, something is "deteriorating" in the system ...
And I also did some readings with iostat -xd 5 ... The target drive to which the output of the STDOUT is directed is below 1% utilization (mostly around 0.2 - 0.4%), with rare peaks around 2-3% when it does a little bit more. And this is maybe one of the interesting observations: it seems to periodically "flush" a bit more out, just to fall asleep again (with minimal write activity). The drive from which the reads come (the one where PG's data files are ... it is the 8-disk RAID 10) has a little bit more activity (utilization 6-8%), but this data is also - concurrently - in use by some apps reading from the DB (just normal traffic on the DB).

Andras Fabian


Re: PG_DUMP very slow because of STDOUT ??

From:
Andras Fabian
Date:
Craig, thanks for that ps tip (you think you have used ps for such a long time, but it still has some new tricks available).

And here is the more readable line:

26390 congestion_wait                          D ?        00:00:26 postgres: postgres musicload_cache [local] COPY

So the kernel function it is always idling in seems to be congestion_wait ... I am already looking around on Google and in the kernel source (for 2.6.32.2):
http://lxr.linux.no/#linux+v2.6.32.2/mm/backing-dev.c#L762
/**
* congestion_wait - wait for a backing_dev to become uncongested
* @sync: SYNC or ASYNC IO
* @timeout: timeout in jiffies
*
* Waits for up to @timeout jiffies for a backing_dev (any backing_dev) to exit
* write congestion.  If no backing_devs are congested then just wait for the
* next write to be completed.
*/
So, obviously, for some reason we are waiting too much for a backing_dev ... whichever it is at the moment. Because, as I just wrote to Scott Marlowe, the disk system is almost idling (I have seen disk utilization on the drive to which I write below 1%).

So, the question seems to be why and where this idling happens.
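For what it's worth, congestion_wait sits in the kernel's writeback-throttling path, so one low-risk thing to check alongside the ps sampling is the writeback tuning currently in effect; a sketch (these are standard /proc/sys/vm knobs on 2.6.32 - that they are implicated in this particular hang is only a guess):

```shell
#!/bin/sh
# Dump the dirty-page writeback knobs. dirty_ratio is the point (as a
# percentage of RAM) at which a writing process is itself forced to
# wait for writeback; dirty_background_ratio is where background
# flushing starts.
for f in dirty_ratio dirty_background_ratio \
         dirty_expire_centisecs dirty_writeback_centisecs; do
    printf '%-26s %s\n' "$f" "$(cat /proc/sys/vm/$f 2>/dev/null || echo n/a)"
done
```

Comparing these values between the fast and the slow server (and watching them relative to actual dirty memory in /proc/meminfo) would at least tell you whether the throttling thresholds differ.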

Just as a test, I have tried a very simple piping example (which should go through STDOUT too ... or am I wrong?).
- "dd if=/dev/zero of=file_10GB bs=1024 count=10M" created a 10 GB test file on the source drive (sdb)
- "time cat file_10GB > /var/tmp/test2.dump" ... pipe the file to the target drive (/var/tmp is on sda)
This takes only a bit over 2 minutes to complete ... and although one sees congestion_wait in ps from time to time, it is not dominant, and there are other EXT4- or IO-related functions shown ... But maybe this mini-experiment is lacking some important steps that are in the path of a COPY-to-STDOUT from PostgreSQL.

Andras Fabian




Re: PG_DUMP very slow because of STDOUT ??

From
Craig Ringer
Date:
On 13/07/2010 4:05 PM, Andras Fabian wrote:
> Craig, thanks for that PS tip (you think you have used PS for such a long time, but it still has some new tricks available).

> So, obviously, for some reason we are waiting too much for a backing device ... whichever it is at the moment. Because, as I just wrote to Scott Marlowe, the disk system is almost idling (I have seen disk utilization on the drive to which I write below 1%).

A quick search suggests that most calls into congestion_wait are in the
memory management subsystem, and are involved in situations where the
system is struggling for memory. However, that includes memory consumed
by I/O buffers, writeback for I/O, etc, so it'd also be consistent with
very slow I/O causing write throttling as the system tried to write
already buffered data to disk.

Most other calls are in file system drivers.

At this point I'd be taking a closer look at "vmstat 1" and "iostat 1"
output, plus "top", to see if any interesting clues about system-wide
issues turned up.

I'd also be trying to perform each step of the problem operation in
isolation as much as possible, so as to see if I could find out what
particular part was causing the slowdown. Comparing "\copy" to "COPY ...
TO STDOUT", invoking "COPY ... TO STDOUT" with a standalone backend
writing output to an on disk file and to /dev/null, etc.
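A hedged sketch of that isolation test ("mydb" and "mytable" are placeholders for the real database and table):

```shell
# Same COPY, three different sinks; timing each run separates backend
# read speed from socket transfer and from client-side file writing.
# "mydb" and "mytable" are placeholders for the real names.
time psql -d mydb -c "\copy mytable to '/var/tmp/via_client.dump'"
time psql -d mydb -c "COPY mytable TO STDOUT" > /var/tmp/via_stdout.dump
time psql -d mydb -c "COPY mytable TO STDOUT" > /dev/null
```

If the /dev/null run is fast but the file runs are slow, the bottleneck is on the write side rather than in the backend or the socket.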

> So, the question seems to be, why and where this "idling happens".

You can potentially find out more by getting a trace of the kernel
function call stack for the backend process. The kernel call stack of a
process at any given time can be obtained by reading /proc/$pid/stack .

This will tell you not only what call it's waiting in in the kernel, but
what function(s) called it, and even the code offset within each function.
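For example (PID 2819 is the backend PID from the earlier ps output; /proc/&lt;pid&gt;/wchan is world-readable, while the full /proc/&lt;pid&gt;/stack usually needs root):

```shell
# Show where a process is blocked in the kernel: first the bare wchan
# symbol, then (as root) the full call chain with per-function offsets.
pid=2819   # hypothetical: substitute the COPY backend's PID
cat "/proc/$pid/wchan"; echo
sudo cat "/proc/$pid/stack" 2>/dev/null || echo "(need root for the full stack)"
```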

> Just as a test, I have tried a very simple piping example (which should go through STDOUT too ... or am I wrong).
> - "dd if=/dev/zero of=file_10GB bs=1024 count=10M" created a 10 GB test file on source drive (sdb)
> - "time cat file_10GB > /var/tmp/test2.dump" ... pipe the file to target drive (/var/tmp is on sda)

Isn't your issue suspected to be with network transfers over unix
sockets and/or tcp/ip, rather than with pipes?

Try "socat" if you want to test unix socket performance and/or tcp/ip
socket performance. It's an amazing sysadmin/network swiss army knife.
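A rough socat throughput check over a unix socket might look like this (a sketch: 1 GB of zeroes through a socket under /tmp, both ends local):

```shell
# One socat serves 1 GB of zeroes over a unix socket; a second socat
# drains it to /dev/null. The elapsed time of the reader approximates
# raw unix-socket throughput on this box.
SOCK=/tmp/throughput.sock
socat -u "EXEC:dd if=/dev/zero bs=8192 count=131072" "UNIX-LISTEN:$SOCK" &
sleep 1
time socat -u "UNIX-CONNECT:$SOCK" - > /dev/null
wait
```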

--
Craig Ringer

Re: PG_DUMP very slow because of STDOUT ??

From
Craig Ringer
Date:
On 13/07/2010 3:53 PM, Andras Fabian wrote:
> Hi Scott,
>
> No, we didn't have a kernel update (it is still the stock Ubuntu 10.04 Server kernel ... 2.6.32.2). And in the meantime - this morning - I have discovered that the rebooted server is again slowing down! It is not at the level of the not-rebooted server (about 45 mins for the 3 Gig file) ... it "only" needs 22 minutes, but it is already quite a bit away from the optimum of 3 minutes (or less). So, definitely, something is "deteriorating" in the system ...
>
> And I also did some readings with iostat -xd 5 ... And the target drive to which the output of the STDOUT is directed is below 1% utilization (mostly around 0.2 - 0.4%) with rare peaks around 2-3% when it does a little bit more. And this is maybe one of the interesting observations. It seems to periodically "flush" a bit more out, just to fall asleep again (with minimum write activity). The drive from which the reads come (the one where PG's data files are ... it is the 8-disk RAID 10) has a little bit more activity (utilization 6-8%), but this data is also - concurrently - in use by some apps reading from the DB (just normal traffic on the DB).

I don't think it'll be particularly helpful in this case, but you never
know, so: another information-collecting tool is "blktrace". This can
let you observe in detail exactly what's being done on a given block
device. It's helpful when you have weird I/O patterns and can't figure
out why, as it'll often reveal some process continually poking away at
some file it doesn't need to, thrashing away on a disk-backed mmap()ed
tempfile, or the like.

--
Craig Ringer

Re: PG_DUMP very slow because of STDOUT ??

From
Andras Fabian
Date:
Hi Craig,

Yes, a first look at /proc/pid/stack shows something that smells like memory management ... at least up to the point where congestion_wait is called.
------------------------------------------------------------------
[<ffffffff8110d750>] congestion_wait+0x70/0x90
[<ffffffff81101cb7>] shrink_inactive_list+0x667/0x7e0
[<ffffffff81101ec1>] shrink_list+0x91/0xf0
[<ffffffff811020b7>] shrink_zone+0x197/0x240
[<ffffffff81102886>] __zone_reclaim+0x146/0x260
[<ffffffff81102ab7>] zone_reclaim+0x117/0x150
[<ffffffff810f8fd4>] get_page_from_freelist+0x544/0x6c0
[<ffffffff810f98c9>] __alloc_pages_nodemask+0xd9/0x180
[<ffffffff81131822>] kmalloc_large_node+0x62/0xb0
[<ffffffff81135df9>] __kmalloc_node_track_caller+0x109/0x160
[<ffffffff814665f0>] __alloc_skb+0x80/0x190
[<ffffffff81462504>] sock_alloc_send_pskb+0x1c4/0x320
[<ffffffff81462675>] sock_alloc_send_skb+0x15/0x20
[<ffffffff814f54d5>] unix_stream_sendmsg+0x275/0x3e0
[<ffffffff8145f5ab>] sock_sendmsg+0x10b/0x140
[<ffffffff8145f765>] sys_sendto+0x125/0x180
[<ffffffff810131b2>] system_call_fastpath+0x16/0x1b
[<ffffffffffffffff>] 0xffffffffffffffff
------------------------------------------------------------------

I also looked at "whats happening" with strace, where you see traffic like:
------------------------------------------------------------------
....
[    7fc153d925b2] sendto(8, "\tt\tt\nd\0\0\0\0378241003\t974369\tt\t\\N\t10"..., 8192, 0, NULL, 0) = 8192
[    7fc153d83460] read(3, "\0\0\0\0\0\0\0\0\0\0\4\0\260\1\260\t\0 \4 \0\0\0\0\310\237d\0\220\237d\0"..., 8192) = 8192
[    7fc153d83460] read(3, "\0\0\0\0\0\0\0\0\0\0\4\0\260\1\260\t\0 \4 \0\0\0\0\310\237d\0\220\237d\0"..., 8192) = 8192
[    7fc153d925b2] sendto(8, "0\tt\tt\nd\0\0\0\0378241003\t830278\tt\t\\N\t1"..., 8192, 0, NULL, 0) = 8192
[    7fc153d83460] read(3, "\0\0\0\0\0\0\0\0\0\0\4\0\260\1\260\t\0 \4 \0\0\0\0\310\237d\0\220\237d\0"..., 8192) = 8192
[    7fc153d83460] read(3, "\0\0\0\0\0\0\0\0\0\0\4\0\260\1\260\t\0 \4 \0\0\0\0\310\237d\0\220\237d\0"..., 8192) = 8192
[    7fc153d925b2] sendto(8, "\tt\tt\nd\0\0\0 8241006\t1114684\tt\t\\N\t1"..., 8192, 0, NULL, 0) = 8192
[    7fc153d83460] read(3, "\0\0\0\0\0\0\0\0\0\0\4\0\260\1\260\t\0 \4 \0\0\0\0\310\237d\0\220\237d\0"..., 8192) = 8192
[    7fc153d83460] read(3, "\0\0\0\0\0\0\0\0\0\0\4\0\260\1\260\t\0 \4 \0\0\0\0\310\237d\0\220\237d\0"..., 8192) = 8192
[    7fc153d83460] read(3, "\0\0\0\0\0\0\0\0\0\0\4\0\260\1\260\t\0 \4 \0\0\0\0\310\237d\0\220\237d\0"..., 8192) = 8192
....
------------------------------------------------------------------

But still no definitive clue about the reasons. What is also quite interesting is that when I start my COPY-to-STDOUT experiment, it is running quite fast in the beginning. Sometimes up to 400 MBytes, sometimes up to 1.4 GBytes (I didn't find a real reason that could be correlated to this) ... and then, suddenly, it begins to stall. From there on, it only advances slowly with all the congestion_wait going on ... Hmm, maybe it really has something to do with the state of the memory ... (this would go well with the fact that a freshly rebooted server is not having the problem in the beginning).

Ahh, well, and what would/could go against the file system / drive argument: in my initial tests, where I did a COPY-to-FILE, I never had problems (only COPY-to-STDOUT gives them).

Well, I will try to gather some more information with your other hints (quite a lot of stuff - and one learns a lot of new tricks in the process :-) ...

Andras Fabian




Re: PG_DUMP very slow because of STDOUT ??

From
Craig Ringer
Date:
On 13/07/10 17:18, Andras Fabian wrote:

> But still no definitive clue about the reasons. What is also quite interesting is that when I start my COPY-to-STDOUT experiment, it is running quite fast in the beginning. Sometimes up to 400 MBytes, sometimes up to 1.4 GBytes (I didn't find a real reason that could be correlated to this) ... and then, suddenly, it begins to stall. From there on, it only advances slowly with all the congestion_wait going on ... Hmm, maybe it really has something to do with the state of the memory ... (this would go well with the fact that a freshly rebooted server is not having the problem in the beginning).

Is there any chance something is buffering the output of the COPY ... TO
STDOUT? Whether it's a pipe/socket buffer that grows to a certain limit,
then begins to be limited by the ability of the receiver to empty it, or
whether it's the receiver buffering the whole COPY in memory before
writing it out, I don't know.

Do you see anything growing in memory use on the system while the COPY
is in progress? Any of the memory use stats in "free -m" changing?

--
Craig Ringer
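One way to script the check Craig suggests: sample memory numbers from /proc/meminfo while the COPY runs (a sketch; the 2-second interval and the five-sample bound are arbitrary):

```shell
# Log free memory and page-cache size every 2 seconds; lining this up
# against the dump file's growth shows whether the stall coincides
# with free memory running out.
for i in 1 2 3 4 5; do
    awk '/^(MemFree|Cached):/ { printf "%s %d MB  ", $1, $2/1024 }' /proc/meminfo
    date '+%H:%M:%S'
    sleep 2
done
```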

Re: PG_DUMP very slow because of STDOUT ??

From
Andras Fabian
Date:
I had another observation before your last mail:

I have compared the /proc/pid/sched stats of a normal and a slow machine. And there were two counters that really stuck out:
- se.sleep_start
- se.block_start
On a normal machine, both counters remained at 0 all the time while doing COPY-to-STDOUT. On a "slow" machine, those counters - but always only one of them - got some really big numbers while the process was stuck in congestion_wait.

Andras Fabian


Re: PG_DUMP very slow because of STDOUT ??

From
Andras Fabian
Date:
Wait, now, here I see some correlation! Yes, it seems to be the memory! When I start my COPY-to-STDOUT experiment I had some 2000 MByte free (well, the server has 24 GByte ... maybe other PostgreSQL processes used up the rest). Then, I could monitor via "ll -h" how the file nicely grew (obviously no congestion), and in parallel see, via "free -m", how the "free" memory went down. Then, it reached a level below 192 MByte, and congestion began. Now it is going back and forth around 118-122-130 ... Obviously the STDOUT thing ran out of some memory resources.
Now I "only" need to find out what is running out, why, and how I can prevent that. Could there be some extremely big STDOUT buffering in play ????

I will need to do some more investigation. And thanks for your very good pointers which help me to narrow the focus down!

Andras Fabian


Re: PG_DUMP very slow because of STDOUT ??

From
Craig Ringer
Date:
On 13/07/2010 6:26 PM, Andras Fabian wrote:
> Wait, now, here I see some correlation! Yes, it seems to be the memory! When I start my COPY-to-STDOUT experiment I had some 2000 MByte free (well, the server has 24 GByte ... maybe other PostgreSQL processes used up the rest). Then, I could monitor via "ll -h" how the file nicely grew (obviously no congestion), and in parallel see, via "free -m", how the "free" memory went down. Then, it reached a level below 192 MByte, and congestion began. Now it is going back and forth around 118-122-130 ... Obviously the STDOUT thing ran out of some memory resources.
> Now I "only" need to find out what is running out, why, and how I can prevent that.

> Could there be some extremely big STDOUT buffering in play ????

Remember, "STDOUT" is misleading. The data is sent down the network
socket between the postgres backend and the client connected to that
backend. There is no actual stdio involved at all.

Imagine that the backend's stdout is redirected down the network socket
to the client, so when it sends to "stdout" it's just going to the
client. Any buffering you are interested in is in the unix or tcp/ip
socket (depending on how you're connecting), in the client, and in the
client's output to file/disk/whatever.

--
Craig Ringer

Re: PG_DUMP very slow because of STDOUT ??

From
Andras Fabian
Date:
OK, so here I should - maybe - look around the sockets. Hmm. Well, in the case of my experiments we are talking about Unix sockets, as I am only connecting locally to the server (no real networking involved). Are there any ideas where such a Unix socket could impose such extreme buffering ??? And can/could I control its behavior at all?? (Or would it be a thing which can only be controlled from C code ... which would fall back to PostgreSQL as the initiator.)

Andras Fabian
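For what it's worth, the kernel-side buffering of a unix stream socket is bounded by the net.core memory sysctls (whether they matter in this particular case is an open question); the current limits are readable without root:

```shell
# Default and maximum per-socket buffer sizes, in bytes. A unix stream
# socket's in-flight data is capped by these, so they bound how much
# "buffering in the socket" itself is even possible.
for f in wmem_default wmem_max rmem_default rmem_max; do
    printf '%-13s %s\n' "$f" "$(cat /proc/sys/net/core/$f)"
done
```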


Re: PG_DUMP very slow because of STDOUT ??

From
Andras Fabian
Date:
Now, I have found an unorthodox way to make a slow machine run COPY-to-STDOUT fast. I empty the cache memory of the server, which makes "free" in "free -m" jump up to 14 GBytes (well, I just noticed that most of the memory on the server is in "cache" ... up to 22 GBytes). I just entered:

    sync; echo 3 > /proc/sys/vm/drop_caches

Running the COPY-to-STDOUT test after this immediately went through in a snap (2 1/2 minutes). I also see that something related to the file is "nicely" mapped into cache memory, because as soon as I delete the file (with "rm"), that immediately frees up 3 GBytes of the cache.

This seems to prove that a memory issue is/was behind the slowdown. But still the question remains: why and how can this happen? I mean, at some point the memory manager must have taken a very wrong decision, if this is the result of its "normal" work ... And how the writing through the socket affects this, I don't understand (because I still see the case where a normal COPY-to-FILE didn't slow down at the same time when COPY-to-STDOUT was crawling).

So, potentially, maybe ... as a quick fix I could clean the caches in my backup script that starts each night. But is this a safe idea at all? Or could there be adverse implications (yes, of course, some queries that got their data from the file cache before would now need to repopulate it)?

Or is there a way to influence the memory manager of Linux so that it behaves a bit more conservatively (or just differently, in a positive way)?

Andras Fabian



Re: PG_DUMP very slow because of STDOUT ??

From
Andras Fabian
Date:
I have just rechecked one of our old generation machines, which never had/have this problem (where the backup of a 100 GB database - to a 10 GByte dump - is still going through in about 2 hours). They seem to have this high caching ratio too (one of the machines says it has 15 GByte in cache out of 16) ... but still, they manage to gracefully give back RAM to whoever needs it (i.e. the backup process with its COPY-to-STDOUT, or others). Well, the difference is of course 6 kernel versions (old machine 2.6.26 ... new machine 2.6.32) ... or who knows what kernel settings ...

Andras Fabian



Re: PG_DUMP very slow because of STDOUT ??

From
Steve Clark
Date:
On 07/13/2010 07:29 AM, Andras Fabian wrote:
> Now, I have found an unorthodox way to make a slow machine run COPY-to-STDOUT fast. I empty the cache memory of the server, which makes "free" in "free -m" jump up to 14 GBytes (well, I just noticed that most of the memory on the server is in "cache" ... up to 22 GBytes). I just entered:
>
>     sync; echo 3 > /proc/sys/vm/drop_caches
>
> Running the COPY-to-STDOUT test after this immediately went through in a snap (2 1/2 minutes). I also see that something related to the file is "nicely" mapped into cache memory, because as soon as I delete the file (with "rm"), that immediately frees up 3 GBytes of the cache.
>
> This seems to prove that a memory issue is/was behind the slowdown. But still the question remains: why and how can this happen? I mean, at some point the memory manager must have taken a very wrong decision, if this is the result of its "normal" work ... And how the writing through the socket affects this, I don't understand (because I still see the case where a normal COPY-to-FILE didn't slow down at the same time when COPY-to-STDOUT was crawling).
>
> So, potentially, maybe ... as a quick fix I could clean the caches in my backup script that starts each night. But is this a safe idea at all? Or could there be adverse implications (yes, of course, some queries that got their data from the file cache before would now need to repopulate it)?
>
> Or is there a way to influence the memory manager of Linux so that it behaves a bit more conservatively (or just differently, in a positive way)?
>
> Andras Fabian
Have you posted this problem/issue to the linux kernel mailing list? You may get some rude responses about
the posting, but someone will give you a tip about what is causing the problem.

I know there are some sysctls that affect memory management. I use the following two, based on a recommendation
of Linus, to help interactivity for a desktop user.
/sbin/sysctl vm.dirty_background_ratio=3
/sbin/sysctl vm.dirty_ratio=5
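If the settings help, they would normally be made persistent in /etc/sysctl.conf rather than re-run by hand (a sketch; the values simply mirror the ones above):

```shell
# Lines for /etc/sysctl.conf: start background writeback once 3% of
# memory is dirty, and throttle writers directly at 5%. Apply with
# "sysctl -p" after editing.
vm.dirty_background_ratio = 3
vm.dirty_ratio = 5
```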




--
Stephen Clark
NetWolves
Sr. Software Engineer III
Phone: 813-579-3200
Fax: 813-882-0209
Email: steve.clark@netwolves.com
www.netwolves.com


Re: PG_DUMP very slow because of STDOUT ??

From
Andras Fabian
Date:
Hmm, I continued some testing. Now, strangely, the congestion_wait occurs even if free -m shows me about 1500 MBytes (before that I tried to "fill up" the cache by doing some plain "cat from_here > to_there" ... which pushed free down to 1500). But, interestingly, the value also doesn't get lower. So, it stays around 1500 ... seems like the same scenario as before, when it stopped around 120-130 ... So, obviously still some struggling for memory. But not as hard as when it is completely down. Now the file grows quite steadily ... and congestion_wait is not always present in PS ... (finished after 6 1/2 minutes instead of the optimum around 2 1/2).

Something very fishy about the memory manager ....

Andras Fabian

-----Ursprüngliche Nachricht-----
Von: Andras Fabian
Gesendet: Dienstag, 13. Juli 2010 13:35
An: 'Craig Ringer'
Cc: pgsql-general@postgresql.org
Betreff: AW: AW: AW: AW: [GENERAL] PG_DUMP very slow because of STDOUT ??

I have just rechecked one of our old generation machines, which never had/have this problem (where the backup of a 100
GBdatabase - to a 10 GByte dump - is still going trough in about 2 hours). They seem to have this high caching ratio
too(one of the machine says it has 15 GByte in cache out of 16) ... but still, they manage to gracefully give back RAM
towhoever needs it (i.e. the backup process with its COPY-to-STDOU or others). Well, the difference is of course 6
Kernelversions (old machine 2.6.26 ... new machin 2.6.32) ... or who knows what kernel settings ...  

Andras Fabian


-----Ursprüngliche Nachricht-----
Von: Craig Ringer [mailto:craig@postnewspapers.com.au]
Gesendet: Dienstag, 13. Juli 2010 12:51
An: Andras Fabian
Cc: pgsql-general@postgresql.org
Betreff: Re: AW: AW: AW: [GENERAL] PG_DUMP very slow because of STDOUT ??

On 13/07/2010 6:26 PM, Andras Fabian wrote:
> Wait, now, here I see some correlation! Yes, it seems to be the memory! When I started my COPY-to-STDOUT experiment I
> had some 2000 MBytes free (well, the server has 24 GByte ... maybe other PostgreSQL processes used up the rest). Then, I
> could monitor via "ll -h" how the file nicely grew (obviously no congestion), and in parallel see with "free -m" how the
> "free" memory went down. Then it reached a level below 192 MByte, and congestion began. Now it is going back and forth
> around 118-122-130 ... Obviously the STDOUT thing ran out of some memory resource.
> Now I "only" need to find out who and what is running out, and how I can prevent that.

> Could there be some extremely big STDOUT buffering in play?

Remember, "STDOUT" is misleading. The data is sent down the network
socket between the postgres backend and the client connected to that
backend. There is no actual stdio involved at all.

Imagine that the backend's stdout is redirected down the network socket
to the client, so when it sends to "stdout" it's just going to the
client. Any buffering you are interested in is in the unix or tcp/ip
socket (depending on how you're connecting), in the client, and in the
client's output to file/disk/whatever.

--
Craig Ringer

Writing XML files to Operating System

От
Дата:
Hi All,

We are a data-based company and are migrating from Oracle to PostgreSQL, for which I am doing a POC. We have a business requirement to send data to our clients in XML files. The XML files are around 700 MB each and growing.

I have been able to generate sample XML files and store them (still to be tested with full-size files).

However, I have the following problems:

1) It took 35 minutes to store 1 GB of data in a TEXT column:
   CREATE TABLE test_text_nearly_1_GB AS SELECT repeat('1', 1024*1024*1023)::text AS nearly_1_GB;

2) It took 30 minutes to write the same back out with the COPY command:
   COPY test_text_nearly_1_GB TO '$HOME/test_text_nearly_1_GB.xml';

However, when I tested writing a 3 GB table with 150000 rows, it was written out in 20 minutes.

I am looking for a way to
1)      generate and write the 1 GB file in less than 10 minutes;
2)      write the results of a query to a file in the OS;
3)      handle files larger than 1 GB.

Please help.
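[One avenue worth testing - a sketch, not a verified fix; the table and file names are taken from the post above, while the psql invocation and the "postgres" database name are assumptions: have psql write the file client-side with its \copy meta-command, so the backend streams rows over the socket instead of writing the file itself, and no server-side filesystem access is needed.]

```shell
# Client-side export via psql's \copy (assumes a reachable server and
# the test table from the post). The file is written by the psql user.
OUTFILE="$HOME/test_text_nearly_1_GB.xml"
EXPORT_CMD="\\copy (SELECT nearly_1_GB FROM test_text_nearly_1_GB) TO '$OUTFILE'"

if command -v psql >/dev/null 2>&1; then
    psql -d postgres -c "$EXPORT_CMD" || echo "psql failed (no server reachable here)"
else
    # psql not installed on this box; just show the command that would run
    echo "would run: psql -d postgres -c \"$EXPORT_CMD\""
fi
```

Timing this against the server-side COPY would show whether the bottleneck is the server's file write or the row serialization itself.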


Regards,
Atul Goel
This e-mail is confidential and should not be used by anyone who is not the original intended recipient. Global DataPoint Limited does not accept liability for any statements made which are clearly the sender's own and not expressly made on behalf of Global DataPoint Limited. No contracts may be concluded on behalf of Global DataPoint Limited by means of e-mail communication. Global DataPoint Limited Registered in England and Wales with registered number 3739752 Registered Office Middlesex House, 34-42 Cleveland Street, London W1T 4LB

Re: PG_DUMP very slow because of STDOUT ??

From: Craig Ringer
Date:
On 13/07/10 18:57, Andras Fabian wrote:
> OK, so here I should - maybe - look around the sockets. Hmm. Well, in the case of my experiments we are talking about
> Unix sockets, as I am only connecting locally to the server (no real networking involved). Are there any ideas where
> such a Unix socket could impose such extreme buffering? And can/could I control its behavior at all? (or would it be
> a thing which can only be controlled from C code ... which would fall back to PostgreSQL as the initiator).

Confirm your issue is actually the socket between client and server first.

Are you sure the client isn't buffering the data? Or a pager like
"less"? Or terminal scrollback?

Run "top" and sort by memory use by pressing Shift-M. Run your query.
Look at the output and see if anything grows lots.

You really need to start looking at if this is memory consumption, and
what's using it. Don't get fixated on one answer until you've excluded
other possibilities.

--
Craig Ringer

Re: Writing XML files to Operating System

From: Craig Ringer
Date:
It's helpful if you can avoid replying to an existing post to create a new
thread. Your mail client puts "In-reply-to" headers in the message, which
confuses threaded mail clients.

I've replied to this message properly in a new thread to avoid confusing
things too much.

--
Craig Ringer

Re: Writing XML files to Operating System

From:
Date:
Sure I will take care.

Regards,
Atul Goel


-----Original Message-----
From: Craig Ringer [mailto:craig@postnewspapers.com.au]
Sent: 13 July 2010 13:26
To: Atul Goel
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Writing XML files to Operating System

It's helpful if you can avoid replying to an existing post to create a new
thread. Your mail client puts "In-reply-to" headers in the message, which
confuses threaded mail clients.

I've replied to this message properly in a new thread to avoid confusing
things too much.

--
Craig Ringer

Re: PG_DUMP very slow because of STDOUT ??

From: Andras Fabian
Date:
Hi Craig,

Now I did the test with top too. Free RAM is around 900 MB. And there doesn't seem to be any other process eating memory away (I looked at it with top/htop). The other procs that have more RAM had it before (mostly some postgres processes), and they don't grow at an exorbitant rate. One could see with free that most of the time, when memory is eaten away, it goes directly to cache. I also observe that the lower "free" is, the slower COPY-to-STDOUT becomes ... and, linearly, the number of times one can see congestion_wait in PS grows. So, with more "free" available (even when it is not being eaten away anymore ...), the more responsive COPY-to-STDOUT is.

Well, I have now also rerun my plain COPY-to-FILE test a few times. That one seems to behave similarly with RAM (when monitored via "free -m") ... but it never gets into congestion. So, maybe, just maybe ... could this be some kind of race condition? I mean, the file cache going for RAM (or the memory-mapped file ???) and the SOCKET going for some memory too, and then they somehow overwhelm the memory manager (which shouldn't happen). ... Well, this might be very wild speculation, but still.

I have now also looked at the stack before the congestion_wait. There are some "interesting" calls leading to it, like "shrink_list" and "zone_reclaim" ... after the call to "get_page_from_freelist". Sounds like some heavy memory juggling going on (to get new free pages) ... whatever the reason for this is.

Andras Fabian


-----Original Message-----
From: Craig Ringer [mailto:craig@postnewspapers.com.au]
Sent: Tuesday, 13 July 2010 14:11
To: Andras Fabian
Cc: pgsql-general@postgresql.org
Subject: Re: Re: Re: Re: Re: [GENERAL] PG_DUMP very slow because of STDOUT ??

On 13/07/10 18:57, Andras Fabian wrote:
> OK, so here I should - maybe - look around the sockets. Hmm. Well, in the case of my experiments we are talking about
> Unix sockets, as I am only connecting locally to the server (no real networking involved). Are there any ideas where
> such a Unix socket could impose such extreme buffering? And can/could I control its behavior at all? (or would it be
> a thing which can only be controlled from C code ... which would fall back to PostgreSQL as the initiator).

Confirm your issue is actually the socket between client and server first.

Are you sure the client isn't buffering the data? Or a pager like
"less"? Or terminal scrollback?

Run "top" and sort by memory use by pressing Shift-M. Run your query.
Look at the output and see if anything grows lots.

You really need to start looking at if this is memory consumption, and
what's using it. Don't get fixated on one answer until you've excluded
other possibilities.

--
Craig Ringer

Re: PG_DUMP very slow because of STDOUT ??

From: Andras Fabian
Date:
I think I have found the solution. Yes, I now get consistently high throughput with COPY-to-STDOUT, even if free -m only shows me 82 MBytes (so no, this solution is not about cleaning the cache). Always around 2 3/4 minutes.

I have compared all the /proc/sys/vm settings on my new machines and the old machines (which never had problems), and of course found some differences, some new settings etc. (of course, a lot can change between 2.6.26 and 2.6.32). And there was one which stood out from the mass, because its name reminded me of some functions which I had always seen in the kernel stack while having congestion_wait:

- zone_reclaim_mode
(yes, in the kernel stack there was always also a call to "zone_reclaim").

Interestingly, on the old machine this was set to "0" and on the new machine - obviously per Ubuntu default - to "1" ... What all this means is briefly described here:

http://www.linuxinsight.com/proc_sys_vm_zone_reclaim_mode.html

Then I thought, let's give it a try, and I set it to "0" on the new server too ... and voilà, it is running at high speed in COPY-to-STDOUT. I can even switch back and forth between 0 and 1 and watch congestion_wait come back or disappear.

Now, someone with deep kernel know-how could try to explain to me in detail what exactly is at odds here.

But for me it is now obvious that I will put "zone_reclaim_mode = 0" in my startup settings.

And tomorrow I will see how my nightly backup runs with this setting.
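[For reference, the startup-settings change can be done roughly like this - a sketch; reading the current value is safe for any user, while the write commands shown in the comments require root:]

```shell
# Read the current zone_reclaim_mode (the file may be absent on kernels
# without NUMA support, hence the fallback).
ZRM_FILE=/proc/sys/vm/zone_reclaim_mode
zrm="$( [ -r "$ZRM_FILE" ] && cat "$ZRM_FILE" || echo unavailable )"
echo "current zone_reclaim_mode: $zrm"

# To change it immediately (root only):
#   sysctl -w vm.zone_reclaim_mode=0
# To make it survive reboots, add this line to /etc/sysctl.conf:
#   vm.zone_reclaim_mode = 0
# and reload with: sysctl -p
```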

Andras Fabian




-----Original Message-----
From: Craig Ringer [mailto:craig@postnewspapers.com.au]
Sent: Tuesday, 13 July 2010 14:11
To: Andras Fabian
Cc: pgsql-general@postgresql.org
Subject: Re: Re: Re: Re: Re: [GENERAL] PG_DUMP very slow because of STDOUT ??

On 13/07/10 18:57, Andras Fabian wrote:
> OK, so here I should - maybe - look around the sockets. Hmm. Well, in the case of my experiments we are talking about
> Unix sockets, as I am only connecting locally to the server (no real networking involved). Are there any ideas where
> such a Unix socket could impose such extreme buffering? And can/could I control its behavior at all? (or would it be
> a thing which can only be controlled from C code ... which would fall back to PostgreSQL as the initiator).

Confirm your issue is actually the socket between client and server first.

Are you sure the client isn't buffering the data? Or a pager like
"less"? Or terminal scrollback?

Run "top" and sort by memory use by pressing Shift-M. Run your query.
Look at the output and see if anything grows lots.

You really need to start looking at if this is memory consumption, and
what's using it. Don't get fixated on one answer until you've excluded
other possibilities.

--
Craig Ringer

Re: PG_DUMP very slow because of STDOUT ??

From: Greg Smith
Date:
Andras Fabian wrote:
> So the kernel function it is always idling on seems to be congestion_wait ...

Ugh, not that thing again.  See
http://www.westnet.com/~gsmith/content/linux-pdflush.htm ; that chunk of
code has cost me weeks worth of "why isn't the kernel writing things the
way I asked it?" trouble in the past.  I know the kernel developers have
been fiddling with pdflush again recently, they might have introduced
yet another bug into how it handles heavy write volume.  You can reduce
dirty_ratio and dirty_background_ratio to try and improve things, but
the congestion code will thwart any attempt to make them really low.

You might monitor what shows up as "Dirty:" in /proc/meminfo to see if
that lines up with the slow periods; example of what bad output looks
like at
http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html
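[Greg's "Dirty:" monitoring can be scripted as a quick watcher - a sketch; the interval and iteration count are arbitrary, and on a real run you would leave it looping for the duration of the dump:]

```shell
# Sample the kernel's dirty/writeback page counters every second so the
# numbers can be lined up against the slow phases of the dump.
for i in 1 2; do
    dirty="$(grep -E '^(Dirty|Writeback):' /proc/meminfo 2>/dev/null || echo 'Dirty: n/a')"
    printf '%s  %s\n' "$(date '+%H:%M:%S')" "$dirty"
    sleep 1
done
```

A "Dirty:" value that climbs toward the dirty_ratio ceiling and then plateaus during the stalls would match the pdflush behavior described in the linked article.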

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: PG_DUMP very slow because of STDOUT ??

From: Andras Fabian
Date:
Hi Greg,

Hmm, that's true. Those settings, for example, were much higher too (on the Ubuntu server) than on our old machine.
New machine has:
- dirty_ratio = 20 (old has 10)
- dirty_background_ratio = 10 (old has 5)

But obviously setting vm.zone_reclaim_mode=0 "fixes" the problem too (which was "1" on the new machine and "0" on the old). See my latest post to Craig.

I hope using vm.zone_reclaim_mode=0 doesn't have other dire consequences :-)

Andras Fabian

-----Original Message-----
From: Greg Smith [mailto:greg@2ndquadrant.com]
Sent: Tuesday, 13 July 2010 16:29
To: Andras Fabian
Cc: Craig Ringer; Tom Lane; pgsql-general@postgresql.org
Subject: Re: [GENERAL] PG_DUMP very slow because of STDOUT ??

Andras Fabian wrote:
> So the kernel function it is always idling on seems to be congestion_wait ...

Ugh, not that thing again.  See
http://www.westnet.com/~gsmith/content/linux-pdflush.htm ; that chunk of
code has cost me weeks worth of "why isn't the kernel writing things the
way I asked it?" trouble in the past.  I know the kernel developers have
been fiddling with pdflush again recently, they might have introduced
yet another bug into how it handles heavy write volume.  You can reduce
dirty_ratio and dirty_background_ratio to try and improve things, but
the congestion code will thwart any attempt to make them really low.

You might monitor what shows up as "Dirty:" in /proc/meminfo to see if
that lines up with the slow periods; example of what bad output looks
like at
http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: PG_DUMP very slow because of STDOUT ??

From: Steve Clark
Date:
On 07/13/2010 10:35 AM, Andras Fabian wrote:
> Hi Greg,
>
> Hmm, that's true. Those settings, for example, were much higher too (on the Ubuntu server) than on our old machine.
> New machine has:
> - dirty_ratio = 20 (old has 10)
> - dirty_background_ratio = 10 (old has 5)
>
> But obviously setting vm.zone_reclaim_mode=0 "fixes" the problem too (which was "1" on the new machine and "0" on
> the old). See my latest post to Craig.
>
> I hope using vm.zone_reclaim_mode=0 doesn't have other dire consequences :-)
It looks to me that the vm.zone_reclaim_mode value is related to NUMA machines that
have "local" memory per node, and it shouldn't be used at all in your environment.
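[Whether a box is actually NUMA can be checked from sysfs - a sketch; `numactl --hardware` would show the same if installed. A single node0 directory means one memory node, i.e. not NUMA:]

```shell
# Count the memory nodes the kernel believes exist; more than one
# means the kernel is treating the machine as NUMA.
nodes=$(ls -d /sys/devices/system/node/node* 2>/dev/null | wc -l | tr -d ' ')
if [ "$nodes" -gt 1 ]; then
    echo "kernel sees $nodes memory nodes: NUMA"
else
    echo "kernel sees $nodes memory node(s): effectively not NUMA"
fi
```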

>
> Andras Fabian
>
> -----Original Message-----
> From: Greg Smith [mailto:greg@2ndquadrant.com]
> Sent: Tuesday, 13 July 2010 16:29
> To: Andras Fabian
> Cc: Craig Ringer; Tom Lane; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] PG_DUMP very slow because of STDOUT ??
>
> Andras Fabian wrote:
>> So the kernel function it is always idling on seems to be congestion_wait ...
>
> Ugh, not that thing again.  See
> http://www.westnet.com/~gsmith/content/linux-pdflush.htm ; that chunk of
> code has cost me weeks worth of "why isn't the kernel writing things the
> way I asked it?" trouble in the past.  I know the kernel developers have
> been fiddling with pdflush again recently, they might have introduced
> yet another bug into how it handles heavy write volume.  You can reduce
> dirty_ratio and dirty_background_ratio to try and improve things, but
> the congestion code will thwart any attempt to make them really low.
>
> You might monitor what shows up as "Dirty:" in /proc/meminfo to see if
> that lines up with the slow periods; example of what bad output looks
> like at
> http://notemagnet.blogspot.com/2008/08/linux-write-cache-mystery.html
>


--
Stephen Clark
NetWolves
Sr. Software Engineer III
Phone: 813-579-3200
Fax: 813-882-0209
Email: steve.clark@netwolves.com
www.netwolves.com


Re: PG_DUMP very slow because of STDOUT ??

From: Craig Ringer
Date:
On 13/07/10 22:16, Andras Fabian wrote:
> I think I have found the solution. Yes, I now get consistently high throughput with COPY-to-STDOUT, even if free -m
> only shows me 82 MBytes (so no, this solution is not about cleaning the cache). Always around 2 3/4 minutes.
>
> I have compared all the /proc/sys/vm settings on my new machines and the old machines (which never had problems), and
> of course found some differences, some new settings etc. (of course, a lot can change between 2.6.26 and 2.6.32). And
> there was one which stood out from the mass, because its name reminded me of some functions which I had always seen
> in the kernel stack while having congestion_wait:
>
> - zone_reclaim_mode
> (yes, in the kernel stack there was always also a call to "zone_reclaim").
>
> Interestingly, on the old machine this was set to "0" and on the new machine - obviously per Ubuntu default - to "1"
> ... What all this means is briefly described here:
>
> http://www.linuxinsight.com/proc_sys_vm_zone_reclaim_mode.html
>
> Then I thought, let's give it a try, and I set it to "0" on the new server too ... and voilà, it is running at high
> speed in COPY-to-STDOUT. I can even switch back and forth between 0 and 1 and watch congestion_wait come back or
> disappear.
>
> Now, someone with deep kernel know-how could try to explain to me in detail what exactly is at odds here.
>
> But for me it is now obvious that I will put "zone_reclaim_mode = 0" in my startup settings.
>
> And tomorrow I will see how my nightly backup runs with this setting.

It sounds like it's time for a post to the Linux Kernel Mailing List,
and/or a Launchpad bug against the Ubuntu kernel.

Make sure to have your asbestos underwear on if posting to LKML ;-)

--
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/

Re: PG_DUMP very slow because of STDOUT ??

From: Scott Marlowe
Date:
On Tue, Jul 13, 2010 at 9:10 PM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
>> And tomorrow I will see how my nightly backup runs with this setting.
>
> It sounds like it's time for a post to the Linux Kernel Mailing List,
> and/or a Launchpad bug against the Ubuntu kernel.
>
> Make sure to have your asbestos underwear on if posting to LKML ;-)

Note that I am testing a new 48 core machine with gobs of ram under
ubuntu 10.04, so I'll do some testing to see if I get the same kind of
issue, especially once I subscribe one of these machines to the slony
farm and get some action going on it to fill its buffers.

Note that the page posted earlier that described zone_reclaim_mode
said that when the memory is mostly used for file caching it's a good
idea to set it to 0.  Even though they said that they were talking
about file servers, I'd assume that applies to pgsql as well, since
it's normal to use most of the extra memory as file system cache.

Re: PG_DUMP very slow because of STDOUT ??

From: Andras Fabian
Date:
Hi guys,

I just wanted to tell you that the originating problem, namely that backups (with pg_dump) were much slower on the new hardware, IS solved. After setting "zone_reclaim_mode = 0", my backups last night were as fast as expected ... for my 100 GByte DB it now took only 1 hour (instead of 6 1/2 with "zone_reclaim_mode = 1").
And again, thank you all for the good tips (especially about the tools to analyze the issue - I have learned a lot in the process), which helped me to pin the problem down!

I will consider posting it to LKML (as time permits) or to Ubuntu ...

By the way, would it maybe make sense to put this information in the PostgreSQL documentation ... as a kind of cautionary warning? It could make life easier for some admins in the future (especially if Ubuntu or other distributions ship different settings here).

Andras Fabian

-----Original Message-----
From: Scott Marlowe [mailto:scott.marlowe@gmail.com]
Sent: Wednesday, 14 July 2010 05:33
To: Craig Ringer
Cc: Andras Fabian; pgsql-general@postgresql.org
Subject: Re: [GENERAL] PG_DUMP very slow because of STDOUT ??

On Tue, Jul 13, 2010 at 9:10 PM, Craig Ringer
<craig@postnewspapers.com.au> wrote:
>> And tomorrow I will see how my nightly backup runs with this setting.
>
> It sounds like it's time for a post to the Linux Kernel Mailing List,
> and/or a Launchpad bug against the Ubuntu kernel.
>
> Make sure to have your asbestos underwear on if posting to LKML ;-)

Note that I am testing a new 48 core machine with gobs of ram under
ubuntu 10.04, so I'll do some testing to see if I get the same kind of
issue, especially once I subscribe one of these machines to the slony
farm and get some action going on it to fill its buffers.

Note that the page posted earlier that described zone_reclaim_mode
said that when the memory is mostly used for file caching it's a good
idea to set it to 0.  Even though they said that they were talking
about file servers, I'd assume that applies to pgsql as well, since
it's normal to use most of the extra memory as file system cache.

Re: PG_DUMP very slow because of STDOUT ??

From: Andras Fabian
Date:
Well, I have searched a bit more about the zone_reclaim_mode thing, and obviously I am not the first one having problems with it. It seems some logic was introduced into the kernel which decided that our machine must be a NUMA architecture (which it is not), and as a consequence switched on zone_reclaim_mode = 1. But now, it seems, some people have discovered that this makes no sense with some modern processors, which - although they look like NUMA to software - are not NUMA. And thus, they want to go back to setting zone_reclaim_mode = 0 by default.

Here is a lengthy discussion of the topic (so no need for me to start it again at LKML - and obviously it is not Ubuntu-specific :-)

    http://lkml.org/lkml/2009/5/12/586

(and yes, our hardware has Core i7 CPUs, which seem to make many kernels think it is NUMA - this is often mentioned in the LKML thread).

Andras Fabian

-----Original Message-----
From: Craig Ringer [mailto:craig@postnewspapers.com.au]
Sent: Wednesday, 14 July 2010 05:11
To: Andras Fabian
Cc: pgsql-general@postgresql.org
Subject: Re: Re: Re: Re: Re: Re: [GENERAL] PG_DUMP very slow because of STDOUT ??

On 13/07/10 22:16, Andras Fabian wrote:
> I think I have found the solution. Yes, I now get consistently high throughput with COPY-to-STDOUT, even if free -m
> only shows me 82 MBytes (so no, this solution is not about cleaning the cache). Always around 2 3/4 minutes.
>
> I have compared all the /proc/sys/vm settings on my new machines and the old machines (which never had problems), and
> of course found some differences, some new settings etc. (of course, a lot can change between 2.6.26 and 2.6.32). And
> there was one which stood out from the mass, because its name reminded me of some functions which I had always seen
> in the kernel stack while having congestion_wait:
>
> - zone_reclaim_mode
> (yes, in the kernel stack there was always also a call to "zone_reclaim").
>
> Interestingly, on the old machine this was set to "0" and on the new machine - obviously per Ubuntu default - to "1"
> ... What all this means is briefly described here:
>
> http://www.linuxinsight.com/proc_sys_vm_zone_reclaim_mode.html
>
> Then I thought, let's give it a try, and I set it to "0" on the new server too ... and voilà, it is running at high
> speed in COPY-to-STDOUT. I can even switch back and forth between 0 and 1 and watch congestion_wait come back or
> disappear.
>
> Now, someone with deep kernel know-how could try to explain to me in detail what exactly is at odds here.
>
> But for me it is now obvious that I will put "zone_reclaim_mode = 0" in my startup settings.
>
> And tomorrow I will see how my nightly backup runs with this setting.

It sounds like it's time for a post to the Linux Kernel Mailing List,
and/or a Launchpad bug against the Ubuntu kernel.

Make sure to have your asbestos underwear on if posting to LKML ;-)

--
Craig Ringer

Tech-related writing: http://soapyfrogs.blogspot.com/

Re: PG_DUMP very slow because of STDOUT ??

From: Greg Smith
Date:
Andras Fabian wrote:
> Here is a lengthy discussion of the topic (so no need for me to start it again at LKML - and obviously it is not
> Ubuntu-specific :-)
>     http://lkml.org/lkml/2009/5/12/586

Once there's some sort of resolution there, you might want to create an
Ubuntu bug report anyway and push for the fix to be backported to the
LTS version you're running.  The response you get to that should be an
interesting study in how Ubuntu treats bug fixes for your company.

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us


Re: PG_DUMP very slow because of STDOUT ??

From: Scott Marlowe
Date:
On Tue, Jul 13, 2010 at 8:16 AM, Andras Fabian <Fabian@atrada.net> wrote:
> I think I have found the solution.


> - zone_reclaim_mode
> (yes, in the kernel stack there was always also a call to "zone_reclaim").

Thanks so much for this.  I too just got bit by the zone_reclaim_mode
slowing my file transfers to a crawl.  I can't believe that option is
enabled by default.  It appears to be a fabulously capable foot-gun.

Re: Writing XML files to Operating System

From: Martin Gainty
Date:
Good Morning
 
Admittedly, Oracle comes prepackaged with about 100 packaged goodies ... reading and writing XML is the most useful.

I experienced similar headaches when moving an Oracle DB to MySQL and discovered my 100 or so XML output scripts
could not be ported out of Oracle ... aggh!
The only solution I can think of: you're going to have to back all the XML routines out of the Oracle DB,
then create an XSD XML Schema Document (Apache Axis contains about 12 of these you can use as your template),
then persist the individual simple and complex entities from the XSD structure to Entity Beans.
Ultimately you will implement your persistence with an available J2EE server such as WebLogic, Glassfish or Tomcat (with chili beans).
When the webapp has finished processing you will want to persist your entity beans to the database and output the XML results to disk with FOP ... or, in my case, write to disk and transmit to p/l.

Looks like you've just added 12 line items to the globaldatapoint bill!
 
Martin Gainty
______________________________________________
> From: Atul.Goel@globaldatapoint.com
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Writing XML files to Operating System
> Date: Tue, 13 Jul 2010 12:08:59 +0000
>
> Hi All,
>
> We are a data based company and are migrating from Oracle to Postgresql. For this purpose I am doing POC for the same. We have a business requirement to send the Data in XML files to our clients. The file size of XMLs is around 700MB and is growing.
>
> I have been able to generate sample XML files and store the same(Still to test for full files).
>
> However I have the following problems
>
> 1) It took 35 minutes to store 1 GB data in the TEXT column
> (create table test_text_nearly_1_GB as select repeat('1', 1024*1024*1023):: text as nearly_1_GB)
>
> 2) It took 30 Minutes to write the same to OS using COPY command
> Copy test_text_nearly_1_GB to '$HOME/test_text_nearly_1_GB.xml'
>
> However when I tested to write a table of size 3GB with 150000 rows it wrote to OS in 20 minutes.
>
> I am looking for a solution to
> 1) generate and write the 1GB file in OS in less than 10 minutes.
> 2) write the file to OS the results of a query
> 3) Find a solution when the file size become > 1GB
>
> Please help.
>
>
> Regards,
> Atul Goel
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general

