Discussion: pgstat.tmp file activity
Hi,

We have a major reliability problem with our Postgres installation here. Basically the backends seem to "hang" when the server is under non-trivial load. It can cope with ~200 concurrent connections (max is 1024), but anything over that and all the backends start to hang. By "hang" I mean they answer connection attempts and accept a SQL statement, but take a LONG time to reply. During this time, if I run "df" on the server, the df command blocks as well when it comes to list the partition that Postgres's database is stored on. The box does not appear to be CPU bound and there is little or no swap activity. There is, however, a LOT of disk write activity on the Postgres partition.

Vitals:
* Solaris 10 (SPARC)
* Postgres 8.0.2

Solaris 10 has a tool called dtrace which reveals that pid 687 (a postgres backend) is doing a LOT of writing to a single file (iosnoop.d for you dtrace hackers):

  UID  PID D   BLOCK   SIZE COMM     PATHNAME
  103  687 W  120384  98304 postgres /export/data/pgsql/data/global/pgstat.tmp.687
  103  687 W  120384  98304 postgres /export/data/pgsql/data/global/pgstat.tmp.687
  103  687 W  120384  98304 postgres /export/data/pgsql/data/global/pgstat.tmp.687
  103  687 W 4521728  65536 postgres /export/data/pgsql/data/global/pgstat.tmp.687
  103  687 W 4521728  65536 postgres /export/data/pgsql/data/global/pgstat.tmp.687
  103  687 W 4521728  65536 postgres /export/data/pgsql/data/global/pgstat.tmp.687
  103  687 W 4521904 131072 postgres /export/data/pgsql/data/global/pgstat.tmp.687
  103  687 W 4522160  81920 postgres /export/data/pgsql/data/global/pgstat.tmp.687
  103  687 W 4521904 131072 postgres /export/data/pgsql/data/global/pgstat.tmp.687
  103  687 W 4522160  81920 postgres /export/data/pgsql/data/global/pgstat.tmp.687
  103  687 W 4521904 212992 postgres /export/data/pgsql/data/global/pgstat.tmp.687
  [...]

  # ps -fp 687
       UID   PID  PPID  C    STIME TTY      TIME CMD
     pgsql   687   686  1   Dec 20 ?       57:09 /usr/local/pgsql/bin/postmaster -D /export/data/pgsql/data

(Hmmm... interesting. The PID of the process doing all that writing is always "+1" of its parent PID. I take it that the first child spawned by the postmaster is responsible for updating pg_stat_activity and friends?)

This database serves a website and almost all of its queries are read-only, so it's not clear why so much disk writing is going on. We started to look at exactly what was doing the writing, and that's when we found this file. We have Postgres running on two other boxes, both Solaris, same version of Postgres, and they show similar activity.

We think that maybe Postgres is saturating the disk and this is causing the "blocking". Mind you, the iosnoop.d script that produced the output above was run at a time when the load was low and things were fine.

So my questions are:

1. What is the pgstat.tmp file for, and why does a single PID on each server write to it almost continuously?
2. Why does that file not actually exist in the file system according to "ls"?
3. Where should I look to determine why Postgres is writing so much to that disk?

Dan
--
Daniel Austin (+61 2 9358 7983)
Internet Technology Manager
--
Confidential: This email is intended only for the person or entity to which it is addressed. The information contained in this email may be confidential and privileged. Any retransmission, review, dissemination or other use of, or taking of any action in reliance upon, this information by persons or entities other than the intended recipient is prohibited. If you received this email in error, please let us know by reply and destroy all copies of it. Thank you.
"Dan Austin" <DAustin@au.westfield.com> writes:
> Solaris 10 has a tool called dtrace which reveals that pid 687 (a
> postgres backend) is doing a LOT of writing to a single file (iosnoop.d
> for you dtrace hackers):
>
>   UID  PID D  BLOCK  SIZE COMM     PATHNAME
>   103  687 W 120384 98304 postgres /export/data/pgsql/data/global/pgstat.tmp.687

Hmm. That's the stats collector creating a new current-stats file, which it does every half second or so. You might think about disabling stats collection if you don't have a need for it.

But I'm having a hard time believing that that's the main cause of your bottleneck. The backends aren't supposed to block waiting for the stats collector ...

			regards, tom lane
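For reference, the stats-collection switches in an 8.0-era postgresql.conf look roughly like the sketch below. These parameter names apply to the 8.0.x series (they were renamed in later releases), so check your own postgresql.conf before copying:

```
# postgresql.conf (PostgreSQL 8.0.x) -- statistics collector switches.
# Setting stats_start_collector = false disables the collector entirely,
# and with it the half-second rewrite of pgstat.tmp / pgstat.stat.
stats_start_collector = false
stats_command_string  = false   # per-backend current-query reporting
stats_block_level     = false   # block-level I/O statistics
stats_row_level       = false   # row-level statistics
```

A server restart is required for stats_start_collector to take effect.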
Tom,

Thanks for your reply.

> [...]
>>   UID  PID D  BLOCK  SIZE COMM     PATHNAME
>>   103  687 W 120384 98304 postgres /export/data/pgsql/data/global/pgstat.tmp.687
>
> Hmm. That's the stats collector creating a new current-stats file,
> which it does every half second or so.

I take it that the file gets renamed almost immediately, because we can also see writes to the parent directory -- but we never see the file itself using "ls".

We can probably turn stats collection off, but it does help diagnose certain classes of production problem. Will turn it off on one of the servers and see what happens.

I'm starting to think maybe we need to tune the memory parameters. There's also a lot of writing to files that look like this:

  /export/data/pgsql/data/base/64920741/pgsql_tmp/pgsql_tmp966.9

What are these files used for? In this particular sample, that file was being written to a lot. Perhaps we're running out of memory for some sorting?

We've dialled the logging up a little so we're now logging PIDs and statements, to see what query that backend is running.

Thanks again.

Dan
"Dan Austin" <DAustin@au.westfield.com> wrote:
>
> I take it that the file gets renamed almost immediately, because we can
> also see writes to the parent directory -- but we never see the file
> itself using "ls".
>

I think so.

>
>   /export/data/pgsql/data/base/64920741/pgsql_tmp/pgsql_tmp966.9
>
> What are these files used for? In this particular sample, that file was
> being written to a lot. Perhaps we're running out of memory for some
> sorting?
>

These files can be generated in several situations to store intermediate data: hash joins and external sorts, for example. For a sort, if you increase sort_mem you may avoid, or at least reduce, the use of temp files.

Regards,
Qingqing
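For intuition about where those pgsql_tmp files come from: an external sort does roughly what a backend does when a sort exceeds sort_mem -- sort fixed-size runs in memory, spill each run to its own temp file, then merge the runs. A minimal sketch in Python (the mem_limit knob and file handling are illustrative only; this mirrors the idea, not PostgreSQL's actual tuplesort code):

```python
import heapq
import tempfile

def external_sort(values, mem_limit=4):
    """Sort an iterable while holding at most mem_limit values in memory.

    Each in-memory run is sorted and spilled to its own temp file
    (analogous to pgsql_tmpNNN.N), then the runs are merge-sorted.
    """
    run_files = []
    run = []

    def spill(run):
        # "sort_mem" exhausted: sort this run and write it to a temp file
        run.sort()
        f = tempfile.TemporaryFile(mode="w+")
        f.writelines(f"{x}\n" for x in run)
        f.seek(0)
        run_files.append(f)

    for v in values:
        run.append(v)
        if len(run) >= mem_limit:
            spill(run)
            run = []
    if run:                     # final partial run
        spill(run)

    # k-way merge of the sorted runs, streaming one line at a time
    runs = ((int(line) for line in f) for f in run_files)
    result = list(heapq.merge(*runs))
    for f in run_files:
        f.close()
    return result
```

Raising mem_limit means fewer, larger runs and less temp-file traffic, which is exactly the effect of raising sort_mem (work_mem in later releases).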
"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
> "Dan Austin" <DAustin@au.westfield.com> wrote:
>> I take it that the file gets renamed almost immediately, because we can
>> also see writes to the parent directory -- but we never see the file
>> itself using "ls".
>
> I think so.

Right, the file is fully written under a temp name, and then rename()'d into place as pgstat.stat. rename() is atomic per the Unix specs, so this ensures that any backend looking at pgstat.stat will see a complete, consistent file. (This works on Unix anyway ... I'm not convinced that it does on Windows ...)

But I still doubt that that's the source of your performance problem. Earlier today I tried a 200-client pgbench run with all stats options enabled, and I couldn't see that the stats collector was eating any undue share of time ...

>>   /export/data/pgsql/data/base/64920741/pgsql_tmp/pgsql_tmp966.9
>>
>> What are these files used for? Perhaps we're running out of memory for
>> some sorting?
>
> These files can be generated in several situations to store intermediate
> data: hash joins and external sorts, for example. For a sort, if you
> increase sort_mem you may avoid, or at least reduce, the use of temp
> files.

AFAIR, increasing work_mem (nee sort_mem) will affect all the sources of temp-file usage, not only sorts. You want to be careful not to push it up so far that your system starts swapping --- see past discussions in the pgsql-performance archives.

			regards, tom lane
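The write-under-a-temp-name-then-rename pattern Tom describes is easy to reproduce. A sketch (the function name and pgstat-style temp prefix are illustrative, not PostgreSQL's code):

```python
import os
import tempfile

def atomic_write(path, data):
    """Write data to path so readers never observe a partial file.

    The data goes to a temp file in the same directory, is fsync'd,
    and is then rename()'d over the destination. POSIX rename() is
    atomic, so a concurrent reader of `path` sees either the old
    contents or the new contents, never a mixture. (This mirrors how
    the stats collector publishes pgstat.stat via pgstat.tmp.<pid>.)
    """
    dirname = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=dirname, prefix="pgstat.tmp.")
    try:
        with os.fdopen(fd, "w") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())   # ensure the bytes hit disk first
        os.rename(tmp, path)       # atomic publish (POSIX semantics)
    except BaseException:
        os.unlink(tmp)             # don't leave the temp file behind
        raise
```

The temp file must live on the same file system as the destination, since rename() cannot move a file across mount points; that is presumably why the pgstat.tmp.687 files appear in the same global/ directory as pgstat.stat.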
> [...]
> Right, the file is fully written under a temp name, and then
> rename()'d

This makes sense and is almost certainly not the cause of the problem, although a lot more data is being written to that file than I would have thought.

One of Sun's kernel engineers has come up with a solution that seems to have fixed the problem. Instead of increasing work_mem (which we still might do), we've mounted a swap-backed tmpfs partition over the pg temp directory:

  # mount | grep pgsql_tmp
  /export/data/pgsql/data/base/64920741/pgsql_tmp on swap [...]

This has reduced the disk activity considerably, and so far the blocking problem seems to have gone.

Now to find some Postgres DBAs in Sydney to do some review / tuning of our memory settings. :-)

Dan
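For anyone wanting to reproduce the tmpfs trick, the Solaris incantation looks roughly like this. The size cap is an assumption on my part (the thread doesn't give one), but limiting it is prudent so a runaway sort can't consume all of swap:

```
# Mount a swap-backed tmpfs over the per-database temp directory;
# writes to pgsql_tmp then go to memory/swap, not the data disk.
mount -F tmpfs -o size=512m swap /export/data/pgsql/data/base/64920741/pgsql_tmp

# To make it persist across reboots, add an equivalent line to /etc/vfstab:
#   swap  -  /export/data/pgsql/data/base/64920741/pgsql_tmp  tmpfs  -  yes  size=512m
```

Note that anything in pgsql_tmp is discarded on unmount or reboot, which is fine here: PostgreSQL treats these files as strictly transient.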
"Dan Austin" <DAustin@au.westfield.com> writes:
> One of Sun's kernel engineers has come up with a solution that seems to
> have fixed the problem. Instead of increasing work_mem (which we still
> might do), we've mounted a swap-backed tmpfs partition over the pg temp
> directory:
>
>   # mount | grep pgsql_tmp
>   /export/data/pgsql/data/base/64920741/pgsql_tmp on swap [...]
>
> This has reduced the disk activity considerably, and so far the
> blocking problem seems to have gone.

OK, so the stats collector is off the hook: it's not writing anything in that directory. I'd say you have a pretty conventional query-optimization problem here after all --- you want to cut the traffic into those temp files. You ought to be looking at EXPLAIN ANALYZE output for your common queries.

			regards, tom lane
On Fri, Dec 23, 2005 at 04:50:32PM +1100, Dan Austin wrote:
> One of Sun's kernel engineers has come up with a solution that seems to
> have fixed the problem. Instead of increasing work_mem (which we still
> might do), we've mounted a swap-backed tmpfs partition over the pg temp
> directory:
>
>   # mount | grep pgsql_tmp
>   /export/data/pgsql/data/base/64920741/pgsql_tmp on swap [...]
>
> This has reduced the disk activity considerably, and so far the
> blocking problem seems to have gone.
>
> Now to find some Postgres DBAs in Sydney to do some review / tuning of
> our memory settings. :-)

Note also that Solaris by default limits the amount of memory used for disk caching, which is Not Good for PostgreSQL. IIRC this can be changed with a sysctl, but I don't recall the details off-hand.

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461
From http://docs.sun.com/source/817-1836-10/perfplat.html :

  High File System Page-in Rate

  If you are seeing high file system page-in rates on Solaris 8 or 9, you
  may benefit from increasing the value of segmap_percent. This parameter
  is set by adding the following line to the /etc/system file:

      set segmap_percent=25

  segmap_percent adjusts the percentage of memory that the kernel will map
  into its address space for the file system cache. The default value is
  12; that is, the kernel will reserve enough space to map at most 12% of
  memory for the file system cache. On a heavily loaded machine with 4 GB
  of physical memory, improvements have been seen with values as high as
  60. You should experiment with this value, starting with values around
  25. On systems with large amounts of physical memory, you should raise
  this value in small increments, as it can significantly increase kernel
  memory requirements.

====================================

I think this can also be set with the kernel debugger.

I keep it at 50% on my E3500 and E4500 systems (8 and 12 GB of memory).

Jim C. Nasby wrote:
> On Fri, Dec 23, 2005 at 04:50:32PM +1100, Dan Austin wrote:
>> [...]
>
> Note also that Solaris by default limits the amount of memory used for
> disk caching, which is Not Good for PostgreSQL. IIRC this can be changed
> with a sysctl, but I don't recall the details off-hand.
There's also a non-percentage-based control as well. My understanding is that segmap_percent doesn't work on some versions or architectures, notably OpenSolaris.

On Tue, Dec 27, 2005 at 01:15:39PM -0800, Tom Arthurs wrote:
> From http://docs.sun.com/source/817-1836-10/perfplat.html :
>
>   High File System Page-in Rate
>
>   If you are seeing high file system page-in rates on Solaris 8 or 9,
>   you may benefit from increasing the value of segmap_percent. This
>   parameter is set by adding the following line to the /etc/system file:
>
>       set segmap_percent=25
>
>   [...]
>
> I think this can also be set with the kernel debugger.
>
> I keep it at 50% on my E3500 and E4500 systems (8 and 12 GB of memory).

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461