Thread: Pgstat.tmp file activity

Pgstat.tmp file activity

From:
"Dan Austin"
Date:
Hi,

We have a major reliability problem with our Postgres installation here.
Basically the backends seem to "hang" when the server is under
non-trivial load. It can cope with ~200 concurrent connections (max is
1024) but anything over that and all the backends start to hang. By hang
I mean they answer connection attempts and accept a SQL statement but
take a LONG time to reply.

During this time, if I run "df" on the server, the df command also blocks
when it comes to listing the partition that Postgres's database is stored
on. The box does not appear to be CPU bound and there is little or no swap
activity. There is, however, a LOT of disk write activity on the Postgres
partition.

Vitals:
* Solaris 10 (SPARC)
* Postgres 8.0.2

Solaris 10 has a tool called dtrace which reveals that pid #687 (a
postgres backend) is doing a LOT of writing to a single file (iosnoop.d
for you dtrace hackers):

  UID   PID D    BLOCK   SIZE       COMM PATHNAME
  103   687 W   120384  98304   postgres
/export/data/pgsql/data/global/pgstat.tmp.687
  103   687 W   120384  98304   postgres
/export/data/pgsql/data/global/pgstat.tmp.687
  103   687 W   120384  98304   postgres
/export/data/pgsql/data/global/pgstat.tmp.687
  103   687 W  4521728  65536   postgres
/export/data/pgsql/data/global/pgstat.tmp.687
  103   687 W  4521728  65536   postgres
/export/data/pgsql/data/global/pgstat.tmp.687
  103   687 W  4521728  65536   postgres
/export/data/pgsql/data/global/pgstat.tmp.687
  103   687 W  4521904 131072   postgres
/export/data/pgsql/data/global/pgstat.tmp.687
  103   687 W  4522160  81920   postgres
/export/data/pgsql/data/global/pgstat.tmp.687
  103   687 W  4521904 131072   postgres
/export/data/pgsql/data/global/pgstat.tmp.687
  103   687 W  4522160  81920   postgres
/export/data/pgsql/data/global/pgstat.tmp.687
  103   687 W  4521904 212992   postgres
/export/data/pgsql/data/global/pgstat.tmp.687
[...]
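
For reference, per-file write volume can also be summarized with a short
one-liner against the documented Solaris io provider. This is only a
minimal sketch (not the iosnoop.d script itself), and the aggregation keys
are just one reasonable choice:

  # Sum bytes written, per process name and file, until Ctrl-C is pressed.
  # Writes are selected by inverting the B_READ flag test. Run as root.
  dtrace -n 'io:::start
      /!(args[0]->b_flags & B_READ)/
      {
          @bytes[execname, args[2]->fi_pathname] = sum(args[0]->b_bcount);
      }'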

# ps -fp 687
     UID   PID  PPID   C    STIME TTY         TIME CMD
   pgsql   687   686   1   Dec 20 ?          57:09
/usr/local/pgsql/bin/postmaster -D /export/data/pgsql/data

(Hmmm... interesting. The PID of the process doing all that writing is
always one more than its parent's PID. I take it that the first child
spawned by the postmaster is responsible for updating pg_stat_activity and
the other statistics views?)

This database serves a website and almost all of its queries are
read-only, so it's not clear why so much disk writing is going on. We
started to look at exactly what was doing the writing, and that's when we
found this file. We have Postgres running on two other boxes, both
Solaris, same version of Postgres, and they show similar activity.

We think that Postgres may be saturating the disk and that this is causing
the "blocking". Mind you, the iosnoop.d script that produced the output
above was running at a time when load was low and things were fine.

So my questions are:

1. What is the pgstat.tmp file for and why does a single PID on each
server write almost continuously to it?

2. Why does that file never appear in the file system according to
"ls"?

3. Where should I look to determine why Postgres is writing so much to
that disk?

Dan

--
Daniel Austin (+61 2 9358 7983)
Internet Technology Manager


Re: Pgstat.tmp file activity

From:
Tom Lane
Date:
"Dan Austin" <DAustin@au.westfield.com> writes:
> Solaris 10 has a tool called dtrace which reveals that pid #687 (a
> postgres backend) is doing a LOT of writing to a single file (iosnoop.d
> for you dtrace hackers):

>   UID   PID D    BLOCK   SIZE       COMM PATHNAME
>   103   687 W   120384  98304   postgres
> /export/data/pgsql/data/global/pgstat.tmp.687

Hmm.  That's the stats collector creating a new current-stats file,
which it does every half second or so.  You might think about disabling
stats collection if you don't have a need for it.  But I'm having a hard
time believing that that's the main cause of your bottleneck.  The
backends aren't supposed to block waiting for the stats collector ...

            regards, tom lane
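
For anyone who wants to try disabling it, the relevant postgresql.conf
settings in the 8.0-era documentation are the stats_* parameters; a sketch
of a fully-disabled configuration might look like this (parameter names as
per 8.0, so verify them against your own release, and note that
stats_start_collector itself only takes effect at server start):

  # statistics settings in postgresql.conf (8.0-era parameter names)
  stats_start_collector = false   # stop the collector process entirely
  stats_command_string  = false   # no current-query tracking
  stats_block_level     = false   # no block-level I/O counters
  stats_row_level       = false   # no row-level counters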

Re: Pgstat.tmp file activity

From:
"Dan Austin"
Date:
Tom,

Thanks for your reply.

> [...]
> >   UID   PID D    BLOCK   SIZE       COMM PATHNAME
> >   103   687 W   120384  98304   postgres
> > /export/data/pgsql/data/global/pgstat.tmp.687
>
> Hmm.  That's the stats collector creating a new current-stats file,
> which it does every half second or so.

I take it that the file gets renamed almost immediately because we can
also see writes to the parent directory -- but never see the file itself
using ls.

We can probably turn stats collection off, but it does help diagnose
certain classes of production problems. We'll turn it off on one of the
servers and see what happens.

I'm starting to think maybe we need to tune the memory parameters.
There's also a lot of writing to files that look like this:

/export/data/pgsql/data/base/64920741/pgsql_tmp/pgsql_tmp966.9

What are these files used for? In this particular sample, that file was
being written to a lot. Perhaps we're running out of memory for some
sorting? We've dialled the logging up a little so we're now logging pids
and statements, to see what query that backend is running.

Thanks again.

Dan



Re: Pgstat.tmp file activity

From:
"Qingqing Zhou"
Date:
""Dan Austin"" <DAustin@au.westfield.com> wrote
>
> I take it that the file gets renamed almost immediately because we can
> also see writes to the parent directory -- but never see the file itself
> using ls.
>
I think so.

>
> /export/data/pgsql/data/base/64920741/pgsql_tmp/pgsql_tmp966.9
>
> What are these files used for? In this particular sample, that file was
> being written to a lot. Perhaps we're running out of memory for some
> sorting?
>

These files can be generated in several situations to store intermediate
data, for example hash joins and external sorts. For sorts, if you
increase sort_mem, you may avoid or at least reduce the use of temp files.

Regards,
Qingqing
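
A quick way to test that theory is to raise the setting for a single
session and re-run the suspect query while watching pgsql_tmp; a sketch
with placeholder database and query names (on 8.0 the value is an integer
number of kilobytes, and sort_mem survives as a deprecated alias for
work_mem):

  $ psql -d mydb
  mydb=# SHOW work_mem;
  mydb=# SET work_mem = 32768;          -- value is in kB on 8.0
  mydb=# SELECT ... ORDER BY ...;       -- then check pgsql_tmp for new files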



Re: Pgstat.tmp file activity

From:
Tom Lane
Date:
"Qingqing Zhou" <zhouqq@cs.toronto.edu> writes:
> ""Dan Austin"" <DAustin@au.westfield.com> wrote
>> I take it that the file gets renamed almost immediately because we can
>> also see writes to the parent directory -- but never see the file itself
>> using ls.
>>
> I think so.

Right, the file is fully written under a temp name, and then rename()'d
into place as pgstat.stat.  rename is atomic per the Unix specs, so this
ensures that any backend looking at pgstat.stat will see a complete
consistent file.  (This works on Unix anyway ... I'm not convinced that
it does on Windows ...)
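
In shell terms the same idiom looks roughly like the sketch below, for
illustration only (generate_stats is a made-up stand-in for whatever
produces the new contents, not the collector's actual code):

  # write the complete new contents under a temporary, per-process name ...
  generate_stats > pgstat.tmp.$$
  # ... then rename it into place; within one filesystem mv is rename(2),
  # so readers of pgstat.stat see either the old file or the new one,
  # never a partially written one
  mv pgstat.tmp.$$ pgstat.stat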

But I still doubt that that's the source of your performance problem.
Earlier today I tried a 200-client pgbench run with all stats options
enabled, and I couldn't see that the stats collector was eating any
undue share of time ...

>> /export/data/pgsql/data/base/64920741/pgsql_tmp/pgsql_tmp966.9
>>
>> What are these files used for? In this particular sample, that file was
>> being written to a lot. Perhaps we're running out of memory for some
>> sorting?

> These files could be generated by several situations to store intermediate
> data. For example, hash join and external sort. For sort, if you increase
> sort_mem, then you may avoid/alleviate usage of temp file.

AFAIR, increasing work_mem (nee sort_mem) will affect all the sources of
temp-file usage, not only sorts.  You want to be careful not to push it
up so far that your system starts swapping --- see past discussions in
the pgsql-performance archives.

            regards, tom lane

Re: Pgstat.tmp file activity

From:
"Dan Austin"
Date:
> [...]
> Right, the file is fully written under a temp name, and then
> rename()'d

This makes sense and is almost certainly not the cause of the problem,
although there is a lot more data being written to that file than I would
have thought.

One of Sun's kernel engineers has come up with a solution that seems to
have fixed the problem. Instead of increasing work_mem (which we still
might do), we've mounted a swap-backed tmpfs over the pg temp directory:

  # mount | grep pgsql_tmp
  /export/data/pgsql/data/base/64920741/pgsql_tmp on swap [...]
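
For reference, a swap-backed mount like that is typically created on
Solaris along these lines (the size cap is illustrative, and an
/etc/vfstab entry would be needed to make it survive a reboot):

  # Mount a size-capped tmpfs over the PostgreSQL temp-file directory (as root):
  mount -F tmpfs -o size=512m swap /export/data/pgsql/data/base/64920741/pgsql_tmp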

This has actually reduced the disk activity considerably. So far, the
blocking problem seems to have gone.

Now to find some Postgres DBAs in Sydney to do some review / tuning of
our memory settings. :-)

Dan



Re: Pgstat.tmp file activity

From:
Tom Lane
Date:
"Dan Austin" <DAustin@au.westfield.com> writes:
> One of Sun's kernel engineers has come up with a solution that seems to
> have fixed the problem. Instead of increasing work_mem (which we still
> might do) we've mounted a swapfs partition over the pg temp directory:
>   # mount | grep pgsql_tmp
>   /export/data/pgsql/data/base/64920741/pgsql_tmp on swap [...]
> This has actually reduced the disk activity considerably. So far, the
> blocking problem seems to have gone.

OK, so the stats collector is off the hook: it's not writing anything
in that directory.  I'd say you have a pretty conventional
query-optimization problem here after all --- you want to cut the
traffic into those temp files.  You ought to be looking at EXPLAIN
ANALYZE info for your common queries.

            regards, tom lane
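
A minimal sketch of that (database name and query are placeholders); on
8.0 the plan output doesn't call out disk spills explicitly, so the thing
to look for is Sort and Hash nodes whose actual row counts are far larger
than work_mem can hold:

  # run the site's common queries under EXPLAIN ANALYZE and compare the
  # actual row counts on Sort/Hash nodes against the work_mem budget
  $ psql -d mydb -c 'EXPLAIN ANALYZE SELECT ... ORDER BY ...;'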

Re: Pgstat.tmp file activity

From:
"Jim C. Nasby"
Date:
On Fri, Dec 23, 2005 at 04:50:32PM +1100, Dan Austin wrote:
> [...]
>
> Now to find some Postgres DBAs in Sydney to do some review / tuning of
> our memory settings. :-)

Note also that Solaris by default limits the amount of memory used for
disk caching, which is Not Good for PostgreSQL. IIRC this can be changed
with a sysctl, but I don't recall the details off-hand.
--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461

Re: Pgstat.tmp file activity

From:
Tom Arthurs
Date:
From http://docs.sun.com/source/817-1836-10/perfplat.html:

High File System Page-in Rate

If you are seeing high file system page-in rates on Solaris 8 or 9, you
may benefit from increasing the value of segmap_percent. This parameter
is set by adding the following line to the /etc/system file:

set segmap_percent=25

segmap_percent adjusts the percentage of memory that the kernel will map
into its address space for the file system cache. The default value is
12; that is, the kernel will reserve enough space to map at most 12% of
memory for the file system cache. On a heavily loaded machine with 4 GB
of physical memory, improvements have been seen with values as high as
60. You should experiment with this value, starting with values around
25. On systems with large amounts of physical memory, you should raise
this value in small increments, as it can significantly increase kernel
memory requirements.
====================================

I think this can also be set with the kernel debugger.

I keep it at 50% on my E3500 and E4500 systems (8 and 12 GB memory).
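
For what it's worth, the current value can at least be read back from a
running kernel with mdb; a sketch (note that the /etc/system setting
itself is only picked up at boot, so treat any live change with caution):

  # Print segmap_percent as a 32-bit decimal from the running kernel (as root):
  echo 'segmap_percent/D' | mdb -k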

Jim C. Nasby wrote:

>On Fri, Dec 23, 2005 at 04:50:32PM +1100, Dan Austin wrote:
>>[...]
>
>Note also that Solaris by default limits the amount of memory used for
>disk caching, which is Not Good for PostgreSQL. IIRC this can be changed
>with a sysctl, but I don't recall the details off-hand.


Re: Pgstat.tmp file activity

From:
"Jim C. Nasby"
Date:
There's also a non-percent-based control. My understanding is
that segmap_percent doesn't work on some versions or architectures,
notably OpenSolaris.

On Tue, Dec 27, 2005 at 01:15:39PM -0800, Tom Arthurs wrote:
> From http://docs.sun.com/source/817-1836-10/perfplat.html
>
> [...]
>
> I think this can also be set with the kernel debugger.
>
> I keep it at 50% on my E3500 and E4500 systems (8 and 12 GB memory).

--
Jim C. Nasby, Sr. Engineering Consultant      jnasby@pervasive.com
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf       cell: 512-569-9461