Re: Problem with pgstat timneouts

Поиск
Список
Период
Сортировка
От Benjamin Krajmalnik
Тема Re: Problem with pgstat timneouts
Дата
Msg-id F4E6A2751A2823418A21D4A160B689888CA5AE@fletch.stackdump.local
обсуждение исходный текст
Ответ на Re: Problem with pgstat timneouts  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Problem with pgstat timneouts  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-admin
Thanks, Tom.
The system is running 16 cores (dual E5620's), 96GB RAM.
Nothing changed on the system when this started happening - at least
nothing in the configuration - it has not been touched since it was set
up.
Load averages are in between 6 and 8 (all 3 of them)
Memory wise, I do not see a problem - plenty of free memory, and only
120K of swap has been used.
The only thing which I see when I run iostat is load on the drives every
so often.  Mifd0 is the database, mifd1 is the pg_xlog.


db1# iostat -x 5
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   1.6   0
ad6        0.0   0.0     0.0     0.0    0   0.8   0
mfid0     11.6 172.2   171.7  2988.5    0 178.1   6
mfid1      0.1 734.0     1.1  5898.8    0   0.2   4
ar0        0.3   0.7     1.0    10.3    0 114.7   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   0.0   0
ad6        0.0   0.0     0.0     0.0    0   0.0   0
mfid0      2.8  73.4    22.4  1961.9    0   0.2   2
mfid1      0.0 1784.9     0.0 14303.2    0   0.0   5
ar0        0.0   1.0     0.0     6.8    0   0.8   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   0.0   0
ad6        0.0   0.0     0.0     0.0    0   0.0   0
mfid0      2.2 652.7    44.8  7671.1 2053  99.6  10
mfid1      0.0 1525.9     0.0 12207.6    1   0.1  12
ar0        0.0   0.0     0.0     0.0    0   0.0   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   0.0   0
ad6        0.0   0.0     0.0     0.0    0   0.0   0
mfid0      1.0 2202.4     8.0 19776.3    0 632.0  68
mfid1      0.0 574.3     0.0  4666.4    0   0.5  23
ar0        0.0   0.0     0.0     0.0    0   0.0   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   0.0   0
ad6        0.0   0.0     0.0     0.0    0   0.0   0
mfid0      1.8  75.8    14.4  2242.8    0   0.3   2
mfid1      0.0 1722.5     0.0 13802.0    0   0.0   5
ar0        0.0   0.0     0.0     0.0    0   0.0   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   0.0   0
ad6        0.0   0.0     0.0     0.0    0   0.0   0
mfid0      1.4  71.4    11.2  1991.6    0   0.2   1
mfid1      0.0 1708.9     0.0 13671.4    0   0.0   4
ar0        0.2   2.4     3.2    38.4    0   2.5   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   0.0   0
ad6        0.0   0.0     0.0     0.0    0   0.0   0
mfid0      1.0  78.0     8.0  2222.5    0   0.1   1
mfid1      0.0 1541.0     0.0 12352.1    0   0.0   5
ar0        0.0   0.6     0.0     9.6    0   0.6   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   0.0   0
ad6        0.0   0.0     0.0     0.0    0   0.0   0
mfid0      1.4  56.6    11.2  1629.1    0   0.3   2
mfid1      0.0 1558.3     0.0 12468.3    0   0.0   6
ar0        0.0   0.6     0.0     3.2    0   0.5   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   0.0   0
ad6        0.0   0.0     0.0     0.0    0   0.0   0
mfid0      1.8  72.2    14.4  1706.7    0   0.2   1
mfid1      0.0 1696.5     0.0 13596.2    0   0.0   5
ar0        0.0   0.0     0.0     0.0    0   0.0   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   0.0   0
ad6        0.0   0.0     0.0     0.0    0   0.0   0
mfid0      0.0 1757.7     0.0 16093.5  351 648.0  48
mfid1      0.0 503.9     0.0  4031.3    0   0.2  10
ar0        0.0   0.0     0.0     0.0    0   0.0   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   0.0   0
ad6        0.0   0.0     0.0     0.0    0   0.0   0
mfid0      0.0 125.0     0.0  2542.5    0 404.4   1
mfid1      0.0 1743.9     0.0 13951.1    0   0.1   5
ar0        0.0   0.0     0.0     0.0    0   0.0   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   0.0   0
ad6        0.0   0.0     0.0     0.0    0   0.0   0
mfid0      1.2  73.4     9.6  2122.6    0   0.1   1
mfid1      0.0 1624.3     0.0 13020.0    0   0.0   4
ar0        0.0   2.0     0.0    32.0    0   1.8   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   0.0   0
ad6        0.0   0.0     0.0     0.0    0   0.0   0
mfid0      0.2  67.4     1.6  1955.8    0   0.1   0
mfid1      0.0 1334.5     0.0 10699.7    0   0.0   5
ar0        0.0   0.4     0.0     6.4    0   0.5   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   0.0   0
ad6        0.0   0.0     0.0     0.0    0   0.0   0
mfid0      1.0  64.6     8.0  1789.1    0   0.2   1
mfid1      0.0 1541.0     0.0 12352.1    0   0.0   5
ar0        0.0   0.0     0.0     0.0    0   0.0   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   0.0   0
ad6        0.0   0.0     0.0     0.0    0   0.0   0
mfid0      1.0  78.4     8.0  1855.3    0   0.2   1
mfid1      0.0 1652.1     0.0 13217.0    0   0.0   5
ar0        0.0   0.8     0.0     6.4    0   0.7   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   0.0   0
ad6        0.0   0.0     0.0     0.0    0   0.0   0
mfid0      0.6 691.6     4.8  7919.5 3461  82.4  11
mfid1      0.0 1477.7     0.0 11821.7    1   0.3  15
ar0        0.0   0.0     0.0     0.0    0   0.0   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   0.0   0
ad6        0.0   0.0     0.0     0.0    0   0.0   0
mfid0      0.4 1346.2     3.2 12558.1    0 855.1  41
mfid1      0.0 1020.8     0.0  8190.6    0   0.5  39
ar0        0.0   0.0     0.0     0.0    0   0.0   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   0.0   0
ad6        0.0   0.0     0.0     0.0    0   0.0   0
mfid0      1.0  62.2     8.0  1792.6    0   0.3   2
mfid1      0.0 1636.3     0.0 13112.4    0   0.0   4
ar0        0.0   0.8     0.0    12.8    0   0.8   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   0.0   0
ad6        0.0   0.0     0.0     0.0    0   0.0   0
mfid0      0.2  63.0     1.6  1818.1    0   0.1   0
mfid1      0.0 1710.6     0.0 13684.6    0   0.0   5
ar0        0.0   1.2     0.0    19.2    0   1.2   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   0.0   0
ad6        0.0   0.0     0.0     0.0    0   0.0   0
mfid0      0.8  83.8     6.4  2330.6    0   0.1   1
mfid1      0.0 1218.8     0.0  9750.2    0   0.0   4
ar0        0.0   0.4     0.0     6.4    0   0.5   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   0.0   0
ad6        0.0   0.0     0.0     0.0    0   0.0   0
mfid0      0.0  68.6     0.0  1655.7    0   0.1   0
mfid1      0.0 1708.9     0.0 13671.4    0   0.0   5
ar0        0.0   0.6     0.0     3.2    0   0.5   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   0.0   0
ad6        0.0   0.0     0.0     0.0    0   0.0   0
mfid0      0.0  71.6     0.0  1999.4    0   0.0   0
mfid1      0.0 1071.3     0.0  8594.1    0   0.1   3
ar0        0.0   0.0     0.0     0.0    0   0.0   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   0.0   0
ad6        0.0   0.0     0.0     0.0    0   0.0   0
mfid0      0.4 1373.6     3.2 12956.5    0 550.7  35
mfid1      0.0 999.8     0.0  8022.7    0   0.8  37
ar0        0.0   0.0     0.0     0.0    0   0.0   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   0.0   0
ad6        0.0   0.0     0.0     0.0    0   0.0   0
mfid0      0.8  68.8     8.0  1908.5    0   0.2   1
mfid1      0.0 1370.5     0.0 10962.3    0   0.0   3
ar0        0.0   0.6     0.0     9.6    0   0.6   0
                        extended device statistics
device     r/s   w/s    kr/s    kw/s wait svc_t  %b
ad4        0.0   0.0     0.0     0.0    0   0.0   0
ad6        0.0   0.0     0.0     0.0    0   0.0   0
mfid0      1.2  63.8     9.6  1912.1    0   0.3   2
mfid1      0.0 1355.8     0.0 10846.2    0   0.0   3
ar0        0.0   1.0     0.0    16.0    0   1.1   0

Looking at /var/log/messages, everything appears to be ok.
The only thing I found, but it was from over a month ago, was the
following:
Nov 17 06:35:03 db1 kernel: MCA: Bank 8, Status 0x8c0000400001009f
Nov 17 06:35:03 db1 kernel: MCA: Global Cap 0x0000000000001c09, Status
0x0000000000000000
Nov 17 06:35:03 db1 kernel: MCA: Vendor "GenuineIntel", ID 0x206c2, APIC
ID 0
Nov 17 06:35:03 db1 kernel: MCA: CPU 0 COR (1) RD channel ?? memory
error
Nov 17 06:35:03 db1 kernel: MCA: Address 0x84aa98000
Nov 17 06:35:03 db1 kernel: MCA: Misc 0x4701041000041280
Nov 17 07:35:03 db1 kernel: MCA: Bank 8, Status 0x8c0000400001009f
Nov 17 07:35:03 db1 kernel: MCA: Global Cap 0x0000000000001c09, Status
0x0000000000000000
Nov 17 07:35:03 db1 kernel: MCA: Vendor "GenuineIntel", ID 0x206c2, APIC
ID 0
Nov 17 07:35:03 db1 kernel: MCA: CPU 0 COR (1) RD channel ?? memory
error
Nov 17 07:35:03 db1 kernel: MCA: Address 0x84aa98000
Nov 17 07:35:03 db1 kernel: MCA: Misc 0x4701041000046141
Nov 17 09:35:03 db1 kernel: MCA: Bank 8, Status 0x88000040000200cf
Nov 17 09:35:03 db1 kernel: MCA: Global Cap 0x0000000000001c09, Status
0x0000000000000000
Nov 17 09:35:03 db1 kernel: MCA: Vendor "GenuineIntel", ID 0x206c2, APIC
ID 0
Nov 17 09:35:03 db1 kernel: MCA: CPU 0 COR (1) MS channel ?? memory
error
Nov 17 09:35:03 db1 kernel: MCA: Misc 0x4701041000046141
Nov 18 13:34:59 db1 kernel: MCA: Bank 8, Status 0xcc0000800001009f
Nov 18 13:34:59 db1 kernel: MCA: Global Cap 0x0000000000001c09, Status
0x0000000000000000
Nov 18 13:34:59 db1 kernel: MCA: Vendor "GenuineIntel", ID 0x206c2, APIC
ID 0
Nov 18 13:34:59 db1 kernel: MCA: CPU 0 COR (2) OVER RD channel ?? memory
error
Nov 18 13:34:59 db1 kernel: MCA: Address 0x84aa98000
Nov 18 13:34:59 db1 kernel: MCA: Misc 0x4701041000044707
But it has not reoccurred.


> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Thursday, December 22, 2011 10:55 AM
> To: Benjamin Krajmalnik
> Cc: pgsql-admin
> Subject: Re: [ADMIN] Problem with pgstat timneouts
>
> "Benjamin Krajmalnik" <kraj@servoyant.com> writes:
> > About a month ago, I started receiving quite a few pgstat timeouts
on
> my
> > production database.
>
> > PostgreSQL 9.0.3 on amd64-portbld-freebsd8.1, compiled by GCC cc
> (GCC)
> > 4.2.1 20070719  [FreeBSD], 64-bit
>
> > I am not sure where to start looking for the cause.
>
> Yeah, so are the rest of us :-(.  People have been reporting sometimes
> seeing these messages without any clear cause.  We've been unable to
> find any bug in Postgres that would explain them, and the obvious
> explanation of "your system is so overloaded that the stats collector
> is too slow to respond" isn't very satisfying if response time seems
> snappy otherwise.  I think we have seen one case where the problem was
> provably caused by significant (multi-second) clock skew between
> different processors, but that's likely not applicable to many people.
>
> Can you reconstruct what changed in your system around the time you
> started seeing these?
>
> > It appears that these timeouts are affecting the performance of the
> > application - sometimes it hangs for a few seconds before data gets
> > returned, which was not the case before this.
>
> Usually these timeouts are reported by autovacuum, and so should not
be
> affecting the performance of foreground queries, at least not queries
> that are not explicitly requesting pgstats data.  The fact that your
> log messages don't include any query context reinforces the idea that
> they are coming from autovacuum for you too.  So if you are seeing
> query response time problems, maybe the pgstat timeouts are another
> symptom, not the cause, of an underlying too-much-load problem.  You
> might try turning on query duration logging and see if you can
> correlate
> response-time glitches and pgstat timeouts to anything else
observable,
> such as peaks of system load average.
>
>             regards, tom lane

В списке pgsql-admin по дате отправления:

Предыдущее
От: Dinesh Bhandary
Дата:
Сообщение: impact of truncate table on indexes
Следующее
От: Tom Lane
Дата:
Сообщение: Re: impact of truncate table on indexes