Re: heavy swapping, not sure why

Поиск
Список
Период
Сортировка
От Lonni J Friedman
Тема Re: heavy swapping, not sure why
Дата
Msg-id CAP=oouFOKT7Mbukp3bKx9rJ2sPRf=okrZMD_=5mAvE9y0qxCgA@mail.gmail.com
обсуждение исходный текст
Ответ на Re: heavy swapping, not sure why  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: heavy swapping, not sure why
Список pgsql-general
On Mon, Aug 29, 2011 at 5:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Lonni J Friedman <netllama@gmail.com> writes:
>> I have several Linux-x68_64 based dedicated PostgreSQL servers where
>> I'm experiencing significant swap usage growth over time.  All of them
>> have fairly substantial amounts of RAM (not including swap), yet the
>> amount of swap that postgres is using ramps up over time and
>> eventually hurts performance badly.  In every case, simply restarting
>> postgresql frees up all the swap in use (until it ramps up again
>> later).
>
> If you're certain that it's restarting *postgres* that does it, and not
> restarting your application or pgbouncer or some other code, then it
> seems like you must have uncovered a memory leak someplace.  We haven't
> got nearly enough info here to diagnose it though.
>
> First thing I'd want to know is which process(es) exactly are bloating.
> The top output you showed us is unhelpful for that since it just shows
> them all as "postmaster" --- you'll need to match up the problem PIDs
> with "ps auxww" output.  Keep in mind also that top is pretty awful
> about distinguishing a process's actual memory use (private memory)
> from the portion of PG's shared memory that it happens to have touched.
> What you need to pay attention to is RES minus SHR, not either number
> alone.  With shared buffers set as high as you've got it, you'll
> probably not be able to be sure that a process is bloating until it's
> eaten hundreds of megs of private space.

In the past 18 hours, swap usage has nearly doubled on systemA:
$ free -m
             total       used       free     shared    buffers     cached
Mem:         56481      56210        271          0         11      52470
-/+ buffers/cache:       3727      52753
Swap:         1099         35       1064

As a reminder, this is what it was yesterday afternoon (roughly 18
hours earlier):
            total       used       free     shared    buffers     cached
Mem:         56481      55486        995          0         15      53298
-/+ buffers/cache:       2172      54309
Swap:         1099         18       1081

Here's top sorted by memory usage:
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2250 postgres  20   0 13.5g  10g  10g S  0.0 19.0   4:50.17
postmaster
20427 postgres  20   0 13.6g 8.6g 8.5g D  3.6 15.5   6:15.53
postmaster
 1861 postgres  20   0 13.5g 2.2g 2.2g D  7.2  3.9   0:44.34
postmaster
32275 postgres  20   0 13.5g 2.0g 2.0g S  0.0  3.7   0:32.52
postmaster
 2444 postgres  20   0 13.5g 2.0g 2.0g S  0.0  3.6   0:21.22
postmaster
21632 postgres  20   0 13.5g 2.0g 1.9g S  0.0  3.6   1:48.76
postmaster
 1870 postgres  20   0 13.5g 1.7g 1.7g S  0.0  3.1   0:18.70
postmaster
 2383 postgres  20   0 13.5g 1.7g 1.7g S  0.0  3.1   0:13.23
postmaster
32280 postgres  20   0 13.5g 1.6g 1.6g S  0.0  2.9   0:20.97
postmaster
 2378 postgres  20   0 13.5g 1.3g 1.3g S  0.0  2.4   0:15.36
postmaster
 2367 postgres  20   0 13.5g 1.3g 1.3g S  0.0  2.4   0:15.71
postmaster
 2396 postgres  20   0 13.5g 1.2g 1.2g S  0.0  2.3   0:09.33
postmaster
31794 postgres  20   0 13.5g 1.2g 1.2g S  0.0  2.2   0:14.68
postmaster
 1891 postgres  20   0 13.5g 1.2g 1.2g S  0.0  2.1   0:06.49
postmaster
 2435 postgres  20   0 13.5g 1.2g 1.1g S  0.0  2.1   0:16.38
postmaster
 2370 postgres  20   0 13.5g 1.1g 1.1g S  0.0  2.0   0:08.51
postmaster
 2411 postgres  20   0 13.5g 1.0g 1.0g S  0.0  1.9   0:09.18
postmaster
31782 postgres  20   0 13.5g 981m 967m S  0.0  1.7   0:12.33
postmaster
32314 postgres  20   0 13.5g 909m 891m S  0.0  1.6   0:09.07
postmaster
 2395 postgres  20   0 13.5g 890m 876m S  0.0  1.6   0:07.16
postmaster
 2381 postgres  20   0 13.5g 829m 815m S  0.0  1.5   0:05.30
postmaster
 2213 postgres  20   0 13.5g 829m 817m S  0.0  1.5   0:04.40
postmaster
 2421 postgres  20   0 13.5g 803m 790m S  0.0  1.4   0:05.11
postmaster
 1866 postgres  20   0 13.5g 797m 784m S  0.0  1.4   0:08.10
postmaster
 2371 postgres  20   0 13.5g 793m 781m S  0.0  1.4   0:05.88
postmaster
 3460 postgres  20   0 13.5g 786m 773m S  0.0  1.4   0:06.17
postmaster
 2418 postgres  20   0 13.5g 604m 594m S  0.0  1.1   0:01.69
postmaster
 2425 postgres  20   0 13.5g 582m 570m S  0.0  1.0   0:02.67
postmaster
 5863 postgres  20   0 13.5g 358m 347m D  0.0  0.6   0:01.90
postmaster
 5865 postgres  20   0 13.5g 293m 283m S  0.0  0.5   0:01.70
postmaster
 2243 postgres  20   0 13.5g 289m 288m S  0.0  0.5   0:07.01
postmaster
 5862 postgres  20   0 13.5g 195m 189m S  0.0  0.3   0:00.48
postmaster
 1890 postgres  20   0 13.5g 149m 145m S  0.0  0.3   0:06.02
postmaster
 5852 postgres  20   0 13.5g 132m 124m S  0.0  0.2   0:01.04
postmaster
32316 postgres  20   0 13.5g  21m  19m S  0.0  0.0   0:00.60
postmaster
 1876 postgres  20   0 13.5g  21m  19m S  0.0  0.0   0:00.16
postmaster
 5866 postgres  20   0 13.5g  17m  15m S  0.0  0.0   0:00.01
postmaster
 2251 postgres  20   0 13.5g  17m  16m S  0.0  0.0   0:57.19
postmaster
32328 postgres  20   0 13.5g  16m  13m S  0.0  0.0   0:00.20
postmaster
 4893 postgres  20   0 13.5g  15m  13m S  0.0  0.0   0:00.15
postmaster
 2426 postgres  20   0 13.5g  14m  12m S  0.0  0.0   0:00.15
postmaster
 1878 postgres  20   0 13.5g  14m  12m S  0.0  0.0   0:00.12
postmaster
 2408 postgres  20   0 13.5g  13m  10m S  0.0  0.0   0:00.06
postmaster
 1896 postgres  20   0 13.5g  12m  10m S  0.0  0.0   0:00.13
postmaster
 2427 postgres  20   0 13.5g  10m 8436 S  0.0  0.0   0:00.07
postmaster
 2401 postgres  20   0 13.5g 8484 6412 S  0.0  0.0   0:00.03
postmaster
 5864 postgres  20   0 13.5g 7740 5836 S  0.0  0.0   0:00.00
postmaster
 5853 postgres  20   0 13.5g 7528 5652 S  0.0  0.0   0:00.00 postmaster


I've attached the corresponding 'ps auxwww' output as well (since its
rather wide, and it will likely get mangled inside the email body),
but here it is too for those who don't want to deal with the
attachment:
##############
postgres  1861  2.1  4.1 14135092 2421300 ?    Ds   08:37   0:45
postgres: lfriedman farm 127.0.0.1(34293) INSERT
postgres  1866  0.3  1.4 14142616 816732 ?     Ss   08:38   0:08
postgres: lfriedman nightly 127.0.0.1(56661) idle
postgres  1870  0.9  3.1 14147660 1816576 ?    Ss   08:38   0:18
postgres: lfriedman nightly 127.0.0.1(56665) idle
postgres  1876  0.0  0.0 14134040 21892 ?      Ss   08:38   0:00
postgres: lfriedman cbs 127.0.0.1(56670) idle
postgres  1878  0.0  0.0 14134636 14880 ?      Ss   08:38   0:00
postgres: lfriedman cbs 127.0.0.1(56672) idle
postgres  1890  0.3  0.2 14134992 153224 ?     Ss   08:40   0:06
postgres: lfriedman farm 127.0.0.1(56673) idle
postgres  1891  0.3  2.1 14134852 1215668 ?    Ss   08:40   0:06
postgres: lfriedman farm 127.0.0.1(56674) idle
postgres  1896  0.0  0.0 14134612 13244 ?      Ss   08:40   0:00
postgres: lfriedman cbs 127.0.0.1(56677) idle
postgres  2213  0.2  1.4 14142356 849448 ?     Ss   08:40   0:04
postgres: lfriedman nightly 127.0.0.1(56693) idle
postgres  2243  0.0  0.5 14125816 296012 ?     S    Aug29   0:07
/usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data
postgres  2248  0.0  0.0 159892   908 ?        Ss   Aug29   0:00
postgres: logger process
postgres  2250  0.3 19.0 14133028 11013112 ?   Ss   Aug29   4:50
postgres: writer process
postgres  2251  0.0  0.0 14132476 17532 ?      Ss   Aug29   0:57
postgres: wal writer process
postgres  2252  0.0  0.0 14133884 2428 ?       Ss   Aug29   0:09
postgres: autovacuum launcher process
postgres  2253  0.0  0.0 161272  1872 ?        Ss   Aug29   0:33
postgres: stats collector process
postgres  2367  0.8  2.3 14147948 1385192 ?    Ss   08:41   0:15
postgres: lfriedman nightly 127.0.0.1(56695) idle
postgres  2370  0.4  1.9 14147088 1151312 ?    Ss   08:41   0:08
postgres: lfriedman nightly 127.0.0.1(56698) idle
postgres  2371  0.3  1.4 14142216 814360 ?     Ds   08:41   0:05
postgres: lfriedman nightly 127.0.0.1(56699) UPDATE
postgres  2378  0.8  2.3 14142980 1387028 ?    Ss   08:41   0:15
postgres: lfriedman nightly 127.0.0.1(56706) idle
postgres  2381  0.2  1.4 14144484 849852 ?     Ss   08:41   0:05
postgres: lfriedman nightly 127.0.0.1(56709) idle
postgres  2383  0.7  3.1 14147196 1800300 ?    Ss   08:41   0:13
postgres: lfriedman nightly 127.0.0.1(56711) idle
postgres  2395  0.3  1.5 14142816 911604 ?     Ss   08:41   0:07
postgres: lfriedman nightly 127.0.0.1(56722) idle
postgres  2396  0.5  2.2 14144544 1302852 ?    Ss   08:41   0:09
postgres: lfriedman nightly 127.0.0.1(56723) idle
postgres  2401  0.0  0.0 14134040 8984 ?       Ss   08:41   0:00
postgres: lfriedman cbs 127.0.0.1(56728) idle
postgres  2408  0.0  0.0 14134824 13940 ?      Ss   08:41   0:00
postgres: lfriedman cbs 127.0.0.1(56735) idle
postgres  2411  0.4  1.8 14147688 1087016 ?    Ss   08:41   0:09
postgres: lfriedman nightly 127.0.0.1(56738) idle
postgres  2418  0.0  1.0 14141992 619484 ?     Ss   08:41   0:01
postgres: lfriedman nightly 127.0.0.1(56745) idle
postgres  2421  0.2  1.4 14142428 822400 ?     Ss   08:41   0:05
postgres: lfriedman nightly 127.0.0.1(56748) idle
postgres  2425  0.1  1.0 14142500 596420 ?     Ss   08:41   0:02
postgres: lfriedman nightly 127.0.0.1(56752) idle
postgres  2426  0.0  0.0 14134720 15896 ?      Ss   08:41   0:00
postgres: lfriedman cbs 127.0.0.1(56753) idle
postgres  2427  0.0  0.0 14134040 10536 ?      Ss   08:41   0:00
postgres: lfriedman cbs 127.0.0.1(56754) idle
postgres  2435  0.8  2.0 14147132 1210368 ?    Ss   08:41   0:16
postgres: lfriedman nightly 127.0.0.1(56761) idle
postgres  2444  1.1  3.5 14136592 2063556 ?    Ss   08:42   0:21
postgres: lfriedman farm 127.0.0.1(56769) idle
postgres  3460  0.5  1.3 14142640 805412 ?     Ss   08:51   0:06
postgres: lfriedman nightly 127.0.0.1(37675) idle
postgres  4893  0.0  0.0 14134520 16068 ?      Ss   09:02   0:00
postgres: lfriedman inventory 127.0.0.1(50384) idle
postgres  5852  0.6  0.2 14139168 135420 ?     Ss   09:09   0:01
postgres: lfriedman nightly 127.0.0.1(39228) idle
postgres  5853  0.0  0.0 14133968 7528 ?       Ss   09:09   0:00
postgres: lfriedman nightly 127.0.0.1(39229) idle
postgres  5862  0.3  0.3 14137720 200300 ?     Ss   09:10   0:00
postgres: lfriedman nightly 127.0.0.1(39230) idle
postgres  5863  1.5  0.6 14144120 367536 ?     Ss   09:10   0:01
postgres: lfriedman nightly 127.0.0.1(39231) idle
postgres  5864  0.0  0.0 14133968 7740 ?       Ss   09:10   0:00
postgres: lfriedman nightly 127.0.0.1(39232) idle
postgres  5865  1.7  0.5 14144176 338516 ?     Ss   09:10   0:02
postgres: lfriedman nightly 127.0.0.1(39233) idle
postgres  5866  0.0  0.0 14135036 18388 ?      Ss   09:10   0:00
postgres: lfriedman nightly 127.0.0.1(39234) idle
postgres 20427  0.6 15.5 14232764 8991924 ?    Ss   Aug29   6:16
postgres: autovacuum worker process   nightly
postgres 21632  0.1  3.5 14146368 2055856 ?    Ss   Aug29   1:48
postgres: lfriedman nightly 127.0.0.1(33664) idle
postgres 31782  0.3  1.7 14142696 1005052 ?    Ss   08:18   0:12
postgres: lfriedman nightly 127.0.0.1(39318) idle
postgres 31794  0.4  2.1 14149300 1261472 ?    Ss   08:18   0:14
postgres: lfriedman nightly 127.0.0.1(39330) idle
postgres 32275  1.0  3.6 14147716 2125608 ?    Ss   08:21   0:32
postgres: lfriedman nightly 127.0.0.1(39356) idle
postgres 32280  0.6  2.9 14150556 1699160 ?    Ss   08:21   0:20
postgres: lfriedman nightly 127.0.0.1(39361) idle
postgres 32314  0.3  1.6 14146864 930840 ?     Ss   08:22   0:09
postgres: lfriedman nightly 127.0.0.1(39395) idle
postgres 32328  0.0  0.0 14134636 16396 ?      Ss   08:22   0:00
postgres: lfriedman cbs 127.0.0.1(39409) idle
##############

I'm not sure if there's much to work with at this point, since is only
been 18 hours.  Please let me know if I can get additional data, or if
its better to wait several more days before capturing this data again.

thanks!

Вложения

В списке pgsql-general по дате отправления:

Предыдущее
От: Lonni J Friedman
Дата:
Сообщение: Re: heavy swapping, not sure why
Следующее
От: Peter Warasin
Дата:
Сообщение: duplicate key violates unique on a nextval() field