Discussion: postgresql recommendation memory


postgresql recommendation memory

From:
Евгений Селявка
Date:
Please help with advice!

Server
HP ProLiant BL460c G1

Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                8
On-line CPU(s) list:   0-7
Thread(s) per core:    1
Core(s) per socket:    4
CPU socket(s):         2
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 23
Stepping:              6
CPU MHz:               3000.105
BogoMIPS:              6000.04
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              6144K
NUMA node0 CPU(s):     0-7

32GB RAM
[root@db3 ~]# numactl --hardware
available: 1 nodes (0)
node 0 cpus: 0 1 2 3 4 5 6 7
node 0 size: 32765 MB
node 0 free: 317 MB
node distances:
node   0
  0:  10


RAID1 2x146GB 10k rpm

CentOS release 6.3 (Final)
Linux 2.6.32-279.11.1.el6.x86_64 #1 SMP x86_64 GNU/Linux


kernel.msgmnb = 65536
kernel.msgmax = 65536
kernel.shmmax = 68719476736
kernel.shmall = 4294967296
vm.swappiness = 30
vm.dirty_background_bytes = 67108864
vm.dirty_bytes = 536870912


PostgreSQL 9.1.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6 20120305 (Red Hat 4.4.6-4), 64-bit

listen_addresses = '*'
port = 5433
max_connections = 350
shared_buffers = 8GB
temp_buffers = 64MB
max_prepared_transactions = 350
work_mem = 256MB
maintenance_work_mem = 1GB
max_stack_depth = 4MB
max_files_per_process = 5000
effective_io_concurrency = 2
wal_level = hot_standby
synchronous_commit = off
checkpoint_segments = 64
checkpoint_timeout = 15min
checkpoint_completion_target = 0.75
max_wal_senders = 4
wal_sender_delay = 100ms
wal_keep_segments = 128
random_page_cost = 3.0
effective_cache_size = 18GB
autovacuum = on
autovacuum_max_workers = 5
autovacuum_vacuum_threshold = 900
autovacuum_analyze_threshold = 350
autovacuum_vacuum_scale_factor = 0.1
autovacuum_analyze_scale_factor = 0.05
log_min_duration_statement = 500
deadlock_timeout = 1s


The DB size is about 20GB and there is no high write activity on the DB. But periodically in the postgresql log I see, for example, "select 1" with a duration of about 500-1000 ms.

During these periods the response time from the DB is terrible. These periods are not correlated with high traffic, there is no other app on the server, and there is no specific cron job running.

Our app is written in Java and uses JDBC with its internal pooling to connect to the DB. There are about 100 connections to the DB. This is the sar output:

12:00:01 AM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
09:30:01 PM     73.17    302.72 134790.16      0.00  46809.73      0.00      0.00      0.00      0.00
09:35:01 PM     63.42    655.80 131740.74      0.00  46182.74      0.00      0.00      0.00      0.00
09:40:01 PM     76.87    400.62 122375.34      0.00  42096.27      0.00      0.00      0.00      0.00
09:45:01 PM     58.49    198.33 121922.86      0.00  42765.27      0.00      0.00      0.00      0.00
09:50:01 PM     52.21    485.45 136775.65      0.15  49098.65      0.00      0.00      0.00      0.00
09:55:01 PM     49.68    476.75 130159.24      0.00  45192.54      0.00      0.00      0.00      0.00
10:00:01 PM     41.35    295.34 118655.80      0.00  40786.52      0.00      0.00      0.00      0.00
10:05:01 PM     60.84    593.85 129890.83      0.00  44170.92      0.00      0.00      0.00      0.00
10:10:01 PM     52.08    471.36 132773.63      0.00  46019.13      0.00      2.41      2.41    100.00
10:15:01 PM     73.93    196.50 129384.21      0.33  45255.76     65.92      1.19     66.87     99.64
10:20:02 PM     70.35    473.16 121940.38      0.11  44061.52     81.95     37.79    119.42     99.73
10:25:01 PM     57.84    471.69 130583.33      0.01  46093.33      0.00      0.00      0.00      0.00
10:30:01 PM     52.91    321.62 119264.34      0.01  41748.19      0.00      0.00      0.00      0.00
10:35:01 PM     47.13    451.78 114625.62      0.02  40600.98      0.00      0.00      0.00      0.00
10:40:01 PM     48.96    472.41 102352.79      0.00  35402.17      0.00      0.00      0.00      0.00
10:45:01 PM     70.07    321.33 121423.02      0.00  43052.04      0.00      0.00      0.00      0.00
10:50:01 PM     46.78    479.95 128938.09      0.02  37864.07    116.64     48.97    165.07     99.67
10:55:02 PM    104.84    453.55 109189.98      0.00  37583.50      0.00      0.00      0.00      0.00
11:00:01 PM     46.23    248.75 107313.26      0.00  37278.10      0.00      0.00      0.00      0.00
11:05:01 PM     44.28    446.41 115598.61      0.01  40070.61      0.00      0.00      0.00      0.00
11:10:01 PM     38.86    457.32 100240.71      0.00  34407.29      0.00      0.00      0.00      0.00
11:15:01 PM     48.23    275.60 104780.84      0.00  36183.84      0.00      0.00      0.00      0.00
11:20:01 PM     92.74    432.49 114698.74      0.01  40413.14      0.00      0.00      0.00      0.00
11:25:01 PM     42.76    428.50  87769.28      0.00  29379.87      0.00      0.00      0.00      0.00
11:30:01 PM     36.83    260.34  85072.46      0.00  28234.50      0.00      0.00      0.00      0.00
11:35:01 PM     62.52    481.56  93150.67      0.00  31137.13      0.00      0.00      0.00      0.00
11:40:01 PM     43.50    459.10  90407.34      0.00  30241.70      0.00      0.00      0.00      0.00

12:00:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
09:30:01 PM    531792  32345400     98.38    475504  29583340  10211064     27.62
09:35:01 PM    512096  32365096     98.44    475896  29608660  10200916     27.59
09:40:01 PM    455584  32421608     98.61    476276  29638952  10211652     27.62
09:45:01 PM    425744  32451448     98.71    476604  29662384  10206044     27.60
09:50:01 PM    380960  32496232     98.84    477004  29684296  10243704     27.71
09:55:01 PM    385644  32491548     98.83    477312  29706940  10204776     27.60
10:00:01 PM    348604  32528588     98.94    477672  29725476  10228984     27.67
10:05:01 PM    279216  32597976     99.15    478104  29751016  10281748     27.81
10:10:01 PM    255168  32622024     99.22    478220  29769924  10247404     27.72
10:15:01 PM    321188  32556004     99.02    475124  29721912  10234500     27.68
10:20:02 PM    441660  32435532     98.66    472336  29610476  10246288     27.71
10:25:01 PM    440636  32436556     98.66    472636  29634960  10219940     27.64
10:30:01 PM    469872  32407320     98.57    473016  29651476  10208520     27.61
10:35:01 PM    414540  32462652     98.74    473424  29672728  10223964     27.65
10:40:01 PM    354632  32522560     98.92    473772  29693016  10247752     27.72
10:45:01 PM    333708  32543484     98.98    474092  29720256  10227204     27.66
10:50:01 PM    528004  32349188     98.39    469396  29549832  10219536     27.64
10:55:02 PM    499068  32378124     98.48    469692  29587140  10204836     27.60
11:00:01 PM    462980  32414212     98.59    470032  29606764  10235820     27.68
11:05:01 PM    449540  32427652     98.63    470368  29626136  10209788     27.61
11:10:01 PM    419984  32457208     98.72    470772  29644248  10214480     27.63
11:15:01 PM    429900  32447292     98.69    471104  29664292  10202344     27.59
11:20:01 PM    394852  32482340     98.80    471528  29698052  10207604     27.61
11:25:01 PM    345328  32531864     98.95    471904  29717264  10215632     27.63
11:30:01 PM    368224  32508968     98.88    472236  29733544  10206468     27.61
11:35:01 PM    321800  32555392     99.02    472528  29758548  10211820     27.62
11:40:01 PM    282520  32594672     99.14    472860  29776952  10243516     27.71

12:00:01 AM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
09:30:01 PM  dev253-5     66.29    146.33    483.33      9.50      6.27     94.53      2.08     13.78
09:35:01 PM  dev253-5    154.80    126.85   1192.96      8.53     28.57    184.59      1.45     22.43
09:40:01 PM  dev253-5     92.21    153.75    686.75      9.11     11.53    125.00      1.87     17.21
09:45:01 PM  dev253-5     39.66    116.99    279.32      9.99      0.42     10.66      2.61     10.36
09:50:01 PM  dev253-5    106.73     95.58    820.70      8.58     16.77    157.12      1.68     17.88
09:55:01 PM  dev253-5    107.90     99.36    831.46      8.63     16.05    148.76      1.71     18.42
10:00:01 PM  dev253-5     62.48     82.70    471.28      8.87      5.91     94.52      2.10     13.11
10:05:01 PM  dev253-5    137.84    121.69   1064.03      8.60     24.48    177.31      1.56     21.52
10:10:01 PM  dev253-5    107.93    104.16    827.83      8.64     16.69    155.04      1.68     18.11
10:15:01 PM  dev253-5     40.55    126.12    277.57      9.96      0.41     10.13      2.57     10.42
10:20:02 PM  dev253-5    104.33    136.77    793.49      8.92     16.97    162.69      1.76     18.35
10:25:01 PM  dev253-5    108.04    115.36    825.26      8.71     16.68    154.36      1.76     19.05
10:30:01 PM  dev253-5     69.72    105.66    523.05      9.02      7.45    106.92      1.90     13.25
10:35:01 PM  dev253-5    101.58     91.59    781.85      8.60     15.00    147.68      1.67     16.97
10:40:01 PM  dev253-5    107.50     97.91    827.17      8.61     17.68    164.49      1.77     19.06
10:45:01 PM  dev253-5     69.98    140.13    519.57      9.43      7.09    101.25      1.96     13.72
10:50:01 PM  dev253-5    104.30     83.31    806.12      8.53     16.18    155.10      1.65     17.16
10:55:02 PM  dev253-5    106.86    209.65    790.27      9.36     15.59    145.08      1.74     18.60
11:00:01 PM  dev253-5     50.42     92.08    371.52      9.19      3.05     62.16      2.28     11.52
11:05:01 PM  dev253-5    101.06     88.31    776.57      8.56     15.12    149.58      1.67     16.90
11:10:01 PM  dev253-5    103.08     77.73    798.23      8.50     17.14    166.25      1.74     17.90
11:15:01 PM  dev253-5     57.74     96.45    428.62      9.09      5.23     90.52      2.13     12.32
11:20:01 PM  dev253-5     97.73    185.18    727.38      9.34     14.64    149.84      1.94     18.92
11:25:01 PM  dev253-5     95.03     85.52    730.31      8.58     14.42    151.79      1.79     16.97
11:30:01 PM  dev253-5     53.76     73.65    404.47      8.89      3.94     73.25      2.17     11.64
11:35:01 PM  dev253-5    110.37    125.05    842.17      8.76     16.96    153.63      1.66     18.30
11:40:01 PM  dev253-5    103.93     87.00    801.59      8.55     16.01    154.00      1.73     18.00

As you can see there is no high IO activity in this period of time, but the DB is frozen. My opinion is that I have an incorrect kernel setting and/or a mistake in postgresql.conf, because there is no high activity on the DB: load average is about 1, and even with high traffic it is about 1.15 (numbers from our Nagios monitoring system).

But sometimes the load is about 4, and these times match the intervals where sar shows %vmeff = 100% and database response time increases.

--
Best regards, Селявка Евгений

Re: postgresql recommendation memory

From:
Евгений Селявка
Date:
Hello desmodemone, I have looked again and again through my sar statistics, and I don't think that my DB is swapping during the freezes. For example:

sar -B
12:00:02 AM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
09:40:01 PM     66.13    352.43 195070.33      0.00  70627.21      0.00      0.00      0.00      0.00
09:45:01 PM     54.55    526.87 190893.02      0.00  67850.63     25.76      2.40     28.16    100.00
09:50:01 PM     74.97    509.70 200564.75      0.00  71807.80      0.00      0.00      0.00      0.00
09:55:01 PM     81.54    443.57 186335.43      0.05  68486.83    127.33    160.26    287.35     99.92
10:00:01 PM     62.03    528.46 169701.41      0.00  60487.37      0.00     15.62     15.62    100.00
10:05:01 PM     64.61    504.76 178725.60      0.00  66251.26      0.00     15.80     15.80    100.00
10:10:01 PM     80.06    336.47 172819.14      0.00  62379.45      0.00      0.00      0.00      0.00
10:15:01 PM     59.69    512.85 180228.56      0.00  64091.90      0.00      0.00      0.00      0.00

sar -S
12:00:02 AM kbswpfree kbswpused  %swpused  kbswpcad   %swpcad
09:40:01 PM   4095420       572      0.01       252     44.06
09:45:01 PM   4095420       572      0.01       252     44.06
09:50:01 PM   4095420       572      0.01       252     44.06
09:55:01 PM   4095420       572      0.01       252     44.06
10:00:01 PM   4095420       572      0.01       252     44.06
10:05:01 PM   4095420       572      0.01       252     44.06
10:10:01 PM   4095420       572      0.01       252     44.06
10:15:01 PM   4095420       572      0.01       252     44.06

As you can see, swap usage didn't change at all during this time. And this is a dedicated server for postgresql; there are no other apps on this server except pacemaker+corosync for the HA cluster. Maybe I am reading my sar statistics incorrectly?

I set work_mem to 1/4 of available RAM. I have 32GB RAM, so I set shared_buffers to 8GB.

Now I have also set:

vm.dirty_bytes = 67108864 (this value equals my Smart Array E200i cache size)
vm.dirty_background_bytes = 16777216 (1/4 of vm.dirty_bytes)

The next step is to try to set correct values for:
bgwriter_delay
bgwriter_lru_maxpages
bgwriter_lru_multiplier
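
For reference, the stock 9.1 defaults for these three settings are listed below; they are only a starting point, not tuned recommendations:

bgwriter_delay = 200ms
bgwriter_lru_maxpages = 100
bgwriter_lru_multiplier = 2.0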


And one more fact: if I clean up the fs cache (sync && echo 1 > /proc/sys/vm/drop_caches), the OS releases about 10-12GB of memory, and the freezes only come back once the fs cache is full again. For example, for one or two days there are no freezes on the DB.


2013/11/4 desmodemone <desmodemone@gmail.com>
Hello,
             I saw your request on the performance mailing list. I think your server is swapping, and because your swap is on the same RAID disk as everything else (/, the database datastore, etc.) you encounter a freeze of the system.

I think you have to analyze why you are swapping. Is only postgresql running on the server? Is it possible you are using too much work_mem memory?


Have a nice day


2013/11/2 Евгений Селявка <evg.selyavka@gmail.com>
[quoted original message snipped; identical to the first post above]

--
Best regards, Селявка Евгений

Re: postgresql recommendation memory

From:
Josh Berkus
Date:
> PostgreSQL 9.1.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6
> 20120305 (Red Hat 4.4.6-4), 64-bit

First, you should be using the latest update version.  You are currently
missing multiple patch updates.

> listen_addresses = '*'
> port = 5433
> max_connections = 350
> shared_buffers = 8GB

Try dropping shared_buffers to 2GB.  We've seen some issues on certain
systems with 8GB shared buffers.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com


Re: postgresql recommendation memory

From:
Michael Paquier
Date:
On Tue, Nov 5, 2013 at 8:37 AM, Евгений Селявка <evg.selyavka@gmail.com> wrote:
> I set work_mem to 1/4 from available RAM. I have 32Gb RAM so i set
> shared_buffers to 8Gb.
I am sure you are mentioning shared_buffers here and not work_mem.
work_mem is a per-operation parameter. So if you are using an
operation involving work_mem more than 4 times simultaneously on
different sessions you'll swap pretty quickly ;)
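
To put numbers on that, a rough sketch using the values from the posted config (each sort or hash step can claim up to work_mem, so this is only the simplest worst case):

-- 350 connections x 1 sort each x 256MB work_mem = ~89,600MB, far above the 32GB of RAM
-- keep the global setting small and raise it only where it is really needed, e.g.:
SET work_mem = '256MB';                            -- current session only
ALTER ROLE reporting_user SET work_mem = '256MB';  -- per role; "reporting_user" is a hypothetical name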
--
Michael


Re: postgresql recommendation memory

From:
Scott Marlowe
Date:
On Sat, Nov 2, 2013 at 12:54 PM, Евгений Селявка <evg.selyavka@gmail.com> wrote:

SNIP

> max_connections = 350
SNIP
> work_mem = 256MB

These two settings together are quite dangerous.

1: Look into a db pooler to get your connections needed down to no
more than 2x # of cores in your machine. I recommend pgbouncer
2: Your current settings mean that if you max out connections and each
of those connections does a large sort at the same time, they'll try
to allocate 256MB*350, or 89,600MB. If you run one job that can use
that much work_mem, then set it by connection or user for that one
connection or user only. Allowing any process to allocate 256MB is
usually a bad idea, and doubly so if you allow 350 incoming
connections.  Dropping work_mem to 16MB means a ceiling of about 5G
memory if you get swamped and each query is averaging 1 sort. Note
that a single query CAN run > 1 sort, so it's not a hard limit and you
could still swamp your machine, but it's less likely.
3: Turn off the OOM killer. On a production DB it is unwelcome and
could cause some serious issues.
4: vm.swappiness = 0 is my normal setting. I also tend to just turn
off swap on big memory linux boxes because Linux virtual memory is
often counterproductive on db servers. Some people might even say it
is broken, I tend to agree. Better to have a process fail to allocate
memory and report it in logs than have a machine slow to a crawl under
load. But that's your choice. And 64G isn't that big, so you're in the
in between zone for me on whether to just turn off swap.
5: turn down shared_buffers to 1 or 2G.
6: lower all your vm dirty ratio / size settings so that the machine
never has to write a lot at one time.

Basically don't TRY to allocate all the memory, try to leave 75% or so
free for the OS to allocate as buffers. After getting a baseline for
performance under load then make bigger changes.
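
A minimal pgbouncer.ini along those lines might look like the sketch below; the database name, file paths and pool sizes are assumptions to adapt, not tested values:

[databases]
mydb = host=127.0.0.1 port=5433 dbname=mydb

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction      ; transaction pooling needs server-side prepared statements disabled in the app
default_pool_size = 16       ; roughly 2x the 8 cores on this box
max_client_conn = 500        ; the JDBC pool then connects to port 6432 instead of 5433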


Re: postgresql recommendation memory

От
Евгений Селявка
Дата:
Thank you for advice.

1)
First of all, we use a Java app with a JDBC driver which can pool connections; that's why I don't think it is a good decision to put one more pooler between the app and the DB. Maybe someone has experience with pgbouncer and JDBC and could give good advice on the advantages and disadvantages of such an architecture.

2) Yes, this is my error in the configuration. Every day or two I decrease work_mem and monitor my system and the postgresql log, trying to find records about temp files. I will decrease work_mem to 16MB or maybe 32MB. But really, I repeat that I have only about 100 concurrent connections to my DB; I set this value with a big reserve. I can't change that setting (max_connections) right now because the DB is in production.

3)
I also read about disabling the OOM killer, but when I set vm.overcommit_memory=2 my DB worked for a couple of days and then pacemaker stopped it, because I had set a wrong value for vm.overcommit_ratio (I set it to 90). When pacemaker tried to execute psql -c 'select 1', postmaster returned 'out of memory' and pacemaker stopped my production DB. I need to know the correct value for vm.overcommit_ratio, or how postmaster allocates memory when it forks, maybe a formula or something? If I get answers to this question I can pick a proper vm.overcommit_ratio.

4)
About vm.swappiness I totally agree; I only raised it for experimental purposes, because I have this problem with my DB freezing and have been playing with different kernel settings trying to pick correct values. In the beginning I had it set to 0 and everything worked fine.

5)
I will plan a downtime and decrease max_connections and shared_buffers.

6) I set these values:
vm.dirty_bytes = 67108864 (equal to my Smart Array E200i cache size)
vm.dirty_background_bytes = 16777216 (1/4 of vm.dirty_bytes)


<Basically don't TRY to allocate all the memory, try to leave 75% or so
<free for the OS to allocate as buffers. After getting a baseline for
<performance under load then make bigger changes

Does this mean that I should set effective_cache_size to 75% of my RAM?



2013/11/6 Scott Marlowe <scott.marlowe@gmail.com>
[quoted reply snipped; identical to Scott Marlowe's message above]



--
Best regards, Селявка Евгений

Re: postgresql recommendation memory

From:
Scott Marlowe
Date:
On Wed, Nov 6, 2013 at 1:53 AM, Евгений Селявка <evg.selyavka@gmail.com> wrote:
> Thank you for advice.
>
> 1)
> First off all, we use java app with jdbc driver wich can pool connection,
> thats why i don't think that this is good decision to put one more pooler
> between app and DB. May be someone have an experience with  pgbouncer and
> jdbc and could give a good advice with advantage and disadvantage of this
> architecture.

That's a mostly religious argument. I.e. you're going on feeling here
that pooling in jdbc alone is better than either jdbc/pgbouncer or
plain pgbouncer alone. My experience is that jdbc pooling is not in
the same category as pgbouncer for configuration and performance.
Either way, get that connection count down to something reasonable.

If you've routinely got 200+ connections you've got too many. Again,
2x cores is about max on most machines for maximum throughput. 3x to
4x is the absolute max really. Unless you've got a machine with 40+
cores, you should be running a lot fewer connections.

Basically pgbouncer is very lightweight and can take thousands of
incoming connections and balance them into a few dozen connections to
the database.

> 2) Yes this is my error in configuration and every day or two i decrease
> work_mem and monitor for my system and postgresql log try to find record
> about temp files. I will decrease work_mem to 16MB or may be 32MB. But
> really i repeat that i have about 100 concurrent connections to my DB. I set
> this value with big reserve. I can't change this setting because db in
> production.

If you've got one job that needs lots of mem and lot of jobs that
don't, look at my recommendation to lower work_mem for all the low mem
requiring jobs. If you can split those heavy lifting jobs out to
another user, then you can use a pooler like pgbouncer to do admission
control by limiting that heavy lifter to a few connections at a time.
The rest will wait in line behind it.

> 3)
> I also read about disabling OOM killer but when i set
> vm.overcommit_memory=2. My DB work couple of a day and then pacemaker stop
> it because i set wrong value for vm.overcommit_ratio i set it to 90. And
> when pacemaker try to execute psql -c 'select 1' postmaster return 'out of
> memory' and pacemaker stop my production DB. I need to know what is the
> correct value for vm.overcommit_ratio or how postmaster allocate memory when
> fork may be formula or something? If i get answers on this question i can
> pick up vm.overcommit_ratio.

You are definitely running your server out of memory then. Can you
throw say 256G into it? It's usually worth every penny to throw memory
at the problem. Reducing usage will help a lot for now tho.

> 4)
> About vm.swappiness totally agree and i turn it on for experiment goals,
> because i have problem and my db freeze. I play with different kernel
> setting try to pick up correct value. In the beginning  i set it to 0 and
> all works fine.

The Linux kernel is crazy about swapping. Even with swappiness set to
0, it'll swap stuff out once it's gotten old, and suddenly shared_buffers
are on disk, not in RAM, etc. On big memory machines (we use 128G to 1TB
of memory at work) I just turn swap off, because the bigger the memory the
dumber it seems to get.

> 6) I set to this values
> vm.dirty_bytes=67108864 this value equal my Smart Array E200i Cache Size.
> vm.dirty_background_bytes = 16777216 - 1/4 from vm.dirty_bytes

Ahh, I usually set dirty_ratio to something small like 5
to 10 and dirty_background_ratio to 1. The less bursty the dirty
background stuff is, the better in db land.
Your numbers are fairly low, assuming dirty_bytes is in bytes and not
kilobytes or something. :) I never use it, so I'm not sure one way or
the other.
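
For reference, the ratio-based variant described above would look roughly like this in /etc/sysctl.conf (a sketch, not values tested on this box; the *_ratio and *_bytes knobs are alternatives, and writing one zeroes the other):

vm.dirty_ratio = 10
vm.dirty_background_ratio = 1
# apply with: sysctl -p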

> <Basically don't TRY to allocate all the memory, try to leave 75% or so
> <free for the OS to allocate as buffers. After getting a baseline for
> <performance under load then make bigger changes
>
> This means that i should set effective_cache_size to 75% of my RAM?

That is a reasonable number, especially once you get the machine to
stop using so much memory for sorts and shared_buffers. The idea is
that when you look at free, after the db's been up for a day or two,
you should see 75% or so of your RAM allocated to cache / buffers.

Good luck, keep us informed on your progress.


Re: postgresql recommendation memory

From:
Scott Marlowe
Date:
As a followup to my previous message, here's a response curve on a 48
core server I used at my last job.

https://picasaweb.google.com/lh/photo/aPYHPWPivPsS79fG3AKtZNMTjNZETYmyPJy0liipFm0?feat=directlink

Note the peak at around 38 to 48 active connections. This is the sweet spot on this
server for connections. If I allow this server to get to 68 to 70 hard
working connections my throughput drops by half.  It's far better to
have inbound connections sit in a queue in a connection pooler and
wait their turn than to have them all clobber this server at one time.

Of course the issue here is active connections, not idle ones. So if
you have say 150 idle and 50 active connections on this server you'd
be doing fine. Until load started to climb. Then as the number of
active connections went past 50, it would get slower in a non-linear
fashion. Since the throughput is half, at 70 or so connections each
query would now be 1/4th or so as fast as they had been when we had 35
or so.

So it's a good idea to get some idea of where that sweet spot is on
your server and stay under it with a good pooler.


Re: postgresql recommendation memory

From:
Scott Marlowe
Date:
Also also, the definitive page for postgres and dirty pages etc is here:

http://www.westnet.com/~gsmith/content/linux-pdflush.htm

Not sure if it's out of date with more modern kernels. Maybe Greg will chime in.


Re: postgresql recommendation memory

From:
Merlin Moncure
Date:
On Sat, Nov 2, 2013 at 1:54 PM, Евгений Селявка <evg.selyavka@gmail.com> wrote:
> [quoted original message snipped; identical to the first post above]


Need to see: iowait and system load.

Also consider installing perf and grabbing a profile while the issue is happening.

Probably this problem will go away with 2GB shared_buffers, but before
doing that we'd like to diagnose this if possible.

merlin


Re: postgresql recommendation memory

From:
David Rees
Date:
On Wed, Nov 6, 2013 at 8:35 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> That's a mostly religious argument. I.e. you're going on feeling here
> that pooling in jdbc alone is better than either jdbc/pgbouncer or
> plain pgbouncer alone. My experience is that jdbc pooling is not in
> the same category as pgbouncer for configuration and performance.
> Either way, get that connection count down to something reasonable.
>
> Basically pgbouncer is very lightweight and can take thousands of
> incoming connections and balance them into a few dozen connections to
> the database.

I've had a look at the pgbouncer docs, and it appears that there are 3
modes: session, transaction and statement.

Session pooling appears to be the most conservative, but unless I am
missing something, I don't see how it will reduce the number of actual
database connections when placed behind a JDBC connection pool?

If using transaction pooling, it appears that you need to disable
prepared statements in JDBC - and the FAQ says you need to apply a
patch to the JDBC driver to get that (admittedly, that FAQ appears to
be out of date given the JDBC version it references).
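
For what it's worth, with reasonably recent pgjdbc the usual workaround is to disable server-side prepared statements from the connection URL rather than patching the driver; if I read the driver docs correctly, something like the line below, where host, port and database name are placeholders pointing at pgbouncer:

jdbc:postgresql://127.0.0.1:6432/mydb?prepareThreshold=0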

-Dave


Re: postgresql recommendation memory

From:
Евгений Селявка
Date:
Scott, thank you for the advice.

> If you've got one job that needs lots of mem and lot of jobs that
> don't, look at my recommendation to lower work_mem for all the low mem
> requiring jobs. If you can split those heavy lifting jobs out to
> another user, then you can use a pooler like pgbouncer to do admission
> control by limiting that heavy lifter to a few connections at a time.
> The rest will wait in line behind it.

I will decrease this parameter to 32MB, because this DB cluster serves a web application, so nothing there should be running heavyweight queries.

> You are definitely running your server out of memory then. Can you
> throw say 256G into it? It's usually worth every penny to throw memory
> at the problem. Reducing usage will help a lot for now tho.

Unfortunately no; the most I can grow the memory to is 72GB. If I add another 32GB to the server, what shared_buffers should I use: 8GB, 2GB, or 18GB (1/4 of 72GB)?
For now I could set vm.overcommit_ratio=500 or 700, but that is very dangerous I think, because then all processes together can allocate (RAM+SWAP)*vm.overcommit_ratio/100, as I understand it?
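
For reference, with vm.overcommit_memory=2 the kernel computes CommitLimit as RAM scaled by vm.overcommit_ratio plus swap (not RAM+swap scaled together). A rough calculation for this box, assuming the ~4GB of swap visible in the sar -S output above:

# CommitLimit = RAM * vm.overcommit_ratio/100 + swap
#   vm.overcommit_ratio=90:           32GB * 0.90 + 4GB = ~32.8GB
#   vm.overcommit_ratio=50 (default): 32GB * 0.50 + 4GB = ~20GB
# compare against the live values:
grep -i commit /proc/meminfo    # CommitLimit vs Committed_AS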

Once again, thank you very much for the link; I have read it and studied the graph. About max_connections I will reply later, I am calculating it now.


2013/11/6 Scott Marlowe <scott.marlowe@gmail.com>
[quoted message snipped; identical to Scott Marlowe's note above]



--
Best regards, Селявка Евгений

Re: postgresql recommendation memory

From:
Евгений Селявка
Date:
All of my sar statistics:
sar -r
11:40:02 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
01:15:01 PM    269108  32608084     99.18    367144  29707240  10289444     27.83
01:20:01 PM    293560  32583632     99.11    367428  29674272  10287136     27.82
01:25:01 PM    417640  32459552     98.73    366148  29563220  10289220     27.83
01:30:01 PM    399448  32477744     98.79    366812  29596520  10298876     27.85
01:35:01 PM    432332  32444860     98.69    367412  29616524  10277484     27.80

sar -d -p
11:40:02 AM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
01:15:01 PM vg_root-lv_pgsql     73.10    116.59    540.15      8.98      6.98     95.44      1.61     11.78
01:20:01 PM vg_root-lv_pgsql     71.39    170.21    508.21      9.50      5.44     76.23      1.72     12.31
01:25:01 PM vg_root-lv_pgsql     54.32    136.21    381.53      9.53      3.58     65.98      1.81      9.85
01:30:01 PM vg_root-lv_pgsql     81.35    167.98    585.25      9.26      8.15    100.13      1.63     13.25
01:35:01 PM vg_root-lv_pgsql     66.75    126.02    482.72      9.12      5.59     83.73      1.78     11.90

sar -u ALL
11:40:02 AM     CPU      %usr     %nice      %sys   %iowait    %steal      %irq     %soft    %guest     %idle
01:15:01 PM     all      8.57      0.00      1.52      1.46      0.00      0.00      0.05      0.00     88.40
01:20:01 PM     all      8.50      0.00      1.53      1.61      0.00      0.00      0.05      0.00     88.31
01:25:01 PM     all      9.00      0.00      1.78      1.27      0.00      0.00      0.06      0.00     87.89
01:30:01 PM     all      9.58      0.00      1.63      1.71      0.00      0.00      0.06      0.00     87.01
01:35:01 PM     all      8.75      0.00      1.47      1.57      0.00      0.00      0.06      0.00     88.15

As you suggested, I installed perf and gathered statistics with the command:

perf record -g -f -u postgres -e block:block_rq_*,syscalls:sys_enter_write,syscalls:sys_enter_fsync

But I really don't understand the perf report and which values I need to look at. Could you give me advice on how to read the perf report? Which events from perf list should I trace, and which values in this report are good or bad given my hardware?
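
For a first pass it is often easier to start from a plain on-CPU profile captured while a stall is actually happening, rather than from individual tracepoints; roughly (a sketch):

# sample all CPUs with call graphs for 30 seconds during a freeze
perf record -a -g -- sleep 30
# then look at which postgres and kernel functions dominate the samples
perf report -g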




2013/11/7 Merlin Moncure <mmoncure@gmail.com>
On Sat, Nov 2, 2013 at 1:54 PM, Евгений Селявка <evg.selyavka@gmail.com> wrote:
> Please help with advice!
>
> Server
> HP ProLiant BL460c G1
>
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                8
> On-line CPU(s) list:   0-7
> Thread(s) per core:    1
> Core(s) per socket:    4
> CPU socket(s):         2
> NUMA node(s):          1
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 23
> Stepping:              6
> CPU MHz:               3000.105
> BogoMIPS:              6000.04
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              6144K
> NUMA node0 CPU(s):     0-7
>
> 32GB RAM
> [root@db3 ~]# numactl --hardware
> available: 1 nodes (0)
> node 0 cpus: 0 1 2 3 4 5 6 7
> node 0 size: 32765 MB
> node 0 free: 317 MB
> node distances:
> node   0
>   0:  10
>
>
> RAID1 2x146GB 10k rpm
>
> CentOS release 6.3 (Final)
> Linux 2.6.32-279.11.1.el6.x86_64 #1 SMP x86_64 GNU/Linux
>
>
> kernel.msgmnb = 65536
> kernel.msgmax = 65536
> kernel.shmmax = 68719476736
> kernel.shmall = 4294967296
> vm.swappiness = 30
> vm.dirty_background_bytes = 67108864
> vm.dirty_bytes = 536870912
>
>
> PostgreSQL 9.1.6 on x86_64-unknown-linux-gnu, compiled by gcc (GCC) 4.4.6
> 20120305 (Red Hat 4.4.6-4), 64-bit
>
> listen_addresses = '*'
> port = 5433
> max_connections = 350
> shared_buffers = 8GB
> temp_buffers = 64MB
> max_prepared_transactions = 350
> work_mem = 256MB
> maintenance_work_mem = 1GB
> max_stack_depth = 4MB
> max_files_per_process = 5000
> effective_io_concurrency = 2
> wal_level = hot_standby
> synchronous_commit = off
> checkpoint_segments = 64
> checkpoint_timeout = 15min
> checkpoint_completion_target = 0.75
> max_wal_senders = 4
> wal_sender_delay = 100ms
> wal_keep_segments = 128
> random_page_cost = 3.0
> effective_cache_size = 18GB
> autovacuum = on
> autovacuum_max_workers = 5
> autovacuum_vacuum_threshold = 900
> autovacuum_analyze_threshold = 350
> autovacuum_vacuum_scale_factor = 0.1
> autovacuum_analyze_scale_factor = 0.05
> log_min_duration_statement = 500
> deadlock_timeout = 1s
>
>
> DB size is about 20GB. There is no high write activity on DB. But
> periodically in postgresql log i see for example: "select 1" duration is
> about 500-1000 ms.
>
> In this period of time response time from db terribly. This period of time
> not bound with high traffic. It is not other app on the server. There is not
> specific cron job on server.
>
> Our app written on java and use jdbc to connect to DB and internal pooling.
> There is about 100 connection to DB. This is sar output:
>
> 12:00:01 AM  pgpgin/s pgpgout/s   fault/s  majflt/s  pgfree/s pgscank/s pgscand/s pgsteal/s    %vmeff
> 09:30:01 PM     73.17    302.72 134790.16      0.00  46809.73      0.00      0.00      0.00      0.00
> 09:35:01 PM     63.42    655.80 131740.74      0.00  46182.74      0.00      0.00      0.00      0.00
> 09:40:01 PM     76.87    400.62 122375.34      0.00  42096.27      0.00      0.00      0.00      0.00
> 09:45:01 PM     58.49    198.33 121922.86      0.00  42765.27      0.00      0.00      0.00      0.00
> 09:50:01 PM     52.21    485.45 136775.65      0.15  49098.65      0.00      0.00      0.00      0.00
> 09:55:01 PM     49.68    476.75 130159.24      0.00  45192.54      0.00      0.00      0.00      0.00
> 10:00:01 PM     41.35    295.34 118655.80      0.00  40786.52      0.00      0.00      0.00      0.00
> 10:05:01 PM     60.84    593.85 129890.83      0.00  44170.92      0.00      0.00      0.00      0.00
> 10:10:01 PM     52.08    471.36 132773.63      0.00  46019.13      0.00      2.41      2.41    100.00
> 10:15:01 PM     73.93    196.50 129384.21      0.33  45255.76     65.92      1.19     66.87     99.64
> 10:20:02 PM     70.35    473.16 121940.38      0.11  44061.52     81.95     37.79    119.42     99.73
> 10:25:01 PM     57.84    471.69 130583.33      0.01  46093.33      0.00      0.00      0.00      0.00
> 10:30:01 PM     52.91    321.62 119264.34      0.01  41748.19      0.00      0.00      0.00      0.00
> 10:35:01 PM     47.13    451.78 114625.62      0.02  40600.98      0.00      0.00      0.00      0.00
> 10:40:01 PM     48.96    472.41 102352.79      0.00  35402.17      0.00      0.00      0.00      0.00
> 10:45:01 PM     70.07    321.33 121423.02      0.00  43052.04      0.00      0.00      0.00      0.00
> 10:50:01 PM     46.78    479.95 128938.09      0.02  37864.07    116.64     48.97    165.07     99.67
> 10:55:02 PM    104.84    453.55 109189.98      0.00  37583.50      0.00      0.00      0.00      0.00
> 11:00:01 PM     46.23    248.75 107313.26      0.00  37278.10      0.00      0.00      0.00      0.00
> 11:05:01 PM     44.28    446.41 115598.61      0.01  40070.61      0.00      0.00      0.00      0.00
> 11:10:01 PM     38.86    457.32 100240.71      0.00  34407.29      0.00      0.00      0.00      0.00
> 11:15:01 PM     48.23    275.60 104780.84      0.00  36183.84      0.00      0.00      0.00      0.00
> 11:20:01 PM     92.74    432.49 114698.74      0.01  40413.14      0.00      0.00      0.00      0.00
> 11:25:01 PM     42.76    428.50  87769.28      0.00  29379.87      0.00      0.00      0.00      0.00
> 11:30:01 PM     36.83    260.34  85072.46      0.00  28234.50      0.00      0.00      0.00      0.00
> 11:35:01 PM     62.52    481.56  93150.67      0.00  31137.13      0.00      0.00      0.00      0.00
> 11:40:01 PM     43.50    459.10  90407.34      0.00  30241.70      0.00      0.00      0.00      0.00
>
> 12:00:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit
> 09:30:01 PM    531792  32345400     98.38    475504  29583340  10211064     27.62
> 09:35:01 PM    512096  32365096     98.44    475896  29608660  10200916     27.59
> 09:40:01 PM    455584  32421608     98.61    476276  29638952  10211652     27.62
> 09:45:01 PM    425744  32451448     98.71    476604  29662384  10206044     27.60
> 09:50:01 PM    380960  32496232     98.84    477004  29684296  10243704     27.71
> 09:55:01 PM    385644  32491548     98.83    477312  29706940  10204776     27.60
> 10:00:01 PM    348604  32528588     98.94    477672  29725476  10228984     27.67
> 10:05:01 PM    279216  32597976     99.15    478104  29751016  10281748     27.81
> 10:10:01 PM    255168  32622024     99.22    478220  29769924  10247404     27.72
> 10:15:01 PM    321188  32556004     99.02    475124  29721912  10234500     27.68
> 10:20:02 PM    441660  32435532     98.66    472336  29610476  10246288     27.71
> 10:25:01 PM    440636  32436556     98.66    472636  29634960  10219940     27.64
> 10:30:01 PM    469872  32407320     98.57    473016  29651476  10208520     27.61
> 10:35:01 PM    414540  32462652     98.74    473424  29672728  10223964     27.65
> 10:40:01 PM    354632  32522560     98.92    473772  29693016  10247752     27.72
> 10:45:01 PM    333708  32543484     98.98    474092  29720256  10227204     27.66
> 10:50:01 PM    528004  32349188     98.39    469396  29549832  10219536     27.64
> 10:55:02 PM    499068  32378124     98.48    469692  29587140  10204836     27.60
> 11:00:01 PM    462980  32414212     98.59    470032  29606764  10235820     27.68
> 11:05:01 PM    449540  32427652     98.63    470368  29626136  10209788     27.61
> 11:10:01 PM    419984  32457208     98.72    470772  29644248  10214480     27.63
> 11:15:01 PM    429900  32447292     98.69    471104  29664292  10202344     27.59
> 11:20:01 PM    394852  32482340     98.80    471528  29698052  10207604     27.61
> 11:25:01 PM    345328  32531864     98.95    471904  29717264  10215632     27.63
> 11:30:01 PM    368224  32508968     98.88    472236  29733544  10206468     27.61
> 11:35:01 PM    321800  32555392     99.02    472528  29758548  10211820     27.62
> 11:40:01 PM    282520  32594672     99.14    472860  29776952  10243516     27.71
>
> 12:00:01 AM       DEV       tps  rd_sec/s  wr_sec/s  avgrq-sz  avgqu-sz     await     svctm     %util
> 09:30:01 PM  dev253-5     66.29    146.33    483.33      9.50      6.27     94.53      2.08     13.78
> 09:35:01 PM  dev253-5    154.80    126.85   1192.96      8.53     28.57    184.59      1.45     22.43
> 09:40:01 PM  dev253-5     92.21    153.75    686.75      9.11     11.53    125.00      1.87     17.21
> 09:45:01 PM  dev253-5     39.66    116.99    279.32      9.99      0.42     10.66      2.61     10.36
> 09:50:01 PM  dev253-5    106.73     95.58    820.70      8.58     16.77    157.12      1.68     17.88
> 09:55:01 PM  dev253-5    107.90     99.36    831.46      8.63     16.05    148.76      1.71     18.42
> 10:00:01 PM  dev253-5     62.48     82.70    471.28      8.87      5.91     94.52      2.10     13.11
> 10:05:01 PM  dev253-5    137.84    121.69   1064.03      8.60     24.48    177.31      1.56     21.52
> 10:10:01 PM  dev253-5    107.93    104.16    827.83      8.64     16.69    155.04      1.68     18.11
> 10:15:01 PM  dev253-5     40.55    126.12    277.57      9.96      0.41     10.13      2.57     10.42
> 10:20:02 PM  dev253-5    104.33    136.77    793.49      8.92     16.97    162.69      1.76     18.35
> 10:25:01 PM  dev253-5    108.04    115.36    825.26      8.71     16.68    154.36      1.76     19.05
> 10:30:01 PM  dev253-5     69.72    105.66    523.05      9.02      7.45    106.92      1.90     13.25
> 10:35:01 PM  dev253-5    101.58     91.59    781.85      8.60     15.00    147.68      1.67     16.97
> 10:40:01 PM  dev253-5    107.50     97.91    827.17      8.61     17.68    164.49      1.77     19.06
> 10:45:01 PM  dev253-5     69.98    140.13    519.57      9.43      7.09    101.25      1.96     13.72
> 10:50:01 PM  dev253-5    104.30     83.31    806.12      8.53     16.18    155.10      1.65     17.16
> 10:55:02 PM  dev253-5    106.86    209.65    790.27      9.36     15.59    145.08      1.74     18.60
> 11:00:01 PM  dev253-5     50.42     92.08    371.52      9.19      3.05     62.16      2.28     11.52
> 11:05:01 PM  dev253-5    101.06     88.31    776.57      8.56     15.12    149.58      1.67     16.90
> 11:10:01 PM  dev253-5    103.08     77.73    798.23      8.50     17.14    166.25      1.74     17.90
> 11:15:01 PM  dev253-5     57.74     96.45    428.62      9.09      5.23     90.52      2.13     12.32
> 11:20:01 PM  dev253-5     97.73    185.18    727.38      9.34     14.64    149.84      1.94     18.92
> 11:25:01 PM  dev253-5     95.03     85.52    730.31      8.58     14.42    151.79      1.79     16.97
> 11:30:01 PM  dev253-5     53.76     73.65    404.47      8.89      3.94     73.25      2.17     11.64
> 11:35:01 PM  dev253-5    110.37    125.05    842.17      8.76     16.96    153.63      1.66     18.30
> 11:40:01 PM  dev253-5    103.93     87.00    801.59      8.55     16.01    154.00      1.73     18.00
>
> As you can see there is no high io activity in this period of time but db is
> frozen. My opinion that i have incorrect kernel setting and/or i have a
> mistake in postgresql.conf. Because there is not high activity on db. load
> avg is about 1. When there is high traffic is about 1.15. This is from
> nagios monitoring system.
>
> But sometimes load is about 4 and this time matches with sar %vmeff = 100%
> and database response time increase.


Need to see: iowait, system load.

Also consider installing perf and grabbing a profile while the issue is happening.

Probably this problem will go away with 2GB shared_buffers, but before
doing that we'd like to diagnose this if possible.
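For example, something like the following could be left running around a stall to capture iowait and load (standard sysstat/procps tools; the 5-second interval is arbitrary):

sar -u 5        # CPU breakdown including %iowait
sar -q 5        # run-queue length and load averages
iostat -x 5     # per-device utilization, await and queue size
vmstat 5        # memory, swap, run queue and CPU in one rolling view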

merlin



--
С уважением Селявка Евгений

Re: postgresql recommendation memory

От
Sergey Konoplev
Дата:
On Sat, Nov 2, 2013 at 11:54 AM, Евгений Селявка <evg.selyavka@gmail.com> wrote:
> DB size is about 20GB. There is no high write activity on DB. But
> periodically in postgresql log i see for example: "select 1" duration is
> about 500-1000 ms.
>
> In this period of time response time from db terribly. This period of time
> not bound with high traffic. It is not other app on the server. There is not
> specific cron job on server.

Have you shown all the modified kernel settings? Are you perhaps using
huge pages without realizing it? It might be a transparent huge pages
defragmentation issue; the symptoms look similar.

Another thing that might cause it is the network. Try to monitor it at
the time of these stalls.
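One way to check is to read the THP state straight from sysfs; the exact path differs between mainline and RHEL/CentOS 6 kernels, so treat the locations below as something to verify on this box:

cat /sys/kernel/mm/transparent_hugepage/enabled            # mainline kernels
cat /sys/kernel/mm/redhat_transparent_hugepage/enabled     # RHEL/CentOS 6 kernels usually use this path
grep -i huge /proc/meminfo                                 # a non-zero AnonHugePages means THP is in use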

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com


Re: postgresql recommendation memory

От
Евгений Селявка
Дата:
Hi Scott, I added up all of my JDBC pool sizes; the maximum is 300 connections from the components that use JDBC. I don't think using pgbouncer is a good idea, because our application uses the Spring framework, which relies on JDBC and prepared statements. I tried to talk with our developers about disabling prepared statements in this framework, and they don't want to do it. That's why I will try to upgrade the hardware and buy CPUs with more cores, as you say, based on the 3-4 x cores formula. But most of these connections are idle. This is a web-based app, not a data warehouse, which is why all these connections are lightweight.

About my DB freezes, I set these kernel parameters:
echo 1048576 > /proc/sys/vm/min_free_kbytes
echo 80 > /proc/sys/vm/vfs_cache_pressure

And my freeze intervals are smaller now. I will keep digging deeper.
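For reference, values written to /proc this way do not survive a reboot. A minimal sketch of making them persistent, assuming the stock /etc/sysctl.conf on CentOS 6:

echo 'vm.min_free_kbytes = 1048576' >> /etc/sysctl.conf
echo 'vm.vfs_cache_pressure = 80' >> /etc/sysctl.conf
sysctl -p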


2013/11/6 Scott Marlowe <scott.marlowe@gmail.com>
Also, the definitive page for Postgres and dirty pages etc. is here:

http://www.westnet.com/~gsmith/content/linux-pdflush.htm

Not sure if it's out of date with more modern kernels. Maybe Greg will chime in.
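Whatever the exact pdflush behaviour on this kernel, the amount of dirty memory can be watched directly while a stall happens; a small sketch using /proc/meminfo (the 5-second interval is arbitrary):

watch -n 5 "grep -E 'Dirty|Writeback' /proc/meminfo"                                        # dirty and writeback pages
sysctl vm.dirty_bytes vm.dirty_background_bytes vm.dirty_ratio vm.dirty_background_ratio    # thresholds currently in effect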



--
С уважением Селявка Евгений

Re: postgresql recommendation memory

От
Scott Marlowe
Дата:
On Mon, Nov 11, 2013 at 1:09 AM, Евгений Селявка <evg.selyavka@gmail.com> wrote:
> Scott hi, i calculate all of my jdbc pool size. Maximum is 300 connections
> from components wich use jdbc. I don't think that this is a good idea use
> pgbouncer, because our application using spring framework which using jdbc
> and prepared statement. I try to talk with our developer about disabling
> prepared statement in this framework, they don't want do this. Thats why i
> will try to upgrade HW and buy CPU with more core as you say based on
> formula 3-4xcore. But most of this connection is idle. This is a web based
> app not a datawarehouse, thats why all this connection is lightwear.
>
> About my db freeze i set this kernel parameter:
> echo 1048576 > /proc/sys/vm/min_free_kbytes
> echo 80 > /proc/sys/vm/vfs_cache_pressure
>
> And my freeze intervals is steel smaller. I try to dig deeper.

Well, you can hopefully reduce connections from the JDBC pooling then. The
fact that the connections are idle is good.

The problem you run into is what happens when things go into
"overload", i.e. when the db server starts to slow down, more of those
idle connections stop being idle. If all 300 are then waiting on the
db server, it will slow to a crawl and eventually fall over.


Re: postgresql recommendation memory

От
"ktm@rice.edu"
Дата:
On Mon, Nov 11, 2013 at 09:14:43AM -0700, Scott Marlowe wrote:
> On Mon, Nov 11, 2013 at 1:09 AM, Евгений Селявка <evg.selyavka@gmail.com> wrote:
> > Scott hi, i calculate all of my jdbc pool size. Maximum is 300 connections
> > from components wich use jdbc. I don't think that this is a good idea use
> > pgbouncer, because our application using spring framework which using jdbc
> > and prepared statement. I try to talk with our developer about disabling
> > prepared statement in this framework, they don't want do this. Thats why i
> > will try to upgrade HW and buy CPU with more core as you say based on
> > formula 3-4xcore. But most of this connection is idle. This is a web based
> > app not a datawarehouse, thats why all this connection is lightwear.
> >
> > About my db freeze i set this kernel parameter:
> > echo 1048576 > /proc/sys/vm/min_free_kbytes
> > echo 80 > /proc/sys/vm/vfs_cache_pressure
> >
> > And my freeze intervals is steel smaller. I try to dig deeper.
>
> well you can hopefully reduce connections from jdbc pooling then. The
> fact that the connections are idle is good.
>
> The problem you run into is what happens when things go into
> "overload" I.e. when the db server starts to slow down, more of those
> idle connections become not idle. If all 300 are then waiting on the
> db server, it will slow to a crawl and eventually fall over.
>
+1. I would definitely encourage the use of pgbouncer to map the 300 connections
to a saner number that your DB can actually handle. We had a similar problem
and, very occasionally, the server would lock up. Once we put the
resource-managing pooler in place, performance has been the same in the best case,
much, much better in the worst case, and there have been NO lockups.
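For reference, a minimal session-mode pgbouncer setup could look roughly like this; the listen port, pool size, paths and auth file below are placeholders to adapt, not recommended values:

cat > /etc/pgbouncer/pgbouncer.ini <<'EOF'
[databases]
; forward everything to the local PostgreSQL on port 5433
* = host=127.0.0.1 port=5433

[pgbouncer]
listen_addr = *
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
; session mode keeps server-side prepared statements working
pool_mode = session
; far fewer server connections than the 300 client-side ones
default_pool_size = 30
max_client_conn = 400
EOF

The application would then point its JDBC URL at port 6432 instead of 5433.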

Regards,
Ken


Re: postgresql recommendation memory

От
Sergey Konoplev
Дата:
On Sun, Nov 10, 2013 at 11:48 PM, Евгений Селявка
<evg.selyavka@gmail.com> wrote:
> Sergey, yes this is all of my kernel setting. I don't use THP intentionally. I think that i need a special library to
> use THP with postgresql like this http://code.google.com/p/pgcookbook/wiki/Database_Server_Configuration. This is my
> values for this kernel settings:

Then it is definitely not THP.

P.S. BTW, pgcookbook was moved to GitHub several weeks ago:
https://github.com/grayhemp/pgcookbook.

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com


Re: postgresql recommendation memory

От
Sergey Konoplev
Дата:
On Mon, Nov 11, 2013 at 8:14 AM, Scott Marlowe <scott.marlowe@gmail.com> wrote:
> well you can hopefully reduce connections from jdbc pooling then. The
> fact that the connections are idle is good.
>
> The problem you run into is what happens when things go into
> "overload" I.e. when the db server starts to slow down, more of those
> idle connections become not idle. If all 300 are then waiting on the
> db server, it will slow to a crawl and eventually fall over.

+1.

Try to monitor your connections, for example like this

while true; do
    echo -n "$(date): "
    psql -XAt -c "select count(1) from pg_stat_activity"
    sleep 1
done > activity.log

and its correlation with slowdowns.
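On 9.1, where idle backends show up in pg_stat_activity with current_query = '<IDLE>', it can also help to split the count into idle and busy backends. A sketch along the same lines (the column names current_query and waiting are the 9.1 ones; they changed in 9.2):

while true; do
    echo -n "$(date): "
    psql -XAt -F ' ' -c "select sum(case when current_query like '<IDLE>%' then 1 else 0 end) as idle, sum(case when current_query not like '<IDLE>%' then 1 else 0 end) as active, sum(case when waiting then 1 else 0 end) as waiting from pg_stat_activity"
    sleep 1
done > activity_by_state.log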

--
Kind regards,
Sergey Konoplev
PostgreSQL Consultant and DBA

http://www.linkedin.com/in/grayhemp
+1 (415) 867-9984, +7 (901) 903-0499, +7 (988) 888-1979
gray.ru@gmail.com


Re: postgresql recommendation memory

От
Jeff Janes
Дата:
On Thu, Nov 7, 2013 at 2:13 AM, Евгений Селявка <evg.selyavka@gmail.com> wrote:
All my sar statistics
... 
sar -u ALL
11:40:02 AM     CPU      %usr     %nice      %sys   %iowait    %steal      %irq     %soft    %guest     %idle
01:15:01 PM     all      8.57      0.00      1.52      1.46      0.00      0.00      0.05      0.00     88.40
01:20:01 PM     all      8.50      0.00      1.53      1.61      0.00      0.00      0.05      0.00     88.31
01:25:01 PM     all      9.00      0.00      1.78      1.27      0.00      0.00      0.06      0.00     87.89
01:30:01 PM     all      9.58      0.00      1.63      1.71      0.00      0.00      0.06      0.00     87.01
01:35:01 PM     all      8.75      0.00      1.47      1.57      0.00      0.00      0.06      0.00     88.15


Did a freeze-up occur in there someplace?  Otherwise, that is not so useful.

You should try to decrease the sar interval to 1 min if you can.  The extra overhead is negligible and the extra information can be very valuable.  I'd also have something like "vmstat 5" running and capture that.  Perhaps one of the sar options other than -u captures that same information; I know little of the other sar options.
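On CentOS 6 the sar samples are driven by the sysstat cron job, so the two suggestions could look roughly like this (the cron path and line below are the usual RHEL/CentOS 6 defaults and should be verified on this host):

# in /etc/cron.d/sysstat, change the sa1 entry so samples are taken every minute
*/1 * * * * root /usr/lib64/sa/sa1 1 1

# keep vmstat running in the background and collect it to a file
nohup vmstat 5 >> /var/log/vmstat.log 2>&1 &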

Cheers,

Jeff

Re: postgresql recommendation memory

От
Евгений Селявка
Дата:
Sergey, I will try to monitor my pgsql activity for several days.

Scott, about connection pooling: yesterday I started reading about the Spring JDBC setup; our app uses the DBCP implementation: http://commons.apache.org/proper/commons-dbcp/configuration.html
So I have these parameters in the config:
        <property name="maxActive" value="15" />
        <property name="maxIdle" value="1" />
        <property name="maxWait" value="10000" />
        <property name="validationQuery" value="SELECT 1" />
        <property name="removeAbandoned" value="true" />
        <property name="removeAbandonedTimeout" value="60" />
And I have several apps that initialize and use this driver. When I add up the maxActive directives from all of the apps' DBCP config XML files, the result is 380 potential connections. But, as I wrote earlier, there are only about 100 concurrent connections when I run 'select count(1) from pg_stat_activity' (a per-application breakdown is sketched below). I think it is inexpedient to install pgbouncer in front of the DB, or maybe somebody on this list has experience with pgbouncer and DBCP? Why I don't want to use pgbouncer:
1. I would have to use session mode, because transaction mode doesn't support prepared statements.
2. If I use session mode, I will have the same maximum number of connections to the DB, because DBCP opens a connection to pgbouncer, pgbouncer opens one to the DB, and nobody closes these connections until DBCP does first, if I understand it correctly. So if an overload happens, I will still have the same 380 heavyweight connections to the DB and everything breaks down.

I think I should configure my DBCP pool config XML files correctly instead.
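To see how many of those 380 potential connections each application actually holds, the backends can be grouped by their source address. A small sketch against 9.1's pg_stat_activity (on 9.1 idle backends show current_query = '<IDLE>'; if several apps connect from the same host, grouping by application_name, when the driver sets it, would be needed instead):

psql -XAt -c "select client_addr, count(*) as total, sum(case when current_query like '<IDLE>%' then 1 else 0 end) as idle from pg_stat_activity group by client_addr order by 2 desc"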

2013/11/12 Jeff Janes <jeff.janes@gmail.com>
On Thu, Nov 7, 2013 at 2:13 AM, Евгений Селявка <evg.selyavka@gmail.com> wrote:
All my sar statistics
... 
sar -u ALL
11:40:02 AM     CPU      %usr     %nice      %sys   %iowait    %steal      %irq     %soft    %guest     %idle
01:15:01 PM     all      8.57      0.00      1.52      1.46      0.00      0.00      0.05      0.00     88.40
01:20:01 PM     all      8.50      0.00      1.53      1.61      0.00      0.00      0.05      0.00     88.31
01:25:01 PM     all      9.00      0.00      1.78      1.27      0.00      0.00      0.06      0.00     87.89
01:30:01 PM     all      9.58      0.00      1.63      1.71      0.00      0.00      0.06      0.00     87.01
01:35:01 PM     all      8.75      0.00      1.47      1.57      0.00      0.00      0.06      0.00     88.15


Did a freeze-up occur in there someplace?  Otherwise, that is not so useful.

You should try to decrease the sar interval to 1 min if you can.  The extra overhead is negligible and the extra information can be very valuable.  I'd also have something like "vmstat 5" running and capture that.  Perhaps one of the sar options other than -u captures that same information; I know little of the other sar options.

Cheers,

Jeff



--
С уважением Селявка Евгений