Discussion: Memory Usage and OpenBSD
I'm not getting something about the best way to set up a server using PostgreSQL as a backend for a busy web server running Drupal. The PostgreSQL performance folks (http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server) say that on a server with more than 1GB of RAM "a reasonable starting value for shared_buffers is 1/4 of the memory in your system." Okay, this server has 4GB of RAM. pgtune suggests the following values for predominantly web-based usage:

maintenance_work_mem = 240MB # pgtune wizard 2010-01-27
checkpoint_completion_target = 0.7 # pgtune wizard 2010-01-27
effective_cache_size = 2816MB # pgtune wizard 2010-01-27
work_mem = 18MB # pgtune wizard 2010-01-27
wal_buffers = 4MB # pgtune wizard 2010-01-27
checkpoint_segments = 8 # pgtune wizard 2010-01-27
shared_buffers = 960MB # pgtune wizard 2010-01-27
max_connections = 200 # pgtune wizard 2010-01-27

Here is where I'm not doing something right. With shared_buffers at 960MB, I need to raise kern.shminfo.shmmax to 1GB (1073741824) to get postgres to start. I thought I'd also need to adjust the kern.shminfo.shmall value, but that seems to change automatically whenever I adjust kern.shminfo.shmmax.
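The relationship between shared_buffers and the shmmax value that worked can be sketched with a little shell arithmetic. This is a hypothetical helper, not part of the thread; rounding up to a power of two is my assumption about why 1GB turned out to be the value that let postgres start:

```shell
# Hypothetical helper: find the smallest power-of-two kern.shminfo.shmmax
# (in bytes) that covers a given shared_buffers setting.
shared_buffers_mb=960
bytes=$((shared_buffers_mb * 1024 * 1024))   # 1006632960 bytes
shmmax=1
while [ "$shmmax" -lt "$bytes" ]; do
  shmmax=$((shmmax * 2))
done
echo "$shmmax"
# With 960MB this prints 1073741824, the 1GB value described above.
# To apply (as root): sysctl kern.shminfo.shmmax=$shmmax
```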
$ sysctl -a | grep kern.s
kern.securelevel=1
kern.saved_ids=1
kern.somaxconn=128
kern.sominconn=80
kern.sysvmsg=1
kern.sysvsem=1
kern.sysvshm=1
kern.stackgap_random=262144
kern.splassert=1
kern.seminfo.semmni=256
kern.seminfo.semmns=2048
kern.seminfo.semmnu=30
kern.seminfo.semmsl=60
kern.seminfo.semopm=100
kern.seminfo.semume=10
kern.seminfo.semusz=100
kern.seminfo.semvmx=32767
kern.seminfo.semaem=16384
kern.shminfo.shmmax=1073741824
kern.shminfo.shmmin=1
kern.shminfo.shmmni=128
kern.shminfo.shmseg=128
kern.shminfo.shmall=262144

At these values postgres will start, and top shows a large amount of memory still free:

Memory: Real: 55M/465M act/tot Free: 3433M Swap: 0K/8197M used/tot

Running a simple select-only pgbench test against it will fail with an out-of-memory error as it tries to vacuum --analyze the newly created database with 7500000 tuples.

pgbench -i -s 75 -h varley.openvistas.net -U _postgresql pgbench
vacuumdb --analyze -h varley.openvistas.net -U _postgresql pgbench

When I run this with top refreshing every second, I never see the free memory drop below 3400M, so I'm not sure what memory we are running out of. systat -i shows similar amounts of memory still free. The postgresql logs can perhaps shed some light on this for someone more knowledgeable than myself.
Here is what is in the log immediately prior to the out of memory error:

2010-01-27 14:07:26.326319500 TopMemoryContext: 60712 total in 7 blocks; 4488 free (8 chunks); 56224 used
2010-01-27 14:07:26.326374500 TopTransactionContext: 8192 total in 1 blocks; 5408 free (0 chunks); 2784 used
2010-01-27 14:07:26.326389500 Operator class cache: 8192 total in 1 blocks; 3848 free (0 chunks); 4344 used
2010-01-27 14:07:26.326412500 MessageContext: 8192 total in 1 blocks; 5488 free (1 chunks); 2704 used
2010-01-27 14:07:26.326434500 smgr relation table: 8192 total in 1 blocks; 2816 free (0 chunks); 5376 used
2010-01-27 14:07:26.326440500 TransactionAbortContext: 32768 total in 1 blocks; 32752 free (0 chunks); 16 used
2010-01-27 14:07:26.326462500 Portal hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
2010-01-27 14:07:26.326469500 PortalMemory: 8192 total in 1 blocks; 8040 free (0 chunks); 152 used
2010-01-27 14:07:26.326490500 PortalHeapMemory: 15360 total in 4 blocks; 7944 free (12 chunks); 7416 used
2010-01-27 14:07:26.326496500 ExecutorState: 8192 total in 1 blocks; 7928 free (0 chunks); 264 used
2010-01-27 14:07:26.326517500 ExprContext: 0 total in 0 blocks; 0 free (0 chunks); 0 used
2010-01-27 14:07:26.326539500 TupleSort: 24600 total in 2 blocks; 7584 free (0 chunks); 17016 used
2010-01-27 14:07:26.326562500 TupleSort: 92266520 total in 17 blocks; 8379568 free (9 chunks); 83886952 used
2010-01-27 14:07:26.326584500 Relcache by OID: 8192 total in 1 blocks; 2856 free (0 chunks); 5336 used
2010-01-27 14:07:26.326624500 CacheMemoryContext: 667696 total in 20 blocks; 235240 free (7 chunks); 432456 used
2010-01-27 14:07:26.326646500 pgbench_accounts_pkey: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.326652500 pg_constraint_contypid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
2010-01-27 14:07:26.326674500 pg_constraint_conrelid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
2010-01-27 14:07:26.326681500 pg_constraint_conname_nsp_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
2010-01-27 14:07:26.326703500 pg_shdepend_reference_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
2010-01-27 14:07:26.326724500 pg_index_indrelid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
2010-01-27 14:07:26.326730500 pg_inherits_relid_seqno_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
2010-01-27 14:07:26.326752500 pg_description_o_c_o_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
2010-01-27 14:07:26.326758500 pg_shdepend_depender_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
2010-01-27 14:07:26.326780500 pg_depend_reference_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
2010-01-27 14:07:26.326802500 pg_depend_depender_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
2010-01-27 14:07:26.326809500 pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 152 free (0 chunks); 872 used
2010-01-27 14:07:26.326830500 pg_foreign_data_wrapper_name_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.326837500 pg_enum_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.326860500 pg_class_relname_nsp_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
2010-01-27 14:07:26.326866500 pg_foreign_server_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.326888500 pg_statistic_relid_att_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
2010-01-27 14:07:26.326909500 pg_cast_source_target_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
2010-01-27 14:07:26.326915500 pg_language_name_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.326937500 pg_authid_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
2010-01-27 14:07:26.326943500 pg_amop_fam_strat_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
2010-01-27 14:07:26.326966500 pg_index_indexrelid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
2010-01-27 14:07:26.326986500 pg_ts_template_tmplname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
2010-01-27 14:07:26.326993500 pg_ts_config_map_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
2010-01-27 14:07:26.327014500 pg_opclass_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
2010-01-27 14:07:26.327021500 pg_foreign_data_wrapper_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327044500 pg_auth_members_member_role_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
2010-01-27 14:07:26.327065500 pg_ts_dict_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327072500 pg_conversion_default_index: 1024 total in 1 blocks; 128 free (0 chunks); 896 used
2010-01-27 14:07:26.327095500 pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 128 free (0 chunks); 896 used
2010-01-27 14:07:26.327116500 pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
2010-01-27 14:07:26.327123500 pg_enum_typid_label_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
2010-01-27 14:07:26.327144500 pg_ts_config_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327151500 pg_user_mapping_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327173500 pg_opfamily_am_name_nsp_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
2010-01-27 14:07:26.327194500 pg_type_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
2010-01-27 14:07:26.327201500 pg_aggregate_fnoid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327222500 pg_constraint_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
2010-01-27 14:07:26.327228500 pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
2010-01-27 14:07:26.327251500 pg_ts_parser_prsname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
2010-01-27 14:07:26.327272500 pg_ts_config_cfgname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
2010-01-27 14:07:26.327278500 pg_ts_parser_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327299500 pg_operator_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327306500 pg_namespace_nspname_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
2010-01-27 14:07:26.327329500 pg_ts_template_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327335500 pg_amop_opr_fam_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
2010-01-27 14:07:26.327357500 pg_ts_dict_dictname_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
2010-01-27 14:07:26.327377500 pg_auth_members_role_member_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
2010-01-27 14:07:26.327384500 pg_type_typname_nsp_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
2010-01-27 14:07:26.327405500 pg_opfamily_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327412500 pg_class_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
2010-01-27 14:07:26.327449500 pg_proc_proname_args_nsp_index: 1024 total in 1 blocks; 192 free (0 chunks); 832 used
2010-01-27 14:07:26.327456500 pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
2010-01-27 14:07:26.327479500 pg_proc_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327499500 pg_language_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327506500 pg_namespace_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
2010-01-27 14:07:26.327527500 pg_database_oid_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
2010-01-27 14:07:26.327534500 pg_amproc_fam_proc_index: 1024 total in 1 blocks; 88 free (0 chunks); 936 used
2010-01-27 14:07:26.327556500 pg_authid_rolname_index: 1024 total in 1 blocks; 304 free (0 chunks); 720 used
2010-01-27 14:07:26.327577500 pg_foreign_server_name_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327584500 pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 240 free (0 chunks); 784 used
2010-01-27 14:07:26.327605500 pg_conversion_oid_index: 1024 total in 1 blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327612500 pg_user_mapping_user_server_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
2010-01-27 14:07:26.327634500 pg_conversion_name_nsp_index: 1024 total in 1 blocks; 280 free (0 chunks); 744 used
2010-01-27 14:07:26.327664500 MdSmgr: 8192 total in 1 blocks; 7960 free (0 chunks); 232 used
2010-01-27 14:07:26.327671500 LOCALLOCK hash: 8192 total in 1 blocks; 3912 free (0 chunks); 4280 used
2010-01-27 14:07:26.327723500 Timezones: 78520 total in 2 blocks; 5968 free (0 chunks); 72552 used
2010-01-27 14:07:26.327729500 ErrorContext: 8192 total in 1 blocks; 8176 free (6 chunks); 16 used
2010-01-27 14:07:26.327874500 172.16.0.1(22842):_postgresql@pgbench:[17225]:ERROR: out of memory
2010-01-27 14:07:26.327881500 172.16.0.1(22842):_postgresql@pgbench:[17225]:DETAIL: Failed on request of size 67108864.

Jeff Ross
Jeff Ross <jross@wykids.org> writes:
> Running a simple select only pgbench test against it will fail with an out of
> memory error as it tries to vacuum --analyze the newly created database with
> 7500000 tuples.

Better look at the "ulimit" values the postmaster is started with; you shouldn't be getting that out-of-memory error AFAICS, unless there's a very restrictive ulimit on what an individual process can allocate.

regards, tom lane
Tom Lane wrote:
> Jeff Ross <jross@wykids.org> writes:
>> Running a simple select only pgbench test against it will fail with an out of
>> memory error as it tries to vacuum --analyze the newly created database with
>> 7500000 tuples.
>
> Better look at the "ulimit" values the postmaster is started with;
> you shouldn't be getting that out-of-memory error AFAICS, unless
> there's a very restrictive ulimit on what an individual process can
> allocate.
>
> regards, tom lane

Thanks! OpenBSD makes a _postgresql user on install and it is in the daemon class with the following values:

daemon:\
	:ignorenologin:\
	:datasize=infinity:\
	:maxproc=infinity:\
	:openfiles-cur=128:\
	:stacksize-cur=8M:\
	:localcipher=blowfish,8:\
	:tc=default:

The OpenBSD-specific readme suggests making a special postgresql login class and bumping openfiles-cur to 768, but I don't see how that helps here. Would bumping stacksize also help?

Jeff Ross
Jeff Ross <jross@wykids.org> writes:
> Tom Lane wrote:
>> Better look at the "ulimit" values the postmaster is started with;
> OpenBSD makes a _postgresql user on install and it is in the daemon class with
> the following values:
> daemon:\
> 	:ignorenologin:\
> 	:datasize=infinity:\
> 	:maxproc=infinity:\
> 	:openfiles-cur=128:\
> 	:stacksize-cur=8M:\
> 	:localcipher=blowfish,8:\
> 	:tc=default:
> The OpenBSD specific readme suggests making a special postgresql login class
> and bumping openfiles-cur to 768 but I don't see how that helps here. Would
> bumping stacksize also help?

Huh --- that looks okay to me. The stacksize is not what you're hitting (and 8MB is plenty sufficient anyway, at least unless you like recursive functions). I concur with the readme that 128 open files isn't much, but that's not what you're hitting either. The only thing that comes to mind is that on Linux there are several different ulimit values that are related to maximum per-process data space. I don't know BSD very well so I can't say if datasize is the only such value for BSD, but it'd be worth checking. (Hmm, on OS X, which is at least partly BSDish, I see -m and -v in addition to -d, so I'm suspicious OpenBSD might have these concepts too.)

regards, tom lane
On Wed, Jan 27, 2010 at 4:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> related to maximum per-process data space. I don't know BSD very well
> so I can't say if datasize is the only such value for BSD, but it'd be
> worth checking. (Hmm, on OS X which is at least partly BSDish, I see
> -m and -v in addition to -d, so I'm suspicious OpenBSD might have these
> concepts too.)

Isn't the usual advice here to log the ulimit settings from the pg startup script so you can see what they really are for the user at the moment they're starting up the db? (I think some guy named Tom mentioned doing that before.)
2010/1/28 Scott Marlowe <scott.marlowe@gmail.com>:
>> related to maximum per-process data space. I don't know BSD very well
>> so I can't say if datasize is the only such value for BSD, but it'd be
>> worth checking. (Hmm, on OS X which is at least partly BSDish, I see
>> -m and -v in addition to -d, so I'm suspicious OpenBSD might have these
>> concepts too.)
>
> Isn't the usual advice here to log the ulimit settings from the pg
> startup script so you can see what they really are for the user at the
> moment they're starting up the db? (I think some guy named Tom
> mentioned doing that before.)

I think that "su" is enough:

root@testbed:/root # su -l _postgresql
$ ulimit -a
time(cpu-seconds)    unlimited
file(blocks)         unlimited
coredump(blocks)     unlimited
data(kbytes)         1048576
stack(kbytes)        8192
lockedmem(kbytes)    672777
memory(kbytes)       2016764
nofiles(descriptors) 768
processes            1024

I think Jeff will get almost the same values with the login.conf shown above.
--
antonvm
On Tue, Feb 9, 2010 at 3:18 AM, Anton Maksimenkov <anton200@gmail.com> wrote:
> 2010/1/28 Scott Marlowe <scott.marlowe@gmail.com>:
>>> related to maximum per-process data space. I don't know BSD very well
>>> so I can't say if datasize is the only such value for BSD, but it'd be
>>> worth checking. (Hmm, on OS X which is at least partly BSDish, I see
>>> -m and -v in addition to -d, so I'm suspicious OpenBSD might have these
>>> concepts too.)
>>
>> Isn't the usual advice here to log the ulimit settings from the pg
>> startup script so you can see what they really are for the user at the
>> moment they're starting up the db? (I think some guy named Tom
>> mentioned doing that before.)
>
> I think that "su" is enough:

In previous discussions it was mentioned that startup scripts often inherit different settings from the default shell of a user, hence the need to check it from within the startup script.
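The check Scott describes can be sketched like this: record the limits in effect inside the startup script itself, right before the daemon is launched, rather than trusting what an interactive su shell reports. The log path is illustrative, not from the thread:

```shell
# Sketch for the top of the rc script: capture the limits this script
# actually runs with, so a later out-of-memory report can be compared
# against what the daemon truly inherited.
LIMITS_LOG=/tmp/pg-startup-limits.log   # illustrative path
{
  echo "=== limits at startup ==="
  ulimit -a
} > "$LIMITS_LOG" 2>&1
# ...then launch the server as before, e.g.:
# su -l _postgresql -c "pg_ctl start -D /var/postgresql/data ..."
```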
2010/2/9 Scott Marlowe <scott.marlowe@gmail.com>:
> On Tue, Feb 9, 2010 at 3:18 AM, Anton Maksimenkov <anton200@gmail.com> wrote:
>>> Isn't the usual advice here to log the ulimit settings from the pg
>>> startup script so you can see what they really are for the user at the
>>> moment they're starting up
>> I think that "su" is enough:
> In previous discussions it was mentioned that startup scripts often
> inherit different settings from the default shell of a user, hence the

Hm. My postgres starts with the following in /etc/rc.local:

...
if [ -x /usr/local/bin/pg_ctl ]; then
	echo -n ' postgresql'
	su -l _postgresql -c "nohup /usr/local/bin/pg_ctl start \
		-D /var/postgresql/data -l /var/postgresql/logfile \
		-o '-i -D /var/postgresql/data' >/dev/null"
fi
...

I start it manually by copy/pasting the text above.

But I want to explain something, just to be clear. On OpenBSD/i386 we have 4G of virtual address space. 768M of it is used for kernel virtual space, so 4G - 768M = about 3.2G is userspace. Userspace itself is divided into two parts. The first part, about 1G, is reserved for the old *brk() interface. The rest, 3.2G - 1G = about 2.2G, is the virtual space used by mmap, shm*, and malloc (which uses mmap).

That means on OpenBSD/i386 we have about 2.2G of virtual space for malloc and shm*, so postgres will use that space. But mmap() uses random addresses, so when you get a big chunk of memory for shared buffers (say, 2G), you may get it somewhere in the middle of that 2.2G of virtual space. In the worst case it might be:

100Mb free space from (start) to (start+100Mb)
2G shared memory chunk from (start+100Mb) to (start+100Mb+2G)
100Mb free space from (start+100Mb+2G) to (end)

After that, if you want to allocate more than 100Mb in one big chunk, you can't. This is why postgres may fail at "vacuum" or "dump" or somewhere else: it can happen whenever postgres tries to allocate one big chunk.

Can anybody briefly explain how one postgres process allocates the memory it needs?
I mean, what is the biggest size of malloc() it may want? How many such chunks? What is the average size of allocations? -- antonvm
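Anton's worst case can be checked with toy arithmetic, using his rough figures (~2.2G of usable mmap space, a ~2G shared-buffers segment landing dead center); the exact MB values are approximations, not measurements:

```shell
# Toy model of the worst-case layout described above (approximate MB figures).
total_mb=2200   # ~2.2G of usable mmap space on OpenBSD/i386
shm_mb=2000     # ~2G shared-buffers segment, randomly placed by mmap
left_mb=$(( (total_mb - shm_mb) / 2 ))   # free space on each side, worst case
echo "largest contiguous free chunk: ${left_mb}MB"
# Any single allocation larger than this fails, even though ~200MB is "free".
```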
On Tue, Feb 09, 2010 at 08:19:51PM +0500, Anton Maksimenkov wrote:
> It means that on openbsd i386 we have about 2.2G of virtual space for
> malloc, shm*. So, postgres will use that space.
>
> But mmap() use random addresses. So when you get big chunk of memory
> for shared buffers (say, 2G) then you may get it somewhere in middle
> of virtual space (2.2G).

This is essentially the reason why it's not a good idea to use really large amounts of shared_buffers on 32-bit systems: there isn't the address space to support it.

> Can anybody briefly explain me how one postgres process allocate
> memory for it needs?
> I mean, what is the biggest size of malloc() it may want? How many
> such chunks? What is the average size of allocations?

There's no real maximum, as it depends on the exact usage. However, in general postgres tries to keep below the values in work_mem and maintenance_work_mem. Most of the allocations are quite small, but postgresql has an internal allocator which means that the system only sees relatively large allocations. The majority will be in the order of tens of kilobytes, I suspect.

Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.
2010/2/10 Martijn van Oosterhout <kleptog@svana.org>:
>> Can anybody briefly explain me how one postgres process allocate
>> memory for it needs?
>
> There's no real maximum, as it depends on the exact usage. However, in
> general postgres tries to keep below the values in work_mem and
> maintenance_work_mem. Most of the allocations are quite small, but
> postgresql has an internal allocator which means that the system only
> sees relatively large allocations.

These "relatively large allocations" are exactly what I mean. What size are they? Is it right to say that these allocations are work_mem size, or temp_buffers size, or maintenance_work_mem size? Or something like that.
--
antonvm
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Tue, Feb 09, 2010 at 08:19:51PM +0500, Anton Maksimenkov wrote:
>> Can anybody briefly explain me how one postgres process allocate
>> memory for it needs?
>
> There's no real maximum, as it depends on the exact usage. However, in
> general postgres tries to keep below the values in work_mem and
> maintenance_work_mem. Most of the allocations are quite small, but
> postgresql has an internal allocator which means that the system only
> sees relatively large allocations. The majority will be in the order of
> tens of kilobytes I suspect.

IIRC, the complaint that started this thread was about a VACUUM command failing. Plain VACUUM will in fact start out by trying to acquire a single chunk of size maintenance_work_mem. (On a small table it might not be so greedy, but on a large table it will do that.) So you probably shouldn't ever try to set that value as large as 1GB if you're working in a 32-bit address space. You could maybe do it if you've kept shared_buffers small, but that seems like the wrong performance tradeoff in most cases ...

regards, tom lane
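Tom's point can be turned into a rough budget check. The ~2.2G usable-address-space figure comes from Anton's earlier message, and treating the budget as a simple sum (ignoring fragmentation) is my simplification:

```shell
# Rough 32-bit address-space budget using the settings from this thread.
usable_mb=2200               # ~usable mmap space on OpenBSD/i386 (from Anton)
shared_buffers_mb=960        # pgtune's suggestion
maintenance_work_mem_mb=240  # pgtune's suggestion; VACUUM may grab this in one chunk
headroom_mb=$((usable_mb - shared_buffers_mb - maintenance_work_mem_mb))
echo "headroom after the big chunks: ${headroom_mb}MB"
# Fragmentation can make the largest allocatable chunk much smaller than this.
```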
Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
>> On Tue, Feb 09, 2010 at 08:19:51PM +0500, Anton Maksimenkov wrote:
>>> Can anybody briefly explain me how one postgres process allocate
>>> memory for it needs?
>>
>> There's no real maximum, as it depends on the exact usage. However, in
>> general postgres tries to keep below the values in work_mem and
>> maintenance_work_mem. Most of the allocations are quite small, but
>> postgresql has an internal allocator which means that the system only
>> sees relatively large allocations. The majority will be in the order of
>> tens of kilobytes I suspect.
>
> IIRC, the complaint that started this thread was about a VACUUM command
> failing. Plain VACUUM will in fact start out by trying to acquire a
> single chunk of size maintenance_work_mem. (On a small table it might
> not be so greedy, but on a large table it will do that.) So you
> probably shouldn't ever try to set that value as large as 1GB if you're
> working in a 32-bit address space. You could maybe do it if you've kept
> shared_buffers small, but that seems like the wrong performance
> tradeoff in most cases ...
>
> regards, tom lane
That would have been my original message.
I've been running a series of pgbench tests on an i386 dual-processor Xeon server with 4G of RAM and a RAID10 disk array on an LSI MegaRAID w/BBU controller. I fixed the original problem by re-enabling better login.conf values for the postgresql user.
I ran the pgtune wizard and started with the settings I got from that. On i386 OpenBSD the recommended settings are far too large and cause a kernel panic in short order. Here are the settings that pgtune gives for -T web and -c 200:
maintenance_work_mem = 240MB # pgtune wizard 2010-02-10
effective_cache_size = 2816MB # pgtune wizard 2010-02-10
work_mem = 18MB # pgtune wizard 2010-02-10
wal_buffers = 4MB # pgtune wizard 2010-02-10
checkpoint_segments = 8 # pgtune wizard 2010-02-10
shared_buffers = 960MB # pgtune wizard 2010-02-10
max_connections = 200 # pgtune wizard 2010-02-10
I've been whittling that back and have got down to this:
maintenance_work_mem = 240MB # pgtune wizard 2010-01-27
checkpoint_completion_target = 0.7 # pgtune wizard 2010-01-27
effective_cache_size = 2816MB # pgtune wizard 2010-01-27
work_mem = 18MB # pgtune wizard 2010-01-27
wal_buffers = 4MB # pgtune wizard 2010-01-27
checkpoint_segments = 8 # pgtune wizard 2010-01-27
full_page_writes = off
synchronous_commit = off
max_connections = 100
shared_buffers = 250MB # pgtune wizard 2010-01-27
work_mem = 64MB
temp_buffers = 32MB
checkpoint_segments = 32
Additionally, in OpenBSD's sysctl.conf I have this set:
kern.maxproc=10240
kern.maxfiles=20480
kern.shminfo.shmseg=32
kern.seminfo.semmni=256
kern.seminfo.semmns=2048
kern.shminfo.shmmax=283115520
kern.maxvnodes=6000
kern.bufcachepercent=70
The kern.shminfo.shmmax value is just enough to let postgresql start. kern.bufcachepercent=70 matches the effective_cache_size value.
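As a sanity check, 283115520 bytes is exactly 270MB, i.e. the 250MB of shared_buffers plus about 20MB for PostgreSQL's other shared structures. The 250MB + 20MB split is my inference from the numbers, not something stated in the thread:

```shell
# Verify that the shmmax from sysctl.conf above is shared_buffers plus ~20MB.
shmmax=283115520                 # kern.shminfo.shmmax from sysctl.conf
shared_buffers_mb=250
overhead_mb=20                   # inferred: locks and other shared structures
need=$(( (shared_buffers_mb + overhead_mb) * 1024 * 1024 ))
echo "$need"
# 270MB works out to 283115520 bytes, matching shmmax exactly.
```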
pgbench is run with this:
pgbench -h varley.openvistas.net -U _postgresql -t 20000 -c $SCALE pgbench
with scale starting at 10 and then incrementing by 10. I call it three times for each scale. I've turned on logging to 'all' to try and help figure out where the system panics, so that may lower the TPS somewhat but I have not been very favorably impressed with the speed of these U320 15K disks in RAID10 yet.
Scale 10:
tps = 644.152616 (including connections establishing)
tps = 644.323919 (excluding connections establishing)
tps = 644.032366 (including connections establishing)
tps = 644.219732 (excluding connections establishing)
tps = 659.320222 (including connections establishing)
tps = 659.506025 (excluding connections establishing)

Scale 20:
tps = 643.830650 (including connections establishing)
tps = 644.001003 (excluding connections establishing)
tps = 631.357346 (including connections establishing)
tps = 631.538591 (excluding connections establishing)
tps = 629.035682 (including connections establishing)
tps = 629.245788 (excluding connections establishing)

Scale 30:
tps = 571.640243 (including connections establishing)
tps = 571.777080 (excluding connections establishing)
tps = 565.742963 (including connections establishing)
tps = 565.888874 (excluding connections establishing)
tps = 564.058710 (including connections establishing)
tps = 564.203138 (excluding connections establishing)

Scale 40:
tps = 525.018290 (including connections establishing)
tps = 525.132745 (excluding connections establishing)
tps = 515.277398 (including connections establishing)
tps = 515.419313 (excluding connections establishing)
tps = 513.006317 (including connections establishing)
tps = 513.129971 (excluding connections establishing)

Scale 50:
tps = 468.323275 (including connections establishing)
tps = 468.415751 (excluding connections establishing)
tps = 453.100701 (including connections establishing)
tps = 453.201980 (excluding connections establishing)
tps = 461.587221 (including connections establishing)
tps = 461.739929 (excluding connections establishing)

Scale 60:

tps = 450.277550 (including connections establishing)
tps = 450.365946 (excluding connections establishing)
tps = 453.268713 (including connections establishing)
tps = 453.363862 (excluding connections establishing)
tps = 448.965514 (including connections establishing)
tps = 449.060461 (excluding connections establishing)

At Scale 70, the kernel panics with a "panic: malloc: out of space in kmem_map" error. The last few lines of the logs before the panic reveal nothing out of the ordinary to me:

2010-02-10 10:58:07.863133500 172.16.0.1(43152):_postgresql@pgbench:[16586]:LOG: statement: UPDATE pgbench_tellers SET tbalance = tbalance + -4197 WHERE tid = 328;
2010-02-10 10:58:07.863139500 172.16.0.1(40518):_postgresql@pgbench:[25686]:LOG: statement: UPDATE pgbench_accounts SET abalance = abalance + 1254 WHERE aid = 3832418;
2010-02-10 10:58:07.863150500 172.16.0.1(25655):_postgresql@pgbench:[4335]:LOG: statement: SELECT abalance FROM pgbench_accounts WHERE aid = 208539;
2010-02-10 10:58:07.863156500 172.16.0.1(25655):_postgresql@pgbench:[4335]:LOG: duration: 0.532 ms
2010-02-10 10:58:07.863161500 172.16.0.1(40496):_postgresql@pgbench:[4200]:LOG: duration: 23.825 ms
2010-02-10 10:58:07.863178500 172.16.0.1(15183):_postgresql@pgbench:[12518]:LOG: statement: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (268, 38, 4052674, 2403, CURRENT_TIMESTAMP);
2010-02-10 10:58:07.863184500 172.16.0.1(15183):_postgresql@pgbench:[12518]:LOG: duration: 0.396 ms

I have not yet tried a connection pooler since the panic happens with only 70 clients connected, but just for fun I'm going to install pgbouncer and run the set again.
I've been told that amd64 OpenBSD will not have this problem; however, the amd64 kernel will not run on this particular server. I think that means I'm going to be shopping for an Opteron-based server before long.

Thanks to all!

Jeff Ross
Jeff Ross wrote:
> pgbench is run with this:
> pgbench -h varley.openvistas.net -U _postgresql -t 20000 -c $SCALE pgbench
> with scale starting at 10 and then incrementing by 10. I call it
> three times for each scale. I've turned on logging to 'all' to try
> and help figure out where the system panics, so that may lower the TPS
> somewhat but I have not been very favorably impressed with the speed
> of these U320 15K disks in RAID10 yet.

"-c" sets the number of clients active at once. pgbench has a database scale option when you're initializing, "-s", that sets how many records are in the tables, and therefore how large the database is. If you don't set the scale at least as large as the client count (that is, if "-c" > "-s"), you'll get bad performance results. The way you're saying scale but changing the client numbers is a little confusing.

I can't comment on whether yours are good or bad numbers without knowing the actual database scale number. When reporting a pgbench result, it's handy to include the complete output from one of the runs, just so people can see exactly what test was run. After that you can just show the TPS values. Showing the command used to initialize the pgbench database can also be helpful.

--
Greg Smith  2ndQuadrant  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com
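Greg's rule can be wired into a benchmark script as a simple guard. This is a sketch only: the variable names mirror Jeff's script, the specific values are assumed, and the threshold logic is just Greg's advice made executable:

```shell
# Guard: refuse to run the benchmark when clients outnumber the scale
# the database was initialized with, since -c > -s skews the results.
INIT_SCALE=100   # the -s value used at "pgbench -i" time (assumed)
CLIENTS=70       # the -c value for the test run (assumed)
if [ "$CLIENTS" -gt "$INIT_SCALE" ]; then
  echo "refusing: -c ($CLIENTS) exceeds -s ($INIT_SCALE)" >&2
  exit 1
fi
echo "ok: scale $INIT_SCALE covers $CLIENTS clients"
```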
Greg Smith wrote:
> Jeff Ross wrote:
>> pgbench is run with this:
>> pgbench -h varley.openvistas.net -U _postgresql -t 20000 -c $SCALE pgbench
>> with scale starting at 10 and then incrementing by 10. I call it
>> three times for each scale. I've turned on logging to 'all' to try
>> and help figure out where the system panics, so that may lower the
>> TPS somewhat but I have not been very favorably impressed with the
>> speed of these U320 15K disks in RAID10 yet.
>
> "-c" sets the number of clients active at once. pgbench has a
> database scale option when you're initializing, "-s", that sets how
> many records are in the tables, and therefore how large the database
> is. If you don't set the scale to a larger number, so that "-c" >
> "-s", you'll get bad performance results. The way you're saying scale
> but changing the client numbers is a little confusing.

My bad. I know that more information is better than too little. I think I'm doing it right. Here's the whole script. I run it from another server on the LAN.
#!/bin/sh
# use this to automate pg_bench load
MAX_SCALE=150
HOST=varley.openvistas.net
SCALE=70

rm -rf /tmp/pgbench_results_scale*

while [ $SCALE -ne $MAX_SCALE ] ; do
    pgbench -i -s $SCALE -h $HOST -U _postgresql pgbench | tee -a /tmp/pgbench_results_scale_$SCALE.txt
    vacuumdb --analyze -h $HOST -U _postgresql pgbench | tee -a /tmp/pgbench_results_scale_$SCALE.txt
    psql -h $HOST -U _postgresql -c "checkpoint;" pgbench | tee -a /tmp/pgbench_results_scale_$SCALE.txt
    psql -h $HOST -U _postgresql -c "SELECT relname, reltuples, pg_size_pretty(relpages * 8192) as size FROM pg_class \
        where relname like ('%account%');" pgbench | tee -a /tmp/pgbench_results_scale_$SCALE.txt
    psql -h $HOST -U _postgresql -c "select pg_size_pretty(pg_database_size(oid)) from pg_database \
        where datname = 'pgbench';" pgbench | tee -a /tmp/pgbench_results_scale_$SCALE.txt
    time pgbench -h $HOST -U _postgresql -t 20000 -c $SCALE pgbench | tee -a /tmp/pgbench_results_scale_$SCALE.txt
    time pgbench -h $HOST -U _postgresql -t 20000 -c $SCALE pgbench | tee -a /tmp/pgbench_results_scale_$SCALE.txt
    time pgbench -h $HOST -U _postgresql -t 20000 -c $SCALE pgbench | tee -a /tmp/pgbench_results_scale_$SCALE.txt
    cat /tmp/pgbench_results_scale_$SCALE.txt | mail -s "Results for Scale $SCALE" jross@openvistas.net
    let SCALE=$SCALE+10
done

> I can't comment on whether yours are good or bad numbers without
> knowing the actual database scale number. When reporting a pgbench
> result, it's handy to include the complete output from one of the
> runs, just so people can see exactly what test was run. After that
> you can just show the TPS values. Showing the command used to
> initialize the pgbench database can also be helpful.

And here's the whole output of scale = 70. It made it through this last time without panicking, but then did panic at scale = 80.
CHECKPOINT
        relname        |  reltuples  |  size
-----------------------+-------------+--------
 pgbench_accounts_pkey |       7e+06 | 120 MB
 pgbench_accounts      | 6.99985e+06 | 868 MB
(2 rows)

 pg_size_pretty
----------------
 994 MB
(1 row)

transaction type: TPC-B (sort of)
scaling factor: 70
query mode: simple
number of clients: 70
number of transactions per client: 20000
number of transactions actually processed: 1400000/1400000
tps = 293.081245 (including connections establishing)
tps = 293.124705 (excluding connections establishing)

transaction type: TPC-B (sort of)
scaling factor: 70
query mode: simple
number of clients: 70
number of transactions per client: 20000
number of transactions actually processed: 1400000/1400000
tps = 332.154154 (including connections establishing)
tps = 332.219275 (excluding connections establishing)

transaction type: TPC-B (sort of)
scaling factor: 70
query mode: simple
number of clients: 70
number of transactions per client: 20000
number of transactions actually processed: 1400000/1400000
tps = 354.983013 (including connections establishing)
tps = 355.181403 (excluding connections establishing)
Jeff Ross wrote:
> I think I'm doing it right. Here's the whole script. I run it from
> another server on the lan.

That looks basically sane--your description was wrong, not your program, which is always better than the other way around.

Note that everything your script is doing, and much more, is done quite easily with pgbench-tools: http://git.postgresql.org/gitweb?p=pgbench-tools.git;a=summary

You can just give it a list of scales and client counts you want to test and let it loose; it will generate graphs showing TPS vs. scale/clients and everything, if gnuplot is available.

> transaction type: TPC-B (sort of)
> scaling factor: 70
> query mode: simple
> number of clients: 70
> number of transactions per client: 20000
> number of transactions actually processed: 1400000/1400000
> tps = 293.081245 (including connections establishing)
> tps = 293.124705 (excluding connections establishing)

This is way more clients than your server is going to handle well on pgbench's TPC-B test, which is primarily a test of hard disk write speed, but it will get bogged down by client contention in many conditions. Performance degrades considerably as the number of clients increases much past the number of cores in the server; typically 1.5 to 2X as many clients as cores gives peak throughput.

I'm not sure what's causing your panic--not enough BSD practice. But I think Tom's suggestion of vastly decreasing

maintenance_work_mem = 240MB

is worth trying. Reducing it won't hurt pgbench performance on quick tests, just how long it takes to get the tests set up.

Sorry about pgtune being a bit aggressive in what it suggests--it's on the TODO list to scale it back, and hopefully provide more helpful suggestions for kernel tuning too.

-- 
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.com
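For what it's worth, a pgbench-tools sweep over multiple scales and client counts is driven by a small sourced-shell config file. A fragment might look like the following sketch; the variable names are from my reading of the project and should be checked against an actual checkout:

```shell
# Hypothetical pgbench-tools config fragment (verify names against the
# project's own sample config before relying on them).
SCALES="10 25 50 75"           # database scales to initialize and test
SETCLIENTS="1 2 4 8 16 32"     # client counts to run at each scale
SETTIMES=3                     # repeat each scale/client combination 3 times
```

With something like this in place, one invocation covers the whole grid that the hand-written script above loops over.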
Greg Smith wrote:
> Jeff Ross wrote:
>> I think I'm doing it right. Here's the whole script. I run it from
>> another server on the lan.
>
> That looks basically sane--your description was wrong, not your
> program, which is always better than the other way around.
>
> Note that everything your script is doing and way more is done quite
> easily with pgbench-tools:
> http://git.postgresql.org/gitweb?p=pgbench-tools.git;a=summary
>
> You can just dump a list of scales and client counts you want to test
> and let that loose, it will generate graphs showing TPS vs.
> scale/clients and everything if gnuplot is available.

Cool! I'll get gnuplot installed and have some runs going before long.

>> transaction type: TPC-B (sort of)
>> scaling factor: 70
>> query mode: simple
>> number of clients: 70
>> number of transactions per client: 20000
>> number of transactions actually processed: 1400000/1400000
>> tps = 293.081245 (including connections establishing)
>> tps = 293.124705 (excluding connections establishing)
>
> This is way more clients than your server is going to handle well on
> pgbench's TPC-B test, which is primarily a test of hard disk write
> speed but it will get bogged down with client contention in many
> conditions. Performance degrades considerably as the number of
> clients increases much past the number of cores in the server;
> typically 1.5 to 2X as many clients as cores gives peak throughput.
>
> I'm not sure what's causing your panic--not enough BSD practice. But
> I think Tom's suggestion of vastly decreasing from:
>
> maintenance_work_mem = 240MB
>
> Is worth trying. Reducing it won't hurt pgbench performance on quick
> tests, just how long it takes to get the tests setup.

Okay, I'll try that. Hopefully if I can get it to run well under pgbench, the same setup will work well with drupal. The site I was worried about when I went to this bigger server has started a little more slowly than originally projected, so the old server is handling the load.
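Concretely, the change I'll try in postgresql.conf looks something like this; the 16MB value is my own starting guess, since the suggestion was just to decrease the setting substantially:

```
# Sketch of the suggested change; 16MB is an assumed starting point,
# not a figure anyone in the thread recommended.
maintenance_work_mem = 16MB    # was 240MB per the pgtune wizard
```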
> Sorry about pgtune being a bit aggressive in what it suggests--on the
> TODO list to scale it back, and hopefully provide more helpful
> suggestions for kernel tuning too.

One of my goals (once I get all this figured out) is to provide some guidelines on how best to configure OpenBSD to run PostgreSQL.

Thanks for all of your work and tools, Greg!

Jeff Ross
Jeff Ross wrote:
> Hopefully if I can get it to run well under pgbench the same setup
> will work well with drupal. The site I was worried about when I went
> to this bigger server has started a little slower than originally
> projected so the old server is handling the load.

The standard TPC-B-like test pgbench runs is not even remotely like what Drupal will do on your server. It's almost a write-only test--the few things that are read in are also being written. I'd suggest you explore pgbench using the "-S" flag (or what pgbench-tools calls select.sql) to generate select-only loads at various sizes and client counts, to get an idea of what your server will do under loads more like a web application.

-- 
Greg Smith 2ndQuadrant US Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.us
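Following that suggestion, a select-only sweep could look something like the sketch below. It echoes the command lines rather than executing them, so it can be sanity-checked without a running server; drop the echo to actually run the tests (host and user match the earlier script):

```shell
#!/bin/sh
# Sketch: print pgbench select-only (-S) runs at several client counts.
# Remove "echo" to execute for real; client counts chosen per the
# 1.5-2x-cores rule of thumb discussed above, plus a couple larger values.
HOST=varley.openvistas.net
for CLIENTS in 2 4 8 16; do
    echo pgbench -S -h $HOST -U _postgresql -t 20000 -c $CLIENTS pgbench
done
```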