Thread: Memory Usage and OpenBSD

Memory Usage and OpenBSD

From
Jeff Ross
Date:
I'm not getting something about the best way to set up a server using
PostgreSQL as a backend for a busy web server running Drupal.

The PostgreSQL performance folks
     http://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
say that on a server with more than 1GB of RAM,

"a reasonable starting value for shared_buffers is 1/4 of the memory in your
system."

Okay, this server has 4GB of RAM.  pgtune suggests the following values for
predominantly web-based usage:

maintenance_work_mem = 240MB # pgtune wizard 2010-01-27
checkpoint_completion_target = 0.7 # pgtune wizard 2010-01-27
effective_cache_size = 2816MB # pgtune wizard 2010-01-27
work_mem = 18MB # pgtune wizard 2010-01-27
wal_buffers = 4MB # pgtune wizard 2010-01-27
checkpoint_segments = 8 # pgtune wizard 2010-01-27
shared_buffers = 960MB # pgtune wizard 2010-01-27
max_connections = 200 # pgtune wizard 2010-01-27
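
(For reference, the pgtune invocation behind numbers like these would have been
something along the lines shown below; -T and -c are the type and
connection-count flags mentioned later in the thread, and the config file paths
are only illustrative.)

pgtune -i /var/postgresql/data/postgresql.conf \
       -o /tmp/postgresql.conf.pgtune \
       -T web -c 200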

Here is where I'm not doing something right.  With my shared_buffers at 960MB,
I need to raise kern.shminfo.shmmax to 1GB (1073741824) to get postgres to
start.  I thought I'd need to adjust the kern.shminfo.shmall value as well,
but that seems to change automatically whenever I adjust kern.shminfo.shmmax.

$ sysctl -a | grep kern.s

kern.securelevel=1
kern.saved_ids=1
kern.somaxconn=128
kern.sominconn=80
kern.sysvmsg=1
kern.sysvsem=1
kern.sysvshm=1
kern.stackgap_random=262144
kern.splassert=1
kern.seminfo.semmni=256
kern.seminfo.semmns=2048
kern.seminfo.semmnu=30
kern.seminfo.semmsl=60
kern.seminfo.semopm=100
kern.seminfo.semume=10
kern.seminfo.semusz=100
kern.seminfo.semvmx=32767
kern.seminfo.semaem=16384
kern.shminfo.shmmax=1073741824
kern.shminfo.shmmin=1
kern.shminfo.shmmni=128
kern.shminfo.shmseg=128
kern.shminfo.shmall=262144
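
(To make that change survive a reboot it also goes in /etc/sysctl.conf; a
minimal sketch with the 1GB value used here:)

# /etc/sysctl.conf
# allow a shared memory segment large enough for shared_buffers (1GB here)
kern.shminfo.shmmax=1073741824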

At these values postgres will start and top shows a large amount of memory
still free:
Memory: Real: 55M/465M act/tot  Free: 3433M  Swap: 0K/8197M used/tot

Running a simple select-only pgbench test against it will fail with an out of
memory error as it tries to vacuum --analyze the newly created database with
7,500,000 tuples.

pgbench -i -s 75 -h varley.openvistas.net -U _postgresql pgbench
vacuumdb --analyze -h varley.openvistas.net -U _postgresql pgbench

When I run this with top refreshing every second, I never see the free
memory drop below 3400M, so I'm not sure what memory we are running out of.
systat -i shows similar amounts of memory still free.

The postgresql logs can perhaps shed some light on this for someone more
knowledgeable than myself.  Here is what is in the log immediately prior to
the out of memory error:

2010-01-27 14:07:26.326319500 TopMemoryContext: 60712 total in 7 blocks; 4488
free (8 chunks); 56224 used
2010-01-27 14:07:26.326374500   TopTransactionContext: 8192 total in 1 blocks;
5408 free (0 chunks); 2784 used
2010-01-27 14:07:26.326389500   Operator class cache: 8192 total in 1 blocks;
3848 free (0 chunks); 4344 used
2010-01-27 14:07:26.326412500   MessageContext: 8192 total in 1 blocks; 5488
free (1 chunks); 2704 used
2010-01-27 14:07:26.326434500   smgr relation table: 8192 total in 1 blocks;
2816 free (0 chunks); 5376 used
2010-01-27 14:07:26.326440500   TransactionAbortContext: 32768 total in 1
blocks; 32752 free (0 chunks); 16 used
2010-01-27 14:07:26.326462500   Portal hash: 8192 total in 1 blocks; 3912 free
(0 chunks); 4280 used
2010-01-27 14:07:26.326469500   PortalMemory: 8192 total in 1 blocks; 8040
free (0 chunks); 152 used
2010-01-27 14:07:26.326490500     PortalHeapMemory: 15360 total in 4 blocks;
7944 free (12 chunks); 7416 used
2010-01-27 14:07:26.326496500       ExecutorState: 8192 total in 1 blocks;
7928 free (0 chunks); 264 used
2010-01-27 14:07:26.326517500         ExprContext: 0 total in 0 blocks; 0 free
(0 chunks); 0 used
2010-01-27 14:07:26.326539500       TupleSort: 24600 total in 2 blocks; 7584
free (0 chunks); 17016 used
2010-01-27 14:07:26.326562500       TupleSort: 92266520 total in 17 blocks;
8379568 free (9 chunks); 83886952 used
2010-01-27 14:07:26.326584500   Relcache by OID: 8192 total in 1 blocks; 2856
free (0 chunks); 5336 used
2010-01-27 14:07:26.326624500   CacheMemoryContext: 667696 total in 20 blocks;
235240 free (7 chunks); 432456 used
2010-01-27 14:07:26.326646500     pgbench_accounts_pkey: 1024 total in 1
blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.326652500     pg_constraint_contypid_index: 1024 total in
1 blocks; 304 free (0 chunks); 720 used
2010-01-27 14:07:26.326674500     pg_constraint_conrelid_index: 1024 total in
1 blocks; 304 free (0 chunks); 720 used
2010-01-27 14:07:26.326681500     pg_constraint_conname_nsp_index: 1024 total
in 1 blocks; 240 free (0 chunks); 784 used
2010-01-27 14:07:26.326703500     pg_shdepend_reference_index: 1024 total in 1
blocks; 240 free (0 chunks); 784 used
2010-01-27 14:07:26.326724500     pg_index_indrelid_index: 1024 total in 1
blocks; 304 free (0 chunks); 720 used
2010-01-27 14:07:26.326730500     pg_inherits_relid_seqno_index: 1024 total in
1 blocks; 280 free (0 chunks); 744 used
2010-01-27 14:07:26.326752500     pg_description_o_c_o_index: 1024 total in 1
blocks; 152 free (0 chunks); 872 used
2010-01-27 14:07:26.326758500     pg_shdepend_depender_index: 1024 total in 1
blocks; 88 free (0 chunks); 936 used
2010-01-27 14:07:26.326780500     pg_depend_reference_index: 1024 total in 1
blocks; 152 free (0 chunks); 872 used
2010-01-27 14:07:26.326802500     pg_depend_depender_index: 1024 total in 1
blocks; 152 free (0 chunks); 872 used
2010-01-27 14:07:26.326809500     pg_opclass_am_name_nsp_index: 1024 total in
1 blocks; 152 free (0 chunks); 872 used
2010-01-27 14:07:26.326830500     pg_foreign_data_wrapper_name_index: 1024
total in 1 blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.326837500     pg_enum_oid_index: 1024 total in 1 blocks;
344 free (0 chunks); 680 used
2010-01-27 14:07:26.326860500     pg_class_relname_nsp_index: 1024 total in 1
blocks; 240 free (0 chunks); 784 used
2010-01-27 14:07:26.326866500     pg_foreign_server_oid_index: 1024 total in 1
blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.326888500     pg_statistic_relid_att_index: 1024 total in
1 blocks; 240 free (0 chunks); 784 used
2010-01-27 14:07:26.326909500     pg_cast_source_target_index: 1024 total in 1
blocks; 240 free (0 chunks); 784 used
2010-01-27 14:07:26.326915500     pg_language_name_index: 1024 total in 1
blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.326937500     pg_authid_oid_index: 1024 total in 1 blocks;
304 free (0 chunks); 720 used
2010-01-27 14:07:26.326943500     pg_amop_fam_strat_index: 1024 total in 1
blocks; 88 free (0 chunks); 936 used
2010-01-27 14:07:26.326966500     pg_index_indexrelid_index: 1024 total in 1
blocks; 304 free (0 chunks); 720 used
2010-01-27 14:07:26.326986500     pg_ts_template_tmplname_index: 1024 total in
1 blocks; 280 free (0 chunks); 744 used
2010-01-27 14:07:26.326993500     pg_ts_config_map_index: 1024 total in 1
blocks; 192 free (0 chunks); 832 used
2010-01-27 14:07:26.327014500     pg_opclass_oid_index: 1024 total in 1
blocks; 304 free (0 chunks); 720 used
2010-01-27 14:07:26.327021500     pg_foreign_data_wrapper_oid_index: 1024
total in 1 blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327044500     pg_auth_members_member_role_index: 1024
total in 1 blocks; 280 free (0 chunks); 744 used
2010-01-27 14:07:26.327065500     pg_ts_dict_oid_index: 1024 total in 1
blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327072500     pg_conversion_default_index: 1024 total in 1
blocks; 128 free (0 chunks); 896 used
2010-01-27 14:07:26.327095500     pg_operator_oprname_l_r_n_index: 1024 total
in 1 blocks; 128 free (0 chunks); 896 used
2010-01-27 14:07:26.327116500     pg_trigger_tgrelid_tgname_index: 1024 total
in 1 blocks; 240 free (0 chunks); 784 used
2010-01-27 14:07:26.327123500     pg_enum_typid_label_index: 1024 total in 1
blocks; 280 free (0 chunks); 744 used
2010-01-27 14:07:26.327144500     pg_ts_config_oid_index: 1024 total in 1
blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327151500     pg_user_mapping_oid_index: 1024 total in 1
blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327173500     pg_opfamily_am_name_nsp_index: 1024 total in
1 blocks; 192 free (0 chunks); 832 used
2010-01-27 14:07:26.327194500     pg_type_oid_index: 1024 total in 1 blocks;
304 free (0 chunks); 720 used
2010-01-27 14:07:26.327201500     pg_aggregate_fnoid_index: 1024 total in 1
blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327222500     pg_constraint_oid_index: 1024 total in 1
blocks; 304 free (0 chunks); 720 used
2010-01-27 14:07:26.327228500     pg_rewrite_rel_rulename_index: 1024 total in
1 blocks; 280 free (0 chunks); 744 used
2010-01-27 14:07:26.327251500     pg_ts_parser_prsname_index: 1024 total in 1
blocks; 280 free (0 chunks); 744 used
2010-01-27 14:07:26.327272500     pg_ts_config_cfgname_index: 1024 total in 1
blocks; 280 free (0 chunks); 744 used
2010-01-27 14:07:26.327278500     pg_ts_parser_oid_index: 1024 total in 1
blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327299500     pg_operator_oid_index: 1024 total in 1
blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327306500     pg_namespace_nspname_index: 1024 total in 1
blocks; 304 free (0 chunks); 720 used
2010-01-27 14:07:26.327329500     pg_ts_template_oid_index: 1024 total in 1
blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327335500     pg_amop_opr_fam_index: 1024 total in 1
blocks; 280 free (0 chunks); 744 used
2010-01-27 14:07:26.327357500     pg_ts_dict_dictname_index: 1024 total in 1
blocks; 280 free (0 chunks); 744 used
2010-01-27 14:07:26.327377500     pg_auth_members_role_member_index: 1024
total in 1 blocks; 280 free (0 chunks); 744 used
2010-01-27 14:07:26.327384500     pg_type_typname_nsp_index: 1024 total in 1
blocks; 240 free (0 chunks); 784 used
2010-01-27 14:07:26.327405500     pg_opfamily_oid_index: 1024 total in 1
blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327412500     pg_class_oid_index: 1024 total in 1 blocks;
304 free (0 chunks); 720 used
2010-01-27 14:07:26.327449500     pg_proc_proname_args_nsp_index: 1024 total
in 1 blocks; 192 free (0 chunks); 832 used
2010-01-27 14:07:26.327456500     pg_attribute_relid_attnum_index: 1024 total
in 1 blocks; 240 free (0 chunks); 784 used
2010-01-27 14:07:26.327479500     pg_proc_oid_index: 1024 total in 1 blocks;
344 free (0 chunks); 680 used
2010-01-27 14:07:26.327499500     pg_language_oid_index: 1024 total in 1
blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327506500     pg_namespace_oid_index: 1024 total in 1
blocks; 304 free (0 chunks); 720 used
2010-01-27 14:07:26.327527500     pg_database_oid_index: 1024 total in 1
blocks; 304 free (0 chunks); 720 used
2010-01-27 14:07:26.327534500     pg_amproc_fam_proc_index: 1024 total in 1
blocks; 88 free (0 chunks); 936 used
2010-01-27 14:07:26.327556500     pg_authid_rolname_index: 1024 total in 1
blocks; 304 free (0 chunks); 720 used
2010-01-27 14:07:26.327577500     pg_foreign_server_name_index: 1024 total in
1 blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327584500     pg_attribute_relid_attnam_index: 1024 total
in 1 blocks; 240 free (0 chunks); 784 used
2010-01-27 14:07:26.327605500     pg_conversion_oid_index: 1024 total in 1
blocks; 344 free (0 chunks); 680 used
2010-01-27 14:07:26.327612500     pg_user_mapping_user_server_index: 1024
total in 1 blocks; 280 free (0 chunks); 744 used
2010-01-27 14:07:26.327634500     pg_conversion_name_nsp_index: 1024 total in
1 blocks; 280 free (0 chunks); 744 used
2010-01-27 14:07:26.327664500   MdSmgr: 8192 total in 1 blocks; 7960 free (0
chunks); 232 used
2010-01-27 14:07:26.327671500   LOCALLOCK hash: 8192 total in 1 blocks; 3912
free (0 chunks); 4280 used
2010-01-27 14:07:26.327723500   Timezones: 78520 total in 2 blocks; 5968 free
(0 chunks); 72552 used
2010-01-27 14:07:26.327729500   ErrorContext: 8192 total in 1 blocks; 8176
free (6 chunks); 16 used
2010-01-27 14:07:26.327874500
172.16.0.1(22842):_postgresql@pgbench:[17225]:ERROR:  out of memory
2010-01-27 14:07:26.327881500
172.16.0.1(22842):_postgresql@pgbench:[17225]:DETAIL:  Failed on request of
size 67108864.


Jeff Ross

Re: Memory Usage and OpenBSD

From
Tom Lane
Date:
Jeff Ross <jross@wykids.org> writes:
> Running a simple select only pgbench test against it will fail with an out of
> memory error as it tries to vacuum --analyze the newly created database with
> 7500000 tuples.

Better look at the "ulimit" values the postmaster is started with;
you shouldn't be getting that out-of-memory error AFAICS, unless
there's a very restrictive ulimit on what an individual process can
allocate.

            regards, tom lane

Re: Memory Usage and OpenBSD

From
Jeff Ross
Date:
Tom Lane wrote:
> Jeff Ross <jross@wykids.org> writes:
>> Running a simple select only pgbench test against it will fail with an out of
>> memory error as it tries to vacuum --analyze the newly created database with
>> 7500000 tuples.
>
> Better look at the "ulimit" values the postmaster is started with;
> you shouldn't be getting that out-of-memory error AFAICS, unless
> there's a very restrictive ulimit on what an individual process can
> allocate.
>
>             regards, tom lane
>

Thanks!

OpenBSD makes a _postgresql user on install and it is in the daemon class with
the following values:

daemon:\
         :ignorenologin:\
         :datasize=infinity:\
         :maxproc=infinity:\
         :openfiles-cur=128:\
         :stacksize-cur=8M:\
         :localcipher=blowfish,8:\
         :tc=default:

The OpenBSD-specific readme suggests making a special postgresql login class
and bumping openfiles-cur to 768, but I don't see how that helps here.  Would
bumping stacksize also help?
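
(For what it's worth, the special class the readme describes would be a sketch
along these lines in /etc/login.conf -- the resource values below are only
illustrative, not recommendations:)

postgresql:\
         :openfiles-cur=768:\
         :datasize=infinity:\
         :stacksize-cur=8M:\
         :maxproc=infinity:\
         :tc=daemon:

(then something like "usermod -L postgresql _postgresql" to put the user in it,
plus a "cap_mkdb /etc/login.conf" if the system uses login.conf.db)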

Jeff Ross


Re: Memory Usage and OpenBSD

From
Tom Lane
Date:
Jeff Ross <jross@wykids.org> writes:
> Tom Lane wrote:
>> Better look at the "ulimit" values the postmaster is started with;

> OpenBSD makes a _postgresql user on install and it is in the daemon class with
> the following values:

> daemon:\
>          :ignorenologin:\
>          :datasize=infinity:\
>          :maxproc=infinity:\
>          :openfiles-cur=128:\
>          :stacksize-cur=8M:\
>          :localcipher=blowfish,8:\
>          :tc=default:

> The OpenBSD specific readme suggests making a special postgresql login class
> and bumping openfiles-cur to 768 but I don't see how that helps here.  Would
> bumping stacksize also help?

Huh --- that looks okay to me.  The stacksize is not what you're hitting
(and 8MB is plenty sufficient anyway, at least unless you like recursive
functions).  I concur with the readme that 128 open files isn't much,
but that's not what you're hitting either.  The only thing that comes to
mind is that on Linux there are several different ulimit values that are
related to maximum per-process data space.  I don't know BSD very well
so I can't say if datasize is the only such value for BSD, but it'd be
worth checking.  (Hmm, on OS X which is at least partly BSDish, I see
-m and -v in addition to -d, so I'm suspicious OpenBSD might have these
concepts too.)

            regards, tom lane

Re: Memory Usage and OpenBSD

From
Scott Marlowe
Date:
On Wed, Jan 27, 2010 at 4:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> related to maximum per-process data space.  I don't know BSD very well
> so I can't say if datasize is the only such value for BSD, but it'd be
> worth checking.  (Hmm, on OS X which is at least partly BSDish, I see
> -m and -v in addition to -d, so I'm suspicious OpenBSD might have these
> concepts too.)

Isn't the usual advice here to log the ulimit settings from the pg
startup script so you can see what they really are for the user at the moment
they're starting up the db? (I think some guy named Tom mentioned
doing that before.)

Re: Memory Usage and OpenBSD

From
Anton Maksimenkov
Date:
2010/1/28 Scott Marlowe <scott.marlowe@gmail.com>:
>> related to maximum per-process data space.  I don't know BSD very well
>> so I can't say if datasize is the only such value for BSD, but it'd be
>> worth checking.  (Hmm, on OS X which is at least partly BSDish, I see
>> -m and -v in addition to -d, so I'm suspicious OpenBSD might have these
>> concepts too.)
>
> Isn't the usual advice here is to log the ulimit setting from the pg
> startup script so you can what it really is for the user at the moment
> they're starting up the db? (I think some guy named Tom mentioned
> doing that before.)

I think that "su" is enough:

root@testbed:/root
 # su -l _postgresql
$ ulimit -a
time(cpu-seconds)    unlimited
file(blocks)         unlimited
coredump(blocks)     unlimited
data(kbytes)         1048576
stack(kbytes)        8192
lockedmem(kbytes)    672777
memory(kbytes)       2016764
nofiles(descriptors) 768
processes            1024

I think Jeff will get almost the same values with the login.conf shown above.
--
antonvm

Re: Memory Usage and OpenBSD

From
Scott Marlowe
Date:
On Tue, Feb 9, 2010 at 3:18 AM, Anton Maksimenkov <anton200@gmail.com> wrote:
> 2010/1/28 Scott Marlowe <scott.marlowe@gmail.com>:
>>> related to maximum per-process data space.  I don't know BSD very well
>>> so I can't say if datasize is the only such value for BSD, but it'd be
>>> worth checking.  (Hmm, on OS X which is at least partly BSDish, I see
>>> -m and -v in addition to -d, so I'm suspicious OpenBSD might have these
>>> concepts too.)
>>
>> Isn't the usual advice here is to log the ulimit setting from the pg
>> startup script so you can what it really is for the user at the moment
>> they're starting up the db? (I think some guy named Tom mentioned
>> doing that before.)
>
> I think that "su" is enough:

In previous discussions it was mentioned that startup scripts often
inherit different settings than a user's default shell gets, hence the
need to check it from within the startup script.
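
(A sketch of doing that, using the logfile path from the rc.local snippet that
appears a little further down the thread: right before the pg_ctl line, add

echo "limits seen by the postgresql startup:" >> /var/postgresql/logfile
su -l _postgresql -c "ulimit -a" >> /var/postgresql/logfile

so that what gets logged is whatever that same su environment actually hands to
the postmaster.)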

Re: Memory Usage and OpenBSD

From
Anton Maksimenkov
Date:
2010/2/9 Scott Marlowe <scott.marlowe@gmail.com>:
> On Tue, Feb 9, 2010 at 3:18 AM, Anton Maksimenkov <anton200@gmail.com> wrote:
>>> Isn't the usual advice here is to log the ulimit setting from the pg
>>> startup script so you can what it really is for the user at the moment
>> I think that "su" is enough:
> In previous discussions it was mentioned that startup scripts often
> inherit different settings from the default shell of a user, hence the

Hm. My postgres starts with the following in /etc/rc.local:
...
if [ -x /usr/local/bin/pg_ctl ]; then
        echo -n ' postgresql'
        su -l _postgresql -c "nohup /usr/local/bin/pg_ctl start \
            -D /var/postgresql/data -l /var/postgresql/logfile \
            -o '-i -D /var/postgresql/data' >/dev/null"
fi
...
I start it manually by copy/pasting the text above.

But I want to explain something, just to be clear.

On OpenBSD/i386 we have 4GB of virtual address space.
768MB of it is used for kernel virtual space, so 4GB - 768MB leaves about
3.2GB of userspace.
Userspace itself is divided into two parts.  The first part, about 1GB, is
reserved for the old *brk() interface.  The rest, 3.2GB - 1GB = about 2.2GB,
is the virtual space used by mmap, shm* and malloc (which uses mmap).

That means on OpenBSD/i386 we have about 2.2GB of virtual space for
malloc and shm*, and that is the space postgres will use.

But mmap() uses random addresses, so when you get a big chunk of memory
for shared buffers (say, 2GB) you may get it somewhere in the middle of
that 2.2GB of virtual space.  In the worst case the layout might be:
100MB free space from (start) to (start + 100MB)
2GB shared memory chunk from (start + 100MB) to (start + 100MB + 2GB)
100MB free space from (start + 100MB + 2GB) to (end)

After that, if you want to allocate more than 100MB in one big chunk,
you can't.  This is why postgres may fail at "vacuum" or "dump" or
somewhere else - it can happen whenever postgres tries to allocate one
big chunk.

Can anybody briefly explain how one postgres process allocates
memory for its needs?
I mean, what is the biggest malloc() it may want?  How many
such chunks?  What is the average size of the allocations?
--
antonvm

Re: Memory Usage and OpenBSD

From
Martijn van Oosterhout
Date:
On Tue, Feb 09, 2010 at 08:19:51PM +0500, Anton Maksimenkov wrote:
> It means that on openbsd i386 we have about 2,2G of virtual space for
> malloc, shm*. So, postgres will use that space.
>
> But mmap() use random addresses. So when you get big chunk of memory
> for shared buffers (say, 2G) then you may get it somewhere in middle
> of virtual space (2,2G).

This is essentially the reason why it's not a good idea to use really
large amounts of shared_buffers on 32-bit systems: there isn't the
address space to support it.

> Can anybody briefly explain me how one postgres process allocate
> memory for it needs?
> I mean, what is the biggest size of malloc() it may want? How many
> such chunks? What is the average size of allocations?

There's no real maximum, as it depends on the exact usage. However, in
general postgres tries to keep below the values in work_mem and
maintenance_work_mem. Most of the allocations are quite small, but
postgresql has an internal allocator which means that the system only
sees relatively large allocations. The majority will be on the order of
tens of kilobytes, I suspect.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Please line up in a tree and maintain the heap invariant while
> boarding. Thank you for flying nlogn airlines.

Re: Memory Usage and OpenBSD

From
Anton Maksimenkov
Date:
2010/2/10 Martijn van Oosterhout <kleptog@svana.org>:
>> Can anybody briefly explain me how one postgres process allocate
>> memory for it needs?
>
> There's no real maximum, as it depends on the exact usage. However, in
> general postgres tries to keep below the values in work_mem and
> maintenance_work_mem. Most of the allocations are quite small, but
> postgresql has an internal allocator which means that the system only
> sees relatively large allocations.

These "relatively large allocations" are exactly what I mean. What
size are they?
Is it right to say that these allocations are work_mem size, or
temp_buffers size, or maintainence_workmem size? Or something like.
--
antonvm

Re: Memory Usage and OpenBSD

From
Tom Lane
Date:
Martijn van Oosterhout <kleptog@svana.org> writes:
> On Tue, Feb 09, 2010 at 08:19:51PM +0500, Anton Maksimenkov wrote:
>> Can anybody briefly explain me how one postgres process allocate
>> memory for it needs?

> There's no real maximum, as it depends on the exact usage. However, in
> general postgres tries to keep below the values in work_mem and
> maintenance_work_mem. Most of the allocations are quite small, but
> postgresql has an internal allocator which means that the system only
> sees relatively large allocations. The majority will be in the order of
> tens of kilobytes I suspect.

IIRC, the complaint that started this thread was about a VACUUM command
failing.  Plain VACUUM will in fact start out by trying to acquire a
single chunk of size maintenance_work_mem.  (On a small table it might
not be so greedy, but on a large table it will do that.)  So you
probably shouldn't ever try to set that value as large as 1GB if you're
working in a 32-bit address space.  You could maybe do it if you've kept
shared_buffers small, but that seems like the wrong performance tradeoff
in most cases ...

            regards, tom lane
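
(Rough arithmetic for the box in this thread, as a sketch only: with roughly
2.2GB of usable per-process address space (Anton's figure above), a ~1GB
shared memory segment already takes nearly half of it, and randomized mmap
placement can fragment what's left, so a single large request can fail long
before physical memory runs out -- the "Failed on request of size 67108864"
in the original log is exactly a 64MB chunk.)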

Re: Memory Usage and OpenBSD

From
Jeff Ross
Date:
Tom Lane wrote:
> Martijn van Oosterhout <kleptog@svana.org> writes:
>> On Tue, Feb 09, 2010 at 08:19:51PM +0500, Anton Maksimenkov wrote:
>>> Can anybody briefly explain me how one postgres process allocate
>>> memory for it needs?
>
>> There's no real maximum, as it depends on the exact usage. However, in
>> general postgres tries to keep below the values in work_mem and
>> maintenance_work_mem. Most of the allocations are quite small, but
>> postgresql has an internal allocator which means that the system only
>> sees relatively large allocations. The majority will be in the order of
>> tens of kilobytes I suspect.
>
> IIRC, the complaint that started this thread was about a VACUUM command
> failing.  Plain VACUUM will in fact start out by trying to acquire a
> single chunk of size maintenance_work_mem.  (On a small table it might
> not be so greedy, but on a large table it will do that.)  So you
> probably shouldn't ever try to set that value as large as 1GB if you're
> working in a 32-bit address space.  You could maybe do it if you've kept
> shared_buffers small, but that seems like the wrong performance tradeoff
> in most cases ...
>
>             regards, tom lane

That would have been my original message.

I've been running a series of pgbench tests on an i386 dual-processor Xeon server with 4GB of RAM and a RAID10 array on an LSI MegaRAID w/BBU controller.  I fixed the original problem by re-enabling better login.conf values for the postgresql user.

I ran the pgtune wizard and started with the settings I got from that.  On i386 OpenBSD the recommended settings are far too large and cause a kernel panic in short order.  Here are the settings that pgtune gives for -T web and -c 200:

maintenance_work_mem = 240MB # pgtune wizard 2010-02-10
effective_cache_size = 2816MB # pgtune wizard 2010-02-10
work_mem = 18MB # pgtune wizard 2010-02-10
wal_buffers = 4MB # pgtune wizard 2010-02-10
checkpoint_segments = 8 # pgtune wizard 2010-02-10
shared_buffers = 960MB # pgtune wizard 2010-02-10
max_connections = 200 # pgtune wizard 2010-02-10

I've been whittling that back and have got down to this:

maintenance_work_mem = 240MB # pgtune wizard 2010-01-27
checkpoint_completion_target = 0.7 # pgtune wizard 2010-01-27
effective_cache_size = 2816MB # pgtune wizard 2010-01-27
work_mem = 18MB # pgtune wizard 2010-01-27
wal_buffers = 4MB # pgtune wizard 2010-01-27
checkpoint_segments = 8 # pgtune wizard 2010-01-27
full_page_writes = off
synchronous_commit = off  
max_connections = 100
shared_buffers = 250MB # pgtune wizard 2010-01-27
work_mem = 64MB
temp_buffers = 32MB
checkpoint_segments = 32

Additionally, in OpenBSD's sysctl.conf I have this set:
kern.maxproc=10240
kern.maxfiles=20480

kern.shminfo.shmseg=32
kern.seminfo.semmni=256
kern.seminfo.semmns=2048
kern.shminfo.shmmax=283115520
kern.maxvnodes=6000
kern.bufcachepercent=70

The kern.shminfo.shmmax value is just enough to let postgresql start.  kern.bufcachepercent=70 matches the effective_cache_size value.

pgbench is run with this:
pgbench -h varley.openvistas.net -U _postgresql -t 20000 -c $SCALE pgbench
with $SCALE starting at 10 and then incrementing by 10.  I call it three times for each scale.  I've turned logging to 'all' to try and help figure out where the system panics, so that may lower the TPS somewhat, but I have not been very favorably impressed with the speed of these U320 15K disks in RAID10 yet.

Scale 10:
tps = 644.152616 (including connections establishing)
tps = 644.323919 (excluding connections establishing)

tps = 644.032366 (including connections establishing)
tps = 644.219732 (excluding connections establishing)

tps = 659.320222 (including connections establishing)
tps = 659.506025 (excluding connections establishing)

Scale 20:
tps = 643.830650 (including connections establishing)
tps = 644.001003 (excluding connections establishing)

tps = 631.357346 (including connections establishing)
tps = 631.538591 (excluding connections establishing)

tps = 629.035682 (including connections establishing)
tps = 629.245788 (excluding connections establishing)

Scale 30:
tps = 571.640243 (including connections establishing)
tps = 571.777080 (excluding connections establishing)

tps = 565.742963 (including connections establishing)
tps = 565.888874 (excluding connections establishing)

tps = 564.058710 (including connections establishing)
tps = 564.203138 (excluding connections establishing)

Scale 40:
tps = 525.018290 (including connections establishing)
tps = 525.132745 (excluding connections establishing)

tps = 515.277398 (including connections establishing)
tps = 515.419313 (excluding connections establishing)

tps = 513.006317 (including connections establishing)
tps = 513.129971 (excluding connections establishing)

Scale 50:
tps = 468.323275 (including connections establishing)
tps = 468.415751 (excluding connections establishing)

tps = 453.100701 (including connections establishing)
tps = 453.201980 (excluding connections establishing)

tps = 461.587221 (including connections establishing)
tps = 461.739929 (excluding connections establishing)

Scale 60:
tps = 450.277550 (including connections establishing)
tps = 450.365946 (excluding connections establishing)

tps = 453.268713 (including connections establishing)
tps = 453.363862 (excluding connections establishing)

tps = 448.965514 (including connections establishing)
tps = 449.060461 (excluding connections establishing)

At Scale 70, the kernel panics with a "panic: malloc: out of space in kmem_map" error.

The last few lines of the logs before the panic reveal nothing out of the ordinary to me:

2010-02-10 10:58:07.863133500 172.16.0.1(43152):_postgresql@pgbench:[16586]:LOG:  statement: UPDATE pgbench_tellers SET tbalance = tbalance + -4197 WHERE tid = 328;
2010-02-10 10:58:07.863139500 172.16.0.1(40518):_postgresql@pgbench:[25686]:LOG:  statement: UPDATE pgbench_accounts SET abalance = abalance + 1254 WHERE aid = 3832418;
2010-02-10 10:58:07.863150500 172.16.0.1(25655):_postgresql@pgbench:[4335]:LOG:  statement: SELECT abalance FROM pgbench_accounts WHERE aid = 208539;
2010-02-10 10:58:07.863156500 172.16.0.1(25655):_postgresql@pgbench:[4335]:LOG:  duration: 0.532 ms
2010-02-10 10:58:07.863161500 172.16.0.1(40496):_postgresql@pgbench:[4200]:LOG:  duration: 23.825 ms
2010-02-10 10:58:07.863178500 172.16.0.1(15183):_postgresql@pgbench:[12518]:LOG:  statement: INSERT INTO pgbench_history (tid, bid, aid, delta, mtime) VALUES (268, 38, 4052674, 2403, CURRENT_TIMESTAMP);
2010-02-10 10:58:07.863184500 172.16.0.1(15183):_postgresql@pgbench:[12518]:LOG:  duration: 0.396 ms


I have not yet tried a connection pooler since the panic happens with only 70 clients connected, but just for fun I'm going to install pgbouncer and run the set again.

I've been told that amd64 OpenBSD will not have this problem; however, the amd64 kernel will not run on this particular server.  I think that means I'm going to be shopping for an Opteron-based server before long.

Thanks to all!

Jeff Ross 

Re: Memory Usage and OpenBSD

From
Greg Smith
Date:
Jeff Ross wrote:
> pgbench is run with this:
> pgbench -h varley.openvistas.net -U _postgresql -t 20000 -c $SCALE pgbench
> with scale starting at 10 and then incrementing by 10.  I call it
> three times for each scale.  I've turned on logging to 'all' to try
> and help figure out where the system panics, so that may lower the TPS
> somewhat but I have not been very favorably impressed with the speed
> of these U320 15K disks in RAID10 yet.

"-c" sets the number of clients active at once.  pgbench has a database
scale option when you're initializing, "-s", that sets how many records
are in the tables, and therefore how large the database is.  If you
don't set the scale to a larger number, so that "-c" > "-s", you'll get
bad performance results.  The way you're saying scale but changing the
client numbers is a little confusing.

I can't comment on whether yours are good or bad numbers without
knowing the actual database scale number.  When reporting a pgbench
result, it's handy to include the complete output from one of the runs,
just so people can see exactly what test was run.  After that you can
just show the TPS values.  Showing the command used to initialize the
pgbench database can also be helpful.
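
(In other words, the pattern is: initialize once with a scale at least as large
as the biggest client count you plan to test, then vary -c.  A sketch, with the
scale of 100 and the client counts only illustrative:)

pgbench -i -s 100 -h varley.openvistas.net -U _postgresql pgbench
pgbench -c 8 -t 20000 -h varley.openvistas.net -U _postgresql pgbench
pgbench -c 16 -t 20000 -h varley.openvistas.net -U _postgresql pgbench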


--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com


Re: Memory Usage and OpenBSD

From
Jeff Ross
Date:
Greg Smith wrote:
> Jeff Ross wrote:
>> pgbench is run with this:
>> pgbench -h varley.openvistas.net -U _postgresql -t 20000 -c $SCALE
>> pgbench
>> with scale starting at 10 and then incrementing by 10.  I call it
>> three times for each scale.  I've turned on logging to 'all' to try
>> and help figure out where the system panics, so that may lower the
>> TPS somewhat but I have not been very favorably impressed with the
>> speed of these U320 15K disks in RAID10 yet.
>
> "-c" sets the number of clients active at once.  pgbench has a
> database scale option when you're initializing, "-s", that sets how
> many records are in the tables, and therefore how large the database
> is.  If you don't set the scale to a larger number, so that "-c" >
> "-s", you'll get bad performance results.  The way you're saying scale
> but changing the client numbers is a little confusing.
>

My bad.  I know that more information is better than too little.

I think I'm doing it right.  Here's the whole script.  I run it from
another server on the LAN.

#!/bin/sh
#use this to automate pg_bench load
MAX_SCALE=150
HOST=varley.openvistas.net
SCALE=70
rm -rf /tmp/pgbench_results_scale*
while [ $SCALE -ne $MAX_SCALE ] ; do
  pgbench -i -s $SCALE -h $HOST -U _postgresql pgbench | tee -a /tmp/pgbench_results_scale_$SCALE.txt
  vacuumdb --analyze -h $HOST -U _postgresql pgbench | tee -a /tmp/pgbench_results_scale_$SCALE.txt
  psql -h $HOST -U _postgresql -c "checkpoint;" pgbench | tee -a /tmp/pgbench_results_scale_$SCALE.txt
  psql -h $HOST -U _postgresql -c "SELECT relname, reltuples, pg_size_pretty(relpages * 8192) as size FROM pg_class \
    where relname like ('%account%');" pgbench | tee -a /tmp/pgbench_results_scale_$SCALE.txt
  psql -h $HOST -U _postgresql -c "select pg_size_pretty(pg_database_size(oid)) from pg_database \
    where datname = 'pgbench';" pgbench | tee -a /tmp/pgbench_results_scale_$SCALE.txt
  time pgbench -h $HOST -U _postgresql -t 20000 -c $SCALE pgbench | tee -a /tmp/pgbench_results_scale_$SCALE.txt
  time pgbench -h $HOST -U _postgresql -t 20000 -c $SCALE pgbench | tee -a /tmp/pgbench_results_scale_$SCALE.txt
  time pgbench -h $HOST -U _postgresql -t 20000 -c $SCALE pgbench | tee -a /tmp/pgbench_results_scale_$SCALE.txt
  cat /tmp/pgbench_results_scale_$SCALE.txt | mail -s "Results for Scale $SCALE" jross@openvistas.net
  let SCALE=$SCALE+10
done


> I can't comment how whether yours are good or bad numbers without
> knowing the actual database scale number.  When reporting a pgbench
> result, it's handy to include the complete output from one of the
> runs, just so people can see exactly what test was run.  After that
> you can just show the TPS values.  Showing the command used to
> initialize the pgbench database can also be helpful.
>
>
And here's the whole output of scale = 70.  It made it through this
last time without panicking, but then did panic on scale=80.

CHECKPOINT
        relname        |  reltuples  |  size
-----------------------+-------------+--------
 pgbench_accounts_pkey |       7e+06 | 120 MB
 pgbench_accounts      | 6.99985e+06 | 868 MB
(2 rows)

 pg_size_pretty
----------------
 994 MB
(1 row)

transaction type: TPC-B (sort of)
scaling factor: 70
query mode: simple
number of clients: 70
number of transactions per client: 20000
number of transactions actually processed: 1400000/1400000
tps = 293.081245 (including connections establishing)
tps = 293.124705 (excluding connections establishing)
transaction type: TPC-B (sort of)
scaling factor: 70
query mode: simple
number of clients: 70
number of transactions per client: 20000
number of transactions actually processed: 1400000/1400000
tps = 332.154154 (including connections establishing)
tps = 332.219275 (excluding connections establishing)
transaction type: TPC-B (sort of)
scaling factor: 70
query mode: simple
number of clients: 70
number of transactions per client: 20000
number of transactions actually processed: 1400000/1400000
tps = 354.983013 (including connections establishing)
tps = 355.181403 (excluding connections establishing)




Re: Memory Usage and OpenBSD

From
Greg Smith
Date:
Jeff Ross wrote:
> I think I'm doing it right.  Here's the whole script.  I run it from
> another server on the lan.

That looks basically sane--your description was wrong, not your program,
which is always better than the other way around.

Note that everything your script is doing and way more is done quite
easily with pgbench-tools:
http://git.postgresql.org/gitweb?p=pgbench-tools.git;a=summary

You can just give it a list of scales and client counts you want to test
and let it loose; it will generate graphs showing TPS vs.
scale/clients and everything if gnuplot is available.
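
(From memory, the pgbench-tools config file drives that with a couple of shell
variables, roughly like the following -- treat these names as approximate and
check the config file shipped in the repository above:)

SCALES="10 20 30 40 50 60 70"
SETCLIENTS="1 2 4 8 16 32"
SETTIMES=3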

> transaction type: TPC-B (sort of)
> scaling factor: 70
> query mode: simple
> number of clients: 70
> number of transactions per client: 20000
> number of transactions actually processed: 1400000/1400000
> tps = 293.081245 (including connections establishing)
> tps = 293.124705 (excluding connections establishing)

This is way more clients than your server is going to handle well on
pgbench's TPC-B test, which is primarily a test of hard disk write speed
but will get bogged down with client contention in many conditions.
Performance degrades considerably as the number of clients increases
much past the number of cores in the server; typically 1.5 to 2X as many
clients as cores gives peak throughput.

I'm not sure what's causing your panic--not enough BSD practice.  But I
think Tom's suggestion of vastly decreasing from:

maintenance_work_mem = 240MB

is worth trying.  Reducing it won't hurt pgbench performance on quick
tests, just how long it takes to get the tests set up.
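
(Concretely, that's a single line in postgresql.conf; the value below is only an
illustrative, more 32-bit-friendly number rather than a recommendation:)

maintenance_work_mem = 64MB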

Sorry about pgtune being a bit aggressive in what it suggests--it's on the
TODO list to scale it back, and hopefully to provide more helpful
suggestions for kernel tuning too.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com


Re: Memory Usage and OpenBSD

From
Jeff Ross
Date:
Greg Smith wrote:
> Jeff Ross wrote:
>> I think I'm doing it right.  Here's the whole script.  I run it from
>> another server on the lan.
>
> That looks basically sane--your description was wrong, not your
> program, which is always better than the other way around.
>
> Note that everything your script is doing and way more is done quite
> easily with pgbench-tools:
> http://git.postgresql.org/gitweb?p=pgbench-tools.git;a=summary
>
> You can just dump a list of scales and client counts you want to test
> and let that loose, it will generate graphs showing TPS vs.
> scale/clients and everything if gnuplot is available.
>

Cool!  I'll get gnuplot installed and have some runs going before long.

>> transaction type: TPC-B (sort of)
>> scaling factor: 70
>> query mode: simple
>> number of clients: 70
>> number of transactions per client: 20000
>> number of transactions actually processed: 1400000/1400000
>> tps = 293.081245 (including connections establishing)
>> tps = 293.124705 (excluding connections establishing)
>
> This is way more clients than your server is going to handle well on
> pgbench's TPC-B test, which is primarily a test of hard disk write
> speed but it will get bogged down with client contention in many
> conditions.  Performance degrades considerably as the number of
> clients increases much past the number of cores in the server;
> typically 1.5 to 2X as many clients as cores gives peak throughput.
>
> I'm not sure what's causing your panic--not enough BSD practice.  But
> I think Tom's suggestion of vastly decreasing from:
>
> maintenance_work_mem = 240MB
>
> Is worth trying.  Reducing it won't hurt pgbench performance on quick
> tests, just how long it takes to get the tests setup.
>

Okay, I'll try that.  Hopefully if I can get it to run well under
pgbench, the same setup will work well with Drupal.  The site I was
worried about when I went to this bigger server has started a little
slower than originally projected, so the old server is handling the load.

> Sorry about pgtune being a bit aggressive in what it suggests--on the
> TODO list to scale it back, and hopefully provide more helpful
> suggestions for kernel tuning too.
>
One of my goals (once I get all this figured out) is to provide some
guidelines on how to best configure OpenBSD to run PostgreSQL.

Thanks for all of your work and tools, Greg!

> --  Greg Smith    2ndQuadrant   Baltimore, MD
> PostgreSQL Training, Services and Support
> greg@2ndQuadrant.com  www.2ndQuadrant.com
>

Jeff Ross

Re: Memory Usage and OpenBSD

From
Greg Smith
Date:
Jeff Ross wrote:
> Hopefully if I can get it to run well under pgbench the same setup
> will work well with drupal.  The site I was worried about when I went
> to this bigger server has started a little slower than originally
> projected so the old server is handling the load.

The standard TPC-B-like test pgbench runs is not even remotely like what
Drupal will do on your server.  It's almost a write-only test--the few
things that are read in are also being written.

I'd suggest you explore pgbench using the "-S" flag (or what
pgbench-tools calls select.sql) to generate select-only loads at various
sizes and client counts, to get an idea of what your server is going to
do on loads more like a web application.
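
(A select-only run against the already-initialized pgbench database is just the
same command with -S, pgbench's built-in select-only script; the client and
transaction counts below are only examples:)

pgbench -S -c 8 -t 20000 -h varley.openvistas.net -U _postgresql pgbench
pgbench -S -c 16 -t 20000 -h varley.openvistas.net -U _postgresql pgbench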

--
Greg Smith  2ndQuadrant US  Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com   www.2ndQuadrant.us