Обсуждение: Postgres 8.2 memory weirdness

Поиск
Список
Период
Сортировка

Postgres 8.2 memory weirdness

От
"Tory M Blue"
Дата:
I'm not sure what is going on but looking for some advice, knowledge.

I'm running multiple postgres servers in a slon relationship. I have
hundreds of thousands of updates, inserts a day. But what I'm seeing
is my server appears to "deallocate" memory (for the lack of a better
term) and performance goes to heck, slow response, a sub second query
takes anywhere from 6-40 seconds to complete when this happens.

My guess is that postgres does not have enough memory to handle the
query and starts to swap (although I can't see any swapping happening
(vmstat)

Some examples: Also when the memory deallocates, I start stacking
connections (again, performance goes to heck).

FC6, postgres 8.2 on a dual quad core intel box, 8 gigs of physical RAM

Between 8:17:36 and 8:17:57, my system goes from almost all memory
consumed to almost all memory free.. Previous to this big swing, Cache
goes from 7 gigs down to what you see here, connections start stacking
and IOWAIT goes thru the roof.

1200932256 - Mon Jan 21 08:17:36 2008 - qdb02.gc.sv.admission.net - 0
sec. elapsed.
procs -----------memory---------- ---swap-- -----io---- --system--
-----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
 0  2   9072 297460   4904 1543316    0    0    52   175    2    2 21
3 75  2  0

1200932277 - Mon Jan 21 08:17:57 2008 - qdb02.gc.sv.admission.net - 2
sec. elapsed.
procs -----------memory---------- ---swap-- -----io---- --system--
-----cpu------
 r  b   swpd   free   buff  cache   si   so    bi    bo   in   cs us sy id wa st
18  6   9072 6354560   6128 1574324    0    0    52   175    2    2 21
 3 75  2  0

At this point I have connections stacking (simple queries),  Until
what I believe is that the kernel starts to allocate memory for the dB
(or the db starts to be moved from disk to memory?) (queries, tables
etc).. However performance is still dismal and I do see high IOWAIT
when I don't have sufficient memory for the DB (again my feeling). If
I do a reindex at this time, the system alocates most of my RAM and my
queries are subsecond agiain and I work fine until the next time, the
DB appears to be flushed from RAM.

I'm having a hard time understanding what I'm seeing. I do grab
netstat/ps/free/vmstat/netstat every few seconds so I have insight as
to the systems health when these things happen.

I suspect I've outgrown our initial postgres config, or there are more
sysctl or other kernel tweaks that need to happen.

I really am just trying to understand the memory allocation etc. And
why my memory goes from fully utilized to 90% free and DB performance
goes to heck.

I'll provide more info as needed, and my apologies for this being a
bit scattered, but I'm really confused.  I'm either running out of a
resource or other (but no errors in any logs, postgres or otherwise)..

Thanks
Tory

Re: Postgres 8.2 memory weirdness

От
Greg Smith
Дата:
On Wed, 23 Jan 2008, Tory M Blue wrote:

> I have hundreds of thousands of updates, inserts a day. But what I'm
> seeing is my server appears to "deallocate" memory (for the lack of a
> better term) and performance goes to heck, slow response, a sub second
> query takes anywhere from 6-40 seconds to complete when this happens.

Generally if you have a system doing many updates and inserts that slows
for that long, it's because it hit a checkpoint.  I'm not sure what your
memory-related issues are but it's possible that might be from a backlog
of sessions using memory that are stuck behind the checkpoint,
particularly since you mention simple query connections stacking up during
these periods.

In any case you should prove/disprove this is checkpoint-related behavior
before you chase down something more esoteric.  There's a quick intro to
this area in the "Monitoring checkpoints" section of
http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm and the
later sections go into what you can do about it.

> I suspect I've outgrown our initial postgres config, or there are more
> sysctl or other kernel tweaks that need to happen.

You should post a list of what you're changed from the defaults.  You're
analyzing from the perspective where you assume it's a memory problem and
a look at your config will give a better idea whether that's possible or
not.  Other good things to mention:  exact 8.2 version, OS, total memory,
outline of disk configuration.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Postgres 8.2 memory weirdness

От
"Tory M Blue"
Дата:
On Jan 23, 2008 4:31 PM, Greg Smith <gsmith@gregsmith.com> wrote:

> Generally if you have a system doing many updates and inserts that slows
> for that long, it's because it hit a checkpoint.  I'm not sure what your
> memory-related issues are but it's possible that might be from a backlog
> of sessions using memory that are stuck behind the checkpoint,
> particularly since you mention simple query connections stacking up during
> these periods.
>
> In any case you should prove/disprove this is checkpoint-related behavior
> before you chase down something more esoteric.  There's a quick intro to
> this area in the "Monitoring checkpoints" section of
> http://www.westnet.com/~gsmith/content/postgresql/chkp-bgw-83.htm and the
> later sections go into what you can do about it.
>

>

Thanks Greg,

Particulars:

Postg: 8.2.1fc6
Fedora FC6: 2.6.19-1.2911.fc6
Dell 2950, Dual quad core
8 Gigs of Ram
Lefthand Iscsi; 48 drives


Postgres.conf

max_connections = 300
shared_buffers = 75000   <--- Believe these need tuning (based on the
reading last night)
max_prepared_transactions = 0
work_mem = 102400
maintenance_work_mem = 65536
max_fsm_pages = 1087500 <-- modified last night, based on warnings in log
max_fsm_relations = 430
fsync = true
checkpoint_segments = 50
checkpoint_timeout = 300
checkpoint_warning = 3600s              <--- set this last night and
already see instances of

"2008-01-24 03:54:39 PST    LOG:  checkpoints are occurring too
frequently (89 seconds apart)
2008-01-24 03:54:39 PST    HINT:  Consider increasing the
configuration parameter "checkpoint_segments"."

effective_cache_size = 330000  <-- This appears totally wrong and
something I noticed last night.  left over from previous versions of
postgres on different hardware. (thinking to set this to 6-7G)

autovacuum = on
autovacuum_analyze_threshold = 2000

Thanks for  the link, I read lots of good information last night and
will start pushing forward with some changes in my test area.

Any insight into what my current settings are telling you is appreciated

-Tory

Re: Postgres 8.2 memory weirdness

От
Greg Smith
Дата:
On Thu, 24 Jan 2008, Tory M Blue wrote:

> Postg: 8.2.1fc6

8.2.1 has a nasty bug related to statistics collection that causes
performance issues exactly in the kind of heavy update situation you're
in.  That's actually why i asked for the exact 8.2 version.  You should
plan an upgrade as soon as feasible to the current release just to
eliminate this as a possible influence on your problems.  No need to dump
the database or do anything fancy, just get the new version going and
point it at the existing database.

To do a quick check on whether this is impacting things, run top, press
"c" to show the full process lines, and note whether the statistics
collector process is taking up a significant amount of CPU time.  If it
is, you're being nailed by the bug, and you really need that ugprade.

> 8 Gigs of Ram
> shared_buffers = 75000   <--- Believe these need tuning (based on the
> reading last night)

Probably, but if you're having checkpoint problems now making
shared_buffers bigger will likely make them worse.  Some people with
update-heavy workloads end up reducing this to a very small value (<250MB)
even with large amounts of RAM because that makes less information to dump
at checkpoint time.

> checkpoint_segments = 50
> checkpoint_timeout = 300
> checkpoint_warning = 3600s              <--- set this last night and
> already see instances of
>
> "2008-01-24 03:54:39 PST    LOG:  checkpoints are occurring too
> frequently (89 seconds apart)
> 2008-01-24 03:54:39 PST    HINT:  Consider increasing the
> configuration parameter "checkpoint_segments"."

If you're getting checkpoints every 89 seconds it's no wonder your system
is dying.  You may need to consider a large increase to
checkpoint_segments to get the interval between checkpoints to increase.
It should at least be a few minutes between them if you want any
reasonable performance level.

> effective_cache_size = 330000  <-- This appears totally wrong and
> something I noticed last night.  left over from previous versions of
> postgres on different hardware. (thinking to set this to 6-7G)

Right, that's where it should be.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Postgres 8.2 memory weirdness

От
"Tory M Blue"
Дата:
On Jan 24, 2008 10:49 AM, Greg Smith <gsmith@gregsmith.com> wrote:
> 8.2.1 has a nasty bug related to statistics collection that causes
> performance issues exactly in the kind of heavy update situation you're
> in.  That's actually why i asked for the exact 8.2 version.  You should
> plan an upgrade as soon as feasible to the current release just to
> eliminate this as a possible influence on your problems.  No need to dump
> the database or do anything fancy, just get the new version going and
> point it at the existing database.

Not seeing any excessive cpu from the stats collector process.. So
maybe not being hit with this bug.

 PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  P SWAP
TIME COMMAND
28445 postgres  15   0  7432  828  408 S    0  0.0   4:15.47 3 6604
4:15 postgres: stats collector process

With the above said, we had started sometime ago to move 8.2.5 into
our environments, so that should be on these servers next week (the
push is a slow process, but we are really liking what we are seeing
for 8.3, so I'm hoping once blessed, i'll push it  thru quickly)..


> > checkpoint_segments = 50
> > checkpoint_timeout = 300
> > checkpoint_warning = 3600s              <--- set this last night and
>
> If you're getting checkpoints every 89 seconds it's no wonder your system
> is dying.  You may need to consider a large increase to
> checkpoint_segments to get the interval between checkpoints to increase.
> It should at least be a few minutes between them if you want any
> reasonable performance level.

I doubled the checkpoint segments yesterday and have not seen any
warnings. Will run with segments of 100 for a while and see how things
look.. Anyway to make sure that there is not a number between 50 and
100 that makes more sense?


> > effective_cache_size = 330000  <-- This appears totally wrong and
> > something I noticed last night.  left over from previous versions of
> > postgres on different hardware. (thinking to set this to 6-7G)
>
> Right, that's where it should be.

We have started some performance analysis and this numvber is sure
affecting performance in good ways by having it set semi correctly.
This has not been pushed (too many changes), but we will continue
performance testing and it will probably make it to prod next week.

Thanks for some sanity checks here Greg, it's truly appreciated.

Tory

Re: Postgres 8.2 memory weirdness

От
Greg Smith
Дата:
On Fri, 25 Jan 2008, Tory M Blue wrote:

> I doubled the checkpoint segments yesterday and have not seen any
> warnings. Will run with segments of 100 for a while and see how things
> look.. Anyway to make sure that there is not a number between 50 and
> 100 that makes more sense?

More segments means more disk space taken up with them and a longer crash
recovery.  Those are the downsides; if you can live with those there's no
reason to run at <100 if that works for you.  Fine-tuning here isn't
really that helpful.

I'm a little confused by your report through because you should still be
seeing regular checkpoint warnings if you set checkpoint_warning = 3600s ,
they should just be spaced further apart.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

Re: Postgres 8.2 memory weirdness

От
"Tory M Blue"
Дата:
On Jan 27, 2008 4:08 PM, Greg Smith <gsmith@gregsmith.com> wrote:

>
> More segments means more disk space taken up with them and a longer crash
> recovery.  Those are the downsides; if you can live with those there's no
> reason to run at <100 if that works for you.  Fine-tuning here isn't
> really that helpful.
>
> I'm a little confused by your report through because you should still be
> seeing regular checkpoint warnings if you set checkpoint_warning = 3600s ,
> they should just be spaced further apart.

I'm not seeing any warnings at all.

[idb01 ~]$ sudo cat /data/logs/pgsql-27.log  | grep -i check
[idb01 ~]$ sudo cat /data/logs/pgsql-26.log  | grep -i check
[idb01 ~]$ sudo cat /data/logs/pgsql-25.log  | grep -i check
[idb01 ~]$ sudo cat /data/logs/pgsql-24.log  | grep -i check
2008-01-24 03:54:39 PST    LOG:  checkpoints are occurring too
frequently (89 seconds apart)
2008-01-24 03:54:39 PST    HINT:  Consider increasing the
configuration parameter "checkpoint_segments".
2008-01-24 07:26:25 PST    LOG:  checkpoints are occurring too
frequently (106 seconds apart)
2008-01-24 07:26:25 PST    HINT:  Consider increasing the
configuration parameter "checkpoint_segments".
2008-01-24 11:34:18 PST    LOG:  checkpoints are occurring too
frequently (173 seconds apart)
2008-01-24 11:34:18 PST    HINT:  Consider increasing the
configuration parameter "checkpoint_segments".

Segment config still:

# - Checkpoints -
checkpoint_segments = 100               # bumped from 50
checkpoint_timeout = 300                # range 30s-1h
#checkpoint_warning = 30s               # 0 is off
checkpoint_warning = 3600s              # 0 is off

No warnings in my logs, I see some LOG information but it pertains to
slon and not postgres directly.

Ideas?!

Thanks again

Tory