Обсуждение: increased load on server

Поиск
Список
Период
Сортировка

increased load on server

От
jf
Дата:
Hi,

I'm using postgreSQL 8.3 on Ubuntu 8.04 with ltree.
I tried to tune posgresql.conf to use more memory.

In two years on production everything was fine, but today something
goes wrong: postgresql processes eat my server ressources, my load,
which was between 0 and 2.5 on my 4 cpus server, grows to 20.

When the trouble appends there was a lot of waiting queries on my ltree table.
I tried to restart postgresql server and to reboot the server, but 5
minutes later, the trouble was here again.

I did a backup/restore of my database (pg_dump -Fc / pg_restore), and
it was fine for 2 hours but after the touble reappears.

I notice that the "base" directory grows a lot when the trouble
begins. I made a 'vacuum', 'vacuum analyze', 'vacuum full' but the
disk usage isn't lower.

Please have you a clue to help me ? This is the week-end and I'm
hoping to fix this before monday.

Thanks.

Re: increased load on server

От
"Kevin Grittner"
Дата:
jf <jfmeteo@gmail.com> wrote:

> I tried to tune posgresql.conf to use more memory.

Right before the problems started?  As an attempt to deal with the
problems?  Years ago?

> In two years on production everything was fine, but today
> something goes wrong: postgresql processes eat my server
> ressources, my load, which was between 0 and 2.5 on my 4 cpus
> server, grows to 20.

The output from `vmstat 1` during such an episode might be
enlightening.

> When the trouble appends there was a lot of waiting queries on my
> ltree table.

Waiting where?  Are you talking about executing queries which are
blocked trying to acqure locks on the ltree table?

> I tried to restart postgresql server and to reboot the server, but
> 5 minutes later, the trouble was here again.

Any idea what changed during those 5 minutes?

> I did a backup/restore of my database (pg_dump -Fc / pg_restore),
> and it was fine for 2 hours but after the touble reappears.

Did you notice anything about the usage during those two hours?  Did
you take a close look at what queries were being run when things are
OK versus when they are not?

> I notice that the "base" directory grows a lot when the trouble
> begins. I made a 'vacuum', 'vacuum analyze', 'vacuum full' but the
> disk usage isn't lower.

Hopefully you are running autovacuum.  VACUUM FULL is normally not a
good idea in 8.3.  What you might want to look for is a lingering
transaction which is preventing normal autovacuum from working.
Does anything show if you select from pg_prepared_xacts?  Are there
any rows in pg_stat_activity with a xact_start value older than it
should be?  One connection sitting in status '<IDLE> in transaction'
for a long time can cause all sorts of problems.

If these hints don't point you in the right direction, please review
this page for ideas on more information you can supply to help
people understand the problem and its cause:

http://wiki.postgresql.org/wiki/Guide_to_reporting_problems

In particular, knowing your exact PostgreSQL version, the contents
of your postgresql.conf file (minus all comments), the number of
connections you have active during the problem periods, and (if
possible) listings of pg_stat_activity and pg_locks during a problem
episode.

-Kevin

Re: increased load on server

От
jf
Дата:
Hi,


>> I tried to tune posgresql.conf to use more memory.
> Right before the problems started?

Yes two years ago.


>>  postgresql processes eat my server
>> ressources, my load, which was between 0 and 2.5 on my 4 cpus
>> server, grows to 20.
>
> The output from `vmstat 1` during such an episode might be
> enlightening.

I saw those things during the problem:
 - a lot of blocks read per seconds (max to 46k) on the disk which
only contains /var/lib (PostgreSQL datas but not pg_xlog)
 - lot of blocks write on the same disk (max 20k) and on the system +
pg_xlog disk (max 25k)
 - a peak to 64k of PostgreSQL buffers allocated per second


>> When the trouble appends there was a lot of waiting queries on my
>> ltree table.
>
> Waiting where?  Are you talking about executing queries which are
> blocked trying to acqure locks on the ltree table?

pg_stat_activity shows only select queries on ltree table and "IDLE"
connections.
All of those queries had xact_start value older than it should be.

>> I tried to restart postgresql server and to reboot the server, but
>> 5 minutes later, the trouble was here again.
>
> Any idea what changed during those 5 minutes?
5 minutes: just the time for my users to reconnect to my frontend.


>> I did a backup/restore of my database (pg_dump -Fc / pg_restore),
>> and it was fine for 2 hours but after the touble reappears.
>
> Did you notice anything about the usage during those two hours?  Did
> you take a close look at what queries were being run when things are
> OK versus when they are not?
I think queries are always quite the same.


> In particular, knowing your exact PostgreSQL version,
It was PostgresSQL 8.3.9, after this I upgrade to 8.3.14.

> your postgresql.conf file (minus all comments),
data_directory = '/var/lib/postgresql/8.3/main'
hba_file = '/etc/postgresql/8.3/main/pg_hba.conf'
ident_file = '/etc/postgresql/8.3/main/pg_ident.conf'
external_pid_file = '/var/run/postgresql/8.3-main.pid'
port = 5432
max_connections = 100
shared_buffers = 64MB
work_mem       = 16MB
maintenance_work_mem  = 128MB
max_fsm_pages = 153600
log_min_duration_statement = 100
autovacuum = on
datestyle = 'iso, dmy'
lc_messages = 'fr_FR.UTF-8'
lc_monetary = 'fr_FR.UTF-8'
lc_numeric = 'fr_FR.UTF-8'
lc_time = 'fr_FR.UTF-8'
default_text_search_config = 'pg_catalog.french'

>  the number of connections you have active during the problem periods,
Connections during problem: peak to 63.
During normal use my average number is 6.5

My system:
 - RAM: 4Go
 - 2 disks:
     - sdb: contains only /var/lib,
/var/lib/postgresql/8.3/main/pg_xlog is a symlink to an other
partition on an other disk
     - sda contains partitions for /, /var, /tmp, /boot, /home
 - CPU: Intel Xeon 1.6G with 4 cores
 - No error on Linux system logs: dmesg, /var/log/syslog


Thanks.

Re: increased load on server

От
"Kevin Grittner"
Дата:
jf  wrote:

>>> I tried to restart postgresql server and to reboot the server,
>>> but 5 minutes later, the trouble was here again.
>>
>> Any idea what changed during those 5 minutes?
> 5 minutes: just the time for my users to reconnect to my frontend.

> max_fsm_pages = 153600

Probably not the main problem, but you should be sure to schedule a
VACUUM ANALYZE of the whole database (run as the database superuser)
and check the last few lines to make sure the free space manager has
enough space to prevent bloat.

> I saw those things during the problem:
> - a lot of blocks read per seconds (max to 46k) on the disk which
> only contains /var/lib (PostgreSQL datas but not pg_xlog)
> - lot of blocks write on the same disk (max 20k) and on the system
> + pg_xlog disk (max 25k)

> max_connections = 100
> shared_buffers = 64MB

> Connections during problem: peak to 63.
> During normal use my average number is 6.5
>
> My system:
> - RAM: 4Go
> - 2 disks:
> - sdb: contains only /var/lib,
> /var/lib/postgresql/8.3/main/pg_xlog is a symlink to an other
> partition on an other disk
> - sda contains partitions for /, /var, /tmp, /boot, /home
> - CPU: Intel Xeon 1.6G with 4 cores
> - No error on Linux system logs: dmesg, /var/log/syslog

You need to use some sort of connection pool to funnel those requests
through about nine or ten connections.  When you have more
connections than that active on the hardware you describe, you're
going to cut both throughput and response time, just when you don't
want those to suffer.  You want a pool which has a hard upper limit
on number of connections and will queue requests when at the limit.

Once you've done that you'll be in a better position to determine
whether your hardware is hefty enough to handle the load.  In
particular, I'm concerned that you don't have RAID, both from a
performance perspective, and from the ability to survive a drive
failure.  Those do happen.

-Kevin

Re: increased load on server

От
Shrirang Chitnis
Дата:
Hi,

I had encountered this problem when there were a large number of writes on the disk (the operation involved bulk
imports- read and writes) and disk configuration at that point of time showed latency in write operations - and inturn
slowingdown the reads. The problem was resolved with following steps: 

1) Changing disk configuration - it was a RAID 5. Had it replaced by a RAID 10 array. You may not require a large array
dueto size of database, but it will certainly help to have multiple disks. 
2) Regular VACUUM (auto vacuum/ analyze,  manual vacuum/ analyze on certain tables during day and complete db vacuum
andanalyze during weekends). 
3) Regular reindexing - I schedule reindexes on tables after every 3-4 weeks.

HTH,

Shrirang Chitnis
Sr. Manager, Applications Development
HOV Services


Office: (866) 808-0935 Ext: 39210
shrirang.chitnis@hovservices.com
www.hovservices.com

The information contained in this message, including any attachments, is attorney privileged and/or confidential
informationintended only for the use of the individual or entity named as addressee.  The review, dissemination,
distributionor copying of this communication by or to anyone other than the intended addressee is strictly prohibited.
Ifyou have received this communication in error, please immediately notify the sender by replying to the message and
destroyall copies of the original message. 

-----Original Message-----
From: Kevin Grittner [mailto:Kevin.Grittner@wicourts.gov]
Sent: Saturday, February 12, 2011 8:05 PM
To: jfmeteo@gmail.com; pgsql-admin@postgresql.org
Subject: Re: increased load on server

jf  wrote:

>>> I tried to restart postgresql server and to reboot the server,
>>> but 5 minutes later, the trouble was here again.
>>
>> Any idea what changed during those 5 minutes?
> 5 minutes: just the time for my users to reconnect to my frontend.

> max_fsm_pages = 153600

Probably not the main problem, but you should be sure to schedule a
VACUUM ANALYZE of the whole database (run as the database superuser)
and check the last few lines to make sure the free space manager has
enough space to prevent bloat.

> I saw those things during the problem:
> - a lot of blocks read per seconds (max to 46k) on the disk which
> only contains /var/lib (PostgreSQL datas but not pg_xlog)
> - lot of blocks write on the same disk (max 20k) and on the system
> + pg_xlog disk (max 25k)

> max_connections = 100
> shared_buffers = 64MB

> Connections during problem: peak to 63.
> During normal use my average number is 6.5
>
> My system:
> - RAM: 4Go
> - 2 disks:
> - sdb: contains only /var/lib,
> /var/lib/postgresql/8.3/main/pg_xlog is a symlink to an other
> partition on an other disk
> - sda contains partitions for /, /var, /tmp, /boot, /home
> - CPU: Intel Xeon 1.6G with 4 cores
> - No error on Linux system logs: dmesg, /var/log/syslog

You need to use some sort of connection pool to funnel those requests
through about nine or ten connections.  When you have more
connections than that active on the hardware you describe, you're
going to cut both throughput and response time, just when you don't
want those to suffer.  You want a pool which has a hard upper limit
on number of connections and will queue requests when at the limit.

Once you've done that you'll be in a better position to determine
whether your hardware is hefty enough to handle the load.  In
particular, I'm concerned that you don't have RAID, both from a
performance perspective, and from the ability to survive a drive
failure.  Those do happen.

-Kevin