Обсуждение: Any advice on debugging hanging postgresql-8.1.21 (many postmaster's)
Hello Postgres users,
I have a Linux website with phpBB serving a small Facebook game:
# uname -a
Linux XXXXX 2.6.18-194.17.1.el5 #1 SMP Wed Sep 29 12:50:31 EDT 2010
x86_64 x86_64 x86_64 GNU/Linux
# cat /etc/redhat-release
CentOS release 5.5 (Final)
# rpm -qa | grep -i postgres
postgresql-devel-8.1.21-1.el5_5.1
postgresql-8.1.21-1.el5_5.1
postgresql-server-8.1.21-1.el5_5.1
postgresql-devel-8.1.21-1.el5_5.1
postgresql-libs-8.1.21-1.el5_5.1
postgresql-libs-8.1.21-1.el5_5.1
postgresql-docs-8.1.21-1.el5_5.1
It works generally ok and with little load since June.
(My game has less than 100 users and I have AMD/Quad+4GB)
On Friday I've installed a cronjob (full source code at the bottom):
3 3 * * * psql -a -f
$HOME/bin/clean-phpbb-forum.sql
and on Monday I've found out, that the site is overloaded:
top - 10:16:56 up 3 days, 23:56, 1 user, load average: 20.55, 21.38, 22.92
Tasks: 157 total, 24 running, 133 sleeping, 0 stopped, 0 zombie
Cpu(s): 98.9%us, 1.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st
Mem: 4019028k total, 3073968k used, 945060k free, 50604k buffers
Swap: 2104496k total, 132k used, 2104364k free, 2316196k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
28974 postgres 16 0 122m 14m 10m R 73.3 0.4 20:56.06 postmaster
28727 postgres 16 0 121m 13m 10m R 54.7 0.3 37:58.58 postmaster
28714 postgres 16 0 122m 14m 10m R 50.4 0.4 38:38.98 postmaster
29412 postgres 16 0 121m 13m 10m R 46.5 0.4 8:03.96 postmaster
28542 postgres 16 0 122m 14m 10m R 46.2 0.4 58:49.38 postmaster
28482 postgres 16 0 122m 14m 10m R 45.8 0.4 61:53.37 postmaster
28468 postgres 16 0 122m 14m 10m R 44.2 0.4 62:46.17 postmaster
29017 postgres 16 0 122m 14m 10m R 43.9 0.4 19:17.06 postmaster
28929 postgres 15 0 122m 14m 10m R 42.2 0.4 22:01.43 postmaster
28500 postgres 16 0 122m 14m 10m R 41.3 0.4 59:40.23 postmaster
28460 postgres 16 0 122m 14m 10m R 40.6 0.4 64:17.16 postmaster
28894 postgres 16 0 122m 14m 10m R 38.6 0.4 23:35.53 postmaster
28489 postgres 16 0 122m 14m 10m R 36.0 0.4 60:32.59 postmaster
28719 postgres 15 0 121m 13m 10m R 25.2 0.3 38:10.33 postmaster
29496 postgres 16 0 121m 13m 10m R 22.9 0.4 4:20.32 postmaster
28556 postgres 15 0 122m 14m 10m R 17.7 0.4 57:32.62 postmaster
28735 postgres 15 0 122m 14m 10m R 15.7 0.4 36:09.45 postmaster
29602 postgres 15 0 119m 11m 9680 S 8.2 0.3 0:00.25 postmaster
28457 postgres 17 0 122m 14m 9m R 3.6 0.4 64:34.38 postmaster
26092 apache 15 0 238m 16m 3740 S 0.3 0.4 0:03.38 httpd
29596 afarber 15 0 12744 1116 800 R 0.3 0.0 0:00.09 top
1 root 15 0 10352 700 592 S 0.0 0.0 0:01.69 init
I understand, that I probably supply not enough info,
but how would you approach debugging this problem?
I've run my cronjob numerous times from CLI - it works ok
and takes only few seconds. I've installed it as an every-3 mins
cronjob on my development VM - it works ok there too.
My biggest problem is, that I don't see any information
from Postgres at the production site - why did it have to start
so many postmaster's (whatever those processes do).
The only log file I've found has been /var/lib/pgsql/pgstartup.log
and there is nothing suspicious there. (Also nothing related to Postgres
in messages, mcelog (empty), audit.log, access_log, error_log).
Please give me some hints
Alex
# cat bin/clean-phpbb-forum.sql
start transaction;
delete from phpbb_users
where user_type=1 and user_inactive_reason=1 and user_lastvisit=0 and
age(to_timestamp(user_regdate))>interval '3 days';
create temp table old_topics (topic_id integer) on commit delete rows;
create temp table old_posts (post_id integer) on commit delete rows;
insert into old_topics select topic_id from phpbb_topics
where forum_id=5 and topic_poster=1 and
age(to_timestamp(topic_time))>interval '7 days';
-- select count(topic_id) as "old topics:" from old_topics;
insert into old_posts select p.post_id from phpbb_posts p, old_topics t
where p.forum_id=5 and p.poster_id=1 and p.topic_id=t.topic_id;
-- select count(post_id) as "old posts:" from old_posts;
delete from phpbb_posts where post_id in (select post_id from old_posts);
delete from phpbb_topics where topic_id in (select topic_id from old_topics);
update phpbb_config set
config_value = (select count(topic_id) from phpbb_topics)
where config_name = 'num_topics';
update phpbb_config set
config_value = (select count(post_id) from phpbb_posts)
where config_name = 'num_posts';
update phpbb_users set
user_posts = (select count(post_id) from phpbb_posts where poster_id=1)
where user_id = 1;
update phpbb_forums set
forum_posts = (select count(post_id) from phpbb_posts),
forum_topics = (select count(topic_id) from phpbb_topics),
forum_topics_real = (select count(topic_id) from phpbb_topics)
where forum_id = 5;
commit
Re: Any advice on debugging hanging postgresql-8.1.21 (many postmaster's)
От
Rajesh Kumar Mallah
Дата:
Dear Farber,
Is there anything that stops you from upgrading to a more respectable release number?
Eg 9.0 or at least latest 8.4.X
Regds
Rajesh Kumar Mallah.
Is there anything that stops you from upgrading to a more respectable release number?
Eg 9.0 or at least latest 8.4.X
Regds
Rajesh Kumar Mallah.
Alexander Farber skrev 2010-10-04 11.20: > Hello Postgres users, > > I have a Linux website with phpBB serving a small Facebook game: > > # uname -a > Linux XXXXX 2.6.18-194.17.1.el5 #1 SMP Wed Sep 29 12:50:31 EDT 2010 > x86_64 x86_64 x86_64 GNU/Linux > # cat /etc/redhat-release > CentOS release 5.5 (Final) > # rpm -qa | grep -i postgres > postgresql-devel-8.1.21-1.el5_5.1 > postgresql-8.1.21-1.el5_5.1 > postgresql-server-8.1.21-1.el5_5.1 > postgresql-devel-8.1.21-1.el5_5.1 > postgresql-libs-8.1.21-1.el5_5.1 > postgresql-libs-8.1.21-1.el5_5.1 > postgresql-docs-8.1.21-1.el5_5.1 > > It works generally ok and with little load since June. > (My game has less than 100 users and I have AMD/Quad+4GB) > > On Friday I've installed a cronjob (full source code at the bottom): > > 3 3 * * * psql -a -f > $HOME/bin/clean-phpbb-forum.sql > > and on Monday I've found out, that the site is overloaded: > > top - 10:16:56 up 3 days, 23:56, 1 user, load average: 20.55, 21.38, 22.92 > Tasks: 157 total, 24 running, 133 sleeping, 0 stopped, 0 zombie > Cpu(s): 98.9%us, 1.0%sy, 0.0%ni, 0.0%id, 0.0%wa, 0.0%hi, 0.0%si, 0.0%st > Mem: 4019028k total, 3073968k used, 945060k free, 50604k buffers > Swap: 2104496k total, 132k used, 2104364k free, 2316196k cached > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 28974 postgres 16 0 122m 14m 10m R 73.3 0.4 20:56.06 postmaster > 28727 postgres 16 0 121m 13m 10m R 54.7 0.3 37:58.58 postmaster > 28714 postgres 16 0 122m 14m 10m R 50.4 0.4 38:38.98 postmaster > 29412 postgres 16 0 121m 13m 10m R 46.5 0.4 8:03.96 postmaster > 28542 postgres 16 0 122m 14m 10m R 46.2 0.4 58:49.38 postmaster > 28482 postgres 16 0 122m 14m 10m R 45.8 0.4 61:53.37 postmaster > 28468 postgres 16 0 122m 14m 10m R 44.2 0.4 62:46.17 postmaster > 29017 postgres 16 0 122m 14m 10m R 43.9 0.4 19:17.06 postmaster > 28929 postgres 15 0 122m 14m 10m R 42.2 0.4 22:01.43 postmaster > 28500 postgres 16 0 122m 14m 10m R 41.3 0.4 59:40.23 postmaster > 28460 postgres 16 0 122m 14m 10m R 40.6 0.4 64:17.16 postmaster > 28894 postgres 16 0 122m 14m 10m R 38.6 0.4 23:35.53 postmaster > 28489 postgres 16 0 122m 14m 10m R 36.0 0.4 60:32.59 postmaster > 28719 postgres 15 0 121m 13m 10m R 25.2 0.3 38:10.33 postmaster > 29496 postgres 16 0 121m 13m 10m R 22.9 0.4 4:20.32 postmaster > 28556 postgres 15 0 122m 14m 10m R 17.7 0.4 57:32.62 postmaster > 28735 postgres 15 0 122m 14m 10m R 15.7 0.4 36:09.45 postmaster > 29602 postgres 15 0 119m 11m 9680 S 8.2 0.3 0:00.25 postmaster > 28457 postgres 17 0 122m 14m 9m R 3.6 0.4 64:34.38 postmaster > 26092 apache 15 0 238m 16m 3740 S 0.3 0.4 0:03.38 httpd > 29596 afarber 15 0 12744 1116 800 R 0.3 0.0 0:00.09 top > 1 root 15 0 10352 700 592 S 0.0 0.0 0:01.69 init > > I understand, that I probably supply not enough info, > but how would you approach debugging this problem? I would connect to the db with psql and query the pg_stat_activity and the pg_locks views. > > I've run my cronjob numerous times from CLI - it works ok > and takes only few seconds. I've installed it as an every-3 mins > cronjob on my development VM - it works ok there too. > > My biggest problem is, that I don't see any information > from Postgres at the production site - why did it have to start > so many postmaster's (whatever those processes do). Those are backends running queries. > > The only log file I've found has been /var/lib/pgsql/pgstartup.log > and there is nothing suspicious there. (Also nothing related to Postgres > in messages, mcelog (empty), audit.log, access_log, error_log). You might want to increase logging. Take a look at http://www.postgresql.org/docs/8.1/interactive/runtime-config-logging.html for details. Regards /roppert > > Please give me some hints > Alex > > # cat bin/clean-phpbb-forum.sql > start transaction; > > delete from phpbb_users > where user_type=1 and user_inactive_reason=1 and user_lastvisit=0 and > age(to_timestamp(user_regdate))>interval '3 days'; > > create temp table old_topics (topic_id integer) on commit delete rows; > create temp table old_posts (post_id integer) on commit delete rows; > > insert into old_topics select topic_id from phpbb_topics > where forum_id=5 and topic_poster=1 and > age(to_timestamp(topic_time))>interval '7 days'; > > -- select count(topic_id) as "old topics:" from old_topics; > > insert into old_posts select p.post_id from phpbb_posts p, old_topics t > where p.forum_id=5 and p.poster_id=1 and p.topic_id=t.topic_id; > > -- select count(post_id) as "old posts:" from old_posts; > > delete from phpbb_posts where post_id in (select post_id from old_posts); > delete from phpbb_topics where topic_id in (select topic_id from old_topics); > > update phpbb_config set > config_value = (select count(topic_id) from phpbb_topics) > where config_name = 'num_topics'; > > update phpbb_config set > config_value = (select count(post_id) from phpbb_posts) > where config_name = 'num_posts'; > > update phpbb_users set > user_posts = (select count(post_id) from phpbb_posts where poster_id=1) > where user_id = 1; > > update phpbb_forums set > forum_posts = (select count(post_id) from phpbb_posts), > forum_topics = (select count(topic_id) from phpbb_topics), > forum_topics_real = (select count(topic_id) from phpbb_topics) > where forum_id = 5; > > commit >
Hello again,
ouch, I have these postmaster's spinning up again -
even though I've disabled the cronjob and rebooted:
top - 11:42:31 up 1:22, 2 users, load average: 9.15, 7.59, 5.03
Tasks: 135 total, 10 running, 125 sleeping, 0 stopped, 0 zombie
Cpu0 : 31.7%us, 0.5%sy, 0.0%ni, 67.4%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 32.9%us, 0.7%sy, 0.0%ni, 65.8%id, 0.1%wa, 0.0%hi, 0.5%si, 0.0%st
Cpu2 : 58.8%us, 1.3%sy, 0.0%ni, 38.2%id, 1.3%wa, 0.0%hi, 0.5%si, 0.0%st
Cpu3 : 21.7%us, 0.5%sy, 0.0%ni, 77.1%id, 0.1%wa, 0.0%hi, 0.5%si, 0.0%st
Mem: 4019028k total, 1512432k used, 2506596k free, 21828k buffers
Swap: 2104496k total, 0k used, 2104496k free, 1152844k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3208 postgres 17 0 122m 14m 9m R 143.9 0.4 30:22.79 postmaster
3193 postgres 18 0 122m 14m 9m R 117.7 0.4 34:04.08 postmaster
3263 postgres 18 0 120m 13m 9m R 109.4 0.3 11:06.24 postmaster
3315 postgres 17 0 121m 13m 9.9m R 90.7 0.4 1:59.89 postmaster
3275 postgres 18 0 121m 13m 9m R 83.2 0.4 6:44.74 postmaster
3243 postgres 18 0 122m 14m 9m R 69.0 0.4 20:04.06 postmaster
3233 postgres 17 0 122m 14m 9m R 47.2 0.4 25:01.18 postmaster
3256 postgres 18 0 122m 14m 9m R 35.2 0.4 14:48.54 postmaster
On Mon, Oct 4, 2010 at 11:34 AM, Robert Gravsjö <robert@blogg.se> wrote:
> I would connect to the db with psql and query the pg_stat_activity and the
> pg_locks views.
pref=> select * from pg_stat_activity;
datid | datname | procpid | usesysid | usename | current_query
| query_start | backend_start | client_addr |
client_port
-------+---------+---------+----------+---------+------------------------------+-------------+-------------------------------+-------------+-------------
16384 | pref | 2681 | 16385 | pref | <command string not
enabled> | | 2010-10-04 10:22:53.051483+02 | |
-1
16384 | pref | 3193 | 16385 | pref | <command string not
enabled> | | 2010-10-04 11:17:40.081829+02 | |
-1
16384 | pref | 3208 | 16385 | pref | <command string not
enabled> | | 2010-10-04 11:20:18.763179+02 | |
-1
16384 | pref | 3233 | 16385 | pref | <command string not
enabled> | | 2010-10-04 11:22:50.818701+02 | |
-1
16384 | pref | 3243 | 16385 | pref | <command string not
enabled> | | 2010-10-04 11:25:34.110573+02 | |
-1
16384 | pref | 3256 | 16385 | pref | <command string not
enabled> | | 2010-10-04 11:28:33.355689+02 | |
-1
16384 | pref | 3263 | 16385 | pref | <command string not
enabled> | | 2010-10-04 11:32:02.116546+02 | |
-1
16384 | pref | 3275 | 16385 | pref | <command string not
enabled> | | 2010-10-04 11:35:31.611361+02 | |
-1
16384 | pref | 3315 | 16385 | pref | <command string not
enabled> | | 2010-10-04 11:40:00.119433+02 | |
-1
16384 | pref | 3355 | 16385 | pref | <command string not
enabled> | | 2010-10-04 11:42:51.827796+02 | |
-1
16384 | pref | 3356 | 16385 | pref | <command string not
enabled> | | 2010-10-04 11:43:04.285152+02 | |
-1
(11 rows)
pref=> select * from pg_locks;
locktype | database | relation | page | tuple | transactionid |
classid | objid | objsubid | transaction | pid | mode |
granted
---------------+----------+----------+------+-------+---------------+---------+-------+----------+-------------+------+-----------------+---------
relation | 16384 | 17755 | | | |
| | | 5677546 | 3275 | AccessShareLock | t
relation | 16384 | 17313 | | | |
| | | 5677546 | 3275 | AccessShareLock | t
relation | 16384 | 16858 | | | |
| | | 5677313 | 3243 | AccessShareLock | t
transactionid | | | | | 5677313 |
| | | 5677313 | 3243 | ExclusiveLock | t
relation | 16384 | 16858 | | | |
| | | 5677742 | 3315 | AccessShareLock | t
relation | 16384 | 17755 | | | |
| | | 5677395 | 3256 | AccessShareLock | t
relation | 16384 | 17716 | | | |
| | | 5677313 | 3243 | AccessShareLock | t
relation | 16384 | 17313 | | | |
| | | 5677313 | 3243 | AccessShareLock | t
relation | 16384 | 17755 | | | |
| | | 5677070 | 3193 | AccessShareLock | t
relation | 16384 | 17716 | | | |
| | | 5677234 | 3233 | AccessShareLock | t
relation | 16384 | 10342 | | | |
| | | 5677823 | 3355 | AccessShareLock | t
transactionid | | | | | 5677234 |
| | | 5677234 | 3233 | ExclusiveLock | t
relation | 16384 | 17716 | | | |
| | | 5677454 | 3263 | AccessShareLock | t
transactionid | | | | | 5677167 |
| | | 5677167 | 3208 | ExclusiveLock | t
relation | 16384 | 17716 | | | |
| | | 5677546 | 3275 | AccessShareLock | t
relation | 16384 | 17716 | | | |
| | | 5677167 | 3208 | AccessShareLock | t
relation | 16384 | 17313 | | | |
| | | 5677070 | 3193 | AccessShareLock | t
relation | 16384 | 17721 | | | |
| | | 5677167 | 3208 | AccessShareLock | t
transactionid | | | | | 5677070 |
| | | 5677070 | 3193 | ExclusiveLock | t
relation | 16384 | 16858 | | | |
| | | 5677167 | 3208 | AccessShareLock | t
relation | 16384 | 17721 | | | |
| | | 5677742 | 3315 | AccessShareLock | t
transactionid | | | | | 5677823 |
| | | 5677823 | 3355 | ExclusiveLock | t
transactionid | | | | | 5677454 |
| | | 5677454 | 3263 | ExclusiveLock | t
relation | 16384 | 16858 | | | |
| | | 5677546 | 3275 | AccessShareLock | t
relation | 16384 | 17721 | | | |
| | | 5677313 | 3243 | AccessShareLock | t
relation | 16384 | 17755 | | | |
| | | 5677454 | 3263 | AccessShareLock | t
relation | 16384 | 17716 | | | |
| | | 5677742 | 3315 | AccessShareLock | t
relation | 16384 | 17313 | | | |
| | | 5677167 | 3208 | AccessShareLock | t
relation | 16384 | 16858 | | | |
| | | 5677454 | 3263 | AccessShareLock | t
relation | 16384 | 17716 | | | |
| | | 5677070 | 3193 | AccessShareLock | t
relation | 16384 | 17721 | | | |
| | | 5677234 | 3233 | AccessShareLock | t
relation | 16384 | 17721 | | | |
| | | 5677546 | 3275 | AccessShareLock | t
relation | 16384 | 17755 | | | |
| | | 5677234 | 3233 | AccessShareLock | t
transactionid | | | | | 5677742 |
| | | 5677742 | 3315 | ExclusiveLock | t
relation | 16384 | 16858 | | | |
| | | 5677070 | 3193 | AccessShareLock | t
relation | 16384 | 17716 | | | |
| | | 5677395 | 3256 | AccessShareLock | t
relation | 16384 | 17755 | | | |
| | | 5677167 | 3208 | AccessShareLock | t
relation | 16384 | 17313 | | | |
| | | 5677742 | 3315 | AccessShareLock | t
relation | 16384 | 17313 | | | |
| | | 5677234 | 3233 | AccessShareLock | t
relation | 16384 | 16858 | | | |
| | | 5677395 | 3256 | AccessShareLock | t
relation | 16384 | 17755 | | | |
| | | 5677742 | 3315 | AccessShareLock | t
relation | 16384 | 17721 | | | |
| | | 5677070 | 3193 | AccessShareLock | t
transactionid | | | | | 5677395 |
| | | 5677395 | 3256 | ExclusiveLock | t
relation | 16384 | 17721 | | | |
| | | 5677395 | 3256 | AccessShareLock | t
relation | 16384 | 16858 | | | |
| | | 5677234 | 3233 | AccessShareLock | t
relation | 16384 | 17755 | | | |
| | | 5677313 | 3243 | AccessShareLock | t
relation | 16384 | 17313 | | | |
| | | 5677454 | 3263 | AccessShareLock | t
relation | 16384 | 17721 | | | |
| | | 5677454 | 3263 | AccessShareLock | t
transactionid | | | | | 5677546 |
| | | 5677546 | 3275 | ExclusiveLock | t
relation | 16384 | 17313 | | | |
| | | 5677395 | 3256 | AccessShareLock | t
(50 rows)
Regards
Alex
I wish I could see those hanging queries, what SQL do they try to execute:
pref=> select * from pg_stat_activity;
datid | datname | procpid | usesysid | usename | current_query
| query_start | backend_start | client_addr |
client_port
-------+---------+---------+----------+---------+------------------------------+-------------+-------------------------------+-------------+-------------
16384 | pref | 2681 | 16385 | pref | <command string not
enabled> | | 2010-10-04 10:22:53.051483+02 | |
-1
.........
I have usually just one Postgres connection from my game
(a perl script running as daemon). And then I have several
connections from phpBB. But when I restart httpd,
the spinning postmaster's don't disappear:
top - 11:48:11 up 1:28, 1 user, load average: 9.85, 8.68, 6.25
Tasks: 126 total, 12 running, 114 sleeping, 0 stopped, 0 zombie
Cpu0 : 36.1%us, 0.5%sy, 0.0%ni, 63.0%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st
Cpu1 : 37.1%us, 0.8%sy, 0.0%ni, 61.6%id, 0.1%wa, 0.0%hi, 0.5%si, 0.0%st
Cpu2 : 61.4%us, 1.3%sy, 0.0%ni, 35.7%id, 1.2%wa, 0.0%hi, 0.5%si, 0.0%st
Cpu3 : 26.6%us, 0.6%sy, 0.0%ni, 72.1%id, 0.1%wa, 0.0%hi, 0.5%si, 0.0%st
Mem: 4019028k total, 1428256k used, 2590772k free, 22324k buffers
Swap: 2104496k total, 0k used, 2104496k free, 1199036k cached
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
3263 postgres 16 0 122m 14m 9m R 116.0 0.4 16:16.83 postmaster
3208 postgres 16 0 122m 14m 9m R 112.0 0.4 35:16.08 postmaster
3275 postgres 16 0 121m 13m 9m R 110.6 0.4 11:34.32 postmaster
3315 postgres 16 0 121m 13m 9m R 100.5 0.4 7:16.93 postmaster
3193 postgres 16 0 122m 14m 9m R 96.4 0.4 38:44.78 postmaster
3233 postgres 16 0 122m 14m 9m R 68.8 0.4 29:28.90 postmaster
3243 postgres 16 0 122m 14m 9m R 53.3 0.4 25:13.96 postmaster
3256 postgres 16 0 122m 14m 9m R 34.4 0.4 19:23.93 postmaster
On 4/10/2010 5:20 PM, Alexander Farber wrote: > # rpm -qa | grep -i postgres > postgresql-8.1.21-1.el5_5.1 Mmm, elderly. > On Friday I've installed a cronjob (full source code at the bottom): > > 3 3 * * * psql -a -f > $HOME/bin/clean-phpbb-forum.sql Do you rate-control it or provide any sort of interlocking so that you don't land up with multiple copies running at once? See "man 2 flock" for a convenient way to do that reliably from the shell. > My biggest problem is, that I don't see any information > from Postgres at the production site - why did it have to start > so many postmaster's (whatever those processes do). http://wiki.postgresql.org/wiki/FAQ#Why_does_PostgreSQL_have_so_many_processes.2C_even_when_idle.3F As for what they're doing: see pg_stat_activity. -- Craig Ringer Tech-related writing at http://soapyfrogs.blogspot.com/
Thank you, I've enabled logging, but couldn't see anything suspicious there. Also the 60%-120% CPU-eating postmaster processes are now suddenly gone. I suspect this is some bug in the postgresql-server-8.1.21-1.el5_5.1 delivered with CenOS 5.5/64 bit, because I don't have many simultaneous users ( < 100) and not that many queries. I'll try to upgrade to 8.4.4 as described at http://www.pgrpms.org/ later. I'm a bit confused though that they do not support "yum update" anymore (I think that worked earlier with PGDG....) Regards Alex
Alexander Farber skrev 2010-10-04 11.48: > I wish I could see those hanging queries, what SQL do they try to execute: To see the query you need to enable stats_command_string in your postgresql.conf. See: http://www.postgresql.org/docs/8.1/interactive/runtime-config-statistics.html You might also benefit from chapter 24: http://www.postgresql.org/docs/8.1/interactive/monitoring.html Regards, roppert > > pref=> select * from pg_stat_activity; > datid | datname | procpid | usesysid | usename | current_query > | query_start | backend_start | client_addr | > client_port > -------+---------+---------+----------+---------+------------------------------+-------------+-------------------------------+-------------+------------- > 16384 | pref | 2681 | 16385 | pref |<command string not > enabled> | | 2010-10-04 10:22:53.051483+02 | | > -1 > ......... > > I have usually just one Postgres connection from my game > (a perl script running as daemon). And then I have several > connections from phpBB. But when I restart httpd, > the spinning postmaster's don't disappear: > > top - 11:48:11 up 1:28, 1 user, load average: 9.85, 8.68, 6.25 > Tasks: 126 total, 12 running, 114 sleeping, 0 stopped, 0 zombie > Cpu0 : 36.1%us, 0.5%sy, 0.0%ni, 63.0%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st > Cpu1 : 37.1%us, 0.8%sy, 0.0%ni, 61.6%id, 0.1%wa, 0.0%hi, 0.5%si, 0.0%st > Cpu2 : 61.4%us, 1.3%sy, 0.0%ni, 35.7%id, 1.2%wa, 0.0%hi, 0.5%si, 0.0%st > Cpu3 : 26.6%us, 0.6%sy, 0.0%ni, 72.1%id, 0.1%wa, 0.0%hi, 0.5%si, 0.0%st > Mem: 4019028k total, 1428256k used, 2590772k free, 22324k buffers > Swap: 2104496k total, 0k used, 2104496k free, 1199036k cached > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 3263 postgres 16 0 122m 14m 9m R 116.0 0.4 16:16.83 postmaster > 3208 postgres 16 0 122m 14m 9m R 112.0 0.4 35:16.08 postmaster > 3275 postgres 16 0 121m 13m 9m R 110.6 0.4 11:34.32 postmaster > 3315 postgres 16 0 121m 13m 9m R 100.5 0.4 7:16.93 postmaster > 3193 postgres 16 0 122m 14m 9m R 96.4 0.4 38:44.78 postmaster > 3233 postgres 16 0 122m 14m 9m R 68.8 0.4 29:28.90 postmaster > 3243 postgres 16 0 122m 14m 9m R 53.3 0.4 25:13.96 postmaster > 3256 postgres 16 0 122m 14m 9m R 34.4 0.4 19:23.93 postmaster >
I've upgraded my CentOS 5.5 VM to:
postgresql-libs-8.4.4-1PGDG.rhel5
postgresql-devel-8.4.4-1PGDG.rhel5
postgresql-8.4.4-1PGDG.rhel5
postgresql-contrib-8.4.4-1PGDG.rhel5
postgresql-docs-8.4.4-1PGDG.rhel5
postgresql-server-8.4.4-1PGDG.rhel5
postgresql-plperl-8.4.4-1PGDG.rhel5
and then have loaded my backup and
then run my cleanup script (source below)
and it would stuck:
pref=> \i bin/clean-phpbb-forum.sql
START TRANSACTION
DELETE 6
CREATE TABLE
CREATE TABLE
INSERT 0 26699
INSERT 0 129903
[... and here nothing happens for minutes....]
And in the
/var/lib/pgsql/data/pg_log/postgresql-Mon.log
see (probably because of the many inserts?)
LOG: autovacuum launcher started
LOG: database system is ready to accept connections
LOG: statement: start transaction;
LOG: statement: delete from phpbb_users
where user_type=1 and user_inactive_reason=1 and
user_lastvisit=0 and age(to_timestamp(user_regdate))>interval '3
days';
LOG: statement: create temp table old_topics (topic_id integer) on
commit delete rows;
LOG: statement: create temp table old_posts (post_id integer) on
commit delete rows;
LOG: statement: insert into old_topics select topic_id from phpbb_topics
where forum_id=5 and topic_poster=1 and
age(to_timestamp(topic_time))>interval '7 days';
LOG: statement: insert into old_posts select p.post_id from
phpbb_posts p, old_topics t
where p.forum_id=5 and p.poster_id=1 and p.topic_id=t.topic_id;
LOG: statement: delete from phpbb_posts where post_id in (select
post_id from old_posts);
LOG: checkpoints are occurring too frequently (29 seconds apart)
HINT: Consider increasing the configuration parameter "checkpoint_segments".
LOG: checkpoints are occurring too frequently (29 seconds apart)
Then after some minutes waiting,
I've pressed ctrl-Z on the pgsql and then "fg"
and this has waken PostgreSQL up for some reason:
[1]+ Stopped psql
afarber@centos:~> fg
psql
DELETE 129903
DELETE 26699
UPDATE 1
UPDATE 1
UPDATE 1
UPDATE 1
COMMIT
Could it be that my script is somehow making
PostgreSQL hang? (or wait for some input?):
cat ~/bin/clean-phpbb-forum.sql
start transaction;
delete from phpbb_users
where user_type=1 and user_inactive_reason=1 and user_lastvisit=0 and
age(to_timestamp(user_regdate))>interval '3 days';
create temp table old_topics (topic_id integer) on commit delete rows;
create temp table old_posts (post_id integer) on commit delete rows;
insert into old_topics select topic_id from phpbb_topics
where forum_id=5 and topic_poster=1 and
age(to_timestamp(topic_time))>interval '7 days';
-- select count(topic_id) as "old topics:" from old_topics;
insert into old_posts select p.post_id from phpbb_posts p, old_topics t
where p.forum_id=5 and p.poster_id=1 and p.topic_id=t.topic_id;
-- select count(post_id) as "old posts:" from old_posts;
delete from phpbb_posts where post_id in (select post_id from old_posts);
delete from phpbb_topics where topic_id in (select topic_id from old_topics);
update phpbb_config set
config_value = (select count(topic_id) from phpbb_topics)
where config_name = 'num_topics';
update phpbb_config set
config_value = (select count(post_id) from phpbb_posts)
where config_name = 'num_posts';
update phpbb_users set
user_posts = (select count(post_id) from phpbb_posts where poster_id=1)
where user_id = 1;
update phpbb_forums set
forum_posts = (select count(post_id) from phpbb_posts),
forum_topics = (select count(topic_id) from phpbb_topics),
forum_topics_real = (select count(topic_id) from phpbb_topics)
where forum_id = 5;
commit
Thank you for any insights
Alex
PS: I haven't modified postgresql.conf
or other conf files besides logging:
# diff postgresql.conf{.OLD,}
353c353
< #log_statement = 'none' # none, ddl, mod, all
---
> log_statement = 'all' # none, ddl, mod, all
# grep checkpoint postgresql.conf
#checkpoint_segments = 3 # in logfile segments, min 1, 16MB each
#checkpoint_timeout = 5min # range 30s-1h
#checkpoint_completion_target = 0.5 # checkpoint target duration, 0.0 - 1.0
#checkpoint_warning = 30s # 0 disables
#log_checkpoints = off
Specifically look into setting..... log_min_duration_statement (integer) in the postgresql.conf file. -----Original Message----- From: Robert Gravsjö [mailto:tekniksupport@blogg.se] Sent: Monday, October 04, 2010 7:22 AM To: Alexander Farber Cc: pgsql-general@postgresql.org Subject: Re: Any advice on debugging hanging postgresql-8.1.21 (many postmaster's) Alexander Farber skrev 2010-10-04 11.48: > I wish I could see those hanging queries, what SQL do they try to execute: To see the query you need to enable stats_command_string in your postgresql.conf. See: http://www.postgresql.org/docs/8.1/interactive/runtime-config-statistics.html You might also benefit from chapter 24: http://www.postgresql.org/docs/8.1/interactive/monitoring.html Regards, roppert > > pref=> select * from pg_stat_activity; > datid | datname | procpid | usesysid | usename | current_query > | query_start | backend_start | client_addr | > client_port > -------+---------+---------+----------+---------+------------------------------+-------------+-------------------------------+-------------+------------- > 16384 | pref | 2681 | 16385 | pref |<command string not > enabled> | | 2010-10-04 10:22:53.051483+02 | | > -1 > ......... > > I have usually just one Postgres connection from my game > (a perl script running as daemon). And then I have several > connections from phpBB. But when I restart httpd, > the spinning postmaster's don't disappear: > > top - 11:48:11 up 1:28, 1 user, load average: 9.85, 8.68, 6.25 > Tasks: 126 total, 12 running, 114 sleeping, 0 stopped, 0 zombie > Cpu0 : 36.1%us, 0.5%sy, 0.0%ni, 63.0%id, 0.4%wa, 0.0%hi, 0.0%si, 0.0%st > Cpu1 : 37.1%us, 0.8%sy, 0.0%ni, 61.6%id, 0.1%wa, 0.0%hi, 0.5%si, 0.0%st > Cpu2 : 61.4%us, 1.3%sy, 0.0%ni, 35.7%id, 1.2%wa, 0.0%hi, 0.5%si, 0.0%st > Cpu3 : 26.6%us, 0.6%sy, 0.0%ni, 72.1%id, 0.1%wa, 0.0%hi, 0.5%si, 0.0%st > Mem: 4019028k total, 1428256k used, 2590772k free, 22324k buffers > Swap: 2104496k total, 0k used, 2104496k free, 1199036k cached > > PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND > 3263 postgres 16 0 122m 14m 9m R 116.0 0.4 16:16.83 postmaster > 3208 postgres 16 0 122m 14m 9m R 112.0 0.4 35:16.08 postmaster > 3275 postgres 16 0 121m 13m 9m R 110.6 0.4 11:34.32 postmaster > 3315 postgres 16 0 121m 13m 9m R 100.5 0.4 7:16.93 postmaster > 3193 postgres 16 0 122m 14m 9m R 96.4 0.4 38:44.78 postmaster > 3233 postgres 16 0 122m 14m 9m R 68.8 0.4 29:28.90 postmaster > 3243 postgres 16 0 122m 14m 9m R 53.3 0.4 25:13.96 postmaster > 3256 postgres 16 0 122m 14m 9m R 34.4 0.4 19:23.93 postmaster >