Discussion: vacuum won't even start
Hi all,

I have a problem on a heavily loaded database: vacuums haven't worked for about a week. All I get is:

mybase=# vacuum verbose analyze public.mytable;
INFO: vacuuming "public.mytable"
(I stop it after hours)

Looking with top and iotop, I see the process use some CPU and disk I/O for several minutes, then it seems to fall asleep. The process isn't locked according to pg_stat_activity.

My setup:
- PostgreSQL 8.3.7 with contribs ltree and pgcrypto
- OS: Debian etch, kernel 2.6.24
- HW: 8-core Xeon / 32 GB RAM / 3 RAID10 volumes (index, data, pg_xlog)
- database size: about 240 GB
- millions of queries/day
- 1000 locks held continuously
- about 200 simultaneous connections
- load: 30% iowait, 60% user, 10% sys

Autovacuum is disabled to prevent it from loading the server during peak hours. Regular vacuums run each night as a cron job.

For about a week, the nightly vacuums haven't worked. I tried manual ones to no avail: same symptoms as above on small tables (350 rows) as well as on big ones (almost 1 billion rows).

As the cron vacuums don't run anymore, I now see autovacuums (to prevent wraparound) running all the time, but their processes don't use any CPU time or disk I/O. Autovacuum seems to work well on the pg_catalog schema.

The problem seems to have started with some queries running for more than 15 hours. I tried to kill them (signal 15) to no avail.

I can't restart the server as it's a big production server. We're planning to upgrade the hardware soon, but I suspect we'll have the same problems in the future as our platform is growing.

Does anyone have any info about this problem, and the means to prevent it?

Thanks in advance.

Regards,

-- JC
Ph'nglui mglw'nafh Cthulhu n'gah Bill R'lyeh Wgah'nagl fhtagn!
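JC reports anti-wraparound autovacuums running all the time even though autovacuum is disabled. This is expected behavior. The server forces a vacuum on any table whose relfrozenxid age exceeds autovacuum_freeze_max_age (200 million by default in 8.3), even when autovacuum is otherwise off.

Below is a minimal sketch of that test. It assumes 32-bit transaction IDs compared with modulo-2^32 arithmetic, which is how the age() function treats normal XIDs. The function names are hypothetical, special XIDs are ignored, and the exact boundary the server uses may differ slightly.

```python
# Simplified model of the anti-wraparound trigger; names are hypothetical.
XID_SPACE = 2 ** 32
AUTOVACUUM_FREEZE_MAX_AGE = 200_000_000  # 8.3 default for autovacuum_freeze_max_age


def xid_age(current_xid: int, relfrozenxid: int) -> int:
    """Distance from relfrozenxid to current_xid modulo 2^32, read as signed int32."""
    diff = (current_xid - relfrozenxid) % XID_SPACE
    # XIDs are compared circularly, so values past 2^31 count as "in the future".
    return diff - XID_SPACE if diff >= 2 ** 31 else diff


def needs_wraparound_vacuum(current_xid: int, relfrozenxid: int,
                            freeze_max_age: int = AUTOVACUUM_FREEZE_MAX_AGE) -> bool:
    """True when the table is old enough that a vacuum is forced, autovacuum or not."""
    return xid_age(current_xid, relfrozenxid) > freeze_max_age


if __name__ == "__main__":
    # The XID counter wrapped past 2^32 after relfrozenxid was recorded.
    print(xid_age(100, XID_SPACE - 100))                     # 200
    print(needs_wraparound_vacuum(250_000_000, 10_000_000))  # True
```

While every vacuum is blocked, no table's relfrozenxid can advance. So in this model, more and more tables cross the threshold, which matches the forced autovacuums JC sees piling up.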
Jean-Christophe Praud wrote:
> Hi all,
>
> I've a problem on a heavy loaded database: vacuums don't work since
> about a week. All I got is:
>
> mybase=# vacuum verbose analyze public.mytable;
> INFO: vacuuming "public.mytable"
> (I stop it after hours)
>
> Looking with top and iotop, I see the process takes some cpu and
> disk io time during several minutes, then it seems to fall asleep.
> The process isn't locked according to pg_stat_activity.

What are your vacuum_cost_% parameters?

-- Alvaro Herrera http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support
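Alvaro's question matters because a non-zero vacuum_cost_delay makes vacuum nap each time its accumulated page cost reaches vacuum_cost_limit, which can look like the process "falling asleep". The sketch below is a simplified model of that accounting as I understand it from the 8.x server:

- A cache hit, a cache miss, and dirtying a page cost vacuum_cost_page_hit, vacuum_cost_page_miss, and vacuum_cost_page_dirty respectively.
- Each nap is capped at four times the delay.
- Nothing sleeps when the delay is 0.

The model checks the balance after every page, whereas the real server checks only at its own delay points. The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Tuple


@dataclass(frozen=True)
class CostSettings:
    """Mirrors the vacuum_cost_* settings; defaults match postgresql.conf."""
    delay_ms: int = 0      # vacuum_cost_delay (0 disables cost-based delay)
    page_hit: int = 1      # vacuum_cost_page_hit
    page_miss: int = 10    # vacuum_cost_page_miss
    page_dirty: int = 20   # vacuum_cost_page_dirty
    limit: int = 200       # vacuum_cost_limit


def total_sleep_ms(pages: Iterable[Tuple[bool, bool]], s: CostSettings) -> int:
    """Total time vacuum would nap; each page is (cache_hit, dirtied)."""
    if s.delay_ms <= 0:
        return 0  # cost accounting is inactive, so vacuum never naps
    balance = 0
    slept = 0
    for cache_hit, dirtied in pages:
        balance += s.page_hit if cache_hit else s.page_miss
        if dirtied:
            balance += s.page_dirty
        if balance >= s.limit:
            # Nap in proportion to the accumulated cost, capped at 4x the delay.
            slept += min(s.delay_ms * balance // s.limit, s.delay_ms * 4)
            balance = 0
    return slept


if __name__ == "__main__":
    misses = [(False, False)] * 200
    print(total_sleep_ms(misses, CostSettings()))             # 0 with the defaults
    print(total_sleep_ms(misses, CostSettings(delay_ms=20)))  # 200
```

With the defaults JC posts in his reply (vacuum_cost_delay = 0), the model predicts no throttling at all. That points away from cost-based delay as the cause of the stall.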
Jean-Christophe Praud <jc@steek.com> writes:
> I've a problem on a heavy loaded database: vacuums don't work since
> about a week. All I got is:
> mybase=# vacuum verbose analyze public.mytable;
> INFO: vacuuming "public.mytable"
> (I stop it after hours)
> Looking with top and iotop, I see the process takes some cpu and disk io
> time during several minutes, then it seems to fall asleep.
> The process isn't locked according to pg_stat_activity.

When vacuum wants to clean up a particular table page, it will wait until no other process is examining that page; and this wait is not visible in pg_locks. Perhaps you have got some queries referencing those tables that have stopped midway and are just sitting?

Although pg_locks won't immediately show the wait, it could be useful to help identify the culprit --- look for other processes holding any type of lock on the table the vacuum is stuck on, and then go to pg_stat_activity to see how old their current query is.

regards, tom lane
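Tom's two-step check can be sketched as a join over rows taken from pg_locks and pg_stat_activity, for example from `SELECT * FROM pg_locks` and `SELECT * FROM pg_stat_activity`. The two steps are:

1. Find every other backend holding a lock on the stuck table.
2. See how old each one's current query is.

The dictionaries below assume the 8.3 column names procpid, current_query and query_start (9.2 renamed the first two to pid and query). The function name, sample rows, relation OID and cursor name are hypothetical.

```python
from datetime import datetime, timedelta
from typing import Dict, List, Tuple


def lock_holders_by_age(locks: List[Dict], activity: List[Dict], relation_oid: int,
                        vacuum_pid: int, now: datetime) -> List[Tuple[int, str, timedelta, str]]:
    """Other backends locking relation_oid, oldest current query first."""
    by_pid = {row["procpid"]: row for row in activity}  # pg_stat_activity keyed by pid
    suspects = []
    for lock in locks:  # rows from pg_locks
        if lock["relation"] != relation_oid or lock["pid"] == vacuum_pid:
            continue  # other table, or the vacuum's own lock
        act = by_pid.get(lock["pid"])
        if act is None:
            continue  # backend exited between the two snapshots
        age = now - act["query_start"]
        suspects.append((lock["pid"], lock["mode"], age, act["current_query"]))
    suspects.sort(key=lambda s: s[2], reverse=True)
    return suspects


if __name__ == "__main__":
    now = datetime(2009, 6, 1, 12, 0)
    locks = [
        {"pid": 101, "relation": 16384, "mode": "ShareUpdateExclusiveLock"},  # the vacuum
        {"pid": 202, "relation": 16384, "mode": "AccessShareLock"},
        {"pid": 303, "relation": 99999, "mode": "AccessShareLock"},  # another table
    ]
    activity = [
        {"procpid": 101, "current_query": "vacuum verbose analyze public.mytable;",
         "query_start": now - timedelta(hours=2)},
        {"procpid": 202, "current_query": "MOVE FORWARD 100 IN mycursor",
         "query_start": now - timedelta(hours=15)},
        {"procpid": 303, "current_query": "SELECT 1", "query_start": now},
    ]
    for pid, mode, age, query in lock_holders_by_age(locks, activity, 16384, 101, now):
        print(pid, mode, age, query)  # 202 AccessShareLock 15:00:00 MOVE FORWARD 100 IN mycursor
```

An AccessShareLock does not conflict with the ShareUpdateExclusiveLock that lazy VACUUM takes. That is why such a holder never shows up as a lock conflict, only as an old query.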
Alvaro Herrera wrote:
> What are your vacuum_cost_% parameters?

I've left the default values (not even uncommented in the conf file ;) ):

#vacuum_cost_delay = 0 # 0-1000 milliseconds
#vacuum_cost_page_hit = 1 # 0-10000 credits
#vacuum_cost_page_miss = 10 # 0-10000 credits
#vacuum_cost_page_dirty = 20 # 0-10000 credits
#vacuum_cost_limit = 200 # 1-10000 credits
-- JC
Tom Lane wrote:
> When vacuum wants to clean up a particular table page, it will wait
> until no other process is examining that page; and this wait is not
> visible in pg_locks. Perhaps you have got some queries referencing
> those tables that have stopped midway and are just sitting?
>
> Although pg_locks won't immediately show the wait, it could be useful
> to help identify the culprit --- look for other processes holding any
> type of lock on the table the vacuum is stuck on, and then go to
> pg_stat_activity to see how old their current query is.

Indeed, the tables I tried to vacuum have locks on them: AccessShareLocks belonging to queries that seem to be sleeping. I tried to kill these queries, but pg_cancel_backend() has no effect, and the process doesn't react to signal 15.

How can I get rid of these blocking queries without restarting the server? They are not listed as "waiting" in pg_stat_activity.

These queries are MOVE FORWARD on cursors; the underlying query is a rather complex one (unions, joins, function calls).

Regards,
-- JC
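Tom's explanation fits these symptoms. As I understand the 8.3 server, vacuum cleans a heap page only after taking a "cleanup lock", which waits until vacuum's own pin is the only pin on that buffer. A cursor left positioned mid-scan keeps the page under it pinned for as long as the cursor stays open.

The toy model below illustrates why such a wait cannot end until the session holding the cursor goes away. It uses hypothetical names and does not represent the real buffer manager API.

```python
from collections import Counter


class Buffer:
    """Toy shared buffer: tracks which backends currently pin one page."""

    def __init__(self) -> None:
        self.pins = Counter()  # backend pid -> number of pins held

    def pin(self, pid: int) -> None:
        self.pins[pid] += 1

    def unpin(self, pid: int) -> None:
        self.pins[pid] -= 1
        if self.pins[pid] == 0:
            del self.pins[pid]

    def release_all(self, pid: int) -> None:
        """Models backend exit: every pin held by pid is dropped."""
        self.pins.pop(pid, None)

    def can_cleanup(self, vacuum_pid: int) -> bool:
        """Cleanup lock is grantable only when vacuum's single pin is the only one."""
        return sum(self.pins.values()) == 1 and self.pins.get(vacuum_pid) == 1


if __name__ == "__main__":
    page = Buffer()
    page.pin(202)                 # suspended cursor parked on this page
    page.pin(101)                 # vacuum pins the page and asks for a cleanup lock
    print(page.can_cleanup(101))  # False: vacuum waits, with nothing in pg_locks
    page.release_all(202)         # client session killed, backend exits
    print(page.can_cleanup(101))  # True: vacuum can proceed
```

In this model the pin is not a lock-table entry, so pg_locks only ever shows the compatible AccessShareLock, and the wait ends only when the pinning session releases the page.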
Jean-Christophe Praud <jc@steek.com> writes:
> Indeed, the tables I tried to vacuum have locks on them.
> AccessShareLock belonging to queries which seem sleeping. I tried to
> kill these queries but pg_cancel_backend() has no effect, and the
> process doesn't get the 15 signal.
> How can I get rid of these blocking queries without restarting the
> server ? They are not listed as "waiting" in pg_stat_activity.

Have you tried killing the connected client sessions?

regards, tom lane
Tom Lane wrote:
> Have you tried killing the connected client sessions?

It works! I had pgbouncer connections hanging for several days.

Thanks for your help :)

Regards,
-- JC