Thread: Autovacuum doesn't work if the table has a large number of records

Autovacuum doesn't work if the table has a large number of records

From:
Ascot Moss
Date:
(please ignore if duplicated) 


Hi,

I am new to PostgreSQL.

I have a PostgreSQL table named "test" with 300,000,000 records inserted; it has only two fields (id, and an int field i). For testing purposes, after the inserts, all records were updated once in order to make n_dead_tup = 300,000,000.

Current PostgreSQL settings:
autovacuum: On (default)
track_counts: ON (default)
autovacuum_vacuum_threshold: 50 (default)
autovacuum_vacuum_scale_factor: 0.2 (default)
autovacuum_naptime: 1min (default), so the autovacuum launcher should check about once a minute
autovacuum_freeze_max_age: 200000000 (default)
OS: Ubuntu 12.04
PostgreSQL: 9.2.4
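
For reference, a query like this (just a sketch; it only reads pg_settings) can confirm those values:

SELECT name, setting, source
FROM pg_settings
WHERE name IN ('autovacuum', 'track_counts',
               'autovacuum_vacuum_threshold',
               'autovacuum_vacuum_scale_factor',
               'autovacuum_naptime',
               'autovacuum_freeze_max_age');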

Current stat of "test" table:
pg_class.reltuples: 3.8415e+08 
pg_class.relpages: 1703069
last_autovacuum: null (or blank)
last_autoanalyze: 2013-04-13 20:27:12.396048+08
pg_stat_user_tables.n_dead_tup: 300000000

The autovacuum threshold should be about 76,830,130 (50 + 3.8415e+08 x 0.2).
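
A query like the following (just a sketch, assuming the default 50 and 0.2 values; the output alias is mine) reproduces that arithmetic:

SELECT c.reltuples,
       s.n_dead_tup,
       50 + 0.2 * c.reltuples AS approx_autovacuum_threshold
FROM pg_class c
JOIN pg_stat_user_tables s ON s.relid = c.oid
WHERE c.relname = 'test';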

I expected autovacuum to run automatically to clear the dead tuples; however, after over 3 hours, checking pg_stat_user_tables shows that last_autovacuum is still null and n_dead_tup still equals 300,000,000.

Can anyone advise me why autovacuum is not running, or whether it is running but has not yet completed?

FYI: on the same machine (i.e. same OS, same PostgreSQL, and same postgresql.conf), I ran another test case with reltuples around 60,000,000 records, and autovacuum worked well.


Regards

Re: Autovacuum doesn't work if the table has a large number of records

From:
Jeff Janes
Date:
On Sat, Apr 13, 2013 at 9:55 AM, Ascot Moss <ascot.moss@gmail.com> wrote:

Current stat of "test" table:
pg_class.reltuples: 3.8415e+08 
pg_class.relpages: 1703069
last_autovacuum: null (or blank)
last_autoanalyze: 2013-04-13 20:27:12.396048+08
pg_stat_user_tables.n_dead_tup: 300000000

The autovacuum threshold should be about 76,830,130 (50 + 3.8415e+08 x 0.2).

I expected autovacuum to run automatically to clear the dead tuples; however, after over 3 hours, checking pg_stat_user_tables shows that last_autovacuum is still null and n_dead_tup still equals 300,000,000.

Every page is going to be both read and dirtied, so with the default vacuum_cost_* settings you are going to have 1703069 * (10+20) / 200 = 255,460.35 delays of 0.020 seconds, for 5,109.207 seconds of sleeping.  Plus it actually has to do the work, including fsyncing the WAL about once every 32 buffers.  So it is going to take a while.
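
If most of that time turns out to be the cost-based sleeping, one option (untested here, just a sketch) is to relax those limits for this one table, e.g.:

ALTER TABLE test SET (autovacuum_vacuum_cost_delay = 0);
-- or, less drastically, give each cost round a bigger budget:
ALTER TABLE test SET (autovacuum_vacuum_cost_limit = 2000);

Setting the per-table delay to 0 lets autovacuum on that table run without throttling, at the price of more IO pressure on everything else.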

 

Can anyone advise me why autovacuum is not running, or whether it is running but has not yet completed?

You can check if it is ongoing:

select * from pg_stat_activity where query like 'autovacuum%' \x\g\x
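
An autovacuum worker on that table should show up there with query text along the lines of 'autovacuum: VACUUM public.test'; an empty result means no worker is currently on it.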

Cheers,

Jeff

Re: Autovacuum doesn't work if the table has a large number of records

From:
Ascot Moss
Date:

Hi Jeff,

Thanks for your reply.


Yes, it took a while to complete the autovacuum.

I checked it again this morning:

last_autovacuum: 2013-04-14 06:17:02.9464+08
last_autoanalyze: 2013-04-13 20:27:12.396048+08
n_dead_tup: nil (or blank)
n_live_tup: 334453396
relpages: 2654868
reltuples: 3.34453e+08
pgstat.stat file size (/opt/PostgreSQL/9.2/data/pg_stat_tmp/pgstat.stat): 11611



I also checked the pg_log files; autovacuum was somehow delayed by "pgstat wait timeout" warnings from 1:10 am to 5:41 am this morning (over 4.5 hours).

I think this could impact DB performance.


2013-04-14 01:10:15.314 HKT,,,16427,,51699177.402b,1,,2013-04-14 01:10:15 HKT,,0,LOG,00000,"autovacuum launcher started",,,,,,,,,""

2013-04-14 04:08:32.741 HKT,,,16427,,51699177.402b,2,,2013-04-14 01:10:15 HKT,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 04:09:23.522 HKT,,,16427,,51699177.402b,3,,2013-04-14 01:10:15 HKT,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 04:09:54.825 HKT,,,16427,,51699177.402b,4,,2013-04-14 01:10:15 HKT,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 04:10:09.384 HKT,,,16427,,51699177.402b,5,,2013-04-14 01:10:15 HKT,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 04:45:05.227 HKT,,,16427,,51699177.402b,6,,2013-04-14 01:10:15 HKT,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 04:45:15.332 HKT,,,16427,,51699177.402b,7,,2013-04-14 01:10:15 HKT,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 04:45:15.342 HKT,,,23728,,5169c3d1.5cb0,1,,2013-04-14 04:45:05 HKT,3/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 04:45:25.443 HKT,,,16427,,51699177.402b,8,,2013-04-14 01:10:15 HKT,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 04:46:23.065 HKT,,,16427,,51699177.402b,9,,2013-04-14 01:10:15 HKT,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 04:46:33.168 HKT,,,16427,,51699177.402b,10,,2013-04-14 01:10:15 HKT,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 04:46:33.174 HKT,,,23752,,5169c41f.5cc8,1,,2013-04-14 04:46:23 HKT,3/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 04:46:43.286 HKT,,,16427,,51699177.402b,11,,2013-04-14 01:10:15 HKT,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 04:46:55.408 HKT,,,16427,,51699177.402b,12,,2013-04-14 01:10:15 HKT,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 04:47:05.508 HKT,,,16427,,51699177.402b,13,,2013-04-14 01:10:15 HKT,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 04:47:05.519 HKT,,,23753,,5169c43f.5cc9,1,,2013-04-14 04:46:55 HKT,3/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 04:47:15.620 HKT,,,16427,,51699177.402b,14,,2013-04-14 01:10:15 HKT,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 04:47:27.742 HKT,,,16427,,51699177.402b,15,,2013-04-14 01:10:15 HKT,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 04:47:37.843 HKT,,,16427,,51699177.402b,16,,2013-04-14 01:10:15 HKT,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 04:47:37.854 HKT,,,23758,,5169c45f.5cce,1,,2013-04-14 04:47:27 HKT,3/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 04:47:47.953 HKT,,,16427,,51699177.402b,17,,2013-04-14 01:10:15 HKT,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 04:48:00.075 HKT,,,16427,,51699177.402b,18,,2013-04-14 01:10:15 HKT,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 04:48:10.179 HKT,,,16427,,51699177.402b,19,,2013-04-14 01:10:15 HKT,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 04:48:10.189 HKT,,,23759,,5169c480.5ccf,1,,2013-04-14 04:48:00 HKT,3/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 04:48:20.291 HKT,,,16427,,51699177.402b,20,,2013-04-14 01:10:15 HKT,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 04:48:32.415 HKT,,,16427,,51699177.402b,21,,2013-04-14 01:10:15 HKT,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 05:09:16.361 HKT,,,16427,,51699177.402b,22,,2013-04-14 01:10:15 HKT,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 05:15:10.859 HKT,,,24645,,5169cad4.6045,1,,2013-04-14 05:15:00 HKT,3/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 05:15:12.853 HKT,,,16427,,51699177.402b,23,,2013-04-14 01:10:15 HKT,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 05:41:07.482 HKT,,,16427,,51699177.402b,24,,2013-04-14 01:10:15 HKT,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 05:41:17.571 HKT,,,16427,,51699177.402b,25,,2013-04-14 01:10:15 HKT,1/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 05:41:17.583 HKT,,,25513,,5169d0f3.63a9,1,,2013-04-14 05:41:07 HKT,3/0,0,WARNING,01000,"pgstat wait timeout",,,,,,,,,""

2013-04-14 06:17:02.946 HKT,,,16436,,51699179.4034,1,,2013-04-14 01:10:17 HKT,2/4,0,LOG,00000,"automatic vacuum of table ""postgres.public.test"": index scans: 2
pages: 0 removed, 2654868 remain
tuples: 0 removed, 334453396 remain
buffer usage: 2501814 hits, 8108302 misses, 3585758 dirtied
avg read rate: 3.442 MiB/s, avg write rate: 1.522 MiB/s
system usage: CPU 78.37s/234.07u sec elapsed 18405.38 sec",,,,,,,,,""
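
(I believe log entries like the one above are controlled by log_autovacuum_min_duration; a minimal postgresql.conf sketch, with an assumed value rather than what this server actually uses, would be:

# postgresql.conf -- assumed value, for illustration only
log_autovacuum_min_duration = 0   # log every autovacuum action; -1 disables this logging
)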


I am new to PostgreSQL; can anyone advise me how to handle "pgstat wait timeout"?


Regards




Re: Autovacuum doesn't work if the table has a large number of records

From:
Jeff Janes
Date:
On Saturday, April 13, 2013, Ascot Moss wrote:

Hi Jeff,

Thanks for your reply.


Yes, it took a while to complete the autovacuum.

I checked it again this morning:

last_autovacuum: 2013-04-14 06:17:02.9464+08

last_autoanalyze: 2013-04-13 20:27:12.396048+08

...


I also checked the pg_log files; autovacuum was somehow delayed by "pgstat wait timeout" warnings from 1:10 am to 5:41 am this morning (over 4.5 hours).

I think this could impact DB performance.


Those warnings are not directly related to your vacuum performance, although they do indicate your system is under some IO stress.  Whether that stress is slowing down the vacuum, or caused by the vacuum, or irrelevant to it, is hard to say.  Was the system idle other than the vacuum?  What indexes do you have on the table?  What is your IO subsystem like?

...

I am new to PostgreSQL; can anyone advise me how to handle "pgstat wait timeout"?


If they occur during maintenance/refactoring/bulk loading, just ignore them.  If they occur a lot during normal operations, you probably need better hardware, or maybe to separate pg_xlog from the rest of the IO if you haven't already.  (You could also move the stats file to a ramdisk, but that is probably treating the symptom more than the problem.)
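
For the ramdisk route, the relevant setting is stats_temp_directory; a rough sketch (the mount point and size here are hypothetical, adjust for your system):

# /etc/fstab -- hypothetical tmpfs mount for the stats files
tmpfs  /pgsql_stats_tmp  tmpfs  size=64M,mode=1777  0  0

# postgresql.conf
stats_temp_directory = '/pgsql_stats_tmp'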
 
Cheers,

Jeff