Обсуждение: Bloat and Slow Vacuum Time on Toast

Поиск
Список
Период
Сортировка

Bloat and Slow Vacuum Time on Toast

От
Ken Caruso
Дата:
Hello,

I am trying to trouble shoot a couple of issue related to vacuum and db size. I currently have a postgres DB that is around 390GB on disk (du -hs in $PG_DATA/base). If I look at the total relation size using the following query:

SELECT SUM(pg_relation_size(pg_class.oid))
FROM pg_class ;

This says the total size is around 191GB. That query should sum all of the tables, toast and indexes AFAIK.

The following query:

select pg_size_pretty(pg_database_size('dbname'));  returns:

 pg_size_pretty 
----------------
 390 GB

There is one table using TOAST that gets alot of updates. It generates around 40-50 million dead tuples a day. 

Here is the output of pgstattuple for toast portion of the table:

  table_len   | tuple_count |  tuple_len  | tuple_percent | dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space  | free_percent
--------------+-------------+-------------+---------------+------------------+----------------+--------------------+-------------+--------------
 150103572480 |    22166132 | 30900832474 |         20.59 |         35732563 |    51134576972 |              34.07 | 67022123976 |        44.65


So the first question how do I find out where all the disk space going and what next step do I need to perform to trouble shoot it. I am assuming that the 200GB is dead space from vacuum not keeping up however the above queries don't seem to confirm that.

Or at least I would expect pgstattuple to report a number closer to 200GB. So I must be missing something here. I have looked at all the other tables and they are all unremarkable size wise.

The second question is related to autovacuum. On both the table and the toast and I have explicitly set the autovacuum threshold to 0.02. So it should kick off around %2. This seems to be working great on the non TOAST part of this table, but on the TOAST vacuum takes anywhere from 12hrs to 24 hrs depending on how busy the database is. Is there anything I can do to fix this outside of increasing the IO if the underlying storage?

In addition I have the following settings in postgresql.conf:
max_connections              = 500
max_prepared_transactions    = 2000
shared_buffers               = 4GB
effective_cache_size         = 8GB
work_mem                     = 8MB
maintenance_work_mem         = 1600MB
wal_buffers                  = 16MB
checkpoint_segments          = 64
checkpoint_timeout           = 600s
autovacuum_max_workers       = 6
log_autovacuum_min_duration  = 600

Hardware is an HPDL380 G5, dual quad core Xeon, 32GB of RAM, P800 RAID controller. Pg_xlog is a RAID1 on local storage. PG_DATA is on an external MSA70 with a 18 disk (146GB 10k) RAID10. There are other volumes on the MSA but they are not actively used for anything.   

Thanks

-Ken



Re: Bloat and Slow Vacuum Time on Toast

От
"Kevin Grittner"
Дата:
Ken Caruso <ken@ipl31.net> wrote:


> SELECT SUM(pg_relation_size(pg_class.oid))
> FROM pg_class ;

Perhaps you want pg_total_relation_size()?

-Kevin

Re: Bloat and Slow Vacuum Time on Toast

От
Ken Caruso
Дата:


On Tue, Jul 19, 2011 at 1:50 PM, Kevin Grittner <Kevin.Grittner@wicourts.gov> wrote:
Ken Caruso <ken@ipl31.net> wrote:


> SELECT SUM(pg_relation_size(pg_class.oid))
> FROM pg_class ;

Perhaps you want pg_total_relation_size()?

-Kevin

The number returned when using pg_total_relation size makes more sense: 346GB. However it looked like pg_class has toast and indexes listed individually, so I assumed that using pg_total_relation_size would count values twice, because the total relation size of table would include toast and indexes, and then they would get counted again when I ran pg_total_relation_size on the toast and indexes themselves. Is that an incorrect assumption on my part?

Thanks

-Ken 

Re: Bloat and Slow Vacuum Time on Toast

От
Tom Lane
Дата:
Ken Caruso <ken@ipl31.net> writes:
> I am trying to trouble shoot a couple of issue related to vacuum and db
> size. I currently have a postgres DB that is around 390GB on disk (du -hs in
> $PG_DATA/base). If I look at the total relation size using the following
> query:

> SELECT SUM(pg_relation_size(pg_class.oid))
> FROM pg_class ;

> This says the total size is around 191GB. That query should sum all of the
> tables, toast and indexes AFAIK.

What PG version?

If it's a version with multiple table forks, you're only counting the
main forks, not FSM or VM.

Theoretically the latter two should be pretty small relative to the main
fork, but obviously you are missing something with this query ...

            regards, tom lane

Re: Bloat and Slow Vacuum Time on Toast

От
"Kevin Grittner"
Дата:
Ken Caruso <ken@ipl31.net> wrote:
> Kevin Grittner <Kevin.Grittner@wicourts.gov  wrote:
>
>> Ken Caruso <ken@ipl31.net> wrote:
>>
>>
>> > SELECT SUM(pg_relation_size(pg_class.oid))
>> > FROM pg_class ;
>>
>> Perhaps you want pg_total_relation_size()?

> The number returned when using pg_total_relation size makes more
> sense: 346GB. However it looked like pg_class has toast and
> indexes listed individually, so I assumed that using
> pg_total_relation_size would count values twice, because the total
> relation size of table would include toast and indexes, and then
> they would get counted again when I ran pg_total_relation_size on
> the toast and indexes themselves. Is that an incorrect assumption
> on my part?

No that's a good point.  On one of our production databases,
externally:

ccsa@SOCRATES:~> du -sh /var/pgsql/data/cc/
2.2T    /var/pgsql/data/cc/

internally:

cir=# select
pg_size_pretty(sum(pg_relation_size(pg_class.oid))::bigint) from
pg_class;
 pg_size_pretty
----------------
 2176 GB
(1 row)

cir=# select
pg_size_pretty(sum(pg_total_relation_size(pg_class.oid))::bigint)
from pg_class;
 pg_size_pretty
----------------
 4223 GB
(1 row)

cir=# select
pg_size_pretty(sum(pg_total_relation_size(pg_class.oid))::bigint)
from pg_class where relkind = 'r';
 pg_size_pretty
----------------
 2176 GB
(1 row)

You must have a lot of something outside of relations showing as
active in pg_class taking up space.  WAL files?  Server logs?  Temp
files left over from crashes?

-Kevin

Re: Bloat and Slow Vacuum Time on Toast

От
Ken Caruso
Дата:
On Tue, Jul 19, 2011 at 2:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Ken Caruso <ken@ipl31.net> writes:
> I am trying to trouble shoot a couple of issue related to vacuum and db
> size. I currently have a postgres DB that is around 390GB on disk (du -hs in
> $PG_DATA/base). If I look at the total relation size using the following
> query:

> SELECT SUM(pg_relation_size(pg_class.oid))
> FROM pg_class ;

> This says the total size is around 191GB. That query should sum all of the
> tables, toast and indexes AFAIK.

What PG version?

9.0.4

-Ken
 

If it's a version with multiple table forks, you're only counting the
main forks, not FSM or VM.

Theoretically the latter two should be pretty small relative to the main
fork, but obviously you are missing something with this query ...

                       regards, tom lane

Re: Bloat and Slow Vacuum Time on Toast

От
Tom Lane
Дата:
Ken Caruso <ken@ipl31.net> writes:
> On Tue, Jul 19, 2011 at 2:26 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Ken Caruso <ken@ipl31.net> writes:
>>> If I look at the total relation size using the following query:
>>> SELECT SUM(pg_relation_size(pg_class.oid)) FROM pg_class ;
>>> This says the total size is around 191GB.

>> What PG version?

> 9.0.4

OK, so you need to worry about alternate forks.  Your query is the same
as

select sum(pg_relation_size(oid, 'main')) from pg_class;

I bet you'll find that the difference is accounted for by one of

select sum(pg_relation_size(oid, 'fsm')) from pg_class;
select sum(pg_relation_size(oid, 'vm')) from pg_class;

Drill down and see which table is responsible ...

            regards, tom lane