Обсуждение: surprised to find bloat in insert-only table
I recently set up partitioning on a table that sees heavy insert traffic. There are never updates or deletes, we just drop the partitions later. It's my understanding that bloat can only appear through updates or deletes, but these partitions are reported to have significant bloat in them. Where else can this come from and how I can I reduce it? I'm using a "bloat" view with 9.0.3 which might the same as this one: https://wiki.postgresql.org/wiki/Show_database_bloat This shows that each index on the table has over a gig of bloat: select distinct(iname), wastedbytes, wastedsize from bloat where wastedbytes > 0 order by wastedbytes DESC limit 20; And this query also shows the same amount of bloat in the table itself: select distinct(tablename), wastedbytes, wastedsize from bloat where wastedbytes > 0 order by wastedbytes DESC limit 10; Mark
Mark Stosberg <mark@summersault.com> writes: > I recently set up partitioning on a table that sees heavy insert > traffic. There are never updates or deletes, we just drop the partitions > later. > It's my understanding that bloat can only appear through updates or > deletes, but these partitions are reported to have significant bloat in > them. Where else can this come from and how I can I reduce it? > I'm using a "bloat" view with 9.0.3 which might the same as this one: > https://wiki.postgresql.org/wiki/Show_database_bloat > This shows that each index on the table has over a gig of bloat: > select distinct(iname), wastedbytes, wastedsize from bloat where > wastedbytes > 0 order by wastedbytes DESC limit 20; > And this query also shows the same amount of bloat in the table itself: > select distinct(tablename), wastedbytes, wastedsize from bloat where > wastedbytes > 0 order by wastedbytes DESC limit 10; It's hard to evaluate that without knowing what the actual table/index sizes are, or IOW what is the reported bloat on a percentage basis? The view you mention isn't tremendously accurate --- AFAICS it isn't accounting for alignment padding between fields, page headers, and some other things. And it will consider the unused space on a page to be "bloat" even if it's too small to fit another tuple. So expecting the number to be zero is hopelessly optimistic. Also, indexes generally don't even try to pack pages completely full, so a larger percentage of unused space is to be expected for them. regards, tom lane
Thanks for reply, Tom. > It's hard to evaluate that without knowing what the actual table/index > sizes are, or IOW what is the reported bloat on a percentage basis? The table size is reported as: 4036 MB according to: select pg_size_pretty(pg_table_size('table_name')); Compared to 1669 MB reported as table bloat in the 'bloat' view. So, the bloat is about 40% of the total size. For an index, it's 410 MB of bloat, vs 1669 MB for an index size. One thing that looks suspicious is that the exact same number of bytes is being report for the table as well as each index. > The view you mention isn't tremendously accurate --- AFAICS it isn't > accounting for alignment padding between fields, page headers, and some > other things. And it will consider the unused space on a page to be > "bloat" even if it's too small to fit another tuple. So expecting the > number to be zero is hopelessly optimistic. Also, indexes generally > don't even try to pack pages completely full, so a larger percentage of > unused space is to be expected for them. Thanks for the clarifications! In summary, it sounds like there is some bloat that I need to live with, but at least with partitioning the amount of bloat will stay relatively constant. Mark
Mark Stosberg <mark@summersault.com> writes: > Compared to 1669 MB reported as table bloat in the 'bloat' view. So, > the bloat is about 40% of the total size. > For an index, it's 410 MB of bloat, vs 1669 MB for an index size. Hm ... 40% unused space wouldn't be surprising at all for an index. The traditional rule of thumb for a b-tree index is that the steady state fill factor is about 2/3rds. You can do better for an index that is loaded in increasing order (eg, an index on a serial or timestamp column typically has a higher fill factor) but indexes on columns that are more random are not going to see that. > One thing that looks suspicious is that the exact same number of bytes > is being report for the table as well as each index. Um. In that case I'd say your view is flat out wrong... regards, tom lane