I am trying to refactor a table on disk so it consumes less space:
Original is:
create table post_timings(
topic_id int not null,
post_number int not null,
user_id int not null,
msecs int not null
)
Target is:
create table post_timings(
post_id int not null,
user_id int not null,
dsecs smallint not null
)
Before I have:
select * from pgstattuple('post_timings2');
table_len | tuple_count | tuple_len | tuple_percent |
dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space |
free_percent
------------+-------------+------------+---------------+------------------+----------------+--------------------+------------+--------------
5146427392 | 116221695 | 4648867800 | 90.33 |
0 | 0 | 0 | 15082484 | 0.29
After I have:
table_len | tuple_count | tuple_len | tuple_percent |
dead_tuple_count | dead_tuple_len | dead_tuple_percent | free_space |
free_percent
------------+-------------+------------+---------------+------------------+----------------+--------------------+------------+--------------
5142036480 | 116122544 | 3948166496 | 76.78 |
0 | 0 | 0 | 15069224 | 0.29
What I find striking is that the table size on disk remains almost
unchanged despite tuples taking 6 less bytes per tuple.
All the "missing space" is in overhead that is missing from
pgstattuple, in particular tuple percent moves from 90 to 76.7
I was wondering:
1. Where is all my missing space, is this in page alignment stuff and
per-page overhead?
2. Is there any other schemes I can look at for storing this data to
have a more efficient yet easily queryable / updateable table.
Keep in mind these tables get huge and in many of our cases will span
10-20GB just to store this information.
Sam