Обсуждение: Efficient use of space in large table?
I was just wondering - I've got two large tables and I was wondering if there is anyway to shrink them somewhat. I imagined compression for non-indexed columns or something. Is varchar or char more efficient than text? I'm sort of just grasping at straws for something to get the data smaller. Josh A size summary relname | size | type ------------------------+-------+---------- sospeople | 599MB | table sospeople_fn | 71MB | index sospeople_ln | 73MB | index sospeople_zip | 73MB | index votes | 937MB | table Table "sospeople" Attribute | Type | Modifier ------------------+---------+--------------------- sosid | text | not null default '' countyname | text | not null firstname | text | middlename | text | lastname | text | suffix | text | homephone | text | registrationdate | text | birthyear | integer | status | text | precinctcode | text | housenumber | text | streetname | text | unittype | text | unit | text | address2 | text | city | text | state | text | zipcode | text | Indices: sospeople_fn, sospeople_ln, sospeople_zip Table "votes" Attribute | Type | Modifier ------------+---------------+---------- sosid | character(10) | electionid | integer | votetype | character(1) | Joshua b. Jore ; http://www.greentechnologist.org
On Thu, 4 Jul 2002 21:43:10 -0500 (CDT), Josh Jore <josh@greentechnologist.org> wrote: >I was just wondering - I've got two large tables and I was wondering >if there is anyway to shrink them somewhat. I imagined compression for >non-indexed columns or something. Is varchar or char more efficient than >text? > Josh, first of all, text is ok. You might want to store NULL instead of '' to squeeze out a few bytes here and there. Now I have even more questions instead of answers :-) PG version? OS? Do you have lots of UPDATEs/DELETEs? Do you ANALYZE regularly? Please show us the outputs of VACUUM VERBOSE sospeople; VACUUM VERBOSE votes; > >A size summary > relname | size | type | reltuples >------------------------+-------+---------- +---------- > sospeople | 599MB | table | 2M > sospeople_fn | 71MB | index | 2M > sospeople_ln | 73MB | index | 2M > sospeople_zip | 73MB | index | 2M > votes | 937MB | table | 15M Could you show us the tuple counts in your relations, just like the wild guess I have inserted? > > > Table "sospeople" > Attribute | Type | Modifier >------------------+---------+--------------------- > sosid | text | not null default '' > countyname | text | not null > firstname | text | > middlename | text | > lastname | text | > suffix | text | > homephone | text | > registrationdate | text | > birthyear | integer | > status | text | > precinctcode | text | > housenumber | text | > streetname | text | > unittype | text | > unit | text | > address2 | text | > city | text | > state | text | > zipcode | text | What are the average sizes of these text columns? If there are long repeated values (e.g. countyname), it might help do pull these out into a separate table CREATE TABLE county (id serial, name text); and replace > countyname | text | not null by county_id NOT NULL REFERENCES county >Indices: sospeople_fn, > sospeople_ln, > sospeople_zip > Also post the results of \d sospeople_fn \d sospeople_ln \d sospeople_zip > Table "votes" > Attribute | Type | Modifier >------------+---------------+---------- > sosid | character(10) | > electionid | integer | > votetype | character(1) | No index here? Using sosno INT instead of sosid CHAR(10) could save you (vaguely estimated) up to 10% of space for this table. OTOH you would have to insert sosno into sospeople, so this would only be a win, if votes has far more rows than sospeople. >I'm sort of just grasping at straws for something to get the data >smaller. Is your problem really related to space, or to speed? Servus Manfred
On Fri, 5 Jul 2002, Manfred Koizar wrote: > first of all, text is ok. You might want to store NULL instead of '' > to squeeze out a few bytes here and there. Correct me if I'm wrong, but doesn't NULL take up *more* room than ''? After all, as soon as you have a NULL, you have to have a null bit array in the tuple. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're all light. --XTC
Curt Sampson dijo: > On Fri, 5 Jul 2002, Manfred Koizar wrote: > > > first of all, text is ok. You might want to store NULL instead of '' > > to squeeze out a few bytes here and there. > > Correct me if I'm wrong, but doesn't NULL take up *more* room than ''? > After all, as soon as you have a NULL, you have to have a null bit array > in the tuple. I think that if you have 32 attributes or less in the table, the bitmap will use the same space as '' (because '' has the 4-byte length). And if you have at least one other NULL attribute, the rest come for free. So no, it won't take more space, AFAIU anyway. -- Alvaro Herrera (<alvherre[a]atentus.com>) Officer Krupke, what are we to do? Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")
On Fri, 5 Jul 2002, Manfred Koizar wrote: > On Thu, 4 Jul 2002 21:43:10 -0500 (CDT), Josh Jore > <josh@greentechnologist.org> wrote: > >I was just wondering - I've got two large tables and I was wondering > >if there is anyway to shrink them somewhat. I imagined compression for > >non-indexed columns or something. Is varchar or char more efficient than > >text? > > > Josh, > > first of all, text is ok. You might want to store NULL instead of '' > to squeeze out a few bytes here and there. I just thought I'd follow up - it turns out that most of my space was going to tuple headers (some 40ish bytes header, 16 bytes data). I just took the data out of PostgreSQL and stuck it into partitioned ASCII files and BerkeleyDB for indexes. That happens to work excellently and doesn't require as fancy a machine as PostgreSQL did. So the answer is to sometimesquestion your choice of tool ;-)