Обсуждение: Efficient use of space in large table?

Поиск
Список
Период
Сортировка

Efficient use of space in large table?

От
Josh Jore
Дата:
I was just wondering - I've got two large tables and I was wondering
if there is anyway to shrink them somewhat. I imagined compression for
non-indexed columns or something. Is varchar or char more efficient than
text? I'm sort of just grasping at straws for something to get the data
smaller.

Josh

A size summary
        relname         | size  |   type
------------------------+-------+----------
 sospeople              | 599MB | table
 sospeople_fn           | 71MB  | index
 sospeople_ln           | 73MB  | index
 sospeople_zip          | 73MB  | index
 votes                  | 937MB | table


                Table "sospeople"
    Attribute     |  Type   |      Modifier
------------------+---------+---------------------
 sosid            | text    | not null default ''
 countyname       | text    | not null
 firstname        | text    |
 middlename       | text    |
 lastname         | text    |
 suffix           | text    |
 homephone        | text    |
 registrationdate | text    |
 birthyear        | integer |
 status           | text    |
 precinctcode     | text    |
 housenumber      | text    |
 streetname       | text    |
 unittype         | text    |
 unit             | text    |
 address2         | text    |
 city             | text    |
 state            | text    |
 zipcode          | text    |
Indices: sospeople_fn,
         sospeople_ln,
         sospeople_zip


             Table "votes"
 Attribute  |     Type      | Modifier
------------+---------------+----------
 sosid      | character(10) |
 electionid | integer       |
 votetype   | character(1)  |



Joshua b. Jore ; http://www.greentechnologist.org




Re: Efficient use of space in large table?

От
Manfred Koizar
Дата:
On Thu, 4 Jul 2002 21:43:10 -0500 (CDT), Josh Jore
<josh@greentechnologist.org> wrote:
>I was just wondering - I've got two large tables and I was wondering
>if there is anyway to shrink them somewhat. I imagined compression for
>non-indexed columns or something. Is varchar or char more efficient than
>text?
>
Josh,

first of all, text is ok.  You might want to store NULL instead of ''
to squeeze out a few bytes here and there.

Now I have even more questions instead of answers :-)

PG version? OS?
Do you have lots of UPDATEs/DELETEs?
Do you ANALYZE regularly?
Please show us the outputs of
    VACUUM VERBOSE sospeople;
    VACUUM VERBOSE votes;

>
>A size summary
>        relname         | size  |   type    | reltuples
>------------------------+-------+---------- +----------
> sospeople              | 599MB | table     |    2M
> sospeople_fn           | 71MB  | index     |    2M
> sospeople_ln           | 73MB  | index     |    2M
> sospeople_zip          | 73MB  | index     |    2M
> votes                  | 937MB | table     |   15M

Could you show us the tuple counts in your relations, just like the
wild guess I have inserted?

>
>
>                Table "sospeople"
>    Attribute     |  Type   |      Modifier
>------------------+---------+---------------------
> sosid            | text    | not null default ''
> countyname       | text    | not null
> firstname        | text    |
> middlename       | text    |
> lastname         | text    |
> suffix           | text    |
> homephone        | text    |
> registrationdate | text    |
> birthyear        | integer |
> status           | text    |
> precinctcode     | text    |
> housenumber      | text    |
> streetname       | text    |
> unittype         | text    |
> unit             | text    |
> address2         | text    |
> city             | text    |
> state            | text    |
> zipcode          | text    |

What are the average sizes of these text columns?  If there are long
repeated values (e.g. countyname), it might help do pull these out
into a separate table

    CREATE TABLE county (id serial, name text);

and replace
> countyname       | text    | not null
by
    county_id NOT NULL REFERENCES county

>Indices: sospeople_fn,
>         sospeople_ln,
>         sospeople_zip
>
Also post the results of
    \d sospeople_fn
    \d sospeople_ln
    \d sospeople_zip

>             Table "votes"
> Attribute  |     Type      | Modifier
>------------+---------------+----------
> sosid      | character(10) |
> electionid | integer       |
> votetype   | character(1)  |

No index here?
Using sosno INT instead of sosid CHAR(10) could save you (vaguely
estimated) up to 10% of space for this table.  OTOH you would have to
insert sosno into sospeople, so this would only be a win, if votes has
far more rows than sospeople.

>I'm sort of just grasping at straws for something to get the data
>smaller.

Is your problem really related to space, or to speed?

Servus
 Manfred



Re: Efficient use of space in large table?

От
Curt Sampson
Дата:
On Fri, 5 Jul 2002, Manfred Koizar wrote:

> first of all, text is ok.  You might want to store NULL instead of ''
> to squeeze out a few bytes here and there.

Correct me if I'm wrong, but doesn't NULL take up *more* room than ''?
After all, as soon as you have a NULL, you have to have a null bit array
in the tuple.

cjs
--
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC




Re: Efficient use of space in large table?

От
Alvaro Herrera
Дата:
Curt Sampson dijo:

> On Fri, 5 Jul 2002, Manfred Koizar wrote:
>
> > first of all, text is ok.  You might want to store NULL instead of ''
> > to squeeze out a few bytes here and there.
>
> Correct me if I'm wrong, but doesn't NULL take up *more* room than ''?
> After all, as soon as you have a NULL, you have to have a null bit array
> in the tuple.

I think that if you have 32 attributes or less in the table, the bitmap
will use the same space as '' (because '' has the 4-byte length).  And
if you have at least one other NULL attribute, the rest come for free.
So no, it won't take more space, AFAIU anyway.

--
Alvaro Herrera (<alvherre[a]atentus.com>)
Officer Krupke, what are we to do?
Gee, officer Krupke, Krup you! (West Side Story, "Gee, Officer Krupke")




Re: Efficient use of space in large table?

От
Josh Jore
Дата:
On Fri, 5 Jul 2002, Manfred Koizar wrote:

> On Thu, 4 Jul 2002 21:43:10 -0500 (CDT), Josh Jore
> <josh@greentechnologist.org> wrote:
> >I was just wondering - I've got two large tables and I was wondering
> >if there is anyway to shrink them somewhat. I imagined compression for
> >non-indexed columns or something. Is varchar or char more efficient than
> >text?
> >
> Josh,
>
> first of all, text is ok.  You might want to store NULL instead of ''
> to squeeze out a few bytes here and there.

I just thought I'd follow up - it turns out that most of my space was
going to tuple headers (some 40ish bytes header, 16 bytes data). I just
took the data out of PostgreSQL and stuck it into partitioned ASCII files
and BerkeleyDB for indexes. That happens to work excellently and doesn't
require as fancy a machine as PostgreSQL did.

So the answer is to sometimesquestion your choice of tool ;-)