Re: Smaller data types use same disk space

Поиск
Список
Период
Сортировка
От McGehee, Robert
Тема Re: Smaller data types use same disk space
Дата
Msg-id 17B09E7789D3104E8F5EEB0582A8D66FDF9CEC3816@MSGRTPCCRF2WIN.DMN1.FMR.COM
обсуждение исходный текст
Ответ на Re: Smaller data types use same disk space  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Smaller data types use same disk space  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Smaller data types use same disk space  (Gavin Flower <GavinFlower@archidevsys.co.nz>)
Список pgsql-general
Very interesting points. Thanks for the documentation link and the point about alignment.

As a test of Tom's suggestion to group smallints together to avoid alignment problems, I changed the column order from
smallint, date, smallint, integer, real TO
smallint, smallint, date, integer, real

This resulted in a "Small" table that is 15% smaller than the original "Small" table (414MB vs 487MB). I wasn't aware
thattables could be optimized by switching column order like this. This could be a good note to make in either the
"DataTypes" portion of the PostgreSQL manual or the "Performance Optimization" portion of the PostgreSQL Wiki. 

One might even imagine a future version of PostgreSQL using an efficient disk layout that may not match the table
layoutin order to avoid wasted space from padding. I suppose this already happens to some extent with the different
storagemodes (plain, extended, external). 

Steve also correctly pointed out that my text string probably only takes up 3-4 bytes rather than 7 bytes. Meaning the
"Small"table uses only 7-8 bytes/row less than the "Big" table. For 10M rows, the expected savings should be about
70-80MB.This matches the 79MB of savings I see once I reorder the columns following Tom's suggestion. All is right in
theworld! 

Thanks to all,
Robert


SC> More questions than answers:
SC> What version of PostgreSQL?

9.1.1 (I should have mentioned that)

SC> How are your determining the space used by a table?

\d+

SC> Why are you assuming 7 bytes for a 3-character value? (Character values
SC> up to 126 characters long only have 1-character overhead.)

You are correct. I read the wrong documentation (docs for 8.2 says overhead is 4 bytes, but docs for my 9.1 version
saysoverhead is only one byte for small strings). So I believe I should assume only 4 bytes of total usage here, maybe
lessif it's compressed. 

SC> What is the fill-factor on the tables? (Should default to 100% but don't
SC> know how you are configured.)

I'm using 100%, but it shouldn't matter as I was giving sizes without the index anyway.

SC> Do the tables have OIDs or not?

No

-----Original Message-----
From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
Sent: Tuesday, July 24, 2012 10:00 PM
To: Adrian Klaver
Cc: McGehee, Robert; pgsql-general@postgresql.org
Subject: Re: [GENERAL] Smaller data types use same disk space

Adrian Klaver <adrian.klaver@gmail.com> writes:
> On 07/24/2012 03:21 PM, McGehee, Robert wrote:
>> I've created two tables labeled "Big" and "Small" that both store the same 10 million rows of data using 493MB and
487MBof disk space respectively. The difference is that the "Big" table uses data types that take up more space
(integerrather than smallint, float rather than real, etc). The "Big" table should need about 27 bytes/row versus 16
bytes/rowfor the "Small" table, indicating to me that the "Big" table should be 70% bigger in actual disk size. In
reality,it's only 1% bigger or 6MB (after clustering, vacuuming and analyzing). Why is this? Shouldn't the "Small"
tablebe about 110MB smaller (11 bytes for 10 million rows)? I'm estimating table size with \d+ 

> See here for the gory details:
> http://www.postgresql.org/docs/9.1/interactive/storage-page-layout.html

> See in particular:
> Table 55-4. HeapTupleHeaderData Layout
>  From the text:
> ""All table rows are structured in the same way. There is a fixed-size
> header (occupying 23 bytes on most machines.." which breaks you
> assumption of the Big/Small row size comparison.

Aside from the tuple header, there are alignment considerations that you
have to allow for.  So for instance, if your columns are int, smallint,
int, that will take the same amount of space as 3 ints, because the
savings disappears into alignment of the third int.  You need two
adjacent smallints to get any benefit.

            regards, tom lane

В списке pgsql-general по дате отправления:

Предыдущее
От: Henry Drexler
Дата:
Сообщение: Re: BI tools and postgresql
Следующее
От: Merlin Moncure
Дата:
Сообщение: Re: General guidance: Levenshtein distance versus other similarity algorithms