Re: Smaller data types use same disk space

Поиск
Список
Период
Сортировка
От Gavin Flower
Тема Re: Smaller data types use same disk space
Дата
Msg-id 5010627F.7070909@archidevsys.co.nz
обсуждение исходный текст
Ответ на Re: Smaller data types use same disk space  ("McGehee, Robert" <Robert.McGehee@geodecapital.com>)
Список pgsql-general
On 26/07/12 04:09, McGehee, Robert wrote:
> Very interesting points. Thanks for the documentation link and the point about alignment.
>
> As a test of Tom's suggestion to group smallints together to avoid alignment problems, I changed the column order
from
> smallint, date, smallint, integer, real TO
> smallint, smallint, date, integer, real
>
> This resulted in a "Small" table that is 15% smaller than the original "Small" table (414MB vs 487MB). I wasn't aware
thattables could be optimized by switching column order like this. This could be a good note to make in either the
"DataTypes" portion of the PostgreSQL manual or the "Performance Optimization" portion of the PostgreSQL Wiki. 
>
> One might even imagine a future version of PostgreSQL using an efficient disk layout that may not match the table
layoutin order to avoid wasted space from padding. I suppose this already happens to some extent with the different
storagemodes (plain, extended, external). 
>
> Steve also correctly pointed out that my text string probably only takes up 3-4 bytes rather than 7 bytes. Meaning
the"Small" table uses only 7-8 bytes/row less than the "Big" table. For 10M rows, the expected savings should be about
70-80MB.This matches the 79MB of savings I see once I reorder the columns following Tom's suggestion. All is right in
theworld! 
>
> Thanks to all,
> Robert
>
>
> SC> More questions than answers:
> SC> What version of PostgreSQL?
>
> 9.1.1 (I should have mentioned that)
>
> SC> How are your determining the space used by a table?
>
> \d+
>
> SC> Why are you assuming 7 bytes for a 3-character value? (Character values
> SC> up to 126 characters long only have 1-character overhead.)
>
> You are correct. I read the wrong documentation (docs for 8.2 says overhead is 4 bytes, but docs for my 9.1 version
saysoverhead is only one byte for small strings). So I believe I should assume only 4 bytes of total usage here, maybe
lessif it's compressed. 
>
> SC> What is the fill-factor on the tables? (Should default to 100% but don't
> SC> know how you are configured.)
>
> I'm using 100%, but it shouldn't matter as I was giving sizes without the index anyway.
>
> SC> Do the tables have OIDs or not?
>
> No
>
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Tuesday, July 24, 2012 10:00 PM
> To: Adrian Klaver
> Cc: McGehee, Robert; pgsql-general@postgresql.org
> Subject: Re: [GENERAL] Smaller data types use same disk space
>
> Adrian Klaver <adrian.klaver@gmail.com> writes:
>> On 07/24/2012 03:21 PM, McGehee, Robert wrote:
>>> I've created two tables labeled "Big" and "Small" that both store the same 10 million rows of data using 493MB and
487MBof disk space respectively. The difference is that the "Big" table uses data types that take up more space
(integerrather than smallint, float rather than real, etc). The "Big" table should need about 27 bytes/row versus 16
bytes/rowfor the "Small" table, indicating to me that the "Big" table should be 70% bigger in actual disk size. In
reality,it's only 1% bigger or 6MB (after clustering, vacuuming and analyzing). Why is this? Shouldn't the "Small"
tablebe about 110MB smaller (11 bytes for 10 million rows)? I'm estimating table size with \d+ 
>> See here for the gory details:
>> http://www.postgresql.org/docs/9.1/interactive/storage-page-layout.html
>> See in particular:
>> Table 55-4. HeapTupleHeaderData Layout
>>   From the text:
>> ""All table rows are structured in the same way. There is a fixed-size
>> header (occupying 23 bytes on most machines.." which breaks you
>> assumption of the Big/Small row size comparison.
> Aside from the tuple header, there are alignment considerations that you
> have to allow for.  So for instance, if your columns are int, smallint,
> int, that will take the same amount of space as 3 ints, because the
> savings disappears into alignment of the third int.  You need two
> adjacent smallints to get any benefit.
>
>             regards, tom lane
>
Curious, does changing the order to:

date, real, integer, smallint, smallint

make any difference?


More specifically, putting the types into decreasing order of the size
of alignment requirements - i.e. putting types which require 8 byte
alignment before those requiring 4 byte alignment...


Cheers,
Gavin

В списке pgsql-general по дате отправления:

Предыдущее
От: Chris Bartlett
Дата:
Сообщение: Re: View definition and schema search path bug or expected behaviour?
Следующее
От: Johann Spies
Дата:
Сообщение: Re: Full text search ts_heading strange result