Varchar vs foreign key vs enumerator - table and index size

Поиск
Список
Период
Сортировка
От Łukasz Walkowski
Тема Varchar vs foreign key vs enumerator - table and index size
Дата
Msg-id 3AAC8AFE-E908-4AF7-ACDE-F9461B4A62D1@homplex.pl
обсуждение исходный текст
Ответы Re: Varchar vs foreign key vs enumerator - table and index size  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Varchar vs foreign key vs enumerator - table and index size  (Vitalii Tymchyshyn <tivv00@gmail.com>)
Re: Varchar vs foreign key vs enumerator - table and index size  (Jim Nasby <jim@nasby.net>)
Список pgsql-performance
Hi,
This is my first post on this group so welcome everyone! Currently I'm working on optimizing a quite simple database
usedto store events from one website. Every event is a set of data describing user behaviour. The main table that
storesall events is built using schema:  

     Column      |            Type             | Modifiers
-----------------+-----------------------------+-----------
 id              | bigint                      | not null
 browser         | character varying(255)      |
 created         | timestamp without time zone |
 eventsource     | character varying(255)      |
 eventtype       | character varying(255)      |
 ipaddress       | character varying(255)      |
 objectid        | bigint                      |
 sessionid       | character varying(255)      |
 shopids         | integer[]                   |
 source          | character varying(255)      |
 sourceid        | bigint                      |
 supplierid      | bigint                      |
 cookieuuid      | uuid                        |
 serializeddata  | bytea                       |
 devicetype      | character varying(255)      |
 operatingsystem | character varying(255)      |

 It was a quick project to play with EclipseLink, Hibernate and some Jersey Rest services, so isn't perfect. However
thedatabase became quite usefull and we decided to optimize this table as it grew quite large (128GB right now without
indexes,about 630M records). There is only primary key index on this table. Here is the list of changes that I'd like
tomake to the table (some of them should be done from the scratch):  

1. Changing ipaddress from varchar to inet - this should save some space and lower the size of potential index.

2. Changing id for some composite id with created contained in it.

3. And this part is most interesting for me. Columns browser, eventsource, eventtype, devicetype, operatingsystem
containa small pool of strings - for example for devicetype this is set to Computer, Mobile, Tablet or Unknown. Browser
isset to normalized browser name. In every case I can store those data using one of 3 different methods:  

- store as varchar as it is now - nice and easy, but index on those columns is quite big and I think storing many of
similarstrings is waste of space.  

- store only id's and join external tables as needed, for example for browsers I only need smallint key, as there is a
limitednumber of browsers. The column browser becomes smallint and we have additional table with two columns (id,
browservarchar). This should save some space on event table, but if I want name of the browser in some report I need to
jointables. Second thing - on every insert there is constraint that is checked for this field and this can affect
performance.I was thinking about the same strategy for the remaining fields - this would give me 5 additional tables
and5 additional constraints on event table. Browser table will have about ~100 records, eventtype and eventsource will
haveabout 8-12 records each, devicetype - 4 records, operatingsystem - didn't really check this one, but I think
somethingaround 100 like browser.  

- introduce enumerator type for each of the column and store those values as enumerator. This one should be the most
spaceefficient, but it will be problematic in case of changing column values like browser or operatingsystem as
alteringenumerator isn't that simple.  

For browser average text length is 19 characters, for eventsource and eventtype eventsource average text lenght is 24
characters.Database encoding is set to UTF8.  

My question is - what is estimated difference in table size between those 3 variants of storing columns? In theory
thirdone should give me the smallest database and index size but is the most problematic from all of the above.  

Lukasz Walkowski

В списке pgsql-performance по дате отправления:

Предыдущее
От: Tom Lane
Дата:
Сообщение: Re: Query plan change with multiple elements in IN clause
Следующее
От: Kevin Grittner
Дата:
Сообщение: Re: How clustering for scale out works in PostgreSQL