Reducing the overhead of NUMERIC data

From: Simon Riggs
Subject: Reducing the overhead of NUMERIC data
Date:
Msg-id: 1130880137.8300.1648.camel@localhost.localdomain
Replies: Re: Reducing the overhead of NUMERIC data  (Tom Lane <tgl@sss.pgh.pa.us>)
         Re: Reducing the overhead of NUMERIC data  (Martijn van Oosterhout <kleptog@svana.org>)
List: pgsql-hackers
Currently, the overhead of the NUMERIC datatype is 8 bytes. Each value is
stored on disk as:

typedef struct NumericData
{
    int32       varlen;         /* Variable size (std varlena header) */
    int16       n_weight;       /* Weight of 1st digit */
    uint16      n_sign_dscale;  /* Sign + display scale */
    char        n_data[1];      /* Digits (really array of NumericDigit) */
} NumericData;

Let's see if we can reduce that:

varlen is int32 to match the standard varlena header. However, the maximum
number of digits the datatype allows is below the threshold at which
values get toasted, so no NUMERIC value is ever toasted. In which case,
why worry about matching the size of varlena? Let's reduce it to 2 bytes,
which still gives us up to 1000 digits, as we have now.
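As a quick sanity check of the sizes involved (a sketch only; the toast
threshold figure is my rough assumption of about a quarter of an 8 KB page,
and even at one byte per digit the maximum value stays well under it):

/*
 * Sketch, not a patch: compile-time checks of the size claims above.
 * ASSUMED_TOAST_THRESHOLD is my approximation, not a PostgreSQL constant.
 */
#define NUMERIC_MAX_DIGITS      1000    /* existing precision limit */
#define ASSUMED_TOAST_THRESHOLD 2048    /* ~ a quarter of an 8 KB page */
#define INT16_MAX_VALUE         32767

/* a negative array size forces a compile error if either claim is wrong */
typedef char never_toasted[(NUMERIC_MAX_DIGITS < ASSUMED_TOAST_THRESHOLD) ? 1 : -1];
typedef char fits_in_int16[(NUMERIC_MAX_DIGITS < INT16_MAX_VALUE) ? 1 : -1];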

n_weight seems to exist because we do not store trailing zeroes. So
1000000 is stored as 1 with a weight of 6. My experience is that large
numbers of trailing zeroes do not occur with any frequency in real
measurement or financial data and that this is an over-optimization.
This is probably a holdover from the original algorithm, rather than a
conscious design goal for PostgreSQL?
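For illustration, here is how I read that digits-plus-weight scheme (a
sketch only, not PostgreSQL code; one base-10 digit per array element is
assumed purely for readability):

#include <math.h>
#include <stdio.h>

/*
 * Sketch: reconstruct a value from a digit array plus a weight.
 * digits[0] carries 10^weight, digits[1] carries 10^(weight-1), and so on,
 * which is why trailing zeroes need not be stored at all.
 */
static double
digits_to_value(const unsigned char *digits, int ndigits, int weight)
{
    double  value = 0.0;
    int     i;

    for (i = 0; i < ndigits; i++)
        value += digits[i] * pow(10.0, weight - i);

    return value;
}

int
main(void)
{
    unsigned char digits[] = {1};

    /* 1000000 is stored as the single digit 1 with a weight of 6 */
    printf("%.0f\n", digits_to_value(digits, 1, 6));
    return 0;
}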

n_sign_dscale shows us where the decimal point is. We could instead store
a marker representing the decimal point, which would cost us 0.5 byte
rather than 2 bytes. Since only 4 bits are needed to represent a decimal
digit, that leaves a few spare codes to represent either a decimal-
point-and-positive-sign or a decimal-point-and-negative-sign marker. (We
would still need to store trailing zeroes after the decimal point.)
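To show what I mean by the spare codes, here is a rough sketch of the
packing (illustration only; the marker values 0xA, 0xB and 0xF are
arbitrary picks of mine, and for a whole number with no fractional part I
simply append the marker at the end so the sign is still recorded):

#include <stdio.h>
#include <stdint.h>

#define MARK_POINT_POS  0xA     /* decimal point, positive value */
#define MARK_POINT_NEG  0xB     /* decimal point, negative value */
#define MARK_PAD        0xF     /* padding when the nibble count is odd */

/* Pack text like "-123.45" into nibbles; returns the number of bytes used. */
static int
pack_numeric(const char *txt, uint8_t *out)
{
    uint8_t nibbles[64];        /* sketch only: no overflow checking */
    int     n = 0;
    int     neg = (*txt == '-');
    int     saw_point = 0;
    int     i;

    if (neg || *txt == '+')
        txt++;

    for (; *txt; txt++)
    {
        if (*txt == '.')
        {
            nibbles[n++] = neg ? MARK_POINT_NEG : MARK_POINT_POS;
            saw_point = 1;
        }
        else
            nibbles[n++] = (uint8_t) (*txt - '0');
    }

    /* integers still need the sign, so append the marker at the end */
    if (!saw_point)
        nibbles[n++] = neg ? MARK_POINT_NEG : MARK_POINT_POS;

    if (n % 2)
        nibbles[n++] = MARK_PAD;

    for (i = 0; i < n; i += 2)
        out[i / 2] = (uint8_t) ((nibbles[i] << 4) | nibbles[i + 1]);

    return n / 2;
}

int
main(void)
{
    uint8_t buf[32];
    int     len = pack_numeric("-123.45", buf);
    int     i;

    /* -123.45 -> nibbles 1 2 3 B 4 5 -> 3 bytes: 12 3B 45 */
    printf("%d bytes:", len);
    for (i = 0; i < len; i++)
        printf(" %02X", buf[i]);
    printf("\n");
    return 0;
}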

So, assuming I have this all correct, we could reduce the on-disk storage
for the NUMERIC datatype to the following struct. This gives an overhead
of just 2.5 bytes, plus the loss of the trailing-zeroes optimization,
which I assess as having almost no value anyway in 99.9999% of data
values (literally...).

typedef struct NumericData
{
    int16       varlen;         /* Variable size (std varlena header) */
    char        n_data[1];      /* Digits (really array of NumericDigit) */
} NumericData;
with the sign/decimal-point marker stored as 0.5 byte within the record.

A saving of 5.5 bytes per NUMERIC column per row could make an
interesting difference to row length in large tables.
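(To spell out where the 5.5 comes from, as I count it: 2 bytes saved by
shrinking varlen, 2 bytes for dropping n_weight, 2 bytes for dropping
n_sign_dscale, less the 0.5-byte marker nibble = 5.5 bytes.)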

The above would require a reasonable amount of change to the existing
datatype code, so I'm not suggesting it's a simple change to an .h file.
But the changes as proposed could, it seems, be made to the existing
NUMERIC type, rather than inventing another similar one.

Is my thinking straight on this?

Best Regards, Simon Riggs


