Reducing the overhead of NUMERIC data

From: Simon Riggs
Subject: Reducing the overhead of NUMERIC data
Date:
Msg-id: 1130880137.8300.1648.camel@localhost.localdomain
Replies: Re: Reducing the overhead of NUMERIC data  (Tom Lane <tgl@sss.pgh.pa.us>)
         Re: Reducing the overhead of NUMERIC data  (Martijn van Oosterhout <kleptog@svana.org>)
List: pgsql-hackers
Currently, the overhead of the NUMERIC datatype is 8 bytes. Each value is
stored on disk as:

typedef struct NumericData
{
    int32       varlen;         /* Variable size (std varlena header) */
    int16       n_weight;       /* Weight of 1st digit */
    uint16      n_sign_dscale;  /* Sign + display scale */
    char        n_data[1];      /* Digits (really array of NumericDigit) */
} NumericData;

Let's see if we can reduce that:

varlen is int32 to match the standard varlena header. However, the maximum
number of digits the datatype allows is below the threshold at which
values get toasted, so no NUMERIC value is ever toasted. In which case,
why worry about matching the size of varlena? Let's reduce it to 2 bytes,
which still gives us up to 1000 digits, as we have now.
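As a quick sanity check of the sizes involved (a sketch only; the toast
threshold figure is my rough assumption of about a quarter of an 8 KB page,
and even at one byte per digit the maximum value stays well under it):

/*
 * Sketch, not a patch: compile-time checks of the size claims above.
 * ASSUMED_TOAST_THRESHOLD is my approximation, not a PostgreSQL constant.
 */
#define NUMERIC_MAX_DIGITS      1000    /* existing precision limit */
#define ASSUMED_TOAST_THRESHOLD 2048    /* ~ a quarter of an 8 KB page */
#define INT16_MAX_VALUE         32767

/* a negative array size forces a compile error if either claim is wrong */
typedef char never_toasted[(NUMERIC_MAX_DIGITS < ASSUMED_TOAST_THRESHOLD) ? 1 : -1];
typedef char fits_in_int16[(NUMERIC_MAX_DIGITS < INT16_MAX_VALUE) ? 1 : -1];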

n_weight seems to exist because we do not store trailing zeroes. So
1000000 is stored as 1 with a weight of 6. My experience is that large
numbers of trailing zeroes do not occur with any frequency in real
measurement or financial data and that this is an over-optimization.
This is probably a holdover from the original algorithm, rather than a
conscious design goal for PostgreSQL?
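For illustration, here is how I read that digits-plus-weight scheme (a
sketch only, not PostgreSQL code; one base-10 digit per array element is
assumed purely for readability):

#include <math.h>
#include <stdio.h>

/*
 * Sketch: reconstruct a value from a digit array plus a weight.
 * digits[0] carries 10^weight, digits[1] carries 10^(weight-1), and so on,
 * which is why trailing zeroes need not be stored at all.
 */
static double
digits_to_value(const unsigned char *digits, int ndigits, int weight)
{
    double  value = 0.0;
    int     i;

    for (i = 0; i < ndigits; i++)
        value += digits[i] * pow(10.0, weight - i);

    return value;
}

int
main(void)
{
    unsigned char digits[] = {1};

    /* 1000000 is stored as the single digit 1 with a weight of 6 */
    printf("%.0f\n", digits_to_value(digits, 1, 6));
    return 0;
}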

n_sign_dscale shows us where the decimal point is. We could instead store
a marker representing the decimal point, which would cost us 0.5 byte
rather than 2 bytes. Since only 4 bits are needed to represent a decimal
digit, that leaves a few spare codes to represent either a decimal-
point-and-positive-sign or a decimal-point-and-negative-sign marker. (We
would still need to store trailing zeroes after the decimal point.)
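To show what I mean by the spare codes, here is a rough sketch of the
packing (illustration only; the marker values 0xA, 0xB and 0xF are
arbitrary picks of mine, and for a whole number with no fractional part I
simply append the marker at the end so the sign is still recorded):

#include <stdio.h>
#include <stdint.h>

#define MARK_POINT_POS  0xA     /* decimal point, positive value */
#define MARK_POINT_NEG  0xB     /* decimal point, negative value */
#define MARK_PAD        0xF     /* padding when the nibble count is odd */

/* Pack text like "-123.45" into nibbles; returns the number of bytes used. */
static int
pack_numeric(const char *txt, uint8_t *out)
{
    uint8_t nibbles[64];        /* sketch only: no overflow checking */
    int     n = 0;
    int     neg = (*txt == '-');
    int     saw_point = 0;
    int     i;

    if (neg || *txt == '+')
        txt++;

    for (; *txt; txt++)
    {
        if (*txt == '.')
        {
            nibbles[n++] = neg ? MARK_POINT_NEG : MARK_POINT_POS;
            saw_point = 1;
        }
        else
            nibbles[n++] = (uint8_t) (*txt - '0');
    }

    /* integers still need the sign, so append the marker at the end */
    if (!saw_point)
        nibbles[n++] = neg ? MARK_POINT_NEG : MARK_POINT_POS;

    if (n % 2)
        nibbles[n++] = MARK_PAD;

    for (i = 0; i < n; i += 2)
        out[i / 2] = (uint8_t) ((nibbles[i] << 4) | nibbles[i + 1]);

    return n / 2;
}

int
main(void)
{
    uint8_t buf[32];
    int     len = pack_numeric("-123.45", buf);
    int     i;

    /* -123.45 -> nibbles 1 2 3 B 4 5 -> 3 bytes: 12 3B 45 */
    printf("%d bytes:", len);
    for (i = 0; i < len; i++)
        printf(" %02X", buf[i]);
    printf("\n");
    return 0;
}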

So, assuming I have this all correct, we could reduce the on-disk storage
for the NUMERIC datatype to the following struct. This gives an overhead
of just 2.5 bytes, plus the loss of the trailing-zeroes optimization,
which I assess as having almost no value anyway in 99.9999% of data
values (literally...).

typedef struct NumericData
{
    int16       varlen;         /* Variable size (std varlena header) */
    char        n_data[1];      /* Digits (really array of NumericDigit) */
} NumericData;
with the sign/decimal-point marker stored as 0.5 byte within the record.

A saving of 5.5 bytes per NUMERIC column per row could make an
interesting difference to row length in large tables.
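(To spell out where the 5.5 comes from, as I count it: 2 bytes saved by
shrinking varlen, 2 bytes for dropping n_weight, 2 bytes for dropping
n_sign_dscale, less the 0.5-byte marker nibble = 5.5 bytes.)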

The above would require a reasonable amount of change to the existing
datatype code, so I'm not suggesting it's a simple change to an .h file.
But the changes as proposed could, it seems, be made to the existing
NUMERIC type, rather than inventing another similar one.

Is my thinking straight on this?

Best Regards, Simon Riggs


