Re: Reducing the overhead of NUMERIC data
From | Simon Riggs |
---|---|
Subject | Re: Reducing the overhead of NUMERIC data |
Date | |
Msg-id | 1131025786.8300.1911.camel@localhost.localdomain |
In response to | Re: Reducing the overhead of NUMERIC data (Simon Riggs <simon@2ndquadrant.com>) |
Responses |
Re: Reducing the overhead of NUMERIC data
(Martijn van Oosterhout <kleptog@svana.org>)
Re: Reducing the overhead of NUMERIC data (Alvaro Herrera <alvherre@commandprompt.com>) |
List | pgsql-hackers |
On Thu, 2005-11-03 at 08:27 +0000, Simon Riggs wrote:
> On Wed, 2005-11-02 at 19:12 -0500, Tom Lane wrote:
> > If we were willing to invent the "varlena2" datum format then we could
> > save four bytes per numeric, plus reduce numeric's alignment requirement
> > from int to short which would probably save another byte per value on
> > average. I'm not sure that that's worth doing if numeric and inet are
> > the only beneficiaries, but it might be.
>
> That and variations can be the next discussion. They sound good.

Kicking off the discussion on that...

Varlena2 datum format sounds interesting. If we did that, I'd also like to apply that thought to VAR/CHAR(32000) and below. (The benefit of varlena2 is a saving of 2 bytes + ~1 byte alignment, yes? The other two bytes come from the other numeric savings discussed.)

Alternatively, what I'd been thinking about was altering the self-contained nature of PostgreSQL datatypes. In other databases, CHAR(12) and NUMERIC(12) are fixed length datatypes. In PostgreSQL, they are dynamically varying datatypes.

What actually happens is that in many other systems the datatype is the same, but additional metadata is provided for that particular attribute. So CHAR(12) is a datatype of CHAR with a metadata item called length which is set to 12 for that attribute.

On PostgreSQL, CHAR(12) is a bpchar datatype with all instantiations of that datatype having a 4 byte varlena header. In this example, all of those instantiations have the varlena header set to 12, so the 4 byte header is essentially wasted.

It seems like it would be an interesting move to allow the attribute metadata to be stored in the TupleDesc, so we can store it once, rather than once per row.

If we did this we would need two datatypes where currently we need only one. We would still need the variable-length char datatype VARCHAR and we would be inventing a new fixed-char datatype CHAR(n) with metadata of length.
This would give us two things:
- reduce many attributes by 4 bytes in length
- allow attribute access to increase considerably in speed for queries, sorts etc since more of the tuple offsets will be constant

Anyway, I accept that many will say I clearly don't understand Object Relational.

It seems like this could be done without actually breaking anything. The question is, how much work would it be?

Best Regards, Simon Riggs
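
To make the two claimed benefits concrete, here is a toy Python model, not PostgreSQL code. It compares a row whose values each carry a 4-byte length header with a row whose widths live once in per-attribute metadata. Everything in it is an assumption for illustration: the helper names, the three column widths, the little-endian header, and the fact that the header stores only the payload length (real PostgreSQL length words and alignment padding are more involved and are ignored here).

```python
import struct

HEADER = 4  # bytes; the per-value varlena header size mentioned in the post


def pack_varlena_row(values):
    """Toy row: every value is preceded by its own 4-byte length header."""
    return b"".join(struct.pack("<I", len(v)) + v for v in values)


def pack_fixed_row(values, widths):
    """Toy row: widths come from per-attribute metadata, so no headers are stored."""
    return b"".join(v.ljust(w) for v, w in zip(values, widths))


def varlena_offsets(row):
    """Data offsets have to be found by walking the headers of this one row."""
    offsets, pos = [], 0
    while pos < len(row):
        (length,) = struct.unpack_from("<I", row, pos)
        offsets.append(pos + HEADER)
        pos += HEADER + length
    return offsets


def fixed_offsets(widths):
    """Data offsets depend only on the metadata, so they are the same for every row."""
    offsets, pos = [], 0
    for w in widths:
        offsets.append(pos)
        pos += w
    return offsets


# Hypothetical table with three blank-padded columns: CHAR(12), CHAR(8), CHAR(20).
widths = [12, 8, 20]
values = [b"abc".ljust(12), b"de".ljust(8), b"fgh".ljust(20)]

var_row = pack_varlena_row(values)
fix_row = pack_fixed_row(values, widths)
saved = len(var_row) - len(fix_row)  # one header saved per attribute

print(len(var_row), len(fix_row), saved)
print(varlena_offsets(var_row), fixed_offsets(widths))
```

In this model the saving is exactly one header per fixed-width attribute per row, and `fixed_offsets` never has to look at the row itself, which is the "more of the tuple offsets will be constant" point above. How much of that would carry over to real tuples depends on alignment and on variable-length columns appearing earlier in the row.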