Обсуждение: Pie-in-sky dreaming about reworking tuple layout entirely

Поиск
Список
Период
Сортировка

Pie-in-sky dreaming about reworking tuple layout entirely

От
Gregory Stark
Дата:
I can't shake the feeling that merely tweaking the way our varlenas work with
a shortvarlena or with compressed varlena headers is missing the real source
of our headaches. It seems very strange to me to be trying to step through a
tuple with length bits at the head of every field. It's a lot of work spent
dealing with a terribly inconvenient format when we can pick the format to be
whatever we like.

If we were designing this from scratch I don't think would have picked this
approach at all. We would instead store offsets to every field whose offset
isn't static in the header so we can jump straight to a field without having
to look at all the previous fields at all.

I feel this would be far to invasive a change for me to contemplate currently
though. I don't know enough about what consequences it would have. I just
wanted to get the idea out there in case sometime down the line, perhaps even
not for several releasees, I might be prepared or someone else might be
prepared to tackle such a project. It might make a lot of things cleaner and
easier such as dealing with user defined fixed length fields whose length
depends on the typmod.

What I picture would be basically be an array of offsets into the tuple. A
field would be the data from its offset to the next. We would have to store an
offset to the end of the tuple as well, and probably another field with the
length of the array so we can find the end quickly without counting bits in
the null bitmap.

I also picture a kind of compression that store offsets modulo 256 as 1-byte
offsets with a separate list of where the 256 byte boundaries are. That means
we would probably still need a heap_deform tuple that steps through the list
of offsets, null bitmap, and 256-boundary list, and builds a in-memory list of
pointers to the fields with nulls for the null values.

Actually for the in-memory storage instead of a simple array of pointers I
picture an array of structs where each struct contains a pointer to the field,
a 4-byte length value, and the typmod (for a total of 12 bytes on a 32-bit
platform and 16 bytes on a 64-bit platform). Then instead of passing a Datum
directly to functions we would pass one of these structs (possibly by
reference?).

That means no function would ever have to store the length of a field internal
to the field itself, and it could always depend on having the typmod
available.

In real terms this actually doesn't reduce space usage, in fact it probably
increases it. It's effectively storing a 1-byte header for every field even if
it's something like an integer.

But you could imagine some further refinements. If you store all the
non-varlena fields first then you can just start the list of offsets at the
first varlena. heap_deform tuple could step through the tupldescriptor and
null bitmap first setting up the offsets to any field with an attlen without
the help of any offsets.

(Incidentally, to do that there's a tricky bit. You have to be able to put new
fixed length fields before varlena fields even if they're added later. But
that means you need to know they're missing on the old records so you can't
insert them out of order in the null bitmap. So the null bitmap would have to
be in the order they were added, and even if that's not the order they appear
physically or the order they appear logically. Even more confusing than the
original plan that got shot down as too confusing already.)

The main inefficiency in this is it would mean passing 3x as much data (only
2x as much data on 64-bit platforms) to functions. But given the way functions
are called with a pointer to 100-element arrays it doesn't seem like that
should be a big concern.

The upside is that data types could use different size storage for different
typmods without having to define whole new types based on internal
implementation details the user doesn't know about. NUMERIC could internally
use integer for some typmods and just perform integer arithmetic. It wouldn't
have to reserve any bits to indicate the scale or precision of the number
since the typmod is available whenever it's processing an element from the
table and it's passed around with the field by the system. There's no security
concern because the system is responsible for passing the typmod, not the
caller. Only C code would be able to construct arbitrary typmods; SQL or
Pl/PgSQL would have to use casts to safely set the typmod on their output.

On advantage of passing around this meta information automatically is that it
obviates the need for many core data types whose main advantage is that they
use more efficient storage than is possible with domains or user defined
types. The recently discussed UUID type, for example, offers nothing that
couldn't be implemented with a user-defined type using bytea storage
internally -- except for the fact that it doesn't store an extra 4 bytes of
length header in every value.

Another advantage is that we wouldn't need a whole slew of functions in the
catalogs that do the same thing slightly differently depending on the input
type. Currently each one has to be declared separately since there's no way to
know what type of datum they've been passed.

I suspect something like this will be necessary when it comes to implementing
per-column encoding anyways. Unless we adopt some sort of universal encoding
as the "preferred" database encoding and always convert to and from it, we'll
need to pass around the encoding of any text datum along with the datum too.

Well I'm happier having gotten these ideas down. Even if, as I said, I have no
plans to implement these ideas in the near future. It would mean totally
rewriting some pretty hair internal parts that I'm only starting to have a
basic understanding of.

I am planning to experiment with the shortvarlena form for shortvarchar et al
but I hope that's something we only use for a few versions. I still hope we
move to something like the above someday instead.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


Re: Pie-in-sky dreaming about reworking tuple layout entirely

От
"Merlin Moncure"
Дата:
On 10/3/06, Gregory Stark <stark@enterprisedb.com> wrote:
>
> I can't shake the feeling that merely tweaking the way our varlenas work with
> a shortvarlena or with compressed varlena headers is missing the real source
> of our headaches. It seems very strange to me to be trying to step through a
> tuple with length bits at the head of every field. It's a lot of work spent
> dealing with a terribly inconvenient format when we can pick the format to be
> whatever we like.

one advantage of the current system is that columns with nulls do not
require any storage.  so you can alter table add column for free on a
really big table.  ISTM that your approch would require moving all the
static fields in if you added a static field regardless, right?

merlin


Re: Pie-in-sky dreaming about reworking tuple layout entirely

От
Jim Nasby
Дата:
On Oct 3, 2006, at 4:06 PM, Merlin Moncure wrote:
> On 10/3/06, Gregory Stark <stark@enterprisedb.com> wrote:
>> I can't shake the feeling that merely tweaking the way our  
>> varlenas work with
>> a shortvarlena or with compressed varlena headers is missing the  
>> real source
>> of our headaches. It seems very strange to me to be trying to step  
>> through a
>> tuple with length bits at the head of every field. It's a lot of  
>> work spent
>> dealing with a terribly inconvenient format when we can pick the  
>> format to be
>> whatever we like.
>
> one advantage of the current system is that columns with nulls do not
> require any storage.  so you can alter table add column for free on a
> really big table.  ISTM that your approch would require moving all the
> static fields in if you added a static field regardless, right?

I'm thinking that for Greg's ideas to be workable, we'll need to  
divorce the on-disk  format from what was specified in CREATE TABLE,  
specifically so we can do things like put all the fixed-width stuff  
in front of the variable-width stuff (of course we could also further  
optimize placement beyond that).

IIRC, the show-stopper for doing that is how to deal with DDL  
changes. While we could try and get cute about that, there is a brute- 
force method that would work without a doubt: store some kind of  
catalog version number in each tuple (or maybe just in each page,  
since you could theoretically convert an entire page to a different  
format without too big a penalty while you've already got it in memory.

There are some caveats to this... there will be some limit on how  
many DDL changes you can make until you run out of version numbers.  
Worst-case, we could provide a command that would run through the  
entire table, ensuring that everything is up to the current version.  
Of course, we'd want some way to throttle that, but I don't think  
that'd be terribly difficult. One nice thing is that you shouldn't  
need to mess with any visibility info when you run this, so it should  
be able to just do everything in-place.

BTW, it seems like what we're really looking at between this and  
discussion of visibility changes, etc. is essentially re-designing  
the entire storage layout (for better or for worse).
--
Jim Nasby                                    jimn@enterprisedb.com
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)