Discussion: PG_PAGE_LAYOUT_VERSION 5 - time for change

PG_PAGE_LAYOUT_VERSION 5 - time for change

From
Zdenek Kotala
Date:
It seems that we are going to bump the Page Layout Version to version 5 (see the CRC 
patch for details). Maybe this is a good time to make some other changes. Here is a 
list of ideas (please, do not beat me :-). We discussed some of them in Prato, 
and Greg may have more.

1) HeapTupleHeader modification

typedef struct HeapTupleFields
{
        TransactionId t_xmin;           /* inserting xact ID */
        TransactionId t_xmax;           /* deleting or locking xact ID */

        union
        {
                CommandId       t_cid;
                TransactionId   t_xvac;   /* VACUUM FULL xact ID */
        }                       t_field3;
        uint16          t_infomask;
} HeapTupleFields;

typedef struct HeapTupleHeaderData
{
        union
        {
                HeapTupleFields t_heap;
                DatumTupleFields t_datum;
        }                       t_choice;

        ItemPointerData t_ctid;         /* current TID of this or newer tuple */

        /* Fields below here must match MinimalTupleData! */

        uint16          t_infomask2;
        uint8           t_hoff;

        /* ^ - 23 bytes - ^ */

        bits8           t_bits[1];
} HeapTupleHeaderData;

This also requires shuffling flags between infomask and infomask2. infomask2 
should hold only the flags HASNULL, HASOID, HASVARWIDTH and HASEXTERNAL. The minimal 
tuple then does not need the infomask field, which will contain only transaction hint 
bits. Unfortunately, the structure alignment is not very friendly.

2) Add page type (e.g. btree) and subtype (e.g. metapage) flags to the page header. 
I think it will be useful when we use shared buffers for clog.

3) TOAST modification
  a) TOAST table per attribute
  b) replace chunk id with offset+variable chunk size
  c) add column identification into first chunk

That's all. I think the infomask/infomask2 flag shuffle should be done. The TOAST 
modification complicates in-place upgrade.

Comments, other ideas?

        Zdenek


-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql



Re: PG_PAGE_LAYOUT_VERSION 5 - time for change

From
Gregory Stark
Date:
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:

> 3) TOAST modification
>   a) TOAST table per attribute
>   b) replace chunk id with offset+variable chunk size
>   c) add column identification into first chunk
>
> Thats all. I think infomask/infomask2 shuffle flag should be done. TOAST
> modification complicates in-place upgrade.

I don't think TOAST table per attribute is feasible You would end up with
thousands of toast tables. It might be interesting as an option if you plan to
drop the column but I don't see it as terribly interesting.

What seemed to make sense to me for solving your problem was including the
type oid in the toast chunks. I suppose attribute number might be just as good
-- it would let you save upgrading chunks for dropped columns at the expense
of having to look up the column info first.

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!


Re: PG_PAGE_LAYOUT_VERSION 5 - time for change

From
Tom Lane
Date:
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> 1) HeapTupleHeader modification

> typedef struct HeapTupleFields
> {
>         TransactionId t_xmin;           /* inserting xact ID */
>         TransactionId t_xmax;           /* deleting or locking xact ID */

>         union
>         {
>                 CommandId       t_cid;
>                 TransactionId   t_xvac;   /* VACUUM FULL xact ID */
>         }                       t_field3;
>         uint16          t_infomask;
> } HeapTupleFields;

This is unworkable (hint: the compiler will decide sizeof the struct
must be a multiple of 4).  I am also frightened to death of the proposal
to swap various bits around between infomask and infomask2 --- that is
*guaranteed* to break code silently.  And you didn't explain exactly
what it buys, anyway.  Not space savings in the Datum form; alignment
issues will prevent that.
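
The padding point can be checked with a small sketch. It assumes that TransactionId and CommandId are 32-bit (modelled here with uint32_t stand-ins) and that the platform aligns uint32_t on 4 bytes, as common ABIs do. The struct name, the enum and the helper are made up for illustration and are not part of the proposal.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the PostgreSQL typedefs (assumed 32-bit). */
typedef uint32_t TransactionId;
typedef uint32_t CommandId;

/* The proposed layout: three 4-byte fields followed by a 2-byte infomask. */
typedef struct ProposedHeapTupleFields
{
    TransactionId t_xmin;
    TransactionId t_xmax;
    union
    {
        CommandId     t_cid;
        TransactionId t_xvac;
    }             t_field3;
    uint16_t      t_infomask;
} ProposedHeapTupleFields;

/* Bytes of real field data: 4 + 4 + 4 + 2. */
enum { PROPOSED_PAYLOAD_BYTES = 14 };

/* Number of padding bytes the compiler appends after t_infomask. */
static size_t
proposed_tail_padding(void)
{
    size_t used = offsetof(ProposedHeapTupleFields, t_infomask)
        + sizeof(uint16_t);

    assert(sizeof(ProposedHeapTupleFields) >= PROPOSED_PAYLOAD_BYTES);
    return sizeof(ProposedHeapTupleFields) - used;
}
```

Under those assumptions sizeof(ProposedHeapTupleFields) is 16 rather than 14, so whatever follows the struct inside a containing struct starts at offset 16 and the two bytes are not saved.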

> 2) Add page type (e.g. btree) and subtype (e.g. metapage) flag into page header. 
> I think It will be useful when we will use share buffer for clog.

I think this is a pretty bad idea, because it'll eat space on every page
for data that is only useful to indexes.  I don't believe that clog will
find it interesting either.  To share buffers with clog will require
a change in buffer lookup tags, not buffer contents.

> 3) TOAST modification
>    a) TOAST table per attribute
>    b) replace chunk id with offset+variable chunk size
>    c) add column identification into first chunk

I don't like 3a any more than Greg does.  3b sounds good until you
reflect that a genuinely variable chunk size would preclude random
access to sub-ranges of a toast value.  A column ID might be worth
adding for robustness purposes, though reducing the toast chunk payload
size to make that possible will cause you fits for in-place upgrade.
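
For reference, the random access that a fixed chunk size gives can be sketched as plain integer arithmetic. This is only an illustration: CHUNK_SIZE is a hypothetical value, and the struct and function names are invented for the example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical fixed chunk payload size, for illustration only. */
#define CHUNK_SIZE 2000

typedef struct ChunkRange
{
    int32_t first_seq;      /* first chunk sequence number to fetch */
    int32_t last_seq;       /* last chunk sequence number to fetch */
} ChunkRange;

/*
 * Map the byte range [offset, offset + length) of a toasted value to the
 * chunk sequence numbers that cover it.  With a fixed chunk size this is
 * pure arithmetic; no lookup is needed to find where to start.
 */
static ChunkRange
chunks_for_slice(int32_t offset, int32_t length)
{
    ChunkRange r;

    assert(offset >= 0 && length > 0);
    r.first_seq = offset / CHUNK_SIZE;
    r.last_seq = (offset + length - 1) / CHUNK_SIZE;
    return r;
}
```

With a genuinely variable chunk size neither division is possible, which is the objection to 3b.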
        regards, tom lane


Re: PG_PAGE_LAYOUT_VERSION 5 - time for change

From
Gregory Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

>> 2) Add page type (e.g. btree) and subtype (e.g. metapage) flag into page header. 
>> I think It will be useful when we will use share buffer for clog.
>
> I think this is a pretty bad idea, because it'll eat space on every page
> for data that is only useful to indexes.  I don't believe that clog will
> find it interesting either.  To share buffers with clog will require
> a change in buffer lookup tags, not buffer contents.

Another example application which came to mind, if we ever wanted to do
something like retail vacuum, pruning, or hint bit setting from bgwriter it
would have to know how to tell heap pages apart from index pages. I'm not sure
whether that would have to be on the page or if it could be in the buffertag
as well?

If we do decide we want to do this it wouldn't have to be very much space. 16
page types with 16 subtypes each would be plenty which would fit on a single
byte.
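
A minimal sketch of the single-byte encoding, assuming the type goes in the high nibble and the subtype in the low nibble. The type codes and function names are hypothetical; nothing like this exists in the page header today.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical codes, for illustration only. */
enum { PAGE_TYPE_HEAP = 0, PAGE_TYPE_BTREE = 1 };
enum { PAGE_SUBTYPE_NONE = 0, PAGE_SUBTYPE_META = 1 };

/* Pack a page type (0..15) and subtype (0..15) into one byte. */
static uint8_t
page_kind_pack(unsigned type, unsigned subtype)
{
    assert(type < 16 && subtype < 16);
    return (uint8_t) ((type << 4) | subtype);
}

/* Extract the page type from the high nibble. */
static unsigned
page_kind_type(uint8_t kind)
{
    return kind >> 4;
}

/* Extract the subtype from the low nibble. */
static unsigned
page_kind_subtype(uint8_t kind)
{
    return kind & 0x0F;
}
```

For example, page_kind_pack(PAGE_TYPE_BTREE, PAGE_SUBTYPE_META) yields 0x11, and both values can be read back without touching the rest of the page.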

>> 3) TOAST modification
>>    a) TOAST table per attribute
>>    b) replace chunk id with offset+variable chunk size
>>    c) add column identification into first chunk
>
> I don't like 3a any more than Greg does.  3b sounds good until you
> reflect that a genuinely variable chunk size would preclude random
> access to sub-ranges of a toast value.  

Hm, Heikki had me convinced it wouldn't but now that I try to explain it I
can't get it to work. I think the idea is you start a scan at the desired
offset and scan until you reach a chunk which overruns the end of the desired
piece. However you really need to start scanning at the last chunk *prior* to
the desired offset.

I think you can actually do this with btrees but I don't know if our apis
support it. If you scan to find the first chunk > the desired offset and then
scan backwards one tuple you should be looking at the chunk in which the
desired offset lies.
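
The "find the first chunk past the offset, then step back one" idea can be sketched over a sorted array standing in for the btree. This assumes chunks are keyed by their start offset and that the first chunk starts at 0; the function name is made up, and whether the index APIs allow the backward step is exactly the open question.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * starts[] holds the start offset of each chunk in ascending order, with
 * starts[0] == 0.  Returns the index of the chunk in which offset x lies.
 * If x is beyond the end of the value the last chunk is returned, so the
 * caller still has to check x against the total length.
 */
static size_t
chunk_by_start_offset(const int32_t *starts, size_t nchunks, int32_t x)
{
    size_t lo = 0;
    size_t hi = nchunks;

    assert(nchunks > 0 && starts[0] == 0 && x >= 0);

    /* Find the first chunk whose start offset is > x (may be nchunks). */
    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (starts[mid] > x)
            hi = mid;
        else
            lo = mid + 1;
    }

    /* Step back one entry: that chunk starts at or before x. */
    return lo - 1;
}
```

The step back is safe because starts[0] == 0 guarantees the search never stops at index 0.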

-- 
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com
  Ask me about EnterpriseDB's Slony Replication support!


Re: PG_PAGE_LAYOUT_VERSION 5 - time for change

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> ... 3b sounds good until you
>> reflect that a genuinely variable chunk size would preclude random
>> access to sub-ranges of a toast value.  

> Hm, Heikki had me convinced it wouldn't but now that I try to explain it I
> can't get it to work. I think the idea is you start a scan at the desired
> offset and scan until you reach a chunk which overruns the end of the desired
> piece. However you really need to start scanning at the last chunk *prior* to
> the desired offset.

Yeah, that was my conclusion too.

> I think you can actually do this with btrees but I don't know if our apis
> support it. If you scan to find the first chunk > the desired offset and then
> scan backwards one tuple you should be looking at the chunk in which the
> desired offset lies.

Well, that might work but it would typically cost you an extra fetch.
Do we really have a use case for variable chunk size that is worth the
cost?
        regards, tom lane


Re: PG_PAGE_LAYOUT_VERSION 5 - time for change

From
Heikki Linnakangas
Date:
Tom Lane wrote:
> Gregory Stark <stark@enterprisedb.com> writes:
>> Tom Lane <tgl@sss.pgh.pa.us> writes:
>>> ... 3b sounds good until you
>>> reflect that a genuinely variable chunk size would preclude random
>>> access to sub-ranges of a toast value.  
> 
>> Hm, Heikki had me convinced it wouldn't but now that I try to explain it I
>> can't get it to work. I think the idea is you start a scan at the desired
>> offset and scan until you reach a chunk which overruns the end of the desired
>> piece. However you really need to start scanning at the last chunk *prior* to
>> the desired offset.
> 
> Yeah, that was my conclusion too.

Hmm, you're right. I think it can be made to work by storing the *end* 
offset of each chunk. To find the chunk containing offset X, search for 
the first chunk with end_offset > X.
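
A minimal sketch of that lookup, using a sorted array in place of the toast index. It assumes end_offset is exclusive (one past the last byte of the chunk), which is what makes the "> X" comparison come out right; the function name is invented for the example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * ends[] holds the end offset of each chunk (one past its last byte) in
 * ascending order.  Returns the index of the first chunk whose end offset
 * is > x, which is the chunk containing x, or nchunks if x lies past the
 * end of the value.
 */
static size_t
chunk_by_end_offset(const int32_t *ends, size_t nchunks, int32_t x)
{
    size_t lo = 0;
    size_t hi = nchunks;

    assert(nchunks > 0 && x >= 0);

    while (lo < hi)
    {
        size_t mid = lo + (hi - lo) / 2;

        if (ends[mid] > x)
            hi = mid;           /* this chunk or an earlier one */
        else
            lo = mid + 1;       /* chunk ends at or before x */
    }
    return lo;
}
```

Unlike a lookup keyed on start offsets, this is a single forward search with no backward step, and a slice scan can continue forward from the returned chunk.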

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: PG_PAGE_LAYOUT_VERSION 5 - time for change

From
Zdenek Kotala
Date:
Gregory Stark wrote:
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> 
>> 3) TOAST modification
>>   a) TOAST table per attribute
>>   b) replace chunk id with offset+variable chunk size
>>   c) add column identification into first chunk
>>
>> Thats all. I think infomask/infomask2 shuffle flag should be done. TOAST
>> modification complicates in-place upgrade.
> 
> I don't think TOAST table per attribute is feasible You would end up with
> thousands of toast tables. It might be interesting as an option if you plan to
> drop the column but I don't see it as terribly interesting.

Yeah, I could not remember what the problem with this was.

> What seemed to make sense to me for solving your problem was including the
> type oid in the toast chunks. I suppose attribute number might be just as good
> -- it would let you save upgrading chunks for dropped columns at the expense
> of having to look up the column info first.

It does not solve my problem now, because I need to solve it for old versions of 
PostgreSQL as well. But it should help in the future, and vacuum could also easily 
clean up chunks related to dropped columns.

    Zdenek

-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql



Re: PG_PAGE_LAYOUT_VERSION 5 - time for change

From
Zdenek Kotala
Date:
Tom Lane wrote:
> Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
>> 1) HeapTupleHeader modification
> 
>> typedef struct HeapTupleFields
>> {
>>         TransactionId t_xmin;           /* inserting xact ID */
>>         TransactionId t_xmax;           /* deleting or locking xact ID */
> 
>>         union
>>         {
>>                 CommandId       t_cid;
>>                 TransactionId   t_xvac;   /* VACUUM FULL xact ID */
>>         }                       t_field3;
>>         uint16          t_infomask;
>> } HeapTupleFields;
> 
> This is unworkable (hint: the compiler will decide sizeof the struct
> must be a multiple of 4).  I am also frightened to death of the proposal
> to swap various bits around between infomask and infomask2 --- that is
> *guaranteed* to break code silently. 

Uh? If a flag shuffle breaks code, that is not good for in-place upgrade anyway. 
Do you mean something specific? I have already converted all accesses to the flags 
into functions.

> And you didn't explain exactly what it buys, anyway.  Not space savings
> in the Datum form; alignment issues will prevent that.

OK. The idea is to consolidate the structures, with one basic structure for data:

typedef struct DataHeaderData
{
        uint16          t_infomask2;
        uint8           t_hoff;
        bits8           t_bits[1];
} DataHeaderData;

which corresponds to the minimal tuple and is also useful for index tuples.
If I understand correctly, the other (transaction) information is not useful in 
the executor (except when it is explicitly mentioned in a select).
I'm not sure, but I think we can store composite types without typid and typmod, 
which saves some bytes. After that we can have structures such as 
VisibilityTupleHeader, DatumTupleHeader and IndexTupleHeader, and data on disk will 
be stored as:

VisibilityTupleHeaderData|DataHeaderData|Data....
IndexTupleHeader|DataHeaderData|Data....

This has a problem with alignment, but visibility or index data could be placed into 
the line item pointer (IIRC somebody suggested that for vacuum improvement). And 
the HeapTupleData structure should be extended with:

t_data - pointer to DataHeaderData
t_type - type of data header
t_header - pointer to Visibility/Datum/Index header

The main idea behind this is to have a stable, general and minimalistic DataHeader 
structure.

It is just an idea without deep examination. It seems to me like a good way to 
reduce the memory footprint as well, but maybe I'm wrong.

    Zdenek



Re: PG_PAGE_LAYOUT_VERSION 5 - time for change

From
Tom Lane
Date:
Zdenek Kotala <Zdenek.Kotala@Sun.COM> writes:
> I'm not sure but I think we can store composite types without typid and typmod 

No, we can't.  At least, tuple header structure is not the reason why not.
        regards, tom lane


Re: PG_PAGE_LAYOUT_VERSION 5 - time for change

From
Tom Lane
Date:
Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
> Hmm, you're right. I think it can be made to work by storing the *end* 
> offset of each chunk. To find the chunk containing offset X, search for 
> the first chunk with end_offset > X.

Yeah, that seems like it would work, and it would disentangle us
altogether from needing a hard-wired chunk size.  The only downside is
that it'd be a pain to convert in-place.  However, if we are also going
to add identifying information to the toast chunks (like the owning
column's number or datatype), then you could tell whether a toast chunk
had been converted by checking t_natts.  So in principle a toast table
could be converted a page at a time.  If the converted data didn't fit
you could push one of the chunks out to some new page of the file.
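
The version check can be sketched as follows. Existing toast tuples have three attributes (chunk_id, chunk_seq, chunk_data); the assumption here is that a converted tuple carries at least one additional identifying attribute, so the attribute count read from the tuple header distinguishes the two. The enum and function names are hypothetical.

```c
#include <assert.h>

/* Existing toast tuples: chunk_id, chunk_seq, chunk_data. */
#define TOAST_NATTS_OLD 3

typedef enum ToastChunkFormat
{
    TOAST_FORMAT_OLD,       /* no identifying column */
    TOAST_FORMAT_NEW        /* assumed: extra identifying column(s) appended */
} ToastChunkFormat;

/*
 * Classify a toast tuple by its attribute count.  natts is assumed to have
 * been read from the tuple header by the caller.
 */
static ToastChunkFormat
toast_chunk_format(int natts)
{
    assert(natts >= TOAST_NATTS_OLD);
    return (natts == TOAST_NATTS_OLD) ? TOAST_FORMAT_OLD : TOAST_FORMAT_NEW;
}
```

Because the check is per tuple, old and new chunks can coexist on the same page while a table is converted incrementally.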

On the whole I like this a lot better than Zdenek's original proposal
http://archives.postgresql.org/pgsql-hackers/2008-10/msg00556.php
which didn't seem to me to solve much of anything.
        regards, tom lane


Re: PG_PAGE_LAYOUT_VERSION 5 - time for change

From
Zdenek Kotala
Date:
Tom Lane wrote:
> Heikki Linnakangas <heikki.linnakangas@enterprisedb.com> writes:
>> Hmm, you're right. I think it can be made to work by storing the *end* 
>> offset of each chunk. To find the chunk containing offset X, search for 
>> the first chunk with end_offset > X.
> 
> Yeah, that seems like it would work, and it would disentangle us
> altogether from needing a hard-wired chunk size.  The only downside is
> that it'd be a pain to convert in-place.  However, if we are also going
> to add identifying information to the toast chunks (like the owning
> column's number or datatype), then you could tell whether a toast chunk
> had been converted by checking t_natts.  So in principle a toast table
> could be converted a page at a time.  If the converted data didn't fit
> you could push one of the chunks out to some new page of the file.

Yeah, that was the main intention. The problem is the toast index, but that is a 
common problem, not only for toast tables.

> On the whole I like this a lot better than Zdenek's original proposal
> http://archives.postgresql.org/pgsql-hackers/2008-10/msg00556.php
> which didn't seem to me to solve much of anything.

Agreed. This approach is much better. It adds more complexity now for converting 
chunks from the old to the new version, but it also adds a benefit: for example, 
vacuum can remove data from dropped columns, and so on.

    Zdenek

-- 
Zdenek Kotala              Sun Microsystems
Prague, Czech Republic     http://sun.com/postgresql



Re: PG_PAGE_LAYOUT_VERSION 5 - time for change

From
Alvaro Herrera
Date:
Heikki Linnakangas wrote:

> Hmm, you're right. I think it can be made to work by storing the *end*  
> offset of each chunk. To find the chunk containing offset X, search for  
> the first chunk with end_offset > X.

FWIW I'm trying to do this.  So far I've managed to make the basic thing
work, and I'm about to have a look at the slice interface.

(Quick note so that nobody wastes their time doing the same thing)

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: PG_PAGE_LAYOUT_VERSION 5 - time for change

From
Zdenek Kotala
Date:
Alvaro Herrera wrote:
> Heikki Linnakangas wrote:
> 
>> Hmm, you're right. I think it can be made to work by storing the *end*  
>> offset of each chunk. To find the chunk containing offset X, search for  
>> the first chunk with end_offset > X.
> 
> FWIW I'm trying to do this.  So far I've managed to make the basic thing
> work, and I'm about to have a look at the slice interface.
> 
> (Quick note so that nobody wastes their time doing the same thing)

Thanks. I'm busy with space reservation development right now, and it really helps 
to have everything ready in time.

    Thanks, Zdenek



toast by chunk-end (was Re: PG_PAGE_LAYOUT_VERSION 5 - time for change)

From
Alvaro Herrera
Date:
Zdenek Kotala wrote:
> Alvaro Herrera wrote:
>> Heikki Linnakangas wrote:
>>
>>> Hmm, you're right. I think it can be made to work by storing the
>>> *end*  offset of each chunk. To find the chunk containing offset X,
>>> search for  the first chunk with end_offset > X.
>>
>> FWIW I'm trying to do this.  So far I've managed to make the basic thing
>> work, and I'm about to have a look at the slice interface.

Okay, so this seems to work.  It's missing writing the sanity checks on
the returned data, and a look at the SGML docs to see if anything needs
updating.  I'm also going to recheck code comments that may need
updates.

--
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Attachments

Re: toast by chunk-end (was Re: PG_PAGE_LAYOUT_VERSION 5 - time for change)

From
Zdenek Kotala
Date:
Alvaro Herrera wrote:
> Zdenek Kotala wrote:
>> Alvaro Herrera wrote:
>>> Heikki Linnakangas wrote:
>>>
>>>> Hmm, you're right. I think it can be made to work by storing the 
>>>> *end*  offset of each chunk. To find the chunk containing offset X, 
>>>> search for  the first chunk with end_offset > X.
>>> FWIW I'm trying to do this.  So far I've managed to make the basic thing
>>> work, and I'm about to have a look at the slice interface.
> 
> Okay, so this seems to work.  It's missing writing the sanity checks on
> the returned data, and a look at the SGML docs to see if anything needs
> updating.  I'm also going to recheck code comments that may need
> updates.
> 
> 

Hi Alvaro,

I just had a very quick look at your patch. See my comments:

1) TOAST_MAX_CHUNK_SIZE should be removed from the controldata structure.

2) PG_PAGE_LAYOUT_VERSION should be bumped.

3) The other main idea of the toast redesign was to add column number information to 
each chunk.

The more I think about it, it solves one problem but adds another: the index 
update when the page layout is converted during a read. And there are other issues 
which we need to solve; I will send a new mail.

    Zdenek


Re: toast by chunk-end (was Re: PG_PAGE_LAYOUT_VERSION 5 - time for change)

From
Heikki Linnakangas
Date:
Zdenek Kotala wrote:
> Just a very quick look on your patch. See my comments:
> 
> ...
> 
> 2) PG_PAGE_LAYOUT_VERSION should be bump

The patch doesn't change the page layout AFAICS.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: toast by chunk-end (was Re: PG_PAGE_LAYOUT_VERSION 5 - time for change)

From
Heikki Linnakangas
Date:
Zdenek Kotala wrote:
> Heikki Linnakangas wrote:
>> Zdenek Kotala wrote:
>>> Just a very quick look on your patch. See my comments:
>>>
>>> ...
>>>
>>> 2) PG_PAGE_LAYOUT_VERSION should be bump
>>
>> The patch doesn't change the page layout AFAICS.
>>
> 
> It is good question what is and what is not page layout. I think that 
> toast implementation is a member of page layout. OK it is called page 
> layout but better name should be On Disk Format (ODF). You will not able 
> to read 8.3 toasted table in 8.4.

It's clearly just a catalog change; the number and meaning of attributes 
has changed, and that's reflected in CATALOG_VERSION_NO.

We need to be pragmatic, though, and think about how the conversion 
would work, and if the version number change would help or hurt that 
process. I'm not clear how we would handle the toast table change. If 
we're going to handle it by retoasting all attributes when the main heap 
page is read in, then I suppose we'd actually change the version number 
of the *heap* page, not toast table pages, when the heap page is 
retoasted. However, if you want to do it toast-page at a time, or 
toast-tuple at a time, you can just look at the number of attributes on 
the toast tuple to determine which format it's in.

Note that bumping the version number is not free. We haven't made any 
changes in 8.4 this far that would require bumping it. If we do bump it, 
the next version with online-upgrade support will need to deal with it, 
if only to increment and write back the page.

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com


Re: toast by chunk-end (was Re: PG_PAGE_LAYOUT_VERSION 5 - time for change)

From
Zdenek Kotala
Date:
Heikki Linnakangas wrote:
> Zdenek Kotala wrote:
>> Heikki Linnakangas wrote:
>>> Zdenek Kotala wrote:
>>>> Just a very quick look on your patch. See my comments:
>>>>
>>>> ...
>>>>
>>>> 2) PG_PAGE_LAYOUT_VERSION should be bump
>>>
>>> The patch doesn't change the page layout AFAICS.
>>>
>>
>> It is good question what is and what is not page layout. I think that 
>> toast implementation is a member of page layout. OK it is called page 
>> layout but better name should be On Disk Format (ODF). You will not 
>> able to read 8.3 toasted table in 8.4.
> 
> It's clearly just a catalog change; the number and meaning of attributes 
> has changed, and that's reflected in CATALOG_VERSION_NO.

In my opinion it is not only a catalog change. You are probably right that it is 
not part of the page layout version. However, it changed the meaning of columns in 
data tables. You need to convert the whole toast table and reindex the toast table 
index. That is something you cannot do online, or you can, but you need an exclusive 
lock on the toast table.

> We need to be pragmatic, though, and think about how the conversion 
> would work, and if the version number change would help or hurt that 
> process. I'm not clear how we would handle the toast table change. If 
> we're going to handle it by retoasting all attributes when the main heap 
> page is read in, then I suppose we'd actually change the version number 
> of the *heap* page, not toast table pages, when the heap page is 
> retoasted. However, if you want to do it toast-page at a time, or 
> toast-tuple at a time, you can just look at the number of attributes on 
> the toast tuple to determine which format it's in.

I'm trying to write down a toast conversion concept. It looks like
it is more complex than I expected.

> Note that bumping the version number is not free. We haven't made any 
> changes in 8.4 this far that would require bumping it. If we do bump it, 
> the next version with online-upgrade support will need to deal with it, 
> if only to increment and write back the page.

Yes, I know about it. But I'm afraid that the 8.3->8.4 in-place upgrade will not work.

    Zdenek





Re: toast by chunk-end (was Re: PG_PAGE_LAYOUT_VERSION 5 - time for change)

From
Zdenek Kotala
Date:
Heikki Linnakangas wrote:
> Zdenek Kotala wrote:
>> Heikki Linnakangas wrote:
>>> Zdenek Kotala wrote:
>>>> Just a very quick look on your patch. See my comments:
>>>>
>>>> ...
>>>>
>>>> 2) PG_PAGE_LAYOUT_VERSION should be bump
>>>
>>> The patch doesn't change the page layout AFAICS.
>>>
>>
>> It is good question what is and what is not page layout. I think that 
>> toast implementation is a member of page layout. OK it is called page 
>> layout but better name should be On Disk Format (ODF). You will not 
>> able to read 8.3 toasted table in 8.4.
> 
> It's clearly just a catalog change; the number and meaning of attributes 
> has changed, and that's reflected in CATALOG_VERSION_NO.

The more I think about it, it is probably not CATALOG_VERSION_NO either, because 
the toast table is created on demand. It is not in the BKI.

Maybe we should add something like TOAST_VERSION.

Do we bump the catalog version when an AM bumps its version?

    Zdenek


Re: Re: toast by chunk-end (was Re: PG_PAGE_LAYOUT_VERSION 5 - time for change)

From
Alvaro Herrera
Date:
Zdenek Kotala wrote:

> If I'm thinking more, it is not probably CATALOG_VERSION_NO as well. 
> Because toast table is created on demand. It is not in BKI.

It's not catversion in the sense that there's no catalog change, but it
certainly requires a catversion bump due to internal changes.
Otherwise, developers who have working data directories today will see
weird errors when they update to a CVS version after this commit.

-- 
Alvaro Herrera                                http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support


Re: Re: toast by chunk-end (was Re: PG_PAGE_LAYOUT_VERSION 5 - time for change)

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Zdenek Kotala wrote:
>> If I'm thinking more, it is not probably CATALOG_VERSION_NO as well. 
>> Because toast table is created on demand. It is not in BKI.

> It's not catversion in the sense that there's no catalog change, but it
> certainly requires a catversion bump due to internal changes.
> Otherwise, developers who have working data directories today will see
> weird errors when they update to a CVS version after this commit.

Yes.  The real purpose of catversion is to keep developers from wasting
time using an incompatible data directory.

As far as the point at hand goes: the original discussion about this
assumed that we'd add at least one "identity" column to toast tables,
which would allow the t_natts of a toast tuple to effectively serve
as a version number.  So that fixes the problem of how to know what
you are looking at.  What it doesn't solve is the problem of how to
know what range of index values to search for in a partial-fetch
operation.  If you just scan what would be the expected range of
converted chunk positions, you might miss all the old-format entries.

Anyone have a clue on that?
        regards, tom lane


Re: Re: toast by chunk-end (was Re: PG_PAGE_LAYOUT_VERSION 5 - time for change)

From
Zdenek Kotala
Date:
Alvaro Herrera wrote:
> Zdenek Kotala wrote:
> 
>> If I'm thinking more, it is not probably CATALOG_VERSION_NO as well. 
>> Because toast table is created on demand. It is not in BKI.
> 
> It's not catversion in the sense that there's no catalog change, but it
> certainly requires a catversion bump due to internal changes.
> Otherwise, developers who have working data directories today will see
> weird errors when they update to a CVS version after this commit.
> 

I understand that, but from the upgrade point of view it is confusing. When you 
upgrade the catalog, your catalog will not correspond to the toast table structure, 
and there is no clue whether a toast table has already been converted or which toast 
table structure is used.

    Zdenek


Re: toast by chunk-end (was Re: PG_PAGE_LAYOUT_VERSION 5 - time for change)

From
Zdenek Kotala
Date:
Heikki Linnakangas wrote:
> Zdenek Kotala wrote:
>> Just a very quick look on your patch. See my comments:
>>
>> ...
>>
>> 2) PG_PAGE_LAYOUT_VERSION should be bump
> 
> The patch doesn't change the page layout AFAICS.
> 

It is a good question what is and what is not page layout. I think that the toast 
implementation is a part of the page layout. OK, it is called page layout, but 
a better name would be On Disk Format (ODF). You will not be able to read an 8.3 
toasted table in 8.4.

    Zdenek