Обсуждение: logical column position
I'd like to add a new column to pg_attribute that specifies the
attribute's "logical position" within its relation. The idea here is
to separate the logical order of the columns in a relation from the
on-disk storage of the relation's tuples. This allows us to easily &
quickly change column order, add an additional column before or after
an existing column, etc.
At present, attnum basically does three things: identifies an column
within a relation, indicates which columns are system columns, and
defines the order of a relation's columns. I'd like to move this last
functionality into a separate pg_attribute column named "attpos" (or
"attlogicalpos"):
        - when the table is created, attnum == attpos. System columns          have attpos < 0, as with attnum. At no
pointwill two          columns of the same relation have the same attpos.
 
        - when returning output to the client and no column ordering          is implied by the query (e.g. "SELECT *
..."),we sort the          columns in ascending attpos order.
 
        - when storing a tuple on disk, we don't consider attpos
        - if we want to change the order of the column's in a          relation, we can do so merely by updating
pg_attribute;no          changes to the on-disk storage of the relation should be          necessary
 
A few notes:
 (a) ISTM this should also apply to COPY TO and COPY FROM if the user     didn't supply a column list. Is this
reasonable?It would break     dumps of the table's contents, but then again, dumps aren't     guaranteed to remain
validover arbitrary changes to the table's     meta-data.
 
 (b) Using the above scheme that attnum == attpos initially, there     won't be any gaps in the sequence of attpos
values.That means     that if, for example, we want to move the column in position 50     to position 1, we'll need to
changethe position's of all the     columns in positions [1..49] (and suffer the resulting MVCC     bloat in
pg_attribute).Changing the column order is hardly a     performance critical operation, so that might be acceptable.
 
     If we want to avoid this, one easy (but arguably unclean) way to     do so would be to make the initial value of
attpos== attnum *     1000, and make attpos an int4 rather than an int2. Then, we can     do most column reordering
operationswith only a single     pg_attribute update -- in the worst-case that enough     re-orderings are done that we
overflowthe 999 "padding"     positions, we can just fall-back to doing multiple pg_attribute     updates. Is this
worthdoing, and/or is there a better way to     achieve the same effect?
 
 (c) Do I need to consider inheritance?
Comments are welcome.
-Neil
			
		Neil Conway writes: > (b) Using the above scheme that attnum == attpos initially, there > won't be any gaps in the sequence of attpos values. That means > that if, for example, we want to move the column in position 50 > to position 1, we'll need to change the position's of all the > columns in positions [1..49] (and suffer the resulting MVCC > bloat in pg_attribute). Changing the column order is hardly a > performance critical operation, so that might be acceptable. I don't think you can speak of "bloat" for pg_attribute. But you can speak of a problem when you want to do the old col = col + 1 in the presence of a unique index. > (c) Do I need to consider inheritance? Inheritance is based on column names, so initially no, but if there is a command to alter the column order, then it should have an ONLY option. -- Peter Eisentraut peter_e@gmx.net
On Mon, 17 Nov 2003, Neil Conway wrote: > I'd like to add a new column to pg_attribute that specifies the > attribute's "logical position" within its relation. The idea here is > to separate the logical order of the columns in a relation from the > on-disk storage of the relation's tuples. This allows us to easily & > quickly change column order, add an additional column before or after > an existing column, etc. That sounds excellent! > (a) ISTM this should also apply to COPY TO and COPY FROM if the user > didn't supply a column list. Is this reasonable? It would break > dumps of the table's contents, but then again, dumps aren't > guaranteed to remain valid over arbitrary changes to the table's > meta-data. You're just saying it'd break old dumps, right? I'd assume COPY FROM would use attpos ordering when writing out columns, or that every user-visible interaction with the table pretends the columns are in attpos order. So dumps would break no more or less than when adding or dropping a column currently, right? Jon
Peter Eisentraut <peter_e@gmx.net> writes: > I don't think you can speak of "bloat" for pg_attribute. But you > can speak of a problem when you want to do the old col = col + 1 in > the presence of a unique index. I'm sorry, but I'm not sure what either of these comments mean -- can you elaborate? -Neil
Jon Jensen <jon@endpoint.com> writes: > You're just saying it'd break old dumps, right? I'd assume COPY FROM > would use attpos ordering when writing out columns, or that every > user-visible interaction with the table pretends the columns are in > attpos order. So dumps would break no more or less than when adding > or dropping a column currently, right? Right -- AFAICS, the only change in COPY compatibility would be if you COPY TO'd a table and then changed the logical column order in some fashion, you would no longer be able to restore the dump (unless you specified a column list for the COPY FROM -- which, btw, pg_dump does). I don't think it will be a problem, I just thought I'd mention it. -Neil
> Right -- AFAICS, the only change in COPY compatibility would be if you > COPY TO'd a table and then changed the logical column order in some > fashion, you would no longer be able to restore the dump (unless you > specified a column list for the COPY FROM -- which, btw, pg_dump > does). I don't think it will be a problem, I just thought I'd mention > it. Well it's the same problem as if you'd dropped a column in the middle of the table. BTW, one main consideration is that all the postgres admin apps will now need to support ORDER BY attlognum for 7.5+. Chris
Neil Conway wrote: > Peter Eisentraut <peter_e@gmx.net> writes: > > I don't think you can speak of "bloat" for pg_attribute. But you > > can speak of a problem when you want to do the old col = col + 1 in > > the presence of a unique index. > > I'm sorry, but I'm not sure what either of these comments mean -- can > you elaborate? Peter is pointing out a problem with our unique indexes that might cause you a problem. Suppose you have a unique index in attlognum:test=> create table xx( lognum int);CREATE TABLEtest=>insert into xx values (1);INSERT 17145 1test=> insert into xx values (2);INSERT 17146 1test=> update xx set lognum= lognum + 1;UPDATE 2test=> create unique index yy on xx (lognum);CREATE INDEXtest=> update xx set lognum = lognum+ 1;ERROR: duplicate key violates unique constraint "yy" There is discussion to delay unique constraint failures until commit, then recheck them to see if they are still valid, sort of like what we do with deferred triggers. This would fix the problem because on commit, those values are unique, but aren't while the rows are updated invidually. If we don't get that working you might want to use the 1000 gap idea because it doesn't cause this problem, and we don't support >1600 columns, so a 1000 gap shouldn't cause a problem and can be modified later. If they hit 999 updates, just tell them to dump/reload the table. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
On Mon, 2003-11-17 at 20:24, Christopher Kings-Lynne wrote: > > Right -- AFAICS, the only change in COPY compatibility would be if you > > COPY TO'd a table and then changed the logical column order in some > > fashion, you would no longer be able to restore the dump (unless you > > specified a column list for the COPY FROM -- which, btw, pg_dump > > does). I don't think it will be a problem, I just thought I'd mention > > it. > > Well it's the same problem as if you'd dropped a column in the middle of > the table. > > BTW, one main consideration is that all the postgres admin apps will now > need to support ORDER BY attlognum for 7.5+. Yeah... how about maintaining attnum for the logical attribute number and create an attphysnum or something for the physical position instead? This is more intrusive into the source, but you don't need to teach new tricks to external entities.
Christopher Kings-Lynne writes: > BTW, one main consideration is that all the postgres admin apps will now > need to support ORDER BY attlognum for 7.5+. But that is only really important if they've also used the ALTER TABLE RESHUFFLE COLUMNS feature. So if they make one alteration for 7.5, they need to do another. That seems fair. -- Peter Eisentraut peter_e@gmx.net
>>BTW, one main consideration is that all the postgres admin apps will now >>need to support ORDER BY attlognum for 7.5+. > > > But that is only really important if they've also used the ALTER TABLE > RESHUFFLE COLUMNS feature. So if they make one alteration for 7.5, they > need to do another. That seems fair. Good point. Chris
On Tue, 18 Nov 2003, Peter Eisentraut wrote: > Christopher Kings-Lynne writes: > > > BTW, one main consideration is that all the postgres admin apps will now > > need to support ORDER BY attlognum for 7.5+. > > But that is only really important if they've also used the ALTER TABLE > RESHUFFLE COLUMNS feature. So if they make one alteration for 7.5, they > need to do another. That seems fair. The ability to reshuffle and to get the correct ordering in a client app are separate unless we're going to assume that all access goes through that particular client. If one user uses psql and shuffles them, a second user using fooclient may not see the new ordering.
Will adding the logical attribute number break all of the external tools? pg_dump, etc are all dependent on attnum now? Would it be possible to keep the meaning of attnum the same externally and add another column internally to represent the physical number? Dave
Dave Cramer wrote: > Will adding the logical attribute number break all of the external > tools? pg_dump, etc are all dependent on attnum now? > > Would it be possible to keep the meaning of attnum the same externally > and add another column internally to represent the physical number? Interesting idea. It would require a lot of code renaming in the backend, but it could be done. -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
>>Will adding the logical attribute number break all of the external >>tools? pg_dump, etc are all dependent on attnum now? >> >>Would it be possible to keep the meaning of attnum the same externally >>and add another column internally to represent the physical number? > > > Interesting idea. It would require a lot of code renaming in the > backend, but it could be done. Given that the interfaces comprise pgadmin, phppgadmin, jdbc and odbc and all the main developers for those lists read all these posts, I think the massive amount of effort to maintain the external interface isn't worth it. I can vouch that it would take me exactly 2 minutes to add support for attlognums in phpPgAdmin. Chris
Christopher Kings-Lynne wrote: > >>> Will adding the logical attribute number break all of the external >>> tools? pg_dump, etc are all dependent on attnum now? >>> >>> Would it be possible to keep the meaning of attnum the same externally >>> and add another column internally to represent the physical number? >> >> >> >> Interesting idea. It would require a lot of code renaming in the >> backend, but it could be done. > > > Given that the interfaces comprise pgadmin, phppgadmin, jdbc and odbc > and all the main developers for those lists read all these posts, I > think the massive amount of effort to maintain the external interface > isn't worth it. > > I can vouch that it would take me exactly 2 minutes to add support for > attlognums in phpPgAdmin. Lesson 1 in "SQL for beginners" says "Don't use SELECT * if you rely on the order of columns". This discussion is about fixing a problem that only inexperienced programmers have. It's like an extra set of wheels on top of your car, just in case you drive wrong way... What's happening if you simply delete a column? Ordering 1,2,3,5? Insert another column of the same name, as a previously deleted, will it get the old position number? And so on. IMHO, way too much effort for working around situations that should be avoided anyway. Regards, Andreas
Andreas, The point of this is to maintain the column position. I don't think that an alter of a column type should move the column position. It may be that programmers should not rely on this, but it happens, and in very large projects. If we can avoid unexpected side-affects like moving the columns position, then I think we should. Dave On Wed, 2003-11-19 at 12:30, Andreas Pflug wrote: > Christopher Kings-Lynne wrote: > > > > >>> Will adding the logical attribute number break all of the external > >>> tools? pg_dump, etc are all dependent on attnum now? > >>> > >>> Would it be possible to keep the meaning of attnum the same externally > >>> and add another column internally to represent the physical number? > >> > >> > >> > >> Interesting idea. It would require a lot of code renaming in the > >> backend, but it could be done. > > > > > > Given that the interfaces comprise pgadmin, phppgadmin, jdbc and odbc > > and all the main developers for those lists read all these posts, I > > think the massive amount of effort to maintain the external interface > > isn't worth it. > > > > I can vouch that it would take me exactly 2 minutes to add support for > > attlognums in phpPgAdmin. > > Lesson 1 in "SQL for beginners" says "Don't use SELECT * if you rely on > the order of columns". This discussion is about fixing a problem that > only inexperienced programmers have. It's like an extra set of wheels on > top of your car, just in case you drive wrong way... > > What's happening if you simply delete a column? Ordering 1,2,3,5? Insert > another column of the same name, as a previously deleted, will it get > the old position number? And so on. IMHO, way too much effort for > working around situations that should be avoided anyway. > > Regards, > Andreas > > >
Dave Cramer wrote: >Andreas, > >The point of this is to maintain the column position. I don't think that >an alter of a column type should move the column position. > Why should ALTER COLUMN change the column number, i.e. position? >It may be that programmers should not rely on this, but it happens, and in very >large projects. If we can avoid unexpected side-affects like moving the >columns position, then I think we should. > > This is *expected* if behaviour if you delete and add columns; is there any DB system out there that allows to reshuffle the column ordering? Instead of some order-ordering facility it would be better to support all kinds of column type changes, not only binary compatible ones. This would help everybody, not only maintainers of ill-designed software. Regards, Andreas
Andreas, On Wed, 2003-11-19 at 13:07, Andreas Pflug wrote: > Dave Cramer wrote: > > >Andreas, > > > >The point of this is to maintain the column position. I don't think that > >an alter of a column type should move the column position. > > > Why should ALTER COLUMN change the column number, i.e. position? Rod's current proposed patch does that if you do an alter column alter type. This is an artifact of the underlying mechanism. (ren old col, add new col, update newcol=oldcol::newtype). Which is the point of the logical column number discussion, and the todo item. > > >It may be that programmers should not rely on this, but it happens, and in very > >large projects. If we can avoid unexpected side-affects like moving the > >columns position, then I think we should. > > > > > This is *expected* if behaviour if you delete and add columns; is there > any DB system out there that allows to reshuffle the column ordering? Yes, informix allows you to add the column before|after a column, and mysql allows for add column after col. those are the only two I know about.there could be more. > > Instead of some order-ordering facility it would be better to support > all kinds of column type changes, not only binary compatible ones. This > would help everybody, not only maintainers of ill-designed software. > > Regards, > Andreas > > > > >
Dave Cramer wrote: >Andreas, > > >On Wed, 2003-11-19 at 13:07, Andreas Pflug wrote: > > >>Dave Cramer wrote: >> >> >> >>>Andreas, >>> >>>The point of this is to maintain the column position. I don't think that >>>an alter of a column type should move the column position. >>> >>> >>> >>Why should ALTER COLUMN change the column number, i.e. position? >> >> > >Rod's current proposed patch does that if you do an alter column alter >type. This is an artifact of the underlying mechanism. (ren old col, add >new col, update newcol=oldcol::newtype). > I must have missed that, can't find it in hackers?!? In this case the old attnum value should simply be reused, to retain the original ordering. IMHO this is necessary to prevent problems with any object referencing a column (index, view, ...) The current proposal invents the attpos for column ordering purpose only, but views/indexes/etc will reference attnum, and would need updates. Actually, a column that changes its attnum is just like a table changing its oid, i.e. it's not the same object any more. This will provoke problems in administration tools (at least in pgAdmin3, which will try to refresh its display with the formerly known oid/attnum af ter executing a change), and maybe other places too. To put it differently: a ALTER COLUMN command may never-ever change the identifier of the column, i.e. attrelid/attnum. Regards, Andreas
Andreas Pflug kirjutas K, 19.11.2003 kell 20:45: > Dave Cramer wrote: > >>Why should ALTER COLUMN change the column number, i.e. position? > > > >Rod's current proposed patch does that if you do an alter column alter > >type. This is an artifact of the underlying mechanism. (ren old col, add > >new col, update newcol=oldcol::newtype). > > > I must have missed that, can't find it in hackers?!? Was on [PATCHES] IIRC. > In this case the old attnum value should simply be reused, to retain the > original ordering. IMHO this is necessary to prevent problems with any > object referencing a column (index, view, ...) Actually these have to be recreaqted, especially when changing column type. Rod's patchs does that too ;) > The current proposal > invents the attpos for column ordering purpose only, That's the only place _user_ sees it. The other uses are taken care of inide database backend. > but > views/indexes/etc will reference attnum, and would need updates. they also "reference" column type, and thus need to be updated anyway when column type changes. > Actually, a column that changes its attnum is just like a table changing > its oid, i.e. it's not the same object any more. This will provoke > problems in administration tools (at least in pgAdmin3, which will try > to refresh its display with the formerly known oid/attnum af ter > executing a change), and maybe other places too. Sure. _any_ change to database structure could break a client not (designed to be) aware of that change. > To put it differently: a ALTER COLUMN command may never-ever change the > identifier of the column, i.e. attrelid/attnum. to be even more restirictive: ALTER COLUMN may never-ever change the type of the column, as this too may break some apps. Nah! ------------- Hannu
Hannu Krosing wrote: >>To put it differently: a ALTER COLUMN command may never-ever change the >>identifier of the column, i.e. attrelid/attnum. >> >> > >to be even more restirictive: ALTER COLUMN may never-ever change the >type of the column, as this too may break some apps. Nah! > > > Yeah, and the data should be read only :-) Seriously: Methinks that only a part of the -patches thread was turned over to -hackers, some important parts are missing. First, there are column type changes that don't need any index/view/constraint recheck or data transformation at all, being of the very popular class "hell, I need to stuff 12 bytes in my varchar(10)". Some months ago, this was discussed, and there was consense that binarily compatible types may be changed with few special precautions (e.g. varchar(12) -> varchar(10) e.g. needs a check for len<=10). As a consequence, this kind of column type change is implemented in pgAdmin3". Probably a large percentage of real life column type changes are such binarily compatible ones, so it's senseful to handle them separately. Second, column type changes needing a nontrivial cast function should be implemented in a way that preserve attnum. This could be done like this: - decompile dependent objects, and memorize them for later recreation - ADD tmpCol, UPDATE tmpCol=col::newtype, DROP old column, cascading to dependent objects, RENAME tmpCol (known stuff) - restore old attnum, which is a simple UPDATE to pg_attribute at this stage - recreate all dependent objects Voila! No need for an additional attpos. Regards, Andreas
> Why should ALTER COLUMN change the column number, i.e. position? Because it creates a NEW column. >> It may be that programmers should not rely on this, but it happens, >> and in very >> large projects. If we can avoid unexpected side-affects like moving the >> columns position, then I think we should. >> >> > This is *expected* if behaviour if you delete and add columns; is there > any DB system out there that allows to reshuffle the column ordering? MySQL Chris
Andreas Pflug kirjutas N, 20.11.2003 kell 01:38: > Second, column type changes needing a nontrivial cast function should be > implemented in a way that preserve attnum. This could be done like this: > - decompile dependent objects, and memorize them for later recreation > - ADD tmpCol, UPDATE tmpCol=col::newtype, DROP old column, cascading to > dependent objects, RENAME tmpCol (known stuff) > - restore old attnum, which is a simple UPDATE to pg_attribute at this stage and suddenly your table is broken, as you can't retrieve the tmpCol when the attnum points to the dropped old column which has data in the format for old type ... the whole point of separating attnum and attpos is that attnum is used internally to retrieve the data and you can't change it by just UPDATEing pg_attribute. --------------- Hannu
Hannu Krosing wrote: > >attnum is used internally to retrieve the data > > > Oops... So if an additional column number is invented, it should not be a logical column number, but a physical storage number for internal data retrieval. This way, the "user interface" doesn't change, and all those "SELECT ... FROM pg_attribute ORDER BY attnum" continue delivering the expected result. Regards, Andreas
Andreas Pflug kirjutas N, 20.11.2003 kell 11:40: > Hannu Krosing wrote: > > > > >attnum is used internally to retrieve the data > > > > > > > Oops... > > So if an additional column number is invented, it should not be a > logical column number, but a physical storage number for internal data > retrieval. You are just shifting the interface problems to a place needing way more changes in the backend. There will be some problems either way. also, tools needing knowledge should start using information schema as much as they can, making internal reshufflings less of a problem. > This way, the "user interface" doesn't change, and all those > "SELECT ... FROM pg_attribute ORDER BY attnum" continue delivering the > expected result. Depending on what you expect ;) If you expect the above to give you all active columns as orderd as they are stored, then it does not give you what you expect. Btw, most of these concerns (and more) were already iterated when DROP column was done causing gaps in attnum. There were a lot of doomsday profecies, but in the end it went quite smoothly. The tools needing internal knowledge about storage (meaning any tool doing select .. from pg_...) have always needed some upgrades for new verions. IMHO, The only behaviour visible to common user we should worry about is SELECT * , and a special column for solving this is _the_ easiest way to do it. ------------- Hannu
Which is what started the whole discussion. Dave On Thu, 2003-11-20 at 04:40, Andreas Pflug wrote: > Hannu Krosing wrote: > > > > >attnum is used internally to retrieve the data > > > > > > > Oops... > > So if an additional column number is invented, it should not be a > logical column number, but a physical storage number for internal data > retrieval. This way, the "user interface" doesn't change, and all those > "SELECT ... FROM pg_attribute ORDER BY attnum" continue delivering the > expected result. > > Regards, > Andreas > >
On Wed, 19 Nov 2003 19:07:23 +0100, Andreas Pflug <pgadmin@pse-consulting.de> wrote: >is there >any DB system out there that allows to reshuffle the column ordering? Firebird:ALTER TABLE tname ALTER COLUMN cname POSITION 7; ServusManfred
Hannu Krosing <hannu@tm.ee> writes:
> You are just shifting the interface problems to a place needing way more
> changes in the backend. There will be some problems either way.
Exactly.  I'm considerably more worried about breaking out-of-the-way
places in the backend than I am about what order someone's admin tool
presents the columns in.
> Btw, most of these concerns (and more) were already iterated when DROP
> column was done causing gaps in attnum. There were a lot of doomsday
> profecies, but in the end it went quite smoothly.
That is a good comparison point.  I'm inclined to think that we should
do it in a way that minimizes backend changes.  The way to do that is
to keep attnum with its current definition (physical position) and add
a new column for the logical position, which only a small number of
places will need to care about.
        regards, tom lane
			
		Tom Lane writes: > Exactly. I'm considerably more worried about breaking out-of-the-way > places in the backend than I am about what order someone's admin tool > presents the columns in. Clearly, the effort of adding logical column numbers will consist of making choices between physical and logical numbers in the backend in some places. So one option is to replace some uses of attnum by attlognum. The other optionis to replace *all* uses of attnum by attphysnum and then replace some uses of attphysnum by attnum. To me, this looks like an equal "risk" as far as the backend goes. -- Peter Eisentraut peter_e@gmx.net
Peter Eisentraut <peter_e@gmx.net> writes:
> Tom Lane writes:
>> Exactly.  I'm considerably more worried about breaking out-of-the-way
>> places in the backend than I am about what order someone's admin tool
>> presents the columns in.
> Clearly, the effort of adding logical column numbers will consist of
> making choices between physical and logical numbers in the backend in some
> places.  So one option is to replace some uses of attnum by attlognum.
> The other optionis to replace *all* uses of attnum by attphysnum and then
> replace some uses of attphysnum by attnum.  To me, this looks like an
> equal "risk" as far as the backend goes.
This would be a reasonable assessment if we had our hands on every line
of backend code that exists.  But you are neglecting the probability of
breaking user-written C functions, PL languages outside the main distro,
etc.
If we were going to go about this in a way that does not localize the
changes, I'd be inclined to use "attlognum" and "attphysnum" ... that
is, *deliberately* break every use that hasn't been looked at and
updated.  Even that would not guarantee catching all the trouble spots;
for example loop indexes and attnums passed as function parameters might
not have names that would be caught by a simplistic search-and-replace
update.
I'm for localizing the changes.
        regards, tom lane
			
		Rod Taylor <pg@rbt.ca> writes:
> On Mon, 2003-11-17 at 20:24, Christopher Kings-Lynne wrote:
>> BTW, one main consideration is that all the postgres admin apps will now 
>> need to support ORDER BY attlognum for 7.5+.
> Yeah... how about maintaining attnum for the logical attribute number
> and create an attphysnum or something for the physical position instead?
> This is more intrusive into the source, but you don't need to teach new
> tricks to external entities.
> [ and similar remarks from other people elsewhere in the thread ]
It's completely fallacious to imagine that we could make this change be
transparent to external applications.  To take two examples:
1. How many places do you think know that pg_attribute.attnum links to
pg_attrdef.adnum?  pg_dump, psql, and the JDBC driver all appear to
know that, in a quick search of the CVS tree; I haven't even bothered to
look at pgadmin and the other apps that are likely to have such
dependencies.
2. How about linking pg_attribute.attnum to entries in pg_index.indkey?
Lots of apps know about that too.
Unless you are going to change the meanings of pg_index.indkey and
pg_attrdef.adnum, you can't simply redefine attnum as a logical column
position.  And if you do make such a change you will break code
elsewhere.
If we add a *new* column attlogpos, without changing the semantics
of attnum, then I think we have a fighting chance of making this work
without an undue amount of effort.  I see no prospect that we can
change the meaning of attnum without breaking things far and wide.
        regards, tom lane
			
		Tom Lane wrote: >It's completely fallacious to imagine that we could make this change be >transparent to external applications. To take two examples: > >1. How many places do you think know that pg_attribute.attnum links to >pg_attrdef.adnum? pg_dump, psql, and the JDBC driver all appear to >know that, in a quick search of the CVS tree; I haven't even bothered to >look at pgadmin and the other apps that are likely to have such >dependencies. > >2. How about linking pg_attribute.attnum to entries in pg_index.indkey? >Lots of apps know about that too. > >Unless you are going to change the meanings of pg_index.indkey and >pg_attrdef.adnum, you can't simply redefine attnum as a logical column >position. And if you do make such a change you will break code >elsewhere. > >If we add a *new* column attlogpos, without changing the semantics >of attnum, then I think we have a fighting chance of making this work >without an undue amount of effort. I see no prospect that we can >change the meaning of attnum without breaking things far and wide. > > > I don't quite understand your argumentation. Currently, attnum is used 1) to determine position (the concern) 2) as part of the unique identifier, as used by index, FK etc 3) as pointer for data retrieval. If only the retrieval functions would use attstoragenum or however you'd call it, all other references to attnum can remain untouched. Actual physical reordering would be hidden almost completely. This is a bit like abstracting a primary key from the row's physical storage location. Regards, Andreas > regards, tom lane > >---------------------------(end of broadcast)--------------------------- >TIP 8: explain analyze is your friend > > >
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> I don't quite understand your argumentation.
My point is that to change attnum into a logical position without
breaking client apps (which is the ostensible reason for doing it
that way), we would need to redefine all system catalog entries that
reference columns by attnum so that they also store logical rather than
physical position.  That has a number of serious problems, one big one
being the difficulty of updating them all correctly during a column
renumbering operation.  More, it turns what would otherwise be a
relatively localized patch into a massive and bug-prone backend
modification.
I think it is better to consider attnum as sort of a mini-OID: any one
column has a uniquely assigned attnum that will never change and can
be relied on to identify that column.  This is essentially how it is
being used now (remember attnum is part of the PK for pg_attribute)
and the fact that it is also the physical position is really rather
incidental as far as the system catalogs are concerned.
You're quite right that attnum is serving three purposes, but that
doesn't mean that we can choose at random which purpose(s) to decouple.
Abandoning the assumption that attnum is a permanent identifier would
break a lot of things --- probably not only in the backend, either.
        regards, tom lane
			
		Tom Lane wrote: >Andreas Pflug <pgadmin@pse-consulting.de> writes: > > >>I don't quite understand your argumentation. >> >> > >My point is that to change attnum into a logical position without >breaking client apps (which is the ostensible reason for doing it >that way), we would need to redefine all system catalog entries that >reference columns by attnum so that they also store logical rather than >physical position. That has a number of serious problems, one big one >being the difficulty of updating them all correctly during a column >renumbering operation. More, it turns what would otherwise be a >relatively localized patch into a massive and bug-prone backend >modification. > >I think it is better to consider attnum as sort of a mini-OID: any one >column has a uniquely assigned attnum that will never change and can >be relied on to identify that column. This is essentially how it is >being used now (remember attnum is part of the PK for pg_attribute) >and the fact that it is also the physical position is really rather >incidental as far as the system catalogs are concerned. > > I agree considering attrelid/attnum as kind-of OID, but a relation's pg_class.oid won't change at ALTER TABLE either, I'd expect the same from ALTER COLUMN. >You're quite right that attnum is serving three purposes, but that >doesn't mean that we can choose at random which purpose(s) to decouple. >Abandoning the assumption that attnum is a permanent identifier would >break a lot of things --- probably not only in the backend, either. > > Maybe my proposal wasn't clear enough: Just as an index references a pg_class entry by it's OID, not some value identifying it's physical storage, all objects might continue referencing columns by attnum. Only tuple handling functions like heap_getattr and heap_formtuple need to know how to extract a Datum by its attnum from a HeapTuple or how to compile a HeapTuple correctly. If reshuffling columns is done inside of these functions, it would be transparent to the rest of the backend and the clients. Hopefully, there are not too much of such functions, or fancy modules bypassing them... Regards, Andreas
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> Maybe my proposal wasn't clear enough:
> Just as an index references a pg_class entry by it's OID, not some value 
> identifying it's physical storage, all objects might continue 
> referencing columns by attnum.
That's exactly the same thing I am saying.  Your mistake is to assume
that this function can be combined with identification of a (changeable)
logical column position.  It can't.  Changeability and immutability are
just not compatible requirements.
        regards, tom lane
			
		Tom Lane wrote: >Andreas Pflug <pgadmin@pse-consulting.de> writes: > > >>Maybe my proposal wasn't clear enough: >>Just as an index references a pg_class entry by it's OID, not some value >>identifying it's physical storage, all objects might continue >>referencing columns by attnum. >> >> > >That's exactly the same thing I am saying. Your mistake is to assume >that this function can be combined with identification of a (changeable) >logical column position. It can't. Changeability and immutability are >just not compatible requirements. > > In the mind of a programmer, a ALTER COLUMN doesn't create a new column, but merely changes some attributes of an existing column. In this sense, changeability and immutability are not controversal. Digging deeper: TupDesc contains an array of physical attr descriptions, and to access a column description attnum is taken as index into that array (taken from fastgetattr). return fetchatt(tupleDesc->attrs[attnum-1], ...) The physical location can easily reordered if there's an additional array, to translate attnum into the array index. return fetchatt(tupleDesc->attrs[tupleDesc->attrpos[attnum-1]] ... For sure, reordering (i.e. changing the attrpos array) may only be performed as long as the column isn't referenced. Regards, Andreas
Andreas Pflug <pgadmin@pse-consulting.de> writes:
> To put it differently: a ALTER COLUMN command may never-ever change the 
> identifier of the column, i.e. attrelid/attnum.
If the ALTER is changing the column type, it's not really the same
column anymore; I see nothing wrong with assigning a new attnum in that
scenario.  It's not like you can simply change the type and not go visit
the references in such a case.
        regards, tom lane
			
		Tom Lane wrote: >If the ALTER is changing the column type, it's not really the same >column anymore; > This doesn't strike. "If the ALTER is changing the number of columns, it's not really the same table anymore" is as true as your statement. Still, pg_class.oid remains the same for ADD and DROP column. > I see nothing wrong with assigning a new attnum in that >scenario. It's not like you can simply change the type and not go visit >the references in such a case. > > But this fix is about automatically updating references as well, making the ALTER COLUMN appear a low-impact change to the user (which obviously isn't true, unless my proposed shortcut for binary compatible type changes is implemented). When dropping and recreating an object, nobody would expect to get the same identifier. When altering, I *do* expect the identifier to remain the same. Regards, Andreas
Hannu Krosing wrote: > > >You are just shifting the interface problems to a place needing way more >changes in the backend. There will be some problems either way. > > Not quite. Certainly, basing internal storage on attstoragenum is more work in the backend, but less (precisely: zero) work on an unknown number of frontend tools and apps. >also, tools needing knowledge should start using information schema as >much as they can, making internal reshufflings less of a problem. > > We had this discussion. information_schema doesn't deliver enough info needed for admin tools. > > >>This way, the "user interface" doesn't change, and all those >>"SELECT ... FROM pg_attribute ORDER BY attnum" continue delivering the >>expected result. >> >> > >Depending on what you expect ;) > > Usually, nobody should care about the column ordering, but for those unfortunate guys that rely on a specific SELECT * ordering the list of columns displayed in admin tools must show that ordering; this is what current admin tools expect from attnum. No SQL user would ever care about internal storage details/pointers/counters, so any admin tool would need to ORDER BY CASE WHEN version>=7.5 THEN attpos ELSE attnum END (and the unique key to pg_attribute, as seen from the tool, changes from refoid/attnum to refoid/attindex too). >If you expect the above to give you all active columns as orderd as they >are stored, then it does not give you what you expect. > >Btw, most of these concerns (and more) were already iterated when DROP >column was done causing gaps in attnum. There were a lot of doomsday >profecies, but in the end it went quite smoothly. > I don't bother about missing attnum values, even 1,2,3,5,6,8 is nicely ordered. > The tools needing >internal knowledge about storage (meaning any tool doing select .. from >pg_...) have always needed some upgrades for new verions. > > Yes, but changes to pg_... should retain the usual meanings as much as possible, so older tools continue to work. The discussed change is problematic because old tools *seem* to work ok, but their attnum interpretation would be wrong. >IMHO, The only behaviour visible to common user we should worry about is >SELECT * , and a special column for solving this is _the_ easiest way to >do it. > > > Surely this is the easiest way. But it has the biggest impact on clients too. I'm just imagining what would happen to pgAdmin3. The column number would have to display attpos (this is what the user is interested in to see the ordering), while index, FK and so forth will continue to display attnum. This seems quite unwanted to me. --- Are there any comments on the proposed lean way to alter columns for trivial type changes? Regards, Andreas
>>We had this discussion. information_schema doesn't deliver enough info >>needed for admin tools. >> >> > >It should. This is the sole reason for existance of it. If it is not >enough, then it should be updated. > > It can't. ANSI says only objects owned by the user are shown. Admins might be quite unhappy about that... pg_catalog views don't help either, just look at pg_tables. It doesn't even have the oid, how should a table be identified uniquely? The system views are unusable for hardcore admin purposes, until they include *. So I'd rather use the tables directly. >> >>Usually, nobody should care about the column ordering, but for those >>unfortunate guys that rely on a specific SELECT * ordering the list of >>columns displayed in admin tools must show that ordering; this is what >>current admin tools expect from attnum. No SQL user would ever care >>about internal storage details/pointers/counters, so any admin tool >>would need to ORDER BY CASE WHEN version>=7.5 THEN attpos ELSE attnum >>END >> >> > >This won't work anyway if table is missing column attpos . > Sorry to be not precise enough, this was meant as meta code. Of course the query must be built version dependent. >Still there were several predictions of all admin tools breaking as a >result of gaps. > > wasn' me. > >Can you name one PG version change from A.N to A.M where old admin tools >have not needed any changes ? > > Older tools usually continue to work, they just don't know new features. Maybe some esotheric features break, so few people notice. >attnum interpretation of pgAdmin3 is already wrong - it claims it to be >Position even when some previous columns are dropped. So you can have a >table which has 1 column with "Position" 3 ;) > > attnum isn't interpreted at all in pgAdmin3, only used for ordering. It can't be used as key to a column any more, if altering a column would create a new attnum with old name (and old attpos). In this sense, the key is attrelid/attpos, because only these don't change for an ALTER COLUMN statement. Imagine a sql update to a non-pk column would change the pk of the row, so you'd have to reread the row by its content to obtain the new pk value. This wouldn't make you happy, right? Same here. > > >>>IMHO, The only behaviour visible to common user we should worry about is >>>SELECT * , and a special column for solving this is _the_ easiest way to >>>do it. >>> >>> >>> >>Surely this is the easiest way. But it has the biggest impact on clients >>too. >> >> > >Not all clients - just admin tools. And we dont have that many admin >tools. And IMNSHO admin tools should move to using information_schema as >much as possible. > >And if information_schema is inadequate then fix it instead of bypassing >it. > > It is unfixable. Being ANSI-conformant, it prevents the admin seeing everything. >I think keeping know-how about retrieving postgresql structure inside of >an application instead of making it readily available in >information_schema is either > a) an egoistic attemt of shutting out competition > b) laziness >or > c) both > ><grin> > > > <comment mode=suppress> ******** </comment> >It is also bad design, as it ties ones tool to backend structure too >tightly. > >Backend structure will keep changing and the last thing we want to hold >it back is some frontend tool which thinks it knows better how to >organize data in backend. What if some completely new storage is added >to postgreSQL (ancient Postgres versions had built-in support for >several storages). Should all frontend tools (including ?DBC drivers) >need updating or just information_schema ? > > Again, I'm not against using information_schema. I tried to use it (for non-admin tool purposes!) and failed badly. pgAdmin3 is a tool for *all* PostgreSQL features, not just for some common ANSI stuff. >>I'm just imagining what would happen to pgAdmin3. The column number >>would have to display attpos (this is what the user is interested in to >>see the ordering), >> >> > >No they are interested in position as you mentioned above, they didn't >want to see attnum (i.e 1,2,3,5,6,8 in your example) before either. > >I think it is a bug that pgAdmin3 shows attnum instead the real >position. > > > It shows what's in the db, to identify the column. I can't imagine why I should show a column number, programmers should address by column name and nothing else. >Sorry, I must have missed it ;( could you give a link to archived copy. > > About a week ago, when the message you answered on was posted. Took a week now to appear... >But I think that nobody objected, but nobody didn't volunteer to do the >work either ;) > >At least that was the impression i got from an answer to my similar >question on growing varchars and dropping isnull's without forcing >column copies and constraint checks. > > Yeah, interesting. For my observations, these trivial changes make 90-95 % of daily column change work, that's why I implemented it in pgAdmin3 (targeting the system tables directly...), so it's worth the effort handling them separately. I might add it some time to the backend (as an additional code path to the big version). Regards, Andreas
Andreas Pflug kirjutas N, 20.11.2003 kell 16:10: > Hannu Krosing wrote: > > > > > > >You are just shifting the interface problems to a place needing way more > >changes in the backend. There will be some problems either way. > > > > > Not quite. Certainly, basing internal storage on attstoragenum is more > work in the backend, but less (precisely: zero) work on an unknown > number of frontend tools and apps. With stress on "unknown number" ;) > >also, tools needing knowledge should start using information schema as > >much as they can, making internal reshufflings less of a problem. > > > > > We had this discussion. information_schema doesn't deliver enough info > needed for admin tools. It should. This is the sole reason for existance of it. If it is not enough, then it should be updated. Updating information_schema would also make developers of other admin tools happy. Remember - competition is good ;) Compatibility with old verions of admin tools wont happen anyway, so we should not let that lock backend into bad development decisions. Look what happened to DOS-WIN16-WIN32. > >>This way, the "user interface" doesn't change, and all those > >>"SELECT ... FROM pg_attribute ORDER BY attnum" continue delivering the > >>expected result. > >> > >> > > > >Depending on what you expect ;) > > > > > Usually, nobody should care about the column ordering, but for those > unfortunate guys that rely on a specific SELECT * ordering the list of > columns displayed in admin tools must show that ordering; this is what > current admin tools expect from attnum. No SQL user would ever care > about internal storage details/pointers/counters, so any admin tool > would need to ORDER BY CASE WHEN version>=7.5 THEN attpos ELSE attnum > END This won't work anyway if table is missing column attpos . You have to have different queries for different versions. Add it is preferential to keep these different queries in information_schema of corresponding databases not all in frontend tool. You still need different queries for old databases which did not support schemas. > (and the unique key to pg_attribute, as seen from the tool, changes > from refoid/attnum to refoid/attindex too). the key needs no change, just the ORDER BY clause. > >If you expect the above to give you all active columns as orderd as they > >are stored, then it does not give you what you expect. > > > >Btw, most of these concerns (and more) were already iterated when DROP > >column was done causing gaps in attnum. There were a lot of doomsday > >profecies, but in the end it went quite smoothly. > > > I don't bother about missing attnum values, even 1,2,3,5,6,8 is nicely > ordered. Still there were several predictions of all admin tools breaking as a result of gaps. > > The tools needing > >internal knowledge about storage (meaning any tool doing select .. from > >pg_...) have always needed some upgrades for new verions. > > > > > Yes, but changes to pg_... should retain the usual meanings as much as > possible, so older tools continue to work. Can you name one PG version change from A.N to A.M where old admin tools have not needed any changes ? > The discussed change is > problematic because old tools *seem* to work ok, but their attnum > interpretation would be wrong. attnum interpretation of pgAdmin3 is already wrong - it claims it to be Position even when some previous columns are dropped. So you can have a table which has 1 column with "Position" 3 ;) > >IMHO, The only behaviour visible to common user we should worry about is > >SELECT * , and a special column for solving this is _the_ easiest way to > >do it. > > > Surely this is the easiest way. But it has the biggest impact on clients > too. Not all clients - just admin tools. And we dont have that many admin tools. And IMNSHO admin tools should move to using information_schema as much as possible. And if information_schema is inadequate then fix it instead of bypassing it. I think keeping know-how about retrieving postgresql structure inside of an application instead of making it readily available in information_schema is eithera) an egoistic attemt of shutting out competitionb) laziness orc) both <grin> It is also bad design, as it ties ones tool to backend structure too tightly. Backend structure will keep changing and the last thing we want to hold it back is some frontend tool which thinks it knows better how to organize data in backend. What if some completely new storage is added to postgreSQL (ancient Postgres versions had built-in support for several storages). Should all frontend tools (including ?DBC drivers) need updating or just information_schema ? > I'm just imagining what would happen to pgAdmin3. The column number > would have to display attpos (this is what the user is interested in to > see the ordering), No they are interested in position as you mentioned above, they didn't want to see attnum (i.e 1,2,3,5,6,8 in your example) before either. I think it is a bug that pgAdmin3 shows attnum instead the real position. > while index, FK and so forth will continue to display > attnum. This seems quite unwanted to me. Actually the column name is shown (at least I could not find attnum anywhere in keys or indexes) > --- > Are there any comments on the proposed lean way to alter columns for > trivial type changes? Sorry, I must have missed it ;( could you give a link to archived copy. But I think that nobody objected, but nobody didn't volunteer to do the work either ;) At least that was the impression i got from an answer to my similar question on growing varchars and dropping isnull's without forcing column copies and constraint checks. ----------------- Hannu