Re: In-place upgrade: catalog side

Поиск
Список
Период
Сортировка
От Greg Smith
Тема Re: In-place upgrade: catalog side
Дата
Msg-id Pine.GSO.4.64.0812040401260.27355@westnet.com
обсуждение исходный текст
Ответ на Re: In-place upgrade: catalog side  (Zdenek Kotala <Zdenek.Kotala@Sun.COM>)
Ответы Re: In-place upgrade: catalog side  (Gregory Stark <stark@enterprisedb.com>)
Список pgsql-hackers
On Thu, 4 Dec 2008, Zdenek Kotala wrote:

> The TOAST problem is already addressed and script should handle it correctly. 
> But I don't like it much, because it is kind of magic.

I just read through the whole "toast by chunk-end" thread again and it 
does seem pretty complicated.  What is the magic part you're still not 
happy with?

> I'll send this version when I finish tests.

You really should feel free to forward these things over as soon as you've 
got something working, even if you're still running your own tests.  With 
all due respect to how much you've done here, the sooner we can get more 
people working with and on something closer to candidate code the better. 
I've have started a couple of days ago but couldn't find anything but the 
old script.  If some parts have comments like "this is an awful check for 
dropped columns that probably doesn't even work yet", that's OK.  We need 
to get other people helping out with this besides you.

> Problem is that dropped columns are only mark as a deleted and data are 
> still stored in tuples.  Catalog contains related information about 
> position and length, but when you perform dump and restore, this 
> information is lost and columns are shifted ...

Here's a good example; that seems a perfect problem for somebody else to 
work on.  I understand it now well enough to float ideas without even 
needing to see your code.  Stop worring about it, I'll grab responsibility 
for making sure it gets done by someone.

So, for everyone else who isn't Zdenek:  when columns are dropped, 
pg_attribute.attisdropped turns true and atttypid goes to 0.  pg_dump 
skips over them, and even if it didn't pg_restore doesn't know how to put 
them back.  I can think of a couple of hacks to work around this, and one 
of them might even work:

1) Create a dummy type that exists only to flag these during conversion. 
Re-add all the deleted columns by turning off attisdropped and flag them 
with that type.  Dump.  Restore.  Re-delete the columns.  My first pass 
through poking holes in this idea wonders how the dump will go crazy if it 
finds rows that were created after the column was dropped, that therefore 
have no value for it.

2) Query the database to find all these deleted columns and store the 
information we need about them, save that into some text files (similary 
to how relids are handled by the script right now).  After the schema 
restore, read that list in, iterating over the missing ones.  For each 
column that was gone, increment attnum for everything above that position 
to renumber a place for it.  Put a dummy column entry back in that's 
already marked as deleted.

3) Wander back into pre-upgrade land by putting together something that 
wanders through every table updating any row that contains data for a 
dropped column.  Since dropping columns isn't really common in giant data 
warehouses, something that had to wander over all the tuples related to a 
table that has lost a column should only need to consider a pretty small 
subset of the database.  You might even make it off-line without getting 
too many yelps from the giant DW crowd, seems like it would be easy to 
write something to estimate the amount of work needed in advance of doing 
it even (before you take the system down, run a check utility that says 
"The server currently has 65213 rows of data for tables with deleted 
columns").

Who wants to show off how much more they know about this than me by saying 
what's right or wrong with these various ideas?

If we care about the fact that columns never go away and are using (1) or 
(2), could also consider adding some additional meta-data to 8.4 such that 
something like vacuum can flag when a column no longer exists in any part 
of the data.  All deleted columns move from 8.3 to 8.4, but one day the 
8.5 upgrade could finally blow them away.  There's already plenty of 
per-table catalog data being proposed to push into 8.4 for making future 
upgrades easier, this seems like a possible candidate for something to 
make space for there.  As I just came to appreciate the problem I'm not 
sure about that.

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD


В списке pgsql-hackers по дате отправления:

Предыдущее
От: Simon Riggs
Дата:
Сообщение: Re: Sync Rep: First Thoughts on Code
Следующее
От: ohp@pyrenet.fr
Дата:
Сообщение: Re: cvs head initdb hangs on unixware