Terrible upgrade experience
От | John Goerzen |
---|---|
Тема | Terrible upgrade experience |
Дата | |
Msg-id | r6zd84kcilm.fsf@emerald.southwind.net обсуждение исходный текст |
Список | pgsql-admin |
Hello, Please CC me any reply since I'm not on this list. Should I send this to -hackers as well? Yesterday, it became necessary to upgrade our Postgres 6.3.2 database servers to 6.4.2. We had planned to do this anyway, after hours, but one particular query in a mission-critical app connecting to 6.3.2 was causing it to coredump every time, and there were some notes in the HISTORY file about fixing memory overruns, so it was decided to upgrade the database server during the day, previous upgrades having taken less than an hour. Our machines here are BSD/OS 3.1. So anyway, first thing I do is compile 6.4.2. A hitch there already: it is looking for netdb.h in one of the files, but can't find it. A bit of detective work later, I find that file in /sco/usr/include and copy it into postgresql-6.4/src/include and things work. Then, I kill the server and all backends, set pg_hba.conf so only I can connect, and pg_dumpall -z the database. That worked fine. I also do a dump appropriate for pg_upgrade, figuring that maybe I'd try that out of necessary. I also tar all this stuff up, including my data and bin directories, just in case anything should go haywire. This will turn out to have been a wise precaution. I then move data to data.old, gmake install, initdb, and psql -e template1. It looks to be reloading the database OK for awhile. But then all of a sudden the screen is dumping pages and pages of psql's help text (the backslash commands, etc.) Noting that this is not good at all, I ctrl-C the restore, rm -rf data, initdb again, and try pg_upgrade. This complains that psql was unsuccessful and doesn't run either (it did appear to issue queries.) So panic time arrives. After restoring my old binaries and trying a dump with -d to dump everything as an INSERT query, and seeing psql spit out literally thousands of errors after trying to load this back into the new database, I end up doing this (on a new freshly-initdb'd data directory in 6.4.2): psql -e template1 < DumpFile &> results and: tail -f results in another window. Eventually, I discovered all these problems: * My User groups were not preserved from the previous version, so things like grant all to group foo; were failing. * The dumpfile would do things like create a table as one user, set the permissions, and then when it tried to use COPY, it would be connected as a different user that had no permissions to update. This is one of the most insidious problems; psql does not stop when it does not have permissions to copy; nor does it continue reading until the end marker. It simply interprets all the commands as psql commands and gets thoroughly confused. By the time I had finally fixed this, I had loaded my 36-megabyte dumpfile into vi about a dozen times, basically in a trial-and-error fashion waiting for it to find the next problem. * Tables containing CONSTRAINT or DEFAULT clauses were not created by CREATE TABLE (citing parse errors), and causing the same problem with COPY later on as described above. By that time, I was several hours behind schedule and finally just deleted those clauses from the dumpfile so I could get the server back online. * The old pg_dumpall dumped things as char4, char8, char2, etc. The new Postgres of course uses char(4) syntax. For some reason, it doesn't even accept the old syntax, meaning that every single table that used one of those things was not created either, and caused problems with COPY. So I wrote a sed to fix these problems, but that caused problems with CREATE INDEX later. Those were few enough that they could be manually fixed. * What was before a char8 is now treated as always having trailing spaces as a char(8). This is causing extreme havoc with programs that deal with it. Either that, or the dump corrupted the data by adding trailing spaces. Either way, changing the data is NOT acceptable and ought to be fixed. We are storing Unix usernames. Unix does not put trailing spaces, and so string comparisons are failing all over the place. By the time I had tracked down all these problems, the database had been down for about 4 or 5 hours instead of the 45 minutes that it has taken me to upgrade in the past. This really is not acceptable for something used in a mission-critical situation; we have been getting angry calls and e-mails from customers. I have still not had time to restore all the groups that were created; I don't understand why they were not carried over. In short, I'm not a happy camper. I've always been a PostgreSQL fan, and I look forward to the features coming in 6.5, but the problems with this procedure show that people have not tested it. I still find it hard to believe that nobody else ran into the char8 problems with upgrading. In my brief narrative above, I've skipped things that eventually led to no help. For instance, I tried the pg_dumpall from 6.4.2 on 6.3.2, but it didn't help. I seriously hope that these problems will be fixed before 6.4.3 or 6.5, and that there will be sufficient testing to ensure that the upgrade process at least works somewhat smoothly. Dumping and restoring the database is bad enough; it causes a good deal of downtime, but the process has traditionally been trouble-free. All of a sudden, it is not, and even someone that has been using this database since the early Postgre95 days spent many hours trying to fix it. -- John Goerzen SouthWind Internet Access, Inc. E-mail: Business, jgoerzen@southwind.net; Personal, jgoerzen@complete.org Developer, Debian GNU/Linux <http://www.debian.org>
В списке pgsql-admin по дате отправления: