Re: Deep Trouble
От | Karl DeBisschop |
---|---|
Тема | Re: Deep Trouble |
Дата | |
Msg-id | 39EC49CA.66D8C6DA@debisschop.net обсуждение исходный текст |
Ответ на | A better stop ("Warren Flemmer" <warren@netlab.co.za>) |
Список | pgsql-general |
Abe Asghar wrote: > > Hi guys, > > Havin some trouble. > > One of my databases appeared to be empty suddenly after having a large > amount of data in it. I contacted our server company and they gave me the > postgres dir. > > I have put back the folder of the newsdatabase from the base dir into the > base dir of Postgres and recreated the database. > > That's fine. > > However now the database is empty. When I do a cat on the file of the same > name as one of the tables - it has loads of data in it. However when I go > in to Postgres and try to list the table it comes back with ) rows. > > Any ideas I am desperate. > > I am using linux redhat and Postgres > > Thanks > Abe We saw this on PG 7.0.2 with RH 6.2smp (specifically kernel 2.2.16-3smp). What happened in our case was that the transaction If it's the same bug, the data is there, here's a couple of diagnostics from Tom Lane: >>>>> Message 1 >>>>>>> Well, I still don't know what caused the "not a btree" complaint, but I see what's the matter with your missing databases: the current transaction numbers are too small. In /var/lib/pgsql/data.rpmnew, pg_variable has a next transaction number of 3586: [tlane@sterno data.rpmnew]$ pwd /var/lib/pgsql/data.rpmnew [tlane@sterno data.rpmnew]$ od -l pg_variable 0000000 0 3586 0 321084263 0000020 0 0 0 0 * 0020000 and in the older DB it's 5634: [tlane@sterno data]$ pwd /disk/1/pgsql/data [tlane@sterno data]$ od -d pg_variable [tlane@sterno data]$ od -l pg_variable 0000000 0 5634 0 320895230 0000020 0 0 0 0 * 0020000 Now your DBs (other than template1) were all created by much higher transaction numbers. For example in the older installation "biobase"'s pg_database tuple has t_xmin = 9752, meaning it was created by transaction number 9752. That's far in the future according to the system's current view of the world, so the pg_database tuple is considered not to be committed yet, which is why you can't see it. You can connect to that DB anyway because the connection-time check for tuple validity is a little bit lax, but once you're connected most of the system-table tuples will look like they don't exist yet, ditto tuples you try to create, thus things act pretty broken. So now we know the immediate cause of the weird behavior, and the next question is how did pg_variable get into this state? Is it conceivable that you somehow restored an old copy over the current version of the file, or some such thing? BTW, in your running postmaster (port 5432), active transaction numbers seem to be around 22500, which itself seems suspiciously low, unless this is a very low-traffic server or has recently been initdb'd. Test method: [tlane@sterno data]$ PGPORT=5432 psql biobase Welcome to psql, the PostgreSQL interactive terminal. biobase=# create table foofoofoo (f1 int); CREATE biobase=# select oid,xmin,* from pg_class where relname = 'foofoofoo'; oid | xmin | relname | reltype | relowner | relam | relpages | reltuples | rellongrelid | relhasindex | relisshared | relkind | relnatts | relchecks | reltriggers | relukeys | relfkeys | relrefs | relhaspkey | relhasrules | relacl -----------+-------+-----------+---------+----------+-------+----------+-----------+--------------+-------------+-------------+---------+----------+-----------+-------------+----------+----------+---------+------------+-------------+-------- 327912528 | 22529 | foofoofoo | 0 | 10002 | 0 | 10 | 1000 | 0 | f | f | r | 1 | 0 | 0 | 0 | 0 | 0 | f | f | (1 row) biobase=# drop table foofoofoo; DROP >>>>>>>>>>>>> Message 2 > If it does happen again, is there any way I can reset the reset the > tarnasction number manually? I'm doing a pg_dumpall every six hours, but > it's sort of time consuming to have to do a complete install/dump/restore > when the DB goes haywire llike this. You can stop the postmaster and hack on pg_variable manually. The only trick is to know what to set the value to. Probably (byte length of pg_log / 4) - 1 would work. <<<<<<<<< end quoted messages We never solved this one -- we reverted Postgres to 7.0-2 (the -2 is the RPM), and have not had a problem with that installation since. We expect to have a free dual processor machine sometime around the end of the year, but we really cannot risk further experimentation on this issue with any of or current dual-processer systems which are all mission critical, and all have utilizations that are already as high as we'd like them to be. We hope to test more aggressively once that spare machine is free, but right now we don't have the hardware or manpower necessary. We have 7.0.2 running in several single processor machines with no problem. But the SMP does not seem to play well with 7.0.2, or at least with our build of 7.0.2, which might have different optimizations that Lamar's RPM, I suppose. -- Karl DeBisschop kdebisschop@alert.infoplease.com Learning Network/Information Please http://www.infoplease.com Netsaint Plugin Developer kdebisschop@users.sourceforge.net
В списке pgsql-general по дате отправления: