RE: [CORE] WAL & RC1 status
From | Vadim Mikheev |
---|---|
Subject | RE: [CORE] WAL & RC1 status |
Date | |
Msg-id | 386541213.983645166958.JavaMail.root@web274-ec |
List | pgsql-hackers |
> I've reported the major problems to the mailing lists
> but gotten almost no feedback about what to do.

I can't comment without access to code -:(

> commit: 2001-02-26 17:19:57
> 0/0059996C: prv 0/00599948; xprv 0/00000000; xid 0;
>    RM 0 info 00 len 32
> checkpoint: redo 0/0059996C; undo 0/00000000; sui 29;
>    nextxid 18903; nextoid 35195; online
> -- this is the last normal-looking checkpoint record.
> -- Judging from the commit timestamps surrounding prior
> -- checkpoints, checkpoints were happening every five
> -- minutes approximately on the 5-minute mark, so

You can't count on this: postmaster runs checkpoint "maker" in
5 minutes *after* prev checkpoint was created, not from the moment
"maker" started. And checkpoint can take *minutes*.

> -- this one happened about 17:20.
> -- (There really should be a timestamp
> -- in the checkpoint records...)

Agreed.

> commit: 2001-02-26 17:26:02
> ReadRecord: record with zero len at 0/005A4B4C
> -- My dump program is unhappy here because the rest
> -- of the page is zero. Given that there is a
> -- continuation record at the start of the next
> -- page, there certainly should have been record(s)
> -- here. But it's worse than that: check the commit
> -- timestamps and the xid numbers before and after the
> -- discontinuity. Did time go backwards here?

Commit timestamps are created *before* XLogInsert call, which
can suspend backend for some time (in multi-user env).
Random xid-s are also ok, generally.

> -- Also notice the back-pointers in the first valid
> -- record on the next page; they point not into the
> -- zeroed space, which would suggest a mere failure
> -- to write a buffer after filling it, but into the
> -- middle of one of the valid records on the prior
> -- page. It almost looks like page 5A6000 came from
> -- a completely different run than page 5A4000.
> Unexpected page info flags 0001 at offset 5A6000
> Skipping unexpected continuation record at offset 5A6000
> 0/005A6904: prv 0/005A48B4(?); xprv 0/005A48B4; xid 19047;
                  ^^^^^^^^^^          ^^^^^^^^^^
Same. So, TX 19047 really inserted record at 0/005A48B4 position.

> -- What's even nastier (and the immediate cause of
> -- Scott's inability to restart) is that the pg_control
> -- file's checkPoint pointer points to 0/005AF9F0, which
> -- is *not* the location of this checkpoint, but of
> -- the record after it.

Well, well. Checkpoint position is taken from MyLastRecord -
I wonder how could this internal var take "invalid" data
from concurrent backend.

Ok, we're leaving Krasnoyarsk in 8 hrs and should arrive SF
Feb 5 ~ 10pm.

Vadim
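
Two of the timing points made in the reply can be illustrated with a toy model. This is only a sketch under stated assumptions, not PostgreSQL code: the function names `checkpoint_start_times` and `wal_order`, the durations, the delays, and the xids 101 and 102 are all made up for illustration. The first function assumes the next checkpoint starts a fixed interval after the previous one finished, so start times drift off the 5-minute marks. The second assumes each backend takes its commit timestamp first and then waits some time before its record lands in the log, so timestamps read in log order need not be monotonic.

```python
def checkpoint_start_times(durations, interval=300):
    """Start times (seconds) of successive checkpoints, assuming each
    one starts `interval` seconds after the previous one *finished*.
    `durations` are hypothetical per-checkpoint run times."""
    starts = []
    t = 0
    for d in durations:
        starts.append(t)
        t = t + d + interval  # wait is counted from completion, not from start
    return starts


def wal_order(commits):
    """Order in which commit records reach the log.

    `commits` is a list of (xid, stamp_time, delay_before_insert):
    the timestamp is taken first, then the backend may be suspended
    for `delay_before_insert` seconds before its record is inserted.
    Returns (xid, stamp_time) pairs in log order."""
    by_insert_time = sorted(commits, key=lambda c: c[1] + c[2])
    return [(xid, stamp) for xid, stamp, _delay in by_insert_time]


if __name__ == "__main__":
    # Checkpoints taking 10 s, 120 s and 45 s: starts are not 0, 300, 600.
    print(checkpoint_start_times([10, 120, 45]))
    # xid 101 stamps at t=100 but stalls 30 s; xid 102 stamps at t=110, stalls 1 s.
    print(wal_order([(101, 100, 30), (102, 110, 1)]))
```

In the second call the later-stamped commit (xid 102) appears first in log order, which is the kind of apparent "time going backwards" the reply describes as harmless.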