Re: Disaster!

From: Tom Lane
Subject: Re: Disaster!
Date:
Msg-id: 4221.1074892864@sss.pgh.pa.us
In response to: Re: Disaster!  (Martín Marqués <martin@bugs.unl.edu.ar>)
Responses: Re: Disaster!  (Alvaro Herrera <alvherre@dcc.uchile.cl>)
List: pgsql-hackers
Martín Marqués <martin@bugs.unl.edu.ar> writes:
> Tom, could you give a small insight on what occurred here, why those
> 8k of zeros fixed it, and what is a "WAL replay"?

I think what happened is that there was insufficient space to write out
a new page of the clog (transaction commit) file.  This would result in
a database panic, which is fine --- you're not gonna get much done
anyway if you are down to zero free disk space.  However, after Chris
freed up space, the system needed to replay the WAL from the last
checkpoint to ensure consistency.  The WAL entries evidently included
references to transactions whose commit bits were in the unwritten page.
Now there would also be WAL entries recording those commits, so once the
replay was complete everything would be cool.  But the clog access code
evidently got confused by being asked to read a page that didn't exist
in the file.  I'm not sure yet how that sequence of events occurred,
which is why I asked Chris for a stack trace.

Adding a page of zeroes fixed it by eliminating the read error
condition.  It was okay to do so because zeroes is the correct initial
state for a clog page (all transactions in it "still in progress").
After WAL replay, any completed transactions would be updated in the page.
        regards, tom lane
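
An illustrative sketch, not part of the original message and not PostgreSQL's actual code: the clog keeps two status bits per transaction, so an 8 kB page covers 32768 transactions and an all-zero page reads as "in progress" for every one of them. The function and constant names below are made up for illustration; the 8192-byte page size and the 0/1/2 status codes are assumed to match a default PostgreSQL build.

```python
# Hypothetical model of a clog page: 2 status bits per transaction ID.
BLCKSZ = 8192                              # assumed default page size (8 kB)
BITS_PER_XACT = 2
XACTS_PER_BYTE = 8 // BITS_PER_XACT        # 4 transactions per byte
XACTS_PER_PAGE = BLCKSZ * XACTS_PER_BYTE   # 32768 transactions per page

IN_PROGRESS, COMMITTED, ABORTED = 0, 1, 2  # 0 means "still in progress"


def locate(xid):
    """Return (page number, byte offset in page, bit shift in byte) for xid."""
    page = xid // XACTS_PER_PAGE
    byte = (xid % XACTS_PER_PAGE) // XACTS_PER_BYTE
    shift = (xid % XACTS_PER_BYTE) * BITS_PER_XACT
    return page, byte, shift


def get_status(page_buf, xid):
    """Read the 2-bit status of xid from the page that holds it."""
    _, byte, shift = locate(xid)
    return (page_buf[byte] >> shift) & 0b11


def set_status(page_buf, xid, status):
    """Overwrite the 2-bit status of xid, leaving its neighbours untouched."""
    _, byte, shift = locate(xid)
    cleared = page_buf[byte] & ~(0b11 << shift) & 0xFF
    page_buf[byte] = cleared | (status << shift)


# A freshly zeroed page: every transaction on it looks "in progress".
page = bytearray(BLCKSZ)
status_before = get_status(page, 100)

# Replaying a commit record later flips just that transaction's bits.
set_status(page, 100, COMMITTED)
status_after = get_status(page, 100)
```

In this model, appending a zeroed page is harmless for the reason given in the message: it asserts nothing about any transaction, and replaying the recorded commits afterwards fills in the real outcomes.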

