ReadBuffer(P_NEW) versus valid buffers

From	Tom Lane
Subject	ReadBuffer(P_NEW) versus valid buffers
Date	23 September 2006, 17:35:39
Msg-id	26202.1159032931@sss.pgh.pa.us
Replies	Re: ReadBuffer(P_NEW) versus valid buffers (Mark Kirkwood <markir@paradise.net.nz>)
List	pgsql-hackers

Some off-list investigation of Dan Kavan's data loss problem,
http://archives.postgresql.org/pgsql-admin/2006-09/msg00092.php
has led to the conclusion that it seems to be a kernel bug.
The smoking gun is this strace excerpt:

> lseek(10, 0, SEEK_END)                  = 913072128
> write(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
> lseek(10, 0, SEEK_END)                  = 913080320
> write(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
> lseek(10, 0, SEEK_END)                  = 913088512
> write(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
> lseek(10, 0, SEEK_END)                  = 913088512
> write(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192
> lseek(10, 0, SEEK_END)                  = 913096704
> write(10, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 8192) = 8192

Note the lseek results --- surely each successive result ought to be 8K
more than the one before, but the fourth in this extract seems to have
forgotten about the immediately preceding write().
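
The invariant being violated here can be checked from user space. The following is only a minimal sketch, not anything from the PostgreSQL tree: the function name check_extend and the local BLCKSZ definition (8192, matching the write size in the strace) are assumptions made for illustration. It repeats the lseek/write pattern above and reports the first block after which the end-of-file position failed to advance by one block.

```c
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLCKSZ 8192             /* block size assumed from the strace above */

/*
 * Hypothetical helper: append nblocks zero-filled blocks to fd, checking
 * after each write that lseek(SEEK_END) has advanced by exactly BLCKSZ.
 *
 * Returns 0 if every end-of-file position was as expected, the 1-based
 * index of the first block whose EOF position was wrong, or -1 on an
 * I/O error.
 */
int
check_extend(int fd, int nblocks)
{
    char        zeros[BLCKSZ];
    off_t       expected;
    int         i;

    /* Position at EOF, as the backend does before extending. */
    expected = lseek(fd, 0, SEEK_END);
    if (expected < 0)
        return -1;

    memset(zeros, 0, sizeof(zeros));

    for (i = 1; i <= nblocks; i++)
    {
        off_t       eof;

        if (write(fd, zeros, BLCKSZ) != BLCKSZ)
            return -1;
        expected += BLCKSZ;

        /* The next lseek must see the block we just wrote. */
        eof = lseek(fd, 0, SEEK_END);
        if (eof < 0)
            return -1;
        if (eof != expected)
            return i;
    }
    return 0;
}
```

On a correctly behaving kernel this should return 0 for any file it can write to; a nonzero positive result would correspond to the stale lseek result seen in the fourth call of the extract.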

These calls are coming from successive ReadBuffer(rel, P_NEW)
operations, which should just extend the file each time.  But
the incorrect lseek result is causing ReadBuffer to re-find
the buffer we had just finished filling with a page of data,
and that leads it to this conclusion:
        /*
         * We get here only in the corner case where we are trying to extend
         * the relation but we found a pre-existing buffer marked BM_VALID.
         * (This can happen because mdread doesn't complain about reads
         * beyond EOF --- which is arguably bogus, but changing it seems
         * tricky.)  We *must* do smgrextend before succeeding, else the
         * page will not be reserved by the kernel, and the next P_NEW call
         * will decide to return the same page.  Clear the BM_VALID bit,
         * do the StartBufferIO call that BufferAlloc didn't, and proceed.
         */
 

So ReadBuffer without hesitation zeroes out the page of data we just
filled, and returns it for re-filling.  There went some tuples :-(

Although this is clearly Not Our Bug, it's annoying that ReadBuffer
falls into the trap so easily instead of complaining.  I'm still
disinclined to try to change the behavior of mdread(), but what I am
considering doing is adding a check here to error out if not PageIsNew.
AFAICS, if we do find a buffer for a page supposedly past EOF, it should
be zero-filled because that's what mdread returns in this case.  So this
change would prevent Dan's silent-overwrite scenario without changing the
behavior for any legitimate case.
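
The shape of the proposed check can be sketched in isolation. This is a simplified model under stated assumptions, not the actual bufmgr.c change: page_is_all_zero is a hypothetical stand-in for PageIsNew that scans the whole page for zeroes (the real macro may test something narrower), and extend_may_reuse_valid_buffer is an invented name for the decision made in the corner case quoted above.

```c
#include <stdbool.h>
#include <stddef.h>

#define BLCKSZ 8192             /* page size assumed for this sketch */

/*
 * Hypothetical stand-in for PageIsNew: true if the page is entirely
 * zero-filled, which is what a read beyond EOF is expected to produce.
 */
bool
page_is_all_zero(const char *page)
{
    size_t      i;

    for (i = 0; i < BLCKSZ; i++)
    {
        if (page[i] != 0)
            return false;
    }
    return true;
}

/*
 * Hypothetical model of the P_NEW corner case: a buffer marked valid was
 * found for the block we are about to extend into.  Returns true if it
 * is safe to clear and reuse the buffer (it holds no data), false if the
 * caller should raise an error instead of overwriting it.
 */
bool
extend_may_reuse_valid_buffer(const char *page)
{
    return page_is_all_zero(page);
}
```

In this model, a page that already holds tuples makes the function return false, so the extension would fail loudly rather than zero the page.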

Thoughts, problems, better ideas?
        regards, tom lane
