RE: [HACKERS] Recovery on incomplete write

From Hiroshi Inoue
Subject RE: [HACKERS] Recovery on incomplete write
Date
Msg-id 000701bf0f13$9c0790c0$2801007e@cadzone.tpf.co.jp
List pgsql-hackers
>
> > -----Original Message-----
> > From: Bruce Momjian [mailto:maillist@candle.pha.pa.us]
> > Sent: Tuesday, September 28, 1999 11:54 PM
> > To: Tom Lane
> > Cc: Hiroshi Inoue; pgsql-hackers
> > Subject: Re: [HACKERS] Recovery on incomplete write
> >
> >
> > > "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > > > I have wondered that md.c handles incomplete block(page)s
> > > > correctly.
> > > > Am I mistaken ?
> > >
> > > I think you are right, and there may be some other trouble spots
> > > in that file too.  I remember thinking that the code depended
> > > heavily on never having a partial block at the end of the file.
> > >
> > > But is it worth fixing?  The only way I can see for the file length
> > > to become funny is if we run out of disk space part way through
> > > writing a page, which seems unlikely...
> > >
> >
> > That is how he got started, the TODO item about running out of disk
> > space causing corrupted databases.  I think it needs a fix, if we can.
> >
>
> Maybe it isn't so difficult to fix.
> I would provide a patch.
>

Here is a patch.

1) mdnblocks() ignores a partial block at the end of relation files.
2) mdread() ignores a partial block at block number 0 (the buffer is
   zero-filled, as it is when nothing could be read).
3) mdextend() adjusts its position to a multiple of BLCKSZ before writing.
4) mdextend() truncates extra bytes in case of incomplete write.
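
To illustrate item 1, here is a small standalone sketch of the block-count
arithmetic before and after the change. The function names nblocks_old and
nblocks_new are made up for this illustration and are not part of md.c; the
formulas are the ones in the last hunk of the patch.

```c
/*
 * Illustrative sketch only: nblocks_old/nblocks_new are hypothetical
 * names, not functions in md.c. file_len is the file length in bytes.
 */

/* Old formula: rounds up, so a trailing partial block counts as a block. */
static long nblocks_old(long file_len, long blcksz)
{
    long len = file_len - 1;

    return (len < 0) ? 0 : 1 + len / blcksz;
}

/* New formula: rounds down, so a trailing partial block is ignored. */
static long nblocks_new(long file_len, long blcksz)
{
    return file_len / blcksz;
}
```

With a block size of 8192, a file of 8193 bytes counts as 2 blocks under the
old formula and as 1 block under the new one; both agree whenever the length
is an exact multiple of the block size.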

If there are no objections, I will commit this change to the current
tree.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp

*** storage/smgr/md.c.orig    Thu Sep 30 10:50:58 1999
--- storage/smgr/md.c    Tue Oct  5 13:30:55 1999
***************
*** 233,239 ****
  int
  mdextend(Relation reln, char *buffer)
  {
!     long        pos;
      int         nblocks;
      MdfdVec    *v;

--- 233,239 ----
  int
  mdextend(Relation reln, char *buffer)
  {
!     long        pos, nbytes;
      int         nblocks;
      MdfdVec    *v;

***************
*** 243,250 ****
      if ((pos = FileSeek(v->mdfd_vfd, 0L, SEEK_END)) < 0)
          return SM_FAIL;

!     if (FileWrite(v->mdfd_vfd, buffer, BLCKSZ) != BLCKSZ)
          return SM_FAIL;

      /* remember that we did a write, so we can sync at xact commit */
      v->mdfd_flags |= MDFD_DIRTY;
--- 243,264 ----
      if ((pos = FileSeek(v->mdfd_vfd, 0L, SEEK_END)) < 0)
          return SM_FAIL;

!     if (pos % BLCKSZ != 0) /* the last block is incomplete */
!     {
!         pos = BLCKSZ * (long)(pos / BLCKSZ);
!         if (FileSeek(v->mdfd_vfd, pos, SEEK_SET) < 0)
!             return SM_FAIL;
!     }
!
!     if ((nbytes = FileWrite(v->mdfd_vfd, buffer, BLCKSZ)) != BLCKSZ)
!     {
!         if (nbytes > 0)
!         {
!             FileTruncate(v->mdfd_vfd, pos);
!             FileSeek(v->mdfd_vfd, pos, SEEK_SET);
!         }
          return SM_FAIL;
+     }

      /* remember that we did a write, so we can sync at xact commit */
      v->mdfd_flags |= MDFD_DIRTY;
***************
*** 432,437 ****
--- 446,453 ----
      {
          if (nbytes == 0)
              MemSet(buffer, 0, BLCKSZ);
+         else if (blocknum == 0 && nbytes > 0 && mdnblocks(reln) == 0)
+             MemSet(buffer, 0, BLCKSZ);
          else
              status = SM_FAIL;
      }
***************
*** 1067,1072 ****
  {
      long        len;

!     len = FileSeek(file, 0L, SEEK_END) - 1;
!     return (BlockNumber) ((len < 0) ? 0 : 1 + len / blcksz);
  }
--- 1083,1088 ----
  {
      long        len;

!     len = FileSeek(file, 0L, SEEK_END);
!     return (BlockNumber) (len / blcksz);
  }



