Re: [HACKERS] fd,c just Assert()s that lseek() succeeds

Поиск
Список
Период
Сортировка
От Tom Lane
Тема Re: [HACKERS] fd,c just Assert()s that lseek() succeeds
Дата
Msg-id 2982.1487617365@sss.pgh.pa.us
обсуждение исходный текст
Ответ на [HACKERS] fd,c just Assert()s that lseek() succeeds  (Tom Lane <tgl@sss.pgh.pa.us>)
Список pgsql-hackers
I wrote:
> I noticed while researching bug #14555 that fd.c contains two separate
> cases like
>     vfdP->seekPos = lseek(vfdP->fd, (off_t) 0, SEEK_CUR);
>     Assert(vfdP->seekPos != (off_t) -1);
> This seems, um, unwise.  It might somehow fail to fail in production
> builds, because elsewhere it's assumed that -1 means FileUnknownPos,
> but it seems like reporting the error would be a better plan.

I looked at things more closely and decided that this was really pretty
seriously broken.  Attached is a patch that attempts to spruce things up.

In LruDelete, it's more or less okay to not throw an error, mainly because
that's likely to get called during error abort and so throwing an error
would probably just lead to an infinite loop.  But by the same token,
throwing an error from the close() right after that is ill-advised, not to
mention that it would've left the LRU state corrupted since we'd already
unlinked the VFD.  I also noticed that really, most of the time, we should
know the current seek position and it shouldn't be necessary to do an
lseek here at all.  In the attached, if we don't have a seek position and
an lseek attempt doesn't give us one, we'll close the file but then
subsequent re-open attempts will fail (except in the somewhat-unlikely
case that a FileSeek(SEEK_SET) call comes between and allows us to
re-establish a known seek position).

Meanwhile, having an Assert instead of an honest test in LruInsert is
really dangerous: if that lseek failed, a subsequent read or write would
read or write from the start of the file, not where the caller expected,
leading to data corruption.

I also fixed a number of other places that were being sloppy about
behaving correctly when the seekPos is unknown.

Also, I changed FileSeek to return -1 with EINVAL for the cases where it
detects a bad offset, rather than throwing a hard elog(ERROR).  It seemed
pretty inconsistent that some bad-offset cases would get a failure return
while others got elog(ERROR).  It was missing an offset validity check for
the SEEK_CUR case on a closed file, too.

I'm inclined to treat this as a bug fix and back-patch it.  lseek()
failure on a valid FD is certainly unlikely, but if it did happen our
behavior would be pretty bad.  I'm also suspicious that some of these bugs
could be exposed even without an lseek() failure, because of the code in
FileRead and FileWrite that will reset seekPos to "unknown" after an
error.

            regards, tom lane

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index ce4bd0f..1065bc1 100644
*** a/src/backend/storage/file/fd.c
--- b/src/backend/storage/file/fd.c
*************** int            max_safe_fds = 32;    /* default if n
*** 160,166 ****
--- 160,173 ----

  #define FileIsNotOpen(file) (VfdCache[file].fd == VFD_CLOSED)

+ /*
+  * Note: a VFD's seekPos is normally always valid, but if for some reason
+  * an lseek() fails, it might become set to FileUnknownPos.  We can struggle
+  * along without knowing the seek position in many cases, but in some places
+  * we have to fail if we don't have it.
+  */
  #define FileUnknownPos ((off_t) -1)
+ #define FilePosIsUnknown(pos) ((pos) < 0)

  /* these are the assigned bits in fdstate below: */
  #define FD_TEMPORARY        (1 << 0)    /* T = delete when closed */
*************** typedef struct vfd
*** 174,180 ****
      File        nextFree;        /* link to next free VFD, if in freelist */
      File        lruMoreRecently;    /* doubly linked recency-of-use list */
      File        lruLessRecently;
!     off_t        seekPos;        /* current logical file position */
      off_t        fileSize;        /* current size of file (0 if not temporary) */
      char       *fileName;        /* name of file, or NULL for unused VFD */
      /* NB: fileName is malloc'd, and must be free'd when closing the VFD */
--- 181,187 ----
      File        nextFree;        /* link to next free VFD, if in freelist */
      File        lruMoreRecently;    /* doubly linked recency-of-use list */
      File        lruLessRecently;
!     off_t        seekPos;        /* current logical file position, or -1 */
      off_t        fileSize;        /* current size of file (0 if not temporary) */
      char       *fileName;        /* name of file, or NULL for unused VFD */
      /* NB: fileName is malloc'd, and must be free'd when closing the VFD */
*************** LruDelete(File file)
*** 967,985 ****

      vfdP = &VfdCache[file];

!     /* delete the vfd record from the LRU ring */
!     Delete(file);
!
!     /* save the seek position */
!     vfdP->seekPos = lseek(vfdP->fd, (off_t) 0, SEEK_CUR);
!     Assert(vfdP->seekPos != (off_t) -1);

      /* close the file */
      if (close(vfdP->fd))
!         elog(ERROR, "could not close file \"%s\": %m", vfdP->fileName);
!
!     --nfile;
      vfdP->fd = VFD_CLOSED;
  }

  static void
--- 974,1003 ----

      vfdP = &VfdCache[file];

!     /*
!      * Normally we should know the seek position, but if for some reason we
!      * have lost track of it, try again to get it.  If we still can't get it,
!      * we have a problem: we will be unable to restore the file seek position
!      * when and if the file is re-opened.  But we can't really throw an error
!      * and refuse to close the file, or activities such as transaction cleanup
!      * will be broken.
!      */
!     if (FilePosIsUnknown(vfdP->seekPos))
!     {
!         vfdP->seekPos = lseek(vfdP->fd, (off_t) 0, SEEK_CUR);
!         if (FilePosIsUnknown(vfdP->seekPos))
!             elog(LOG, "could not seek file \"%s\" before closing: %m",
!                  vfdP->fileName);
!     }

      /* close the file */
      if (close(vfdP->fd))
!         elog(LOG, "could not close file \"%s\": %m", vfdP->fileName);
      vfdP->fd = VFD_CLOSED;
+     --nfile;
+
+     /* delete the vfd record from the LRU ring */
+     Delete(file);
  }

  static void
*************** LruInsert(File file)
*** 1030,1051 ****
                                   vfdP->fileMode);
          if (vfdP->fd < 0)
          {
!             DO_DB(elog(LOG, "RE_OPEN FAILED: %d", errno));
              return -1;
          }
          else
          {
-             DO_DB(elog(LOG, "RE_OPEN SUCCESS"));
              ++nfile;
          }

!         /* seek to the right position */
          if (vfdP->seekPos != (off_t) 0)
          {
!             off_t returnValue PG_USED_FOR_ASSERTS_ONLY;

!             returnValue = lseek(vfdP->fd, vfdP->seekPos, SEEK_SET);
!             Assert(returnValue != (off_t) -1);
          }
      }

--- 1048,1086 ----
                                   vfdP->fileMode);
          if (vfdP->fd < 0)
          {
!             DO_DB(elog(LOG, "re-open failed: %m"));
              return -1;
          }
          else
          {
              ++nfile;
          }

!         /*
!          * Seek to the right position.  We need no special case for seekPos
!          * equal to FileUnknownPos, as lseek() will certainly reject that
!          * (thus completing the logic noted in LruDelete() that we will fail
!          * to re-open a file if we couldn't get its seek position before
!          * closing).
!          */
          if (vfdP->seekPos != (off_t) 0)
          {
!             if (lseek(vfdP->fd, vfdP->seekPos, SEEK_SET) < 0)
!             {
!                 /*
!                  * If we fail to restore the seek position, treat it like an
!                  * open() failure.
!                  */
!                 int            save_errno = errno;

!                 elog(LOG, "could not seek file \"%s\" after re-opening: %m",
!                      vfdP->fileName);
!                 (void) close(vfdP->fd);
!                 vfdP->fd = VFD_CLOSED;
!                 --nfile;
!                 errno = save_errno;
!                 return -1;
!             }
          }
      }

*************** int
*** 1566,1571 ****
--- 1601,1607 ----
  FileRead(File file, char *buffer, int amount)
  {
      int            returnCode;
+     Vfd           *vfdP;

      Assert(FileIsValid(file));

*************** FileRead(File file, char *buffer, int am
*** 1578,1588 ****
      if (returnCode < 0)
          return returnCode;

  retry:
!     returnCode = read(VfdCache[file].fd, buffer, amount);

      if (returnCode >= 0)
!         VfdCache[file].seekPos += returnCode;
      else
      {
          /*
--- 1614,1630 ----
      if (returnCode < 0)
          return returnCode;

+     vfdP = &VfdCache[file];
+
  retry:
!     returnCode = read(vfdP->fd, buffer, amount);

      if (returnCode >= 0)
!     {
!         /* if seekPos is unknown, leave it that way */
!         if (!FilePosIsUnknown(vfdP->seekPos))
!             vfdP->seekPos += returnCode;
!     }
      else
      {
          /*
*************** retry:
*** 1611,1617 ****
              goto retry;

          /* Trouble, so assume we don't know the file position anymore */
!         VfdCache[file].seekPos = FileUnknownPos;
      }

      return returnCode;
--- 1653,1659 ----
              goto retry;

          /* Trouble, so assume we don't know the file position anymore */
!         vfdP->seekPos = FileUnknownPos;
      }

      return returnCode;
*************** int
*** 1621,1626 ****
--- 1663,1669 ----
  FileWrite(File file, char *buffer, int amount)
  {
      int            returnCode;
+     Vfd           *vfdP;

      Assert(FileIsValid(file));

*************** FileWrite(File file, char *buffer, int a
*** 1633,1638 ****
--- 1676,1683 ----
      if (returnCode < 0)
          return returnCode;

+     vfdP = &VfdCache[file];
+
      /*
       * If enforcing temp_file_limit and it's a temp file, check to see if the
       * write would overrun temp_file_limit, and throw error if so.  Note: it's
*************** FileWrite(File file, char *buffer, int a
*** 1641,1655 ****
       * message if we do that.  All current callers would just throw error
       * immediately anyway, so this is safe at present.
       */
!     if (temp_file_limit >= 0 && (VfdCache[file].fdstate & FD_TEMPORARY))
      {
!         off_t        newPos = VfdCache[file].seekPos + amount;

!         if (newPos > VfdCache[file].fileSize)
          {
              uint64        newTotal = temporary_files_size;

!             newTotal += newPos - VfdCache[file].fileSize;
              if (newTotal > (uint64) temp_file_limit * (uint64) 1024)
                  ereport(ERROR,
                          (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
--- 1686,1713 ----
       * message if we do that.  All current callers would just throw error
       * immediately anyway, so this is safe at present.
       */
!     if (temp_file_limit >= 0 && (vfdP->fdstate & FD_TEMPORARY))
      {
!         off_t        newPos;

!         /*
!          * Normally we should know the seek position, but if for some reason
!          * we have lost track of it, try again to get it.  Here, it's fine to
!          * throw an error if we still can't get it.
!          */
!         if (FilePosIsUnknown(vfdP->seekPos))
!         {
!             vfdP->seekPos = lseek(vfdP->fd, (off_t) 0, SEEK_CUR);
!             if (FilePosIsUnknown(vfdP->seekPos))
!                 elog(ERROR, "could not seek file \"%s\": %m", vfdP->fileName);
!         }
!
!         newPos = vfdP->seekPos + amount;
!         if (newPos > vfdP->fileSize)
          {
              uint64        newTotal = temporary_files_size;

!             newTotal += newPos - vfdP->fileSize;
              if (newTotal > (uint64) temp_file_limit * (uint64) 1024)
                  ereport(ERROR,
                          (errcode(ERRCODE_CONFIGURATION_LIMIT_EXCEEDED),
*************** FileWrite(File file, char *buffer, int a
*** 1660,1666 ****

  retry:
      errno = 0;
!     returnCode = write(VfdCache[file].fd, buffer, amount);

      /* if write didn't set errno, assume problem is no disk space */
      if (returnCode != amount && errno == 0)
--- 1718,1724 ----

  retry:
      errno = 0;
!     returnCode = write(vfdP->fd, buffer, amount);

      /* if write didn't set errno, assume problem is no disk space */
      if (returnCode != amount && errno == 0)
*************** retry:
*** 1668,1684 ****

      if (returnCode >= 0)
      {
!         VfdCache[file].seekPos += returnCode;

!         /* maintain fileSize and temporary_files_size if it's a temp file */
!         if (VfdCache[file].fdstate & FD_TEMPORARY)
          {
!             off_t        newPos = VfdCache[file].seekPos;

!             if (newPos > VfdCache[file].fileSize)
              {
!                 temporary_files_size += newPos - VfdCache[file].fileSize;
!                 VfdCache[file].fileSize = newPos;
              }
          }
      }
--- 1726,1750 ----

      if (returnCode >= 0)
      {
!         /* if seekPos is unknown, leave it that way */
!         if (!FilePosIsUnknown(vfdP->seekPos))
!             vfdP->seekPos += returnCode;

!         /*
!          * Maintain fileSize and temporary_files_size if it's a temp file.
!          *
!          * If seekPos is -1 (unknown), this will do nothing; but we could only
!          * get here in that state if we're not enforcing temporary_files_size,
!          * so we don't care.
!          */
!         if (vfdP->fdstate & FD_TEMPORARY)
          {
!             off_t        newPos = vfdP->seekPos;

!             if (newPos > vfdP->fileSize)
              {
!                 temporary_files_size += newPos - vfdP->fileSize;
!                 vfdP->fileSize = newPos;
              }
          }
      }
*************** retry:
*** 1706,1712 ****
              goto retry;

          /* Trouble, so assume we don't know the file position anymore */
!         VfdCache[file].seekPos = FileUnknownPos;
      }

      return returnCode;
--- 1772,1778 ----
              goto retry;

          /* Trouble, so assume we don't know the file position anymore */
!         vfdP->seekPos = FileUnknownPos;
      }

      return returnCode;
*************** FileSync(File file)
*** 1732,1738 ****
  off_t
  FileSeek(File file, off_t offset, int whence)
  {
!     int            returnCode;

      Assert(FileIsValid(file));

--- 1798,1804 ----
  off_t
  FileSeek(File file, off_t offset, int whence)
  {
!     Vfd           *vfdP;

      Assert(FileIsValid(file));

*************** FileSeek(File file, off_t offset, int wh
*** 1741,1765 ****
                 (int64) VfdCache[file].seekPos,
                 (int64) offset, whence));

      if (FileIsNotOpen(file))
      {
          switch (whence)
          {
              case SEEK_SET:
                  if (offset < 0)
!                     elog(ERROR, "invalid seek offset: " INT64_FORMAT,
!                          (int64) offset);
!                 VfdCache[file].seekPos = offset;
                  break;
              case SEEK_CUR:
!                 VfdCache[file].seekPos += offset;
                  break;
              case SEEK_END:
!                 returnCode = FileAccess(file);
!                 if (returnCode < 0)
!                     return returnCode;
!                 VfdCache[file].seekPos = lseek(VfdCache[file].fd,
!                                                offset, whence);
                  break;
              default:
                  elog(ERROR, "invalid whence: %d", whence);
--- 1807,1839 ----
                 (int64) VfdCache[file].seekPos,
                 (int64) offset, whence));

+     vfdP = &VfdCache[file];
+
      if (FileIsNotOpen(file))
      {
          switch (whence)
          {
              case SEEK_SET:
                  if (offset < 0)
!                 {
!                     errno = EINVAL;
!                     return (off_t) -1;
!                 }
!                 vfdP->seekPos = offset;
                  break;
              case SEEK_CUR:
!                 if (FilePosIsUnknown(vfdP->seekPos) ||
!                     vfdP->seekPos + offset < 0)
!                 {
!                     errno = EINVAL;
!                     return (off_t) -1;
!                 }
!                 vfdP->seekPos += offset;
                  break;
              case SEEK_END:
!                 if (FileAccess(file) < 0)
!                     return (off_t) -1;
!                 vfdP->seekPos = lseek(vfdP->fd, offset, whence);
                  break;
              default:
                  elog(ERROR, "invalid whence: %d", whence);
*************** FileSeek(File file, off_t offset, int wh
*** 1772,1798 ****
          {
              case SEEK_SET:
                  if (offset < 0)
!                     elog(ERROR, "invalid seek offset: " INT64_FORMAT,
!                          (int64) offset);
!                 if (VfdCache[file].seekPos != offset)
!                     VfdCache[file].seekPos = lseek(VfdCache[file].fd,
!                                                    offset, whence);
                  break;
              case SEEK_CUR:
!                 if (offset != 0 || VfdCache[file].seekPos == FileUnknownPos)
!                     VfdCache[file].seekPos = lseek(VfdCache[file].fd,
!                                                    offset, whence);
                  break;
              case SEEK_END:
!                 VfdCache[file].seekPos = lseek(VfdCache[file].fd,
!                                                offset, whence);
                  break;
              default:
                  elog(ERROR, "invalid whence: %d", whence);
                  break;
          }
      }
!     return VfdCache[file].seekPos;
  }

  /*
--- 1846,1872 ----
          {
              case SEEK_SET:
                  if (offset < 0)
!                 {
!                     errno = EINVAL;
!                     return (off_t) -1;
!                 }
!                 if (vfdP->seekPos != offset)
!                     vfdP->seekPos = lseek(vfdP->fd, offset, whence);
                  break;
              case SEEK_CUR:
!                 if (offset != 0 || FilePosIsUnknown(vfdP->seekPos))
!                     vfdP->seekPos = lseek(vfdP->fd, offset, whence);
                  break;
              case SEEK_END:
!                 vfdP->seekPos = lseek(vfdP->fd, offset, whence);
                  break;
              default:
                  elog(ERROR, "invalid whence: %d", whence);
                  break;
          }
      }
!
!     return vfdP->seekPos;
  }

  /*

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

В списке pgsql-hackers по дате отправления:

Предыдущее
От: Pavel Stehule
Дата:
Сообщение: [HACKERS] possible encoding issues with libxml2 functions
Следующее
От: Tom Lane
Дата:
Сообщение: Re: [HACKERS] Parallel bitmap heap scan