Discussion: Re: [GENERAL] Removing pgsql_tmp files

Re: [GENERAL] Removing pgsql_tmp files

From
Tom Lane
Date:
Michael Glaesemann <michael.glaesemann@myyearbook.com> writes:
> On Nov 8, 2010, at 16:03 , Tom Lane wrote:
>> That's very peculiar.  Do you keep query logs?  It would be useful to
>> try to correlate the temp files' PIDs and timestamps with the specific
>> queries that must have created them.

> We don't log all of them, but I checked those we did. It looks like it's happening when queries are timing out. I'm
> seeing this pattern pretty consistently:

> temporary file + query
> canceling statement due to statement timeout
> second temp file

> Here's a sample:

> pid         | 877
> sess_id     | 4ccf7257.36d
> sess_line   | 16
> filename    | pgsql_tmp877.0
> accessed_at | 2010-09-15 12:14:45-04
> modified_at | 2010-11-01 22:37:00-04
> logged_at   | 2010-11-01 22:37:01.412-04
> error       | LOG
> sql_state   | 00000
> message     | temporary file: path "pg_tblspc/16384/pgsql_tmp/pgsql_tmp877.0", size 87184416
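
Tom's suggestion to correlate PIDs can be done mechanically for a record like this one: the temp file name pgsql_tmp877.0 carries the backend PID in decimal, and the part of sess_id after the dot is the same PID in hexadecimal (0x36d = 877), assuming PostgreSQL's usual session-ID format of start time and PID as hex numbers separated by a dot. A minimal C sketch follows; the helper names pid_from_temp_file and pid_from_session_id are hypothetical, not PostgreSQL functions.

```c
#include <assert.h>
#include <ctype.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: PID embedded in a temp file name "pgsql_tmp<pid>.<n>".
 * Returns -1 if the name does not have that shape. */
long pid_from_temp_file(const char *name)
{
    static const char prefix[] = "pgsql_tmp";
    const size_t plen = sizeof(prefix) - 1;
    char *end;
    long pid;

    assert(name != NULL);
    /* strncmp stops at a shorter name's NUL, so name[plen] is only read when safe */
    if (strncmp(name, prefix, plen) != 0 || !isdigit((unsigned char) name[plen]))
        return -1;
    pid = strtol(name + plen, &end, 10);   /* decimal PID, stops at the '.' */
    return (*end == '.') ? pid : -1;
}

/* Hypothetical helper: PID from a session ID "<start-time-hex>.<pid-hex>".
 * Returns -1 if there is no hex field after a dot. */
long pid_from_session_id(const char *sess_id)
{
    const char *dot;
    char *end;
    long pid;

    assert(sess_id != NULL);
    dot = strchr(sess_id, '.');
    if (dot == NULL || !isxdigit((unsigned char) dot[1]))
        return -1;
    pid = strtol(dot + 1, &end, 16);       /* hexadecimal PID */
    return (*end == '\0') ? pid : -1;
}
```

For the sample record both helpers return 877, matching the pid column.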

Oh, so you've got log_temp_files enabled?
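
With log_temp_files set to a value >= 0, FileClose() writes that "temporary file" line when the file's size in kilobytes, computed with integer division as in the patch below (filestats.st_size / 1024), is at least the setting. A sketch of that predicate on its own (the function name is hypothetical): the sample's 87184416-byte file is 85141 kB, so it is logged for any threshold from 0 up to 85141.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical predicate restating the log_temp_files test in FileClose():
 * a negative setting disables logging; otherwise log when size_kB >= setting. */
bool would_log_temp_file(long long size_bytes, int log_temp_files_kb)
{
    assert(size_bytes >= 0);
    if (log_temp_files_kb < 0)
        return false;
    return (size_bytes / 1024) >= log_temp_files_kb;   /* integer division, as in fd.c */
}
```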

Hmm.  If you look at FileClose() in fd.c, you'll discover that that
"temporary file" log message is emitted immediately before unlink'ing
the file.  It looks pretty safe ... but, scratching around, I notice
that there's a CHECK_FOR_INTERRUPTS at the end of ereport().  So a
cancel that was caught by that exact CHECK_FOR_INTERRUPTS call could
provoke this symptom.  The window for this is larger than it might seem
since the CHECK_FOR_INTERRUPTS could be responding to an interrupt that
came in sometime before that.

I think we need to re-order the operations there to ensure that the
unlink will still happen if the ereport gets interrupted.
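
A simplified model of the race described above: setjmp/longjmp stands in for the error exit that a pending cancel causes when CHECK_FOR_INTERRUPTS runs at the end of ereport(). PostgreSQL's real error recovery likewise longjmps back to an outer handler, but through far more machinery, and every name in this sketch is hypothetical. With the cancel already pending, the log-then-unlink order skips the unlink, while the unlink-then-log order removes the file before the interrupt is serviced.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

static jmp_buf error_context;   /* where a simulated ERROR jumps back to */
static bool cancel_pending;     /* in the server this is set by a signal handler */
static bool file_removed;

/* Models CHECK_FOR_INTERRUPTS(): a pending cancel turns into an error exit. */
static void check_for_interrupts(void)
{
    if (cancel_pending)
    {
        cancel_pending = false;
        longjmp(error_context, 1);
    }
}

/* Models ereport(LOG, ...): write the message, then check for interrupts. */
static void log_message(void)
{
    check_for_interrupts();
}

static void remove_file(void) { file_removed = true; }

/* Old order: log first, then unlink. */
static void close_temp_log_then_unlink(void)
{
    log_message();
    remove_file();
}

/* Reordered: unlink first, then log. */
static void close_temp_unlink_then_log(void)
{
    remove_file();
    log_message();
}

/* Run one close routine with a cancel already pending; report whether the
 * file was removed, whether or not the routine finished normally. */
bool simulate_close(void (*close_fn)(void))
{
    file_removed = false;
    cancel_pending = true;
    if (setjmp(error_context) == 0)
        close_fn();
    return file_removed;
}
```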

            regards, tom lane

Re: [GENERAL] Removing pgsql_tmp files

From
Alvaro Herrera
Date:
Excerpts from Tom Lane's message of Mon Nov 08 22:29:28 -0300 2010:

> Hmm.  If you look at FileClose() in fd.c, you'll discover that that
> "temporary file" log message is emitted immediately before unlink'ing
> the file.  It looks pretty safe ... but, scratching around, I notice
> that there's a CHECK_FOR_INTERRUPTS at the end of ereport().  So a
> cancel that was caught by that exact CHECK_FOR_INTERRUPTS call could
> provoke this symptom.  The window for this is larger than it might seem
> since the CHECK_FOR_INTERRUPTS could be responding to an interrupt that
> came in sometime before that.
>
> I think we need to re-order the operations there to ensure that the
> unlink will still happen if the ereport gets interrupted.

Would it work to put the removal inside a PG_CATCH block?
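
Roughly what this suggestion could look like. PostgreSQL's PG_TRY/PG_CATCH macros are built on sigsetjmp and a stack of exception handlers; this sketch hand-rolls the same shape with plain setjmp so it compiles on its own, and every name in it is hypothetical. The catch branch performs the unlink and then re-throws, so the error still propagates but the file is gone.

```c
#include <assert.h>
#include <setjmp.h>
#include <stdbool.h>

static jmp_buf *exception_stack;   /* innermost active handler */
static bool cancel_pending;
static bool file_removed;

static void throw_error(void) { longjmp(*exception_stack, 1); }

/* Models ereport(LOG, ...) ending in an interrupt check. */
static void log_message(void)
{
    if (cancel_pending)
    {
        cancel_pending = false;
        throw_error();
    }
}

static void remove_file(void) { file_removed = true; }

/* Log-then-unlink, with the unlink guaranteed by a catch block that re-throws. */
static void close_temp_with_catch(void)
{
    jmp_buf *save_stack = exception_stack;
    jmp_buf local;

    if (setjmp(local) == 0)
    {
        exception_stack = &local;       /* "PG_TRY" */
        log_message();
    }
    else
    {
        exception_stack = save_stack;   /* "PG_CATCH": pop handler, clean up, re-throw */
        remove_file();
        throw_error();
    }
    exception_stack = save_stack;       /* normal path */
    remove_file();
}

/* Returns whether the file was removed; *errored says whether an error escaped. */
bool run_close_with_catch(bool cancel, bool *errored)
{
    jmp_buf outer;

    file_removed = false;
    cancel_pending = cancel;
    exception_stack = &outer;
    *errored = false;
    if (setjmp(outer) == 0)
        close_temp_with_catch();
    else
        *errored = true;
    return file_removed;
}
```

As the reply below points out, this still leaves open what exactly the handler should do; the patch that follows takes the reordering route instead.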

--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

Re: [GENERAL] Removing pgsql_tmp files

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Excerpts from Tom Lane's message of Mon Nov 08 22:29:28 -0300 2010:
>> I think we need to re-order the operations there to ensure that the
>> unlink will still happen if the ereport gets interrupted.

> Would it work to put the removal inside a PG_CATCH block?

Well, that still begs the question of what to do exactly.  After some
thought I believe the attached is the best fix.

            regards, tom lane

diff --git a/src/backend/storage/file/fd.c b/src/backend/storage/file/fd.c
index d9ab5e1ea2452131c2778acca6ad913ad4b333af..fd5ec7805fdcaedf73c3fa6aaa1a35970cf8e6db 100644
*** a/src/backend/storage/file/fd.c
--- b/src/backend/storage/file/fd.c
*************** void
*** 1032,1038 ****
  FileClose(File file)
  {
      Vfd           *vfdP;
-     struct stat filestats;

      Assert(FileIsValid(file));

--- 1032,1037 ----
*************** FileClose(File file)
*** 1055,1069 ****
      }

      /*
!      * Delete the file if it was temporary
       */
      if (vfdP->fdstate & FD_TEMPORARY)
      {
!         /* reset flag so that die() interrupt won't cause problems */
          vfdP->fdstate &= ~FD_TEMPORARY;
          if (log_temp_files >= 0)
          {
!             if (stat(vfdP->fileName, &filestats) == 0)
              {
                  if ((filestats.st_size / 1024) >= log_temp_files)
                      ereport(LOG,
--- 1054,1089 ----
      }

      /*
!      * Delete the file if it was temporary, and make a log entry if wanted
       */
      if (vfdP->fdstate & FD_TEMPORARY)
      {
!         /*
!          * If we get an error, as could happen within the ereport/elog calls,
!          * we'll come right back here during transaction abort.  Reset the
!          * flag to ensure that we can't get into an infinite loop.  This code
!          * is arranged to ensure that the worst-case consequence is failing
!          * to emit log message(s), not failing to attempt the unlink.
!          */
          vfdP->fdstate &= ~FD_TEMPORARY;
+
          if (log_temp_files >= 0)
          {
!             struct stat filestats;
!             int        stat_errno;
!
!             /* first try the stat() */
!             if (stat(vfdP->fileName, &filestats))
!                 stat_errno = errno;
!             else
!                 stat_errno = 0;
!
!             /* in any case do the unlink */
!             if (unlink(vfdP->fileName))
!                 elog(LOG, "could not unlink file \"%s\": %m", vfdP->fileName);
!
!             /* and last report the stat results */
!             if (stat_errno == 0)
              {
                  if ((filestats.st_size / 1024) >= log_temp_files)
                      ereport(LOG,
*************** FileClose(File file)
*** 1072,1081 ****
                                      (unsigned long) filestats.st_size)));
              }
              else
                  elog(LOG, "could not stat file \"%s\": %m", vfdP->fileName);
          }
-         if (unlink(vfdP->fileName))
-             elog(LOG, "could not unlink file \"%s\": %m", vfdP->fileName);
      }

      /* Unregister it from the resource owner */
--- 1092,1108 ----
                                      (unsigned long) filestats.st_size)));
              }
              else
+             {
+                 errno = stat_errno;
                  elog(LOG, "could not stat file \"%s\": %m", vfdP->fileName);
+             }
+         }
+         else
+         {
+             /* easy case, just do the unlink */
+             if (unlink(vfdP->fileName))
+                 elog(LOG, "could not unlink file \"%s\": %m", vfdP->fileName);
          }
      }

      /* Unregister it from the resource owner */
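
The essential ordering in the patch can be reproduced outside the server: stat() first, save its errno, unlink() unconditionally, and only then format any message. Saving errno matters because unlink() may overwrite it, which is why the patch restores errno = stat_errno before the elog() that uses %m. Below is a userspace sketch using plain POSIX calls; close_temp_file is a hypothetical name, strerror() stands in for %m, the message goes into a buffer rather than the server log, and a failed unlink is reported only through the return value.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

/* Hypothetical userspace analogue of the patched FileClose() logic.
 * Formats the would-be log line into msg instead of calling elog/ereport.
 * Returns 0 if the unlink succeeded, -1 otherwise. */
int close_temp_file(const char *path, long threshold_kb, char *msg, size_t msglen)
{
    struct stat st;
    int stat_errno;
    int rc;

    assert(msg != NULL && msglen > 0);
    msg[0] = '\0';

    if (threshold_kb < 0)
        return (unlink(path) == 0) ? 0 : -1;   /* easy case: no logging wanted */

    /* first try the stat(), remembering its errno */
    stat_errno = (stat(path, &st) != 0) ? errno : 0;

    /* in any case do the unlink, before any reporting happens */
    rc = (unlink(path) == 0) ? 0 : -1;

    /* and last report the stat results, using the saved errno */
    if (stat_errno == 0)
    {
        if (st.st_size / 1024 >= threshold_kb)
            snprintf(msg, msglen, "temporary file: path \"%s\", size %lu",
                     path, (unsigned long) st.st_size);
    }
    else
        snprintf(msg, msglen, "could not stat file \"%s\": %s",
                 path, strerror(stat_errno));
    return rc;
}
```

In this arrangement an interruption during reporting could only lose the message, never the unlink, which is the property the comment in the patch describes.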