Mac OS X: system shutdown prevents checkpoint

From: Tom Lane
Subject: Mac OS X: system shutdown prevents checkpoint
Msg-id: 17395.1020144386@sss.pgh.pa.us
Replies: Re: Mac OS X: system shutdown prevents checkpoint ("Christopher Kings-Lynne" <chriskl@familyhealth.com.au>)
         Re: Mac OS X: system shutdown prevents checkpoint (Peter Bierman <bierman@apple.com>)
List: pgsql-hackers

I've been looking into Francois Suter's recent reports of Postgres not
shutting down cleanly on Mac OS X 10.1.  I find that it's quite
reproducible.  If you tell the system to shut down in the normal
fashion (eg, pick "Shut Down" from the Apple menu), the postmaster
does not terminate, leading to WAL recovery upon restart --- or
even worse, failure to restart if the postmaster PID recorded in the
lockfile happens to get assigned to some other daemon.
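
The lockfile failure mode works roughly like this: at startup the server reads the PID stored in the old lockfile and asks whether a process with that PID still exists. The sketch below is a simplified, hypothetical check of that general kind (the function name `lockfile_owner_alive` is invented for illustration and is not PostgreSQL's code). It uses `kill(pid, 0)`, which only tests for existence and permission, and so it cannot distinguish a still-running postmaster from an unrelated daemon that later received the same PID.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>

/* Hypothetical stale-lockfile check.
 * Returns 1 if the PID in the lockfile currently exists (startup should refuse),
 * 0 if the lockfile is absent or its PID is gone (stale), -1 on a read error. */
static int lockfile_owner_alive(const char *lockfile)
{
    FILE *f = fopen(lockfile, "r");
    if (f == NULL)
        return errno == ENOENT ? 0 : -1;

    long pid = 0;
    int n = fscanf(f, "%ld", &pid);
    fclose(f);
    if (n != 1 || pid <= 0)
        return -1;

    /* Signal 0 sends nothing; it only checks that the PID exists and that we
     * may signal it.  A PID reused by some other daemon passes this test too. */
    if (kill((pid_t) pid, 0) == 0)
        return 1;
    return errno == EPERM ? 1 : 0;   /* EPERM: exists, but owned by another user */
}
```

Because any live process satisfies the check, a lockfile left behind by an unclean shutdown blocks restart whenever its PID has been handed to another process.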

Observe the normal trace of postmaster shutdown (running with -d4,
logging of timestamps and PIDs enabled):

2002-04-30 00:08:30 [315]    DEBUG:  pmdie 15
2002-04-30 00:08:30 [315]    DEBUG:  smart shutdown request
2002-04-30 00:08:30 [331]    DEBUG:  shutting down
2002-04-30 00:08:32 [331]    DEBUG:  database system is shut down
2002-04-30 00:08:32 [331]    DEBUG:  proc_exit(0)
2002-04-30 00:08:32 [331]    DEBUG:  shmem_exit(0)
2002-04-30 00:08:32 [331]    DEBUG:  exit(0)
2002-04-30 00:08:32 [315]    DEBUG:  reaping dead processes
2002-04-30 00:08:32 [315]    DEBUG:  proc_exit(0)
2002-04-30 00:08:32 [315]    DEBUG:  shmem_exit(0)
2002-04-30 00:08:32 [315]    DEBUG:  exit(0)

The postmaster (here PID 315) forks a subprocess to flush shared buffers
and checkpoint the WAL log.  When the subprocess exits, the postmaster
removes its lockfile and shuts down.  The subprocess takes a minimum of
2 seconds because there's a sleep(2) in the checkpoint fsync code.
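
The normal sequence in the trace can be sketched in C as follows. This is a minimal illustration under stated assumptions, not the postmaster's actual code: `handle_sigterm`, `install_sigterm_handler`, `do_checkpoint` and `smart_shutdown` are hypothetical names, and the checkpoint is simulated by the `sleep(2)` mentioned above. The parent records SIGTERM in a flag, forks a child to do the checkpoint, reaps it, and only then removes its lockfile.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Set asynchronously by the SIGTERM handler; checked by the main loop. */
static volatile sig_atomic_t shutdown_requested = 0;

static void handle_sigterm(int signo)
{
    (void) signo;
    shutdown_requested = 1;   /* "pmdie 15": only record the request */
}

/* Install the handler; returns 0 on success, -1 on failure. */
static int install_sigterm_handler(void)
{
    struct sigaction sa;
    memset(&sa, 0, sizeof sa);
    sa.sa_handler = handle_sigterm;
    sigemptyset(&sa.sa_mask);
    return sigaction(SIGTERM, &sa, NULL);
}

/* Hypothetical stand-in for flushing shared buffers and checkpointing WAL. */
static void do_checkpoint(void)
{
    sleep(2);   /* mirrors the sleep(2) in the checkpoint fsync code */
}

/* "smart shutdown request": fork the checkpoint child, reap it, then remove
 * the lockfile.  Returns the child's exit status, or -1 on any failure. */
static int smart_shutdown(const char *lockfile)
{
    pid_t child = fork();
    if (child < 0) {
        perror("fork");         /* an ordinary fork() failure is reported here */
        return -1;
    }
    if (child == 0) {
        do_checkpoint();        /* child: "shutting down" ... */
        _exit(0);               /* ... "database system is shut down" */
    }

    int status;
    while (waitpid(child, &status, 0) < 0) {   /* "reaping dead processes" */
        if (errno != EINTR) {
            perror("waitpid");
            return -1;
        }
    }
    if (unlink(lockfile) != 0) {   /* the lockfile goes only after the child exits */
        perror("unlink");
        return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

In this model, if `fork()` never returns, neither branch runs, nothing after "smart shutdown request" is logged, and the lockfile is left in place, which matches the symptom described below.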

Now here's what I see in the case of shutting down the OS X system:

2002-04-30 00:25:35 [376]    DEBUG:  pmdie 15
2002-04-30 00:25:35 [376]    DEBUG:  smart shutdown request

... and nothing more.  Actual system shutdown (power down) occurred at
approximately 00:26:06 by my watch, more than thirty seconds after the
postmaster received SIGTERM.  So there was plenty of time to do the
checkpoint subprocess.  (Indeed, I believe that thirty seconds is the
grace period Darwin's init process allows SIGTERM'd processes before
giving up and hard-killing them.  So the system was actually sitting and
waiting for the postmaster.)
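
The grace-period behavior described above (SIGTERM, wait, then hard kill) can be modeled as below. This is only a sketch of the general protocol, not Darwin's init: the function `terminate_with_grace` is hypothetical, the grace period is a parameter because the thirty-second figure is a belief rather than a documented value, and the sketch can only reap its own children because it uses `waitpid`.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <time.h>
#include <unistd.h>

/* Send SIGTERM, poll for up to grace_seconds, then SIGKILL.
 * Returns 0 if the child exited on its own, 1 if it had to be killed,
 * -1 on error. */
static int terminate_with_grace(pid_t pid, int grace_seconds)
{
    if (kill(pid, SIGTERM) != 0)
        return -1;

    time_t deadline = time(NULL) + grace_seconds;
    while (time(NULL) < deadline) {
        pid_t r = waitpid(pid, NULL, WNOHANG);
        if (r == pid)
            return 0;                  /* exited within the grace period */
        if (r < 0 && errno != EINTR)
            return -1;
        struct timespec ts = { 0, 100L * 1000 * 1000 };   /* poll every 100 ms */
        nanosleep(&ts, NULL);
    }

    kill(pid, SIGKILL);                /* grace period over: hard kill */
    while (waitpid(pid, NULL, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return 1;
}
```

Under this model a process that is stuck and never exits costs the full grace period before it is killed, which is consistent with the roughly thirty-second delay observed before power-down.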

What we appear to have here is that the kernel is not allowing the
postmaster to fork a checkpoint subprocess.  But there's no indication
that the postmaster got a fork() error return, either.  Seems like it's
just hung.
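
One way to tell a hung `fork()` from an unnoticed error return is to bracket the call with log lines that are flushed immediately. The wrapper below is a hypothetical diagnostic (`traced_fork` is an invented name, not a PostgreSQL function): if the log shows "about to fork" but neither "fork failed" nor "forked child", the call itself never returned.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical diagnostic wrapper around fork(): each outcome gets its own
 * flushed log line, so a log that stops at "about to fork" implies a hang. */
static pid_t traced_fork(FILE *log)
{
    fprintf(log, "[%ld] about to fork at %ld\n", (long) getpid(), (long) time(NULL));
    fflush(log);   /* flush first so the child does not inherit buffered text */

    pid_t pid = fork();
    if (pid < 0) {
        int saved = errno;
        fprintf(log, "[%ld] fork failed: %s\n", (long) getpid(), strerror(saved));
        fflush(log);
        errno = saved;   /* preserve errno for the caller */
    } else if (pid == 0) {
        fprintf(log, "[%ld] child running\n", (long) getpid());
        fflush(log);
    } else {
        fprintf(log, "[%ld] forked child %ld\n", (long) getpid(), (long) pid);
        fflush(log);
    }
    return pid;
}
```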

Does this ring a bell with anyone?  Is it an OS X bug, or a "feature";
and if the latter, how can we work around it?
        regards, tom lane

