Re: Dynamic Shared Memory stuff

From: Noah Misch
Subject: Re: Dynamic Shared Memory stuff
Msg-id: 20131210231253.GB1299924@tornado.leadboat.com
In reply to: Re: Dynamic Shared Memory stuff (Heikki Linnakangas <hlinnakangas@vmware.com>)
Replies: Re: Dynamic Shared Memory stuff
         Re: Dynamic Shared Memory stuff
List: pgsql-hackers

On Tue, Dec 10, 2013 at 07:50:20PM +0200, Heikki Linnakangas wrote:
> On 12/10/2013 07:27 PM, Noah Misch wrote:
> >On Thu, Dec 05, 2013 at 06:12:48PM +0200, Heikki Linnakangas wrote:
> >>>On Wed, Nov 20, 2013 at 8:32 AM, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> >>>>* As discussed in the "Something fishy happening on frogmouth" thread, I
> >>>>don't like the fact that the dynamic shared memory segments will be
> >>>>permanently leaked if you kill -9 postmaster and destroy the data directory.

> >>I really think we need to do something about it. To use your earlier
> >>example of parallel sort, it's not acceptable to permanently leak a 512
> >>GB segment on a system with 1 TB of RAM.
> >
> >I don't.  Erasing your data directory after an unclean shutdown voids any
> >expectations for a thorough, automatic release of system resources.  Don't do
> >that.  The next time some new use of a persistent resource violates your hope
> >for this scenario, there may be no remedy.
> 
> Well, the point of erasing the data directory is to release system
> resources. I would normally expect "killall -9 <process>; rm -rf
> <data dir>" to thoroughly get rid of the running program and all the
> resources. It's surprising enough that the regular shared memory
> segment is left behind

Your expectation is misplaced.  Processes and files are simply not the only
persistent system resources of interest.

> but at least that one gets cleaned up when
> you start a new server (on same port).

In the most-typical case, yes.  In rare cases involving multiple postmasters
starting and stopping, the successor to the erased data directory will not
clean up the sysv segment.

> Let's not add more cases like that, if we can avoid it.

Only if we can avoid it for a modicum of effort and feature compromise.
You're asking for PostgreSQL to reshape its use of persistent resources so you
can throw around "killall -9 postgres; rm -rf $PGDATA" without so much as a
memory leak.  That use case, not PostgreSQL, has the defect here.

> BTW, what if the data directory is seriously borked, and the server
> won't start? Sure, don't do that, but it would be nice to have a way
> to recover if you do anyway. (docs?)

If something is corrupting your data directory in an open-ended manner, you
have bigger problems than a memory leak until reboot.  Recovering DSM happens
before we read the control file, so the damage would need to fall among a
short list of files for this to happen (bugs excluded).  Nonetheless, I don't
object to documenting the varieties of system resources that PostgreSQL may
reserve and referencing the OS facilities for inspecting them.
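As one illustration of such OS facilities (not PostgreSQL documentation): on Linux, POSIX shared memory objects appear as files under /dev/shm, while SysV segments can be listed with `ipcs -m` and removed with `ipcrm -m <shmid>`. The Python sketch below lists /dev/shm entries that look like PostgreSQL DSM segments. It assumes the POSIX DSM implementation names segments "PostgreSQL.<number>"; check that against dsm_impl.c for the release in use. The function name `list_dsm_segments` is hypothetical.

```python
import os
import re

# Assumption: PostgreSQL's POSIX DSM segments are named "/PostgreSQL.<number>",
# which on Linux shows up as the file /dev/shm/PostgreSQL.<number>.
DSM_NAME = re.compile(r"PostgreSQL\.\d+")


def list_dsm_segments(shm_dir="/dev/shm"):
    """Return (name, size_in_bytes) for entries that look like DSM segments."""
    found = []
    try:
        entries = os.listdir(shm_dir)
    except FileNotFoundError:
        return found  # no such directory on this system
    for entry in sorted(entries):
        if DSM_NAME.fullmatch(entry):
            found.append((entry, os.path.getsize(os.path.join(shm_dir, entry))))
    return found
```

Deleting such a file is the manual counterpart of shm_unlink(); do it only once you are sure no running server still needs the segment.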

Are you actually using PostgreSQL this way: frequent "killall -9 postgres; rm
-rf $PGDATA" after arbitrarily-bad $PGDATA corruption?  Some automated fault
injection test rig, perhaps?

> >>One idea is to create the shared memory object with shm_open, and wait
> >>until all the worker processes that need it have attached to it. Then,
> >>shm_unlink() it, before using it for anything. That way the segment will
> >>be automatically released once all the processes close() it, or die. In
> >>particular, kill -9 will release it. (This is a variant of my earlier
> >>idea to create a small number of anonymous shared memory file
> >>descriptors in postmaster startup with shm_open(), and pass them down to
> >>child processes with fork()). I think you could use that approach with
> >>SysV shared memory as well, by destroying the segment with
> >>shmctl(IPC_RMID) immediately after all processes have attached to it.
> >
> >That leaves a window in which we still leak the segment,
> 
> A small window is better than a large one.

Yes.
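A minimal Python sketch of the attach-then-unlink idea, under stated assumptions. It relies on Linux exposing shm_open() objects as files on the tmpfs at /dev/shm, so os.open()/os.unlink() there stand in for shm_open()/shm_unlink(). It uses fork() plus two pipes as a hypothetical way for the leader to learn that every worker has attached. This is not PostgreSQL's DSM code; `attach_then_unlink` and the `dsm_sketch.<pid>` name are made up for illustration.

```python
import mmap
import os
import tempfile

# Assumption (Linux): shm_open() objects are files on the tmpfs at /dev/shm.
# Elsewhere fall back to the temp dir; unlinking a mapped file behaves alike.
SHM_DIR = "/dev/shm" if os.path.isdir("/dev/shm") else tempfile.gettempdir()


def read_exactly(fd, n):
    """Block until n bytes have been read from a pipe."""
    buf = b""
    while len(buf) < n:
        chunk = os.read(fd, n - len(buf))
        if not chunk:
            raise EOFError("pipe closed early")
        buf += chunk
    return buf


def attach_then_unlink(n_workers=3, size=4096):
    """Create a named segment, let workers attach, then unlink the name."""
    name = f"dsm_sketch.{os.getpid()}"  # hypothetical naming scheme
    path = os.path.join(SHM_DIR, name)

    # Leader: create, size and map the segment under a name.
    fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o600)
    os.ftruncate(fd, size)
    seg = mmap.mmap(fd, size)
    os.close(fd)  # the mapping alone keeps the segment alive

    ready_r, ready_w = os.pipe()  # worker -> leader: "attached"
    go_r, go_w = os.pipe()        # leader -> worker: "name removed, go"
    pids = []
    for i in range(n_workers):
        pid = os.fork()
        if pid == 0:  # worker process
            status = 1
            try:
                wfd = os.open(path, os.O_RDWR)  # attach by name
                mine = mmap.mmap(wfd, size)
                os.close(wfd)
                os.write(ready_w, b"!")
                read_exactly(go_r, 1)  # wait until the leader has unlinked
                mine[i] = i + 1        # now actually use the segment
                status = 0
            finally:
                os._exit(status)
        pids.append(pid)

    read_exactly(ready_r, n_workers)  # every worker is attached...
    os.unlink(path)                   # ...so the leak window closes here
    os.write(go_w, b"!" * n_workers)
    for pid in pids:
        os.waitpid(pid, 0)
    for p in (ready_r, ready_w, go_r, go_w):
        os.close(p)

    # Only existing mappings reach the memory now; the kernel frees it when
    # the last one disappears, including when every holder is kill -9ed.
    return list(seg[:n_workers]), os.path.exists(path)
```

attach_then_unlink(3) returns ([1, 2, 3], False): the workers used the segment after its name was gone. The leak window is the span between the O_CREAT open and os.unlink(path).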

> Another refinement is to wait for all the processes to attach before
> setting the segment's size with ftruncate(). That way, when the
> window is open for leaking the segment, it's still 0-sized so
> leaking it is not a big deal.
> 
> >and it is less
> >general: not every use of DSM is conducive to having all processes attach in a
> >short span of time.
> 
> Let's cross that bridge when we get there. AFAICS it fits all the
> use cases discussed thus far.

It does fit the use cases discussed thus far.
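Heikki's zero-size refinement can be sketched as follows. This is a hypothetical Python illustration assuming /dev/shm on Linux; workers are simulated as extra file descriptors in one process, and the synchronization a real implementation would need (workers waiting until the leader has set the size before mapping) is left out. One detail the sketch makes visible: a 0-byte object cannot be mmap()ed, so during the window a worker's attachment is an open descriptor, which is also what keeps the unlinked object alive until the size is set.

```python
import mmap
import os
import tempfile

# Assumption (Linux): shm_open() objects are files on the tmpfs at /dev/shm.
SHM_DIR = "/dev/shm" if os.path.isdir("/dev/shm") else tempfile.gettempdir()


def zero_size_until_unlinked(n_workers=2, size=8192):
    """Open the segment at size 0, unlink it, and only then ftruncate() it."""
    name = f"dsm_zero.{os.getpid()}"  # hypothetical naming scheme
    path = os.path.join(SHM_DIR, name)

    # Leader creates the named object but does not size it yet.
    leader_fd = os.open(path, os.O_CREAT | os.O_EXCL | os.O_RDWR, 0o600)
    # Simulated workers attach by opening the name (descriptor, no mapping).
    worker_fds = [os.open(path, os.O_RDWR) for _ in range(n_workers)]
    size_in_window = os.fstat(leader_fd).st_size  # what a crash now would leak

    os.unlink(path)                # the name is gone: nothing left to leak
    os.ftruncate(leader_fd, size)  # only now give the segment its real size

    maps = []
    for fd in [leader_fd, *worker_fds]:
        maps.append(mmap.mmap(fd, os.fstat(fd).st_size))
        os.close(fd)  # each mapping keeps the unlinked object alive

    maps[0][:4] = b"data"  # leader writes; the workers map the same pages
    seen = [bytes(m[:4]) for m in maps[1:]]
    return size_in_window, seen, os.path.exists(path)
```

Here the function returns 0 for the size exposed during the window, b"data" for each simulated worker, and False for the name check.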

nm

-- 
Noah Misch
EnterpriseDB                                 http://www.enterprisedb.com


