Re: POSIX shared memory redux

From A.M.
Subject Re: POSIX shared memory redux
Msg-id BC618525-BB86-41BE-B8B4-D22419C99C45@themactionfaction.com
In response to Re: POSIX shared memory redux  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: POSIX shared memory redux  (Martijn van Oosterhout <kleptog@svana.org>)
Re: POSIX shared memory redux  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Apr 13, 2011, at 9:30 PM, Robert Haas wrote:

> On Wed, Apr 13, 2011 at 6:11 PM, A.M. <agentm@themactionfaction.com> wrote:
>>> I don't see why we need to get rid of SysV shared memory; needing less
>>> of it seems just as good.
>>
>> 1. As long as one keeps SysV shared memory around, the postgresql project
>> has to maintain the annoying platform-specific document on how to
>> configure the poorly named kernel parameters. If the SysV region is very
>> small, that means I can run more postgresql instances within the same
>> kernel limits, but one can still hit the limits. My patch allows the
>> postgresql project to delete that page and the hassles with it.
>>
>> 2. My patch proves that SysV is wholly unnecessary. Are you attached to it? (Pun intended.)
>
> With all due respect, I think this is an unproductive conversation.
> Your patch proves that SysV is wholly unnecessary only if we also
> agree that fcntl() locking is just as reliable as the nattch
> interlock, and Tom and I are trying to explain why we don't believe
> that's the case.  Saying that we're just wrong without responding to
> our points substantively doesn't move the conversation forward.

Sorry, it wasn't meant to be an attack, just a dumb pun. I am trying to argue that, even if the fcntl locking is
unreliable, the startup procedure is just as reliable as it is now. The reasons are:

1) the SysV nattch method's primary purpose is to protect the shmem region. This is no longer necessary in my patch
because the shared memory is unlinked immediately after creation, so only the initial postmaster and its children have
access.

2) the standard postgresql lock file remains the same
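
Point 1 above can be sketched roughly as follows. This is only an illustration of "unlink immediately after creation" with the POSIX calls, not the code from the patch; the function name create_anonymous_shmem and any segment name passed to it are hypothetical.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Create a POSIX shared memory segment, remove its name at once, and map it.
 * "name" is a hypothetical segment name such as "/pg_demo_shm"; the real
 * patch may pick names differently.
 * Returns the mapped address, or NULL on failure.
 */
void *
create_anonymous_shmem(const char *name, size_t size)
{
    void *addr;
    int   fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);

    if (fd < 0)
        return NULL;

    /* Drop the name right away: no unrelated process can open it by name. */
    shm_unlink(name);

    if (ftruncate(fd, (off_t) size) != 0)
    {
        close(fd);
        return NULL;
    }

    addr = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);                  /* the mapping stays valid after close */

    return (addr == MAP_FAILED) ? NULL : addr;
}
```

A process that calls this and then forks shares the mapping with its children, while a later shm_open() on the same name by anyone else fails because the name no longer exists.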

Furthermore, there is a case that the SysV nattch check cannot catch but fcntl locking can: if two separate machines
have a postgresql data directory mounted over NFS, postgresql will currently allow both machines to start a postmaster
in that directory, because the SysV nattch check fails on the second machine and the pid in the lock file is the pid
from the first machine, so postgresql will say "starting anyway". With fcntl locking, this can be fixed. A SysV segment
is only present on one kernel.


>
> In case it's not clear, here again is what we're concerned about: A
> System V shm *cannot* be removed until nobody is attached to it.  A
> lock file can be removed, or the lock can be accidentally released by
> the apparently innocuous operation of closing a file descriptor.
>
>> Both you and Tom have somehow assumed that the patch alters current
>> postgresql behavior. In fact, the opposite is true. I haven't changed any
>> of the existing behavior. The "robust" behavior remains. I merely added
>> fcntl interlocking on top of the lock file to replace the SysV shmem check.
>
> This seems contradictory.  If you replaced the SysV shmem check, then
> it's not there, which means you altered the behavior.

From what I understood, the primary purpose of the SysV check was to protect the shared memory from multiple stompers.
The interlock was a neat side effect.

The lock file contents are currently important for getting the pid of a potentially conflicting postmaster. With the
fcntl API, we can return a live conflicting PID (whether a postmaster or a stuck child), so that's an improvement. This
could be used, for example, for STONITH, to reliably kill a dying replication clone: just loop on the pids returned
from the lock.
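
As a rough illustration of what "return a live conflicting PID" means at the API level, here is a minimal sketch using F_SETLK and F_GETLK. It is not taken from the patch; the helper name lock_or_report_holder is hypothetical, and it assumes the descriptor was opened for writing (needed for a write lock).

```c
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Try to take an exclusive advisory lock on the whole file behind fd.
 * Returns 0 if this process now holds the lock, the PID of a conflicting
 * holder if another process holds it, or -1 on error.
 */
pid_t
lock_or_report_holder(int fd)
{
    struct flock lk = {0};

    lk.l_type = F_WRLCK;
    lk.l_whence = SEEK_SET;
    lk.l_start = 0;
    lk.l_len = 0;               /* zero length means "to end of file" */

    if (fcntl(fd, F_SETLK, &lk) == 0)
        return 0;               /* lock acquired */

    /* Ask the kernel who is in the way; it fills in l_pid. */
    lk.l_type = F_WRLCK;
    if (fcntl(fd, F_GETLK, &lk) != 0)
        return -1;

    /* The holder went away between the two calls; the caller may retry. */
    if (lk.l_type == F_UNLCK)
        return -1;

    return lk.l_pid;
}
```

F_GETLK reports one conflicting lock per call, so looping "on the pids returned from the lock" would mean calling this again after each holder is dealt with.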

Even if the fcntl check passes, the pid in the lock file is checked, so the lock file behavior remains the same.

If you were to implement a daemon with a shared data directory but no shared memory, how would you implement the
interlock? Would you still insist on SysV shmem? Unix daemons generally rely on lock files alone. Perhaps there is a
different API on which we can agree.

Cheers,
M
