On Apr 14, 2011, at 8:22 AM, Florian Weimer wrote:
> * Tom Lane:
>
>> Well, the fundamental point is that "ignoring NFS" is not the real
>> world. We can't tell people not to put data directories on NFS,
>> and even if we did tell them not to, they'd still do it. And NFS
>> locking is not trustworthy, because the remote lock daemon can crash
>> and restart (forgetting everything it ever knew) while your own machine
>> and the postmaster remain blissfully awake.
>
> Is this still the case with NFSv4? Does the local daemon still keep
> the lock state?
The lock handling has been fixed in NFSv4.
http://nfs.sourceforge.net/
"NFS Version 4 introduces support for byte-range locking and share reservation. Locking in NFS Version 4 is
lease-based, so an NFS Version 4 client must maintain contact with an NFS Version 4 server to continue extending its
open and lock leases."
http://linux.die.net/man/2/flock
"flock(2) does not lock files over NFS. Use fcntl(2) instead: that does work over NFS, given a sufficiently recent
version of Linux and a server which supports locking."
I would need more time to pin down what "sufficiently recent version of Linux" means, but NFSv4 is likely required.
>
>> None of this is to say that an fcntl lock might not be a useful addition
>> to what we do already. It is to say that fcntl can't just replace what
>> we do already, because there are real-world failure cases that the
>> current solution handles and fcntl alone wouldn't.
>
> If it requires NFS misbehavior (possibly in an older version), and you
> have to start postmasters on separate nodes (which you normally
> wouldn't do), doesn't this make it increasingly unlikely that it's
> going to be triggered in the wild?
With the patch I offer, it would be possible to use shared storage and fail over between PostgreSQL nodes on different
machines over NFS. (The second postmaster blocks and waits for the lock to be released.) Obviously, such a setup isn't
as robust as using replication, but given a sufficiently fail-safe shared storage setup, it could be made reliable.
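The blocking behaviour described above (the standby postmaster waits instead of failing) corresponds to fcntl's F_SETLKW. A minimal sketch, again with a hypothetical helper name wait_lock_file() and a hypothetical lock-file path; this is not the code from the patch:

```c
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper (illustration only): block until an exclusive
 * fcntl() lock on the whole of `path` can be taken. F_SETLKW waits for
 * the current holder to release the lock instead of failing.
 * Returns an fd that holds the lock, or -1 on error. */
static int wait_lock_file(const char *path)
{
    int fd = open(path, O_RDWR | O_CREAT, 0600);
    if (fd < 0)
        return -1;

    struct flock fl;
    memset(&fl, 0, sizeof fl);
    fl.l_type = F_WRLCK;      /* exclusive (write) lock */
    fl.l_whence = SEEK_SET;
    fl.l_start = 0;
    fl.l_len = 0;             /* whole file */

    /* Retry if a signal interrupts the wait; give up on other errors
     * (e.g. EDEADLK or ENOLCK). */
    while (fcntl(fd, F_SETLKW, &fl) != 0) {
        if (errno != EINTR) {
            close(fd);
            return -1;
        }
    }
    return fd;
}
```

Since the lock is released when the holding process exits, a crashed primary frees it without any cleanup step on a local filesystem. Over NFSv4, as I understand the lease model quoted above, the release instead depends on the server expiring the dead client's lease.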
Cheers,
M