Thread: Back-patch use of unnamed POSIX semaphores for Linux?

Back-patch use of unnamed POSIX semaphores for Linux?

From
Tom Lane
Date:
Just saw another report of what's probably systemd killing off Postgres'
SysV semaphores, as we've discussed previously at, eg,
https://www.postgresql.org/message-id/flat/57828C31.5060409%40gmail.com
Since the systemd people are generally impervious to suggestions that
they might be mistaken, I do not expect this problem to go away.

I think we should give serious consideration to back-patching commit
ecb0d20a9, which changed the default semaphore type to unnamed-POSIX
on Linux.  We've seen no problems in the buildfarm in the two months
that that's been in HEAD.  If we don't do this, we can expect to
continue seeing complaints of this sort until pre-v10 PG releases
fall out of use ... and I don't want to wait that long.

Commit ecb0d20a9 also changed the default for FreeBSD.  I'm not convinced
we should back-patch that part, because (a) unnamed-POSIX semas have
only been there since FreeBSD 9.0, which isn't that long ago, and (b)
the argument for switching is "it'll perform better" not "your server
will fail randomly without this change".

Comments?
        regards, tom lane



Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Robert Haas
Date:
On Tue, Dec 6, 2016 at 9:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Just saw another report of what's probably systemd killing off Postgres'
> SysV semaphores, as we've discussed previously at, eg,
> https://www.postgresql.org/message-id/flat/57828C31.5060409%40gmail.com
> Since the systemd people are generally impervious to suggestions that
> they might be mistaken, I do not expect this problem to go away.
>
> I think we should give serious consideration to back-patching commit
> ecb0d20a9, which changed the default semaphore type to unnamed-POSIX
> on Linux.  We've seen no problems in the buildfarm in the two months
> that that's been in HEAD.  If we don't do this, we can expect to
> continue seeing complaints of this sort until pre-v10 PG releases
> fall out of use ... and I don't want to wait that long.
>
> Commit ecb0d20a9 also changed the default for FreeBSD.  I'm not convinced
> we should back-patch that part, because (a) unnamed-POSIX semas have
> only been there since FreeBSD 9.0, which isn't that long ago, and (b)
> the argument for switching is "it'll perform better" not "your server
> will fail randomly without this change".
>
> Comments?

Urk.  That sounds like a scary thing to back-patch.  The fact that the
buildfarm has reported no problems is good as far as it goes, but user
environments can be expected to be considerably more diverse than the
buildfarm.  I wouldn't mind giving users the option to select unnamed
POSIX semas, but I don't think there's any guarantee that that's 100%
certain to work every place where the current implementation works -
and if not, then people will upgrade to the latest minor release and
everything will completely stop working.  Granted, that might not
happen, because maybe unnamed POSIX semas are one of those really
awesome operating system primitives that never has problems on any
system anywhere ever.  But I think it's pretty hard to be certain of
that.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Dec 6, 2016 at 9:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> I think we should give serious consideration to back-patching commit
>> ecb0d20a9, which changed the default semaphore type to unnamed-POSIX
>> on Linux.

> Urk.  That sounds like a scary thing to back-patch.

I don't deny that it's scary, but the alternative seems to be to be
rather badly broken on systemd-using distros for years to come.
That's pretty scary too.

> ... Granted, that might not
> happen, because maybe unnamed POSIX semas are one of those really
> awesome operating system primitives that never has problems on any
> system anywhere ever.  But I think it's pretty hard to be certain of
> that.

You're attacking a straw man.  I didn't propose changing our behavior
anywhere but Linux.  AFAIK, on that platform unnamed POSIX semaphores
are futexes, which have been a stable feature since 2003 according to
https://en.wikipedia.org/wiki/Futex#History.  Anybody who did need
to compile PG for use with a pre-2.6 kernel could override the default,
anyway.

Now, I did think of a problem we'd have to deal with, which is how
to avoid breaking ABI by changing sizeof(PGSemaphoreData).  I think
that's soluble though.
        regards, tom lane



Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Michael Paquier
Date:
On Wed, Dec 7, 2016 at 1:43 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> Urk.  That sounds like a scary thing to back-patch.  The fact that the
> buildfarm has reported no problems is good as far as it goes, but user
> environments can be expected to be considerably more diverse than the
> buildfarm.  I wouldn't mind giving users the option to select unnamed
> POSIX semas, but I don't think there's any guarantee that that's 100%
> certain to work every place where the current implementation works -
> and if not, then people will upgrade to the latest minor release and
> everything will completely stop working.

The potential risks of a minor upgrade breaking things are far higher than
the risks posed by systemd, so -1 for a back-patch from here.
-- 
Michael



Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Tatsuo Ishii
Date:
> The potential risks of a minor upgrade breaking things are far higher than
> the risks posed by systemd, so -1 for a back-patch from here.

As long as we have a compile-time switch to enable or disable the
back-patched feature, it seems acceptable. In the worst case the
back-patch could cause disasters, but then packagers could turn off
the switch and ship updated packages.

Best regards,
--
Tatsuo Ishii
SRA OSS, Inc. Japan
English: http://www.sraoss.co.jp/index_en.php
Japanese:http://www.sraoss.co.jp



Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Craig Ringer
Date:
On 7 December 2016 at 10:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Just saw another report of what's probably systemd killing off Postgres'
> SysV semaphores, as we've discussed previously at, eg,
> https://www.postgresql.org/message-id/flat/57828C31.5060409%40gmail.com
> Since the systemd people are generally impervious to suggestions that
> they might be mistaken, I do not expect this problem to go away.
>
> I think we should give serious consideration to back-patching commit
> ecb0d20a9, which changed the default semaphore type to unnamed-POSIX
> on Linux.  We've seen no problems in the buildfarm in the two months
> that that's been in HEAD.  If we don't do this, we can expect to
> continue seeing complaints of this sort until pre-v10 PG releases
> fall out of use ... and I don't want to wait that long.
>
> Commit ecb0d20a9 also changed the default for FreeBSD.  I'm not convinced
> we should back-patch that part, because (a) unnamed-POSIX semas have
> only been there since FreeBSD 9.0, which isn't that long ago, and (b)
> the argument for switching is "it'll perform better" not "your server
> will fail randomly without this change".

That's a huge change to make for something that doesn't risk data
corruption, and that won't happen when using postgres with distro
packages.

As I understand it, it's only a problem if you're running postgres as
a normal user, not a "system user" which systemd defines at
compile-time as a user < 500 or < 1000 depending on the distro's
default login.conf . So it'll only affect people who're not using
their distro's packages and service mechanism, and are instead running
Pg under some other user, likely started manually with pg_ctl.
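
As a rough illustration of the uid threshold described above, the check can be sketched in Python. The cut-off is fixed when systemd is built, so the default of 999 below is only an assumption (the message mentions 500 or 1000 as the distro-dependent boundaries), and is_system_uid and ipc_cleanup_applies are hypothetical helpers, not systemd or PostgreSQL interfaces.

```python
def is_system_uid(uid: int, system_uid_max: int = 999) -> bool:
    """Return True if uid falls in the range treated as a "system user".

    system_uid_max is an assumed value; the real threshold is chosen when
    systemd is compiled and differs between distributions.
    """
    if uid < 0:
        raise ValueError("uid must be non-negative")
    return uid <= system_uid_max


def ipc_cleanup_applies(uid: int, system_uid_max: int = 999) -> bool:
    """Per the description above, only non-system users are affected."""
    return not is_system_uid(uid, system_uid_max)
```

On a distro whose boundary is 500, calling ipc_cleanup_applies(uid, system_uid_max=499) gives the corresponding answer.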

I see quite a few people who compile their own Pg rather than using
packages, and some who even fail to use the init system and instead
use custom scripts to launch Pg. But pretty much everything I've seen
uses a 'postgres' system-user. Clearly there are exceptions out there
in the wild, but I don't think it makes sense to backpatch this to
satisfy people who are, IMO, doing it wrong in the first place.

Especially since those people can reconfigure systemd not to do this
with the RemoveIPC and KillUserProcesses directives if they insist on
using a non-system user.

If they defined a systemd service to start postgres they'd be fine...
and isn't it pretty much basic sysadmin 101 to use your init system to
start services?

Don't get me wrong, I think systemd's behaviour is pretty stupid.
Mostly in terms of its magic definition of a "system user", which
shouldn't be something determined by a uid threshold at compile time.
But I don't think we should double down on it by backpatching a big
change that hasn't even seen in-the-wild loads from real world use
yet, just to make it easier on people who're doing things backwards in
the first place.

If it were possible to detect that systemd was about to clobber us and
log something informative, _that_ would be very nice to backpatch. I
don't see how that's possible though.

-- 
Craig Ringer                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services



Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Magnus Hagander
Date:
On Wed, Dec 7, 2016 at 7:18 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 7 December 2016 at 10:53, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Just saw another report of what's probably systemd killing off Postgres'
>> SysV semaphores, as we've discussed previously at, eg,
>> https://www.postgresql.org/message-id/flat/57828C31.5060409%40gmail.com
>> Since the systemd people are generally impervious to suggestions that
>> they might be mistaken, I do not expect this problem to go away.
>>
>> I think we should give serious consideration to back-patching commit
>> ecb0d20a9, which changed the default semaphore type to unnamed-POSIX
>> on Linux.  We've seen no problems in the buildfarm in the two months
>> that that's been in HEAD.  If we don't do this, we can expect to
>> continue seeing complaints of this sort until pre-v10 PG releases
>> fall out of use ... and I don't want to wait that long.
>>
>> Commit ecb0d20a9 also changed the default for FreeBSD.  I'm not convinced
>> we should back-patch that part, because (a) unnamed-POSIX semas have
>> only been there since FreeBSD 9.0, which isn't that long ago, and (b)
>> the argument for switching is "it'll perform better" not "your server
>> will fail randomly without this change".
>
> That's a huge change to make for something that doesn't risk data
> corruption, and that won't happen when using postgres with distro
> packages.
>
> As I understand it, it's only a problem if you're running postgres as
> a normal user, not a "system user" which systemd defines at
> compile-time as a user < 500 or < 1000 depending on the distro's
> default login.conf . So it'll only affect people who're not using
> their distro's packages and service mechanism, and are instead running
> Pg under some other user, likely started manually with pg_ctl.
>
> I see quite a few people who compile their own Pg rather than using
> packages, and some who even fail to use the init system and instead
> use custom scripts to launch Pg. But pretty much everything I've seen
> uses a 'postgres' system-user. Clearly there are exceptions out there
> in the wild, but I don't think it makes sense to backpatch this to
> satisfy people who are, IMO, doing it wrong in the first place.
>
> Especially since those people can reconfigure systemd not to do this
> with the RemoveIPC and KillUserProcesses directives if they insist on
> using a non-system user.
>
> If they defined a systemd service to start postgres they'd be fine...
> and isn't it pretty much basic sysadmin 101 to use your init system to
> start services?
>
> Don't get me wrong, I think systemd's behaviour is pretty stupid.
> Mostly in terms of its magic definition of a "system user", which
> shouldn't be something determined by a uid threshold at compile time.
> But I don't think we should double down on it by backpatching a big
> change that hasn't even seen in-the-wild loads from real world use
> yet, just to make it easier on people who're doing things backwards in
> the first place.

+1 (or several).

I don't think we should backpatch something that carries risk for people
who do things "the right way" to help those that don't. Even if the
behavior is stupid.

 

> If it were possible to detect that systemd was about to clobber us and
> log something informative, _that_ would be very nice to backpatch. I
> don't see how that's possible though.

Surely there must be some way to ask systemd about its configuration? And
if we have that, then we could possibly teach pg_ctl about that and have it
throw a big warning?
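
One way such a check might look, as a minimal Python sketch rather than anything pg_ctl actually does: read the text of logind.conf and report which of the two directives named in this thread are in effect. The helper names are hypothetical, the fallback defaults (RemoveIPC on, KillUserProcesses off) are assumptions that vary by systemd version and distro, and the sketch ignores drop-in configuration directories.

```python
def parse_logind_settings(text: str) -> dict:
    """Collect key=value pairs from the [Login] section of logind.conf text."""
    settings = {}
    section = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", ";")):
            continue  # blank line or comment
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1]
            continue
        if section == "Login" and "=" in line:
            key, _, value = line.partition("=")
            settings[key.strip()] = value.strip()
    return settings


def is_enabled(value: str) -> bool:
    """Interpret a systemd-style boolean string."""
    return value.strip().lower() in ("1", "yes", "true", "on")


def risky_settings(text: str, defaults=None) -> list:
    """Return the cleanup directives that are effectively switched on."""
    # Assumed defaults; real ones depend on the systemd build.
    defaults = defaults or {"RemoveIPC": "yes", "KillUserProcesses": "no"}
    merged = {**defaults, **parse_logind_settings(text)}
    names = ("RemoveIPC", "KillUserProcesses")
    return [name for name in names if is_enabled(merged.get(name, "no"))]
```

A caller could read /etc/systemd/logind.conf, pass its contents to risky_settings, and print a warning if the list is non-empty and the server is not running as a system user.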

--

Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Peter Eisentraut
Date:
On 12/6/16 9:53 PM, Tom Lane wrote:
> I think we should give serious consideration to back-patching commit
> ecb0d20a9, which changed the default semaphore type to unnamed-POSIX
> on Linux.

Even with that change, dynamic shared memory is still vulnerable to being
removed.  So backpatching the semaphore change wouldn't achieve any new
level of safety for users so that we could tell them, "you're good now".

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Stephen Frost
Date:
All,

* Peter Eisentraut (peter.eisentraut@2ndquadrant.com) wrote:
> On 12/6/16 9:53 PM, Tom Lane wrote:
> > I think we should give serious consideration to back-patching commit
> > ecb0d20a9, which changed the default semaphore type to unnamed-POSIX
> > on Linux.
>
> Even with that change, dynamic shared memory is still vulnerable to being
> removed.  So backpatching the semaphore change wouldn't achieve any new
> level of safety for users so that we could tell them, "you're good now".

I tend to agree with Peter, Magnus, and Craig on this.  If you aren't
running PG as a system user on a system where systemd feels happy to
kill processes and remove shared memory segments and semaphores when you
log out, no amount of back-patching of anything is going to make you
'safe'.  As noted in the thread referenced, users who are trying to
(mistakenly) do this are already having to modify their logind.conf file
to not have PG outright killed when they log out, it's on them to make
sure systemd doesn't do other stupid things when they log out too if
they really want PG to be run as their user account.

Thanks!

Stephen

Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 12/6/16 9:53 PM, Tom Lane wrote:
>> I think we should give serious consideration to back-patching commit
>> ecb0d20a9, which changed the default semaphore type to unnamed-POSIX
>> on Linux.

> Even with that change, dynamic shared memory is still vulnerable to being
> removed.

Really?  I thought we concluded that it is safe because it is detectably
attached to running processes.  The trouble with SysV semaphores is that
they lack any such attachment, so systemd is left to guess whether they
are still in use.
        regards, tom lane



Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Robert Haas
Date:
On Tue, Dec 6, 2016 at 11:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Dec 6, 2016 at 9:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> I think we should give serious consideration to back-patching commit
>>> ecb0d20a9, which changed the default semaphore type to unnamed-POSIX
>>> on Linux.
>
>> Urk.  That sounds like a scary thing to back-patch.
>
> I don't deny that it's scary, but the alternative seems to be to be
> rather badly broken on systemd-using distros for years to come.
> That's pretty scary too.

Why can't this be configurable?

>> ... Granted, that might not
>> happen, because maybe unnamed POSIX semas are one of those really
>> awesome operating system primitives that never has problems on any
>> system anywhere ever.  But I think it's pretty hard to be certain of
>> that.
>
> You're attacking a straw man.  I didn't propose changing our behavior
> anywhere but Linux.  AFAIK, on that platform unnamed POSIX semaphores
> are futexes, which have been a stable feature since 2003 according to
> https://en.wikipedia.org/wiki/Futex#History.  Anybody who did need
> to compile PG for use with a pre-2.6 kernel could override the default,
> anyway.

Changing the behavior even just on Linux leaves plenty of room for
failure, even if the feature itself has been stable.  For example,
there are Linux machines where POSIX shared memory doesn't work, even
though POSIX shared memory is in general a supported feature on Linux
and has been for a long time.   So, if we were to change from System V
shared memory to POSIX shared memory in a minor release, anyone in
that situation would break.  It's hard to be sure the same thing
wouldn't happen in this case.  The fact that the feature's stable
doesn't prove that it works on every system in every configuration.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Tue, Dec 6, 2016 at 11:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Robert Haas <robertmhaas@gmail.com> writes:
>>> On Tue, Dec 6, 2016 at 9:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>> I think we should give serious consideration to back-patching commit
>>>> ecb0d20a9, which changed the default semaphore type to unnamed-POSIX
>>>> on Linux.

>>> Urk.  That sounds like a scary thing to back-patch.

>> I don't deny that it's scary, but the alternative seems to be to be
>> rather badly broken on systemd-using distros for years to come.
>> That's pretty scary too.

> Why can't this be configurable?

It already is.  Note that I said "default".

As things stand, it's only a configure-time choice, but I've been
thinking that we might be well advised to make it run-time configurable.
I do not believe that anyone's still using a Linux version wherein
POSIX semas wouldn't work, but I am not convinced that the same is true
for FreeBSD.  And a configure-run-time test is not a pleasant option
because it doesn't work for cross-compiles.  So really, on platforms
where we think POSIX semas might work, it'd be best to try a sem_init()
during postmaster start and then fall back to SysV if it doesn't work.
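
The probe-then-fall-back idea can be sketched as follows. This is an illustration in Python via ctypes, not PostgreSQL code: posix_sem_probe and choose_semaphore_api are hypothetical names, the 256-byte buffer is an assumption that it exceeds the platform's sem_t, and a real implementation would live in C inside the postmaster.

```python
import ctypes


def posix_sem_probe() -> bool:
    """Try sem_init()/sem_destroy() once; True means unnamed POSIX semas work."""
    try:
        libc = ctypes.CDLL(None, use_errno=True)  # symbols of the running process
        sem_init, sem_destroy = libc.sem_init, libc.sem_destroy
    except (OSError, AttributeError, TypeError):
        return False  # no usable C library or no sem_init symbol
    sem = ctypes.create_string_buffer(256)  # assumed to be larger than sem_t
    if sem_init(sem, 1, 1) != 0:  # pshared=1, initial value 1
        return False
    sem_destroy(sem)
    return True


def choose_semaphore_api(probe=posix_sem_probe) -> str:
    """Return "posix" if the probe succeeds, otherwise fall back to "sysv"."""
    try:
        ok = bool(probe())
    except Exception:
        ok = False  # any probe failure means we keep the old behaviour
    return "posix" if ok else "sysv"
```

Passing the probe in as a parameter keeps the decision logic separate from the platform call, so the fallback path can be exercised even on a machine where sem_init() works.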

But this is all kind of moot if Peter is right that systemd will zap
POSIX shmem along with SysV semaphores.  I've been trying to reproduce
the issue on a Fedora 25 installation, and so far I can't get it to
zap anything, so I'm a bit at a loss how to prove things one way or
the other.
        regards, tom lane



Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Robert Haas
Date:
On Wed, Dec 7, 2016 at 3:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Tue, Dec 6, 2016 at 11:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> Robert Haas <robertmhaas@gmail.com> writes:
>>>> On Tue, Dec 6, 2016 at 9:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>>>> I think we should give serious consideration to back-patching commit
>>>>> ecb0d20a9, which changed the default semaphore type to unnamed-POSIX
>>>>> on Linux.
>
>>>> Urk.  That sounds like a scary thing to back-patch.
>
>>> I don't deny that it's scary, but the alternative seems to be to be
>>> rather badly broken on systemd-using distros for years to come.
>>> That's pretty scary too.
>
>> Why can't this be configurable?
>
> It already is.  Note that I said "default".
>
> As things stand, it's only a configure-time choice, but I've been
> thinking that we might be well advised to make it run-time configurable.

Sure.  A configure-time choice only benefits people who are compiling
from source, which as far as production is concerned is almost nobody.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Alex Hunsaker
Date:


On Wed, Dec 7, 2016 at 1:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> But this is all kind of moot if Peter is right that systemd will zap
> POSIX shmem along with SysV semaphores.  I've been trying to reproduce
> the issue on a Fedora 25 installation, and so far I can't get it to
> zap anything, so I'm a bit at a loss how to prove things one way or
> the other.

Don't know precisely about Fedora 25, but I've had success in the past with:
1) ssh in as the user
2) start postgres under tmux/screen
3) logout
4) do another ssh login/logout cycle

After logon, you should see "/usr/lib/systemd/systemd --user" running for that
user. After logging out, said proc should exit. If either of those is not true,
either systemd is not set up to track sessions (probably via pam) or it thinks
you still have an active logon. Another way to check if systemd thinks the user
is logged in is if /run/user/$USER exists.
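
The last check can be scripted. The Python sketch below is only an illustration with a hypothetical helper name; it takes a numeric UID because, on the systemd systems this editor is aware of, the per-user runtime directories under /run/user are named by UID, and the base path is a parameter so the function can be tried against any directory.

```python
import os


def has_runtime_dir(uid: int, base: str = "/run/user") -> bool:
    """Return True if a per-user runtime directory exists for uid under base."""
    return os.path.isdir(os.path.join(base, str(uid)))
```

For the current user this would be called as has_runtime_dir(os.getuid()); a False result while the postmaster is still running suggests systemd considers the user fully logged out.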

Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Tom Lane
Date:
Alex Hunsaker <badalex@gmail.com> writes:
> On Wed, Dec 7, 2016 at 1:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> But this is all kind of moot if Peter is right that systemd will zap
>> POSIX shmem along with SysV semaphores.  I've been trying to reproduce
>> the issue on a Fedora 25 installation, and so far I can't get it to
>> zap anything, so I'm a bit at a loss how to prove things one way or
>> the other.

> After logon, you should see "/usr/lib/systemd/systemd --user" running for
> that user. After logout out, said proc should exit.

Hmm ... after further experimentation, I still can't get this version of
systemd (231) to do anything evil.  It turns out that Fedora ships it with
KillUserProcesses turned off by default, and maybe having that on is a
prerequisite for the other behavior?  But that doesn't make a lot of sense
because we'd never be seeing the reports of databases moaning about lost
semaphores if the processes got killed first.  Anyway, I see nothing bad
happening if KillUserProcesses is off, while if it's on then the database
gets shut down reasonably politely via SIGTERM.

Color me confused ... maybe systemd's behavior has changed?
        regards, tom lane



Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Andres Freund
Date:
Hi,


On 2016-12-06 21:53:06 -0500, Tom Lane wrote:
> Just saw another report of what's probably systemd killing off Postgres'
> SysV semaphores, as we've discussed previously at, eg,
> https://www.postgresql.org/message-id/flat/57828C31.5060409%40gmail.com
> Since the systemd people are generally impervious to suggestions that
> they might be mistaken, I do not expect this problem to go away.

Would doing so actually solve the systemd issue? Doesn't systemd also
remove SYSV shared memory, which we still use a tiny bit of?


> I think we should give serious consideration to back-patching commit
> ecb0d20a9, which changed the default semaphore type to unnamed-POSIX
> on Linux.  We've seen no problems in the buildfarm in the two months
> that that's been in HEAD.  If we don't do this, we can expect to
> continue seeing complaints of this sort until pre-v10 PG releases
> fall out of use ... and I don't want to wait that long.

I'm a bit uncomfortable backpatching this change before it has seen
production usage. Both the POSIX and SysV semaphore implementations have
evolved over the years, with changing performance characteristics. I
don't think it's fair to users to swap a proven solution out for
something that hasn't seen a lot of load.


Greetings,

Andres Freund



Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Andres Freund
Date:
On 2016-12-06 23:54:43 -0500, Tom Lane wrote:
> You're attacking a straw man.  I didn't propose changing our behavior
> anywhere but Linux.  AFAIK, on that platform unnamed POSIX semaphores
> are futexes, which have been a stable feature since 2003 according to
> https://en.wikipedia.org/wiki/Futex#History.  Anybody who did need
> to compile PG for use with a pre-2.6 kernel could override the default,
> anyway.

Back then futexes weren't "robust" though (crash handling and such was
unusable). They only started to be reliable in the ~2007-2008 time frame
IIRC.  That still should be ok though.

Regards,

Andres



Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Alvaro Herrera
Date:
Tom Lane wrote:

> Hmm ... after further experimentation, I still can't get this version of
> systemd (231) to do anything evil.  It turns out that Fedora ships it with
> KillUserProcesses turned off by default, and maybe having that on is a
> prerequisite for the other behavior?  But that doesn't make a lot of sense
> because we'd never be seeing the reports of databases moaning about lost
> semaphores if the processes got killed first.  Anyway, I see nothing bad
> happening if KillUserProcesses is off, while if it's on then the database
> gets shut down reasonably politely via SIGTERM.
> 
> Color me confused ... maybe systemd's behavior has changed?

https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/ZNQW72UP36UAFMX53HPFFQTWTQDZVJ3M/

-- 
Álvaro Herrera                https://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Tom Lane
Date:
Alvaro Herrera <alvherre@2ndquadrant.com> writes:
> Tom Lane wrote:
>> Color me confused ... maybe systemd's behavior has changed?

> https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/ZNQW72UP36UAFMX53HPFFQTWTQDZVJ3M/

I see Lennart hasn't gotten any less convinced that he's always right
since I left Red Hat :-(

But anyway, it's a demonstrable fact that Fedora 25 has KillUserProcesses
off by default, even though it contains systemd-231.  I assume FESCO
brought down the hammer at some point.

This still doesn't address the real question, which is whether RemoveIPC
does anything if KillUserProcesses is off, and whether that behavior
has changed.  I don't see anything about RemoveIPC in that thread.
        regards, tom lane



Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Robert Haas
Date:
On Wed, Dec 7, 2016 at 6:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Alvaro Herrera <alvherre@2ndquadrant.com> writes:
>> Tom Lane wrote:
>>> Color me confused ... maybe systemd's behavior has changed?
>
>> https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/thread/ZNQW72UP36UAFMX53HPFFQTWTQDZVJ3M/
>
> I see Lennart hasn't gotten any less convinced that he's always right
> since I left Red Hat :-(

This thread does seem to give that impression.  It's nice to know
there are engineers in the world even more arrogant than we are.  :-)

> But anyway, it's a demonstrable fact that Fedora 25 has KillUserProcesses
> off by default, even though it contains systemd-231.  I assume FESCO
> brought down the hammer at some point.

https://pagure.io/fesco/issue/1600 seems to suggest that it's merely
in abeyance.  (See the first two updates and the last one for the
executive summary.)

> This still doesn't address the real question, which is whether RemoveIPC
> does anything if KillUserProcesses is off, and whether that behavior
> has changed.  I don't see anything about RemoveIPC in that thread.

http://www.dsm.fordham.edu/cgi-bin/man-cgi.pl?topic=logind.conf&sect=5
suggests that KillUserProcesses and RemoveIPC are separate cleanup
behaviors triggered by the same underlying cause (termination of last
session).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Dec 7, 2016 at 6:49 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> This still doesn't address the real question, which is whether RemoveIPC
>> does anything if KillUserProcesses is off, and whether that behavior
>> has changed.  I don't see anything about RemoveIPC in that thread.

> http://www.dsm.fordham.edu/cgi-bin/man-cgi.pl?topic=logind.conf&sect=5
> suggests that KillUserProcesses and RemoveIPC are separate cleanup
> behaviors triggered by the same underlying cause (termination of last
> session).

Yeah, I read that man page too, but ...

The test case I was using was to ssh into the box, launch a
postmaster using the old-school "nohup postmaster &" technique, and
log out.  What I saw was that the "/usr/lib/systemd/systemd --user"
process Alex referred to would be launched when the ssh connection
started, and would stick around as long as the postmaster was there,
if KillUserProcesses was off.  (If it was on, something SIGTERM'd
the postmaster as soon as I disconnected.)  So if they really are
independent behaviors, I'd have expected the same something to have
killed the semaphores as soon as I disconnected; but that did NOT
happen.

[ Yes, RemoveIPC is definitely on: I turned it on explicitly in
logind.conf, just in case the comment claiming it's on by default
is a lie. ]

BTW, I also tried this from the console, but the results were confused
by the fact that GNOME seems to launch approximately a metric buttload
of "helper" processes, which don't disappear when I log out.  If that's
the behavior Lennart is trying to get rid of, I can see his point; but
I tend to agree with the other comments in that thread that this should
be fixed in GNOME not by breaking longstanding working assumptions.

When I get a chance, I think I'll try F24 and see if it behaved
differently.  F23 might be interesting too if it's still downloadable.
        regards, tom lane



Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Alex Hunsaker
Date:


On Wed, Dec 7, 2016 at 3:42 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Hmm ... after further experimentation, I still can't get this version of
> systemd (231) to do anything evil.  It turns out that Fedora ships it with
> KillUserProcesses turned off by default, and maybe having that on is a
> prerequisite for the other behavior?  But that doesn't make a lot of sense
> because we'd never be seeing the reports of databases moaning about lost
> semaphores if the processes got killed first.  Anyway, I see nothing bad
> happening if KillUserProcesses is off, while if it's on then the database
> gets shut down reasonably politely via SIGTERM.
>
> Color me confused ... maybe systemd's behavior has changed?

Hrm, the following incantation seems to break for me on a fresh Fedora 25 system:
1) As root su to $USER and start postgres.
2) ssh in as $USER and then logout
3) # psql localhost

FATAL: semctl(4980742, 3, SETVAL, 0) failed: Invalid argument
LOG: server process (PID 14569) exited with exit code 1
...

Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Peter Eisentraut
Date:
On 12/7/16 9:38 AM, Tom Lane wrote:
>> Even with that change, dynamic shared memory is still vulnerable to being
>> removed.
> Really?  I thought we concluded that it is safe because it is detectably
> attached to running processes.

The DSM implementation uses POSIX shared memory, which doesn't have an
attachment count like SysV shared memory.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Peter Eisentraut
Date:
On 12/7/16 9:28 PM, Alex Hunsaker wrote:
> Hrm, the following incantation seems to break for me on a fresh Fedora
> 25 system:
> 1) As root su to $USER and start postgres.
> 2) ssh in as $USER and then logout
> 3) # psql localhost
> 
> FATAL: semctl(4980742, 3, SETVAL, 0) failed: Invalid argument
> LOG: server process (PID 14569) exited with exit code 1

Yeah, the way to trigger this is to run the postgres server not in a
"session", then log in interactively as that same user, thus creating a
session, and then log out from that session, thus completely logging
out that user from all sessions.

(Thus, the way to trigger the KillUserProcesses behavior is quite the
opposite, because that only happens if you have the postgres server
running in a session.)
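
For anyone wanting to see the resulting error without going through the login dance, the following sketch may help. It removes a semaphore set with IPC_RMID as a stand-in for the external cleanup (an assumption; the real removal is done by something outside the process) and then retries the same kind of SETVAL call that shows up in the log. The helper name and the locally defined union are illustrative only.

```c
#include <errno.h>
#include <sys/ipc.h>
#include <sys/sem.h>

/* Callers must supply this union themselves on Linux; the name is ours. */
union semun_arg
{
    int         val;
    struct semid_ds *buf;
    unsigned short *array;
};

/*
 * Returns the errno produced by SETVAL on a semaphore set that has been
 * removed, 0 if SETVAL unexpectedly succeeded, or -1 if SysV semaphores
 * cannot be used in this environment.
 */
int
setval_errno_after_removal(void)
{
    union semun_arg arg;
    int         id = semget(IPC_PRIVATE, 1, IPC_CREAT | 0600);

    if (id < 0)
        return -1;

    arg.val = 0;
    if (semctl(id, 0, SETVAL, arg) < 0)
    {
        semctl(id, 0, IPC_RMID);
        return -1;
    }

    /* Stand-in for the set being removed behind the server's back */
    if (semctl(id, 0, IPC_RMID) < 0)
        return -1;

    /* The server still holds the old id and tries to use it */
    errno = 0;
    if (semctl(id, 0, SETVAL, arg) == 0)
        return 0;
    return errno;
}
```

On Linux I would expect the second SETVAL to fail with EINVAL ("Invalid argument"), as in the report above, though EIDRM is also a documented possibility.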

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Tom Lane
Date:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 12/7/16 9:28 PM, Alex Hunsaker wrote:
>> Hrm, the following incantation seems to break for me on a fresh Fedora
>> 25 system:
>> 1) As root su to $USER and start postgres.
>> 2) ssh in as $USER and then logout
>> 3) # psql localhost
>> 
>> FATAL: semctl(4980742, 3, SETVAL, 0) failed: Invalid argument
>> LOG: server process (PID 14569) exited with exit code 1

> Yeah, the way to trigger this is to run the postgres server not in a
> "session", then log in interactively as that same user, thus creating a
> session, and then logging out from that session, thus completely logging
> out that user from all sessions.

> (Thus, the way to trigger the KillUserProcesses behavior is quite the
> opposite, because that only happens if you have the postgres server
> running in a session.)

Ah-hah, thanks for the insight.  I can now reproduce it, and I confirm
that aside from removing the semaphores, our POSIX shmem segment(s)
are removed from /dev/shm.  They presumably still are attached to whatever
processes have them mapped already, but this behavior is going to break
DSM usage in any case.  (The SysV shm segment does survive, presumably
because systemd notices its positive attach count.)
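
A rough sketch of why this breaks DSM even though existing mappings survive: once the name is gone, a process that already mapped the segment can keep using it, but a process that tries to attach by name afterwards cannot. Here shm_unlink() stands in for the external removal from /dev/shm, and the segment name and function name are invented for the example. Older glibc versions need -lrt to link it.

```c
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/*
 * Returns 1 if, after the name is unlinked, the old mapping is still usable
 * while a fresh shm_open() by name fails with ENOENT; 0 if that is not what
 * happened; -1 if POSIX shared memory cannot be used in this environment.
 */
int
mapping_survives_but_name_is_gone(void)
{
    char        name[64];
    char       *p;
    int         fd;
    int         fd2;
    int         ok;

    /* Hypothetical segment name, made unique per process */
    snprintf(name, sizeof(name), "/dsm_sketch_%ld", (long) getpid());

    fd = shm_open(name, O_CREAT | O_EXCL | O_RDWR, 0600);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, 4096) != 0)
    {
        close(fd);
        shm_unlink(name);
        return -1;
    }
    p = mmap(NULL, 4096, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    close(fd);
    if (p == MAP_FAILED)
    {
        shm_unlink(name);
        return -1;
    }

    /* Stand-in for the segment being removed from /dev/shm */
    shm_unlink(name);

    /* The process that already has it mapped is unaffected */
    strcpy(p, "still mapped");
    ok = (strcmp(p, "still mapped") == 0);

    /* A late attacher, however, can no longer find the segment */
    errno = 0;
    fd2 = shm_open(name, O_RDWR, 0600);
    if (fd2 >= 0)
    {
        close(fd2);
        ok = 0;
    }
    else if (errno != ENOENT)
        ok = 0;

    munmap(p, 4096);
    return ok;
}
```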

So I now agree that getting out from under SysV semaphores isn't going to
fix our problems with systemd ... at least, not unless we migrate *to*
not away from SysV shared memory for DSM.  Even then, we'd have to be
careful that standard usage patterns keep every DSM segment continually
attached to at least one process.  Dunno how practical that is.  And it
blows chunks in the goal of not being constrained by SHMMAX.

Oh well.
        regards, tom lane



Re: Back-patch use of unnamed POSIX semaphores for Linux?

From
Robert Haas
Date:
On Thu, Dec 8, 2016 at 12:40 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Ah-hah, thanks for the insight.  I can now reproduce it, and I confirm
> that aside from removing the semaphores, our POSIX shmem segment(s)
> are removed from /dev/shm.  They presumably still are attached to whatever
> processes have them mapped already, but this behavior is going to break
> DSM usage in any case.  (The SysV shm segment does survive, presumably
> because systemd notices its positive attach count.)

Makes sense.  Actually, it would be fairly unlucky for the DSM thing to
cause a failure for parallel query as it exists today, because there's
only about 4ms between when the segment is created and when all
backends have attached.  But DSA - especially if we use it for anything
long-lived - will surely break.

> So I now agree that getting out from under SysV semaphores isn't going to
> fix our problems with systemd ... at least, not unless we migrate *to*
> not away from SysV shared memory for DSM.  Even then, we'd have to be
> careful that standard usage patterns keep every DSM segment continually
> attached to at least one process.  Dunno how practical that is.  And it
> blows chunks in the goal of not being constrained by SHMMAX.

dynamic_shared_memory_type = sysv is already supported, but it's not
the default precisely because of SHMMAX.  Keeping every DSM segment
attached to at least one process is normal for parallel query as it
exists today, but tough for any application that "pins" segments.  On
Windows, we do that: pinning a segment actually causes a user backend
to reach inside the postmaster's address space to open a file
descriptor.  Why Microsoft thought that was something a process should
be able to do to another process I couldn't say.  But on Linux where
segments don't go away on last close, and where such frightening APIs
don't exist, there's no guarantee that a segment will always be open
somewhere.

Hey, I have an idea.  Let's switch from processes to threads, and then
shared memory - including the dynamic kind - can be implemented using
malloc().

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Back-patch use of unnamed POSIX semaphores for Linux?

From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes:
> On Wed, Dec 7, 2016 at 3:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> As things stand, it's only a configure-time choice, but I've been
>> thinking that we might be well advised to make it run-time configurable.

> Sure.  A configure-time choice only benefits people who are compiling
> from source, which as far as production is concerned is almost nobody.

Attached is a proposed patch that unifies the ABI for the three Unix-y
types of semaphores.  I figured it wasn't worth trying to unify Windows
as well, since nobody would ever be doing run-time switching between
Windows and Unix code.

The practical impact of this is that the sem_t or semId/semNum data
is removed from the PGPROC array and placed in a separate array elsewhere
in shared memory.  On 64-bit Linux machines, sem_t is 64 bytes (or at
least, it is on my RHEL6 box), so this change undoes the 56-byte addition
that commit ecb0d20a9 caused.  I think this is probably a performance
win, even though it means an extra indirection to get at the sem_t,
because the speed of access to the sem_t really shouldn't matter: we
only touch that when we're putting a process to sleep or waking it up.
Not bloating PGPROC is probably worth something, though.
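
As a rough illustration of the layout change (not the actual PGPROC definition), the sketch below compares a per-process struct that embeds a 64-byte semaphore with one that only holds a pointer into a separate array. FakeSem, ProcEmbedded, ProcIndirect and the array names are all stand-ins invented for this example.

```c
#include <stddef.h>

#define NPROCS 4

/* Stand-in for a 64-byte sem_t; the union forces long alignment */
typedef union FakeSem
{
    char        bytes[64];
    long        align;
} FakeSem;

/* Old layout: the semaphore lives inside the per-process struct */
typedef struct ProcEmbedded
{
    int         pid;
    FakeSem     sem;
} ProcEmbedded;

/* New layout: the struct only points at a semaphore stored elsewhere */
typedef struct ProcIndirect
{
    int         pid;
    FakeSem    *sem;
} ProcIndirect;

static FakeSem sema_array[NPROCS];  /* the separate semaphore array */
static ProcIndirect procs[NPROCS];

/* Hand each process entry its slot in the semaphore array */
void
init_procs(void)
{
    int         i;

    for (i = 0; i < NPROCS; i++)
    {
        procs[i].pid = 0;
        procs[i].sem = &sema_array[i];
    }
}

/* Per-entry size difference between the two layouts */
size_t
bytes_saved_per_proc(void)
{
    return sizeof(ProcEmbedded) - sizeof(ProcIndirect);
}
```

On an LP64 platform the difference works out to 56 bytes per entry, which lines up with the figure quoted above; the price is one extra pointer dereference on the sleep/wakeup path.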

It would take additional work to get to being able to do run-time
switching between the Unix semaphore APIs, but that work would now be
localized in posix_sema.c and sysv_sema.c and would not affect code
elsewhere.

I think we need to do at least this much for v10, because otherwise
we'll face ABI issues if an extension is compiled against code with
one semaphore API choice and used with code with a different one.
That didn't use to be a problem because there was really no expectation
of anyone using a non-default semaphore API on any platform, but
I fear it will become an issue if we don't do this.
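
The ABI point can be sketched with an opaque handle: code compiled against the "header" part below never sees the size or layout of the semaphore, so it keeps working whichever implementation sits behind the pointer. All names here (DemoSema and friends) are hypothetical, and the implementation is a plain counter, not thread-safe and not a real semaphore; it exists only to show the shape of the interface.

```c
#include <stdlib.h>

/* ---- what callers see (hypothetical header) ---- */
typedef struct DemoSemaData *DemoSema;  /* incomplete type: layout hidden */

DemoSema    demo_sema_create(void);
int         demo_sema_trylock(DemoSema sema);
void        demo_sema_unlock(DemoSema sema);

/* ---- private to one implementation file ---- */
struct DemoSemaData
{
    int         count;          /* a real version would hold a sem_t or ids */
};

/* Allocate a handle with initial count 1; returns NULL on out-of-memory */
DemoSema
demo_sema_create(void)
{
    DemoSema    sema = malloc(sizeof(*sema));

    if (sema != NULL)
        sema->count = 1;
    return sema;
}

/* Returns 1 and decrements if the count is positive, else returns 0 */
int
demo_sema_trylock(DemoSema sema)
{
    if (sema->count > 0)
    {
        sema->count--;
        return 1;
    }
    return 0;
}

/* Increments the count */
void
demo_sema_unlock(DemoSema sema)
{
    sema->count++;
}
```

Swapping struct DemoSemaData for a different definition changes nothing on the caller's side, which is the property wanted for extensions.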

            regards, tom lane

diff --git a/src/backend/port/posix_sema.c b/src/backend/port/posix_sema.c
index 2b4b11c..603dc5a 100644
*** a/src/backend/port/posix_sema.c
--- b/src/backend/port/posix_sema.c
***************
*** 6,11 ****
--- 6,19 ----
   * We prefer the unnamed style of POSIX semaphore (the kind made with
   * sem_init).  We can cope with the kind made with sem_open, however.
   *
+  * In either implementation, typedef PGSemaphore is equivalent to "sem_t *".
+  * With unnamed semaphores, the sem_t structs live in an array in shared
+  * memory.  With named semaphores, that's not true because we cannot persuade
+  * sem_open to do its allocation there.  Therefore, the named-semaphore code
+  * *does not cope with EXEC_BACKEND*.  The sem_t structs will just be in the
+  * postmaster's private memory, where they are successfully inherited by
+  * forked backends, but they could not be accessed by exec'd backends.
+  *
   *
   * Portions Copyright (c) 1996-2016, PostgreSQL Global Development Group
   * Portions Copyright (c) 1994, Regents of the University of California
***************
*** 18,45 ****
  #include "postgres.h"

  #include <fcntl.h>
  #include <signal.h>
  #include <unistd.h>

  #include "miscadmin.h"
  #include "storage/ipc.h"
  #include "storage/pg_sema.h"


! #ifdef USE_NAMED_POSIX_SEMAPHORES
! /* PGSemaphore is pointer to pointer to sem_t */
! #define PG_SEM_REF(x)    (*(x))
! #else
! /* PGSemaphore is pointer to sem_t */
! #define PG_SEM_REF(x)    (x)
  #endif


  #define IPCProtection    (0600)    /* access/modify by user only */

  static sem_t **mySemPointers;    /* keep track of created semaphores */
  static int    numSems;            /* number of semas acquired so far */
! static int    maxSems;            /* allocated size of mySemaPointers array */
  static int    nextSemKey;            /* next name to try */


--- 26,63 ----
  #include "postgres.h"

  #include <fcntl.h>
+ #include <semaphore.h>
  #include <signal.h>
  #include <unistd.h>

  #include "miscadmin.h"
  #include "storage/ipc.h"
  #include "storage/pg_sema.h"
+ #include "storage/shmem.h"


! /* see file header comment */
! #if defined(USE_NAMED_POSIX_SEMAPHORES) && defined(EXEC_BACKEND)
! #error cannot use named POSIX semaphores with EXEC_BACKEND
  #endif

+ /* typedef PGSemaphore is equivalent to pointer to sem_t */
+ typedef struct PGSemaphoreData
+ {
+     sem_t        pgsem;
+ } PGSemaphoreData;
+
+ #define PG_SEM_REF(x)    (&(x)->pgsem)

  #define IPCProtection    (0600)    /* access/modify by user only */

+ #ifdef USE_NAMED_POSIX_SEMAPHORES
  static sem_t **mySemPointers;    /* keep track of created semaphores */
+ #else
+ static PGSemaphore sharedSemas; /* array of PGSemaphoreData in shared memory */
+ #endif
  static int    numSems;            /* number of semas acquired so far */
! static int    maxSems;            /* allocated size of above arrays */
  static int    nextSemKey;            /* next name to try */


*************** PosixSemaphoreKill(sem_t * sem)
*** 134,139 ****
--- 152,172 ----


  /*
+  * Report amount of shared memory needed for semaphores
+  */
+ Size
+ PGSemaphoreShmemSize(int maxSemas)
+ {
+ #ifdef USE_NAMED_POSIX_SEMAPHORES
+     /* No shared memory needed in this case */
+     return 0;
+ #else
+     /* Need a PGSemaphoreData per semaphore */
+     return mul_size(maxSemas, sizeof(PGSemaphoreData));
+ #endif
+ }
+
+ /*
   * PGReserveSemaphores --- initialize semaphore support
   *
   * This is called during postmaster start or shared memory reinitialization.
*************** PosixSemaphoreKill(sem_t * sem)
*** 147,161 ****
   * zero will be passed.
   *
   * In the Posix implementation, we acquire semaphores on-demand; the
!  * maxSemas parameter is just used to size the array that keeps track of
!  * acquired semas for subsequent releasing.
   */
  void
  PGReserveSemaphores(int maxSemas, int port)
  {
      mySemPointers = (sem_t **) malloc(maxSemas * sizeof(sem_t *));
      if (mySemPointers == NULL)
          elog(PANIC, "out of memory");
      numSems = 0;
      maxSems = maxSemas;
      nextSemKey = port * 1000;
--- 180,212 ----
   * zero will be passed.
   *
   * In the Posix implementation, we acquire semaphores on-demand; the
!  * maxSemas parameter is just used to size the arrays.  For unnamed
!  * semaphores, there is an array of PGSemaphoreData structs in shared memory.
!  * For named semaphores, we keep a postmaster-local array of sem_t pointers,
!  * which we use for releasing the semaphores when done.
!  * (This design minimizes the dependency of postmaster shutdown on the
!  * contents of shared memory, which a failed backend might have clobbered.
!  * We can't do much about the possibility of sem_destroy() crashing, but
!  * we don't have to expose the counters to other processes.)
   */
  void
  PGReserveSemaphores(int maxSemas, int port)
  {
+ #ifdef USE_NAMED_POSIX_SEMAPHORES
      mySemPointers = (sem_t **) malloc(maxSemas * sizeof(sem_t *));
      if (mySemPointers == NULL)
          elog(PANIC, "out of memory");
+ #else
+
+     /*
+      * We must use ShmemAllocUnlocked(), since the spinlock protecting
+      * ShmemAlloc() won't be ready yet.  (This ordering is necessary when we
+      * are emulating spinlocks with semaphores.)
+      */
+     sharedSemas = (PGSemaphore)
+         ShmemAllocUnlocked(PGSemaphoreShmemSize(maxSemas));
+ #endif
+
      numSems = 0;
      maxSems = maxSemas;
      nextSemKey = port * 1000;
*************** ReleaseSemaphores(int status, Datum arg)
*** 173,191 ****
  {
      int            i;

      for (i = 0; i < numSems; i++)
          PosixSemaphoreKill(mySemPointers[i]);
      free(mySemPointers);
  }

  /*
   * PGSemaphoreCreate
   *
!  * Initialize a PGSemaphore structure to represent a sema with count 1
   */
! void
! PGSemaphoreCreate(PGSemaphore sema)
  {
      sem_t       *newsem;

      /* Can't do this in a backend, because static state is postmaster's */
--- 224,250 ----
  {
      int            i;

+ #ifdef USE_NAMED_POSIX_SEMAPHORES
      for (i = 0; i < numSems; i++)
          PosixSemaphoreKill(mySemPointers[i]);
      free(mySemPointers);
+ #endif
+
+ #ifdef USE_UNNAMED_POSIX_SEMAPHORES
+     for (i = 0; i < numSems; i++)
+         PosixSemaphoreKill(PG_SEM_REF(sharedSemas + i));
+ #endif
  }

  /*
   * PGSemaphoreCreate
   *
!  * Allocate a PGSemaphore structure with initial count 1
   */
! PGSemaphore
! PGSemaphoreCreate(void)
  {
+     PGSemaphore sema;
      sem_t       *newsem;

      /* Can't do this in a backend, because static state is postmaster's */
*************** PGSemaphoreCreate(PGSemaphore sema)
*** 195,208 ****
          elog(PANIC, "too many semaphores created");

  #ifdef USE_NAMED_POSIX_SEMAPHORES
!     *sema = newsem = PosixSemaphoreCreate();
  #else
!     PosixSemaphoreCreate(sema);
!     newsem = sema;
  #endif

!     /* Remember new sema for ReleaseSemaphores */
!     mySemPointers[numSems++] = newsem;
  }

  /*
--- 254,272 ----
          elog(PANIC, "too many semaphores created");

  #ifdef USE_NAMED_POSIX_SEMAPHORES
!     newsem = PosixSemaphoreCreate();
!     /* Remember new sema for ReleaseSemaphores */
!     mySemPointers[numSems] = newsem;
!     sema = (PGSemaphore) newsem;
  #else
!     sema = &sharedSemas[numSems];
!     newsem = PG_SEM_REF(sema);
!     PosixSemaphoreCreate(newsem);
  #endif

!     numSems++;
!
!     return sema;
  }

  /*
diff --git a/src/backend/port/sysv_sema.c b/src/backend/port/sysv_sema.c
index f6f1516..531d426 100644
*** a/src/backend/port/sysv_sema.c
--- b/src/backend/port/sysv_sema.c
***************
*** 27,33 ****
--- 27,40 ----
  #include "miscadmin.h"
  #include "storage/ipc.h"
  #include "storage/pg_sema.h"
+ #include "storage/shmem.h"
+

+ typedef struct PGSemaphoreData
+ {
+     int            semId;            /* semaphore set identifier */
+     int            semNum;            /* semaphore number within set */
+ } PGSemaphoreData;

  #ifndef HAVE_UNION_SEMUN
  union semun
*************** typedef int IpcSemaphoreId;        /* semaphor
*** 54,59 ****
--- 61,69 ----
  #define PGSemaMagic        537        /* must be less than SEMVMX */


+ static PGSemaphore sharedSemas; /* array of PGSemaphoreData in shared memory */
+ static int    numSharedSemas;        /* number of PGSemaphoreDatas used so far */
+ static int    maxSharedSemas;        /* allocated size of PGSemaphoreData array */
  static IpcSemaphoreId *mySemaSets;        /* IDs of sema sets acquired so far */
  static int    numSemaSets;        /* number of sema sets acquired so far */
  static int    maxSemaSets;        /* allocated size of mySemaSets array */
*************** IpcSemaphoreCreate(int numSems)
*** 274,279 ****
--- 284,298 ----


  /*
+  * Report amount of shared memory needed for semaphores
+  */
+ Size
+ PGSemaphoreShmemSize(int maxSemas)
+ {
+     return mul_size(maxSemas, sizeof(PGSemaphoreData));
+ }
+
+ /*
   * PGReserveSemaphores --- initialize semaphore support
   *
   * This is called during postmaster start or shared memory reinitialization.
*************** IpcSemaphoreCreate(int numSems)
*** 287,298 ****
   * zero will be passed.
   *
   * In the SysV implementation, we acquire semaphore sets on-demand; the
!  * maxSemas parameter is just used to size the array that keeps track of
!  * acquired sets for subsequent releasing.
   */
  void
  PGReserveSemaphores(int maxSemas, int port)
  {
      maxSemaSets = (maxSemas + SEMAS_PER_SET - 1) / SEMAS_PER_SET;
      mySemaSets = (IpcSemaphoreId *)
          malloc(maxSemaSets * sizeof(IpcSemaphoreId));
--- 306,331 ----
   * zero will be passed.
   *
   * In the SysV implementation, we acquire semaphore sets on-demand; the
!  * maxSemas parameter is just used to size the arrays.  There is an array
!  * of PGSemaphoreData structs in shared memory, and a postmaster-local array
!  * with one entry per SysV semaphore set, which we use for releasing the
!  * semaphore sets when done.  (This design ensures that postmaster shutdown
!  * doesn't rely on the contents of shared memory, which a failed backend might
!  * have clobbered.)
   */
  void
  PGReserveSemaphores(int maxSemas, int port)
  {
+     /*
+      * We must use ShmemAllocUnlocked(), since the spinlock protecting
+      * ShmemAlloc() won't be ready yet.  (This ordering is necessary when we
+      * are emulating spinlocks with semaphores.)
+      */
+     sharedSemas = (PGSemaphore)
+         ShmemAllocUnlocked(PGSemaphoreShmemSize(maxSemas));
+     numSharedSemas = 0;
+     maxSharedSemas = maxSemas;
+
      maxSemaSets = (maxSemas + SEMAS_PER_SET - 1) / SEMAS_PER_SET;
      mySemaSets = (IpcSemaphoreId *)
          malloc(maxSemaSets * sizeof(IpcSemaphoreId));
*************** ReleaseSemaphores(int status, Datum arg)
*** 323,333 ****
  /*
   * PGSemaphoreCreate
   *
!  * Initialize a PGSemaphore structure to represent a sema with count 1
   */
! void
! PGSemaphoreCreate(PGSemaphore sema)
  {
      /* Can't do this in a backend, because static state is postmaster's */
      Assert(!IsUnderPostmaster);

--- 356,368 ----
  /*
   * PGSemaphoreCreate
   *
!  * Allocate a PGSemaphore structure with initial count 1
   */
! PGSemaphore
! PGSemaphoreCreate(void)
  {
+     PGSemaphore sema;
+
      /* Can't do this in a backend, because static state is postmaster's */
      Assert(!IsUnderPostmaster);

*************** PGSemaphoreCreate(PGSemaphore sema)
*** 340,350 ****
--- 375,391 ----
          numSemaSets++;
          nextSemaNumber = 0;
      }
+     /* Use the next shared PGSemaphoreData */
+     if (numSharedSemas >= maxSharedSemas)
+         elog(PANIC, "too many semaphores created");
+     sema = &sharedSemas[numSharedSemas++];
      /* Assign the next free semaphore in the current set */
      sema->semId = mySemaSets[numSemaSets - 1];
      sema->semNum = nextSemaNumber++;
      /* Initialize it to count 1 */
      IpcSemaphoreInitialize(sema->semId, sema->semNum, 1);
+
+     return sema;
  }

  /*
diff --git a/src/backend/port/win32_sema.c b/src/backend/port/win32_sema.c
index c688210..c8b12be 100644
*** a/src/backend/port/win32_sema.c
--- b/src/backend/port/win32_sema.c
*************** static int    maxSems;            /* allocated size
*** 23,28 ****
--- 23,39 ----

  static void ReleaseSemaphores(int code, Datum arg);

+
+ /*
+  * Report amount of shared memory needed for semaphores
+  */
+ Size
+ PGSemaphoreShmemSize(int maxSemas)
+ {
+     /* No shared memory needed on Windows */
+     return 0;
+ }
+
  /*
   * PGReserveSemaphores --- initialize semaphore support
   *
*************** ReleaseSemaphores(int code, Datum arg)
*** 62,71 ****
  /*
   * PGSemaphoreCreate
   *
!  * Initialize a PGSemaphore structure to represent a sema with count 1
   */
! void
! PGSemaphoreCreate(PGSemaphore sema)
  {
      HANDLE        cur_handle;
      SECURITY_ATTRIBUTES sec_attrs;
--- 73,82 ----
  /*
   * PGSemaphoreCreate
   *
!  * Allocate a PGSemaphore structure with initial count 1
   */
! PGSemaphore
! PGSemaphoreCreate(void)
  {
      HANDLE        cur_handle;
      SECURITY_ATTRIBUTES sec_attrs;
*************** PGSemaphoreCreate(PGSemaphore sema)
*** 86,97 ****
      if (cur_handle)
      {
          /* Successfully done */
-         *sema = cur_handle;
          mySemSet[numSems++] = cur_handle;
      }
      else
          ereport(PANIC,
!                 (errmsg("could not create semaphore: error code %lu", GetLastError())));
  }

  /*
--- 97,110 ----
      if (cur_handle)
      {
          /* Successfully done */
          mySemSet[numSems++] = cur_handle;
      }
      else
          ereport(PANIC,
!                 (errmsg("could not create semaphore: error code %lu",
!                         GetLastError())));
!
!     return (PGSemaphore) cur_handle;
  }

  /*
*************** PGSemaphoreReset(PGSemaphore sema)
*** 106,112 ****
       * There's no direct API for this in Win32, so we have to ratchet the
       * semaphore down to 0 with repeated trylock's.
       */
!     while (PGSemaphoreTryLock(sema));
  }

  /*
--- 119,126 ----
       * There's no direct API for this in Win32, so we have to ratchet the
       * semaphore down to 0 with repeated trylock's.
       */
!     while (PGSemaphoreTryLock(sema))
!          /* loop */ ;
  }

  /*
*************** PGSemaphoreLock(PGSemaphore sema)
*** 127,133 ****
       * pending signals are serviced.
       */
      wh[0] = pgwin32_signal_event;
!     wh[1] = *sema;

      /*
       * As in other implementations of PGSemaphoreLock, we need to check for
--- 141,147 ----
       * pending signals are serviced.
       */
      wh[0] = pgwin32_signal_event;
!     wh[1] = sema;

      /*
       * As in other implementations of PGSemaphoreLock, we need to check for
*************** PGSemaphoreLock(PGSemaphore sema)
*** 182,190 ****
  void
  PGSemaphoreUnlock(PGSemaphore sema)
  {
!     if (!ReleaseSemaphore(*sema, 1, NULL))
          ereport(FATAL,
!                 (errmsg("could not unlock semaphore: error code %lu", GetLastError())));
  }

  /*
--- 196,205 ----
  void
  PGSemaphoreUnlock(PGSemaphore sema)
  {
!     if (!ReleaseSemaphore(sema, 1, NULL))
          ereport(FATAL,
!                 (errmsg("could not unlock semaphore: error code %lu",
!                         GetLastError())));
  }

  /*
*************** PGSemaphoreTryLock(PGSemaphore sema)
*** 197,203 ****
  {
      DWORD        ret;

!     ret = WaitForSingleObject(*sema, 0);

      if (ret == WAIT_OBJECT_0)
      {
--- 212,218 ----
  {
      DWORD        ret;

!     ret = WaitForSingleObject(sema, 0);

      if (ret == WAIT_OBJECT_0)
      {
*************** PGSemaphoreTryLock(PGSemaphore sema)
*** 213,219 ****

      /* Otherwise we are in trouble */
      ereport(FATAL,
!     (errmsg("could not try-lock semaphore: error code %lu", GetLastError())));

      /* keep compiler quiet */
      return false;
--- 228,235 ----

      /* Otherwise we are in trouble */
      ereport(FATAL,
!             (errmsg("could not try-lock semaphore: error code %lu",
!                     GetLastError())));

      /* keep compiler quiet */
      return false;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 59073e0..827e0d5 100644
*** a/src/backend/postmaster/postmaster.c
--- b/src/backend/postmaster/postmaster.c
*************** typedef struct
*** 484,490 ****
      VariableCache ShmemVariableCache;
      Backend    *ShmemBackendArray;
  #ifndef HAVE_SPINLOCKS
!     PGSemaphore SpinlockSemaArray;
  #endif
      int            NamedLWLockTrancheRequests;
      NamedLWLockTranche *NamedLWLockTrancheArray;
--- 484,490 ----
      VariableCache ShmemVariableCache;
      Backend    *ShmemBackendArray;
  #ifndef HAVE_SPINLOCKS
!     PGSemaphore *SpinlockSemaArray;
  #endif
      int            NamedLWLockTrancheRequests;
      NamedLWLockTranche *NamedLWLockTrancheArray;
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index 01bddce..29febb4 100644
*** a/src/backend/storage/ipc/ipci.c
--- b/src/backend/storage/ipc/ipci.c
*************** CreateSharedMemoryAndSemaphores(bool mak
*** 102,107 ****
--- 102,111 ----
          Size        size;
          int            numSemas;

+         /* Compute number of semaphores we'll need */
+         numSemas = ProcGlobalSemas();
+         numSemas += SpinlockSemas();
+
          /*
           * Size of the Postgres shared-memory block is estimated via
           * moderately-accurate estimates for the big hogs, plus 100K for the
*************** CreateSharedMemoryAndSemaphores(bool mak
*** 112,117 ****
--- 116,122 ----
           * need to be so careful during the actual allocation phase.
           */
          size = 100000;
+         size = add_size(size, PGSemaphoreShmemSize(numSemas));
          size = add_size(size, SpinlockSemaSize());
          size = add_size(size, hash_estimate_size(SHMEM_INDEX_SIZE,
                                                   sizeof(ShmemIndexEnt)));
*************** CreateSharedMemoryAndSemaphores(bool mak
*** 166,174 ****
          /*
           * Create semaphores
           */
-         numSemas = ProcGlobalSemas();
-         numSemas += SpinlockSemas();
          PGReserveSemaphores(numSemas, port);
      }
      else
      {
--- 171,185 ----
          /*
           * Create semaphores
           */
          PGReserveSemaphores(numSemas, port);
+
+         /*
+          * If spinlocks are disabled, initialize emulation layer (which
+          * depends on semaphores, so the order is important here).
+          */
+ #ifndef HAVE_SPINLOCKS
+         SpinlockSemaInit();
+ #endif
      }
      else
      {
diff --git a/src/backend/storage/ipc/procarray.c b/src/backend/storage/ipc/procarray.c
index e5d487d..bf38470 100644
*** a/src/backend/storage/ipc/procarray.c
--- b/src/backend/storage/ipc/procarray.c
*************** ProcArrayGroupClearXid(PGPROC *proc, Tra
*** 522,528 ****
          for (;;)
          {
              /* acts as a read barrier */
!             PGSemaphoreLock(&proc->sem);
              if (!proc->procArrayGroupMember)
                  break;
              extraWaits++;
--- 522,528 ----
          for (;;)
          {
              /* acts as a read barrier */
!             PGSemaphoreLock(proc->sem);
              if (!proc->procArrayGroupMember)
                  break;
              extraWaits++;
*************** ProcArrayGroupClearXid(PGPROC *proc, Tra
*** 532,538 ****

          /* Fix semaphore count for any absorbed wakeups */
          while (extraWaits-- > 0)
!             PGSemaphoreUnlock(&proc->sem);
          return;
      }

--- 532,538 ----

          /* Fix semaphore count for any absorbed wakeups */
          while (extraWaits-- > 0)
!             PGSemaphoreUnlock(proc->sem);
          return;
      }

*************** ProcArrayGroupClearXid(PGPROC *proc, Tra
*** 591,597 ****
          proc->procArrayGroupMember = false;

          if (proc != MyProc)
!             PGSemaphoreUnlock(&proc->sem);
      }
  }

--- 591,597 ----
          proc->procArrayGroupMember = false;

          if (proc != MyProc)
!             PGSemaphoreUnlock(proc->sem);
      }
  }

diff --git a/src/backend/storage/ipc/shmem.c b/src/backend/storage/ipc/shmem.c
index cc3af2d..a516194 100644
*** a/src/backend/storage/ipc/shmem.c
--- b/src/backend/storage/ipc/shmem.c
*************** InitShmemAllocation(void)
*** 117,152 ****
      Assert(shmhdr != NULL);

      /*
!      * If spinlocks are disabled, initialize emulation layer.  We have to do
!      * the space allocation the hard way, since obviously ShmemAlloc can't be
!      * called yet.
       */
! #ifndef HAVE_SPINLOCKS
!     {
!         PGSemaphore spinsemas;

!         spinsemas = (PGSemaphore) (((char *) shmhdr) + shmhdr->freeoffset);
!         shmhdr->freeoffset += MAXALIGN(SpinlockSemaSize());
!         SpinlockSemaInit(spinsemas);
!         Assert(shmhdr->freeoffset <= shmhdr->totalsize);
!     }
! #endif

      /*
!      * Initialize the spinlock used by ShmemAlloc; we have to do this the hard
!      * way, too, for the same reasons as above.
       */
-     ShmemLock = (slock_t *) (((char *) shmhdr) + shmhdr->freeoffset);
-     shmhdr->freeoffset += MAXALIGN(sizeof(slock_t));
-     Assert(shmhdr->freeoffset <= shmhdr->totalsize);
-
-     /* Make sure the first allocation begins on a cache line boundary. */
      aligned = (char *)
          (CACHELINEALIGN((((char *) shmhdr) + shmhdr->freeoffset)));
      shmhdr->freeoffset = aligned - (char *) shmhdr;

-     SpinLockInit(ShmemLock);
-
      /* ShmemIndex can't be set up yet (need LWLocks first) */
      shmhdr->index = NULL;
      ShmemIndex = (HTAB *) NULL;
--- 117,138 ----
      Assert(shmhdr != NULL);

      /*
!      * Initialize the spinlock used by ShmemAlloc.  We must use
!      * ShmemAllocUnlocked, since obviously ShmemAlloc can't be called yet.
       */
!     ShmemLock = (slock_t *) ShmemAllocUnlocked(sizeof(slock_t));

!     SpinLockInit(ShmemLock);

      /*
!      * Allocations after this point should go through ShmemAlloc, which
!      * expects to allocate everything on cache line boundaries.  Make sure the
!      * first allocation begins on a cache line boundary.
       */
      aligned = (char *)
          (CACHELINEALIGN((((char *) shmhdr) + shmhdr->freeoffset)));
      shmhdr->freeoffset = aligned - (char *) shmhdr;

      /* ShmemIndex can't be set up yet (need LWLocks first) */
      shmhdr->index = NULL;
      ShmemIndex = (HTAB *) NULL;
*************** ShmemAllocNoError(Size size)
*** 230,235 ****
--- 216,260 ----
  }

  /*
+  * ShmemAllocUnlocked -- allocate max-aligned chunk from shared memory
+  *
+  * Allocate space without locking ShmemLock.  This should be used for,
+  * and only for, allocations that must happen before ShmemLock is ready.
+  *
+  * We consider maxalign, rather than cachealign, sufficient here.
+  */
+ void *
+ ShmemAllocUnlocked(Size size)
+ {
+     Size        newStart;
+     Size        newFree;
+     void       *newSpace;
+
+     /*
+      * Ensure allocated space is adequately aligned.
+      */
+     size = MAXALIGN(size);
+
+     Assert(ShmemSegHdr != NULL);
+
+     newStart = ShmemSegHdr->freeoffset;
+
+     newFree = newStart + size;
+     if (newFree > ShmemSegHdr->totalsize)
+         ereport(ERROR,
+                 (errcode(ERRCODE_OUT_OF_MEMORY),
+                  errmsg("out of shared memory (%zu bytes requested)",
+                         size)));
+     ShmemSegHdr->freeoffset = newFree;
+
+     newSpace = (void *) ((char *) ShmemBase + newStart);
+
+     Assert(newSpace == (void *) MAXALIGN(newSpace));
+
+     return newSpace;
+ }
+
+ /*
   * ShmemAddrIsValid -- test if an address refers to shared memory
   *
   * Returns TRUE if the pointer points within the shared memory segment.
diff --git a/src/backend/storage/lmgr/lwlock.c b/src/backend/storage/lmgr/lwlock.c
index ffb2f72..03c4c78 100644
*** a/src/backend/storage/lmgr/lwlock.c
--- b/src/backend/storage/lmgr/lwlock.c
*************** LWLockWakeup(LWLock *lock)
*** 1012,1018 ****
           */
          pg_write_barrier();
          waiter->lwWaiting = false;
!         PGSemaphoreUnlock(&waiter->sem);
      }
  }

--- 1012,1018 ----
           */
          pg_write_barrier();
          waiter->lwWaiting = false;
!         PGSemaphoreUnlock(waiter->sem);
      }
  }

*************** LWLockDequeueSelf(LWLock *lock)
*** 1129,1135 ****
           */
          for (;;)
          {
!             PGSemaphoreLock(&MyProc->sem);
              if (!MyProc->lwWaiting)
                  break;
              extraWaits++;
--- 1129,1135 ----
           */
          for (;;)
          {
!             PGSemaphoreLock(MyProc->sem);
              if (!MyProc->lwWaiting)
                  break;
              extraWaits++;
*************** LWLockDequeueSelf(LWLock *lock)
*** 1139,1145 ****
           * Fix the process wait semaphore's count for any absorbed wakeups.
           */
          while (extraWaits-- > 0)
!             PGSemaphoreUnlock(&MyProc->sem);
      }

  #ifdef LOCK_DEBUG
--- 1139,1145 ----
           * Fix the process wait semaphore's count for any absorbed wakeups.
           */
          while (extraWaits-- > 0)
!             PGSemaphoreUnlock(MyProc->sem);
      }

  #ifdef LOCK_DEBUG
*************** LWLockAcquire(LWLock *lock, LWLockMode m
*** 1283,1289 ****

          for (;;)
          {
!             PGSemaphoreLock(&proc->sem);
              if (!proc->lwWaiting)
                  break;
              extraWaits++;
--- 1283,1289 ----

          for (;;)
          {
!             PGSemaphoreLock(proc->sem);
              if (!proc->lwWaiting)
                  break;
              extraWaits++;
*************** LWLockAcquire(LWLock *lock, LWLockMode m
*** 1320,1326 ****
       * Fix the process wait semaphore's count for any absorbed wakeups.
       */
      while (extraWaits-- > 0)
!         PGSemaphoreUnlock(&proc->sem);

      return result;
  }
--- 1320,1326 ----
       * Fix the process wait semaphore's count for any absorbed wakeups.
       */
      while (extraWaits-- > 0)
!         PGSemaphoreUnlock(proc->sem);

      return result;
  }
*************** LWLockAcquireOrWait(LWLock *lock, LWLock
*** 1444,1450 ****

              for (;;)
              {
!                 PGSemaphoreLock(&proc->sem);
                  if (!proc->lwWaiting)
                      break;
                  extraWaits++;
--- 1444,1450 ----

              for (;;)
              {
!                 PGSemaphoreLock(proc->sem);
                  if (!proc->lwWaiting)
                      break;
                  extraWaits++;
*************** LWLockAcquireOrWait(LWLock *lock, LWLock
*** 1481,1487 ****
       * Fix the process wait semaphore's count for any absorbed wakeups.
       */
      while (extraWaits-- > 0)
!         PGSemaphoreUnlock(&proc->sem);

      if (mustwait)
      {
--- 1481,1487 ----
       * Fix the process wait semaphore's count for any absorbed wakeups.
       */
      while (extraWaits-- > 0)
!         PGSemaphoreUnlock(proc->sem);

      if (mustwait)
      {
*************** LWLockWaitForVar(LWLock *lock, uint64 *v
*** 1662,1668 ****

          for (;;)
          {
!             PGSemaphoreLock(&proc->sem);
              if (!proc->lwWaiting)
                  break;
              extraWaits++;
--- 1662,1668 ----

          for (;;)
          {
!             PGSemaphoreLock(proc->sem);
              if (!proc->lwWaiting)
                  break;
              extraWaits++;
*************** LWLockWaitForVar(LWLock *lock, uint64 *v
*** 1692,1698 ****
       * Fix the process wait semaphore's count for any absorbed wakeups.
       */
      while (extraWaits-- > 0)
!         PGSemaphoreUnlock(&proc->sem);

      /*
       * Now okay to allow cancel/die interrupts.
--- 1692,1698 ----
       * Fix the process wait semaphore's count for any absorbed wakeups.
       */
      while (extraWaits-- > 0)
!         PGSemaphoreUnlock(proc->sem);

      /*
       * Now okay to allow cancel/die interrupts.
*************** LWLockUpdateVar(LWLock *lock, uint64 *va
*** 1759,1765 ****
          /* check comment in LWLockWakeup() about this barrier */
          pg_write_barrier();
          waiter->lwWaiting = false;
!         PGSemaphoreUnlock(&waiter->sem);
      }
  }

--- 1759,1765 ----
          /* check comment in LWLockWakeup() about this barrier */
          pg_write_barrier();
          waiter->lwWaiting = false;
!         PGSemaphoreUnlock(waiter->sem);
      }
  }

diff --git a/src/backend/storage/lmgr/proc.c b/src/backend/storage/lmgr/proc.c
index 83e9ca1..fc199a6 100644
*** a/src/backend/storage/lmgr/proc.c
--- b/src/backend/storage/lmgr/proc.c
*************** InitProcGlobal(void)
*** 224,230 ****
           */
          if (i < MaxBackends + NUM_AUXILIARY_PROCS)
          {
!             PGSemaphoreCreate(&(procs[i].sem));
              InitSharedLatch(&(procs[i].procLatch));
              LWLockInitialize(&(procs[i].backendLock), LWTRANCHE_PROC);
          }
--- 224,230 ----
           */
          if (i < MaxBackends + NUM_AUXILIARY_PROCS)
          {
!             procs[i].sem = PGSemaphoreCreate();
              InitSharedLatch(&(procs[i].procLatch));
              LWLockInitialize(&(procs[i].backendLock), LWTRANCHE_PROC);
          }
*************** InitProcess(void)
*** 420,426 ****
       * be careful and reinitialize its value here.  (This is not strictly
       * necessary anymore, but seems like a good idea for cleanliness.)
       */
!     PGSemaphoreReset(&MyProc->sem);

      /*
       * Arrange to clean up at backend exit.
--- 420,426 ----
       * be careful and reinitialize its value here.  (This is not strictly
       * necessary anymore, but seems like a good idea for cleanliness.)
       */
!     PGSemaphoreReset(MyProc->sem);

      /*
       * Arrange to clean up at backend exit.
*************** InitAuxiliaryProcess(void)
*** 575,581 ****
       * be careful and reinitialize its value here.  (This is not strictly
       * necessary anymore, but seems like a good idea for cleanliness.)
       */
!     PGSemaphoreReset(&MyProc->sem);

      /*
       * Arrange to clean up at process exit.
--- 575,581 ----
       * be careful and reinitialize its value here.  (This is not strictly
       * necessary anymore, but seems like a good idea for cleanliness.)
       */
!     PGSemaphoreReset(MyProc->sem);

      /*
       * Arrange to clean up at process exit.
diff --git a/src/backend/storage/lmgr/spin.c b/src/backend/storage/lmgr/spin.c
index 5039141..a6510a0 100644
*** a/src/backend/storage/lmgr/spin.c
--- b/src/backend/storage/lmgr/spin.c
***************
*** 23,33 ****
  #include "postgres.h"

  #include "storage/pg_sema.h"
  #include "storage/spin.h"


  #ifndef HAVE_SPINLOCKS
! PGSemaphore SpinlockSemaArray;
  #endif

  /*
--- 23,34 ----
  #include "postgres.h"

  #include "storage/pg_sema.h"
+ #include "storage/shmem.h"
  #include "storage/spin.h"


  #ifndef HAVE_SPINLOCKS
! PGSemaphore *SpinlockSemaArray;
  #endif

  /*
*************** PGSemaphore SpinlockSemaArray;
*** 37,43 ****
  Size
  SpinlockSemaSize(void)
  {
!     return SpinlockSemas() * sizeof(PGSemaphoreData);
  }

  #ifdef HAVE_SPINLOCKS
--- 38,44 ----
  Size
  SpinlockSemaSize(void)
  {
!     return SpinlockSemas() * sizeof(PGSemaphore);
  }

  #ifdef HAVE_SPINLOCKS
*************** SpinlockSemas(void)
*** 67,82 ****
  }

  /*
!  * Initialize semaphores.
   */
! extern void
! SpinlockSemaInit(PGSemaphore spinsemas)
  {
!     int            i;
      int            nsemas = SpinlockSemas();

      for (i = 0; i < nsemas; ++i)
!         PGSemaphoreCreate(&spinsemas[i]);
      SpinlockSemaArray = spinsemas;
  }

--- 68,91 ----
  }

  /*
!  * Initialize spinlock emulation.
!  *
!  * This must be called after PGReserveSemaphores().
   */
! void
! SpinlockSemaInit(void)
  {
!     PGSemaphore *spinsemas;
      int            nsemas = SpinlockSemas();
+     int            i;

+     /*
+      * We must use ShmemAllocUnlocked(), since the spinlock protecting
+      * ShmemAlloc() obviously can't be ready yet.
+      */
+     spinsemas = (PGSemaphore *) ShmemAllocUnlocked(SpinlockSemaSize());
      for (i = 0; i < nsemas; ++i)
!         spinsemas[i] = PGSemaphoreCreate();
      SpinlockSemaArray = spinsemas;
  }

*************** s_unlock_sema(volatile slock_t *lock)
*** 109,115 ****

      if (lockndx <= 0 || lockndx > NUM_SPINLOCK_SEMAPHORES)
          elog(ERROR, "invalid spinlock number: %d", lockndx);
!     PGSemaphoreUnlock(&SpinlockSemaArray[lockndx - 1]);
  }

  bool
--- 118,124 ----

      if (lockndx <= 0 || lockndx > NUM_SPINLOCK_SEMAPHORES)
          elog(ERROR, "invalid spinlock number: %d", lockndx);
!     PGSemaphoreUnlock(SpinlockSemaArray[lockndx - 1]);
  }

  bool
*************** tas_sema(volatile slock_t *lock)
*** 128,134 ****
      if (lockndx <= 0 || lockndx > NUM_SPINLOCK_SEMAPHORES)
          elog(ERROR, "invalid spinlock number: %d", lockndx);
      /* Note that TAS macros return 0 if *success* */
!     return !PGSemaphoreTryLock(&SpinlockSemaArray[lockndx - 1]);
  }

  #endif   /* !HAVE_SPINLOCKS */
--- 137,143 ----
      if (lockndx <= 0 || lockndx > NUM_SPINLOCK_SEMAPHORES)
          elog(ERROR, "invalid spinlock number: %d", lockndx);
      /* Note that TAS macros return 0 if *success* */
!     return !PGSemaphoreTryLock(SpinlockSemaArray[lockndx - 1]);
  }

  #endif   /* !HAVE_SPINLOCKS */
diff --git a/src/include/storage/pg_sema.h b/src/include/storage/pg_sema.h
index 2c94183..63546eb 100644
*** a/src/include/storage/pg_sema.h
--- b/src/include/storage/pg_sema.h
***************
*** 21,72 ****
  #define PG_SEMA_H

  /*
!  * PGSemaphoreData and pointer type PGSemaphore are the data structure
!  * representing an individual semaphore.  The contents of PGSemaphoreData
!  * vary across implementations and must never be touched by platform-
!  * independent code.  PGSemaphoreData structures are always allocated
!  * in shared memory (to support implementations where the data changes during
!  * lock/unlock).
   *
!  * pg_config.h must define exactly one of the USE_xxx_SEMAPHORES symbols.
   */
!
! #ifdef USE_NAMED_POSIX_SEMAPHORES
!
! #include <semaphore.h>
!
! typedef sem_t *PGSemaphoreData;
! #endif
!
! #ifdef USE_UNNAMED_POSIX_SEMAPHORES
!
! #include <semaphore.h>
!
! typedef sem_t PGSemaphoreData;
! #endif
!
! #ifdef USE_SYSV_SEMAPHORES
!
! typedef struct PGSemaphoreData
! {
!     int            semId;            /* semaphore set identifier */
!     int            semNum;            /* semaphore number within set */
! } PGSemaphoreData;
! #endif
!
! #ifdef USE_WIN32_SEMAPHORES
!
! typedef HANDLE PGSemaphoreData;
  #endif

- typedef PGSemaphoreData *PGSemaphore;


  /* Module initialization (called during postmaster start or shmem reinit) */
  extern void PGReserveSemaphores(int maxSemas, int port);

! /* Initialize a PGSemaphore structure to represent a sema with count 1 */
! extern void PGSemaphoreCreate(PGSemaphore sema);

  /* Reset a previously-initialized PGSemaphore to have count 0 */
  extern void PGSemaphoreReset(PGSemaphore sema);
--- 21,50 ----
  #define PG_SEMA_H

  /*
!  * struct PGSemaphoreData and pointer type PGSemaphore are the data structure
!  * representing an individual semaphore.  The contents of PGSemaphoreData vary
!  * across implementations and must never be touched by platform-independent
!  * code; hence, PGSemaphoreData is declared as an opaque struct here.
   *
!  * However, Windows is sufficiently unlike our other ports that it doesn't
!  * seem worth insisting on ABI compatibility for Windows too.  Hence, on
!  * that platform just define PGSemaphore as HANDLE.
   */
! #ifndef USE_WIN32_SEMAPHORES
! typedef struct PGSemaphoreData *PGSemaphore;
! #else
! typedef HANDLE PGSemaphore;
  #endif


+ /* Report amount of shared memory needed */
+ extern Size PGSemaphoreShmemSize(int maxSemas);

  /* Module initialization (called during postmaster start or shmem reinit) */
  extern void PGReserveSemaphores(int maxSemas, int port);

! /* Allocate a PGSemaphore structure with initial count 1 */
! extern PGSemaphore PGSemaphoreCreate(void);

  /* Reset a previously-initialized PGSemaphore to have count 0 */
  extern void PGSemaphoreReset(PGSemaphore sema);
diff --git a/src/include/storage/proc.h b/src/include/storage/proc.h
index 6fa7125..0344f42 100644
*** a/src/include/storage/proc.h
--- b/src/include/storage/proc.h
*************** struct PGPROC
*** 87,93 ****
      SHM_QUEUE    links;            /* list link if process is in a list */
      PGPROC      **procgloballist; /* procglobal list that owns this PGPROC */

!     PGSemaphoreData sem;        /* ONE semaphore to sleep on */
      int            waitStatus;        /* STATUS_WAITING, STATUS_OK or STATUS_ERROR */

      Latch        procLatch;        /* generic latch for process */
--- 87,93 ----
      SHM_QUEUE    links;            /* list link if process is in a list */
      PGPROC      **procgloballist; /* procglobal list that owns this PGPROC */

!     PGSemaphore sem;            /* ONE semaphore to sleep on */
      int            waitStatus;        /* STATUS_WAITING, STATUS_OK or STATUS_ERROR */

      Latch        procLatch;        /* generic latch for process */
*************** struct PGPROC
*** 116,122 ****
      proclist_node lwWaitLink;    /* position in LW lock wait list */

      /* Support for condition variables. */
!     proclist_node    cvWaitLink;    /* position in CV wait list */

      /* Info about lock the process is currently waiting for, if any. */
      /* waitLock and waitProcLock are NULL if not currently waiting. */
--- 116,122 ----
      proclist_node lwWaitLink;    /* position in LW lock wait list */

      /* Support for condition variables. */
!     proclist_node cvWaitLink;    /* position in CV wait list */

      /* Info about lock the process is currently waiting for, if any. */
      /* waitLock and waitProcLock are NULL if not currently waiting. */
diff --git a/src/include/storage/shmem.h b/src/include/storage/shmem.h
index 2560e6c..e4faebf 100644
*** a/src/include/storage/shmem.h
--- b/src/include/storage/shmem.h
*************** extern void InitShmemAccess(void *seghdr
*** 36,41 ****
--- 36,42 ----
  extern void InitShmemAllocation(void);
  extern void *ShmemAlloc(Size size);
  extern void *ShmemAllocNoError(Size size);
+ extern void *ShmemAllocUnlocked(Size size);
  extern bool ShmemAddrIsValid(const void *addr);
  extern void InitShmemIndex(void);
  extern HTAB *ShmemInitHash(const char *name, long init_size, long max_size,
diff --git a/src/include/storage/spin.h b/src/include/storage/spin.h
index 5041225..b95c9bc 100644
*** a/src/include/storage/spin.h
--- b/src/include/storage/spin.h
*************** extern int    SpinlockSemas(void);
*** 70,77 ****
  extern Size SpinlockSemaSize(void);

  #ifndef HAVE_SPINLOCKS
! extern void SpinlockSemaInit(PGSemaphore);
! extern PGSemaphore SpinlockSemaArray;
  #endif

  #endif   /* SPIN_H */
--- 70,77 ----
  extern Size SpinlockSemaSize(void);

  #ifndef HAVE_SPINLOCKS
! extern void SpinlockSemaInit(void);
! extern PGSemaphore *SpinlockSemaArray;
  #endif

  #endif   /* SPIN_H */
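
For readers following the shmem.c hunk at the top of this part of the patch: the visible tail of ShmemAllocUnlocked() advances a free offset, fails if the segment is exhausted, and returns an aligned address without taking any lock. Below is a minimal sketch of that style of lock-free "bump" allocation, not the patch's actual code. The names Arena, ALIGN_UP and arena_alloc_unlocked are hypothetical, the 8-byte alignment is an assumption, and the sketch returns NULL where the patch raises an ERROR.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed alignment for this sketch; PostgreSQL uses MAXALIGN instead. */
#define ALIGNMENT 8
#define ALIGN_UP(x) (((x) + (ALIGNMENT - 1)) & ~((size_t) (ALIGNMENT - 1)))

/* Hypothetical stand-in for a shared memory segment header. */
typedef struct Arena
{
    unsigned char *base;        /* start of the segment */
    size_t      total;          /* total bytes in the segment */
    size_t      free_offset;    /* first byte not yet handed out */
} Arena;

/*
 * Allocate without any locking.  This is only safe while a single process
 * is touching the arena, e.g. during startup before any lock protecting
 * the allocator has been initialized.
 *
 * The returned address is ALIGNMENT-aligned only if a->base is.
 */
void *
arena_alloc_unlocked(Arena *a, size_t size)
{
    size_t      start = ALIGN_UP(a->free_offset);

    /* Written this way so the bounds check cannot overflow. */
    if (start > a->total || size > a->total - start)
        return NULL;            /* the patch reports "out of shared memory" */

    a->free_offset = start + size;
    return a->base + start;
}
```

The point of having an unlocked variant is ordering: in the spin.c hunk the semaphore array that emulates spinlocks has to be allocated before the spinlock guarding the ordinary allocator can exist.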


Re: [HACKERS] Back-patch use of unnamed POSIX semaphores for Linux?

From
Craig Ringer
Date:


On 11 Dec. 2016 07:44, "Tom Lane" <tgl@sss.pgh.pa.us> wrote:

> I think we need to do at least this much for v10, because otherwise
> we'll face ABI issues if an extension is compiled against code with
> one semaphore API choice and used with code with a different one.

+1, this is a good idea. Your performance comments make sense too.
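
The ABI concern quoted above can be illustrated with a small sketch. When the semaphore payload is embedded in a struct, the offsets of all later fields depend on which semaphore implementation the headers were configured for. With an opaque pointer they do not. All type and function names below (SysVSema, FakePosixSema, ProcEmbedded*, ProcWithHandle and so on) are hypothetical stand-ins rather than PostgreSQL's definitions, and FakePosixSema only approximates the fact that sem_t is larger than two ints.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical payloads for two semaphore implementations. */
typedef struct SysVSema
{
    int         semId;          /* semaphore set identifier */
    int         semNum;         /* semaphore number within set */
} SysVSema;

typedef struct FakePosixSema
{
    long        words[4];       /* stand-in for an opaque sem_t */
} FakePosixSema;

/* Old style: the payload is embedded, so the layout follows the build. */
typedef struct ProcEmbeddedSysV
{
    void       *links;
    SysVSema    sem;
    int         waitStatus;
} ProcEmbeddedSysV;

typedef struct ProcEmbeddedPosix
{
    void       *links;
    FakePosixSema sem;
    int         waitStatus;
} ProcEmbeddedPosix;

/* New style: only a pointer to an incomplete type is stored. */
struct OpaqueSema;
typedef struct OpaqueSema *SemaHandle;

typedef struct ProcWithHandle
{
    void       *links;
    SemaHandle  sem;
    int         waitStatus;
} ProcWithHandle;

/* Holds on mainstream platforms; the handle is just a pointer. */
static_assert(sizeof(SemaHandle) == sizeof(void *),
              "handle is expected to be pointer-sized");

/* Where would code compiled against each header look for waitStatus? */
size_t embedded_sysv_offset(void)  { return offsetof(ProcEmbeddedSysV, waitStatus); }
size_t embedded_posix_offset(void) { return offsetof(ProcEmbeddedPosix, waitStatus); }
size_t handle_offset(void)         { return offsetof(ProcWithHandle, waitStatus); }
```

An extension built against the first embedded layout and loaded into a server built with the second would read waitStatus from the wrong offset. With the handle layout, both sides agree regardless of which implementation sits behind the pointer.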