Обсуждение: Unexpected "shared memory block is still in use"

Поиск
Список
Период
Сортировка

Unexpected "shared memory block is still in use"

От
Tom Lane
Дата:
Just now, while running a parallel check-world on HEAD according to the
same script I've been using for quite some time, one of the TAP tests
died during initdb:

selecting dynamic shared memory implementation ... posix
selecting default max_connections ... 100
selecting default shared_buffers ... 128MB
selecting default timezone ... America/New_York
creating configuration files ... ok
running bootstrap script ... ok
performing post-bootstrap initialization ... 2019-05-08 13:59:19.963 EDT [18351] FATAL:  pre-existing shared memory
block(key 5440004, ID 1734475802) is still in use 
2019-05-08 13:59:19.963 EDT [18351] HINT:  Terminate any old server processes associated with data directory
"/home/postgres/pgsql/src/test/subscription/tmp_check/t_004_sync_publisher_data/pgdata".
child process exited with exit code 1
initdb: removing data directory "/home/postgres/pgsql/src/test/subscription/tmp_check/t_004_sync_publisher_data/pgdata"
Bail out!  system initdb failed

I have never seen this happen before in the TAP tests.

I think the odds are very high that this implies something wrong with
commit c09850992.

My immediate guess after eyeballing that patch quickly is that it was
not a good idea to redefine the rules used by bootstrap/standalone
backends.  In particular, it seems somewhat plausible that the bootstrap
process hadn't yet completely died when the standalone backend for the
post-bootstrap phase came along and decided there was a conflict (which
it never would have before).

            regards, tom lane



Re: Unexpected "shared memory block is still in use"

От
Noah Misch
Дата:
On Wed, May 08, 2019 at 02:32:46PM -0400, Tom Lane wrote:
> Just now, while running a parallel check-world on HEAD according to the
> same script I've been using for quite some time, one of the TAP tests
> died during initdb:
> 
> selecting dynamic shared memory implementation ... posix
> selecting default max_connections ... 100
> selecting default shared_buffers ... 128MB
> selecting default timezone ... America/New_York
> creating configuration files ... ok
> running bootstrap script ... ok
> performing post-bootstrap initialization ... 2019-05-08 13:59:19.963 EDT [18351] FATAL:  pre-existing shared memory
block(key 5440004, ID 1734475802) is still in use
 
> 2019-05-08 13:59:19.963 EDT [18351] HINT:  Terminate any old server processes associated with data directory
"/home/postgres/pgsql/src/test/subscription/tmp_check/t_004_sync_publisher_data/pgdata".
> child process exited with exit code 1
> initdb: removing data directory
"/home/postgres/pgsql/src/test/subscription/tmp_check/t_004_sync_publisher_data/pgdata"
> Bail out!  system initdb failed
> 
> I have never seen this happen before in the TAP tests.
> 
> I think the odds are very high that this implies something wrong with
> commit c09850992.

The odds are very high that you would not have gotten that error before that
commit.  But if the cause matches your guess, it's not something wrong with
the commit ...

> My immediate guess after eyeballing that patch quickly is that it was
> not a good idea to redefine the rules used by bootstrap/standalone
> backends.  In particular, it seems somewhat plausible that the bootstrap
> process hadn't yet completely died when the standalone backend for the
> post-bootstrap phase came along and decided there was a conflict (which
> it never would have before).

If so, I would sure try to fix the initdb sequence to not let that happen.  I
would not trust such a conflict to be harmless.

What OS, OS version, and filesystem?



Re: Unexpected "shared memory block is still in use"

От
Kyotaro HORIGUCHI
Дата:
At Wed, 8 May 2019 22:54:14 -0700, Noah Misch <noah@leadboat.com> wrote in <20190509055414.GB1066859@rfd.leadboat.com>
> On Wed, May 08, 2019 at 02:32:46PM -0400, Tom Lane wrote:
> > Just now, while running a parallel check-world on HEAD according to the
> > same script I've been using for quite some time, one of the TAP tests
> > died during initdb:
> > 
> > selecting dynamic shared memory implementation ... posix
> > selecting default max_connections ... 100
> > selecting default shared_buffers ... 128MB
> > selecting default timezone ... America/New_York
> > creating configuration files ... ok
> > running bootstrap script ... ok
> > performing post-bootstrap initialization ... 2019-05-08 13:59:19.963 EDT [18351] FATAL:  pre-existing shared memory
block(key 5440004, ID 1734475802) is still in use
 
> > 2019-05-08 13:59:19.963 EDT [18351] HINT:  Terminate any old server processes associated with data directory
"/home/postgres/pgsql/src/test/subscription/tmp_check/t_004_sync_publisher_data/pgdata".
> > child process exited with exit code 1
> > initdb: removing data directory
"/home/postgres/pgsql/src/test/subscription/tmp_check/t_004_sync_publisher_data/pgdata"
> > Bail out!  system initdb failed
> > 
> > I have never seen this happen before in the TAP tests.
> > 
> > I think the odds are very high that this implies something wrong with
> > commit c09850992.
> 
> The odds are very high that you would not have gotten that error before that
> commit.  But if the cause matches your guess, it's not something wrong with
> the commit ...
> 
> > My immediate guess after eyeballing that patch quickly is that it was
> > not a good idea to redefine the rules used by bootstrap/standalone
> > backends.  In particular, it seems somewhat plausible that the bootstrap
> > process hadn't yet completely died when the standalone backend for the
> > post-bootstrap phase came along and decided there was a conflict (which
> > it never would have before).
> 
> If so, I would sure try to fix the initdb sequence to not let that happen.  I
> would not trust such a conflict to be harmless.
> 
> What OS, OS version, and filesystem?

PGSharedMemoryCreate shows the error in SHMSTATE_ANALYSYS_FAILURE
case. PGSharedMemoryAttach returns the code when, for example,
shmat failed with ENOMEM. I'm afraid that the message is not
shown from SHMSTATE_ATTACHED..

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




Re: Unexpected "shared memory block is still in use"

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> On Wed, May 08, 2019 at 02:32:46PM -0400, Tom Lane wrote:
>> Just now, while running a parallel check-world on HEAD according to the
>> same script I've been using for quite some time, one of the TAP tests
>> died during initdb:
>> performing post-bootstrap initialization ... 2019-05-08 13:59:19.963 EDT [18351] FATAL:  pre-existing shared memory
block(key 5440004, ID 1734475802) is still in use 

> The odds are very high that you would not have gotten that error before that
> commit.  But if the cause matches your guess, it's not something wrong with
> the commit ...

Fair point.

> What OS, OS version, and filesystem?

Up-to-date RHEL6 (kernel 2.6.32-754.12.1.el6.x86_64), ext4 over LVM
on spinning rust with an LSI MegaRAID controller in front of it.

Since complaining, I've done half a dozen more parallel check-worlds
without issue, so the error was and still is rare.  This matches the
fact that we've not seen it in the buildfarm :-(.

            regards, tom lane



Re: Unexpected "shared memory block is still in use"

От
Tom Lane
Дата:
I wrote:
> Noah Misch <noah@leadboat.com> writes:
>> The odds are very high that you would not have gotten that error before that
>> commit.  But if the cause matches your guess, it's not something wrong with
>> the commit ...

> Fair point.

After more study and testing, I no longer believe my original thought
about a bootstrap-to-standalone-backend race condition.  The bootstrap
process definitely kills its SysV shmem segment before exiting.

However, I have a new theory, after noticing that c09850992 moved the
check for shm_nattch == 0.  Previously, if a shmem segment had zero attach
count, it was unconditionally considered not-a-threat.  Now, we'll try
shmat() anyway, and if that fails for any reason other than EACCES, we say
SHMSTATE_ANALYSIS_FAILURE which leads to the described error report.
So I suspect that what we hit was a race condition whereby some other
parallel test was using the same shmem ID and we managed to see its
segment successfully in shmctl but then it was gone by the time we did
shmat.  This leads me to think that EINVAL and EIDRM failures from
shmat had better be considered SHMSTATE_ENOENT not
SHMSTATE_ANALYSIS_FAILURE.

In principle this is a longstanding race condition, but I wonder
whether we made it more probable by moving the shm_nattch check.

            regards, tom lane



Re: Unexpected "shared memory block is still in use"

От
Tom Lane
Дата:
I wrote:
> However, I have a new theory, after noticing that c09850992 moved the
> check for shm_nattch == 0.  Previously, if a shmem segment had zero attach
> count, it was unconditionally considered not-a-threat.  Now, we'll try
> shmat() anyway, and if that fails for any reason other than EACCES, we say
> SHMSTATE_ANALYSIS_FAILURE which leads to the described error report.
> So I suspect that what we hit was a race condition whereby some other
> parallel test was using the same shmem ID and we managed to see its
> segment successfully in shmctl but then it was gone by the time we did
> shmat.  This leads me to think that EINVAL and EIDRM failures from
> shmat had better be considered SHMSTATE_ENOENT not
> SHMSTATE_ANALYSIS_FAILURE.
> In principle this is a longstanding race condition, but I wonder
> whether we made it more probable by moving the shm_nattch check.

Hah --- this is a real race condition, and I can demonstrate it very
easily by inserting a sleep right there, as in the attached
for-testing-only patch.

The particular parallelism level I use is

make -s check-world -j4 PROVE_FLAGS='-j4 --quiet --nocolor --nocount'

on a dual-socket 4-cores-per-socket Xeon machine.  With that command and
this patch, I frequently get multiple failures per run, and they all
report either EINVAL or EIDRM.

The patch generally reports that nattch had been 1, so my thought that
that change might've made it worse seems unfounded.  But we have
absolutely got a hittable race condition here.  The real fix should
be on the order of

        if (errno == EACCES)
            return SHMSTATE_FOREIGN;
+        else if (errno == EINVAL || errno == EIDRM)
+            return SHMSTATE_ENOENT;
        else
            return SHMSTATE_ANALYSIS_FAILURE;

(plus comments of course).

            regards, tom lane

diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index a9d7bf9..d390ba4 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -387,6 +387,8 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
     if (stat(DataDir, &statbuf) < 0)
         return SHMSTATE_ANALYSIS_FAILURE;    /* can't stat; be conservative */
 
+    pg_usleep(100000);
+
     /*
      * Attachment fails if we have no write permission.  Since that will never
      * happen with Postgres IPCProtection, such a failure shows the segment is
@@ -398,8 +400,9 @@ PGSharedMemoryAttach(IpcMemoryId shmId,
     {
         if (errno == EACCES)
             return SHMSTATE_FOREIGN;
-        else
-            return SHMSTATE_ANALYSIS_FAILURE;
+        elog(LOG, "shmat(0x%lx) failed: %m, nattch had been %ld",
+             (long) shmId, (long) shmStat.shm_nattch);
+        return SHMSTATE_ANALYSIS_FAILURE;
     }
     *addr = hdr;


Re: Unexpected "shared memory block is still in use"

От
Noah Misch
Дата:
On Thu, May 09, 2019 at 06:47:58PM -0400, Tom Lane wrote:
> I wrote:
> > However, I have a new theory, after noticing that c09850992 moved the
> > check for shm_nattch == 0.  Previously, if a shmem segment had zero attach
> > count, it was unconditionally considered not-a-threat.  Now, we'll try
> > shmat() anyway, and if that fails for any reason other than EACCES, we say
> > SHMSTATE_ANALYSIS_FAILURE which leads to the described error report.
> > So I suspect that what we hit was a race condition whereby some other
> > parallel test was using the same shmem ID and we managed to see its
> > segment successfully in shmctl but then it was gone by the time we did
> > shmat.  This leads me to think that EINVAL and EIDRM failures from
> > shmat had better be considered SHMSTATE_ENOENT not
> > SHMSTATE_ANALYSIS_FAILURE.
> > In principle this is a longstanding race condition, but I wonder
> > whether we made it more probable by moving the shm_nattch check.
> 
> Hah --- this is a real race condition, and I can demonstrate it very
> easily by inserting a sleep right there, as in the attached
> for-testing-only patch.
> 
> The particular parallelism level I use is
> 
> make -s check-world -j4 PROVE_FLAGS='-j4 --quiet --nocolor --nocount'
> 
> on a dual-socket 4-cores-per-socket Xeon machine.  With that command and
> this patch, I frequently get multiple failures per run, and they all
> report either EINVAL or EIDRM.
> 
> The patch generally reports that nattch had been 1, so my thought that
> that change might've made it worse seems unfounded.  But we have
> absolutely got a hittable race condition here.  The real fix should
> be on the order of
> 
>         if (errno == EACCES)
>             return SHMSTATE_FOREIGN;
> +        else if (errno == EINVAL || errno == EIDRM)
> +            return SHMSTATE_ENOENT;
>         else
>             return SHMSTATE_ANALYSIS_FAILURE;
> 
> (plus comments of course).

Looks good.  That is basically a defect in commit c09850992; the race passed
from irrelevance to relevance when that commit subjected more segments to the
test.  Thanks for diagnosing it.



Re: Unexpected "shared memory block is still in use"

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> Looks good.  That is basically a defect in commit c09850992; the race passed
> from irrelevance to relevance when that commit subjected more segments to the
> test.  Thanks for diagnosing it.

The bug's far older than that, surely, since before c09850992 we treated
*any* shmat failure as meaning we'd better fail.  I think you're right
that c09850992 might've made it slightly more probable, but most likely
the bottom line here is just that we haven't been doing parallel
check-worlds a lot until relatively recently.  The buildfarm would be
kind of unlikely to hit this I think --- AFAIK it doesn't launch multiple
postmasters using the same port number concurrently.  But parallel
invocation of TAP test scripts makes the hazard real.

Will go fix/backpatch in a minute.

            regards, tom lane



Re: Unexpected "shared memory block is still in use"

От
Tom Lane
Дата:
I wrote:
> Will go fix/backpatch in a minute.

Done now, but while thinking more about the issue, I had an idea: why is
it that we base the shmem key on the postmaster's port number, and not
on the data directory's inode number?  Using the port number not only
increases the risk of collisions (though admittedly only in testing
situations), but it *decreases* our ability to detect real conflicts.
Consider case where DBA wants to change the installation's port number,
and he edits postgresql.conf, but then uses "kill -9 && rm postmaster.pid"
rather than some saner way of stopping the old postmaster.  When he
starts the new one, it won't detect any remaining children of the old
postmaster because it'll be looking in the wrong range of shmem keys.
It seems like something tied to the data directory's identity would
be much more trustworthy.

I think the reason for doing it this way originally was to allow
one to identify which shmem segment is which in "ipcs -m" output.
But that was back when having to clean up shmem segments manually
was still a common task.  It's been a long time since I can remember
needing to figure out which was which.

            regards, tom lane



Re: Unexpected "shared memory block is still in use"

От
Noah Misch
Дата:
On Fri, May 10, 2019 at 04:46:40PM -0400, Tom Lane wrote:
> I wrote:
> > Will go fix/backpatch in a minute.
> 
> Done now, but while thinking more about the issue, I had an idea: why is
> it that we base the shmem key on the postmaster's port number, and not
> on the data directory's inode number?  Using the port number not only
> increases the risk of collisions (though admittedly only in testing
> situations), but it *decreases* our ability to detect real conflicts.
> Consider case where DBA wants to change the installation's port number,
> and he edits postgresql.conf, but then uses "kill -9 && rm postmaster.pid"
> rather than some saner way of stopping the old postmaster.  When he
> starts the new one, it won't detect any remaining children of the old
> postmaster because it'll be looking in the wrong range of shmem keys.
> It seems like something tied to the data directory's identity would
> be much more trustworthy.

Good point.  Since we now ignore (SHMSTATE_FOREIGN) any segment that bears
(st_dev,st_ino) not matching $PGDATA, the change you describe couldn't make us
fail to detect a real conflict or miss a cleanup opportunity.  It would reduce
the ability to test sysv_shmem.c; I suppose one could add a debug GUC to
override the start of the key space.

> I think the reason for doing it this way originally was to allow
> one to identify which shmem segment is which in "ipcs -m" output.
> But that was back when having to clean up shmem segments manually
> was still a common task.  It's been a long time since I can remember
> needing to figure out which was which.

I don't see that presenting a problem these days, agreed.



Re: Unexpected "shared memory block is still in use"

От
Tom Lane
Дата:
Noah Misch <noah@leadboat.com> writes:
> On Fri, May 10, 2019 at 04:46:40PM -0400, Tom Lane wrote:
>> Done now, but while thinking more about the issue, I had an idea: why is
>> it that we base the shmem key on the postmaster's port number, and not
>> on the data directory's inode number?  Using the port number not only
>> increases the risk of collisions (though admittedly only in testing
>> situations), but it *decreases* our ability to detect real conflicts.
>> Consider case where DBA wants to change the installation's port number,
>> and he edits postgresql.conf, but then uses "kill -9 && rm postmaster.pid"
>> rather than some saner way of stopping the old postmaster.  When he
>> starts the new one, it won't detect any remaining children of the old
>> postmaster because it'll be looking in the wrong range of shmem keys.
>> It seems like something tied to the data directory's identity would
>> be much more trustworthy.

> Good point.  Since we now ignore (SHMSTATE_FOREIGN) any segment that bears
> (st_dev,st_ino) not matching $PGDATA, the change you describe couldn't make us
> fail to detect a real conflict or miss a cleanup opportunity.  It would reduce
> the ability to test sysv_shmem.c; I suppose one could add a debug GUC to
> override the start of the key space.

Attached is a draft patch to change both shmem and sema key selection
to be based on data directory inode rather than port.

I considered using "st_ino ^ st_dev", or some such, but decided that
that would largely just make it harder to manually correlate IPC
keys with running postmasters.  It's generally easy to find out the
data directory inode number with "ls", but the extra work to find out
and XOR in the device number is not so easy, and it's not clear what
it'd buy us in typical scenarios.

The Windows code seems fine as-is: it's already using data directory
name, not port, to set up shmem, and it doesn't need anything for
semaphores.

I'm not quite sure what's going on in src/test/recovery/t/017_shm.pl.
As expected, the test for port number non-collision no longer sees
a failure.  After fixing that, the test passes, but it takes a
ridiculously long time (minutes); apparently each postmaster start/stop
cycle takes much longer than it ought to.  I suppose this patch is
breaking its assumptions, but I've not studied it.  We'd have to do
something about that before this would be committable.

I'll add this to the next commitfest.

            regards, tom lane

diff --git a/src/backend/port/posix_sema.c b/src/backend/port/posix_sema.c
index 3370adf..bdd6552 100644
--- a/src/backend/port/posix_sema.c
+++ b/src/backend/port/posix_sema.c
@@ -29,6 +29,7 @@
 #include <semaphore.h>
 #include <signal.h>
 #include <unistd.h>
+#include <sys/stat.h>

 #include "miscadmin.h"
 #include "storage/ipc.h"
@@ -181,10 +182,6 @@ PGSemaphoreShmemSize(int maxSemas)
  * are acquired here or in PGSemaphoreCreate, register an on_shmem_exit
  * callback to release them.
  *
- * The port number is passed for possible use as a key (for Posix, we use
- * it to generate the starting semaphore name).  In a standalone backend,
- * zero will be passed.
- *
  * In the Posix implementation, we acquire semaphores on-demand; the
  * maxSemas parameter is just used to size the arrays.  For unnamed
  * semaphores, there is an array of PGSemaphoreData structs in shared memory.
@@ -196,8 +193,22 @@ PGSemaphoreShmemSize(int maxSemas)
  * we don't have to expose the counters to other processes.)
  */
 void
-PGReserveSemaphores(int maxSemas, int port)
+PGReserveSemaphores(int maxSemas)
 {
+    struct stat statbuf;
+
+    /*
+     * We use the data directory's inode number to seed the search for free
+     * semaphore keys.  This minimizes the odds of collision with other
+     * postmasters, while maximizing the odds that we will detect and clean up
+     * semaphores left over from a crashed postmaster in our own directory.
+     */
+    if (stat(DataDir, &statbuf) < 0)
+        ereport(FATAL,
+                (errcode_for_file_access(),
+                 errmsg("could not stat data directory \"%s\": %m",
+                        DataDir)));
+
 #ifdef USE_NAMED_POSIX_SEMAPHORES
     mySemPointers = (sem_t **) malloc(maxSemas * sizeof(sem_t *));
     if (mySemPointers == NULL)
@@ -215,7 +226,7 @@ PGReserveSemaphores(int maxSemas, int port)

     numSems = 0;
     maxSems = maxSemas;
-    nextSemKey = port * 1000;
+    nextSemKey = statbuf.st_ino;

     on_shmem_exit(ReleaseSemaphores, 0);
 }
diff --git a/src/backend/port/sysv_sema.c b/src/backend/port/sysv_sema.c
index ac5106f..a1652cc 100644
--- a/src/backend/port/sysv_sema.c
+++ b/src/backend/port/sysv_sema.c
@@ -17,6 +17,7 @@
 #include <signal.h>
 #include <unistd.h>
 #include <sys/file.h>
+#include <sys/stat.h>
 #ifdef HAVE_SYS_IPC_H
 #include <sys/ipc.h>
 #endif
@@ -301,10 +302,6 @@ PGSemaphoreShmemSize(int maxSemas)
  * are acquired here or in PGSemaphoreCreate, register an on_shmem_exit
  * callback to release them.
  *
- * The port number is passed for possible use as a key (for SysV, we use
- * it to generate the starting semaphore key).  In a standalone backend,
- * zero will be passed.
- *
  * In the SysV implementation, we acquire semaphore sets on-demand; the
  * maxSemas parameter is just used to size the arrays.  There is an array
  * of PGSemaphoreData structs in shared memory, and a postmaster-local array
@@ -314,8 +311,22 @@ PGSemaphoreShmemSize(int maxSemas)
  * have clobbered.)
  */
 void
-PGReserveSemaphores(int maxSemas, int port)
+PGReserveSemaphores(int maxSemas)
 {
+    struct stat statbuf;
+
+    /*
+     * We use the data directory's inode number to seed the search for free
+     * semaphore keys.  This minimizes the odds of collision with other
+     * postmasters, while maximizing the odds that we will detect and clean up
+     * semaphores left over from a crashed postmaster in our own directory.
+     */
+    if (stat(DataDir, &statbuf) < 0)
+        ereport(FATAL,
+                (errcode_for_file_access(),
+                 errmsg("could not stat data directory \"%s\": %m",
+                        DataDir)));
+
     /*
      * We must use ShmemAllocUnlocked(), since the spinlock protecting
      * ShmemAlloc() won't be ready yet.  (This ordering is necessary when we
@@ -332,7 +343,7 @@ PGReserveSemaphores(int maxSemas, int port)
     if (mySemaSets == NULL)
         elog(PANIC, "out of memory");
     numSemaSets = 0;
-    nextSemaKey = port * 1000;
+    nextSemaKey = statbuf.st_ino;
     nextSemaNumber = SEMAS_PER_SET; /* force sema set alloc on 1st call */

     on_shmem_exit(ReleaseSemaphores, 0);
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 968506d..dab2920 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -390,11 +390,12 @@ PGSharedMemoryAttach(IpcMemoryId shmId,

     /*
      * Try to attach to the segment and see if it matches our data directory.
-     * This avoids key-conflict problems on machines that are running several
-     * postmasters under the same userid and port number.  (That would not
-     * ordinarily happen in production, but it can happen during parallel
-     * testing.  Since our test setups don't open any TCP ports on Unix, such
-     * cases don't conflict otherwise.)
+     * This avoids any risk of duplicate-shmem-key conflicts on machines that
+     * are running several postmasters under the same userid.
+     *
+     * (When we're called from PGSharedMemoryCreate, this stat call is
+     * duplicative; but since this isn't a high-traffic case it's not worth
+     * trying to optimize.)
      */
     if (stat(DataDir, &statbuf) < 0)
         return SHMSTATE_ANALYSIS_FAILURE;    /* can't stat; be conservative */
@@ -617,12 +618,9 @@ AnonymousShmemDetach(int status, Datum arg)
  * we do not fail upon collision with foreign shmem segments.  The idea here
  * is to detect and re-use keys that may have been assigned by a crashed
  * postmaster or backend.
- *
- * The port number is passed for possible use as a key (for SysV, we use
- * it to generate the starting shmem key).
  */
 PGShmemHeader *
-PGSharedMemoryCreate(Size size, int port,
+PGSharedMemoryCreate(Size size,
                      PGShmemHeader **shim)
 {
     IpcMemoryKey NextShmemSegID;
@@ -631,6 +629,17 @@ PGSharedMemoryCreate(Size size, int port,
     struct stat statbuf;
     Size        sysvsize;

+    /*
+     * We use the data directory's ID info (inode and device numbers) to
+     * positively identify shmem segments associated with this data dir, and
+     * also as seeds for searching for a free shmem key.
+     */
+    if (stat(DataDir, &statbuf) < 0)
+        ereport(FATAL,
+                (errcode_for_file_access(),
+                 errmsg("could not stat data directory \"%s\": %m",
+                        DataDir)));
+
     /* Complain if hugepages demanded but we can't possibly support them */
 #if !defined(MAP_HUGETLB)
     if (huge_pages == HUGE_PAGES_ON)
@@ -659,10 +668,10 @@ PGSharedMemoryCreate(Size size, int port,
     /*
      * Loop till we find a free IPC key.  Trust CreateDataDirLockFile() to
      * ensure no more than one postmaster per data directory can enter this
-     * loop simultaneously.  (CreateDataDirLockFile() does not ensure that,
-     * but prefer fixing it over coping here.)
+     * loop simultaneously.  (CreateDataDirLockFile() does not entirely ensure
+     * that, but prefer fixing it over coping here.)
      */
-    NextShmemSegID = 1 + port * 1000;
+    NextShmemSegID = statbuf.st_ino;

     for (;;)
     {
@@ -748,11 +757,6 @@ PGSharedMemoryCreate(Size size, int port,
     hdr->dsm_control = 0;

     /* Fill in the data directory ID info, too */
-    if (stat(DataDir, &statbuf) < 0)
-        ereport(FATAL,
-                (errcode_for_file_access(),
-                 errmsg("could not stat data directory \"%s\": %m",
-                        DataDir)));
     hdr->device = statbuf.st_dev;
     hdr->inode = statbuf.st_ino;

diff --git a/src/backend/port/win32_sema.c b/src/backend/port/win32_sema.c
index 013c122..32cc697 100644
--- a/src/backend/port/win32_sema.c
+++ b/src/backend/port/win32_sema.c
@@ -44,7 +44,7 @@ PGSemaphoreShmemSize(int maxSemas)
  * process exits.
  */
 void
-PGReserveSemaphores(int maxSemas, int port)
+PGReserveSemaphores(int maxSemas)
 {
     mySemSet = (HANDLE *) malloc(maxSemas * sizeof(HANDLE));
     if (mySemSet == NULL)
diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index ccd7b6b..6cb6328 100644
--- a/src/backend/port/win32_shmem.c
+++ b/src/backend/port/win32_shmem.c
@@ -194,7 +194,7 @@ EnableLockPagesPrivilege(int elevel)
  * standard header.
  */
 PGShmemHeader *
-PGSharedMemoryCreate(Size size, int port,
+PGSharedMemoryCreate(Size size,
                      PGShmemHeader **shim)
 {
     void       *memAddress;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 3339804..67cdeb5 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -385,7 +385,7 @@ static void getInstallationPaths(const char *argv0);
 static void checkControlFile(void);
 static Port *ConnCreate(int serverFd);
 static void ConnFree(Port *port);
-static void reset_shared(int port);
+static void reset_shared(void);
 static void SIGHUP_handler(SIGNAL_ARGS);
 static void pmdie(SIGNAL_ARGS);
 static void reaper(SIGNAL_ARGS);
@@ -1175,7 +1175,7 @@ PostmasterMain(int argc, char *argv[])
     /*
      * Set up shared memory and semaphores.
      */
-    reset_shared(PostPortNumber);
+    reset_shared();

     /*
      * Estimate number of openable files.  This must happen after setting up
@@ -2599,17 +2599,16 @@ InitProcessGlobals(void)
  * reset_shared -- reset shared memory and semaphores
  */
 static void
-reset_shared(int port)
+reset_shared(void)
 {
     /*
      * Create or re-create shared memory and semaphores.
      *
      * Note: in each "cycle of life" we will normally assign the same IPC keys
-     * (if using SysV shmem and/or semas), since the port number is used to
-     * determine IPC keys.  This helps ensure that we will clean up dead IPC
-     * objects if the postmaster crashes and is restarted.
+     * (if using SysV shmem and/or semas).  This helps ensure that we will
+     * clean up dead IPC objects if the postmaster crashes and is restarted.
      */
-    CreateSharedMemoryAndSemaphores(port);
+    CreateSharedMemoryAndSemaphores();
 }


@@ -3919,7 +3918,7 @@ PostmasterStateMachine(void)
         /* re-read control file into local memory */
         LocalProcessControlFile(true);

-        reset_shared(PostPortNumber);
+        reset_shared();

         StartupPID = StartupDataBase();
         Assert(StartupPID != 0);
@@ -4947,7 +4946,7 @@ SubPostmasterMain(int argc, char *argv[])
         InitProcess();

         /* Attach process to shared data structures */
-        CreateSharedMemoryAndSemaphores(0);
+        CreateSharedMemoryAndSemaphores();

         /* And run the backend */
         BackendRun(&port);        /* does not return */
@@ -4961,7 +4960,7 @@ SubPostmasterMain(int argc, char *argv[])
         InitAuxiliaryProcess();

         /* Attach process to shared data structures */
-        CreateSharedMemoryAndSemaphores(0);
+        CreateSharedMemoryAndSemaphores();

         AuxiliaryProcessMain(argc - 2, argv + 2);    /* does not return */
     }
@@ -4974,7 +4973,7 @@ SubPostmasterMain(int argc, char *argv[])
         InitProcess();

         /* Attach process to shared data structures */
-        CreateSharedMemoryAndSemaphores(0);
+        CreateSharedMemoryAndSemaphores();

         AutoVacLauncherMain(argc - 2, argv + 2);    /* does not return */
     }
@@ -4987,7 +4986,7 @@ SubPostmasterMain(int argc, char *argv[])
         InitProcess();

         /* Attach process to shared data structures */
-        CreateSharedMemoryAndSemaphores(0);
+        CreateSharedMemoryAndSemaphores();

         AutoVacWorkerMain(argc - 2, argv + 2);    /* does not return */
     }
@@ -5005,7 +5004,7 @@ SubPostmasterMain(int argc, char *argv[])
         InitProcess();

         /* Attach process to shared data structures */
-        CreateSharedMemoryAndSemaphores(0);
+        CreateSharedMemoryAndSemaphores();

         /* Fetch MyBgworkerEntry from shared memory */
         shmem_slot = atoi(argv[1] + 15);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index d7d7335..8853706 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -91,7 +91,7 @@ RequestAddinShmemSpace(Size size)
  * This is a bit code-wasteful and could be cleaned up.)
  */
 void
-CreateSharedMemoryAndSemaphores(int port)
+CreateSharedMemoryAndSemaphores(void)
 {
     PGShmemHeader *shim = NULL;

@@ -163,14 +163,14 @@ CreateSharedMemoryAndSemaphores(int port)
         /*
          * Create the shmem segment
          */
-        seghdr = PGSharedMemoryCreate(size, port, &shim);
+        seghdr = PGSharedMemoryCreate(size, &shim);

         InitShmemAccess(seghdr);

         /*
          * Create semaphores
          */
-        PGReserveSemaphores(numSemas, port);
+        PGReserveSemaphores(numSemas);

         /*
          * If spinlocks are disabled, initialize emulation layer (which
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 43b9f17..29c5ec7 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -445,12 +445,10 @@ InitCommunication(void)
     if (!IsUnderPostmaster)        /* postmaster already did this */
     {
         /*
-         * We're running a postgres bootstrap process or a standalone backend.
-         * Though we won't listen on PostPortNumber, use it to select a shmem
-         * key.  This increases the chance of detecting a leftover live
-         * backend of this DataDir.
+         * We're running a postgres bootstrap process or a standalone backend,
+         * so we need to set up shmem.
          */
-        CreateSharedMemoryAndSemaphores(PostPortNumber);
+        CreateSharedMemoryAndSemaphores();
     }
 }

diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h
index e9b243f..d113813 100644
--- a/src/include/storage/ipc.h
+++ b/src/include/storage/ipc.h
@@ -76,6 +76,6 @@ extern void on_exit_reset(void);
 /* ipci.c */
 extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook;

-extern void CreateSharedMemoryAndSemaphores(int port);
+extern void CreateSharedMemoryAndSemaphores(void);

 #endif                            /* IPC_H */
diff --git a/src/include/storage/pg_sema.h b/src/include/storage/pg_sema.h
index b95efa1..1ac4f9a 100644
--- a/src/include/storage/pg_sema.h
+++ b/src/include/storage/pg_sema.h
@@ -41,7 +41,7 @@ typedef HANDLE PGSemaphore;
 extern Size PGSemaphoreShmemSize(int maxSemas);

 /* Module initialization (called during postmaster start or shmem reinit) */
-extern void PGReserveSemaphores(int maxSemas, int port);
+extern void PGReserveSemaphores(int maxSemas);

 /* Allocate a PGSemaphore structure with initial count 1 */
 extern PGSemaphore PGSemaphoreCreate(void);
diff --git a/src/include/storage/pg_shmem.h b/src/include/storage/pg_shmem.h
index ac24878..6bd5664 100644
--- a/src/include/storage/pg_shmem.h
+++ b/src/include/storage/pg_shmem.h
@@ -82,7 +82,7 @@ extern void PGSharedMemoryReAttach(void);
 extern void PGSharedMemoryNoReAttach(void);
 #endif

-extern PGShmemHeader *PGSharedMemoryCreate(Size size, int port,
+extern PGShmemHeader *PGSharedMemoryCreate(Size size,
                                            PGShmemHeader **shim);
 extern bool PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2);
 extern void PGSharedMemoryDetach(void);
diff --git a/src/test/recovery/t/017_shm.pl b/src/test/recovery/t/017_shm.pl
index 7f10ff5..0b988eb 100644
--- a/src/test/recovery/t/017_shm.pl
+++ b/src/test/recovery/t/017_shm.pl
@@ -152,14 +152,12 @@ print STDERR $single_stderr;
 like($single_stderr, $pre_existing_msg,
     'single-user mode detected live backend via shared memory');
 log_ipcs();
-# Fail to reject startup if shm key N has become available and we crash while
-# using key N+1.  This is unwanted, but expected.  Windows is immune, because
-# its GetSharedMemName() use DataDir strings, not numeric keys.
+# Reject startup if shm key N has become available and we crash while
+# using key N+1.
 $flea->stop;    # release first key
 is( $gnat->start(fail_ok => 1),
-    $TestLib::windows_os ? 0 : 1,
-    'key turnover fools only sysv_shmem.c');
-$gnat->stop;     # release first key (no-op on $TestLib::windows_os)
+    0,
+    'key turnover fools nobody');
 $flea->start;    # grab first key
 # cleanup
 TestLib::system_log('pg_ctl', 'kill', 'QUIT', $slow_pid);

Re: Unexpected "shared memory block is still in use"

От
Peter Eisentraut
Дата:
On 2019-08-14 01:22, Tom Lane wrote:
> Attached is a draft patch to change both shmem and sema key selection
> to be based on data directory inode rather than port.
> 
> I considered using "st_ino ^ st_dev", or some such, but decided that
> that would largely just make it harder to manually correlate IPC
> keys with running postmasters.  It's generally easy to find out the
> data directory inode number with "ls", but the extra work to find out
> and XOR in the device number is not so easy, and it's not clear what
> it'd buy us in typical scenarios.

For the POSIX APIs where the numbers are just converted to a string, why
not use both -- or forget about the inodes and use the actual data
directory string.

For the SYSV APIs, the scenario that came to my mind is if someone
starts a bunch of servers each on their own mount, it could happen that
the inodes of the data directories are very similar.

There is also the issue that AFAICT the key_t in the SYSV APIs is always
32-bit whereas inodes are 64-bit.  Probably not a big deal, but it might
prevent an exact one-to-one mapping.

Of course, ftok() is also available here as an existing solution.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Unexpected "shared memory block is still in use"

От
Tom Lane
Дата:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> On 2019-08-14 01:22, Tom Lane wrote:
>> Attached is a draft patch to change both shmem and sema key selection
>> to be based on data directory inode rather than port.

> For the POSIX APIs where the numbers are just converted to a string, why
> not use both -- or forget about the inodes and use the actual data
> directory string.

Considering that we still need an operation equivalent to "nextSemKey++"
(in case of a key collision), I'm not really sure how working with strings
rather than ints would make life better.

> For the SYSV APIs, the scenario that came to my mind is if someone
> starts a bunch of servers each on their own mount, it could happen that
> the inodes of the data directories are very similar.

Sure.  That's why I didn't throw away any of the duplicate-key-handling
logic, and why we're still checking for st_dev match when inspecting
particular shmem blocks.  (It also seems likely that somebody
who's doing that would be using similar pathnames on the different
mounts, so that string-based approaches wouldn't exactly be free of
collision problems either.)

> There is also the issue that AFAICT the key_t in the SYSV APIs is always
> 32-bit whereas inodes are 64-bit.  Probably not a big deal, but it might
> prevent an exact one-to-one mapping.

True, although the width of inode numbers is probably pretty platform-
and filesystem-dependent.  We could consider trying some more complicated
mapping like xor'ing high and low halves, but I don't entirely see what
it buys us.

> Of course, ftok() is also available here as an existing solution.

I looked at that briefly, but I don't really see what it'd buy us either,
except for opacity which doesn't seem useful.  The Linux man page pretty
much says in so many words that it's a wrapper for st_ino and st_dev;
and how does it help us if other platforms do it differently?

(Actually, if Linux does it the way the man page suggests, it'd really
be a net negative, because there'd only be 24 bits of key variation
not 32.)

            regards, tom lane



Re: Unexpected "shared memory block is still in use"

От
Peter Eisentraut
Дата:
I agree with this patch and the reasons for it.

A related point, perhaps we should change the key printed into
postmaster.pid to be in hexadecimal format ("0x08x") so that it matches
what ipcs prints.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Unexpected "shared memory block is still in use"

От
Tom Lane
Дата:
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> writes:
> I agree with this patch and the reasons for it.

OK, thanks for reviewing.

> A related point, perhaps we should change the key printed into
> postmaster.pid to be in hexadecimal format ("0x08x") so that it matches
> what ipcs prints.

Hmm, that depends on whose ipcs you use :-(.  A quick survey
of my machines says it's

                key     shmid

Linux:          hex     decimal
FreeBSD:        decimal decimal
NetBSD:         decimal decimal
OpenBSD:        decimal decimal
macOS:          hex     decimal
HPUX:           hex     (not printed)

There's certainly room to argue that hex+decimal is most popular,
but I'm not sure that that outweighs possible compatibility issues
from changing postmaster.pid contents.  (Admittedly, it's not real
clear that anything would be paying attention to the shmem key,
so maybe there's no compatibility issue.)

If we did want to assume that we could change postmaster.pid,
it might be best to print the key both ways?

            regards, tom lane



Re: Unexpected "shared memory block is still in use"

От
Tom Lane
Дата:
I wrote:
> Attached is a draft patch to change both shmem and sema key selection
> to be based on data directory inode rather than port.
> ...
> I'm not quite sure what's going on in src/test/recovery/t/017_shm.pl.
> As expected, the test for port number non-collision no longer sees
> a failure.  After fixing that, the test passes, but it takes a
> ridiculously long time (minutes); apparently each postmaster start/stop
> cycle takes much longer than it ought to.  I suppose this patch is
> breaking its assumptions, but I've not studied it.

After looking closer, the problem is pretty obvious: the initial
loop is trying to create a cluster whose shmem key matches its
port-number-based expectation.  With this code, that will never
happen except by unlikely accident, so it wastes time with repeated
initdb/start/stop attempts.  After 100 tries it gives up and presses
on with the test, resulting in the apparent pass with long runtime.

I now understand the point you made upthread that this test could
only be preserved if we invent some way to force the choice of shmem
key.  While it wouldn't be hard to do that (say, invent a magic
environment variable), I really don't want to do so.  In the field,
such a behavior would have no positive use, and it could destroy our
newly-improved guarantees about detecting conflicting old processes.

However, there's another way to skin this cat.  We can have the
Perl test script create a conflicting shmem segment directly,
as in the attached second-draft patch.  I simplified the test
script quite a bit, since I don't see any particular value in
creating more than one test postmaster with this approach.

This still isn't committable as-is, since the test will just curl up
and die on machines lacking IPC::SharedMem.  (It could be rewritten
to rely only on the lower-level IPC::SysV module, but I doubt that's
worth the trouble, since either way it'd fail on Windows.)  I'm not
sure whether we should just not bother to run the test at all, or
if we should run it but skip the IPC-related parts; and my Perl-fu
isn't really up to implementing either behavior.

Another thing that might be interesting is to do more than just create
the conflicting segment, ie, try to put some data into it that would
fool the postmaster.  I'm not excited about that at all, but maybe
someone else is?

The attached patch is identical to the previous one except for the
changes in src/test/recovery/t/017_shm.pl.

            regards, tom lane

diff --git a/src/backend/port/posix_sema.c b/src/backend/port/posix_sema.c
index 3370adf..bdd6552 100644
--- a/src/backend/port/posix_sema.c
+++ b/src/backend/port/posix_sema.c
@@ -29,6 +29,7 @@
 #include <semaphore.h>
 #include <signal.h>
 #include <unistd.h>
+#include <sys/stat.h>

 #include "miscadmin.h"
 #include "storage/ipc.h"
@@ -181,10 +182,6 @@ PGSemaphoreShmemSize(int maxSemas)
  * are acquired here or in PGSemaphoreCreate, register an on_shmem_exit
  * callback to release them.
  *
- * The port number is passed for possible use as a key (for Posix, we use
- * it to generate the starting semaphore name).  In a standalone backend,
- * zero will be passed.
- *
  * In the Posix implementation, we acquire semaphores on-demand; the
  * maxSemas parameter is just used to size the arrays.  For unnamed
  * semaphores, there is an array of PGSemaphoreData structs in shared memory.
@@ -196,8 +193,22 @@ PGSemaphoreShmemSize(int maxSemas)
  * we don't have to expose the counters to other processes.)
  */
 void
-PGReserveSemaphores(int maxSemas, int port)
+PGReserveSemaphores(int maxSemas)
 {
+    struct stat statbuf;
+
+    /*
+     * We use the data directory's inode number to seed the search for free
+     * semaphore keys.  This minimizes the odds of collision with other
+     * postmasters, while maximizing the odds that we will detect and clean up
+     * semaphores left over from a crashed postmaster in our own directory.
+     */
+    if (stat(DataDir, &statbuf) < 0)
+        ereport(FATAL,
+                (errcode_for_file_access(),
+                 errmsg("could not stat data directory \"%s\": %m",
+                        DataDir)));
+
 #ifdef USE_NAMED_POSIX_SEMAPHORES
     mySemPointers = (sem_t **) malloc(maxSemas * sizeof(sem_t *));
     if (mySemPointers == NULL)
@@ -215,7 +226,7 @@ PGReserveSemaphores(int maxSemas, int port)

     numSems = 0;
     maxSems = maxSemas;
-    nextSemKey = port * 1000;
+    nextSemKey = statbuf.st_ino;

     on_shmem_exit(ReleaseSemaphores, 0);
 }
diff --git a/src/backend/port/sysv_sema.c b/src/backend/port/sysv_sema.c
index ac5106f..a1652cc 100644
--- a/src/backend/port/sysv_sema.c
+++ b/src/backend/port/sysv_sema.c
@@ -17,6 +17,7 @@
 #include <signal.h>
 #include <unistd.h>
 #include <sys/file.h>
+#include <sys/stat.h>
 #ifdef HAVE_SYS_IPC_H
 #include <sys/ipc.h>
 #endif
@@ -301,10 +302,6 @@ PGSemaphoreShmemSize(int maxSemas)
  * are acquired here or in PGSemaphoreCreate, register an on_shmem_exit
  * callback to release them.
  *
- * The port number is passed for possible use as a key (for SysV, we use
- * it to generate the starting semaphore key).  In a standalone backend,
- * zero will be passed.
- *
  * In the SysV implementation, we acquire semaphore sets on-demand; the
  * maxSemas parameter is just used to size the arrays.  There is an array
  * of PGSemaphoreData structs in shared memory, and a postmaster-local array
@@ -314,8 +311,22 @@ PGSemaphoreShmemSize(int maxSemas)
  * have clobbered.)
  */
 void
-PGReserveSemaphores(int maxSemas, int port)
+PGReserveSemaphores(int maxSemas)
 {
+    struct stat statbuf;
+
+    /*
+     * We use the data directory's inode number to seed the search for free
+     * semaphore keys.  This minimizes the odds of collision with other
+     * postmasters, while maximizing the odds that we will detect and clean up
+     * semaphores left over from a crashed postmaster in our own directory.
+     */
+    if (stat(DataDir, &statbuf) < 0)
+        ereport(FATAL,
+                (errcode_for_file_access(),
+                 errmsg("could not stat data directory \"%s\": %m",
+                        DataDir)));
+
     /*
      * We must use ShmemAllocUnlocked(), since the spinlock protecting
      * ShmemAlloc() won't be ready yet.  (This ordering is necessary when we
@@ -332,7 +343,7 @@ PGReserveSemaphores(int maxSemas, int port)
     if (mySemaSets == NULL)
         elog(PANIC, "out of memory");
     numSemaSets = 0;
-    nextSemaKey = port * 1000;
+    nextSemaKey = statbuf.st_ino;
     nextSemaNumber = SEMAS_PER_SET; /* force sema set alloc on 1st call */

     on_shmem_exit(ReleaseSemaphores, 0);
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 968506d..dab2920 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -390,11 +390,12 @@ PGSharedMemoryAttach(IpcMemoryId shmId,

     /*
      * Try to attach to the segment and see if it matches our data directory.
-     * This avoids key-conflict problems on machines that are running several
-     * postmasters under the same userid and port number.  (That would not
-     * ordinarily happen in production, but it can happen during parallel
-     * testing.  Since our test setups don't open any TCP ports on Unix, such
-     * cases don't conflict otherwise.)
+     * This avoids any risk of duplicate-shmem-key conflicts on machines that
+     * are running several postmasters under the same userid.
+     *
+     * (When we're called from PGSharedMemoryCreate, this stat call is
+     * duplicative; but since this isn't a high-traffic case it's not worth
+     * trying to optimize.)
      */
     if (stat(DataDir, &statbuf) < 0)
         return SHMSTATE_ANALYSIS_FAILURE;    /* can't stat; be conservative */
@@ -617,12 +618,9 @@ AnonymousShmemDetach(int status, Datum arg)
  * we do not fail upon collision with foreign shmem segments.  The idea here
  * is to detect and re-use keys that may have been assigned by a crashed
  * postmaster or backend.
- *
- * The port number is passed for possible use as a key (for SysV, we use
- * it to generate the starting shmem key).
  */
 PGShmemHeader *
-PGSharedMemoryCreate(Size size, int port,
+PGSharedMemoryCreate(Size size,
                      PGShmemHeader **shim)
 {
     IpcMemoryKey NextShmemSegID;
@@ -631,6 +629,17 @@ PGSharedMemoryCreate(Size size, int port,
     struct stat statbuf;
     Size        sysvsize;

+    /*
+     * We use the data directory's ID info (inode and device numbers) to
+     * positively identify shmem segments associated with this data dir, and
+     * also as seeds for searching for a free shmem key.
+     */
+    if (stat(DataDir, &statbuf) < 0)
+        ereport(FATAL,
+                (errcode_for_file_access(),
+                 errmsg("could not stat data directory \"%s\": %m",
+                        DataDir)));
+
     /* Complain if hugepages demanded but we can't possibly support them */
 #if !defined(MAP_HUGETLB)
     if (huge_pages == HUGE_PAGES_ON)
@@ -659,10 +668,10 @@ PGSharedMemoryCreate(Size size, int port,
     /*
      * Loop till we find a free IPC key.  Trust CreateDataDirLockFile() to
      * ensure no more than one postmaster per data directory can enter this
-     * loop simultaneously.  (CreateDataDirLockFile() does not ensure that,
-     * but prefer fixing it over coping here.)
+     * loop simultaneously.  (CreateDataDirLockFile() does not entirely ensure
+     * that, but prefer fixing it over coping here.)
      */
-    NextShmemSegID = 1 + port * 1000;
+    NextShmemSegID = statbuf.st_ino;

     for (;;)
     {
@@ -748,11 +757,6 @@ PGSharedMemoryCreate(Size size, int port,
     hdr->dsm_control = 0;

     /* Fill in the data directory ID info, too */
-    if (stat(DataDir, &statbuf) < 0)
-        ereport(FATAL,
-                (errcode_for_file_access(),
-                 errmsg("could not stat data directory \"%s\": %m",
-                        DataDir)));
     hdr->device = statbuf.st_dev;
     hdr->inode = statbuf.st_ino;

diff --git a/src/backend/port/win32_sema.c b/src/backend/port/win32_sema.c
index 013c122..32cc697 100644
--- a/src/backend/port/win32_sema.c
+++ b/src/backend/port/win32_sema.c
@@ -44,7 +44,7 @@ PGSemaphoreShmemSize(int maxSemas)
  * process exits.
  */
 void
-PGReserveSemaphores(int maxSemas, int port)
+PGReserveSemaphores(int maxSemas)
 {
     mySemSet = (HANDLE *) malloc(maxSemas * sizeof(HANDLE));
     if (mySemSet == NULL)
diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index ccd7b6b..6cb6328 100644
--- a/src/backend/port/win32_shmem.c
+++ b/src/backend/port/win32_shmem.c
@@ -194,7 +194,7 @@ EnableLockPagesPrivilege(int elevel)
  * standard header.
  */
 PGShmemHeader *
-PGSharedMemoryCreate(Size size, int port,
+PGSharedMemoryCreate(Size size,
                      PGShmemHeader **shim)
 {
     void       *memAddress;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 62dc93d..a5446d5 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -385,7 +385,7 @@ static void getInstallationPaths(const char *argv0);
 static void checkControlFile(void);
 static Port *ConnCreate(int serverFd);
 static void ConnFree(Port *port);
-static void reset_shared(int port);
+static void reset_shared(void);
 static void SIGHUP_handler(SIGNAL_ARGS);
 static void pmdie(SIGNAL_ARGS);
 static void reaper(SIGNAL_ARGS);
@@ -1175,7 +1175,7 @@ PostmasterMain(int argc, char *argv[])
     /*
      * Set up shared memory and semaphores.
      */
-    reset_shared(PostPortNumber);
+    reset_shared();

     /*
      * Estimate number of openable files.  This must happen after setting up
@@ -2599,17 +2599,16 @@ InitProcessGlobals(void)
  * reset_shared -- reset shared memory and semaphores
  */
 static void
-reset_shared(int port)
+reset_shared(void)
 {
     /*
      * Create or re-create shared memory and semaphores.
      *
      * Note: in each "cycle of life" we will normally assign the same IPC keys
-     * (if using SysV shmem and/or semas), since the port number is used to
-     * determine IPC keys.  This helps ensure that we will clean up dead IPC
-     * objects if the postmaster crashes and is restarted.
+     * (if using SysV shmem and/or semas).  This helps ensure that we will
+     * clean up dead IPC objects if the postmaster crashes and is restarted.
      */
-    CreateSharedMemoryAndSemaphores(port);
+    CreateSharedMemoryAndSemaphores();
 }


@@ -3934,7 +3933,7 @@ PostmasterStateMachine(void)
         /* re-read control file into local memory */
         LocalProcessControlFile(true);

-        reset_shared(PostPortNumber);
+        reset_shared();

         StartupPID = StartupDataBase();
         Assert(StartupPID != 0);
@@ -4962,7 +4961,7 @@ SubPostmasterMain(int argc, char *argv[])
         InitProcess();

         /* Attach process to shared data structures */
-        CreateSharedMemoryAndSemaphores(0);
+        CreateSharedMemoryAndSemaphores();

         /* And run the backend */
         BackendRun(&port);        /* does not return */
@@ -4976,7 +4975,7 @@ SubPostmasterMain(int argc, char *argv[])
         InitAuxiliaryProcess();

         /* Attach process to shared data structures */
-        CreateSharedMemoryAndSemaphores(0);
+        CreateSharedMemoryAndSemaphores();

         AuxiliaryProcessMain(argc - 2, argv + 2);    /* does not return */
     }
@@ -4989,7 +4988,7 @@ SubPostmasterMain(int argc, char *argv[])
         InitProcess();

         /* Attach process to shared data structures */
-        CreateSharedMemoryAndSemaphores(0);
+        CreateSharedMemoryAndSemaphores();

         AutoVacLauncherMain(argc - 2, argv + 2);    /* does not return */
     }
@@ -5002,7 +5001,7 @@ SubPostmasterMain(int argc, char *argv[])
         InitProcess();

         /* Attach process to shared data structures */
-        CreateSharedMemoryAndSemaphores(0);
+        CreateSharedMemoryAndSemaphores();

         AutoVacWorkerMain(argc - 2, argv + 2);    /* does not return */
     }
@@ -5020,7 +5019,7 @@ SubPostmasterMain(int argc, char *argv[])
         InitProcess();

         /* Attach process to shared data structures */
-        CreateSharedMemoryAndSemaphores(0);
+        CreateSharedMemoryAndSemaphores();

         /* Fetch MyBgworkerEntry from shared memory */
         shmem_slot = atoi(argv[1] + 15);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index d7d7335..8853706 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -91,7 +91,7 @@ RequestAddinShmemSpace(Size size)
  * This is a bit code-wasteful and could be cleaned up.)
  */
 void
-CreateSharedMemoryAndSemaphores(int port)
+CreateSharedMemoryAndSemaphores(void)
 {
     PGShmemHeader *shim = NULL;

@@ -163,14 +163,14 @@ CreateSharedMemoryAndSemaphores(int port)
         /*
          * Create the shmem segment
          */
-        seghdr = PGSharedMemoryCreate(size, port, &shim);
+        seghdr = PGSharedMemoryCreate(size, &shim);

         InitShmemAccess(seghdr);

         /*
          * Create semaphores
          */
-        PGReserveSemaphores(numSemas, port);
+        PGReserveSemaphores(numSemas);

         /*
          * If spinlocks are disabled, initialize emulation layer (which
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 43b9f17..29c5ec7 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -445,12 +445,10 @@ InitCommunication(void)
     if (!IsUnderPostmaster)        /* postmaster already did this */
     {
         /*
-         * We're running a postgres bootstrap process or a standalone backend.
-         * Though we won't listen on PostPortNumber, use it to select a shmem
-         * key.  This increases the chance of detecting a leftover live
-         * backend of this DataDir.
+         * We're running a postgres bootstrap process or a standalone backend,
+         * so we need to set up shmem.
          */
-        CreateSharedMemoryAndSemaphores(PostPortNumber);
+        CreateSharedMemoryAndSemaphores();
     }
 }

diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h
index e9b243f..d113813 100644
--- a/src/include/storage/ipc.h
+++ b/src/include/storage/ipc.h
@@ -76,6 +76,6 @@ extern void on_exit_reset(void);
 /* ipci.c */
 extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook;

-extern void CreateSharedMemoryAndSemaphores(int port);
+extern void CreateSharedMemoryAndSemaphores(void);

 #endif                            /* IPC_H */
diff --git a/src/include/storage/pg_sema.h b/src/include/storage/pg_sema.h
index b95efa1..1ac4f9a 100644
--- a/src/include/storage/pg_sema.h
+++ b/src/include/storage/pg_sema.h
@@ -41,7 +41,7 @@ typedef HANDLE PGSemaphore;
 extern Size PGSemaphoreShmemSize(int maxSemas);

 /* Module initialization (called during postmaster start or shmem reinit) */
-extern void PGReserveSemaphores(int maxSemas, int port);
+extern void PGReserveSemaphores(int maxSemas);

 /* Allocate a PGSemaphore structure with initial count 1 */
 extern PGSemaphore PGSemaphoreCreate(void);
diff --git a/src/include/storage/pg_shmem.h b/src/include/storage/pg_shmem.h
index ac24878..6bd5664 100644
--- a/src/include/storage/pg_shmem.h
+++ b/src/include/storage/pg_shmem.h
@@ -82,7 +82,7 @@ extern void PGSharedMemoryReAttach(void);
 extern void PGSharedMemoryNoReAttach(void);
 #endif

-extern PGShmemHeader *PGSharedMemoryCreate(Size size, int port,
+extern PGShmemHeader *PGSharedMemoryCreate(Size size,
                                            PGShmemHeader **shim);
 extern bool PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2);
 extern void PGSharedMemoryDetach(void);
diff --git a/src/test/recovery/t/017_shm.pl b/src/test/recovery/t/017_shm.pl
index 7f10ff5..be35f90 100644
--- a/src/test/recovery/t/017_shm.pl
+++ b/src/test/recovery/t/017_shm.pl
@@ -4,16 +4,18 @@
 use strict;
 use warnings;
 use Config;
+use File::stat qw(stat);
 use IPC::Run 'run';
+use IPC::SharedMem;
+use IPC::SysV qw(IPC_CREAT S_IRUSR S_IWUSR);
 use PostgresNode;
 use Test::More;
 use TestLib;
 use Time::HiRes qw(usleep);

-plan tests => 5;
+plan tests => 4;

 my $tempdir = TestLib::tempdir;
-my $port;

 # Log "ipcs" diffs on a best-effort basis, swallowing any error.
 my $ipcs_before = "$tempdir/ipcs_before";
@@ -25,71 +27,40 @@ sub log_ipcs
     return;
 }

-# These tests need a $port such that nothing creates or removes a segment in
-# $port's IpcMemoryKey range while this test script runs.  While there's no
-# way to ensure that in general, we do ensure that if PostgreSQL tests are the
-# only actors.  With TCP, the first get_new_node picks a port number.  With
-# Unix sockets, use a postmaster, $port_holder, to represent a key space
-# reservation.  $port_holder holds a reservation on the key space of port
-# 1+$port_holder->port if it created the first IpcMemoryKey of its own port's
-# key space.  If multiple copies of this test script run concurrently, they
-# will pick different ports.  $port_holder postmasters use odd-numbered ports,
-# and tests use even-numbered ports.  In the absence of collisions from other
-# shmget() activity, gnat starts with key 0x7d001 (512001), and flea starts
-# with key 0x7d002 (512002).
-my $port_holder;
-if (!$PostgresNode::use_tcp)
-{
-    my $lock_port;
-    for ($lock_port = 511; $lock_port < 711; $lock_port += 2)
-    {
-        $port_holder = PostgresNode->get_new_node(
-            "port${lock_port}_holder",
-            port     => $lock_port,
-            own_host => 1);
-        $port_holder->init;
-        $port_holder->append_conf('postgresql.conf', 'max_connections = 5');
-        $port_holder->start;
-        # Match the AddToDataDirLockFile() call in sysv_shmem.c.  Assume all
-        # systems not using sysv_shmem.c do use TCP.
-        my $shmem_key_line_prefix = sprintf("%9lu ", 1 + $lock_port * 1000);
-        last
-          if slurp_file($port_holder->data_dir . '/postmaster.pid') =~
-          /^$shmem_key_line_prefix/m;
-        $port_holder->stop;
-    }
-    $port = $lock_port + 1;
-}
-
 # Node setup.
-sub init_start
-{
-    my $name = shift;
-    my $ret = PostgresNode->get_new_node($name, port => $port, own_host => 1);
-    defined($port) or $port = $ret->port;    # same port for all nodes
-    $ret->init;
-    # Limit semaphore consumption, since we run several nodes concurrently.
-    $ret->append_conf('postgresql.conf', 'max_connections = 5');
-    $ret->start;
-    log_ipcs();
-    return $ret;
-}
-my $gnat = init_start 'gnat';
-my $flea = init_start 'flea';
+my $gnat = PostgresNode->get_new_node('gnat');
+$gnat->init;
+
+# Create a shmem segment that will conflict with gnat's first choice
+# of shmem key.
+my $gnat_dir_stat = stat($gnat->data_dir);
+defined($gnat_dir_stat) or die('unable to stat ' . $gnat->data_dir);
+my $gnat_inode = $gnat_dir_stat->ino;
+note "gnat's datadir inode = $gnat_inode";
+
+my $gnat_conflict_shm =
+  IPC::SharedMem->new($gnat_inode, 1024, IPC_CREAT | S_IRUSR | S_IWUSR);
+
+$gnat->start;
+log_ipcs();
+
+$gnat->restart;    # should keep same shmem key
+log_ipcs();

 # Upon postmaster death, postmaster children exit automatically.
 $gnat->kill9;
 log_ipcs();
-$flea->restart;       # flea ignores the shm key gnat abandoned.
-log_ipcs();
 poll_start($gnat);    # gnat recycles its former shm key.
 log_ipcs();

-# After clean shutdown, the nodes swap shm keys.
-$gnat->stop;
-$flea->restart;
+note "removing the conflicting shmem ...";
+$gnat_conflict_shm->remove;
 log_ipcs();
-$gnat->start;
+
+# Upon postmaster death, postmaster children exit automatically.
+$gnat->kill9;
+log_ipcs();
+poll_start($gnat);    # gnat will now use its normal shmem key.
 log_ipcs();

 # Scenarios involving no postmaster.pid, dead postmaster, and a live backend.
@@ -152,24 +123,18 @@ print STDERR $single_stderr;
 like($single_stderr, $pre_existing_msg,
     'single-user mode detected live backend via shared memory');
 log_ipcs();
-# Fail to reject startup if shm key N has become available and we crash while
-# using key N+1.  This is unwanted, but expected.  Windows is immune, because
-# its GetSharedMemName() use DataDir strings, not numeric keys.
-$flea->stop;    # release first key
-is( $gnat->start(fail_ok => 1),
-    $TestLib::windows_os ? 0 : 1,
-    'key turnover fools only sysv_shmem.c');
-$gnat->stop;     # release first key (no-op on $TestLib::windows_os)
-$flea->start;    # grab first key
-# cleanup
+
+# cleanup slow backend
 TestLib::system_log('pg_ctl', 'kill', 'QUIT', $slow_pid);
 $slow_client->finish;    # client has detected backend termination
 log_ipcs();
-poll_start($gnat);       # recycle second key

+# now startup should work
+poll_start($gnat);
+log_ipcs();
+
+# finish testing
 $gnat->stop;
-$flea->stop;
-$port_holder->stop if $port_holder;
 log_ipcs();



Re: Unexpected "shared memory block is still in use"

От
Tom Lane
Дата:
I wrote:
> This still isn't committable as-is, since the test will just curl up
> and die on machines lacking IPC::SharedMem.

After a bit of research, here's a version that takes a stab at fixing
that.  There may be cleaner ways to do it, but this successfully skips
the test if it can't import the needed IPC modules.

This also fixes a problem that the previous script had with leaking
a shmem segment.  That's due to something that could be considered
a pre-existing bug, which is that if we use shmem key X+1, and the
postmaster crashes, and the next start is able to get shmem key X,
we don't clean up the shmem segment at X+1.  In principle, we could
note from the contents of postmaster.pid that X+1 was used before and
try to remove it.  In practice, I doubt this is worth worrying about
given how small the shmem segments are now, and the very low probability
of key collisions in the new regime.  Anyway it would be material for
a different patch.

I think this could be considered committable, but if anyone wants to
improve the test script, step right up.

            regards, tom lane

diff --git a/src/backend/port/posix_sema.c b/src/backend/port/posix_sema.c
index 3370adf..bdd6552 100644
--- a/src/backend/port/posix_sema.c
+++ b/src/backend/port/posix_sema.c
@@ -29,6 +29,7 @@
 #include <semaphore.h>
 #include <signal.h>
 #include <unistd.h>
+#include <sys/stat.h>

 #include "miscadmin.h"
 #include "storage/ipc.h"
@@ -181,10 +182,6 @@ PGSemaphoreShmemSize(int maxSemas)
  * are acquired here or in PGSemaphoreCreate, register an on_shmem_exit
  * callback to release them.
  *
- * The port number is passed for possible use as a key (for Posix, we use
- * it to generate the starting semaphore name).  In a standalone backend,
- * zero will be passed.
- *
  * In the Posix implementation, we acquire semaphores on-demand; the
  * maxSemas parameter is just used to size the arrays.  For unnamed
  * semaphores, there is an array of PGSemaphoreData structs in shared memory.
@@ -196,8 +193,22 @@ PGSemaphoreShmemSize(int maxSemas)
  * we don't have to expose the counters to other processes.)
  */
 void
-PGReserveSemaphores(int maxSemas, int port)
+PGReserveSemaphores(int maxSemas)
 {
+    struct stat statbuf;
+
+    /*
+     * We use the data directory's inode number to seed the search for free
+     * semaphore keys.  This minimizes the odds of collision with other
+     * postmasters, while maximizing the odds that we will detect and clean up
+     * semaphores left over from a crashed postmaster in our own directory.
+     */
+    if (stat(DataDir, &statbuf) < 0)
+        ereport(FATAL,
+                (errcode_for_file_access(),
+                 errmsg("could not stat data directory \"%s\": %m",
+                        DataDir)));
+
 #ifdef USE_NAMED_POSIX_SEMAPHORES
     mySemPointers = (sem_t **) malloc(maxSemas * sizeof(sem_t *));
     if (mySemPointers == NULL)
@@ -215,7 +226,7 @@ PGReserveSemaphores(int maxSemas, int port)

     numSems = 0;
     maxSems = maxSemas;
-    nextSemKey = port * 1000;
+    nextSemKey = statbuf.st_ino;

     on_shmem_exit(ReleaseSemaphores, 0);
 }
diff --git a/src/backend/port/sysv_sema.c b/src/backend/port/sysv_sema.c
index ac5106f..a1652cc 100644
--- a/src/backend/port/sysv_sema.c
+++ b/src/backend/port/sysv_sema.c
@@ -17,6 +17,7 @@
 #include <signal.h>
 #include <unistd.h>
 #include <sys/file.h>
+#include <sys/stat.h>
 #ifdef HAVE_SYS_IPC_H
 #include <sys/ipc.h>
 #endif
@@ -301,10 +302,6 @@ PGSemaphoreShmemSize(int maxSemas)
  * are acquired here or in PGSemaphoreCreate, register an on_shmem_exit
  * callback to release them.
  *
- * The port number is passed for possible use as a key (for SysV, we use
- * it to generate the starting semaphore key).  In a standalone backend,
- * zero will be passed.
- *
  * In the SysV implementation, we acquire semaphore sets on-demand; the
  * maxSemas parameter is just used to size the arrays.  There is an array
  * of PGSemaphoreData structs in shared memory, and a postmaster-local array
@@ -314,8 +311,22 @@ PGSemaphoreShmemSize(int maxSemas)
  * have clobbered.)
  */
 void
-PGReserveSemaphores(int maxSemas, int port)
+PGReserveSemaphores(int maxSemas)
 {
+    struct stat statbuf;
+
+    /*
+     * We use the data directory's inode number to seed the search for free
+     * semaphore keys.  This minimizes the odds of collision with other
+     * postmasters, while maximizing the odds that we will detect and clean up
+     * semaphores left over from a crashed postmaster in our own directory.
+     */
+    if (stat(DataDir, &statbuf) < 0)
+        ereport(FATAL,
+                (errcode_for_file_access(),
+                 errmsg("could not stat data directory \"%s\": %m",
+                        DataDir)));
+
     /*
      * We must use ShmemAllocUnlocked(), since the spinlock protecting
      * ShmemAlloc() won't be ready yet.  (This ordering is necessary when we
@@ -332,7 +343,7 @@ PGReserveSemaphores(int maxSemas, int port)
     if (mySemaSets == NULL)
         elog(PANIC, "out of memory");
     numSemaSets = 0;
-    nextSemaKey = port * 1000;
+    nextSemaKey = statbuf.st_ino;
     nextSemaNumber = SEMAS_PER_SET; /* force sema set alloc on 1st call */

     on_shmem_exit(ReleaseSemaphores, 0);
diff --git a/src/backend/port/sysv_shmem.c b/src/backend/port/sysv_shmem.c
index 968506d..dab2920 100644
--- a/src/backend/port/sysv_shmem.c
+++ b/src/backend/port/sysv_shmem.c
@@ -390,11 +390,12 @@ PGSharedMemoryAttach(IpcMemoryId shmId,

     /*
      * Try to attach to the segment and see if it matches our data directory.
-     * This avoids key-conflict problems on machines that are running several
-     * postmasters under the same userid and port number.  (That would not
-     * ordinarily happen in production, but it can happen during parallel
-     * testing.  Since our test setups don't open any TCP ports on Unix, such
-     * cases don't conflict otherwise.)
+     * This avoids any risk of duplicate-shmem-key conflicts on machines that
+     * are running several postmasters under the same userid.
+     *
+     * (When we're called from PGSharedMemoryCreate, this stat call is
+     * duplicative; but since this isn't a high-traffic case it's not worth
+     * trying to optimize.)
      */
     if (stat(DataDir, &statbuf) < 0)
         return SHMSTATE_ANALYSIS_FAILURE;    /* can't stat; be conservative */
@@ -617,12 +618,9 @@ AnonymousShmemDetach(int status, Datum arg)
  * we do not fail upon collision with foreign shmem segments.  The idea here
  * is to detect and re-use keys that may have been assigned by a crashed
  * postmaster or backend.
- *
- * The port number is passed for possible use as a key (for SysV, we use
- * it to generate the starting shmem key).
  */
 PGShmemHeader *
-PGSharedMemoryCreate(Size size, int port,
+PGSharedMemoryCreate(Size size,
                      PGShmemHeader **shim)
 {
     IpcMemoryKey NextShmemSegID;
@@ -631,6 +629,17 @@ PGSharedMemoryCreate(Size size, int port,
     struct stat statbuf;
     Size        sysvsize;

+    /*
+     * We use the data directory's ID info (inode and device numbers) to
+     * positively identify shmem segments associated with this data dir, and
+     * also as seeds for searching for a free shmem key.
+     */
+    if (stat(DataDir, &statbuf) < 0)
+        ereport(FATAL,
+                (errcode_for_file_access(),
+                 errmsg("could not stat data directory \"%s\": %m",
+                        DataDir)));
+
     /* Complain if hugepages demanded but we can't possibly support them */
 #if !defined(MAP_HUGETLB)
     if (huge_pages == HUGE_PAGES_ON)
@@ -659,10 +668,10 @@ PGSharedMemoryCreate(Size size, int port,
     /*
      * Loop till we find a free IPC key.  Trust CreateDataDirLockFile() to
      * ensure no more than one postmaster per data directory can enter this
-     * loop simultaneously.  (CreateDataDirLockFile() does not ensure that,
-     * but prefer fixing it over coping here.)
+     * loop simultaneously.  (CreateDataDirLockFile() does not entirely ensure
+     * that, but prefer fixing it over coping here.)
      */
-    NextShmemSegID = 1 + port * 1000;
+    NextShmemSegID = statbuf.st_ino;

     for (;;)
     {
@@ -748,11 +757,6 @@ PGSharedMemoryCreate(Size size, int port,
     hdr->dsm_control = 0;

     /* Fill in the data directory ID info, too */
-    if (stat(DataDir, &statbuf) < 0)
-        ereport(FATAL,
-                (errcode_for_file_access(),
-                 errmsg("could not stat data directory \"%s\": %m",
-                        DataDir)));
     hdr->device = statbuf.st_dev;
     hdr->inode = statbuf.st_ino;

diff --git a/src/backend/port/win32_sema.c b/src/backend/port/win32_sema.c
index 013c122..32cc697 100644
--- a/src/backend/port/win32_sema.c
+++ b/src/backend/port/win32_sema.c
@@ -44,7 +44,7 @@ PGSemaphoreShmemSize(int maxSemas)
  * process exits.
  */
 void
-PGReserveSemaphores(int maxSemas, int port)
+PGReserveSemaphores(int maxSemas)
 {
     mySemSet = (HANDLE *) malloc(maxSemas * sizeof(HANDLE));
     if (mySemSet == NULL)
diff --git a/src/backend/port/win32_shmem.c b/src/backend/port/win32_shmem.c
index ccd7b6b..6cb6328 100644
--- a/src/backend/port/win32_shmem.c
+++ b/src/backend/port/win32_shmem.c
@@ -194,7 +194,7 @@ EnableLockPagesPrivilege(int elevel)
  * standard header.
  */
 PGShmemHeader *
-PGSharedMemoryCreate(Size size, int port,
+PGSharedMemoryCreate(Size size,
                      PGShmemHeader **shim)
 {
     void       *memAddress;
diff --git a/src/backend/postmaster/postmaster.c b/src/backend/postmaster/postmaster.c
index 62dc93d..a5446d5 100644
--- a/src/backend/postmaster/postmaster.c
+++ b/src/backend/postmaster/postmaster.c
@@ -385,7 +385,7 @@ static void getInstallationPaths(const char *argv0);
 static void checkControlFile(void);
 static Port *ConnCreate(int serverFd);
 static void ConnFree(Port *port);
-static void reset_shared(int port);
+static void reset_shared(void);
 static void SIGHUP_handler(SIGNAL_ARGS);
 static void pmdie(SIGNAL_ARGS);
 static void reaper(SIGNAL_ARGS);
@@ -1175,7 +1175,7 @@ PostmasterMain(int argc, char *argv[])
     /*
      * Set up shared memory and semaphores.
      */
-    reset_shared(PostPortNumber);
+    reset_shared();

     /*
      * Estimate number of openable files.  This must happen after setting up
@@ -2599,17 +2599,16 @@ InitProcessGlobals(void)
  * reset_shared -- reset shared memory and semaphores
  */
 static void
-reset_shared(int port)
+reset_shared(void)
 {
     /*
      * Create or re-create shared memory and semaphores.
      *
      * Note: in each "cycle of life" we will normally assign the same IPC keys
-     * (if using SysV shmem and/or semas), since the port number is used to
-     * determine IPC keys.  This helps ensure that we will clean up dead IPC
-     * objects if the postmaster crashes and is restarted.
+     * (if using SysV shmem and/or semas).  This helps ensure that we will
+     * clean up dead IPC objects if the postmaster crashes and is restarted.
      */
-    CreateSharedMemoryAndSemaphores(port);
+    CreateSharedMemoryAndSemaphores();
 }


@@ -3934,7 +3933,7 @@ PostmasterStateMachine(void)
         /* re-read control file into local memory */
         LocalProcessControlFile(true);

-        reset_shared(PostPortNumber);
+        reset_shared();

         StartupPID = StartupDataBase();
         Assert(StartupPID != 0);
@@ -4962,7 +4961,7 @@ SubPostmasterMain(int argc, char *argv[])
         InitProcess();

         /* Attach process to shared data structures */
-        CreateSharedMemoryAndSemaphores(0);
+        CreateSharedMemoryAndSemaphores();

         /* And run the backend */
         BackendRun(&port);        /* does not return */
@@ -4976,7 +4975,7 @@ SubPostmasterMain(int argc, char *argv[])
         InitAuxiliaryProcess();

         /* Attach process to shared data structures */
-        CreateSharedMemoryAndSemaphores(0);
+        CreateSharedMemoryAndSemaphores();

         AuxiliaryProcessMain(argc - 2, argv + 2);    /* does not return */
     }
@@ -4989,7 +4988,7 @@ SubPostmasterMain(int argc, char *argv[])
         InitProcess();

         /* Attach process to shared data structures */
-        CreateSharedMemoryAndSemaphores(0);
+        CreateSharedMemoryAndSemaphores();

         AutoVacLauncherMain(argc - 2, argv + 2);    /* does not return */
     }
@@ -5002,7 +5001,7 @@ SubPostmasterMain(int argc, char *argv[])
         InitProcess();

         /* Attach process to shared data structures */
-        CreateSharedMemoryAndSemaphores(0);
+        CreateSharedMemoryAndSemaphores();

         AutoVacWorkerMain(argc - 2, argv + 2);    /* does not return */
     }
@@ -5020,7 +5019,7 @@ SubPostmasterMain(int argc, char *argv[])
         InitProcess();

         /* Attach process to shared data structures */
-        CreateSharedMemoryAndSemaphores(0);
+        CreateSharedMemoryAndSemaphores();

         /* Fetch MyBgworkerEntry from shared memory */
         shmem_slot = atoi(argv[1] + 15);
diff --git a/src/backend/storage/ipc/ipci.c b/src/backend/storage/ipc/ipci.c
index d7d7335..8853706 100644
--- a/src/backend/storage/ipc/ipci.c
+++ b/src/backend/storage/ipc/ipci.c
@@ -91,7 +91,7 @@ RequestAddinShmemSpace(Size size)
  * This is a bit code-wasteful and could be cleaned up.)
  */
 void
-CreateSharedMemoryAndSemaphores(int port)
+CreateSharedMemoryAndSemaphores(void)
 {
     PGShmemHeader *shim = NULL;

@@ -163,14 +163,14 @@ CreateSharedMemoryAndSemaphores(int port)
         /*
          * Create the shmem segment
          */
-        seghdr = PGSharedMemoryCreate(size, port, &shim);
+        seghdr = PGSharedMemoryCreate(size, &shim);

         InitShmemAccess(seghdr);

         /*
          * Create semaphores
          */
-        PGReserveSemaphores(numSemas, port);
+        PGReserveSemaphores(numSemas);

         /*
          * If spinlocks are disabled, initialize emulation layer (which
diff --git a/src/backend/utils/init/postinit.c b/src/backend/utils/init/postinit.c
index 43b9f17..29c5ec7 100644
--- a/src/backend/utils/init/postinit.c
+++ b/src/backend/utils/init/postinit.c
@@ -445,12 +445,10 @@ InitCommunication(void)
     if (!IsUnderPostmaster)        /* postmaster already did this */
     {
         /*
-         * We're running a postgres bootstrap process or a standalone backend.
-         * Though we won't listen on PostPortNumber, use it to select a shmem
-         * key.  This increases the chance of detecting a leftover live
-         * backend of this DataDir.
+         * We're running a postgres bootstrap process or a standalone backend,
+         * so we need to set up shmem.
          */
-        CreateSharedMemoryAndSemaphores(PostPortNumber);
+        CreateSharedMemoryAndSemaphores();
     }
 }

diff --git a/src/include/storage/ipc.h b/src/include/storage/ipc.h
index e9b243f..d113813 100644
--- a/src/include/storage/ipc.h
+++ b/src/include/storage/ipc.h
@@ -76,6 +76,6 @@ extern void on_exit_reset(void);
 /* ipci.c */
 extern PGDLLIMPORT shmem_startup_hook_type shmem_startup_hook;

-extern void CreateSharedMemoryAndSemaphores(int port);
+extern void CreateSharedMemoryAndSemaphores(void);

 #endif                            /* IPC_H */
diff --git a/src/include/storage/pg_sema.h b/src/include/storage/pg_sema.h
index b95efa1..1ac4f9a 100644
--- a/src/include/storage/pg_sema.h
+++ b/src/include/storage/pg_sema.h
@@ -41,7 +41,7 @@ typedef HANDLE PGSemaphore;
 extern Size PGSemaphoreShmemSize(int maxSemas);

 /* Module initialization (called during postmaster start or shmem reinit) */
-extern void PGReserveSemaphores(int maxSemas, int port);
+extern void PGReserveSemaphores(int maxSemas);

 /* Allocate a PGSemaphore structure with initial count 1 */
 extern PGSemaphore PGSemaphoreCreate(void);
diff --git a/src/include/storage/pg_shmem.h b/src/include/storage/pg_shmem.h
index ac24878..6bd5664 100644
--- a/src/include/storage/pg_shmem.h
+++ b/src/include/storage/pg_shmem.h
@@ -82,7 +82,7 @@ extern void PGSharedMemoryReAttach(void);
 extern void PGSharedMemoryNoReAttach(void);
 #endif

-extern PGShmemHeader *PGSharedMemoryCreate(Size size, int port,
+extern PGShmemHeader *PGSharedMemoryCreate(Size size,
                                            PGShmemHeader **shim);
 extern bool PGSharedMemoryIsInUse(unsigned long id1, unsigned long id2);
 extern void PGSharedMemoryDetach(void);
diff --git a/src/test/recovery/t/017_shm.pl b/src/test/recovery/t/017_shm.pl
index 7f10ff5..a29ef78 100644
--- a/src/test/recovery/t/017_shm.pl
+++ b/src/test/recovery/t/017_shm.pl
@@ -4,16 +4,30 @@
 use strict;
 use warnings;
 use Config;
+use File::stat qw(stat);
 use IPC::Run 'run';
 use PostgresNode;
 use Test::More;
 use TestLib;
 use Time::HiRes qw(usleep);

-plan tests => 5;
+# If we don't have shmem support, skip the whole thing
+eval {
+    require IPC::SharedMem;
+    IPC::SharedMem->import;
+    require IPC::SysV;
+    IPC::SysV->import(qw(IPC_CREAT IPC_EXCL S_IRUSR S_IWUSR));
+};
+if ($@)
+{
+    plan skip_all => 'SysV shared memory not supported by this platform';
+}
+else
+{
+    plan tests => 4;
+}

 my $tempdir = TestLib::tempdir;
-my $port;

 # Log "ipcs" diffs on a best-effort basis, swallowing any error.
 my $ipcs_before = "$tempdir/ipcs_before";
@@ -25,76 +39,80 @@ sub log_ipcs
     return;
 }

-# These tests need a $port such that nothing creates or removes a segment in
-# $port's IpcMemoryKey range while this test script runs.  While there's no
-# way to ensure that in general, we do ensure that if PostgreSQL tests are the
-# only actors.  With TCP, the first get_new_node picks a port number.  With
-# Unix sockets, use a postmaster, $port_holder, to represent a key space
-# reservation.  $port_holder holds a reservation on the key space of port
-# 1+$port_holder->port if it created the first IpcMemoryKey of its own port's
-# key space.  If multiple copies of this test script run concurrently, they
-# will pick different ports.  $port_holder postmasters use odd-numbered ports,
-# and tests use even-numbered ports.  In the absence of collisions from other
-# shmget() activity, gnat starts with key 0x7d001 (512001), and flea starts
-# with key 0x7d002 (512002).
-my $port_holder;
-if (!$PostgresNode::use_tcp)
-{
-    my $lock_port;
-    for ($lock_port = 511; $lock_port < 711; $lock_port += 2)
-    {
-        $port_holder = PostgresNode->get_new_node(
-            "port${lock_port}_holder",
-            port     => $lock_port,
-            own_host => 1);
-        $port_holder->init;
-        $port_holder->append_conf('postgresql.conf', 'max_connections = 5');
-        $port_holder->start;
-        # Match the AddToDataDirLockFile() call in sysv_shmem.c.  Assume all
-        # systems not using sysv_shmem.c do use TCP.
-        my $shmem_key_line_prefix = sprintf("%9lu ", 1 + $lock_port * 1000);
-        last
-          if slurp_file($port_holder->data_dir . '/postmaster.pid') =~
-          /^$shmem_key_line_prefix/m;
-        $port_holder->stop;
-    }
-    $port = $lock_port + 1;
-}
-
 # Node setup.
-sub init_start
-{
-    my $name = shift;
-    my $ret = PostgresNode->get_new_node($name, port => $port, own_host => 1);
-    defined($port) or $port = $ret->port;    # same port for all nodes
-    $ret->init;
-    # Limit semaphore consumption, since we run several nodes concurrently.
-    $ret->append_conf('postgresql.conf', 'max_connections = 5');
-    $ret->start;
-    log_ipcs();
-    return $ret;
-}
-my $gnat = init_start 'gnat';
-my $flea = init_start 'flea';
+my $gnat = PostgresNode->get_new_node('gnat');
+$gnat->init;
+
+# Create a shmem segment that will conflict with gnat's first choice
+# of shmem key.  (If we fail to create it because something else is
+# already using that key, that's perfectly fine, though the test will
+# exercise a different scenario than it usually does.)
+my $gnat_dir_stat = stat($gnat->data_dir);
+defined($gnat_dir_stat) or die('unable to stat ' . $gnat->data_dir);
+my $gnat_inode = $gnat_dir_stat->ino;
+note "gnat's datadir inode = $gnat_inode";
+
+# Note: must reference IPC::SysV's constants as functions, or this file
+# fails to compile when that module is not available.
+my $gnat_conflict_shm =
+  IPC::SharedMem->new($gnat_inode, 1024,
+    IPC_CREAT() | IPC_EXCL() | S_IRUSR() | S_IWUSR());
+note "could not create conflicting shmem" if !defined($gnat_conflict_shm);
+log_ipcs();
+
+$gnat->start;
+log_ipcs();
+
+$gnat->restart;    # should keep same shmem key
+log_ipcs();

 # Upon postmaster death, postmaster children exit automatically.
 $gnat->kill9;
 log_ipcs();
-$flea->restart;       # flea ignores the shm key gnat abandoned.
-log_ipcs();
 poll_start($gnat);    # gnat recycles its former shm key.
 log_ipcs();

-# After clean shutdown, the nodes swap shm keys.
+note "removing the conflicting shmem ...";
+$gnat_conflict_shm->remove if $gnat_conflict_shm;
+log_ipcs();
+
+# Upon postmaster death, postmaster children exit automatically.
+$gnat->kill9;
+log_ipcs();
+
+# In this start, gnat will use its normal shmem key, and fail to remove
+# the higher-keyed segment that the previous postmaster was using.
+# That's not great, but key collisions should be rare enough to not
+# make this a big problem.
+poll_start($gnat);
+log_ipcs();
 $gnat->stop;
-$flea->restart;
 log_ipcs();
+
+# Re-create the conflicting segment, and start/stop normally, just so
+# this test script doesn't leak the higher-keyed segment.
+note "re-creating conflicting shmem ...";
+$gnat_conflict_shm =
+  IPC::SharedMem->new($gnat_inode, 1024,
+    IPC_CREAT() | IPC_EXCL() | S_IRUSR() | S_IWUSR());
+note "could not create conflicting shmem" if !defined($gnat_conflict_shm);
+log_ipcs();
+
 $gnat->start;
 log_ipcs();
+$gnat->stop;
+log_ipcs();
+
+note "removing the conflicting shmem ...";
+$gnat_conflict_shm->remove if $gnat_conflict_shm;
+log_ipcs();

 # Scenarios involving no postmaster.pid, dead postmaster, and a live backend.
 # Use a regress.c function to emulate the responsiveness of a backend working
 # through a CPU-intensive task.
+$gnat->start;
+log_ipcs();
+
 my $regress_shlib = TestLib::perl2host($ENV{REGRESS_SHLIB});
 $gnat->safe_psql('postgres', <<EOSQL);
 CREATE FUNCTION wait_pid(int)
@@ -152,24 +170,18 @@ print STDERR $single_stderr;
 like($single_stderr, $pre_existing_msg,
     'single-user mode detected live backend via shared memory');
 log_ipcs();
-# Fail to reject startup if shm key N has become available and we crash while
-# using key N+1.  This is unwanted, but expected.  Windows is immune, because
-# its GetSharedMemName() use DataDir strings, not numeric keys.
-$flea->stop;    # release first key
-is( $gnat->start(fail_ok => 1),
-    $TestLib::windows_os ? 0 : 1,
-    'key turnover fools only sysv_shmem.c');
-$gnat->stop;     # release first key (no-op on $TestLib::windows_os)
-$flea->start;    # grab first key
-# cleanup
+
+# cleanup slow backend
 TestLib::system_log('pg_ctl', 'kill', 'QUIT', $slow_pid);
 $slow_client->finish;    # client has detected backend termination
 log_ipcs();
-poll_start($gnat);       # recycle second key

+# now startup should work
+poll_start($gnat);
+log_ipcs();
+
+# finish testing
 $gnat->stop;
-$flea->stop;
-$port_holder->stop if $port_holder;
 log_ipcs();



Re: Unexpected "shared memory block is still in use"

От
Peter Eisentraut
Дата:
On 2019-09-04 16:59, Tom Lane wrote:
>> A related point, perhaps we should change the key printed into
>> postmaster.pid to be in hexadecimal format ("0x08x") so that it matches
>> what ipcs prints.
> Hmm, that depends on whose ipcs you use :-(.  A quick survey
> of my machines says it's
> 
>                 key     shmid
> 
> Linux:          hex     decimal
> FreeBSD:        decimal decimal
> NetBSD:         decimal decimal
> OpenBSD:        decimal decimal
> macOS:          hex     decimal
> HPUX:           hex     (not printed)
> 
> There's certainly room to argue that hex+decimal is most popular,
> but I'm not sure that that outweighs possible compatibility issues
> from changing postmaster.pid contents.  (Admittedly, it's not real
> clear that anything would be paying attention to the shmem key,
> so maybe there's no compatibility issue.)

Let's just leave it decimal then.  At least then it's easier to compare
it to ls -i output.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services