On Tue, Feb 19, 2019 at 7:31 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference: 15641
> Logged by: Hans Buschmann
> Email address: buschmann@nidsa.net
> PostgreSQL version: 11.2
> Operating system: Windows Server 2019 Standard
> Description:
>
> I recently moved a production system from PG 10.7 to 11.2 on a different
> Server.
>
> The configuration settings were mostly taken from the old system and
> extended with new features of PG 11.
>
> pg_prewarm was used for a long time (with no specific configuration).
>
> Now I have enabled huge page support for Windows in the OS and verified
> with the vmmap tool from Sysinternals that it is active
> (the shared buffers are locked in memory: Lock_WS is set).
>
> When pg_prewarm.autoprewarm is set to on (using the default after initial
> database import via pg_restore), the autoprewarm worker process
> terminates immediately and generates a huge number of logfile entries
> like:
>
> CPS PRD 2019-02-17 16:11:53 CET 00000 11:> LOG: background worker
> "autoprewarm worker" (PID 3996) exited with exit code 1
> CPS PRD 2019-02-17 16:11:53 CET 55000 1:> ERROR: could not map dynamic
> shared memory segment
Hmm. It's not clear to me how using large pages for the main
PostgreSQL shared memory region could have any impact on autoprewarm's
entirely separate DSM segment. I wonder whether other DSM use cases are
affected. Does parallel query work? For example, the following
produces a parallel query that uses a few DSM segments:
create table foo as select generate_series(1, 1000000)::int i;
analyze foo;
explain analyze select count(*) from foo f1 join foo f2 using (i);
Looking at the place where that error occurs, it seems that dsm_attach()
simply failed to find the handle, as if the segment didn't exist at all
at the time it was called. I'm not entirely sure how that could happen
just because you turned on huge pages. Is it possible that there is a
race where apw_load_buffers() manages to detach before the worker
attaches, and huge pages changed the timing? At a glance, that shouldn't
happen, because apw_start_database_worker() waits for the worker to exit
before returning.
I think we'll need one of our Windows-enabled hackers to take a look.
PS Sorry for breaking the thread. I wish our archives app had a
"[re]send me this email" button for people who subscribed after the
message was sent...
--
Thomas Munro
https://enterprisedb.com