Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use

From Thomas Munro
Subject Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use
Msg-id CA+hUKGKpQJCWcgyy3QTC9vdn6uKAR_8r__A-MMm2GYfj45caag@mail.gmail.com
List pgsql-bugs
On Tue, Feb 19, 2019 at 7:31 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      15641
> Logged by:          Hans Buschmann
> Email address:      buschmann@nidsa.net
> PostgreSQL version: 11.2
> Operating system:   Windows Server 2019 Standard
> Description:
>
> I recently moved a production system from PG 10.7 to 11.2 on a different
> Server.
>
> The configuration settings were mostly taken from the old system and
> enhanced by new features of PG 11.
>
> pg_prewarm was used for a long time (with no specific configuration).
>
> Now I have added Huge page support for Windows in the OS and verified it
> with vmmap tool from Sysinternals to be active.
> (the shared buffers are locked in memory: Lock_WS is set).
>
> When pg_prewarm.autoprewarm is set to on (using the default after initial
> database import via pg_restore), the autoprewarm worker process
> terminates immediately and generates a huge number of logfile entries
> like:
>
> CPS PRD 2019-02-17 16:11:53 CET  00000 11:> LOG:  background worker
> "autoprewarm worker" (PID 3996) exited with exit code 1
> CPS PRD 2019-02-17 16:11:53 CET  55000  1:> ERROR:  could not map dynamic
> shared memory segment

Hmm.  It's not clear to me how using large pages for the main
PostgreSQL shared memory region could have any impact on autoprewarm's
entirely separate DSM segment.  I wonder if other DSM use cases are
impacted.  Does parallel query work?  For example, the following
produces a parallel query that uses a few DSM segments:

create table foo as select generate_series(1, 1000000)::int i;
analyze foo;
explain analyze select count(*) from foo f1 join foo f2 using (i);
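
To make that check conclusive, it may help to confirm which shared memory settings are in effect and whether workers were actually launched. This is only a sketch: it assumes the table foo created above, and the SET is there just to rule out parallelism having been disabled in the session.

```sql
-- Settings in effect for this server.
show huge_pages;
show dynamic_shared_memory_type;

-- Assumption: allow up to two parallel workers per Gather node in this session.
set max_parallel_workers_per_gather = 2;

-- If the plan is parallel, compare "Workers Planned" with "Workers Launched"
-- in the output.
explain (analyze, verbose) select count(*) from foo f1 join foo f2 using (i);
```

If workers are launched and the query completes, attaching to DSM segments works in general, which would point toward something specific to autoprewarm.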

Looking at the place where that error occurs, it seems like it simply
failed to find the handle, as if it didn't exist at all at the time
dsm_attach() was called.  I'm not entirely sure how that could happen
just because you turned on huge pages.  Is it possible that there is a
race where apw_load_buffers() manages to detach before the worker
attached, and the timing changes?  At a glance, that shouldn't happen
because apw_start_database_worker() waits for the worker to exit before
returning.

I think we'll need one of our Windows-enabled hackers to take a look.

PS Sorry for breaking the thread.  I wish our archives app had a
"[re]send me this email" button, for people who subscribed after the
message was sent...

-- 
Thomas Munro
https://enterprisedb.com

