Re: BUG #15641: Autoprewarm worker fails to start on Windows with huge pages in use (pgsql-bugs)

On Tue, Feb 19, 2019 at 7:31 AM PG Bug reporting form
<noreply@postgresql.org> wrote:
>
> The following bug has been logged on the website:
>
> Bug reference:      15641
> Logged by:          Hans Buschmann
> Email address:      buschmann@nidsa.net
> PostgreSQL version: 11.2
> Operating system:   Windows Server 2019 Standard
> Description:
>
> I recently moved a production system from PG 10.7 to 11.2 on a different
> Server.
>
> The configuration settings were mostly taken from the old system and
> enhanced by new features of PG 11.
>
> pg_prewarm was used for a long time (with no specific configuration).
>
> Now I have added Huge page support for Windows in the OS and verified it
> with vmmap tool from Sysinternals to be active.
> (the shared buffers are locked in memory: Lock_WS is set).
>
> When pg_prewarm.autoprewarm is set to on (using the default after initial
> database import via pg_restore), the autoprewarm worker process
> terminates immediately and generates a huge number of logfile entries
> like:
>
> CPS PRD 2019-02-17 16:11:53 CET  00000 11:> LOG:  background worker
> "autoprewarm worker" (PID 3996) exited with exit code 1
> CPS PRD 2019-02-17 16:11:53 CET  55000  1:> ERROR:  could not map dynamic
> shared memory segment

Hmm.  It's not clear to me how using large pages for the main
PostgreSQL shared memory region could have any impact on autoprewarm's
entirely separate DSM segment.  I wonder if other DSM use cases are
impacted.  Does parallel query work?  For example, the following
produces a parallel query that uses a few DSM segments:

create table foo as select generate_series(1, 1000000)::int i;
analyze foo;
explain analyze select count(*) from foo f1 join foo f2 using (i);

Looking at the place where that error occurs, it seems like it simply
failed to find the handle, as if it didn't exist at all at the time
dsm_attach() was called.  I'm not entirely sure how that could happen
just because you turned on huge pages.  Is it possible that there is a
race where apw_load_buffers() manages to detach before the worker
attached, and the timing changes?  At a glance, that shouldn't happen
because apw_start_database_worker() waits for the worker to exit before
returning.
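
For reference, the attach side of that is presumably just the usual DSM
pattern in the per-database worker, something like the following sketch
(not a verbatim quote of autoprewarm.c):

void
autoprewarm_database_main(Datum main_arg)
{
	dsm_segment *seg;

	/* ... set up signal handling and connect to the target database ... */

	/* main_arg carries the handle of the segment the coordinator created */
	seg = dsm_attach(DatumGetUInt32(main_arg));
	if (seg == NULL)
		ereport(ERROR,
				(errcode(ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE),
				 errmsg("could not map dynamic shared memory segment")));

	/* ... prewarm the blocks listed in the segment, then dsm_detach(seg) ... */
}

So dsm_attach() returning NULL is what produces that message here, meaning
the segment was already gone (or never visible) from the worker's point of
view.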

I think we'll need one of our Windows-enabled hackers to take a look.

PS Sorry for breaking the thread.  I wish our archives app had a
"[re]send me this email" button, for people who subscribed after the
message was sent...

-- 
Thomas Munro
https://enterprisedb.com



Thank you for taking a look.

I encountered this problem after switching the production system and then found it also on the newly created replica.

I have no knowledge of the shared memory areas involved.

I did some further investigation and tried to reproduce it on the old system (WS2016, PG 11.2), but there it worked fine (without and with huge pages activated!).

Even on a developer machine under WS2019, PG 11.2 the error did not occur (both cases running on different generations of Intel machines, Haswell and Nehalem, under different hypervisors, WS2012R2 and WS2019).

I am really confused at not being able to reproduce the error outside of the production and replica instances...

The error caused a massive flood of the logs (about 800 MB in about 1 day, on SSD).

I'll try to investigate further by configuring a second replica tomorrow, using the configuration of the production system as copied by pg_basebackup.

I looked at the non-default configuration settings but could not identify anything special.

Here is a current list for the production system, which has 4 GB of memory allocated to the VM.
(all values with XXX are a little obfuscated).

Here, to avoid the error, pg_prewarm.autoprewarm is off!

             name              |              current_setting               |
-------------------------------+--------------------------------------------+
 application_name              | psql                                       |
 archive_command               | copy "xxxxxx"                              |
 archive_mode                  | on                                         |
 auto_explain.log_analyze      | off                                        |
 auto_explain.log_min_duration | -1                                         |
 client_encoding               | WIN1252                                    |
 cluster_name                  | XXX_PROD                                   |
 data_checksums                | on                                         |
 DateStyle                     | ISO, DMY                                   |
 default_text_search_config    | pg_catalog.german                          |
 dynamic_shared_memory_type    | windows                                    |
 effective_cache_size          | 8GB                                        |
 lc_collate                    | C                                          |
 lc_ctype                      | German_Germany.1252                        |
 lc_messages                   | C                                          |
 lc_monetary                   | German_Germany.1252                        |
 lc_numeric                    | German_Germany.1252                        |
 lc_time                       | German_Germany.1252                        |
 listen_addresses              | *                                          |
 log_destination               | stderr                                     |
 log_directory                 | <XXX_PATH_TO_LOG>                          |
 log_file_mode                 | 0640                                       |
 log_line_prefix               | XXX PRD %t %i %e %2l:>                     |
 log_statement                 | mod                                        |
 log_temp_files                | 0                                          |
 log_timezone                  | CET                                        |
 logging_collector             | on                                         |
 maintenance_work_mem          | 128MB                                      |
 max_connections               | 200                                        |
 max_stack_depth               | 2MB                                        |
 max_wal_size                  | 1GB                                        |
 min_wal_size                  | 80MB                                       |
 pg_prewarm.autoprewarm        | off                                        |
 pg_stat_statements.max        | 8000                                       |
 pg_stat_statements.track      | all                                        |
 random_page_cost              | 1                                          |
 search_path                   | public, archiv, ablage, admin              |
 server_encoding               | UTF8                                       |
 server_version                | 11.2                                       |
 shared_buffers                | 768MB                                      |
 shared_preload_libraries      | auto_explain,pg_stat_statements,pg_prewarm |
 temp_buffers                  | 32MB                                       |
 TimeZone                      | CET                                        |
 transaction_deferrable        | off                                        |
 transaction_isolation         | read committed                             |
 transaction_read_only         | off                                        |
 update_process_title          | off                                        |
 wal_buffers                   | 16MB                                       |
 wal_compression               | on                                         |
 wal_segment_size              | 16MB                                       |
 work_mem                      | 64MB                                       |
(51 rows)


Thanks

Hans Buschmann

On Thu, Feb 21, 2019 at 4:36 AM Hans Buschmann <buschmann@nidsa.net> wrote:
> I encountered this problem after switching the production system and then found it also on the newly created replica.
>
> I have no knowledge of the shared memory areas involved.
>
> I did some further investigation and tried to reproduce it on the old system (WS2016, PG 11.2), but there it worked fine (without and with huge pages activated!).
>
> Even on a developer machine under WS2019, PG 11.2 the error did not occur (both cases running on different generations of Intel machines, Haswell and Nehalem, under different hypervisors, WS2012R2 and WS2019).
>
> I am really confused to not being able to reproduce the error outside of production and replica instances...
>
> The error caused a massive flood of the logs (about 800 MB in about 1 day, on SSD)
>
> I'll try to investigate further by configuring a second replica tomorrow, using the configuration of the production system as copied by pg_basebackup.

Just to confirm:  on the machines where it happens, does it happen on
every restart, and does it never happen if you set huge_pages = off?

CC'ing the authors of the auto-prewarm feature to see if they have ideas.

There is a known bug (fixed in commit 6c0fb941 for the next release)
that would cause a spurious dsm_attach() failure that would look just
like this (dsm_attach() returns NULL), but that should be very rare
and couldn't cause the behaviour described here, because here the
background worker is repeatedly failing to attach in a loop (hence the
800MB of logs).

-- 
Thomas Munro
https://enterprisedb.com


hello

Since these are production systems, I didn't set huge_pages=off.
(huge pages give performance, autoprewarm is not so necessary)

I think it occurred on every start, but the systems were only started one or two times in this error mode.

In the other cases I tried yesterday, the results were very confusing (the error was not reproducible with or without huge pages).

Hans Buschmann

Hi Thomas, Hans,
On Thu, Feb 21, 2019 at 2:16 PM Hans Buschmann <buschmann@nidsa.net> wrote:
>
> hello
>
> Since these are production systems, I didn't set huge_pages=off.
> (huge pages give performance, autoprewarm is not so necessary)

I turned autoprewarm on with Windows Server 2019 and PostgreSQL 11.2, and it
runs fine even with huge_pages=on (thanks to Neha Sharma). As Thomas
said, the error is coming from the per-database worker, and the main worker
waits till the per-database worker exits, so from code review I do see an
issue of having an invalid handle in the per-database worker. A
reproducible test case will really help. I shall recheck the code
again, but I am not very hopeful without a proper test case.

--
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com


On Thu, Feb 21, 2019 at 6:23 PM Mithun Cy <mithun.cy@gmail.com> wrote:

> said, the error is coming from the per-database worker, and the main worker
> waits till the per-database worker exits, so from code review I do see an
> issue of having an invalid handle in the per-database worker.

Sorry, a typo: I meant I do not see an issue from the code.




On the weekend, I did some more investigations:

It seems that huge pages are NOT the cause of this problem.

The problem is only reproducible ONCE; after a database restart it disappears.

By reinstalling the original pg_basebackup on another test VM, the problem reappeared once.

Here is the start of  the error log:

CPS PRD 2019-02-24 12:11:57 CET  00000  1:> LOG:  database system was interrupted; last known up at 2019-02-17 16:14:05 CET
CPS PRD 2019-02-24 12:12:16 CET  00000  2:> LOG:  entering standby mode
CPS PRD 2019-02-24 12:12:16 CET  00000  3:> LOG:  redo starts at 0/23000028
CPS PRD 2019-02-24 12:12:16 CET  00000  4:> LOG:  consistent recovery state reached at 0/23000168
CPS PRD 2019-02-24 12:12:16 CET  00000  5:> LOG:  invalid record length at 0/24000060: wanted 24, got 0
CPS PRD 2019-02-24 12:12:16 CET  00000  9:> LOG:  database system is ready to accept read only connections
CPS PRD 2019-02-24 12:12:16 CET  3D000  1:> FATAL:  database 16384 does not exist
CPS PRD 2019-02-24 12:12:16 CET  00000 10:> LOG:  background worker "autoprewarm worker" (PID 3968) exited with exit code 1
CPS PRD 2019-02-24 12:12:16 CET  00000  1:> LOG:  autoprewarm successfully prewarmed 0 of 12402 previously-loaded blocks
CPS PRD 2019-02-24 12:12:17 CET  XX000  1:> FATAL:  could not connect to the primary server: FATAL:  no pg_hba.conf entry for replication connection from host "192.168.27.155", user "replicator", SSL off
CPS PRD 2019-02-24 12:12:17 CET  55000  1:> ERROR:  could not map dynamic shared memory segment
CPS PRD 2019-02-24 12:12:17 CET  00000 11:> LOG:  background worker "autoprewarm worker" (PID 3296) exited with exit code 1
CPS PRD 2019-02-24 12:12:17 CET  XX000  1:> FATAL:  could not connect to the primary server: FATAL:  no pg_hba.conf entry for replication connection from host "192.168.27.155", user "replicator", SSL off
CPS PRD 2019-02-24 12:12:17 CET  55000  1:> ERROR:  could not map dynamic shared memory segment
CPS PRD 2019-02-24 12:12:17 CET  00000 12:> LOG:  background worker "autoprewarm worker" (PID 2756) exited with exit code 1
CPS PRD 2019-02-24 12:12:17 CET  55000  1:> ERROR:  could not map dynamic shared memory segment
...
(PS: the correct replication configuration was not yet set, which caused the errors concerning replication)

It seems that an outdated autoprewarm.blocks file causes the problem.

After a restart the autoprewarm.blocks file seems to be rewritten, so that the next start gives no error.

For a test, I copied the erroneous autoprewarm.blocks file over to the data directory and the problem reappeared.


The autoprewarm.blocks file was not corrupted or moved around manually; rather, it is a leftover from the preceding test installation.

On this instance I had installed a copy of the production database under 11.2.
By doing the production switch, I dropped the test database and pg_restored the current one.

This left the previous autoprewarm.blocks file in the data directory.

On the first start, the autoprewarm.blocks file does not match the newly restored database (perhaps the cause of the fatal error: database 16384 does not exist).

So the problem lies in the initial detection of the autoprewarm.blocks file.

This seems easy to reproduce:

- Install/create a database with autoprewarm on and pg_prewarm loaded.
- Fill the autoprewarm cache with some data
- pg_dump the database
- drop the database
- create the database and pg_restore it from the dump
- start the instance and logs are flooded

I have not investigated the source code further due to limited skills so far...


Thanks

Hans Buschmann

Thanks, Hans, for a simple reproducible test.

On Sun, Feb 24, 2019 at 6:54 PM Hans Buschmann <buschmann@nidsa.net> wrote:
> Here is the start of  the error log:
>
> CPS PRD 2019-02-24 12:11:57 CET  00000  1:> LOG:  database system was interrupted; last known up at 2019-02-17 16:14:05 CET
> CPS PRD 2019-02-24 12:12:16 CET  00000  2:> LOG:  entering standby mode
> CPS PRD 2019-02-24 12:12:16 CET  00000  3:> LOG:  redo starts at 0/23000028
> CPS PRD 2019-02-24 12:12:16 CET  00000  4:> LOG:  consistent recovery state reached at 0/23000168
> CPS PRD 2019-02-24 12:12:16 CET  00000  5:> LOG:  invalid record length at 0/24000060: wanted 24, got 0
> CPS PRD 2019-02-24 12:12:16 CET  00000  9:> LOG:  database system is ready to accept read only connections
> CPS PRD 2019-02-24 12:12:16 CET  3D000  1:> FATAL:  database 16384 does not exist
> CPS PRD 2019-02-24 12:12:16 CET  00000 10:> LOG:  background worker "autoprewarm worker" (PID 3968) exited with exit code 1
> CPS PRD 2019-02-24 12:12:16 CET  00000  1:> LOG:  autoprewarm successfully prewarmed 0 of 12402 previously-loaded blocks
> CPS PRD 2019-02-24 12:12:17 CET  XX000  1:> FATAL:  could not connect to the primary server: FATAL:  no pg_hba.conf entry for replication connection from host "192.168.27.155", user "replicator", SSL off
> CPS PRD 2019-02-24 12:12:17 CET  55000  1:> ERROR:  could not map dynamic shared memory segment

As per the log, the autoprewarm master did exit first ("autoprewarm successfully prewarmed 0 of 12402 previously-loaded blocks"); only then did we start getting "could not map dynamic shared memory segment".
That is, the master had already done dsm_detach, and the workers started throwing the error after that.

> This seems easy to reproduce:
>
> - Install/create a database with autoprewarm on and pg_prewarm loaded.
> - Fill the autoprewarm cache with some data
> - pg_dump the database
> - drop the database
> - create the database and pg_restore it from the dump
> - start the instance and logs are flooded
>
> I have taken no further investigation in the sourcecode due to limited skills so far...

I was able to reproduce the same.

The "worker.bgw_restart_time" is never set for autoprewarm workers, so on error the worker gets restarted after some period of time (the default behavior). Since the database itself was dropped, our attempt to connect to that database failed and the worker exited. But it got restarted again by the postmaster, and then we start seeing the DSM segment error above.
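
For readers following along, the coordinator side is roughly the sketch below (a simplification; names like required_size and num_databases are placeholders, not the actual autoprewarm.c variables):

dsm_segment *seg;
int          i;

seg = dsm_create(required_size, 0);
/* ... copy the saved autoprewarm.blocks entries into the segment ... */

for (i = 0; i < num_databases; i++)
{
	/* registers a per-database worker and waits for it to exit */
	apw_start_database_worker();
}

dsm_detach(seg);	/* after this point a restarted worker's dsm_attach() returns NULL */

So by the time the postmaster restarts the failed worker, the segment it was told to attach to no longer exists.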

I think every autoprewarm worker should be set with "worker.bgw_restart_time = BGW_NEVER_RESTART;" so that there will not be repeated prewarm attempts on a dropped database. I will think about this further and submit a patch for the same.
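
As a rough sketch (assuming the fix is just the added bgw_restart_time line; this is not the actual patch), the per-database worker registration would then look something like:

BackgroundWorker worker;
BackgroundWorkerHandle *handle;

memset(&worker, 0, sizeof(worker));
worker.bgw_flags = BGWORKER_SHMEM_ACCESS | BGWORKER_BACKEND_DATABASE_CONNECTION;
worker.bgw_start_time = BgWorkerStart_ConsistentState;
worker.bgw_restart_time = BGW_NEVER_RESTART;	/* proposed: never restart on error */
strcpy(worker.bgw_library_name, "pg_prewarm");
strcpy(worker.bgw_function_name, "autoprewarm_database_main");
snprintf(worker.bgw_name, BGW_MAXLEN, "autoprewarm worker");
worker.bgw_main_arg = UInt32GetDatum(dsm_segment_handle(seg));
worker.bgw_notify_pid = MyProcPid;

if (!RegisterDynamicBackgroundWorker(&worker, &handle))
	ereport(ERROR,
			(errmsg("could not register background process")));

With BGW_NEVER_RESTART the worker for the dropped database fails once and is not resurrected by the postmaster, so the attach-after-detach loop (and the log flood) goes away.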

--
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com


Glad to hear you could reproduce the case easily.

I wanted to add that the problem, as it seems now, shouldn't be restricted to Windows only.

Another thing is the semantic scope of pg_prewarm:

Prewarming affects the whole cluster, so at instance start we can encounter both active and dropped databases.

To not affect the other databases, prewarming should occur on all non-dropped databases and omit only the dropped ones.

Hope your thinking gives a good patch... ;)

Hans Buschmann