[GENERAL] Out of memory/corrupted shared memory problem on server

Поиск
Список
Период
Сортировка
От Johann Spies
Тема [GENERAL] Out of memory/corrupted shared memory problem on server
Дата
Msg-id CAGZ55DR67Ejvef8GkLjFp8vqzfnhngfm2jzoerZ3WbCjxoLeWw@mail.gmail.com
обсуждение исходный текст
Ответы Re: [GENERAL] Out of memory/corrupted shared memory problem on server
Re: [GENERAL] Out of memory/corrupted shared memory problem on server
Список pgsql-general
While restoring a dump from our development server (768G ram) to the
production server, (PG 9.6.3 on Debian Stretch with 128G Ram) the
refreshing of a Materialized View fails like this:

local] js@wos=# REFRESH MATERIALIZED VIEW wos_2017_1.citation_window_mv ;
server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
Time: 11556111.977 ms

In the log:
------------------
2017-08-24 19:23:26 SAST [7532-18] LOG:  server process (PID 4890) was
terminated by signal 9: Killed
2017-08-24 19:23:26 SAST [7532-19] DETAIL:  Failed process was
running: REFRESH MATERIALIZED VIEW wos_2017_1.citation_window_mv ;
2017-08-24 19:23:26 SAST [7532-20] LOG:  terminating any other active
server processes
2017-08-24 19:23:26 SAST [16376-1] crest@data_portal WARNING:
terminating connection because of crash of another server process
2017-08-24 19:23:26 SAST [16376-2] crest@data_portal DETAIL:  The
postmaster has commanded this server process to roll back the current
transaction and exit, because anothe\
r server process exited abnormally and possibly corrupted shared memory.
2017-08-24 19:23:26 SAST [16376-3] crest@data_portal HINT:  In a
moment you should be able to reconnect to the database and repeat your
command.
--------------------

This show a cross-database  problem.  The Mateiralized Vew is in
database wos while the other related problem seems to be in database
data_portal.  We could not determine what caused the problem in
database_portal.  Or was it caused by the out-of-memory problem in the
wos-process?

The Materialized View  uses a complex query and  should contain 69 772
381 records.

Monitoring the memory usage while running the refresh materialized
view command show  a steady increase by the process until reaches 100%
and breaks.

The server has 128G Ram with the following changes to the default
setup (and you can see how we tried to solve the problem by opting for
lower thresholds in many cases):

# http://edoceo.com/howto/postgresql-performance
# https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

# pgtune wiz ard 21.01.2016:
max_connections = 80
#shared_buffers = 32GB
shared_buffers = 14GB
#effective_cache_size = 96GB
effective_cache_size = 20GB
#work_mem = 4GB
work_mem = 2GB
maintenance_work_mem = 2GB
checkpoint_completion_target = 0.9
wal_buffers = 16MB
default_statistics_target = 100
temp_buffers = 1GB
fsync = true
max_worker_processes = 24        # (change requires restart)
max_parallel_workers_per_gather = 4    # taken from max_worker_processes
checkpoint_flush_after = 256kB
idle_in_transaction_session_timeout = 3600000

# Other:

# max_wal_size = (3 * checkpoint_segments) *16MB
# http://www.postgresql.org/docs/9.5/static/release-9-5.html
max_wal_size = 3GB # Replace checkpoint_segments
huge_pages = try



# - Archiving -
wal_level = archive
wal_sync_method = fdatasync
full_page_writes = on           # recover from partial page writes
wal_buffers = -1

#archive_mode = on      # allows archiving to be done
archive_mode = off      # allows archiving to be done

And in /etc/sysctl.conf:


# http://padmavyuha.blogspot.co.za/2010/12/configuring-shmmax-and-shmall-for.html

# (for 60GB)
kernel.shmall = 15728640
kernel.shmmax = 64424509440
# run "sudo sysctl -p" after editing

We are stuck at the moment and do not know how to proceed from here.
Help will be appreciated.

Regards
Johann

--
Because experiencing your loyal love is better than life itself,
my lips will praise you.  (Psalm 63:3)


В списке pgsql-general по дате отправления:

Предыдущее
От: Igor Korot
Дата:
Сообщение: Re: [GENERAL] Retrieving query results
Следующее
От: Christoph Moench-Tegeder
Дата:
Сообщение: Re: [GENERAL] Out of memory/corrupted shared memory problem on server