Out of memory error on automatic vacuum

Поиск
Список
Период
Сортировка
От Ekaterina Amez
Тема Out of memory error on automatic vacuum
Дата
Msg-id a5c8d612-693e-1546-5cbf-6d928287bdc6@zunibal.com
обсуждение исходный текст
Ответы Re: Out of memory error on automatic vacuum  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Список pgsql-performance

Hi all,

This morning I was checking postgres servers logs, looking for errors (we've recently upgraded them and changed default config) and long running queries when I found one of the servers had really big logs since yesterday.  It was giving the error of this mail's subject: out of memory, failed on request of size XXX on automatic vacuum of table YYY. A quick search revealed me some postgresql-lists messages talking about work_mem and shared_buffers configuration options, some kernel config options too. Although all of them were messages from several years ago, I decided to cut my shared_buffers configured value and restart server: now it looks like error is gone. But I'd like to understand what's beyond the logged error (it's really long and refers to things about inner functionalities that I'm missing), how to detect what config options are possibly conflicting and, most important, I want to know if I've solved it right.

-- Logged error (one of them) ==> Original log is in spanish and I've translated myself. I've replaced my database objects names

TopMemoryContext: 77568 total in 5 blocks; 15612 free (5 chunks); 61956 used
  TopTransactionContext: 8192 total in 1 blocks; 7604 free (40 chunks); 588 used
  TOAST to main relid map: 24576 total in 2 blocks; 13608 free (3 chunks); 10968 used
  AV worker: 8192 total in 1 blocks; 3228 free (6 chunks); 4964 used
    Autovacuum Portal: 0 total in 0 blocks; 0 free (0 chunks); 0 used => Is this line the one that points to the task that has run of of memory?
      Vacuum: 8192 total in 1 blocks; 8132 free (0 chunks); 60 used
  Operator class cache: 8192 total in 1 blocks; 4420 free (0 chunks); 3772 used
  smgr relation table: 8192 total in 1 blocks; 316 free (0 chunks); 7876 used
  TransactionAbortContext: 32768 total in 1 blocks; 32748 free (0 chunks); 20 used
  Portal hash: 8192 total in 1 blocks; 3460 free (0 chunks); 4732 used
  PortalMemory: 0 total in 0 blocks; 0 free (0 chunks); 0 used
  Relcache by OID: 8192 total in 1 blocks; 1884 free (0 chunks); 6308 used
  CacheMemoryContext: 516096 total in 6 blocks; 234136 free (5 chunks); 281960 used
    <index>: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    <index>: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_index_indrelid_index: 1024 total in 1 blocks; 676 free (0 chunks); 348 used
    pg_attrdef_adrelid_adnum_index: 1024 total in 1 blocks; 604 free (0 chunks); 420 used
    pg_db_role_setting_databaseid_rol_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
    pg_opclass_am_name_nsp_index: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
    pg_foreign_data_wrapper_name_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_enum_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_class_relname_nsp_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
    pg_foreign_server_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_statistic_relid_att_inh_index: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
    pg_cast_source_target_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
    pg_language_name_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_transform_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_collation_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_amop_fam_strat_index: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
    pg_index_indexrelid_index: 1024 total in 1 blocks; 676 free (0 chunks); 348 used
    pg_ts_template_tmplname_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
    pg_ts_config_map_index: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
    pg_opclass_oid_index: 1024 total in 1 blocks; 676 free (0 chunks); 348 used
    pg_foreign_data_wrapper_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_event_trigger_evtname_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_ts_dict_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_event_trigger_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_conversion_default_index: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
    pg_operator_oprname_l_r_n_index: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
    pg_trigger_tgrelid_tgname_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
    pg_enum_typid_label_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
    pg_ts_config_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_user_mapping_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_opfamily_am_name_nsp_index: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
    pg_foreign_table_relid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_type_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_aggregate_fnoid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_constraint_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_rewrite_rel_rulename_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
    pg_ts_parser_prsname_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
    pg_ts_config_cfgname_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
    pg_ts_parser_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_operator_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_namespace_nspname_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_ts_template_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_amop_opr_fam_index: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
    pg_default_acl_role_nsp_obj_index: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
    pg_collation_name_enc_nsp_index: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
    pg_range_rngtypid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_ts_dict_dictname_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
    pg_type_typname_nsp_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
    pg_opfamily_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_class_oid_index: 1024 total in 1 blocks; 676 free (0 chunks); 348 used
    pg_proc_proname_args_nsp_index: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
    pg_transform_type_lang_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
    pg_attribute_relid_attnum_index: 1024 total in 1 blocks; 604 free (0 chunks); 420 used
    pg_proc_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_language_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_namespace_oid_index: 1024 total in 1 blocks; 676 free (0 chunks); 348 used
    pg_amproc_fam_proc_index: 1024 total in 1 blocks; 436 free (0 chunks); 588 used
    pg_foreign_server_name_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_attribute_relid_attnam_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
    pg_conversion_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_user_mapping_user_server_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
    pg_conversion_name_nsp_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
    pg_authid_oid_index: 1024 total in 1 blocks; 676 free (0 chunks); 348 used
    pg_auth_members_member_role_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
    pg_tablespace_oid_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_shseclabel_object_index: 1024 total in 1 blocks; 476 free (0 chunks); 548 used
    pg_replication_origin_roname_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_database_datname_index: 1024 total in 1 blocks; 676 free (0 chunks); 348 used
    pg_replication_origin_roiident_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
    pg_auth_members_role_member_index: 1024 total in 1 blocks; 644 free (0 chunks); 380 used
    pg_database_oid_index: 1024 total in 1 blocks; 676 free (0 chunks); 348 used
    pg_authid_rolname_index: 1024 total in 1 blocks; 716 free (0 chunks); 308 used
  WAL record construction: 49520 total in 2 blocks; 6876 free (0 chunks); 42644 used
  PrivateRefCount: 8192 total in 1 blocks; 5516 free (0 chunks); 2676 used
  MdSmgr: 8192 total in 1 blocks; 8004 free (0 chunks); 188 used
  LOCALLOCK hash: 8192 total in 1 blocks; 2428 free (0 chunks); 5764 used
  Timezones: 104028 total in 2 blocks; 5516 free (0 chunks); 98512 used
  Postmaster: 8192 total in 1 blocks; 7796 free (8 chunks); 396 used
    ident parser context: 0 total in 0 blocks; 0 free (0 chunks); 0 used
    hba parser context: 7168 total in 3 blocks; 3148 free (1 chunks); 4020 used
  ErrorContext: 8192 total in 1 blocks; 8172 free (4 chunks); 20 used
Grand total: 1009356 bytes in 130 blocks; 436888 free (72 chunks); 572468 used
[2019-11-17 09:27:37.425 CET] – 11899 ERROR:  out of memory
[2019-11-17 09:27:37.425 CET] – 11899 DETAIL:  Failed on request of size 700378218.
[2019-11-17 09:27:37.425 CET] – 11899 CONTEXT:  automatic vacuum on table «my_db.public.my_table»


-- System specs:

Centos 6.5 i686.

12 GB RAM.

[root@myserver pg_log]# lscpu
Architecture:          i686
CPU op-mode(s):        32-bit, 64-bit
CPU(s):                4
On-line CPU(s) list:   0-3
Thread(s) per core:    1
Core(s) per socket:    4
Socket(s):             1
Vendor ID:             GenuineIntel

[root@myserver pg_log]# psql postgres -c "select version();"
                                                 version                                                
---------------------------------------------------------------------------------------------------------
 PostgreSQL 9.6.15 on i686-pc-linux-gnu, compiled by gcc (GCC) 4.4.7 20120313 (Red Hat 4.4.7-23), 32-bit
(1 row)

-- Postgres configured values

[root@myserver data]# cat postgresql.conf | grep CHANGED
listen_addresses = '*'        # what IP address(es) to listen on; - CHANGED
port = 5432                # (change requires restart) - CHANGED
shared_buffers = 1024MB            # min 128kB - Default 128MB - CHANGED ==> This value was 2048 when I was getting out of memory error.
work_mem = 64MB                # min 64kB - Default 4MB - CHANGED
maintenance_work_mem = 704MB        # min 1MB - Default 64MB - CHANGED
effective_io_concurrency = 2        # 1-1000; 0 disables prefetching - Df 1 - CHANGED
wal_buffers = 16MB            # min 32kB, -1 sets based on shared_buffers - Df -1 - CHANGED
max_wal_size = 1GB    # Default 1GB - CHANGED
min_wal_size = 512MB    # Default 80MB - CHANGED
checkpoint_completion_target = 0.8    # checkpoint target duration, 0.0 - 1.0 - Df 0.5 - CHANGED
effective_cache_size = 8448MB  #Default 4GB - CHANGED
log_min_duration_statement = 30000    # -1 is disabled, 0 logs all statements - Df -1 - CHANGED
log_line_prefix = '[%m] – %p %q- %u@%h:%d – %a  '    # special values: - CHANGED

I've now realized that work_mem is only 64MB, isn't it a bit low?

Please note that server has a 32bit OS installed (don't know why). I'm aware of some limitations with memory configuration on 32bit systems [1]

Summary: how to interprete the log message, and is the error controlled by the change made to shared_buffers value (from 2GB to 1GB).

Kind regards,

Ekaterina


[1] this post written by Robert Haas: http://rhaas.blogspot.com/2011/05/sharedbuffers-on-32-bit-systems.html

В списке pgsql-performance по дате отправления:

Предыдущее
От: Craig James
Дата:
Сообщение: Re: Simple DELETE on modest-size table runs 100% CPU forever
Следующее
От: Tomas Vondra
Дата:
Сообщение: Re: Out of memory error on automatic vacuum