Re: Out of Memory errors are frustrating as heck!

Поиск
Список
Период
Сортировка
От Gunther
Тема Re: Out of Memory errors are frustrating as heck!
Дата
Msg-id 87c5cafb-92c4-7788-5153-15a7e93ecfbe@gusw.net
обсуждение исходный текст
Ответ на Re: Out of Memory errors are frustrating as heck!  (Tom Lane <tgl@sss.pgh.pa.us>)
Ответы Re: Out of Memory errors are frustrating as heck!  (Justin Pryzby <pryzby@telsasoft.com>)
Re: Out of Memory errors are frustrating as heck!  (Jeff Janes <jeff.janes@gmail.com>)
Список pgsql-performance

Thanks for looking at my problem Tom Lane and Jeff Janes. Sorry for not having given enough detail.

The version is 10.2 latest. The database was originally built with 10.1 and then just started with 10.2. No dump and reload or pg_upgrade. Underlying system is 64bit Amazon Linux (CentOS like) running on an AMD64 VM (m5a) right now.

I said "crash" and that is wrong. Not a signal nor core dump. It is the ERROR:  out of memory. Only the query crashes. Although I don't know if may be the backend server might have left a core dump? Where would that be? Would it help anyone if I started the server with the -c option to get a core dump? I guess I could re-compile with gcc -g debugging symbols all on and then run with that -c option, and then use gdb to find out which line it was failing at and then inspect the query plan data structure? Would that be helpful? Does anyone want the coredump to inspect?

The short version is:

Grand total: 1437014672 bytes in 168424 blocks; 11879744 free (3423 chunks); 1425134928 used
2019-04-14 16:38:26.355 UTC [11061] ERROR:  out of memory
2019-04-14 16:38:26.355 UTC [11061] DETAIL:  Failed on request of size 8272 in memory context "ExecutorState".

Here is the out of memory error dump in its full glory.

TopMemoryContext: 2197400 total in 7 blocks; 42952 free (15 chunks); 2154448 used TableSpace cache: 8192 total in 1 blocks; 2096 free (0 chunks); 6096 used Type information cache: 24352 total in 2 blocks; 2624 free (0 chunks); 21728 used pgstat TabStatusArray lookup hash table: 8192 total in 1 blocks; 416 free (0 chunks); 7776 used TopTransactionContext: 8192 total in 1 blocks; 7720 free (2 chunks); 472 used RowDescriptionContext: 8192 total in 1 blocks; 6896 free (0 chunks); 1296 used MessageContext: 2097152 total in 9 blocks; 396480 free (10 chunks); 1700672 used Operator class cache: 8192 total in 1 blocks; 560 free (0 chunks); 7632 used smgr relation table: 32768 total in 3 blocks; 16832 free (8 chunks); 15936 used TransactionAbortContext: 32768 total in 1 blocks; 32512 free (0 chunks); 256 used Portal hash: 8192 total in 1 blocks; 560 free (0 chunks); 7632 used TopPortalContext: 8192 total in 1 blocks; 7664 free (0 chunks); 528 used   PortalContext: 1024 total in 1 blocks; 624 free (0 chunks); 400 used:     ExecutorState: 1416621920 total in 168098 blocks; 8494152 free (3102 chunks); 1408127768 used       HashTableContext: 8192 total in 1 blocks; 7752 free (0 chunks); 440 used         HashBatchContext: 57432 total in 3 blocks; 16072 free (6 chunks); 41360 used       HashTableContext: 8192 total in 1 blocks; 7752 free (0 chunks); 440 used         HashBatchContext: 90288 total in 4 blocks; 16072 free (6 chunks); 74216 used       HashTableContext: 8192 total in 1 blocks; 7624 free (0 chunks); 568 used         HashBatchContext: 90288 total in 4 blocks; 16072 free (6 chunks); 74216 used       TupleSort main: 286912 total in 8 blocks; 246792 free (39 chunks); 40120 used       TupleSort main: 286912 total in 8 blocks; 246792 free (39 chunks); 40120 used       HashTableContext: 8454256 total in 6 blocks; 64848 free (32 chunks); 8389408 used         HashBatchContext: 106640 total in 3 blocks; 7936 free (0 chunks); 98704 used       TupleSort main: 452880 total in 8 blocks; 126248 free (27 chunks); 326632 used         Caller tuples: 4194304 total in 10 blocks; 1434888 free (20 chunks); 2759416 used       ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used       ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used       ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used       ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used       ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used       ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used       ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used       ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used       ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used       ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used       ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used       ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used       ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used       ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used       ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used       ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used       ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used       ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used       ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used       ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used       ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used       ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used       ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used       ExprContext: 8192 total in 1 blocks; 7936 free (0 chunks); 256 used Relcache by OID: 16384 total in 2 blocks; 3512 free (2 chunks); 12872 used CacheMemoryContext: 1101328 total in 14 blocks; 386840 free (1 chunks); 714488 used   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: businessop_docid_ndx   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: businessop_sbjentityidroot_ndx   index info: 2048 total in 2 blocks; 704 free (1 chunks); 1344 used: businessop_sbjroleiid_ndx   index info: 2048 total in 2 blocks; 680 free (1 chunks); 1368 used: pg_toast_2619_index   index info: 2048 total in 2 blocks; 968 free (1 chunks); 1080 used: entity_id_fkidx   index info: 2048 total in 2 blocks; 696 free (1 chunks); 1352 used: entity_id_idx   index info: 2048 total in 2 blocks; 968 free (1 chunks); 1080 used: act_id_fkidx   index info: 2048 total in 2 blocks; 696 free (1 chunks); 1352 used: act_id_idx   index info: 2048 total in 2 blocks; 592 free (1 chunks); 1456 used: pg_constraint_conrelid_contypid_conname_index   index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used: actrelationship_pkey   index info: 2048 total in 2 blocks; 624 free (1 chunks); 1424 used: actrelationship_target_idx   index info: 2048 total in 2 blocks; 624 free (1 chunks); 1424 used: actrelationship_source_idx   index info: 2048 total in 2 blocks; 680 free (1 chunks); 1368 used: documentinformation_pk   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_statistic_ext_relid_index   index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used: docinfsubj_ndx_seii   index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used: docinfsubj_ndx_sbjentcodeonly   index info: 2048 total in 2 blocks; 680 free (1 chunks); 1368 used: pg_toast_2618_index   index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used: pg_index_indrelid_index   relation rules: 229376 total in 31 blocks; 5136 free (0 chunks); 224240 used: v_businessoperation   index info: 2048 total in 2 blocks; 648 free (2 chunks); 1400 used: pg_db_role_setting_databaseid_rol_index   index info: 2048 total in 2 blocks; 624 free (2 chunks); 1424 used: pg_opclass_am_name_nsp_index   index info: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used: pg_foreign_data_wrapper_name_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_enum_oid_index   index info: 2048 total in 2 blocks; 680 free (2 chunks); 1368 used: pg_class_relname_nsp_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_foreign_server_oid_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_publication_pubname_index   index info: 2048 total in 2 blocks; 592 free (3 chunks); 1456 used: pg_statistic_relid_att_inh_index   index info: 2048 total in 2 blocks; 680 free (2 chunks); 1368 used: pg_cast_source_target_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_language_name_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_transform_oid_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_collation_oid_index   index info: 3072 total in 2 blocks; 1136 free (2 chunks); 1936 used: pg_amop_fam_strat_index   index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used: pg_index_indexrelid_index   index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used: pg_ts_template_tmplname_index   index info: 2048 total in 2 blocks; 704 free (3 chunks); 1344 used: pg_ts_config_map_index   index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used: pg_opclass_oid_index   index info: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used: pg_foreign_data_wrapper_oid_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_event_trigger_evtname_index   index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used: pg_statistic_ext_name_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_publication_oid_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_ts_dict_oid_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_event_trigger_oid_index   index info: 3072 total in 2 blocks; 1216 free (3 chunks); 1856 used: pg_conversion_default_index   index info: 3072 total in 2 blocks; 1216 free (3 chunks); 1856 used: pg_operator_oprname_l_r_n_index   index info: 2048 total in 2 blocks; 680 free (2 chunks); 1368 used: pg_trigger_tgrelid_tgname_index   index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used: pg_enum_typid_label_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_ts_config_oid_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_user_mapping_oid_index   index info: 2048 total in 2 blocks; 704 free (3 chunks); 1344 used: pg_opfamily_am_name_nsp_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_foreign_table_relid_index   index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used: pg_type_oid_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_aggregate_fnoid_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_constraint_oid_index   index info: 2048 total in 2 blocks; 680 free (2 chunks); 1368 used: pg_rewrite_rel_rulename_index   index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used: pg_ts_parser_prsname_index   index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used: pg_ts_config_cfgname_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_ts_parser_oid_index   index info: 2048 total in 2 blocks; 728 free (1 chunks); 1320 used: pg_publication_rel_prrelid_prpubid_index   index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used: pg_operator_oid_index   index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used: pg_namespace_nspname_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_ts_template_oid_index   index info: 2048 total in 2 blocks; 624 free (2 chunks); 1424 used: pg_amop_opr_fam_index   index info: 2048 total in 2 blocks; 672 free (3 chunks); 1376 used: pg_default_acl_role_nsp_obj_index   index info: 2048 total in 2 blocks; 704 free (3 chunks); 1344 used: pg_collation_name_enc_nsp_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_publication_rel_oid_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_range_rngtypid_index   index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used: pg_ts_dict_dictname_index   index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used: pg_type_typname_nsp_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_opfamily_oid_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_statistic_ext_oid_index   index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used: pg_class_oid_index   index info: 2048 total in 2 blocks; 704 free (3 chunks); 1344 used: pg_proc_proname_args_nsp_index   index info: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used: pg_partitioned_table_partrelid_index   index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used: pg_transform_type_lang_index   index info: 2048 total in 2 blocks; 680 free (2 chunks); 1368 used: pg_attribute_relid_attnum_index   index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used: pg_proc_oid_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_language_oid_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_namespace_oid_index   index info: 3072 total in 2 blocks; 1136 free (2 chunks); 1936 used: pg_amproc_fam_proc_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_foreign_server_name_index   index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used: pg_attribute_relid_attnam_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_conversion_oid_index   index info: 2048 total in 2 blocks; 728 free (1 chunks); 1320 used: pg_user_mapping_user_server_index   index info: 2048 total in 2 blocks; 728 free (1 chunks); 1320 used: pg_subscription_rel_srrelid_srsubid_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_sequence_seqrelid_index   index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used: pg_conversion_name_nsp_index   index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used: pg_authid_oid_index   index info: 2048 total in 2 blocks; 728 free (1 chunks); 1320 used: pg_auth_members_member_role_index   index info: 1024 total in 1 blocks; 48 free (0 chunks); 976 used: pg_subscription_oid_index   index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used: pg_tablespace_oid_index   index info: 2048 total in 2 blocks; 704 free (3 chunks); 1344 used: pg_shseclabel_object_index   index info: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used: pg_replication_origin_roname_index   index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used: pg_database_datname_index   index info: 2048 total in 2 blocks; 760 free (2 chunks); 1288 used: pg_subscription_subname_index   index info: 1024 total in 1 blocks; 16 free (0 chunks); 1008 used: pg_replication_origin_roiident_index   index info: 2048 total in 2 blocks; 728 free (1 chunks); 1320 used: pg_auth_members_role_member_index   index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used: pg_database_oid_index   index info: 2048 total in 2 blocks; 952 free (1 chunks); 1096 used: pg_authid_rolname_index WAL record construction: 49768 total in 2 blocks; 6368 free (0 chunks); 43400 used PrivateRefCount: 8192 total in 1 blocks; 2624 free (0 chunks); 5568 used MdSmgr: 8192 total in 1 blocks; 7256 free (1 chunks); 936 used LOCALLOCK hash: 16384 total in 2 blocks; 4600 free (2 chunks); 11784 used Timezones: 104120 total in 2 blocks; 2624 free (0 chunks); 101496 used ErrorContext: 8192 total in 1 blocks; 7936 free (4 chunks); 256 used
Grand total: 1437014672 bytes in 168424 blocks; 11879744 free (3423 chunks); 1425134928 used
2019-04-14 16:38:26.355 UTC [11061] ERROR:  out of memory
2019-04-14 16:38:26.355 UTC [11061] DETAIL:  Failed on request of size 8272 in memory context "ExecutorState".
I am delighted that Tom Lane is hinting that my query plan doesn't look so crazy, and it isn't. And delighted there may be a known bug involved.

I wonder if this is one data issue. May be a few rows have excessively long text fields? But even checking for that is rather difficult because there are many tables and columns involved. Would really be nice if the error would say exactly what plan step that ExecutorState referred to, so one could narrow it down.

regards,
-Gunther

On 4/14/2019 17:19, Tom Lane wrote:
Gunther <raj@gusw.net> writes:
For weeks now, I am banging my head at an "out of memory" situation. 
There is only one query I am running on an 8 GB system, whatever I try, 
I get knocked out on this out of memory. It is extremely impenetrable to 
understand and fix this error. I guess I could add a swap file, and then 
I would have to take the penalty of swapping. But how can I actually 
address an out of memory condition if the system doesn't tell me where 
it is happening?
You might want to see the query, but it is a huge plan, and I can't 
really break this down. It shouldn't matter though. But just so you can 
get a glimpse here is the plan:
Is that the whole plan?  With just three sorts and two materializes,
it really shouldn't use more than more-or-less 5X work_mem.  What do
you have work_mem set to, anyway?  Is this a 64-bit build of PG?

Also, are the estimated rowcounts shown here anywhere close to what
you expect in reality?  If there are any AFTER INSERT triggers on the
insertion target table, you'd also be accumulating per-row trigger
queue entries ... but if there's only circa 35K rows to be inserted,
it's hard to credit that eating more than a couple hundred KB, either.

Might this be a bug?
It's conceivable that you've hit some memory-leakage bug, but if so you
haven't provided nearly enough info for anyone else to reproduce it.
You haven't even shown us the actual error message :-(

https://wiki.postgresql.org/wiki/Guide_to_reporting_problems
		regards, tom lane

В списке pgsql-performance по дате отправления:

Предыдущее
От: Jeff Janes
Дата:
Сообщение: Re: Out of Memory errors are frustrating as heck!
Следующее
От: Peter
Дата:
Сообщение: Re: Out of Memory errors are frustrating as heck!