free(): invalid pointer , when to dumping data to a new server

Поиск
Список
Период
Сортировка
От Lars Aksel Opsahl
Тема free(): invalid pointer , when to dumping data to a new server
Дата
Msg-id AM7P189MB10282FF31D45127630C767739D1D0@AM7P189MB1028.EURP189.PROD.OUTLOOK.COM
обсуждение исходный текст
Ответы Re: free(): invalid pointer , when to dumping data to a new server  (Lars Aksel Opsahl <Lars.Opsahl@nibio.no>)
Список pgsql-admin

Hi


I have problem with a new server that crashes when restoring data to it. It's around 1.5 TB of raw data with both spatial and raster data.


I use Postgres 12.4 and POSTGIS="3.0.2 2fb2a18" [EXTENSION] PGSQL="120" GEOS="3.8.1-CAPI-1.13.3" SFCGAL="1.3.7" PROJ="7.1.1" GDAL="GDAL 3.1.3, released 2020/09/01" LIBXML="2.9.7" LIBJSON="0.13.1" LIBPROTOBUF="1.3.0" WAGYU="0.4.3 (Internal)" TOPOLOGY RASTER

(I have also tested with postgis 3.1 and latest form master)


For Os I use CentOS Linux release 8.2.2004 (Core) and kernel 4.18.0-193.19.1.el8_2.x86_64 on server with 64 “Intel(R) Xeon(R) Gold 6246R CPU @ 3.40GHz”


First I have restored 3 schemas with no errors the logs

388G data/sl/schema_grl_met_wisline_v2

49G data/sl/schema_met_vaer_dyrkbar

118G data/sl/schema_met_vaer_wisline


using this commands :

pg_restore -j 50 --format=d -d sl data/sl/schema_grl_met_wisline_v2/ 2> log/sl/restor_grl_met_wisline_v2.log

pg_restore -j 50 --format=d -d sl data/sl/schema_met_vaer_wisline/ 2> log/sl/restor_met_vaer_wisline.log

pg_restore -j 50 --format=d -d sl data/sl/schema_met_vaer_dyrkbar/ 2> log/sl/restor_met_vaer_dyrkbar.log


Then i start to restore the rest of data below.

744G data/sl/schema_sl_rest_ofd


Here I split the restore job in 3 parts, like below .


Command 1 below went OK and no errors in the client log file or on the server side.

pg_restore --section=pre-data -1 --format=d -d sl data/sl/schema_sl_rest_ofd/ 2> log/sl/restore_sl_rest_ofdata_pre_data.log


Command 2 below went OK according to logs the client side but not according logs on the server side and the last table with one more than one billion rows are empty.

pg_restore --section=data -j 50 --format=d -d sl data/sl/schema_sl_rest_ofd/ 2> log/sl/restore_sl_rest_ofdata_data.log


I do not understand why I do not have any errors in client log file, because on the server side I this.

< [local] postgres, sl, 2020-10-22 00:52:47.823 CEST >LOG: duration: 3417759.209 ms statement: COPY grl_sk.dtm10_norge_retiled (rid, rast, filename) FROM stdin;

< [local] postgres, sl, 2020-10-22 00:56:05.771 CEST >LOG: duration: 3615706.705 ms statement: COPY sl_sos.lsk_arome_v1 (flateid, point_uid_ref, epoch_hours, tg_l03, tg_l04, tswi_l03, tswi_l04

, tg_l05, tswi_l05, pr, t2m, sfcwindl, snw, twpr) FROM stdin;

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

free(): invalid pointer

< , , 2020-10-22 00:56:05.830 CEST >LOG: server process (PID 516912) was terminated by signal 6: Aborted

< , , 2020-10-22 00:56:05.830 CEST >LOG: terminating any other active server processes

< , , 2020-10-22 00:56:05.830 CEST >WARNING: terminating connection because of crash of another server process

< , , 2020-10-22 00:56:05.830 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

< , , 2020-10-22 00:56:05.830 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.

< , , 2020-10-22 00:56:05.830 CEST >WARNING: terminating connection because of crash of another server process

< , , 2020-10-22 00:56:05.830 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

< , , 2020-10-22 00:56:05.830 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.

< , , 2020-10-22 00:56:05.830 CEST >WARNING: terminating connection because of crash of another server process

< , , 2020-10-22 00:56:05.830 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

< , , 2020-10-22 00:56:05.830 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.

< , , 2020-10-22 00:56:05.830 CEST >WARNING: terminating connection because of crash of another server process

< , , 2020-10-22 00:56:05.830 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

< , , 2020-10-22 00:56:05.830 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.

< , , 2020-10-22 00:56:10.892 CEST >LOG: all server processes terminated; reinitializing

< , , 2020-10-22 00:56:11.324 CEST >LOG: database system was interrupted; last known up at 2020-10-22 00:52:33 CEST

< , , 2020-10-22 00:56:11.480 CEST >LOG: database system was not properly shut down; automatic recovery in progress

< , , 2020-10-22 00:56:11.521 CEST >LOG: redo starts at B9C/EA83BBF0

< , , 2020-10-22 00:56:35.074 CEST >LOG: redo done at BA0/FDFFEF48

< , , 2020-10-22 00:56:51.250 CEST >LOG: database system is ready to accept connections


I don't care about the last table, we dont't need that data,  so I run the last command below.

pg_restore --section=post-data -j 1 --format=d -d sl data/sl/schema_sl_rest_ofd/ 2> log/sl/restore_sl_rest_ofdata_post_data.log


Then I see this in the logs so I to run the above command multiple times.


First time :

< [local] postgres, sl, 2020-10-22 03:46:27.349 CEST >LOG: duration: 1746.370 ms statement: CREATE INDEX o_8_dtm10_norge_retiled_st_convexhull_idx ON grl_sk.o_8_dtm10_norge_retiled USING gist

(public.st_convexhull(rast));

free(): invalid pointer

free(): invalid pointer

< , , 2020-10-22 03:46:30.283 CEST >LOG: background worker "parallel worker" (PID 658814) was terminated by signal 6: Aborted

< , , 2020-10-22 03:46:30.283 CEST >LOG: terminating any other active server processes

< [local] postgres, sl, 2020-10-22 03:46:30.283 CEST >WARNING: terminating connection because of crash of another server process

< [local] postgres, sl, 2020-10-22 03:46:30.283 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

< [local] postgres, sl, 2020-10-22 03:46:30.283 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.

< , , 2020-10-22 03:46:30.283 CEST >WARNING: terminating connection because of crash of another server process

< , , 2020-10-22 03:46:30.283 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

< , , 2020-10-22 03:46:30.283 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.

< , , 2020-10-22 03:46:30.599 CEST >LOG: all server processes terminated; reinitializing

< , , 2020-10-22 03:46:31.019 CEST >LOG: database system was interrupted; last known up at 2020-10-22 03:41:23 CEST

< , , 2020-10-22 03:46:31.178 CEST >LOG: database system was not properly shut down; automatic recovery in progress

< , , 2020-10-22 03:46:31.224 CEST >LOG: redo starts at BAA/B07178F8

< , , 2020-10-22 03:46:37.209 CEST >LOG: redo done at BAC/7F3DFD70


Second time :

< [local] postgres, sl, 2020-10-22 08:13:08.882 CEST >LOG: duration: 417316.937 ms statement: CREATE INDEX geoidx_org_forestmap____ostl3_predictions ON org_forestmap.ostl3_predictions USING gi

st (geo);

< [local] munin, template1, 2020-10-22 08:13:36.797 CEST >FATAL: role "munin" does not exist

< [local] munin, template1, 2020-10-22 08:13:36.835 CEST >FATAL: role "munin" does not exist

< [local] postgres, sl_help, 2020-10-22 08:13:39.106 CEST >FATAL: database "sl_help" does not exist

< [local] postgres, sl_help, 2020-10-22 08:13:39.183 CEST >FATAL: database "sl_help" does not exist

< [local] munin, template1, 2020-10-22 08:13:44.607 CEST >FATAL: role "munin" does not exist

< [local] munin, template1, 2020-10-22 08:13:44.684 CEST >FATAL: role "munin" does not exist

< [local] munin, template1, 2020-10-22 08:18:36.225 CEST >FATAL: role "munin" does not exist

< [local] munin, template1, 2020-10-22 08:18:36.266 CEST >FATAL: role "munin" does not exist

< [local] postgres, sl_help, 2020-10-22 08:18:38.611 CEST >FATAL: database "sl_help" does not exist

< [local] postgres, sl_help, 2020-10-22 08:18:38.691 CEST >FATAL: database "sl_help" does not exist

< [local] munin, template1, 2020-10-22 08:18:44.234 CEST >FATAL: role "munin" does not exist

< [local] munin, template1, 2020-10-22 08:18:44.314 CEST >FATAL: role "munin" does not exist

free(): invalid pointer

free(): invalid pointer

< , , 2020-10-22 08:19:58.932 CEST >LOG: background worker "parallel worker" (PID 835249) was terminated by signal 6: Aborted

< , , 2020-10-22 08:19:58.932 CEST >LOG: terminating any other active server processes

< [local] postgres, sl, 2020-10-22 08:19:58.932 CEST >WARNING: terminating connection because of crash of another server process

< [local] postgres, sl, 2020-10-22 08:19:58.932 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

< [local] postgres, sl, 2020-10-22 08:19:58.932 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.

< , , 2020-10-22 08:19:58.932 CEST >WARNING: terminating connection because of crash of another server process

< , , 2020-10-22 08:19:58.932 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

< , , 2020-10-22 08:19:58.932 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.

< 192.168.229.48 postgres, sl, 2020-10-22 08:19:58.932 CEST >WARNING: terminating connection because of crash of another server process

< 192.168.229.48 postgres, sl, 2020-10-22 08:19:58.932 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

< 192.168.229.48 postgres, sl, 2020-10-22 08:19:58.932 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.

< 192.168.229.48 postgres, sl, 2020-10-22 08:19:58.970 CEST >FATAL: the database system is in recovery mode

< , , 2020-10-22 08:19:59.041 CEST >LOG: all server processes terminated; reinitializing

< , , 2020-10-22 08:19:59.476 CEST >LOG: database system was interrupted; last known up at 2020-10-22 08:11:17 CEST

< , , 2020-10-22 08:20:01.901 CEST >LOG: database system was not properly shut down; automatic recovery in progress

< , , 2020-10-22 08:20:02.005 CEST >LOG: redo starts at BB1/AA4309A0

< , , 2020-10-22 08:20:03.517 CEST >LOG: invalid record length at BB1/FBF33FB0: wanted 24, got 0

< , , 2020-10-22 08:20:03.517 CEST >LOG: redo done at BB1/FBF33F78

< , , 2020-10-22 08:20:05.573 CEST >LOG: database system is ready to accept connections


Any idea about why this happens and what I can do to fix it ?


Do you believe Postgres 13 will work better ?


Thanks

Lars

В списке pgsql-admin по дате отправления:

Предыдущее
От: "David G. Johnston"
Дата:
Сообщение: Re: pgadmin help
Следующее
От: Lars Aksel Opsahl
Дата:
Сообщение: Re: free(): invalid pointer , when to dumping data to a new server