Re: free(): invalid pointer, when dumping data to a new server

From: Lars Aksel Opsahl
Subject: Re: free(): invalid pointer, when dumping data to a new server
Date:
Msg-id: AM7P189MB1028756DDD017F9828C8FDE99D1D0@AM7P189MB1028.EURP189.PROD.OUTLOOK.COM
In reply to: free(): invalid pointer, when dumping data to a new server (Lars Aksel Opsahl <Lars.Opsahl@nibio.no>)
Responses: Re: free(): invalid pointer, when dumping data to a new server (Lars Aksel Opsahl <Lars.Opsahl@nibio.no>)
List: pgsql-admin



From: Lars Aksel Opsahl <Lars.Opsahl@nibio.no>
Sent: Thursday, October 22, 2020 9:32 AM
To: pgsql-admin@lists.postgresql.org <pgsql-admin@lists.postgresql.org>
Subject: free(): invalid pointer, when dumping data to a new server
 
>
>Hi
>
>
>I have a problem with a new server that crashes when restoring data to it. It's around 1.5 TB of raw data with both spatial and raster data.
>
>
>I use Postgres 12.4 and POSTGIS="3.0.2 2fb2a18" [EXTENSION] PGSQL="120" GEOS="3.8.1-CAPI-1.13.3" SFCGAL="1.3.7" PROJ="7.1.1" GDAL="GDAL 3.1.3, released 2020/09/01" LIBXML="2.9.7" LIBJSON="0.13.1" LIBPROTOBUF="1.3.0" WAGYU="0.4.3 (Internal)" TOPOLOGY RASTER
>
>(I have also tested with PostGIS 3.1 and the latest from master.)
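>
>For reference, the version strings above come from the standard built-ins; a minimal sketch to reproduce them (database name "sl" as used in the restore commands below):
>
>psql -d sl -c 'SELECT version();'                # PostgreSQL server version
>psql -d sl -c 'SELECT postgis_full_version();'   # the PostGIS/GEOS/PROJ/GDAL line above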
>
>
>For the OS I use CentOS Linux release 8.2.2004 (Core) with kernel 4.18.0-193.19.1.el8_2.x86_64, on a server with 64 “Intel(R) Xeon(R) Gold 6246R CPU @ 3.40GHz” cores.
>
>
>First I restored 3 schemas with no errors in the logs:
>
>388G data/sl/schema_grl_met_wisline_v2
>49G data/sl/schema_met_vaer_dyrkbar
>118G data/sl/schema_met_vaer_wisline
>
>
>using these commands:
>
>pg_restore -j 50 --format=d -d sl data/sl/schema_grl_met_wisline_v2/ 2> log/sl/restor_grl_met_wisline_v2.log
>pg_restore -j 50 --format=d -d sl data/sl/schema_met_vaer_wisline/ 2> log/sl/restor_met_vaer_wisline.log
>pg_restore -j 50 --format=d -d sl data/sl/schema_met_vaer_dyrkbar/ 2> log/sl/restor_met_vaer_dyrkbar.log
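>
>A rough post-restore sanity check is to compare the on-disk size of a restored schema against the dump sizes above; a sketch using the standard size functions (schema name taken from the logs further down):
>
># total size (tables + indexes + TOAST) of one restored schema
>psql -d sl -c "SELECT pg_size_pretty(sum(pg_total_relation_size(oid))) FROM pg_class WHERE relnamespace = 'grl_met_wisline_v2'::regnamespace AND relkind = 'r';"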
>
>
>Then I started restoring the rest of the data, below.
>
>744G data/sl/schema_sl_rest_ofd
>
>
>Here I split the restore job into 3 parts, as below.
>
>
>Command 1 below went OK, with no errors in the client log file or on the server side.
>
>pg_restore --section=pre-data -1 --format=d -d sl data/sl/schema_sl_rest_ofd/ 2> log/sl/restore_sl_rest_ofdata_pre_data.log
>
>
>Command 2 below went OK according to the logs on the client side, but not according to the logs on the server side, and the last table, which has more than one billion rows, ended up empty.
>
>pg_restore --section=data -j 50 --format=d -d sl data/sl/schema_sl_rest_ofd/ 2> log/sl/restore_sl_rest_ofdata_data.log
>
>
>I do not understand why there are no errors in the client log file, because on the server side I see this:
>
>< [local] postgres, sl, 2020-10-22 00:52:47.823 CEST >LOG: duration: 3417759.209 ms statement: COPY grl_sk.dtm10_norge_retiled (rid, rast, filename) FROM stdin;
>< [local] postgres, sl, 2020-10-22 00:56:05.771 CEST >LOG: duration: 3615706.705 ms statement: COPY sl_sos.lsk_arome_v1 (flateid, point_uid_ref, epoch_hours, tg_l03, tg_l04, tswi_l03, tswi_l04, tg_l05, tswi_l05, pr, t2m, sfcwindl, snw, twpr) FROM stdin;
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>free(): invalid pointer
>< , , 2020-10-22 00:56:05.830 CEST >LOG: server process (PID 516912) was terminated by signal 6: Aborted
>< , , 2020-10-22 00:56:05.830 CEST >LOG: terminating any other active server processes
>< , , 2020-10-22 00:56:05.830 CEST >WARNING: terminating connection because of crash of another server process
>< , , 2020-10-22 00:56:05.830 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>< , , 2020-10-22 00:56:05.830 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.
>< , , 2020-10-22 00:56:05.830 CEST >WARNING: terminating connection because of crash of another server process
>< , , 2020-10-22 00:56:05.830 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>< , , 2020-10-22 00:56:05.830 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.
>< , , 2020-10-22 00:56:05.830 CEST >WARNING: terminating connection because of crash of another server process
>< , , 2020-10-22 00:56:05.830 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>< , , 2020-10-22 00:56:05.830 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.
>< , , 2020-10-22 00:56:05.830 CEST >WARNING: terminating connection because of crash of another server process
>< , , 2020-10-22 00:56:05.830 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>< , , 2020-10-22 00:56:05.830 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.
>< , , 2020-10-22 00:56:10.892 CEST >LOG: all server processes terminated; reinitializing
>< , , 2020-10-22 00:56:11.324 CEST >LOG: database system was interrupted; last known up at 2020-10-22 00:52:33 CEST
>< , , 2020-10-22 00:56:11.480 CEST >LOG: database system was not properly shut down; automatic recovery in progress
>< , , 2020-10-22 00:56:11.521 CEST >LOG: redo starts at B9C/EA83BBF0
>< , , 2020-10-22 00:56:35.074 CEST >LOG: redo done at BA0/FDFFEF48
>< , , 2020-10-22 00:56:51.250 CEST >LOG: database system is ready to accept connections
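>
>Since the backend died on signal 6 (glibc aborting on the bad free()), a stack trace from the core dump would show where the free() happens; a minimal sketch, assuming systemd-coredump is enabled on this CentOS 8 machine:
>
>coredumpctl list postgres    # find the crashed backend (PID 516912 in the log above)
>coredumpctl gdb 516912       # open the matching core in gdb; then "bt" prints the backtrace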
>
>
>I don't care about the last table, we don't need that data, so I ran the last command below.
>
>pg_restore --section=post-data -j 1 --format=d -d sl data/sl/schema_sl_rest_ofd/ 2> log/sl/restore_sl_rest_ofdata_post_data.log
>
>
>Then I saw this in the logs, so I had to run the above command multiple times.
>
>
>First time:
>
>< [local] postgres, sl, 2020-10-22 03:46:27.349 CEST >LOG: duration: 1746.370 ms statement: CREATE INDEX o_8_dtm10_norge_retiled_st_convexhull_idx ON grl_sk.o_8_dtm10_norge_retiled USING gist (public.st_convexhull(rast));
>free(): invalid pointer
>free(): invalid pointer
>< , , 2020-10-22 03:46:30.283 CEST >LOG: background worker "parallel worker" (PID 658814) was terminated by signal 6: Aborted
>< , , 2020-10-22 03:46:30.283 CEST >LOG: terminating any other active server processes
>< [local] postgres, sl, 2020-10-22 03:46:30.283 CEST >WARNING: terminating connection because of crash of another server process
>< [local] postgres, sl, 2020-10-22 03:46:30.283 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>< [local] postgres, sl, 2020-10-22 03:46:30.283 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.
>< , , 2020-10-22 03:46:30.283 CEST >WARNING: terminating connection because of crash of another server process
>< , , 2020-10-22 03:46:30.283 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>< , , 2020-10-22 03:46:30.283 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.
>< , , 2020-10-22 03:46:30.599 CEST >LOG: all server processes terminated; reinitializing
>< , , 2020-10-22 03:46:31.019 CEST >LOG: database system was interrupted; last known up at 2020-10-22 03:41:23 CEST
>< , , 2020-10-22 03:46:31.178 CEST >LOG: database system was not properly shut down; automatic recovery in progress
>< , , 2020-10-22 03:46:31.224 CEST >LOG: redo starts at BAA/B07178F8
>< , , 2020-10-22 03:46:37.209 CEST >LOG: redo done at BAC/7F3DFD70
>
>
>Second time:
>
>< [local] postgres, sl, 2020-10-22 08:13:08.882 CEST >LOG: duration: 417316.937 ms statement: CREATE INDEX geoidx_org_forestmap____ostl3_predictions ON org_forestmap.ostl3_predictions USING gist (geo);
>< [local] munin, template1, 2020-10-22 08:13:36.797 CEST >FATAL: role "munin" does not exist
>< [local] munin, template1, 2020-10-22 08:13:36.835 CEST >FATAL: role "munin" does not exist
>< [local] postgres, sl_help, 2020-10-22 08:13:39.106 CEST >FATAL: database "sl_help" does not exist
>< [local] postgres, sl_help, 2020-10-22 08:13:39.183 CEST >FATAL: database "sl_help" does not exist
>< [local] munin, template1, 2020-10-22 08:13:44.607 CEST >FATAL: role "munin" does not exist
>< [local] munin, template1, 2020-10-22 08:13:44.684 CEST >FATAL: role "munin" does not exist
>< [local] munin, template1, 2020-10-22 08:18:36.225 CEST >FATAL: role "munin" does not exist
>< [local] munin, template1, 2020-10-22 08:18:36.266 CEST >FATAL: role "munin" does not exist
>< [local] postgres, sl_help, 2020-10-22 08:18:38.611 CEST >FATAL: database "sl_help" does not exist
>< [local] postgres, sl_help, 2020-10-22 08:18:38.691 CEST >FATAL: database "sl_help" does not exist
>< [local] munin, template1, 2020-10-22 08:18:44.234 CEST >FATAL: role "munin" does not exist
>< [local] munin, template1, 2020-10-22 08:18:44.314 CEST >FATAL: role "munin" does not exist
>free(): invalid pointer
>free(): invalid pointer
>< , , 2020-10-22 08:19:58.932 CEST >LOG: background worker "parallel worker" (PID 835249) was terminated by signal 6: Aborted
>< , , 2020-10-22 08:19:58.932 CEST >LOG: terminating any other active server processes
>< [local] postgres, sl, 2020-10-22 08:19:58.932 CEST >WARNING: terminating connection because of crash of another server process
>< [local] postgres, sl, 2020-10-22 08:19:58.932 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>< [local] postgres, sl, 2020-10-22 08:19:58.932 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.
>< , , 2020-10-22 08:19:58.932 CEST >WARNING: terminating connection because of crash of another server process
>< , , 2020-10-22 08:19:58.932 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>< , , 2020-10-22 08:19:58.932 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.
>< 192.168.229.48 postgres, sl, 2020-10-22 08:19:58.932 CEST >WARNING: terminating connection because of crash of another server process
>< 192.168.229.48 postgres, sl, 2020-10-22 08:19:58.932 CEST >DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
>< 192.168.229.48 postgres, sl, 2020-10-22 08:19:58.932 CEST >HINT: In a moment you should be able to reconnect to the database and repeat your command.
>< 192.168.229.48 postgres, sl, 2020-10-22 08:19:58.970 CEST >FATAL: the database system is in recovery mode
>< , , 2020-10-22 08:19:59.041 CEST >LOG: all server processes terminated; reinitializing
>< , , 2020-10-22 08:19:59.476 CEST >LOG: database system was interrupted; last known up at 2020-10-22 08:11:17 CEST
>< , , 2020-10-22 08:20:01.901 CEST >LOG: database system was not properly shut down; automatic recovery in progress
>< , , 2020-10-22 08:20:02.005 CEST >LOG: redo starts at BB1/AA4309A0
>< , , 2020-10-22 08:20:03.517 CEST >LOG: invalid record length at BB1/FBF33FB0: wanted 24, got 0
>< , , 2020-10-22 08:20:03.517 CEST >LOG: redo done at BB1/FBF33F78
>< , , 2020-10-22 08:20:05.573 CEST >LOG: database system is ready to accept connections
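>
>Both post-data crashes above killed a "parallel worker" during CREATE INDEX, so one variable worth isolating (a sketch, not a known fix) is the parallel index build; pg_restore connects via libpq, which honors PGOPTIONS:
>
># retry post-data with parallel maintenance workers disabled for these sessions
>PGOPTIONS='-c max_parallel_maintenance_workers=0' pg_restore --section=post-data -j 1 --format=d -d sl data/sl/schema_sl_rest_ofd/ 2> log/sl/restore_sl_rest_ofdata_post_data.log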
>
>
>Any idea why this happens, and what can I do to fix it?
>
>
>Do you believe Postgres 13 will work better?
>
>Thanks
>
>Lars

I did a test with Postgres 13. Right after running the command "pg_restore --section=pre-data -1 --format=d -d sl data/sl/schema_sl_rest_ofd", in which the rest of the tables and schemas were created with no errors, and before running "pg_restore --section=data -j 50 --format=d -d sl data/sl/schema_sl_rest_ofd", I found errors in the server log.

(The table grl_met_wisline_v2.nora_bc25_observation_200912 had restored with no errors two hours earlier:
"LOG:  duration: 589215.382 ms  statement: COPY grl_met_wisline_v2.nora_bc25_observation_200912 (point_uid_ref, epoch_hours, tg_l03, tg_l04, tswi_l03, tswi_l04, tg_l05, tswi_l05, pr, t2m, sfcwindl, snw, twpr) FROM stdin")

In the server log I found this:

<  , , 2020-10-22 14:23:44.167 CEST >LOG:  server process (PID 1085979) was terminated by signal 6: Aborted
<  , , 2020-10-22 14:23:44.167 CEST >LOG:  terminating any other active server processes
<  , , 2020-10-22 14:23:44.167 CEST >WARNING:  terminating connection because of crash of another server process
<  , , 2020-10-22 14:23:44.167 CEST >DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
<  , , 2020-10-22 14:23:44.167 CEST >HINT:  In a moment you should be able to reconnect to the database and repeat your command.
<  , , 2020-10-22 14:23:44.167 CEST >CONTEXT:  while scanning block 393158 of relation "grl_met_wisline_v2.nora_bc25_observation_201001"
<  , , 2020-10-22 14:23:44.167 CEST >WARNING:  terminating connection because of crash of another server process
<  , , 2020-10-22 14:23:44.167 CEST >DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
<  , , 2020-10-22 14:23:44.167 CEST >HINT:  In a moment you should be able to reconnect to the database and repeat your command.
<  , , 2020-10-22 14:23:44.167 CEST >CONTEXT:  while scanning block 59409 of relation "grl_met_wisline_v2.nora_bc25_observation_200912"
<  , , 2020-10-22 14:23:44.167 CEST >WARNING:  terminating connection because of crash of another server process
<  , , 2020-10-22 14:23:44.167 CEST >DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
<  , , 2020-10-22 14:23:44.167 CEST >HINT:  In a moment you should be able to reconnect to the database and repeat your command.
<  , , 2020-10-22 14:23:44.167 CEST >CONTEXT:  while scanning block 51913 of relation "grl_met_wisline_v2.nora_bc25_observation_200910"
<  , , 2020-10-22 14:23:44.167 CEST >WARNING:  terminating connection because of crash of another server process
<  , , 2020-10-22 14:23:44.167 CEST >DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
<  , , 2020-10-22 14:23:44.167 CEST >HINT:  In a moment you should be able to reconnect to the database and repeat your command.
<  , , 2020-10-22 14:23:44.185 CEST >LOG:  all server processes terminated; reinitializing
<  , , 2020-10-22 14:23:44.616 CEST >LOG:  database system was interrupted; last known up at 2020-10-22 14:23:21 CEST
<  , , 2020-10-22 14:23:44.782 CEST >LOG:  database system was not properly shut down; automatic recovery in progress
<  , , 2020-10-22 14:23:44.823 CEST >LOG:  redo starts at 146/A2485468
<  , , 2020-10-22 14:23:46.035 CEST >LOG:  redo done at 146/A83A7F98
<  , , 2020-10-22 14:23:47.043 CEST >LOG:  database system is ready to accept connections
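
The CONTEXT lines name the exact blocks being scanned when the workers died, so those pages can at least be read back directly; a sketch using the contrib pageinspect extension (relation and block number taken from the log above; this only reads the page, it repairs nothing):

# inspect the page header of one suspect block
psql -d sl -c "CREATE EXTENSION IF NOT EXISTS pageinspect;"
psql -d sl -c "SELECT * FROM page_header(get_raw_page('grl_met_wisline_v2.nora_bc25_observation_201001', 393158));"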

Lars


