rare crash - FailedAssertion snapbuild.c Line: 580
От | Erik Rijkers |
---|---|
Тема | rare crash - FailedAssertion snapbuild.c Line: 580 |
Дата | |
Msg-id | 4e588d5a81a61c799353e1adf5a126bb@xs4all.nl обсуждение исходный текст |
Ответы |
Re: rare crash - FailedAssertion snapbuild.c Line: 580
|
Список | pgsql-hackers |
Hello, To test postgres 11, I still regularly run series of short sessions of pgbench-over-logical-replication (basically the same thing that I used last year [1] - now in a perl incarnation). Most of the time the replication is stable and finishes correctly but sometimes (rarely) I get: TRAP: FailedAssertion("!(TransactionIdPrecedesOrEquals(safeXid, snap->xmin))", File: "snapbuild.c", Line: 580) This will probably be difficult to reproduce and to act upon but I wanted to report it anyway as in the course of the last few months I have seen it several times, on several machines. Always rarely, always postgres 11 (I did not try other versions). Erik Rijkers [1] https://www.postgresql.org/message-id/3897361c7010c4ac03f358173adbcd60%40xs4all.nl source/version: bf2d0462cd73 / REL_11_STABLE (compiled 20180828_1629) 1 primary, 4 replicas (on 1 host). Basically: pgbench --port=6515 --quiet --initialize --scale=5 postgres then: pgbench -M prepared -c 93 -j 8 -T 10 -P 2 -p 6515 postgres then: wait for log-repl sync. (I also added extra data type columns to the basic pgbench tables to test all the different data types) Below is a grep for the assert message, with 15 surrounding lines from the latest occurrence. $ p=REL_11_STABLE; cd /var/data1/pg_stuff/tmp/cascade/$p ; grep -A 15 -B 15 -Ei 'trap|assert' 65*/logfile.65* | less 6516JTq_E8/logfile.6516-2018-08-29 14:28:13.747 CEST [139409] LOG: logical decoding found consistent point at 0/89E038E0 6516JTq_E8/logfile.6516-2018-08-29 14:28:13.747 CEST [139409] DETAIL: There are no running transactions. 6516JTq_E8/logfile.6516-2018-08-29 14:28:14.522 CEST [139539] LOG: received replication command: CREATE_REPLICATION_SLOT "sub2_6517_6517_18834_sync_18804" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT 6516JTq_E8/logfile.6516-2018-08-29 14:28:15.681 CEST [139539] LOG: logical decoding found initial starting point at 0/8AF5D590 6516JTq_E8/logfile.6516-2018-08-29 14:28:15.681 CEST [139539] DETAIL: Waiting for transactions (approximately 1) older than 886297 to end. 6516JTq_E8/logfile.6516-2018-08-29 14:28:15.720 CEST [139539] LOG: logical decoding found consistent point at 0/8B018310 6516JTq_E8/logfile.6516-2018-08-29 14:28:15.720 CEST [139539] DETAIL: There are no running transactions. 6516JTq_E8/logfile.6516-2018-08-29 14:28:15.860 CEST [139407] LOG: received replication command: START_REPLICATION SLOT "sub2_6517_6517_18834_sync_18793" LOGICAL 0/89E03918 (proto_version '1', publication_names '"pub1_6516"') 6516JTq_E8/logfile.6516-2018-08-29 14:28:15.861 CEST [139407] LOG: starting logical decoding for slot "sub2_6517_6517_18834_sync_18793" 6516JTq_E8/logfile.6516-2018-08-29 14:28:15.861 CEST [139407] DETAIL: Streaming transactions committing after 0/89E03918, reading WAL from 0/841EE960. 6516JTq_E8/logfile.6516-2018-08-29 14:28:16.064 CEST [139407] LOG: logical decoding found consistent point at 0/841EF3E0 6516JTq_E8/logfile.6516-2018-08-29 14:28:16.064 CEST [139407] DETAIL: Logical decoding will begin using saved snapshot. 6516JTq_E8/logfile.6516-2018-08-29 14:28:18.953 CEST [139541] LOG: received replication command: CREATE_REPLICATION_SLOT "sub2_6517_6517_18834_sync_18814" TEMPORARY LOGICAL pgoutput USE_SNAPSHOT 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.002 CEST [139541] LOG: logical decoding found consistent point at 0/8C6BA408 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.002 CEST [139541] DETAIL: There are no running transactions. 6516JTq_E8/logfile.6516:TRAP: FailedAssertion("!(TransactionIdPrecedesOrEquals(safeXid, snap->xmin))", File: "snapbuild.c", Line: 580) 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [130398] LOG: server process (PID 139541) was terminated by signal 6: Aborted 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [130398] DETAIL: Failed process was running: BEGIN READ ONLY ISOLATION LEVEL REPEATABLE READ 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [130398] LOG: terminating any other active server processes 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [139405] WARNING: terminating connection because of crash of another server process 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [139405] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [139405] HINT: In a moment you should be able to reconnect to the database and repeat your command. 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [139405] CONTEXT: slot "sub2_6517_6517", output plugin "pgoutput", in the change callback, associated LSN 0/8B11CF00 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [130411] WARNING: terminating connection because of crash of another server process 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [130411] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [130411] HINT: In a moment you should be able to reconnect to the database and repeat your command. 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [130405] WARNING: terminating connection because of crash of another server process 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [130405] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.112 CEST [130405] HINT: In a moment you should be able to reconnect to the database and repeat your command. 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.116 CEST [130398] LOG: all server processes terminated; reinitializing 6516JTq_E8/logfile.6516-2018-08-29 14:28:20.515 CEST [139548] LOG: database system was interrupted; last known up at 2018-08-29 14:26:14 CEST
В списке pgsql-hackers по дате отправления:
Предыдущее
От: Tom LaneДата:
Сообщение: Re: 10.5 but not 10.4: backend startup during reindex system: could not read block 0 in file "base/16400/..": read only 0 of 8192 bytes