Thread: BUG #16832: Interrupted system call when working with large data tables
BUG #16832: Interrupted system call when working with large data tables
From: PG Bug reporting form
Date: 2021-01-21 22:16:55 +0000
The following bug has been logged on the website:

Bug reference:      16832
Logged by:          Kimon Krenz
Email address:      kimon.krenz@part9.com
PostgreSQL version: 13.1
Operating system:   macOS Big Sur v. 11.0.1
Description:

I've recently updated macOS to Big Sur, and simultaneously to PostgreSQL
13.1.
Since these updates PostgreSQL keeps throwing 'Interrupted system call'
errors in three situations, which produce similar 'could not open file
pg_wal/' errors (see 1., 2. and 3. below).
The problem might be linked to BUG #16827: macOS interrupted syscall leads
to a crash.

Unfortunately, the 'Interrupted system call' error occurs frequently but
inconsistently when using the same table. The only common denominator is
that the error is more frequent when large tables, i.e. over 4GB, are
involved. I have spent some time trying to build a reproducible case,
without success, mainly because the same table does not always lead to an
error.

1. When running VACUUM FULL on the entire database or on single large tables:

terminal command:

    udl=# VACUUM (FULL,VERBOSE);
    INFO: vacuuming "itn_2007.roadlink_ms"
    INFO: "roadlink_ms": found 0 removable, 4005223 nonremovable row versions in 279046 pages
    DETAIL: 0 dead row versions cannot be removed yet.
    CPU: user: 6.03 s, system: 11.61 s, elapsed: 36.56 s.
    WARNING: terminating connection because of crash of another server process
    DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    HINT: In a moment you should be able to reconnect to the database and repeat your command.
    server closed the connection unexpectedly
        This probably means the server terminated abnormally
        before or while processing the request.
    The connection to the server was lost. Attempting reset: Failed.
    !?>

postgres.log:

    2021-01-21 21:14:45.470 GMT [58691] PANIC: XX000: could not open file "pg_wal/00000001000000AC00000045": Interrupted system call
    2021-01-21 21:14:45.470 GMT [58691] LOCATION: XLogFileInit, xlog.c:3277
    2021-01-21 21:14:45.471 GMT [31083] LOG: 00000: WAL writer process (PID 58691) was terminated by signal 6: Abort trap: 6
    2021-01-21 21:14:45.471 GMT [31083] LOCATION: LogChildExit, postmaster.c:3753
    2021-01-21 21:14:45.471 GMT [31083] LOG: 00000: terminating any other active server processes
    2021-01-21 21:14:45.471 GMT [31083] LOCATION: HandleChildCrash, postmaster.c:3474
    2021-01-21 21:14:45.471 GMT [58993] WARNING: 57P02: terminating connection because of crash of another server process
    2021-01-21 21:14:45.471 GMT [58993] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    2021-01-21 21:14:45.471 GMT [58993] HINT: In a moment you should be able to reconnect to the database and repeat your command.
    2021-01-21 21:14:45.471 GMT [58993] LOCATION: quickdie, postgres.c:2802
    2021-01-21 21:14:45.471 GMT [59697] WARNING: 57P02: terminating connection because of crash of another server process
    2021-01-21 21:14:45.471 GMT [59697] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.

OR

    2021-01-21 21:51:51.093 GMT [60106] PANIC: XX000: could not open file "pg_wal/00000001000000B700000079": Interrupted system call
    2021-01-21 21:51:51.093 GMT [60106] LOCATION: XLogFileInit, xlog.c:3277
    2021-01-21 21:51:51.094 GMT [31083] LOG: 00000: WAL writer process (PID 60106) was terminated by signal 6: Abort trap: 6
    2021-01-21 21:51:51.094 GMT [31083] LOCATION: LogChildExit, postmaster.c:3753
    2021-01-21 21:51:51.094 GMT [31083] LOG: 00000: terminating any other active server processes
    2021-01-21 21:51:51.094 GMT [31083] LOCATION: HandleChildCrash, postmaster.c:3474
    2021-01-21 21:51:51.095 GMT [60228] WARNING: 57P02: terminating connection because of crash of another server process
    2021-01-21 21:51:51.095 GMT [60228] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
    2021-01-21 21:51:51.095 GMT [60228] HINT: In a moment you should be able to reconnect to the database and repeat your command.
    2021-01-21 21:51:51.095 GMT [60228] LOCATION: quickdie, postgres.c:2802

2. When performing a simple left join (table1 4.5GB, table2 900MB):

    udl=# drop table if exists itn_2019.roadlink_msn_2;
    create table itn_2019.roadlink_msn_2 as select * from itn_2019.roadlink_ms left join itn_2019.road_names ON fid = fid2;

    2021-01-21 21:01:51.545 GMT [58711] ERROR: XX000: could not open temporary file "base/pgsql_tmp/pgsql_tmp58711.0.sharedfileset/i2of128.p0.0": Interrupted system call
    2021-01-21 21:01:51.545 GMT [58711] LOCATION: PathNameOpenTemporaryFile, fd.c:1764
    2021-01-21 21:01:51.545 GMT [58711] STATEMENT: drop table if exists itn_2019.roadlink_msn_2; create table itn_2019.roadlink_msn_2 as select * from itn_2019.roadlink_ms left join itn_2019.road_names ON fid = fid2;
    2021-01-21 21:01:51.546 GMT [59276] ERROR: XX000: could not open temporary file "base/pgsql_tmp/pgsql_tmp58711.0.sharedfileset/i1of128.p2.0": Interrupted system call
    2021-01-21 21:01:51.546 GMT [59276] LOCATION: PathNameOpenTemporaryFile, fd.c:1764
    2021-01-21 21:01:51.546 GMT [59276] STATEMENT: drop table if exists itn_2019.roadlink_msn_2; create table itn_2019.roadlink_msn_2 as select * from itn_2019.roadlink_ms left join itn_2019.road_names ON fid = fid2;
    2021-01-21 21:01:51.548 GMT [31083] LOG: 00000: background worker "parallel worker" (PID 59276) exited with exit code 1
    2021-01-21 21:01:51.548 GMT [31083] LOCATION: LogChildExit, postmaster.c:3731

3. When restoring the entire database (150GB) using pg_restore. I ended up
restoring every table manually from the dump tar file, because the same
table sometimes restored without a problem and sometimes first threw an
'Interrupted system call' error, then restored without problems on the
second or third attempt.

Best,
Kimon
Hi,

On 2021-01-21 22:16:55 +0000, PG Bug reporting form wrote:
> I've recently updated macOS to Big Sur, and simultaneously to PostgreSQL
> 13.1.
> Since these updates PostgreSQL keeps throwing 'Interrupted system call'
> errors in three instances, which produce similar 'could not open file
> pg_wal/' errors (see below 1.,2. and 3.).
> The problem might be linked to BUG #16827: macOS interrupted syscall leads
> to a crash.

There is additional information in another bug report at
https://postgr.es/m/16827-7606aeb21d38c228%40postgresql.org

I don't really know what to do here short term - adding EINTR handling to
syscalls that traditionally never had returned EINTR (which used to only
happen for "blocking" system calls) will be a fair amount of work.

I'll also respond in the other thread, CCing you, as there's more
information there.

Greetings,

Andres Freund
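
For readers unfamiliar with what "EINTR handling" around a system call looks like, here is a minimal sketch in C. It is not PostgreSQL code and not a proposed patch: `open_retry_eintr` and `OPEN_MAX_EINTR_RETRIES` are hypothetical names invented for this illustration. It only shows the general pattern of retrying `open()` when it fails with `errno == EINTR`, which is the error reported in the logs above.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical cap so that a persistently interrupted call cannot spin forever. */
#define OPEN_MAX_EINTR_RETRIES 100

/*
 * open_retry_eintr: hypothetical helper, not part of PostgreSQL.
 *
 * Calls open(2) and repeats the call while it fails with EINTR, up to
 * OPEN_MAX_EINTR_RETRIES attempts in total.  Returns a file descriptor on
 * success, or -1 with errno left as set by the last open() call.
 */
static int
open_retry_eintr(const char *path, int flags, mode_t mode)
{
    int fd;
    int attempts = 0;

    assert(path != NULL);

    do
    {
        fd = open(path, flags, mode);
    } while (fd < 0 && errno == EINTR && ++attempts < OPEN_MAX_EINTR_RETRIES);

    return fd;
}
```

A caller would use `open_retry_eintr(path, O_RDWR, 0)` wherever it previously called `open()` directly. The amount of work mentioned in the reply comes from the fact that every affected call site would need this kind of treatment, not only `open()`; the retry cap here is an arbitrary choice for the sketch.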