BUG #16832: Interrupted system call when working with large data tables

From PG Bug reporting form
Subject BUG #16832: Interrupted system call when working with large data tables
Date
Msg-id 16832-943d33fd58eb26e4@postgresql.org
Responses Re: BUG #16832: Interrupted system call when working with large data tables  (Andres Freund <andres@anarazel.de>)
List pgsql-bugs
The following bug has been logged on the website:

Bug reference:      16832
Logged by:          Kimon Krenz
Email address:      kimon.krenz@part9.com
PostgreSQL version: 13.1
Operating system:   macOS Big Sur v. 11.0.1
Description:

I recently updated macOS to Big Sur and, at the same time, PostgreSQL to
13.1.
Since these updates PostgreSQL keeps throwing 'Interrupted system call'
errors in three situations, all of which produce similar 'could not open
file' errors, e.g. for files under pg_wal/ (see 1., 2. and 3. below).
The problem might be linked to BUG #16827: macOS interrupted syscall leads
to a crash.

Unfortunately, the 'Interrupted system call' error occurs frequently but
inconsistently, even when using the same table. The only common denominator
is that the error is more frequent when large tables, i.e. 4GB or more, are
involved. I have spent some time trying to build a reproducible case without
success, mainly because the same table does not always lead to an error.

1. When running VACUUM FULL on the entire database or on single large tables:

terminal command:

udl=# VACUUM (FULL,VERBOSE);
INFO:  vacuuming "itn_2007.roadlink_ms"
INFO:  "roadlink_ms": found 0 removable, 4005223 nonremovable row versions
in 279046 pages
DETAIL:  0 dead row versions cannot be removed yet.
CPU: user: 6.03 s, system: 11.61 s, elapsed: 36.56 s.
WARNING:  terminating connection because of crash of another server
process
DETAIL:  The postmaster has commanded this server process to roll back the
current transaction and exit, because another server process exited
abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and
repeat your command.
server closed the connection unexpectedly
    This probably means the server terminated abnormally
    before or while processing the request.
The connection to the server was lost. Attempting reset: Failed.
!?> 

postgres.log:
2021-01-21 21:14:45.470 GMT [58691] PANIC:  XX000: could not open file
"pg_wal/00000001000000AC00000045": Interrupted system call
2021-01-21 21:14:45.470 GMT [58691] LOCATION:  XLogFileInit, xlog.c:3277
2021-01-21 21:14:45.471 GMT [31083] LOG:  00000: WAL writer process (PID
58691) was terminated by signal 6: Abort trap: 6
2021-01-21 21:14:45.471 GMT [31083] LOCATION:  LogChildExit,
postmaster.c:3753
2021-01-21 21:14:45.471 GMT [31083] LOG:  00000: terminating any other
active server processes
2021-01-21 21:14:45.471 GMT [31083] LOCATION:  HandleChildCrash,
postmaster.c:3474
2021-01-21 21:14:45.471 GMT [58993] WARNING:  57P02: terminating connection
because of crash of another server process
2021-01-21 21:14:45.471 GMT [58993] DETAIL:  The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2021-01-21 21:14:45.471 GMT [58993] HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2021-01-21 21:14:45.471 GMT [58993] LOCATION:  quickdie, postgres.c:2802
2021-01-21 21:14:45.471 GMT [59697] WARNING:  57P02: terminating connection
because of crash of another server process
2021-01-21 21:14:45.471 GMT [59697] DETAIL:  The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.

OR

2021-01-21 21:51:51.093 GMT [60106] PANIC:  XX000: could not open file
"pg_wal/00000001000000B700000079": Interrupted system call
2021-01-21 21:51:51.093 GMT [60106] LOCATION:  XLogFileInit, xlog.c:3277
2021-01-21 21:51:51.094 GMT [31083] LOG:  00000: WAL writer process (PID
60106) was terminated by signal 6: Abort trap: 6
2021-01-21 21:51:51.094 GMT [31083] LOCATION:  LogChildExit,
postmaster.c:3753
2021-01-21 21:51:51.094 GMT [31083] LOG:  00000: terminating any other
active server processes
2021-01-21 21:51:51.094 GMT [31083] LOCATION:  HandleChildCrash,
postmaster.c:3474
2021-01-21 21:51:51.095 GMT [60228] WARNING:  57P02: terminating connection
because of crash of another server process
2021-01-21 21:51:51.095 GMT [60228] DETAIL:  The postmaster has commanded
this server process to roll back the current transaction and exit, because
another server process exited abnormally and possibly corrupted shared
memory.
2021-01-21 21:51:51.095 GMT [60228] HINT:  In a moment you should be able to
reconnect to the database and repeat your command.
2021-01-21 21:51:51.095 GMT [60228] LOCATION:  quickdie, postgres.c:2802

2. When performing a simple left join (table1 4.5GB, table2 900MB):

udl=# drop table if exists itn_2019.roadlink_msn_2;
    create table itn_2019.roadlink_msn_2 as
    select *
    from 
    itn_2019.roadlink_ms left join itn_2019.road_names
    ON fid = fid2;


2021-01-21 21:01:51.545 GMT [58711] ERROR:  XX000: could not open temporary
file "base/pgsql_tmp/pgsql_tmp58711.0.sharedfileset/i2of128.p0.0":
Interrupted system call
2021-01-21 21:01:51.545 GMT [58711] LOCATION:  PathNameOpenTemporaryFile,
fd.c:1764
2021-01-21 21:01:51.545 GMT [58711] STATEMENT:  drop table if exists
itn_2019.roadlink_msn_2;
    create table itn_2019.roadlink_msn_2 as
    select *
    from 
    itn_2019.roadlink_ms left join itn_2019.road_names
    ON fid = fid2;
2021-01-21 21:01:51.546 GMT [59276] ERROR:  XX000: could not open temporary
file "base/pgsql_tmp/pgsql_tmp58711.0.sharedfileset/i1of128.p2.0":
Interrupted system call
2021-01-21 21:01:51.546 GMT [59276] LOCATION:  PathNameOpenTemporaryFile,
fd.c:1764
2021-01-21 21:01:51.546 GMT [59276] STATEMENT:  drop table if exists
itn_2019.roadlink_msn_2;
    create table itn_2019.roadlink_msn_2 as
    select *
    from 
    itn_2019.roadlink_ms left join itn_2019.road_names
    ON fid = fid2;
2021-01-21 21:01:51.548 GMT [31083] LOG:  00000: background worker "parallel
worker" (PID 59276) exited with exit code 1
2021-01-21 21:01:51.548 GMT [31083] LOCATION:  LogChildExit,
postmaster.c:3731

3. When restoring the entire database (150GB) using pg_restore. I ended up
restoring every table manually from the dump tar file, because the same
table sometimes restored without a problem and sometimes first threw an
'Interrupted system call' error, then restored without problems on the
second or third attempt.
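
For context, 'Interrupted system call' is the message text for errno EINTR,
and in all three cases above a retry of the same operation eventually
succeeded. The following is only a minimal sketch of that retry idea at the
open() level. The helper name open_retry and the max_retries parameter are
hypothetical and are not taken from the PostgreSQL source; whether retrying
is the right fix for the server itself is not something this report
establishes.

```c
#include <errno.h>
#include <fcntl.h>
#include <sys/types.h>

/*
 * Hypothetical helper (an assumption for illustration, not PostgreSQL code):
 * call open() and, if it fails with EINTR, try again up to max_retries
 * additional times. Any other failure is returned to the caller right away
 * with errno left as open() set it.
 */
int
open_retry(const char *path, int flags, mode_t mode, int max_retries)
{
    int attempts = 0;

    for (;;)
    {
        int fd = open(path, flags, mode);

        /* Success, a non-EINTR error, or retry budget exhausted: stop. */
        if (fd >= 0 || errno != EINTR || attempts >= max_retries)
            return fd;

        attempts++;
    }
}
```

A caller would use it like open(), for example
open_retry("somefile", O_RDONLY, 0, 5), and still has to check for a
negative return value and inspect errno.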

Best,
Kimon

