Discussion: PANIC: could not flush dirty data: Operation not permitted power8, Redhat Centos
PANIC: could not flush dirty data: Operation not permitted power8, Redhat Centos
From
reiner peterke
Date:
Hi All,
We build Postgres on Power and x86. With the latest Postgres 11 release (11.2) we get an error on POWER8 ppc64le (Red Hat and CentOS).
No error on SUSE on POWER8.
No error on x86_64 (RH, CentOS and SUSE).
From the log file:
2019-04-09 12:30:10 UTC pid:203 xid:0 ip: LOG: listening on IPv4 address "0.0.0.0", port 5432
2019-04-09 12:30:10 UTC pid:203 xid:0 ip: LOG: listening on IPv6 address "::", port 5432
2019-04-09 12:30:10 UTC pid:203 xid:0 ip: LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
2019-04-09 12:30:10 UTC pid:204 xid:0 ip: LOG: database system was shut down at 2019-04-09 12:27:09 UTC
2019-04-09 12:30:10 UTC pid:203 xid:0 ip: LOG: database system is ready to accept connections
2019-04-09 12:31:46 UTC pid:203 xid:0 ip: LOG: received SIGHUP, reloading configuration files
2019-04-09 12:35:10 UTC pid:205 xid:0 ip: PANIC: could not flush dirty data: Operation not permitted
2019-04-09 12:35:10 UTC pid:203 xid:0 ip: LOG: checkpointer process (PID 205) was terminated by signal 6: Aborted
2019-04-09 12:35:10 UTC pid:203 xid:0 ip: LOG: terminating any other active server processes
2019-04-09 12:35:10 UTC pid:208 xid:0 ip: WARNING: terminating connection because of crash of another server process
2019-04-09 12:35:10 UTC pid:208 xid:0 ip: DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
2019-04-09 12:35:10 UTC pid:208 xid:0 ip: HINT: In a moment you should be able to reconnect to the database and repeat your command.
2019-04-09 12:35:10 UTC pid:203 xid:0 ip: LOG: all server processes terminated; reinitializing
2019-04-09 12:35:10 UTC pid:224 xid:0 ip: LOG: database system was interrupted; last known up at 2019-04-09 12:30:10 UTC
2019-04-09 12:35:10 UTC pid:224 xid:0 ip: PANIC: could not flush dirty data: Operation not permitted
2019-04-09 12:35:10 UTC pid:203 xid:0 ip: LOG: startup process (PID 224) was terminated by signal 6: Aborted
2019-04-09 12:35:10 UTC pid:203 xid:0 ip: LOG: aborting startup due to startup process failure
2019-04-09 12:35:10 UTC pid:203 xid:0 ip: LOG: database system is shut down
From pg_config:
BINDIR = /usr/local/postgres/11/bin
DOCDIR = /usr/local/postgres/11/share/doc
HTMLDIR = /usr/local/postgres/11/share/doc
INCLUDEDIR = /usr/local/postgres/11/include
PKGINCLUDEDIR = /usr/local/postgres/11/include
INCLUDEDIR-SERVER = /usr/local/postgres/11/include/server
LIBDIR = /usr/local/postgres/11/lib
PKGLIBDIR = /usr/local/postgres/11/lib
LOCALEDIR = /usr/local/postgres/11/share/locale
MANDIR = /usr/local/postgres/11/share/man
SHAREDIR = /usr/local/postgres/11/share
SYSCONFDIR = /usr/local/postgres/etc
PGXS = /usr/local/postgres/11/lib/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--with-tclconfig=/usr/lib64' '--with-perl' '--with-python' '--with-tcl' '--with-openssl' '--with-pam' '--with-gssapi' '--enable-nls' '--with-libxml' '--with-libxslt' '--with-ldap' '--prefix=/usr/local/postgres/11' 'CFLAGS=-O3 -g -pipe -Wall -D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -m64 -mcpu=power8 -mtune=power8 -DLINUX_OOM_SCORE_ADJ=0' '--with-libs=/usr/lib' '--with-includes=/usr/include' '--with-uuid=e2fs' '--sysconfdir=/usr/local/postgres/etc' '--with-llvm' 'PKG_CONFIG_PATH=:/usr/lib64/pkgconfig:/usr/share/pkgconfig'
CC = gcc
CPPFLAGS = -D_GNU_SOURCE -I/usr/include/libxml2 -I/usr/include
CFLAGS = -Wall -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement -Wendif-labels -Wmissing-format-attribute -Wformat-security -fno-strict-aliasing -fwrapv -fexcess-precision=standard -O3 -g -pipe -Wall -D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -m64 -mcpu=power8 -mtune=power8 -DLINUX_OOM_SCORE_ADJ=0
CFLAGS_SL = -fPIC
LDFLAGS = -L/usr/local/lib -L/usr/lib -Wl,--as-needed -Wl,-rpath,'/usr/local/postgres/11/lib',--enable-new-dtags
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgcommon -lpgport -lpthread -lxslt -lxml2 -lpam -lssl -lcrypto -lgssapi_krb5 -lz -lreadline -lrt -lcrypt -ldl -lm
VERSION = PostgreSQL 11.2
I get the feeling this is related to the fsync() issue.
Why is it happening on Power with RH and CentOS, but not on the other platforms?
Let me know if I need to provide any more information.
Reiner
Re: PANIC: could not flush dirty data: Operation not permitted power8, Redhat Centos
From
Andres Freund
Date:
Hi,
On 2019-04-12 20:04:00 +0200, reiner peterke wrote:
> We build Postgres on Power and x86 With the latest Postgres 11 release (11.2) we get error on
> power8 ppc64le (Redhat and CentOS). No error on SUSE on power8
> 2019-04-09 12:30:10 UTC pid:203 xid:0 ip: LOG: listening on IPv4 address "0.0.0.0", port 5432
> 2019-04-09 12:30:10 UTC pid:203 xid:0 ip: LOG: listening on IPv6 address "::", port 5432
> 2019-04-09 12:30:10 UTC pid:203 xid:0 ip: LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
> 2019-04-09 12:30:10 UTC pid:204 xid:0 ip: LOG: database system was shut down at 2019-04-09 12:27:09 UTC
> 2019-04-09 12:30:10 UTC pid:203 xid:0 ip: LOG: database system is ready to accept connections
> 2019-04-09 12:31:46 UTC pid:203 xid:0 ip: LOG: received SIGHUP, reloading configuration files
> 2019-04-09 12:35:10 UTC pid:205 xid:0 ip: PANIC: could not flush dirty data: Operation not permitted
> 2019-04-09 12:35:10 UTC pid:203 xid:0 ip: LOG: checkpointer process (PID 205) was terminated by signal 6: Aborted
Any chance you can strace this? Because I don't understand how you'd get
a permission error here.
> I get the feeling this is related to the fsync() issue.
> why is it happening on Power RH and CentOS, but not on the other platforms?
Yea, the PANIC is due to various OSs, including Linux, basically feeling free to discard any dirty data after any integrity-related calls fail (we could narrow it down, but it's hard, given the variability between versions). That is, if they signal such issues at all :(
Greetings,
Andres Freund
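A minimal sketch of the promotion Andres describes, in C since that is what PostgreSQL is written in. It assumes the data_sync_retry setting (off by default) that came with the PANIC-on-fsync-failure change; the enum and function names are hypothetical simplifications, not PostgreSQL's source. With retry off, a flush failure that would otherwise be logged as a WARNING is escalated to PANIC, so crash recovery replays WAL instead of trusting dirty pages the kernel may already have thrown away.

```c
#include <stdbool.h>

/* Hypothetical log levels, ordered by severity (not PostgreSQL's values). */
typedef enum { LVL_WARNING, LVL_ERROR, LVL_PANIC } log_level;

/*
 * Sketch of the promotion: if the server may not retry after a failed
 * flush/fsync (data_sync_retry = off, assumed to be the default), any
 * data-integrity failure becomes PANIC. Recovery then rebuilds state from
 * WAL rather than relying on page-cache contents the kernel may have dropped.
 */
static log_level sketch_data_sync_elevel(log_level requested, bool data_sync_retry)
{
    return data_sync_retry ? requested : LVL_PANIC;
}
```

Under that assumption, a request to log "could not flush dirty data" as a WARNING comes out as a PANIC with the default setting, which is consistent with the checkpointer aborting with signal 6 in the report.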
Andres Freund <andres@anarazel.de> writes:
> On 2019-04-12 20:04:00 +0200, reiner peterke wrote:
>> We build Postgres on Power and x86 With the latest Postgres 11 release (11.2) we get error on
>> power8 ppc64le (Redhat and CentOS). No error on SUSE on power8
> Any chance you can strace this? Because I don't understand how you'd get
> a permission error here.
What kind of filesystem are the database files on?
regards, tom lane
Re: PANIC: could not flush dirty data: Operation not permitted power8, Redhat Centos
From
Thomas Munro
Date:
On Sat, Apr 13, 2019 at 7:23 AM Andres Freund <andres@anarazel.de> wrote:
> On 2019-04-12 20:04:00 +0200, reiner peterke wrote:
> > We build Postgres on Power and x86 With the latest Postgres 11 release (11.2) we get error on
> > power8 ppc64le (Redhat and CentOS). No error on SUSE on power8
Huh, I wonder what is different. I don't see this on EDB's CentOS 7.1 POWER8 system with an XFS filesystem. I ran it under strace -f and saw this:
[pid 51614] sync_file_range2(0x19, 0x2, 0x8000, 0x2000, 0x2, 0x8) = 0
> > 2019-04-09 12:30:10 UTC pid:203 xid:0 ip: LOG: listening on IPv4 address "0.0.0.0", port 5432
> > 2019-04-09 12:30:10 UTC pid:203 xid:0 ip: LOG: listening on IPv6 address "::", port 5432
> > 2019-04-09 12:30:10 UTC pid:203 xid:0 ip: LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
> > 2019-04-09 12:30:10 UTC pid:204 xid:0 ip: LOG: database system was shut down at 2019-04-09 12:27:09 UTC
> > 2019-04-09 12:30:10 UTC pid:203 xid:0 ip: LOG: database system is ready to accept connections
> > 2019-04-09 12:31:46 UTC pid:203 xid:0 ip: LOG: received SIGHUP, reloading configuration files
> > 2019-04-09 12:35:10 UTC pid:205 xid:0 ip: PANIC: could not flush dirty data: Operation not permitted
> > 2019-04-09 12:35:10 UTC pid:203 xid:0 ip: LOG: checkpointer process (PID 205) was terminated by signal 6: Aborted
>
> Any chance you can strace this? Because I don't understand how you'd get
> a permission error here.
Me neither. I hacked my tree so that it would use the msync() version instead of the sync_file_range() version, but that worked too.
--
Thomas Munro
https://enterprisedb.com
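In the strace line above, sync_file_range2 takes (fd, flags, offset, nbytes), so 0x2 is SYNC_FILE_RANGE_WRITE and 0x8000/0x2000 ask for writeback of one 8 kB block at offset 32 kB. To compare a container against a VM or bare host, a stand-alone probe along these lines could be compiled and run under strace -f. This is a sketch assuming Linux and glibc; the function name is hypothetical.

```c
#define _GNU_SOURCE          /* sync_file_range() is a GNU extension */
#include <errno.h>
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/*
 * Write one 8 kB block to fd, then ask the kernel to start writeback for it
 * with the non-waiting SYNC_FILE_RANGE_WRITE flag (the 0x2 in the trace).
 * Returns 0 on success, otherwise an errno value: EPERM would suggest a
 * syscall filter, ENOSYS a kernel or emulation layer without the call.
 */
static int probe_sync_file_range(int fd)
{
    char block[8192];
    memset(block, 'x', sizeof block);

    ssize_t n = write(fd, block, sizeof block);
    if (n < 0)
        return errno;
    if ((size_t) n != sizeof block)
        return EIO;          /* short write: treat as an I/O problem here */

    if (sync_file_range(fd, 0, sizeof block, SYNC_FILE_RANGE_WRITE) != 0)
        return errno;
    return 0;
}
```

Called from a small main() on a file in the same filesystem as the data directory, the trace should show which syscall glibc actually issued (sync_file_range2 on ppc64le, sync_file_range on x86_64) and whether it came back with EPERM.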
Re: PANIC: could not flush dirty data: Operation not permitted power8, Redhat Centos
From
zedaardv@gmail.com
Date:
sent by smoke signals at great danger to myself.
> On 12 Apr 2019, at 23:16, Thomas Munro <thomas.munro@gmail.com> wrote:
>
>> On Sat, Apr 13, 2019 at 7:23 AM Andres Freund <andres@anarazel.de> wrote:
>>> On 2019-04-12 20:04:00 +0200, reiner peterke wrote:
>>> We build Postgres on Power and x86 With the latest Postgres 11 release (11.2) we get error on
>>> power8 ppc64le (Redhat and CentOS). No error on SUSE on power8
>
> Huh, I wonder what is different. I don't see this on EDB's CentOS
> 7.1 POWER8 system with an XFS filesystem. I ran it under strace -f
> and saw this:
>
> [pid 51614] sync_file_range2(0x19, 0x2, 0x8000, 0x2000, 0x2, 0x8) = 0
>
>>> 2019-04-09 12:30:10 UTC pid:203 xid:0 ip: LOG: listening on IPv4 address "0.0.0.0", port 5432
>>> 2019-04-09 12:30:10 UTC pid:203 xid:0 ip: LOG: listening on IPv6 address "::", port 5432
>>> 2019-04-09 12:30:10 UTC pid:203 xid:0 ip: LOG: listening on Unix socket "/tmp/.s.PGSQL.5432"
>>> 2019-04-09 12:30:10 UTC pid:204 xid:0 ip: LOG: database system was shut down at 2019-04-09 12:27:09 UTC
>>> 2019-04-09 12:30:10 UTC pid:203 xid:0 ip: LOG: database system is ready to accept connections
>>> 2019-04-09 12:31:46 UTC pid:203 xid:0 ip: LOG: received SIGHUP, reloading configuration files
>>> 2019-04-09 12:35:10 UTC pid:205 xid:0 ip: PANIC: could not flush dirty data: Operation not permitted
>>> 2019-04-09 12:35:10 UTC pid:203 xid:0 ip: LOG: checkpointer process (PID 205) was terminated by signal 6: Aborted
>>
>> Any chance you can strace this? Because I don't understand how you'd get
>> a permission error here.
>
> Me neither. I hacked my tree so that it would use the msync() version
> instead of the sync_file_range() version but that worked too.
>
> --
> Thomas Munro
> https://enterprisedb.com
I forgot to mention that this is happening in a Docker container.
I want to test it on a VM to see if it is container related.
I am sick at the moment, so I'm unable to run the test right now.
Reiner
Re: PANIC: could not flush dirty data: Operation not permitted power8, Redhat Centos
From
Justin Pryzby
Date:
On Fri, Apr 12, 2019 at 08:04:00PM +0200, reiner peterke wrote:
> We build Postgres on Power and x86 With the latest Postgres 11 release (11.2) we get error on
> power8 ppc64le (Redhat and CentOS). No error on SUSE on power8
>
> No error on x86_64 (RH, Centos and SUSE)
So there's an error on POWER8 with RH but not SUSE. What kernel versions are used on each of the successful and unsuccessful systems?
Justin
Re: PANIC: could not flush dirty data: Operation not permitted power8, Redhat Centos
From
Thomas Munro
Date:
On Mon, Apr 15, 2019 at 7:57 PM <zedaardv@gmail.com> wrote:
> I forgot to mention that this is happening in a docker container.
Huh, so there may be some configuration of Linux container that can fail here with EPERM, even though that error does not appear in the man page, and doesn't make much intuitive sense. Would be good to figure out how that happens.
If we could somehow confirm* that sync_file_range() with the non-waiting flags we are using is non-destructive of error state, as Andres speculated (that is, it cannot eat the only error report we're ever going to get to tell us that buffered dirty data may have been dropped), then I suppose we could just remove the data_sync_elevel() promotion here. As with the WSL case (before the PANIC commit and the subsequent don't-repeat-the-warning-forever patch), a user of this posited EPERM-generating container configuration would then get repeated warnings in the log forever (as they presumably did before). Repeated WARNING messages are probably OK here, I think...
I mean, if, say, someone complains that FlubOS's Linux emulation fails here with EIEIO, I'd say they should put up with the warnings and complain over on the flub-hackers list, or whatever, and I'd say the same for containers that generate EPERM: either the man page or the container technology needs work. But... I still think we should try to avoid making decisions based on knowledge of kernel implementation details, if it can be avoided. I'd probably rather treat EPERM explicitly differently (and eventually EIEIO too, if a report comes in) than drop the current paranoid coding completely.
*I'm not looking at it myself. A sync_file_range() implementation is on my list of potential FreeBSD projects for a rainy day, so I don't want to study anything but the man page, even if it's wrong.
--
Thomas Munro
https://enterprisedb.com
Re: PANIC: could not flush dirty data: Operation not permitted power8, Redhat Centos
From
Thomas Munro
Date:
On Wed, Apr 17, 2019 at 1:04 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Mon, Apr 15, 2019 at 7:57 PM <zedaardv@gmail.com> wrote:
> > I forgot to mention that this is happening in a docker container.
>
> Huh, so there may be some configuration of Linux container that can
> fail here with EPERM, even though that error does not appear in
> the man page, and doesn't make much intuitive sense. Would be good to
> figure out how that happens.
Steve Dodd ran into the same problem in Borg[1]. It looks like what's happening here is that on PowerPC and ARM systems, there is a second system call, sync_file_range2, that has the arguments arranged in a better order for their calling conventions (see the Notes section of man sync_file_range), and glibc helpfully translates for you, but some container technologies forgot to include sync_file_range2 in their syscall forwarding table. Perhaps we should just handle this with the not_implemented_by_kernel mechanism I added for WSL.
[1] https://lists.freedesktop.org/archives/systemd-devel/2019-August/043276.html
--
Thomas Munro
https://enterprisedb.com
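One way to read that suggestion, sketched in C under stated assumptions: treat EPERM like the ENOSYS case, warn once, and stop calling sync_file_range() in this process, while any other errno keeps the PANIC. The names (flush_disabled, classify_flush_error, hint_writeback) are hypothetical and only loosely modeled on the not_implemented_by_kernel idea; this is not the actual PostgreSQL code. The key assumption is that an EPERM from a syscall filter means the kernel never saw the request, so no error state can have been consumed by it.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdbool.h>
#include <sys/types.h>

typedef enum { FLUSH_OK, FLUSH_WARN_AND_DISABLE, FLUSH_PANIC } flush_outcome;

/* Hypothetical per-process flag, in the spirit of not_implemented_by_kernel. */
static bool flush_disabled = false;

/*
 * ENOSYS (e.g. WSL) and EPERM (assumed here to come from a container's
 * syscall filter lacking sync_file_range2) mean the writeback hint was never
 * issued: warn once and stop trying. Any other errno may mean the kernel
 * dropped dirty pages, so keep the paranoid PANIC.
 */
static flush_outcome classify_flush_error(int err)
{
    if (err == ENOSYS || err == EPERM)
    {
        flush_disabled = true;
        return FLUSH_WARN_AND_DISABLE;
    }
    return FLUSH_PANIC;
}

/* Issue the writeback hint unless an earlier call showed it is unavailable. */
static flush_outcome hint_writeback(int fd, off_t offset, off_t nbytes)
{
    if (flush_disabled)
        return FLUSH_OK;     /* skip quietly: it is only a performance hint */
    if (sync_file_range(fd, offset, nbytes, SYNC_FILE_RANGE_WRITE) == 0)
        return FLUSH_OK;
    return classify_flush_error(errno);
}
```

The caller would log FLUSH_WARN_AND_DISABLE as a single WARNING, which avoids both the crash loop in the report and the endless repeated warnings Thomas mentions for the WSL case.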
Re: PANIC: could not flush dirty data: Operation not permitted power8, Redhat Centos
From
Thomas Munro
Date:
On Mon, Aug 19, 2019 at 7:32 AM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Wed, Apr 17, 2019 at 1:04 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> > On Mon, Apr 15, 2019 at 7:57 PM <zedaardv@gmail.com> wrote:
> > > I forgot to mention that this is happening in a docker container.
> >
> > Huh, so there may be some configuration of Linux container that can
> > fail here with EPERM, even though that error does not appear in
> > the man page, and doesn't make much intuitive sense. Would be good to
> > figure out how that happens.
>
> Steve Dodd ran into the same problem in Borg[1]. It looks like what's
> happening here is that on PowerPC and ARM systems, there is a second
> system call sync_file_range2 that has the arguments arranged in a
> better order for their calling conventions (see Notes section of man
> sync_file_range), and glibc helpfully translates for you, but some
> container technologies forgot to include sync_file_range2 in their
> syscall forwarding table. Perhaps we should just handle this with the
> not_implemented_by_kernel mechanism I added for WSL.
I've just heard that it was fixed overnight in seccomp, which is probably what Docker is using to give you EPERM for syscalls it doesn't like the look of:
https://github.com/systemd/systemd/pull/13352/commits/90ddac6087b5f8f3736364cfdf698e713f7e8869
Not being a Docker user, I'm not sure if/when that will flow into the right places in a timely fashion, but if not, it looks like you can always configure your own profile or take one from somewhere else, probably something like this:
https://github.com/moby/moby/commit/52d8f582c331e35f7b841171a1c22e2d9bbfd0b8
So it looks like we don't need to do anything at all on our side, unless someone knows better.
--
Thomas Munro
https://enterprisedb.com
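The failure mode can be reproduced without Docker: a seccomp filter that answers one syscall with EPERM yields exactly the "Operation not permitted" seen in the log. This is a minimal sketch assuming Linux with seccomp-bpf. It mirrors glibc's choice by using __NR_sync_file_range2 where the architecture defines it (ppc64le) and __NR_sync_file_range otherwise (x86_64); that mapping is an assumption about the build platform. The filter omits the usual architecture check for brevity, so it is a demonstration, not a container profile, and the function names are hypothetical.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/filter.h>
#include <linux/seccomp.h>
#include <stddef.h>
#include <stdlib.h>
#include <sys/prctl.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Syscall assumed to be issued by glibc's sync_file_range() wrapper. */
#ifdef __NR_sync_file_range2
#define FLUSH_SYSCALL_NR __NR_sync_file_range2   /* e.g. ppc64le */
#else
#define FLUSH_SYSCALL_NR __NR_sync_file_range    /* e.g. x86_64 */
#endif

/*
 * Install a seccomp-bpf filter that fails FLUSH_SYSCALL_NR with EPERM and
 * allows every other syscall, imitating a profile that simply does not list
 * the syscall. The filter is irreversible for this process and its children.
 */
static int deny_flush_syscall(void)
{
    struct sock_filter filter[] = {
        /* Load the syscall number (no arch check: demo only). */
        BPF_STMT(BPF_LD | BPF_W | BPF_ABS, offsetof(struct seccomp_data, nr)),
        /* Match: fall through to the EPERM return; otherwise skip it. */
        BPF_JUMP(BPF_JMP | BPF_JEQ | BPF_K, FLUSH_SYSCALL_NR, 0, 1),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ERRNO | (EPERM & SECCOMP_RET_DATA)),
        BPF_STMT(BPF_RET | BPF_K, SECCOMP_RET_ALLOW),
    };
    struct sock_fprog prog = {
        .len = (unsigned short) (sizeof filter / sizeof filter[0]),
        .filter = filter,
    };

    /* Needed to install a filter without CAP_SYS_ADMIN. */
    if (prctl(PR_SET_NO_NEW_PRIVS, 1, 0, 0, 0) != 0)
        return -1;
    return prctl(PR_SET_SECCOMP, SECCOMP_MODE_FILTER, &prog);
}

/* Request writeback of one 8 kB block; return 0 or the errno value. */
static int try_flush(int fd)
{
    return sync_file_range(fd, 0, 8192, SYNC_FILE_RANGE_WRITE) == 0 ? 0 : errno;
}
```

Calling try_flush() on a regular file after deny_flush_syscall() should return EPERM; without the filter it should return 0. That is consistent with the fix being a profile change (allowing sync_file_range2) rather than a PostgreSQL change.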