RFC: seccomp-bpf support

From: Joe Conway
Subject: RFC: seccomp-bpf support
Date:
Message-ID: bc032e95-7e8b-ed00-8d87-ed9db449bdd6@joeconway.com
Replies: Re: RFC: seccomp-bpf support  (David Fetter <david@fetter.org>)
         Re: RFC: seccomp-bpf support  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
         Re: RFC: seccomp-bpf support  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers

SECCOMP ("SECure COMPuting with filters") is a Linux kernel syscall
filtering mechanism which allows reduction of the kernel attack surface
by blocking (or at least audit-logging) syscalls that are normally unused.

Quoting from this link:
https://www.kernel.org/doc/Documentation/prctl/seccomp_filter.txt

   "A large number of system calls are exposed to every userland process
    with many of them going unused for the entire lifetime of the
    process. As system calls change and mature, bugs are found and
    eradicated. A certain subset of userland applications benefit by
    having a reduced set of available system calls. The resulting set
    reduces the total kernel surface exposed to the application. System
    call filtering is meant for use with those applications."

Recent security best practices recommend, and certain highly
security-conscious organizations are beginning to require, that SECCOMP
be used to the extent possible. The major web browsers, container
runtime engines, and systemd are all examples of software that already
support seccomp.

---------
A seccomp (bpf) filter consists of a default action and a set of
rules with actions pertaining to specific syscalls (possibly with even
more specific sets of arguments). Once loaded into the kernel, a filter
is inherited by all child processes and cannot be removed. It can,
however, be overlaid with another filter. For any given syscall match,
the most restrictive (a.k.a. highest precedence) action will be taken by
the kernel. PostgreSQL has already been run "in the wild" under seccomp
control in containers, and possibly systemd. Adding seccomp support into
PostgreSQL itself mitigates issues with these approaches, and has
several advantages:

* Container seccomp filters tend to be extremely broad/permissive,
  typically allowing about 6 out of 7 of all syscalls. They must do this
  because the use cases for containers vary widely.
* systemd does not implement seccomp filters by default. Packagers may
  decide to do so, but there is no guarantee. Adding them post-install
  potentially requires cooperation by groups outside the control of
  the database admins.
* In the container and systemd cases there is no particularly good way to
  inspect what filters are active. It is possible to observe actions
  taken, but again, control is possibly outside the database admin
  group. For example, the best way to understand what happened is to
  review the auditd log, which is likely not readable by the DBA.
* With built-in support, it is possible to lock down backend processes
  more tightly than the postmaster.
* With built-in support, it is possible to lock down different backend
  processes differently from one another, for example by using ALTER ROLE
  ... SET or ALTER DATABASE ... SET.
* With built-in support, it is possible to calculate and return (in the
  form of an SRF) the effective filters being applied to the postmaster
  and the current backend.
* With built-in support, it could be possible (this part not yet
  implemented) to have separate filters for different backend types,
  e.g. autovac workers, background writer, etc.
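
To make the overlay rule described above concrete (for any syscall, the
most restrictive action across all loaded filters wins), here is a small
Python model. It is only a sketch and is not code from the patch: the
dictionary layout and the function names are made up for illustration,
and it assumes the four GUC action names map onto kernel actions whose
precedence is allow < log < error < kill.

```python
# Hypothetical model of seccomp filter stacking; not code from the patch.
# Actions ordered from least to most restrictive (assumed mapping of the
# GUC action names onto the kernel's precedence rules).
PRECEDENCE = {"allow": 0, "log": 1, "error": 2, "kill": 3}


def filter_action(filt, syscall):
    """Return the action one filter assigns to a syscall (rule or default)."""
    return filt["rules"].get(syscall, filt["default"])


def effective_action(filters, syscall):
    """Return the most restrictive action across all stacked filters."""
    return max((filter_action(f, syscall) for f in filters),
               key=PRECEDENCE.__getitem__)


# A global filter and a tighter session filter laid over it.
global_filter = {"default": "log",
                 "rules": {"read": "allow", "accept": "allow"}}
session_filter = {"default": "log", "rules": {"read": "allow"}}

stack = [global_filter, session_filter]
print(effective_action(stack, "read"))    # allowed by both filters
print(effective_action(stack, "accept"))  # session default (log) wins
print(effective_action(stack, "ptrace"))  # no rule anywhere, both defaults apply
```

Because a loaded filter can never be removed, adding another filter to
the stack can only keep or tighten the effective action for a syscall;
it can never loosen it.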

---------
Attached is a patch for discussion, adding support for seccomp-bpf
(nowadays generally just called seccomp) syscall filtering at
configure-time using libseccomp. I would like to get this in shape to be
committed by the end of the November CF if possible.

The code itself has been through several rounds of revision based on
discussions I have had with the author of libseccomp as well as a few
other folks. However, at the moment the following are still missing:

* Documentation - general discussion is missing entirely
* Regression tests - there are none yet

---------
For convenience, here are a couple of additional links to relevant
information regarding seccomp:
https://en.wikipedia.org/wiki/Seccomp
https://github.com/seccomp/libseccomp

---------
Specific feedback requested:
1. Placement of pg_get_seccomp_filter() in
   src/backend/utils/adt/genfile.c
   originally made sense but after several rewrites no longer does.
   Ideas where it *should* go?
2. Where should a general discussion section go in the docs, if at all?
3. Currently this supports a global filter at the postmaster level,
   which is inherited by all child processes, and a secondary filter
   at the client backend session level. It likely makes sense to
   support secondary filters for other types of child processes,
   such as autovacuum workers. Add that now (pg13), in a later
   release, or never?
4. What is the best way to approach testing of this feature? TAP
   testing perhaps?
5. Default GUC values - should we provide "starter" lists, or only a
   procedure for generating a list (as below)?

---------
Notes on usage:
===============
In order to determine your minimally required allow lists, do something
like the following on a non-production server with the same architecture
as production:

0. Setup:
 * install libseccomp, libseccomp-dev, and seccomp
 * install auditd if not already installed
 * configure postgres --with-seccomp and maybe --enable-tap-tests to
   improve feature coverage (see below)

1. Modify postgresql.conf and/or create <pg_source_dir>/postgresql_tmp.conf
8<--------------------
seccomp = on
global_syscall_default = allow
global_syscall_allow = ''
global_syscall_log = ''
global_syscall_error = ''
global_syscall_kill = ''
session_syscall_default = log
session_syscall_allow = '*'
session_syscall_log = '*'
session_syscall_error = '*'
session_syscall_kill = '*'
8<--------------------

2. Modify /etc/audit/auditd.conf
 * set disp_qos to lossless
 * set max_log_file_action to ignore

3. Stop auditd, clear out all audit.logs, start auditd:
 * systemctl stop auditd.service            # if running
 * echo -n "" > /var/log/audit/audit.log
 * systemctl start auditd.service

4. Start/restart postgres.

5. Exercise postgres as much as possible (one or more of the following):
 * make installcheck-world
 * make check-world \
   EXTRA_REGRESS_OPTS=--temp-config=<pg_source_dir>/postgresql_tmp.conf
 * run your application through its paces
 * other random testing of relevant postgres features

  Note: at this point audit.log will start growing quickly. During `make
  check-world` mine grew to just under 1 GB.

6. Process results:
 a) systemctl stop auditd.service
 b) Run the provided "get_syscalls.sh" script
 c) Cut and paste the result as the value of session_syscall_allow.

7. Optional:
 a) Set global_syscall_default = 'log'
 b) Repeat steps 3-5
 c) Repeat steps 6a and 6b
 d) Cut and paste the result as the value of global_syscall_allow

8. Iterate steps 3-6b.
 * Output should be empty.
 * If there are any new syscalls, add to global_syscall_allow and
   session_syscall_allow.
 * Iterate until output of "get_syscalls.sh" script is empty.

9. Optional:
 * Change global and session defaults to "error" or "kill"
 * Reduce the allow lists if desired
 * This can be done for specific database users, by doing
   ALTER ROLE... SET session_syscall_allow to '<some reduced allow list>'

10. Adjust settings to taste, restart postgres, and monitor audit.log
    going forward.
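
Step 6b relies on the "get_syscalls.sh" script provided with the patch,
which is not reproduced in this message. Purely as an illustration of
the kind of post-processing involved, the Python sketch below pulls
syscall numbers out of SECCOMP audit records and turns them into a
comma-separated list. It is not that script. The record layout (a line
starting with type=SECCOMP and carrying a syscall=<number> field) is an
assumption about auditd output, the sample lines are synthetic and
abbreviated, and the number-to-name table is only a small x86_64
excerpt taken from the pg_get_seccomp_filter() output in this message.

```python
import re

# Assumed shape of an auditd SECCOMP record; only the syscall= field is used.
SECCOMP_RE = re.compile(r"^type=SECCOMP\b.*?\bsyscall=(\d+)\b")

# Small x86_64 excerpt from the table in this message; a real tool would
# need the full table for the target architecture.
SYSCALL_NAMES = {0: "read", 1: "write", 3: "close", 56: "clone", 61: "wait4"}


def logged_syscalls(lines, names=SYSCALL_NAMES):
    """Return a sorted, comma-separated list of syscalls seen in SECCOMP records."""
    seen = set()
    for line in lines:
        m = SECCOMP_RE.search(line)
        if m:
            num = int(m.group(1))
            # Keep unknown numbers visible rather than silently dropping them.
            seen.add(names.get(num, f"unknown_{num}"))
    return ",".join(sorted(seen))


# Synthetic, abbreviated records (not real log output).
sample = [
    'type=SECCOMP msg=audit(0.0:1): pid=100 comm="postgres" sig=0 '
    'arch=c000003e syscall=56 compat=0 ip=0x0 code=0x7ffc0000',
    'type=SYSCALL msg=audit(0.0:2): arch=c000003e syscall=1 success=yes',
    'type=SECCOMP msg=audit(0.0:3): pid=100 comm="postgres" sig=0 '
    'arch=c000003e syscall=61 compat=0 ip=0x0 code=0x7ffc0000',
    'type=SECCOMP msg=audit(0.0:4): pid=100 comm="postgres" sig=0 '
    'arch=c000003e syscall=56 compat=0 ip=0x0 code=0x7ffc0000',
]
print(logged_syscalls(sample))
```

The result is deduplicated and sorted so that it can be pasted directly
as a GUC value, and an empty string signals that the iteration in
step 8 has converged.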

Below are some values from my system. Note that I have made no attempt
thus far to do static code analysis -- this list was built by running
`make check-world` several times.
8<-------------------------
seccomp = on

global_syscall_default = log
global_syscall_allow = 'accept,access,bind,brk,chmod,clone,close,connect,dup,epoll_create1,epoll_ctl,epoll_wait,exit_group,fadvise64,fallocate,fcntl,fdatasync,fstat,fsync,ftruncate,futex,getdents,getegid,geteuid,getgid,getpeername,getpid,getppid,getrandom,getrusage,getsockname,getsockopt,getuid,ioctl,kill,link,listen,lseek,lstat,mkdir,mmap,mprotect,mremap,munmap,openat,pipe,poll,prctl,pread64,prlimit64,pwrite64,read,readlink,recvfrom,recvmsg,rename,rmdir,rt_sigaction,rt_sigprocmask,rt_sigreturn,seccomp,select,sendto,setitimer,set_robust_list,setsid,setsockopt,shmat,shmctl,shmdt,shmget,shutdown,socket,stat,statfs,symlink,sync_file_range,sysinfo,umask,uname,unlink,utime,wait4,write'
global_syscall_log = ''
global_syscall_error = ''
global_syscall_kill = ''

session_syscall_default = log
session_syscall_allow = 'access,brk,chmod,close,connect,epoll_create1,epoll_ctl,epoll_wait,exit_group,fadvise64,fallocate,fcntl,fdatasync,fstat,fsync,ftruncate,futex,getdents,getegid,geteuid,getgid,getpeername,getpid,getrandom,getrusage,getsockname,getsockopt,getuid,ioctl,kill,link,lseek,lstat,mkdir,mmap,mprotect,mremap,munmap,openat,poll,pread64,pwrite64,read,readlink,recvfrom,recvmsg,rename,rmdir,rt_sigaction,rt_sigprocmask,rt_sigreturn,select,sendto,setitimer,setsockopt,shutdown,socket,stat,symlink,sync_file_range,sysinfo,umask,uname,unlink,utime,write'
session_syscall_log = '*'
session_syscall_error = '*'
session_syscall_kill = '*'
8<-------------------------

That results in the following effective filters at the global and
session levels (distinguished by the "context" column):

8<-------------------------
select * from pg_get_seccomp_filter() order by 4,1;
     syscall     | syscallnum | filter_action  | context
-----------------+------------+----------------+---------
 accept          |         43 | global->allow  | global
 access          |         21 | global->allow  | global
 bind            |         49 | global->allow  | global
 brk             |         12 | global->allow  | global
 chmod           |         90 | global->allow  | global
 clone           |         56 | global->allow  | global
 close           |          3 | global->allow  | global
 connect         |         42 | global->allow  | global
 <default>       |         -1 | global->log    | global
 dup             |         32 | global->allow  | global
 epoll_create1   |        291 | global->allow  | global
 epoll_ctl       |        233 | global->allow  | global
 epoll_wait      |        232 | global->allow  | global
 exit_group      |        231 | global->allow  | global
 fadvise64       |        221 | global->allow  | global
 fallocate       |        285 | global->allow  | global
 fcntl           |         72 | global->allow  | global
 fdatasync       |         75 | global->allow  | global
 fstat           |          5 | global->allow  | global
 fsync           |         74 | global->allow  | global
 ftruncate       |         77 | global->allow  | global
 futex           |        202 | global->allow  | global
 getdents        |         78 | global->allow  | global
 getegid         |        108 | global->allow  | global
 geteuid         |        107 | global->allow  | global
 getgid          |        104 | global->allow  | global
 getpeername     |         52 | global->allow  | global
 getpid          |         39 | global->allow  | global
 getppid         |        110 | global->allow  | global
 getrandom       |        318 | global->allow  | global
 getrusage       |         98 | global->allow  | global
 getsockname     |         51 | global->allow  | global
 getsockopt      |         55 | global->allow  | global
 getuid          |        102 | global->allow  | global
 ioctl           |         16 | global->allow  | global
 kill            |         62 | global->allow  | global
 link            |         86 | global->allow  | global
 listen          |         50 | global->allow  | global
 lseek           |          8 | global->allow  | global
 lstat           |          6 | global->allow  | global
 mkdir           |         83 | global->allow  | global
 mmap            |          9 | global->allow  | global
 mprotect        |         10 | global->allow  | global
 mremap          |         25 | global->allow  | global
 munmap          |         11 | global->allow  | global
 openat          |        257 | global->allow  | global
 pipe            |         22 | global->allow  | global
 poll            |          7 | global->allow  | global
 prctl           |        157 | global->allow  | global
 pread64         |         17 | global->allow  | global
 prlimit64       |        302 | global->allow  | global
 pwrite64        |         18 | global->allow  | global
 read            |          0 | global->allow  | global
 readlink        |         89 | global->allow  | global
 recvfrom        |         45 | global->allow  | global
 recvmsg         |         47 | global->allow  | global
 rename          |         82 | global->allow  | global
 rmdir           |         84 | global->allow  | global
 rt_sigaction    |         13 | global->allow  | global
 rt_sigprocmask  |         14 | global->allow  | global
 rt_sigreturn    |         15 | global->allow  | global
 seccomp         |        317 | global->allow  | global
 select          |         23 | global->allow  | global
 sendto          |         44 | global->allow  | global
 setitimer       |         38 | global->allow  | global
 set_robust_list |        273 | global->allow  | global
 setsid          |        112 | global->allow  | global
 setsockopt      |         54 | global->allow  | global
 shmat           |         30 | global->allow  | global
 shmctl          |         31 | global->allow  | global
 shmdt           |         67 | global->allow  | global
 shmget          |         29 | global->allow  | global
 shutdown        |         48 | global->allow  | global
 socket          |         41 | global->allow  | global
 stat            |          4 | global->allow  | global
 statfs          |        137 | global->allow  | global
 symlink         |         88 | global->allow  | global
 sync_file_range |        277 | global->allow  | global
 sysinfo         |         99 | global->allow  | global
 umask           |         95 | global->allow  | global
 uname           |         63 | global->allow  | global
 unlink          |         87 | global->allow  | global
 utime           |        132 | global->allow  | global
 wait4           |         61 | global->allow  | global
 write           |          1 | global->allow  | global
 accept          |         43 | session->log   | session
 access          |         21 | session->allow | session
 bind            |         49 | session->log   | session
 brk             |         12 | session->allow | session
 chmod           |         90 | session->allow | session
 clone           |         56 | session->log   | session
 close           |          3 | session->allow | session
 connect         |         42 | session->allow | session
 <default>       |         -1 | session->log   | session
 dup             |         32 | session->log   | session
 epoll_create1   |        291 | session->allow | session
 epoll_ctl       |        233 | session->allow | session
 epoll_wait      |        232 | session->allow | session
 exit_group      |        231 | session->allow | session
 fadvise64       |        221 | session->allow | session
 fallocate       |        285 | session->allow | session
 fcntl           |         72 | session->allow | session
 fdatasync       |         75 | session->allow | session
 fstat           |          5 | session->allow | session
 fsync           |         74 | session->allow | session
 ftruncate       |         77 | session->allow | session
 futex           |        202 | session->allow | session
 getdents        |         78 | session->allow | session
 getegid         |        108 | session->allow | session
 geteuid         |        107 | session->allow | session
 getgid          |        104 | session->allow | session
 getpeername     |         52 | session->allow | session
 getpid          |         39 | session->allow | session
 getppid         |        110 | session->log   | session
 getrandom       |        318 | session->allow | session
 getrusage       |         98 | session->allow | session
 getsockname     |         51 | session->allow | session
 getsockopt      |         55 | session->allow | session
 getuid          |        102 | session->allow | session
 ioctl           |         16 | session->allow | session
 kill            |         62 | session->allow | session
 link            |         86 | session->allow | session
 listen          |         50 | session->log   | session
 lseek           |          8 | session->allow | session
 lstat           |          6 | session->allow | session
 mkdir           |         83 | session->allow | session
 mmap            |          9 | session->allow | session
 mprotect        |         10 | session->allow | session
 mremap          |         25 | session->allow | session
 munmap          |         11 | session->allow | session
 openat          |        257 | session->allow | session
 pipe            |         22 | session->log   | session
 poll            |          7 | session->allow | session
 prctl           |        157 | session->log   | session
 pread64         |         17 | session->allow | session
 prlimit64       |        302 | session->log   | session
 pwrite64        |         18 | session->allow | session
 read            |          0 | session->allow | session
 readlink        |         89 | session->allow | session
 recvfrom        |         45 | session->allow | session
 recvmsg         |         47 | session->allow | session
 rename          |         82 | session->allow | session
 rmdir           |         84 | session->allow | session
 rt_sigaction    |         13 | session->allow | session
 rt_sigprocmask  |         14 | session->allow | session
 rt_sigreturn    |         15 | session->allow | session
 seccomp         |        317 | session->log   | session
 select          |         23 | session->allow | session
 sendto          |         44 | session->allow | session
 setitimer       |         38 | session->allow | session
 set_robust_list |        273 | session->log   | session
 setsid          |        112 | session->log   | session
 setsockopt      |         54 | session->allow | session
 shmat           |         30 | session->log   | session
 shmctl          |         31 | session->log   | session
 shmdt           |         67 | session->log   | session
 shmget          |         29 | session->log   | session
 shutdown        |         48 | session->allow | session
 socket          |         41 | session->allow | session
 stat            |          4 | session->allow | session
 statfs          |        137 | session->log   | session
 symlink         |         88 | session->allow | session
 sync_file_range |        277 | session->allow | session
 sysinfo         |         99 | session->allow | session
 umask           |         95 | session->allow | session
 uname           |         63 | session->allow | session
 unlink          |         87 | session->allow | session
 utime           |        132 | session->allow | session
 wait4           |         61 | session->log   | session
 write           |          1 | session->allow | session
(170 rows)
8<-------------------------
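
In the output above, the rows that differ between the two contexts are
the syscalls present in global_syscall_allow but absent from
session_syscall_allow: they fall through to the session default and show
up as session->log. A quick way to list them is a set difference over
the two GUC values, sketched here in Python. The helper names are
hypothetical, the two strings are abbreviated excerpts of the lists
above, and the meaning of the '*' values is deliberately not modeled.

```python
def parse_list(guc_value):
    """Split a comma-separated syscall list into a set, ignoring blanks."""
    return {name.strip() for name in guc_value.split(",") if name.strip()}


def global_only(global_allow, session_allow):
    """Syscalls allowed by the global list but absent from the session list."""
    return sorted(parse_list(global_allow) - parse_list(session_allow))


# Abbreviated excerpts of the two allow lists, for illustration only.
global_allow = "accept,access,bind,brk,clone,close,wait4,write"
session_allow = "access,brk,close,write"

print(global_only(global_allow, session_allow))
# ['accept', 'bind', 'clone', 'wait4']
```

Run against the full lists, this gives a compact view of what the
session filter takes away from backends relative to the postmaster.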

If you made it all the way to here, thank you for your attention :-)

Joe

--
Crunchy Data - http://crunchydata.com
PostgreSQL Support for Secure Enterprises
Consulting, Training, & Open Source Development
