Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)

From: Masahiko Sawada
Subject: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)
Msg-id: CAD21AoCqx=95bfjdvuPJQpxgi6X-TZXchr6mQSpAHuri7dty1w@mail.gmail.com
In reply to: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)  (Bruce Momjian <bruce@momjian.us>)
Responses: Re: [Proposal] Table-level Transparent Data Encryption (TDE) and KeyManagement Service (KMS)  (Bruce Momjian <bruce@momjian.us>)
List: pgsql-hackers
On Fri, May 10, 2019 at 2:42 AM Bruce Momjian <bruce@momjian.us> wrote:
>
> On Thu, May  9, 2019 at 05:49:12PM +0900, Masahiko Sawada wrote:
> > In terms of keys, one advantage could be that we have fewer keys
> > with per-tablespace keys.
> >
> > Let me briefly explain the current design I'm thinking of. The
> > design employs a 2-tier key architecture. That is, a database
> > cluster has one master key and per-tablespace keys, which are
> > encrypted with the master key before being stored to disk. Each
> > tablespace key is generated randomly inside the database at CREATE
> > TABLESPACE time. All the encrypted tablespace keys are stored
> > together with the master key ID in a file (say,
> > $PGDATA/base/pg_tblsp_keys). That way, the startup process can
> > easily get all the tablespace keys and the master key ID before
> > starting recovery, and therefore can get all the decrypted
> > tablespace keys. The reason we don't store per-tablespace keys in a
> > column of pg_tablespace is that we also encrypt pg_tablespace with
> > the tablespace key. We could choose not to encrypt only
> > pg_tablespace, but that would instead require scanning
> > pg_tablespace before recovery, and eventually we would also have to
> > leave pg_attribute unencrypted, which should be encrypted.
> >
> > During recovery I'm also considering the idea you suggested:
> > wrapper WAL records carry the tablespace OID, which is the lookup
> > key for the tablespace key, so the startup process can get the
> > tablespace key.
> >
> > Given the above design, fewer data keys are better. Obviously there
> > are fewer per-tablespace keys than per-table keys. And even if we
> > employ per-tablespace keys, we can allow users to specify per-table
> > encryption by using the same encryption key within the tablespace.
> >
> > FYI, one advantage of per-tablespace encryption from the user's
> > perspective would be less conversion work during database
> > migration: using the default_tablespace parameter, less
> > modification of CREATE TABLE DDL is needed.
>
> I think we need to step back and see what we want to do.  There are six
> levels of possible encryption:
>
> 1.  client-side column encryption
> 2.  server-side column encryption
> 3.  table-level
> 4.  database-level
> 5.  tablespace-level
> 6.  cluster-level
>
> 1 & 2 encrypt the data in the WAL automatically, and option 6 is
> encrypting the entire WAL.  This leaves 3-5 as cases where there will be
> mismatch between the object-level encryption and WAL.  I don't think it
> is very valuable to use these options so reencryption will be easier.
> In many cases, taking any object offline might cause the application to
> fail, and having multiple encrypted data keys active will allow key
> replacement to be done on an as-needed basis.
>

Summarizing the design discussion so far and the discussion I had at
PGCon, there are several basic design items here. They are loosely
related to each other, and there are trade-offs among them.

1. Encryption Levels.
As Bruce suggested, there are 6 levels. Fine-grained control will help
suppress the performance overhead for tables that we don't actually
need to encrypt. It might help even in terms of security, since we
don't give the key to users who don't, or cannot, access the encrypted
tables. But whichever level we choose, as long as we encrypt data
inside the database we can protect it from attacks that bypass
PostgreSQL's ACLs, such as reading database files directly. I think
the threats we want to protect against have already reached consensus.

Among these levels, the tablespace level would be somewhat different
from the others because it corresponds to physical directories rather
than database objects. So in principle it's possible that tables are
created on an encrypted tablespace while their indexes are created on
a non-encrypted tablespace, even though that doesn't make sense. But
having fewer encryption keys makes for a simpler architecture.

2. Encryption Objects.
Indexes, WAL, and TOAST tables pertaining to encrypted tables, as well
as temporary files, must also be encrypted, but we need to discuss
whether we encrypt non-user data too, such as SLRU data, the
visibility map and free space map, and perhaps even other files such
as 2PC state files, backup_label, etc. Encrypting everything is
required by some use cases, but it's also true that there are users
who wish to encrypt their database while minimizing the performance
overhead.

3. Encryption Keys.
The encryption level is related to the number of encryption keys we
use. The cluster level would use a single encryption key and can more
easily encrypt everything, including non-user data such as xact WAL
and SLRU data, with the same key. On the other hand, the table level,
for instance, would use multiple keys and can encrypt tables with
different encryption keys. One advantage of having multiple keys in a
database is that encrypted database objects can be re-encrypted on an
as-needed basis. For instance, in a multi-tenant architecture,
stopping the whole database cluster would affect all services, but
with multiple keys we can re-encrypt data one by one while minimizing
the downtime of each service. Even in terms of security, having
multiple keys helps diversify the risk.
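As a toy illustration of the multiple-key case (not PostgreSQL code:
repeating-key XOR stands in for a real cipher, and the tablespace OIDs
are made-up examples), one tablespace's data can be re-keyed while the
others stay untouched:

```python
import secrets

def xor(data: bytes, key: bytes) -> bytes:
    # Repeat the key over the data; a stand-in for real encryption.
    return bytes(b ^ key[i % len(key)] for i, b in enumerate(data))

# One data key per tablespace (OIDs here are arbitrary examples).
keys = {16385: secrets.token_bytes(32), 16386: secrets.token_bytes(32)}
pages = {oid: xor(b"data of %d" % oid, keys[oid]) for oid in keys}
untouched = pages[16386]

# Re-encrypt only tablespace 16385 with a fresh key; 16386 stays online.
new_key = secrets.token_bytes(32)
pages[16385] = xor(xor(pages[16385], keys[16385]), new_key)
keys[16385] = new_key

assert xor(pages[16385], keys[16385]) == b"data of 16385"
assert pages[16386] == untouched
```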

4. Key Rotation and 2-Tier Key Hierarchy.
Another design point is key rotation and the use of a 2-tier key
hierarchy. Periodic key rotation is very important and is required by
some security standards.

For instance PCI DSS 3.6.4 states "Cryptographic key changes for keys
that have reached the end of their cryptoperiod (for example, after a
defined period of time has passed and/or after a certain amount of
cipher-text has been produced by a given key), as defined by the
associated application vendor or key owner, and based on industry best
practices and guidelines"[1]. A cryptoperiod is the time span during
which a specific cryptographic key is authorized for use[2] (it is
sometimes expressed as a number of transactions). It is defined based
on multiple factors such as key length, key strength, and algorithm.
So, depending on the user's system, key rotation can be required even
every few months.

A 2-tier key hierarchy is a technique for faster key rotation; it uses
two types of keys: a master key and a data key. The master key is
stored outside the database, whereas the data key is stored inside it.
The data key is used to encrypt the actual database data and is itself
encrypted with the master key before being stored to disk. At key
rotation, we rotate the master key and re-encrypt only the data key
rather than the database data. Since we don't need to access or modify
database data, the key rotation completes in a second. Without this,
we would end up re-encrypting the database data, which could take a
long time depending on the encryption level.
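The rotation step can be sketched as follows (a minimal toy, not the
proposed implementation: a SHA-256-derived keystream stands in for a
real key-wrapping cipher such as AES key wrap):

```python
import hashlib
import secrets

def wrap(master_key: bytes, data_key: bytes, nonce: bytes) -> bytes:
    # XOR the data key with a keystream derived from the master key.
    ks = hashlib.sha256(master_key + nonce).digest()
    return bytes(a ^ b for a, b in zip(data_key, ks))

unwrap = wrap  # XOR is its own inverse

data_key = secrets.token_bytes(32)   # encrypts the actual database data
nonce = secrets.token_bytes(16)
master_v1 = secrets.token_bytes(32)  # lives outside the database
stored = wrap(master_v1, data_key, nonce)

# Key rotation: re-wrap the 32-byte data key under the new master key.
# No table data is touched, so this is fast regardless of database size.
master_v2 = secrets.token_bytes(32)
stored = wrap(master_v2, unwrap(master_v1, stored, nonce), nonce)

assert unwrap(master_v2, stored, nonce) == data_key
```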

Because the data key must never leave the database, we might need to
provide a key rotation mechanism on the database side, for example a
pg_rotate_encryption_key() SQL function or an ALTER SYSTEM ROTATE KEY
SQL command, so that PostgreSQL re-encrypts the data keys with the new
master key. That might also require PostgreSQL to fetch the new master
key from an external location within the command, which is why this is
relevant to key management.

5. Key Management.
The encryption key (the master key in the 2-tier key hierarchy) must
be obtained from an external location. If we only need to fetch it, a
shell command such as curl might be enough. However, it's more secure
if we can seamlessly integrate PostgreSQL with key management
services, and DBAs would no longer need to take care of key
lifecycles. In addition, dedicated programs or knowledge would not be
necessary in individual user systems. Furthermore, integration with a
KMS might be helpful even for column-level encryption; we could
combine pgcrypto with it to provide per-column TDE.
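For the "only getting" case, the retrieval hook could look like the
following sketch (hypothetical: the %k placeholder and the
command-based interface are my own illustration, loosely modeled on
how ssl_passphrase_command works; a real KMS integration would call
the service's API instead):

```python
import shlex
import subprocess

def get_master_key(command: str, key_id: str) -> bytes:
    # Substitute the key ID for %k and take the command's stdout
    # as the hex-encoded master key.
    cmd = command.replace("%k", shlex.quote(key_id))
    out = subprocess.run(cmd, shell=True, check=True,
                         capture_output=True, text=True).stdout
    return bytes.fromhex(out.strip())

# Demo with a trivial command; in practice the configured command
# would invoke a KMS client and pass %k as the key identifier.
key = get_master_key("echo 00112233445566778899aabbccddeeff", "master-v1")
assert len(key) == 16
```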

Apart from the above, there are assorted concerns about the
fine-grained design. For WAL encryption, as a result of the discussion
so far, I'm going to use the same encryption for WAL as that used for
tables. Given that approach, utility commands that read WAL
(pg_waldump and pg_rewind) would need to be able to obtain arbitrary
encryption keys; pg_waldump might even require the encryption key for
WAL of a table that has already been dropped. As I discussed at
PGCon[3], rearranging the WAL format would solve this issue, but it
doesn't resolve the fundamental problem.

Also, system catalog encryption could be a hard part. System catalogs
are initially created at initdb time, and then created by copying from
template1 at CREATE DATABASE. Therefore we would need to either modify
initdb so that it's aware of encryption keys and the KMS, or modify
database creation so that it encrypts the database files while copying
them.

There are two proposals so far: cluster-wide encryption and
per-tablespace encryption. It would be good to pick the best parts of
each proposal in order to provide a useful TDE feature to users.
Whatever design the community accepts, I'd like to contribute to it.

Comments and feedback are very welcome!

[1] https://www.pcisecuritystandards.org/document_library?category=pcidss&document=pci_dss
[2] https://en.wikipedia.org/wiki/Cryptoperiod
[3] https://www.slideshare.net/masahikosawada98/transparent-data-encryption-in-postgresql/28


Regards,

--
Masahiko Sawada
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


