Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)

From Sehrope Sarkuni
Subject Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)
Date
Msg-id CAH7T-aoTTG2oiYSkb-Y7TmD5-NHmKmcx1gA_ThyL2PiyMuvCsg@mail.gmail.com
In response to Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)  (Bruce Momjian <bruce@momjian.us>)
Responses Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)  (Bruce Momjian <bruce@momjian.us>)
Re: [Proposal] Table-level Transparent Data Encryption (TDE) and Key Management Service (KMS)  (Bruce Momjian <bruce@momjian.us>)
List pgsql-hackers
On Mon, Aug 5, 2019 at 9:02 PM Bruce Momjian <bruce@momjian.us> wrote:
On Wed, Jul 31, 2019 at 09:25:01AM -0400, Sehrope Sarkuni wrote:
> Even if we do not include a separate per-relation salt or things like
> relfilenode when generating a derived key, we can still include other types of
> immutable attributes. For example the fork type could be included to eventually
> allow multiple forks for the same relation to be encrypted with the same IV =
> LSN + Page Number as the derived key per-fork would be distinct.

Yes, the fork number could be useful in this case.  I was thinking we
would just leave the extra bits as zeros and we can then set it to '1'
or something else for a different fork.

Key derivation has more flexibility as you're not limited by the number of unused bits in the IV.
 
> WAL encryption should not use the same key as page encryption so there's no
> need to design the IV to try to avoid matching the page IVs. Even a basic
> derivation with a single fixed WDEK = HKDF(MDEK, "WAL") and TDEK = HKDF(MDEK,
> "PAGE") would ensure separate keys. That's the literal string "WAL" or
> "PAGE" being added as a salt to generate the respective keys, all that matters
> is they're different.

I was thinking the WAL would use the same key since the nonce is unique
between the two.  What value is there in using a different key?

Never having to worry about overlap in Key + IV usage is the main advantage. While it's possible to structure IVs to prevent that, it's much easier to avoid the situation entirely by ensuring different parts of an application use separate derived keys.
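
As a rough sketch of the separation described above, assuming HKDF-SHA256 (RFC 5869) with the label passed as HKDF's `info` input. The `hkdf_sha256` helper, the 32-byte key length, and the placeholder MDEK are illustrative assumptions, not anything decided in this thread:

```python
import hashlib
import hmac


def hkdf_sha256(ikm: bytes, info: bytes, length: int = 32, salt: bytes = b"") -> bytes:
    """HKDF (RFC 5869) with SHA-256: extract a PRK, then expand it with `info`."""
    hash_len = hashlib.sha256().digest_size
    if length > 255 * hash_len:
        raise ValueError("requested output is too long for HKDF-SHA256")
    # Extract: an empty salt defaults to hash_len zero bytes.
    prk = hmac.new(salt or b"\x00" * hash_len, ikm, hashlib.sha256).digest()
    # Expand: chain HMAC blocks until enough output is produced.
    okm, block, counter = b"", b"", 1
    while len(okm) < length:
        block = hmac.new(prk, block + info + bytes([counter]), hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]


# Placeholder master key for illustration only; a real MDEK would come from the KMS.
mdek = bytes(range(32))

wdek = hkdf_sha256(mdek, b"WAL")   # key used only for WAL encryption
tdek = hkdf_sha256(mdek, b"PAGE")  # key used only for page encryption
```

Because the two labels differ, the two keys differ, so a WAL IV that happens to equal a page IV no longer implies Key + IV reuse.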
 
> Ideally WAL encryption would generate new derived keys as part of the WAL
> stream. The WAL stream is not fixed so you have the luxury of being able to add
> a "Use new random salt XZY going forward" records. Forcing generation of a new
> salt/key upon promotion of a replica would ensure that at least the WAL is
> unique going forward. Could also generate a new one upon server startup, after

Ah, yes, good point, and using a derived key would make that easier.
The tricky part is what to use to create the new derived key, unless we
generate a random number and store that somewhere in the data directory,
but that might lead to fragility, so I am worried. 

Simplest approach for derived keys would be to use immutable attributes of the WAL files as an input to the key derivation. Something like HKDF(MDEK, "WAL:" || timeline_id || wal_segment_num) should be fine for this as it is:

* Unique per WAL file
* Known prior to writing to a given WAL file
* Known prior to reading a given WAL file
* Does not require any additional persistence
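
A minimal sketch of that per-file derivation, assuming HKDF-SHA256 with an empty salt and the label carried in `info`. The function names and the fixed-width encoding (32-bit timeline ID, 64-bit segment number, big-endian) are assumptions made for this example:

```python
import hashlib
import hmac
import struct


def hkdf_sha256_32(ikm: bytes, info: bytes) -> bytes:
    """HKDF-SHA256 (RFC 5869) with an empty salt, limited to one 32-byte output block."""
    prk = hmac.new(b"\x00" * 32, ikm, hashlib.sha256).digest()     # extract
    return hmac.new(prk, info + b"\x01", hashlib.sha256).digest()  # expand, block 1


def derive_wal_segment_key(mdek: bytes, timeline_id: int, wal_segment_num: int) -> bytes:
    """Derive the WDEK for one WAL file from attributes known before reading or writing it."""
    # Fixed-width fields keep "WAL:" || timeline_id || wal_segment_num unambiguous.
    info = b"WAL:" + struct.pack(">IQ", timeline_id, wal_segment_num)
    return hkdf_sha256_32(mdek, info)


# Placeholder master key for illustration only.
mdek = bytes(range(32))
key_tl1_seg5 = derive_wal_segment_key(mdek, timeline_id=1, wal_segment_num=5)
key_tl1_seg6 = derive_wal_segment_key(mdek, timeline_id=1, wal_segment_num=6)
key_tl2_seg5 = derive_wal_segment_key(mdek, timeline_id=2, wal_segment_num=5)
```

Nothing beyond the MDEK has to be stored: both writer and reader can recompute the key from the timeline and segment number alone.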

We have pg_rewind, which allows the WAL to go backwards.  What is the
value in doing this?

Good point re: pg_rewind. Having key rotation records in the stream would complicate that as you'd have to jump back / forward to figure out which key to use. It's doable but much more complicated.

A unique WDEK per WAL file that is derived from the segment number would not have that problem. A unique key per file means the IVs can all start at zero and each file can be treated as one encrypted stream. Any encryption/decryption code would only need to touch the write/read callsites.
 
> every N bytes, or a new one for each new WAL file. There's much more
> flexibility compared to page encryption.
>
> As WAL is a single continuous stream, we can start the IV for each derived WAL
> key from zero. There's no need to complicate it further as Key + IV will never
> be reused.

Uh, you want a new random key for each WAL file?  I was going to use the
WAL segment number as the nonce, which is always increasing, and easily
determined.  The file is 16MB.

Ideally yes, as it would allow multiple replicas promoted off the same primary to diverge immediately, since each would have its own keys. I don't consider it a requirement, but if it's possible without significant added complexity I'd say that's a win.

I'm still reading up on the file and record format to understand how complex that would be. Though given your point re: pg_rewind and the lack of handling for page encryption divergence when promoting multiple replicas, I doubt the complexity will be worth it.
 
> If WAL is always written as full pages we need to ensure that the empty parts
> of the page are actual zeros and not "encrypted zeroes". Otherwise an XOR of
> the empty section of the first write of a page against a subsequent one would
> give you the plain text.

Right, I think we need the segment number as part of the nonce for WAL.

+1 to using the segment number, but it's better used as an input to a derived key than as part of a new IV construct that reuses the MDEK.
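
The "encrypted zeroes" concern quoted above can be illustrated with a toy stream cipher. HMAC-SHA256 in counter mode stands in for AES-CTR here purely so the example needs only the standard library; the cipher, the 64-byte "page", and the record contents are all assumptions for illustration:

```python
import hashlib
import hmac


def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Toy CTR-style keystream; a stand-in for AES-CTR, not a real cipher choice."""
    out, counter = b"", 0
    while len(out) < length:
        out += hmac.new(key, iv + counter.to_bytes(8, "big"), hashlib.sha256).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


PAGE_SIZE = 64
key, iv = b"k" * 32, b"\x00" * 16

# First write: one record, rest of the page is zero padding that also gets encrypted.
first_page = b"record-1".ljust(PAGE_SIZE, b"\x00")
# Second write of the same page: a new record now fills part of the padding.
second_record = b"secret second record"
second_page = (b"record-1" + second_record).ljust(PAGE_SIZE, b"\x00")

# Same Key + IV for both writes, so both use the same keystream.
ciphertext_1 = xor(first_page, keystream(key, iv, PAGE_SIZE))
ciphertext_2 = xor(second_page, keystream(key, iv, PAGE_SIZE))

# An observer with both ciphertexts cancels the keystream by XORing them.
leaked = xor(ciphertext_1, ciphertext_2)
recovered = leaked[8:8 + len(second_record)]
```

Here `recovered` equals the second record's plaintext, because the first write exposed the raw keystream wherever the page held zeros. Leaving the empty region as actual zeros, or never reusing a Key + IV pair for the same offsets, avoids this.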
 
> The non-fixed size of the WAL allows for the addition of a MAC though I'm not
> sure yet the best way to incorporate it. It could be part of each encrypted
> record or its own summary record (providing a MAC for a series of WAL records).
> After I've gone through this a bit more I'm looking to put together a write up
> with this and some other thoughts in one place.

I don't think we want to add a MAC at this point since the MAC for 8k
pages seems unattainable.

Even without a per-page MAC, a MAC at some level for WAL has its own benefits such as perfect corruption detection. It could be per-record, per-N-records, per-checkpoint, or per-file. The current WAL file format already handles arbitrary gaps so there is significantly more flexibility in adding it vs pages. I'm not saying it should be a requirement but, unlike pages, I would not rule it out just yet as it may not be that complicated.
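
For the per-record option, one possible shape is sketched below, assuming HMAC-SHA256 with a dedicated MAC key and the record's LSN bound into the tag. The function names, the tag placement after the payload, and the inclusion of the LSN are assumptions for this example, not part of the current WAL format:

```python
import hashlib
import hmac
import struct

TAG_LEN = hashlib.sha256().digest_size  # 32-byte HMAC-SHA256 tag


def record_tag(mac_key: bytes, lsn: int, payload: bytes) -> bytes:
    # Binding the LSN means a valid record cannot be replayed at another position.
    return hmac.new(mac_key, struct.pack(">Q", lsn) + payload, hashlib.sha256).digest()


def seal_record(mac_key: bytes, lsn: int, payload: bytes) -> bytes:
    """Append a MAC tag to an (already encrypted) record payload."""
    return payload + record_tag(mac_key, lsn, payload)


def open_record(mac_key: bytes, lsn: int, sealed: bytes) -> bytes:
    """Verify the tag and return the payload, or raise if the record was altered."""
    if len(sealed) < TAG_LEN:
        raise ValueError("WAL record too short to contain a MAC")
    payload, tag = sealed[:-TAG_LEN], sealed[-TAG_LEN:]
    if not hmac.compare_digest(tag, record_tag(mac_key, lsn, payload)):
        raise ValueError("WAL record failed MAC check")
    return payload


# Placeholder key for illustration; it could be derived from the MDEK with its own label.
mac_key = b"m" * 32
sealed = seal_record(mac_key, lsn=0x16B3F50, payload=b"encrypted record bytes")
```

A per-N-records, per-checkpoint, or per-file variant would compute the same kind of tag over a larger span and store it in a summary record instead.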

Regards,
-- Sehrope Sarkuni
Founder & CEO | JackDB, Inc. | https://www.jackdb.com/

 
