Re: Transparent column encryption

Поиск
Список
Период
Сортировка
От Peter Eisentraut
Тема Re: Transparent column encryption
Дата
Msg-id c3d59b70-cf22-a4bf-4613-91d040b988bd@enterprisedb.com
обсуждение исходный текст
Ответ на Re: Transparent column encryption  (Jacob Champion <jchampion@timescale.com>)
Ответы Re: Transparent column encryption  (Robert Haas <robertmhaas@gmail.com>)
Re: Transparent column encryption  (Jacob Champion <jchampion@timescale.com>)
Re: Transparent column encryption  (Bruce Momjian <bruce@momjian.us>)
Список pgsql-hackers
On 15.07.22 19:47, Jacob Champion wrote:
>>      The CEK key
>>      material is in turn encrypted by an assymmetric key called the column
>>      master key (CMK).
> 
> I'm not yet understanding why the CMK is asymmetric.

I'm not totally sure either.  I started to build it that way because 
other systems were doing it that way, too.  But I have been thinking 
about adding a symmetric alternative for the CMKs as well (probably AESKW).

I think there are a couple of reasons why asymmetric keys are possibly 
useful for CMKs:

Some other products make use of secure enclaves to do computations on 
(otherwise) encrypted values on the server.  I don't fully know how that 
works, but I suspect that asymmetric keys can play a role in that.  (I 
don't have any immediate plans for that in my patch.  It seems to be a 
dying technology at the moment.)

Asymmetric keys gives you some more options for how you set up the keys 
at the beginning.  For example, you create the asymmetric key pair on 
the host where your client program that wants access to the encrypted 
data will run.  You put the private key in an appropriate location for 
run time.  You send the public key to another host.  On that other host, 
you create the CEK, encrypt it with the CMK, and then upload it into the 
server (CREATE COLUMN ENCRYPTION KEY).  Then you can wipe that second 
host.  That way, you can be even more sure that the unencrypted CEK 
isn't left anywhere.  I'm not sure whether this method is very useful in 
practice, but it's interesting.

In any case, as I mentioned above, this particular aspect is up for 
discussion.

Also note that if you use a KMS (cmklookup "run" method), the actual 
algorithm doesn't even matter (depending on details of the KMS setup), 
since you just tell the KMS "decrypt this", and the KMS knows by itself 
what algorithm to use.  Maybe there should be a way to specify "unknown" 
in the ckdcmkalg field.

>> +#define PG_CEK_AEAD_AES_128_CBC_HMAC_SHA_256   130
>> +#define PG_CEK_AEAD_AES_192_CBC_HMAC_SHA_384   131
>> +#define PG_CEK_AEAD_AES_256_CBC_HMAC_SHA_384   132
>> +#define PG_CEK_AEAD_AES_256_CBC_HMAC_SHA_512   133
> 
> It looks like these ciphersuites were abandoned by the IETF. Are there
> existing implementations of them that have been audited/analyzed? Are
> they safe (and do we know that the claims made in the draft are
> correct)? How do they compare to other constructions like AES-GCM-SIV
> and XChacha20-Poly1305?

The short answer is, these same algorithms are used in equivalent 
products (see MS SQL Server, MongoDB).  They even reference the same 
exact draft document.

Besides that, here is my analysis for why these are good choices: You 
can't use any of the counter modes, because since the encryption happens 
on the client, there is no way to coordinate to avoid nonce reuse.  So 
among mainstream modes, you are basically left with AES-CBC with a 
random IV.  In that case, even if you happen to reuse an IV, the 
possible damage is very contained.

And then, if you want to use AEAD, you combine that with some MAC, and 
HMAC is just as good as any for that.

The referenced draft document doesn't really contain any additional 
cryptographic insights, it's just a guide on a particular way to put 
these two together.

So altogether I think this is a pretty solid choice.

>> +-- \gencr
>> +-- (This just tests the parameter passing; there is no encryption here.)
>> +CREATE TABLE test_gencr (a int, b text);
>> +INSERT INTO test_gencr VALUES (1, 'one') \gencr
>> +SELECT * FROM test_gencr WHERE a = 1 \gencr
>> + a |  b
>> +---+-----
>> + 1 | one
>> +(1 row)
>> +
>> +INSERT INTO test_gencr VALUES ($1, $2) \gencr 2 'two'
>> +SELECT * FROM test_gencr WHERE a IN ($1, $2) \gencr 2 3
>> + a |  b
>> +---+-----
>> + 2 | two
>> +(1 row)
> I'd expect \gencr to error out without sending plaintext. I know that
> under the hood this is just setting up a prepared statement, but if I'm
> using \gencr, presumably I really do want to be encrypting my data.
> Would it be a problem to always set force-column-encryption for the
> parameters we're given here? Any unencrypted columns could be provided
> directly.

Yeah, this needs a bit of refinement.  You don't want something named 
"encr" but it only encrypts some of the time.  We could possibly do what 
you suggest and make it set the force-encryption flag, or maybe rename 
it or add another command that just uses prepared statements and doesn't 
promise anything about encryption from its name.

This also ties in with how pg_dump will eventually work.  I think by 
default pg_dump will just dump things encrypted and set it up so that 
COPY writes it back encrypted.  But there should probably be a mode that 
dumps out plaintext and then uses one of these commands to load the 
plaintext back in.  What these psql commands need to do also depends on 
what pg_dump needs them to do.

>> +  <para>
>> +   Null values are not encrypted by transparent column encryption; null values
>> +   sent by the client are visible as null values in the database.  If the fact
>> +   that a value is null needs to be hidden from the server, this information
>> +   needs to be encoded into a nonnull value in the client somehow.
>> +  </para>
> 
> This is a major gap, IMO. Especially with the switch to authenticated
> ciphers, because it means you can't sign your NULL values. And having
> each client or user that's out there solve this with a magic in-band
> value seems like a recipe for pain.
> 
> Since we're requiring "canonical" use of text format, and the docs say
> there are no embedded or trailing nulls allowed in text values, could we
> steal the use of a single zero byte to mean NULL? One additional
> complication would be that the client would have to double-check that
> we're not writing a NULL into a NOT NULL column, and complain if it
> reads one during decryption. Another complication would be that the
> client would need to complain if it got a plaintext NULL.

You're already alluding to some of the complications.  Also consider 
that null values could arise from, say, outer joins.  So you could be in 
a situation where encrypted and unencrypted null values coexist.  And of 
course the server doesn't know about the encrypted null values.  So how 
do you maintain semantics, like for aggregate functions, primary keys, 
anything that treats null values specially?  How do clients deal with a 
mix of encrypted and unencrypted null values, how do they know which one 
is real.  What if the client needs to send a null value back as a 
parameter?  All of this would create enormous complications, if they can 
be solved at all.

I think a way to look at this is that this column encryption feature 
isn't suitable for disguising the existence or absence of data, it can 
only disguise the particular data that you know exists.

>> +   <para>
>> +    The <quote>associated data</quote> in these algorithms consists of 4
>> +    bytes: The ASCII letters <literal>P</literal> and <literal>G</literal>
>> +    (byte values 80 and 71), followed by the algorithm ID as a 16-bit unsigned
>> +    integer in network byte order.
>> +   </para>
> 
> Is this AD intended as a placeholder for the future, or does it serve a
> particular purpose?

It has been recommended that you include the identity of the encryption 
algorithm in the AD.  This protects the client from having to decrypt 
stuff that wasn't meant to be decrypted (in that way).



В списке pgsql-hackers по дате отправления:

Предыдущее
От: Alexander Korotkov
Дата:
Сообщение: Re: PATCH: Add Table Access Method option to pgbench
Следующее
От: Bharath Rupireddy
Дата:
Сообщение: Re: Use "WAL segment" instead of "log segment" consistently in user-facing messages