Re: storing an explicit nonce
From | Robert Haas |
---|---|
Subject | Re: storing an explicit nonce |
Date | |
Msg-id | CA+TgmoY0KBnTJzwAx6WqKNNyistzEstOnEuY7R53ZPX2_yoiUg@mail.gmail.com |
In response to | Re: storing an explicit nonce (Stephen Frost <sfrost@snowman.net>) |
Responses | Re: storing an explicit nonce (Stephen Frost <sfrost@snowman.net>) |
List | pgsql-hackers |
On Wed, May 26, 2021 at 2:37 PM Stephen Frost <sfrost@snowman.net> wrote:
> > Anybody got a better idea?
>
> If we stipulate (and document) that all replicas need their own keys
> then we no longer need to worry about nonce re-use between the primary
> and the replica. Not sure that's *better*, per se, but I do think it's
> worth consideration. Teaching pg_basebackup how to decrypt and then
> re-encrypt with a different key wouldn't be challenging.

I agree that we could do that and that it's possibly worth considering. However, it would be easy - and tempting - for users to violate the no-nonce-reuse principle. For example, consider a hypothetical user who takes a backup on Monday via a filesystem snapshot, which might be either (a) a snapshot of the cluster while it is stopped, or (b) a snapshot of the cluster while it's running, from which crash recovery can be safely performed as long as it's a true atomic snapshot, or (c) a snapshot taken between pg_start_backup and pg_stop_backup which will be used just like a backup taken by pg_basebackup. In any of these cases, there's no opportunity for a tool we provide to intervene and re-key.

Now, we could provide a tool that re-keys in such situations and tell people to be sure they run it before using any of those backups, and maybe that's the best we can do. However, that tool is going to run for a good long time because it has to rewrite the entire cluster, so someone with a terabyte-scale database is going to be sorely tempted to skip this "unnecessary" and time-consuming step. If it were possible to set things up so that good things happen automatically and without user action, that would be swell.

Here's another idea: suppose that a nonce is 128 bits, 64 of which are randomly generated at server startup, and the other 64 of which are a counter.
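To make the layout concrete, here is a minimal sketch of that nonce scheme. All names are hypothetical, and the real thing would of course live in C inside the server; Python is used here only for illustration:

```python
# Hypothetical sketch of the proposed 128-bit nonce layout:
# 64 bits chosen randomly once per server lifetime, plus a
# 64-bit counter incremented for each page encryption.
import os
import struct
import itertools

class NonceGenerator:
    def __init__(self):
        # Drawn once at server startup; collisions across server
        # lifetimes are assumed negligible relative to 2^64.
        self.lifetime_id = struct.unpack(">Q", os.urandom(8))[0]
        self.counter = itertools.count()

    def next_nonce(self) -> bytes:
        # 16-byte nonce: server-lifetime id || per-lifetime counter
        return struct.pack(">QQ", self.lifetime_id, next(self.counter))

gen = NonceGenerator()
n1 = gen.next_nonce()
n2 = gen.next_nonce()
```

A backup restored elsewhere and then modified would generate its own lifetime id at startup, so newly written pages get fresh nonces without any explicit re-keying step.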
If you're willing to assume that the 64 bits generated randomly at server startup are not going to collide in practice, because the number of server lifetimes per key should be very small compared to 2^64, then this gets you the benefits of a randomly-generated nonce without needing to keep on generating new cryptographically strong random numbers, and pretty much regardless of what users do with their backups. If you replay an FPI, you can write out the page exactly as you got it from the master, without re-encrypting. If you modify and then write a page, you generate a nonce for it containing your own server lifetime identifier.

> Yes, if the amount of space available is variable then there's an added
> cost for that. While I appreciate the concern about having that be
> expensive, for my 2c at least, I like to think that having this sudden
> space that's available for use may lead to other really interesting
> capabilities beyond the ones we're talking about here, so I'm not really
> thrilled with the idea of boiling it down to just two cases.

Although I'm glad you like some things about this idea, I think the proposed system will collapse if we press it too hard. We're going to need to be judicious.

> One thing to be absolutely clear about here though is that simply taking
> a hash() of the ciphertext and storing that with the data does *not*
> provide cryptographic data integrity validation for the page because it
> doesn't involve the actual key or IV at all and the hash is done after
> the ciphertext is generated - therefore an attacker can change the data
> and just change the hash to match and you'd never know.

Ah, right. So you'd actually want something more like hash(dboid||tsoid||relfilenode||blockno||block_contents||secret). Maybe not generated exactly that way: perhaps the secret is really the IV for the hash function rather than part of the hashed data, or whatever.
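One standard, expert-vetted way to realize a keyed digest of that shape is an HMAC, which mixes the secret into the hash construction properly rather than by bare concatenation. A hedged sketch, where the field names and widths are illustrative assumptions rather than anything settled:

```python
# Sketch of a keyed per-page digest along the lines of
# hash(dboid || tsoid || relfilenode || blockno || contents || secret),
# realized as HMAC-SHA256. Field widths are illustrative only.
import hmac
import hashlib
import struct

def page_tag(secret: bytes, dboid: int, tsoid: int,
             relfilenode: int, blockno: int, contents: bytes) -> bytes:
    # Bind the tag to the page's identity as well as its contents,
    # so a page cannot be silently moved or swapped.
    header = struct.pack(">IIII", dboid, tsoid, relfilenode, blockno)
    return hmac.new(secret, header + contents, hashlib.sha256).digest()

secret = b"\x01" * 32  # placeholder key material
tag = page_tag(secret, 16384, 1663, 16385, 0, b"page contents")

# Without the secret, an attacker who alters the page cannot produce
# a matching tag; verification uses a constant-time comparison.
ok = hmac.compare_digest(tag, page_tag(secret, 16384, 1663, 16385, 0,
                                       b"page contents"))
bad = hmac.compare_digest(tag, page_tag(secret, 16384, 1663, 16385, 0,
                                        b"tampered"))
```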
However you do it exactly, it prevents someone from verifying - or faking - a signature unless they have the secret.

> very hard for the attacker to discover) and suddenly you're doing what
> AES-GCM *already* does for you, except you're trying to hack it yourself
> instead of using the tools available which were written by experts.

I am all in favor of using the expert-written tools provided we can figure out how to do it in a way we all agree is correct.

> What this means for your proposal above is that the actual data
> validation information will be generated in two different ways depending
> on if we're using AES-GCM and doing TDE, or if we're doing just the data
> validation piece and not encrypting anything. That's maybe not ideal
> but I don't think it's a huge issue either and your proposal will still
> address the question of if we end up missing anything when it comes to
> how the special area is handled throughout the code.

Hmm. Is there no expert-written method for this sort of thing without encryption?

One thing that I think would be really helpful is to be able to take a TDE-ified cluster and run it through decryption, ending up with a cluster that still has the extra special space but which isn't actually encrypted any more. Ideally it can end up in a state where integrity validation still works. This might be something people just Want To Do, and they're willing to sacrifice the space. But it would also be real nice for testing and debugging. Imagine for example that the data on page X is physiologically corrupted, i.e. decryption produces something that looks like a page, but there's stuff wrong with it, like item pointers that point to a page offset greater than the page size. Well, what you really want to do with this page is run pg_filedump on it, or hexdump, or od, or pg_hexedit, or whatever your favorite tool is, so that you can figure out what's going on, but that's going to be hard if the pages are all encrypted.
I guess nothing in what you are saying really precludes that, but I agree that if we have to switch up the method for creating the integrity verifier in this situation, that's not great.

> If it'd help, I'd be happy to jump on a call to discuss further. Also
> happy to continue on this thread too, of course.

I am finding the written discussion to be helpful right now, and it has the advantage of being easy to refer back to later, so my vote would be to keep doing this for now; we can always reassess if that seems to make sense.

--
Robert Haas
EDB: http://www.enterprisedb.com