Обсуждение: Fast Search on Encrypted Feild
I have a encrypted column use encrypt function.
Querying against this column is almost not acceptable – returning 12 rows took 25,908 ms.
The query was simply Select decrypt(phn_phone_enc) FROM phn WHERE decrypt(phn_phone_enc,’xxx’,’xxx’) LIKE ‘123%’
So I built index like: CREATE INDEX idx_phn_phone_dec ON phn (decrypt(phn_phone_enc, ‘xxx’, ‘xxx’))
This returns 12 rows in 68 ms.
Would this be the solution for the fast encrypted field search or does this raise the security issue?
Kindest regards,
Naoko Reeves
On Sat, Nov 14, 2009 at 4:27 PM, Naoko Reeves <naoko@lawlogix.com> wrote: > I have a encrypted column use encrypt function. > > Querying against this column is almost not acceptable – returning 12 rows > took 25,908 ms. > > The query was simply Select decrypt(phn_phone_enc) FROM phn WHERE > decrypt(phn_phone_enc,’xxx’,’xxx’) LIKE ‘123%’ > > So I built index like: CREATE INDEX idx_phn_phone_dec ON phn > (decrypt(phn_phone_enc, ‘xxx’, ‘xxx’)) > > This returns 12 rows in 68 ms. > > Would this be the solution for the fast encrypted field search or does this > raise the security issue? You are storing the unencrypted phone number in the index...can't do that. As I see it, any solution that needs to support 'LIKE' (or anything other than equality case) is going to be problematic because it has to expose details of the encrypted data to work. It may be possible to rig something...how high are your security requirements? merlin
Merlin, Thank you for your quick response. I see... our security requirements are: We are encrypting PII information within our DB and because of the sensitive nature of our data, we must balancing both performanceand security to meet our client requirements. Our clients are mainly lawyers and handles clients case (government, healthcare, education). If you could provide me any advice that would be great otherwise I understand that I have to go without wildcard search. Naoko > Would this be the solution for the fast encrypted field search or does this > raise the security issue? You are storing the unencrypted phone number in the index...can't do that. As I see it, any solution that needs to support 'LIKE' (or anything other than equality case) is going to be problematic because it has to expose details of the encrypted data to work. It may be possible to rig something...how high are your security requirements? merlin
Naoko Reeves wrote: > > I have a encrypted column use encrypt function. > > Querying against this column is almost not acceptable – returning 12 > rows took 25,908 ms. > > The query was simply Select decrypt(phn_phone_enc) FROM phn WHERE > decrypt(phn_phone_enc,’xxx’,’xxx’) LIKE ‘123%’ > > So I built index like: CREATE INDEX idx_phn_phone_dec ON phn > (decrypt(phn_phone_enc, ‘xxx’, ‘xxx’)) > > This returns 12 rows in 68 ms. > > Would this be the solution for the fast encrypted field search or does > this raise the security issue? > very ugly for all the reasons given by the others. if all your LIKE searches are specifically for the first 3 chars of the phn_phone field, I dunno, but maybe you could have a seperate field (and index) which consists of just the first three characters encrypted. this would, however, allow people with access to the database but without the cipher key to group your data by this prefixm, and potentially the knowlege of that grouping could be used as an aid in cracking the keys.
"Naoko Reeves" <naoko@lawlogix.com> wrote: > > Merlin, > Thank you for your quick response. I see... our security requirements are: > We are encrypting PII information within our DB and because of the sensitive nature of our data, we must balancing bothperformance and security to meet our client requirements. > Our clients are mainly lawyers and handles clients case (government, healthcare, education). > If you could provide me any advice that would be great otherwise I understand that I have to go without wildcard search. For the most part, you have either performance or security, but not both. As others have pointed out, anything you do to speed up searches basically results in storing an unencrypted version of the data in an index. Storing the data redundantly is often worthwhile. For example, in the US, it's generally considered proper to store the SSN encrypted, because it's a target for identity theft, but the last 4 digits of the SSN aren't considered enough information to steal someone's identity, so they are usually stored unencrypted, and thus provide fast searching. Since this convention is so common in the US, most people search on the last 4 digits anyway, so it works out well. Another option that might be worthwhile is deidentifying the data, which is a practice often done for HIPPA-protected information. I was hoping to point you toward a good reference on how to do this on the web, but Google is failing me. The basic technique is to store the protected data in a different table, then encrypt the key that links those two tables together. Of course, the database can no longer enforce referential integrity at that point, and it's totally on your application to manage that. By doing so, you get speedy, indexed searches, and the decryption overhead is only felt when it's time to decrypt the foreign key and link the data. For a query that's selecting 50 rows out of 50,000, this is a huge win, and in the worst case, it's still no worse than encrypting the data itself. -- Bill Moran http://www.potentialtech.com
On 14 Nov 2009, at 22:27, Naoko Reeves wrote: > I have a encrypted column use encrypt function. > Querying against this column is almost not acceptable – returning 12 rows took 25,908 ms. > The query was simply Select decrypt(phn_phone_enc) FROM phn WHERE decrypt(phn_phone_enc,’xxx’,’xxx’) LIKE ‘123%’ > So I built index like: CREATE INDEX idx_phn_phone_dec ON phn (decrypt(phn_phone_enc, ‘xxx’, ‘xxx’)) > This returns 12 rows in 68 ms. > Would this be the solution for the fast encrypted field search or does this raise the security issue? Is there some way you can invert the process? Normally if you verify encrypted data (typically passwords) you would encrypt the user-specified data and compare that tothe encrypted data in the database instead of decrypting both and comparing the actual data. I doubt you can do that withpartial data though, and since you're showing a LIKE expression here... One thing I notice in your query is that you're decrypting your data twice; you're calling two different functions for thesame purpose. You may notice a speed-up if you use decrypt(text) instead of decrypt(text, text, text) in your where clause,provided that the function is defined stable[1] or immutable[2] as opposed to the default volatile[3]. Or is decrypting only a part of the encrypted string significantly faster? That would imply some linear encryption algorithmin which case you may be able to use my initial suggestion: Encrypting '123' would create something that's comparableto the first 3 encrypted characters of an encrypted phone number. A query would look like: SELECT decrypt(phn_phone_enc)FROM phn WHERE phn_phone_enc LIKE encrypt('123')||'%'. Here as well it is important that encrypt(text)is defined stable or immutable. 1. The same input data combined with data from the DB (as it is visible to the transaction) always yields the same result. 2. The same input data always yields the same result. 3. There is no correlation between the input data and the result. Alban Hertroys -- Screwing up is the best way to attach something to the ceiling. !DSPAM:737,4affebf911071302014309!
On Sat, Nov 14, 2009 at 5:08 PM, John R Pierce <pierce@hogranch.com> wrote: > Naoko Reeves wrote: >> >> I have a encrypted column use encrypt function. >> >> Querying against this column is almost not acceptable – returning 12 rows >> took 25,908 ms. >> >> The query was simply Select decrypt(phn_phone_enc) FROM phn WHERE >> decrypt(phn_phone_enc,’xxx’,’xxx’) LIKE ‘123%’ >> >> So I built index like: CREATE INDEX idx_phn_phone_dec ON phn >> (decrypt(phn_phone_enc, ‘xxx’, ‘xxx’)) >> >> This returns 12 rows in 68 ms. >> >> Would this be the solution for the fast encrypted field search or does >> this raise the security issue? >> > > very ugly for all the reasons given by the others. > > if all your LIKE searches are specifically for the first 3 chars of the > phn_phone field, I dunno, but maybe you could have a seperate field (and > index) which consists of just the first three characters encrypted. > > this would, however, allow people with access to the database but without > the cipher key to group your data by this prefixm, and potentially the > knowlege of that grouping could be used as an aid in cracking the keys. I'm not good enough with encryption to know for sure, but I suspect that doing this would give a _determined_ attacker enough information to break into your data (just for starters, you reduce the pool of numbers that would have to be guessed by a few orders of magnitude, and you give away geographical location). By determined I mean someone with a complete copy of your database, time, cpu power, and a bad attitude :-). I'm not saying not to do it, but the risks should be known. If you are just looking to block casual access to the data, it's probably ok. Even a single digit would reduce the brute force search by 90% if the numbers distribute well. That said, I'm betting you can't really budget 25 seconds of cpu time for a common lookup search. So you either have to compromise security or re-think the way you are doing lookups... merlin merlin
As Alban pointed out encrypting the search value and compare stored encrypted value is very fast though it can't do LIKE search. After I received valuable input from Merlin, Bill and John, I did some research regarding "search against encrypted field" in general and as in everyone's advice, I must acknowledge the cost of encrypted data for searching and considering alternative method (partial encryption, store in different table etc). Thank you very again much for all the advice. > I have a encrypted column use encrypt function. > Querying against this column is almost not acceptable - returning 12 rows took 25,908 ms. > The query was simply Select decrypt(phn_phone_enc) FROM phn WHERE decrypt(phn_phone_enc,'xxx','xxx') LIKE '123%' > So I built index like: CREATE INDEX idx_phn_phone_dec ON phn (decrypt(phn_phone_enc, 'xxx', 'xxx')) > This returns 12 rows in 68 ms. > Would this be the solution for the fast encrypted field search or does this raise the security issue? Is there some way you can invert the process? Normally if you verify encrypted data (typically passwords) you would encrypt the user-specified data and compare that to the encrypted data in the database instead of decrypting both and comparing the actual data. I doubt you can do that with partial data though, and since you're showing a LIKE expression here... One thing I notice in your query is that you're decrypting your data twice; you're calling two different functions for the same purpose. You may notice a speed-up if you use decrypt(text) instead of decrypt(text, text, text) in your where clause, provided that the function is defined stable[1] or immutable[2] as opposed to the default volatile[3]. Or is decrypting only a part of the encrypted string significantly faster? That would imply some linear encryption algorithm in which case you may be able to use my initial suggestion: Encrypting '123' would create something that's comparable to the first 3 encrypted characters of an encrypted phone number. A query would look like: SELECT decrypt(phn_phone_enc) FROM phn WHERE phn_phone_enc LIKE encrypt('123')||'%'. Here as well it is important that encrypt(text) is defined stable or immutable. 1. The same input data combined with data from the DB (as it is visible to the transaction) always yields the same result. 2. The same input data always yields the same result. 3. There is no correlation between the input data and the result. Alban Hertroys -- Screwing up is the best way to attach something to the ceiling. !DSPAM:991,4affebf711071508957761!