Обсуждение: Fast Search on Encrypted Feild

Поиск
Список
Период
Сортировка

Fast Search on Encrypted Feild

От
"Naoko Reeves"
Дата:

I have a encrypted column use encrypt function.

Querying against this column is almost not acceptable – returning 12 rows took 25,908 ms.

The query was simply Select decrypt(phn_phone_enc) FROM phn WHERE decrypt(phn_phone_enc,’xxx’,’xxx’) LIKE ‘123%’

So I built index like: CREATE INDEX idx_phn_phone_dec ON phn (decrypt(phn_phone_enc, ‘xxx’, ‘xxx’))

This returns 12 rows in 68 ms.

Would this be the solution for the fast encrypted field search or does this raise the security issue?

 

Kindest regards,

 

Naoko Reeves

Re: Fast Search on Encrypted Feild

От
Merlin Moncure
Дата:
On Sat, Nov 14, 2009 at 4:27 PM, Naoko Reeves <naoko@lawlogix.com> wrote:
> I have a encrypted column use encrypt function.
>
> Querying against this column is almost not acceptable – returning 12 rows
> took 25,908 ms.
>
> The query was simply Select decrypt(phn_phone_enc) FROM phn WHERE
> decrypt(phn_phone_enc,’xxx’,’xxx’) LIKE ‘123%’
>
> So I built index like: CREATE INDEX idx_phn_phone_dec ON phn
> (decrypt(phn_phone_enc, ‘xxx’, ‘xxx’))
>
> This returns 12 rows in 68 ms.
>
> Would this be the solution for the fast encrypted field search or does this
> raise the security issue?

You are storing the unencrypted phone number in the index...can't do
that.   As I see it, any solution that needs to support 'LIKE' (or
anything other than equality case) is going to be problematic because
it has to expose details of the encrypted data to work.  It may be
possible to rig something...how high are your security requirements?

merlin

Re: Fast Search on Encrypted Feild

От
"Naoko Reeves"
Дата:
Merlin,
Thank you for your quick response. I see... our security requirements are:
We are encrypting PII information within our DB and because of the sensitive nature of our data, we must balancing both
performanceand security to meet our client requirements. 
Our clients are mainly lawyers and handles clients case (government, healthcare, education).
If you could provide me any advice that would be great otherwise I understand that I have to go without wildcard
search.

Naoko

> Would this be the solution for the fast encrypted field search or does this
> raise the security issue?

You are storing the unencrypted phone number in the index...can't do
that.   As I see it, any solution that needs to support 'LIKE' (or
anything other than equality case) is going to be problematic because
it has to expose details of the encrypted data to work.  It may be
possible to rig something...how high are your security requirements?

merlin

Re: Fast Search on Encrypted Feild

От
John R Pierce
Дата:
Naoko Reeves wrote:
>
> I have a encrypted column use encrypt function.
>
> Querying against this column is almost not acceptable – returning 12
> rows took 25,908 ms.
>
> The query was simply Select decrypt(phn_phone_enc) FROM phn WHERE
> decrypt(phn_phone_enc,’xxx’,’xxx’) LIKE ‘123%’
>
> So I built index like: CREATE INDEX idx_phn_phone_dec ON phn
> (decrypt(phn_phone_enc, ‘xxx’, ‘xxx’))
>
> This returns 12 rows in 68 ms.
>
> Would this be the solution for the fast encrypted field search or does
> this raise the security issue?
>

very ugly for all the reasons given by the others.

if all your LIKE searches are specifically for the first 3 chars of the
phn_phone field, I dunno, but maybe you could have a seperate field (and
index) which consists of just the first three characters encrypted.

this would, however, allow people with access to the database but
without the cipher key to group your data by this prefixm, and
potentially the knowlege of that grouping could be used as an aid in
cracking the keys.




Re: Fast Search on Encrypted Feild

От
Bill Moran
Дата:
"Naoko Reeves" <naoko@lawlogix.com> wrote:
>
> Merlin,
> Thank you for your quick response. I see... our security requirements are:
> We are encrypting PII information within our DB and because of the sensitive nature of our data, we must balancing
bothperformance and security to meet our client requirements. 
> Our clients are mainly lawyers and handles clients case (government, healthcare, education).
> If you could provide me any advice that would be great otherwise I understand that I have to go without wildcard
search.

For the most part, you have either performance or security, but not both.
As others have pointed out, anything you do to speed up searches basically
results in storing an unencrypted version of the data in an index.

Storing the data redundantly is often worthwhile.  For example, in the US,
it's generally considered proper to store the SSN encrypted, because it's
a target for identity theft, but the last 4 digits of the SSN aren't
considered enough information to steal someone's identity, so they are
usually stored unencrypted, and thus provide fast searching.  Since this
convention is so common in the US, most people search on the last 4 digits
anyway, so it works out well.

Another option that might be worthwhile is deidentifying the data, which is
a practice often done for HIPPA-protected information.  I was hoping to
point you toward a good reference on how to do this on the web, but Google
is failing me.

The basic technique is to store the protected data in a different table, then
encrypt the key that links those two tables together.  Of course, the database
can no longer enforce referential integrity at that point, and it's totally
on your application to manage that.  By doing so, you get speedy, indexed
searches, and the decryption overhead is only felt when it's time to
decrypt the foreign key and link the data.  For a query that's selecting
50 rows out of 50,000, this is a huge win, and in the worst case, it's still
no worse than encrypting the data itself.

--
Bill Moran
http://www.potentialtech.com

Re: Fast Search on Encrypted Feild

От
Alban Hertroys
Дата:
On 14 Nov 2009, at 22:27, Naoko Reeves wrote:

> I have a encrypted column use encrypt function.
> Querying against this column is almost not acceptable – returning 12 rows took 25,908 ms.
> The query was simply Select decrypt(phn_phone_enc) FROM phn WHERE decrypt(phn_phone_enc,’xxx’,’xxx’) LIKE ‘123%’
> So I built index like: CREATE INDEX idx_phn_phone_dec ON phn (decrypt(phn_phone_enc, ‘xxx’, ‘xxx’))
> This returns 12 rows in 68 ms.
> Would this be the solution for the fast encrypted field search or does this raise the security issue?

Is there some way you can invert the process?
Normally if you verify encrypted data (typically passwords) you would encrypt the user-specified data and compare that
tothe encrypted data in the database instead of decrypting both and comparing the actual data. I doubt you can do that
withpartial data though, and since you're showing a LIKE expression here... 

One thing I notice in your query is that you're decrypting your data twice; you're calling two different functions for
thesame purpose. You may notice a speed-up if you use decrypt(text) instead of decrypt(text, text, text) in your where
clause,provided that the function is defined stable[1] or immutable[2] as opposed to the default volatile[3]. 

Or is decrypting only a part of the encrypted string significantly faster? That would imply some linear encryption
algorithmin which case you may be able to use my initial suggestion: Encrypting '123' would create something that's
comparableto the first 3 encrypted characters of an encrypted phone number. A query would look like: SELECT
decrypt(phn_phone_enc)FROM phn WHERE phn_phone_enc LIKE encrypt('123')||'%'. Here as well it is important that
encrypt(text)is defined stable or immutable. 

1. The same input data combined with data from the DB (as it is visible to the transaction) always yields the same
result.
2. The same input data always yields the same result.
3. There is no correlation between the input data and the result.

Alban Hertroys

--
Screwing up is the best way to attach something to the ceiling.


!DSPAM:737,4affebf911071302014309!



Re: Fast Search on Encrypted Feild

От
Merlin Moncure
Дата:
On Sat, Nov 14, 2009 at 5:08 PM, John R Pierce <pierce@hogranch.com> wrote:
> Naoko Reeves wrote:
>>
>> I have a encrypted column use encrypt function.
>>
>> Querying against this column is almost not acceptable – returning 12 rows
>> took 25,908 ms.
>>
>> The query was simply Select decrypt(phn_phone_enc) FROM phn WHERE
>> decrypt(phn_phone_enc,’xxx’,’xxx’) LIKE ‘123%’
>>
>> So I built index like: CREATE INDEX idx_phn_phone_dec ON phn
>> (decrypt(phn_phone_enc, ‘xxx’, ‘xxx’))
>>
>> This returns 12 rows in 68 ms.
>>
>> Would this be the solution for the fast encrypted field search or does
>> this raise the security issue?
>>
>
> very ugly for all the reasons given by the others.
>
> if all your LIKE searches are specifically for the first 3 chars of the
> phn_phone field, I dunno, but maybe you could have a seperate field (and
> index) which consists of just the first three characters encrypted.
>
> this would, however, allow people with access to the database but without
> the cipher key to group your data by this prefixm, and potentially the
> knowlege of that grouping could be used as an aid in cracking the keys.

I'm not good enough with encryption to know for sure, but I suspect
that doing this would give a _determined_ attacker enough information
to break into your data (just for starters, you reduce the pool of
numbers that would have to be guessed by a few orders of magnitude,
and you give away geographical location).  By determined I mean
someone with a complete copy of your database, time, cpu power, and a
bad attitude :-).  I'm not saying not to do it, but the risks should
be known.  If you are just looking to block casual access to the data,
it's probably ok.  Even a single digit would reduce the brute force
search by 90% if the numbers distribute well.

That said, I'm betting you can't really budget 25 seconds of cpu time
for a common lookup search.  So you either have to compromise security
or re-think the way you are doing lookups...

merlin

merlin

Re: Fast Search on Encrypted Feild

От
"Naoko Reeves"
Дата:
As Alban pointed out encrypting the search value and compare stored
encrypted value is very fast though it can't do LIKE search.
After I received valuable input from Merlin, Bill and John, I did some
research regarding "search against encrypted field" in general and as in
everyone's advice, I must acknowledge the cost of encrypted data for
searching and considering alternative method (partial encryption, store
in different table etc).
Thank you very again much for all the advice.

> I have a encrypted column use encrypt function.
> Querying against this column is almost not acceptable - returning 12
rows took 25,908 ms.
> The query was simply Select decrypt(phn_phone_enc) FROM phn WHERE
decrypt(phn_phone_enc,'xxx','xxx') LIKE '123%'
> So I built index like: CREATE INDEX idx_phn_phone_dec ON phn
(decrypt(phn_phone_enc, 'xxx', 'xxx'))
> This returns 12 rows in 68 ms.
> Would this be the solution for the fast encrypted field search or does
this raise the security issue?

Is there some way you can invert the process?
Normally if you verify encrypted data (typically passwords) you would
encrypt the user-specified data and compare that to the encrypted data
in the database instead of decrypting both and comparing the actual
data. I doubt you can do that with partial data though, and since you're
showing a LIKE expression here...

One thing I notice in your query is that you're decrypting your data
twice; you're calling two different functions for the same purpose. You
may notice a speed-up if you use decrypt(text) instead of decrypt(text,
text, text) in your where clause, provided that the function is defined
stable[1] or immutable[2] as opposed to the default volatile[3].

Or is decrypting only a part of the encrypted string significantly
faster? That would imply some linear encryption algorithm in which case
you may be able to use my initial suggestion: Encrypting '123' would
create something that's comparable to the first 3 encrypted characters
of an encrypted phone number. A query would look like: SELECT
decrypt(phn_phone_enc) FROM phn WHERE phn_phone_enc LIKE
encrypt('123')||'%'. Here as well it is important that encrypt(text) is
defined stable or immutable.

1. The same input data combined with data from the DB (as it is visible
to the transaction) always yields the same result.
2. The same input data always yields the same result.
3. There is no correlation between the input data and the result.

Alban Hertroys

--
Screwing up is the best way to attach something to the ceiling.


!DSPAM:991,4affebf711071508957761!