hash options

From: Little, Douglas
Subject: hash options
Msg-id 8585BA53443004458E0BAA6134C5A7FB9CD327F6@EGEXCMB01.oww.root.lcl
Responses: Re: hash options (Chris Angelico <rosuav@gmail.com>)
Re: hash options (David W Noon <dwnoon@ntlworld.com>)
List: pgsql-general

Hello,

I’m working on a data warehouse dimensionalization process where I need to hash a text string to use as the key.

I’ve implemented it with MD5. It works fine; the problem is that the MD5 hash (32 bytes as hex text) is often longer than the original string, which defeats the purpose: saving space.
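For reference, the 32 bytes are the hexadecimal text form of the hash, which is what PostgreSQL's md5() returns; the underlying MD5 digest is only 16 bytes. A minimal sketch using Python's standard hashlib module (not the poster's PL/pgSQL function), with the 6-byte value ar=514 as an illustrative input:

```python
import hashlib

text = "ar=514"
data = text.encode("utf-8")

hex_digest = hashlib.md5(data).hexdigest()  # text form: 2 hex characters per byte
raw_digest = hashlib.md5(data).digest()     # binary form: the actual 16-byte digest

print(len(data), len(hex_digest), len(raw_digest))  # 6 32 16
```

Storing the raw bytes (for example in a bytea column) halves the size of the hash itself, although for a 6-byte input either form is still longer than the original.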

Does anybody have alternative hash function recommendations?  

I looked at the options I knew of:

select length(encode('ar=514', 'hex'));     -- 12
select length(encode('ar=514', 'base64'));  -- 8
select length(digest('ar=514', 'md5'));     -- 16 bytes
select length(digest('ar=514', 'sha1'));    -- 20 bytes
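These digest lengths are fixed by the algorithm, not by the input: pgcrypto's digest() returns raw bytes, so MD5 is always 16 bytes and SHA-1 always 20. A sketch in Python's hashlib reproducing those sizes, plus BLAKE2b, whose digest_size parameter lets the caller request a shorter output; the 8-byte size is an illustrative assumption:

```python
import hashlib

data = "ar=514".encode("utf-8")

# Raw (binary) digest length in bytes for a few fixed-size algorithms.
sizes = {name: len(hashlib.new(name, data).digest())
         for name in ("md5", "sha1", "sha256")}

# BLAKE2b accepts a caller-chosen digest size from 1 to 64 bytes.
sizes["blake2b-8"] = len(hashlib.blake2b(data, digest_size=8).digest())

print(sizes)  # {'md5': 16, 'sha1': 20, 'sha256': 32, 'blake2b-8': 8}
```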

The function is currently written in PL/pgSQL, but I’m considering switching to Python for a broader choice of libraries.
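If the function moves to Python, one option is a fixed 8-byte key that fits a PostgreSQL bigint column. The sketch assumes BLAKE2b from hashlib with digest_size=8; surrogate_key is a hypothetical helper name, and this is one possible design rather than the poster's implementation:

```python
import hashlib


def surrogate_key(text: str) -> int:
    """Hypothetical helper: hash text to a signed 64-bit integer.

    Assumes an 8-byte BLAKE2b digest so the result fits a bigint column.
    """
    digest = hashlib.blake2b(text.encode("utf-8"), digest_size=8).digest()
    # Read the 8 bytes as a big-endian two's-complement integer, giving a
    # value in the bigint range [-2**63, 2**63 - 1].
    return int.from_bytes(digest, byteorder="big", signed=True)


row = "ar=514,cc=CA,ci=Montreal"
key = surrogate_key(row)
print(key)
```

The key is the same size for every input, but 64 bits collide far more readily than a 16-byte MD5, so a unique constraint or a collision check on the dimension table would still be prudent.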

The source data is a comma-delimited list of name/value pairs, ranging from 0 to 2500 bytes in length, for example:

ar=514,cc=CA,ci=Montreal,cn=North+America,co=Sympatico,cs=Canada,nt=Xdsl,rc=QC,rs=Quebec,tp=High,tz=GMT%2D5
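Because inputs can be as short as 0 bytes, any fixed-size key will be larger than some source strings; hashing only saves space on the longer rows (the sample row is 107 bytes). Shrinking the key also raises the chance of collisions. A rough sizing sketch using the standard birthday-bound approximation p ≈ 1 - exp(-n(n-1) / 2^(b+1)) for n keys of b bits, assuming uniformly distributed hash values; the 10 million row count is a hypothetical figure:

```python
import math


def collision_probability(n_keys: int, bits: int) -> float:
    """Birthday-bound approximation of the chance that at least two of
    n_keys uniformly distributed `bits`-bit hash values are equal."""
    space = 2.0 ** bits
    # p ~= 1 - exp(-n(n-1) / (2 * space)); expm1 keeps tiny values accurate.
    return -math.expm1(-n_keys * (n_keys - 1) / (2.0 * space))


# Hypothetical dimension size: 10 million distinct source strings.
n = 10_000_000
for bits in (32, 64, 128):
    print(f"{bits}-bit key: p(collision) ~ {collision_probability(n, bits):.3g}")
```

Under these assumptions a 32-bit key is almost certain to collide, a 64-bit key has a collision chance of a few in a million, and a 128-bit key (the size of MD5) is negligible.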

Thanks in advance

Doug Little

Sr. Data Warehouse Architect | Business Intelligence Architecture | Orbitz Worldwide

Douglas.Little@orbitz.com

orbitz.com | ebookers.com | hotelclub.com | cheaptickets.com | ratestogo.com | asiahotels.com
