Обсуждение: Need help requiring uniqueness in text columns
I have a table MESSAGE with a text column and a timestamp column. I want to make sure that I have no rows with duplicates for both values. I have two ideas right now for how to do this. IDEA 1: CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, my_text_col); IDEA 2: CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col)); I am speculating that an index on the md5 is cheaper than on a text column. I'm willing to risk the chance of a hash collision. I don't want to use this index to allow searching inside the text column. I just want to protect against duplication. Are either of these approaches any good? What are other ways to guarantee uniqueness for the pair of my timestamp column and my text column? TIA Matt -- Programming, economics, gardening, life in Cleveland. http://blog.tplus1.com
Matthew Wilson wrote: > I have a table MESSAGE with a text column and a timestamp column. I > want to make sure that I have no rows with duplicates for both values. > I have two ideas right now for how to do this. > > IDEA 1: > > CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, my_text_col); > > IDEA 2: > > CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col)); > > I am speculating that an index on the md5 is cheaper than on a text > column. I'm willing to risk the chance of a hash collision. > > I don't want to use this index to allow searching inside the text > column. I just want to protect against duplication. > > Are either of these approaches any good? What are other ways to > guarantee uniqueness for the pair of my timestamp column and my text > column? Idea 1 will only work if your text columns never grow too long - I can't recall what the limit is off the top of my head but PG will issue warnings when you reach it. Idea 2 works. You can even use it for exact match searching if you like if you do WHERE my_timestamp_col='...' AND md5(my_text_col)=md5('...') Depending on your app, you might want to round your timestamps to avoid getting two identical messages in the system only a millisecond apart. -- Stuart Bishop <stuart@stuartbishop.net> http://www.stuartbishop.net/
Вложения
Hello IDEA 3: Use two hash functions: CREATE UNIQUE INDEX uidx ON TEST((decode(md5(a),'hex')),(hashtext(a))); removing spaces helps CREATE UNIQUE INDEX uidx ON test((decode(md5(lower(replace(a,' ',''))),'hex'))); Regards Pavel Stehule CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col)); > On 02/01/2008, Matthew Wilson <matt@tplus1.com> wrote: > I have a table MESSAGE with a text column and a timestamp column. I > want to make sure that I have no rows with duplicates for both values. > I have two ideas right now for how to do this. > > IDEA 1: > > CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, my_text_col); > > IDEA 2: > > CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col)); > > I am speculating that an index on the md5 is cheaper than on a text > column. I'm willing to risk the chance of a hash collision. > > I don't want to use this index to allow searching inside the text > column. I just want to protect against duplication. > > Are either of these approaches any good? What are other ways to > guarantee uniqueness for the pair of my timestamp column and my text > column? > > TIA > > Matt > > > -- > Programming, economics, gardening, life in Cleveland. > http://blog.tplus1.com > > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match >
On Wed 02 Jan 2008 04:23:46 AM EST, Pavel Stehule wrote: > Hello > > IDEA 3: > > Use two hash functions: > > CREATE UNIQUE INDEX uidx ON TEST((decode(md5(a),'hex')),(hashtext(a))); > > removing spaces helps > CREATE UNIQUE INDEX uidx ON test((decode(md5(lower(replace(a,' ',''))),'hex'))); > > Regards > Pavel Stehule > > CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col)); What is the advantage of this method? -- Programming, economics, gardening, life in Cleveland. http://blog.tplus1.com
On 02/01/2008, Matthew Wilson <matt@tplus1.com> wrote: > On Wed 02 Jan 2008 04:23:46 AM EST, Pavel Stehule wrote: > > Hello > > > > IDEA 3: > > > > Use two hash functions: > > > > CREATE UNIQUE INDEX uidx ON TEST((decode(md5(a),'hex')),(hashtext(a))); > > > > removing spaces helps > > CREATE UNIQUE INDEX uidx ON test((decode(md5(lower(replace(a,' ',''))),'hex'))); > > > > Regards > > Pavel Stehule > > > > CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col)); > > What is the advantage of this method? > much less unique index. Regards Pavel > -- > Programming, economics, gardening, life in Cleveland. > http://blog.tplus1.com > > > ---------------------------(end of broadcast)--------------------------- > TIP 4: Have you searched our list archives? > > http://archives.postgresql.org/ >