Обсуждение: Need help requiring uniqueness in text columns

Поиск
Список
Период
Сортировка

Need help requiring uniqueness in text columns

От
Matthew Wilson
Дата:
I have a table MESSAGE with a text column and a timestamp column.  I
want to make sure that I have no rows with duplicates for both values.
I have two ideas right now for how to do this.

IDEA 1:

CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, my_text_col);

IDEA 2:

CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col));

I am speculating that an index on the md5 is cheaper than on a text
column.  I'm willing to risk the chance of a hash collision.

I don't want to use this index to allow searching inside the text
column.  I just want to protect against duplication.

Are either of these approaches any good?  What are other ways to
guarantee uniqueness for the pair of my timestamp column and my text
column?

TIA

Matt


--
Programming, economics, gardening, life in Cleveland.
http://blog.tplus1.com

Re: Need help requiring uniqueness in text columns

От
Stuart Bishop
Дата:
Matthew Wilson wrote:
> I have a table MESSAGE with a text column and a timestamp column.  I
> want to make sure that I have no rows with duplicates for both values.
> I have two ideas right now for how to do this.
>
> IDEA 1:
>
> CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, my_text_col);
>
> IDEA 2:
>
> CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col));
>
> I am speculating that an index on the md5 is cheaper than on a text
> column.  I'm willing to risk the chance of a hash collision.
>
> I don't want to use this index to allow searching inside the text
> column.  I just want to protect against duplication.
>
> Are either of these approaches any good?  What are other ways to
> guarantee uniqueness for the pair of my timestamp column and my text
> column?

Idea 1 will only work if your text columns never grow too long - I can't
recall what the limit is off the top of my head but PG will issue warnings
when you reach it.

Idea 2 works. You can even use it for exact match searching if you like if
you do WHERE my_timestamp_col='...' AND md5(my_text_col)=md5('...')

Depending on your app, you might want to round your timestamps to avoid
getting two identical messages in the system only a millisecond apart.

--
Stuart Bishop <stuart@stuartbishop.net>
http://www.stuartbishop.net/


Вложения

Re: Need help requiring uniqueness in text columns

От
"Pavel Stehule"
Дата:
Hello

IDEA 3:

Use two hash functions:

CREATE UNIQUE INDEX uidx ON TEST((decode(md5(a),'hex')),(hashtext(a)));

removing spaces helps
CREATE UNIQUE INDEX uidx ON test((decode(md5(lower(replace(a,' ',''))),'hex')));

Regards
Pavel Stehule

CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col));
>

On 02/01/2008, Matthew Wilson <matt@tplus1.com> wrote:
> I have a table MESSAGE with a text column and a timestamp column.  I
> want to make sure that I have no rows with duplicates for both values.
> I have two ideas right now for how to do this.
>
> IDEA 1:
>
> CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, my_text_col);
>
> IDEA 2:
>
> CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col));
>
> I am speculating that an index on the md5 is cheaper than on a text
> column.  I'm willing to risk the chance of a hash collision.
>
> I don't want to use this index to allow searching inside the text
> column.  I just want to protect against duplication.
>
> Are either of these approaches any good?  What are other ways to
> guarantee uniqueness for the pair of my timestamp column and my text
> column?
>
> TIA
>
> Matt
>
>
> --
> Programming, economics, gardening, life in Cleveland.
> http://blog.tplus1.com
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>        choose an index scan if your joining column's datatypes do not
>        match
>

Re: Need help requiring uniqueness in text columns

От
Matthew Wilson
Дата:
On Wed 02 Jan 2008 04:23:46 AM EST, Pavel Stehule wrote:
> Hello
>
> IDEA 3:
>
> Use two hash functions:
>
> CREATE UNIQUE INDEX uidx ON TEST((decode(md5(a),'hex')),(hashtext(a)));
>
> removing spaces helps
> CREATE UNIQUE INDEX uidx ON test((decode(md5(lower(replace(a,' ',''))),'hex')));
>
> Regards
> Pavel Stehule
>
> CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col));

What is the advantage of this method?

--
Programming, economics, gardening, life in Cleveland.
http://blog.tplus1.com

Re: Need help requiring uniqueness in text columns

От
"Pavel Stehule"
Дата:
On 02/01/2008, Matthew Wilson <matt@tplus1.com> wrote:
> On Wed 02 Jan 2008 04:23:46 AM EST, Pavel Stehule wrote:
> > Hello
> >
> > IDEA 3:
> >
> > Use two hash functions:
> >
> > CREATE UNIQUE INDEX uidx ON TEST((decode(md5(a),'hex')),(hashtext(a)));
> >
> > removing spaces helps
> > CREATE UNIQUE INDEX uidx ON test((decode(md5(lower(replace(a,' ',''))),'hex')));
> >
> > Regards
> > Pavel Stehule
> >
> > CREATE UNIQUE INDEX nodups on MESSAGE (my_timestamp_col, md5(my_text_col));
>
> What is the advantage of this method?
>

much less unique index.

Regards
Pavel

> --
> Programming, economics, gardening, life in Cleveland.
> http://blog.tplus1.com
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org/
>