Re: Determining logically unique entities across many partially complete rows where at least one column matches

Поиск
Список
Период
Сортировка
От Rob Sargent
Тема Re: Determining logically unique entities across many partially complete rows where at least one column matches
Дата
Msg-id 4A819103.8040102@gmail.com
обсуждение исходный текст
Ответ на Determining logically unique entities across many partially complete rows where at least one column matches  (Jamie Tufnell <diesql@googlemail.com>)
Список pgsql-sql
Seems to me that if you can safely identify which snippets correspond to 
a given entity you want a single id for the entity.  An entity-snippet 
relationship seems a must. I would not lean too heavily on a single 
table solution unless you're considering arrays for openid,email and 
phone.  (And given the one-to-many-real-people on phone I would be leery 
of this "identifier".)

Jamie Tufnell wrote:
> Hi,
>
> I am faced with a modeling problem and thought I'd see if anyone has run
> into something similar and can offer some advice.
>
> Basically my problem domain is cataloguing "snippets of information" about
> "entities" which are loosely identified.
>
> Entities can be identified up to 3 different methods (email, phone or
> openid.)
>
> Entities can have zero or many emails, phone numbers and openids.  The
> only restriction is they must have at least one value in one of those three
> columns.
>
>
> Some sample data:
>
> snippet #1
> email: null
> phone: +1234567890
> openid: john@myopenid.net
> information: This is snippet #1
>
> snippet #2
> email: foo@bar.com
> phone: null
> openid: johnny.name
> information: This is snippet #2
>
> At this point snippet #1 and #2 could refer to different entities.
>
> snippet #3
> email: bar@baz.com
> phone: +1234567890
> openid: johnny.name
> information: This is snippet #3
>
> But now all three snippets definitely refer to the same entity, as far as
> we're concerned:
>
> Entity: 1
> OpenIDs: johnny.name, john@myopenid.net
> Phones: +1234567890
> Emails: foo@bar.com, bar@baz.com
>
> So as far as modeling this goes, I'm stuck between:
>
> 1. Normalizing as usual with some serious triggers to maintain the
>    relationships.
> 2. Just having a snippets table with these fields inline and make these
>    inferences at query time.
> 3. Something in between.
> 4. Using a document store like CouchDB.
>
> The kinds of queries I need to do right now (which will no doubt change):
>
> * Return all snippets.
> * Return all distinct entities.
> * Find all id for a distinct entity given a single piece of id.
> * Find all snippets for a distinct entity.
>
> To do it in one table, I am thinking something like this:
>
> create table snippets (
>   id serial not null primary key,
>   email text,
>   phone_number text,
>   openid text,
>   information text not null,
>   check (email is not null or
>          phone_number is not null or openid is not null)
> );
>
> with queries like:
>
> * Find all snippets for one distinct entity, searching by openid:
>
> select * from snippets
> where phone_number =
>   (select phone_number from snippets where openid = 'john@myopenid.net')
> or email =
>   (select email from snippets where openid = 'john@myopenid.net')
> or openid in
>   (select openid from snippets
>    where phone_number =
>     (select phone_number from snippets where openid = 'john@myopenid.net')
>    or email =
>     (select email from snippets where openid = 'john@myopenid.net'));
>
>
> Or if I was to model as usual I am thinking something like this:
>
> create table entities (
>   id serial not null primary key
> );
>
> create table entity_has_email (
>   entity_id integer not null
>     references entities (id) on delete cascade on update cascade,
>   email text not null unique
> );
>
> create table entity_has_phone_number (
>   entity_id integer not null
>     references entities (id) on delete cascade on update cascade,
>   phone_number text not null unique
> );
>
> create table entity_has_openid (
>   entity_id integer not null
>     references entities (id) on delete cascade on update cascade,
>   openid text not null unique
> );
>
> create table snippets (
>   id serial not null primary key,
>   entity_id integer not null
>     references entities (id) on delete cascade on update cascade,
>   information text not null
> );
>
> (followed by a mass of on insert/update/delete triggers)
>
> with queries like:
>
> * Find all snippets for a distinct entity, by one identifying field:
>
> select s.* from snippets s
>   join entity_has_email e on s.entity_id = e.id
>   join entity_has_phone_number p on s.entity_id = p.id
>   join entity_has_openid o on s.entity_id = o.id
> where o.openid = 'john@myopenid.net';
>
> Another option, sort of half way between the two could be:
>
> create table snippets (
>   id serial not null primary key,
>   entity_id integer not null
>     references entities (id) on delete cascade on update cascade,
>   information text not null
> );
>
> create table entities (
>   id serial not null primary key,
>   email text,
>   phone_number text,
>   openid text,
>   check (email is not null or
>          phone_number is not null or openid is not null)
> );
>
> * Find all snippets for a distinct entity, by openid = 'john@myopenid.net'
>
> select * from snippets
> where entity_id in (
>   select id from entities
>   where phone_number =
>     (select phone_number from entities where openid = 'john@myopenid.net')
>   or email =
>     (select email from entities where openid = 'john@myopenid.net')
>   or openid in
>     (select openid from entities
>      where phone_number =
>        (select phone_number from entities where openid = 'john@myopenid.net'
>      or email =
>        (select email from entities where openid = 'john@myopenid.net') ));
>
> At this point I am leaning towards that last method.
>
> Has anyone had to model something similar?  Did you use one of these
> methods or something else?
>
> Any/all comments appreciated!
>
> J
>
>   


В списке pgsql-sql по дате отправления:

Предыдущее
От: Jasen Betts
Дата:
Сообщение: Re: Determining logically unique entities across many partially complete rows where at least one column matches
Следующее
От: "Peter Headland"
Дата:
Сообщение: Re: Updating a specific number of rows in pl/pgsql