Обсуждение: Thoughts on a surrogate key lookup function?

Поиск
Список
Период
Сортировка

Thoughts on a surrogate key lookup function?

От
Nick
Дата:
Are there any existing trigger functions (preferably C) that could
retrieve a missing value for a compound foreign key on insert or
update? If this overall sounds like a really bad idea, please let me
know as well. This functionality could really speed my project up
though.

For example,

CREATE TABLE email_addresses (id BIGSERIAL, email_address VARCHAR);
ALTER TABLE email_addresses ADD CONSTRAINT _email_addresses_pkey
PRIMARY KEY (id,email_address);
CREATE TABLE users (id BIGSERIAL, email_address_id BIGINT,
email_address VARCHAR);
ALTER TABLE users ADD CONSTRAINT users_fkey_email_address_id FOREIGN
KEY (email_address_id,email_address) REFERENCES
email_addresses(id,email_address) ON UPDATE CASCADE ON DELETE SET
NULL;
CREATE TRIGGER "1-auto_email_address_id" BEFORE INSERT OR UPDATE ON
users FOR EACH ROW EXECUTE PROCEDURE _auto_id('email_address_id');

If I would like to insert a new user AND new email_address I would
assign the email_address_id of NULL or -1.

INSERT INTO users (id, email_address_id, email_address) VALUES
(1,-1,'foo@bar.com') which would do...

SELECT id FROM email_addresses WHERE email_address = 'foo@bar.com'
INTO NEW.email_address_id

If it cannot find a value, it then does...

INSERT INTO email_addresses (email_address) VALUES ('foo@bar.com')
RETURNING id INTO NEW.email_address_id

If I would like to insert a new user and existing email address, I
would assign the email_address_id of 0.

INSERT INTO users (id, email_address_id, email_address) VALUES
(2,0,'foo@bar.com') which would...

SELECT id FROM email_addresses WHERE email_address = 'foo@bar.com'
INTO NEW.email_address_id

If it cannot find a value, it will raise an exception.

If I insert or update users and email_address_id is > 0 then it gets
the natual value by id...

INSERT INTO users (id, email_address_id, email_address) VALUES
(3,2,NULL) which will

SELECT email_address FROM email_addresses WHERE id = 2 INTO
NEW.email_address

And if both email_address_id and email_address are NULL then, both
values just get inserted into users as null.

Declaring the surrogate as -1 (select or insert) or 0 (select) would
save time having to lookup or create the value before inserting into
users. Ive been using a plperl function for this and really like the
results but im wondering if theres a faster way.

The foreign key constraint already scans the email_addresses table for
values so im wondering if theres a way to bootstrap that process or
maybe thats too risky? Any thoughts would be greatly appreciated.

Re: Thoughts on a surrogate key lookup function?

От
Merlin Moncure
Дата:
On Fri, Nov 5, 2010 at 5:27 PM, Nick <nboutelier@gmail.com> wrote:
> Are there any existing trigger functions (preferably C) that could
> retrieve a missing value for a compound foreign key on insert or
> update? If this overall sounds like a really bad idea, please let me
> know as well. This functionality could really speed my project up
> though.

I think your issues are really SQL issues.  See my comments below:

> For example,
>
> CREATE TABLE email_addresses (id BIGSERIAL, email_address VARCHAR);
> ALTER TABLE email_addresses ADD CONSTRAINT _email_addresses_pkey
> PRIMARY KEY (id,email_address);

email addresses should be unique, so this is pretty silly.  You are
not getting fast lookups on email which is what you need.  You have
two choices here: keep the surrogate on email_addresses, in which case
I'd do the tables like this:
CREATE TABLE email_addresses
(
  email_address_id BIGSERIAL primary key,
  email_address VARCHAR unique
);

create table users
(
  user_id BIGSERIAL primary key,
  email_address_id BIGINT references email_addresses on delete cascade/set null,
)

your insert will look like this (pseudo code):
select email_address_id from email_addresses where email_address =
'foo@foo.com';

if not found then
  insert into email_addresses(email_address) returning email_address_id;
else
  insert into users(email_address_id) values (resolved_id)
end if;

OR, you can go the natural route (which tend to prefer):
CREATE TABLE email_addresses
(
  email_address VARCHAR primary key
);

create table users
(
  user_id BIGSERIAL primary key,
  email_address VARCHAR references email_addresses on update cascade
on delete cascade/set null,
)

your insert will look like this (pseudo code):
insert into email_addresses(email_address)
  select 'foo@foo.com' where not exists
   (select 1 from email_addresses where email_address = 'foo@foo.com')

insert into users (email_address) values ('foo@foo.com');

Obviously this is a rough sketch, you may need to consider locking,
contention, etc.  But a trigger is overkill for this problem.

merlin

Re: Thoughts on a surrogate key lookup function?

От
Nick
Дата:
Merlin, thanks for the reply. Yes, using email_addresses was a very
silly example. Maybe the following is a better example...

CREATE TABLE first_names (id INT, first_name VARCHAR);
ALTER TABLE first_names ADD CONSTRAINT first_names_pkey PRIMARY KEY
(id,first_name);
ALTER TABLE first_names ADD CONSTRAINT first_names_unique_first_name
UNIQUE (first_name);

CREATE TABLE last_names (id INT, first_name VARCHAR);
ALTER TABLE last_names ADD CONSTRAINT last_names_pkey PRIMARY KEY
(id,last_name);
ALTER TABLE last_names ADD CONSTRAINT last_names_unique_last_name
UNIQUE (last_name);

CREATE TABLE referrals (id INT, ref_code VARCHAR);
ALTER TABLE referrals ADD CONSTRAINT referrals_pkey PRIMARY KEY
(id,ref_code);
ALTER TABLE referrals ADD CONSTRAINT referrals_unique_ref_code UNIQUE
(ref_code);

CREATE TABLE users (id INT, first_name_id INT, first_name VARCHAR,
last_name_id INT, last_name VARCHAR, ref_code_id INT, ref_code DATE);
ALTER TABLE users ADD CONSTRAINT users_fkey_first_names FOREIGN KEY
(first_name_id,first_name) REFERENCES first_names(id,first_name) ON
UPDATE CASCADE ON DELETE SET NULL;
ALTER TABLE users ADD CONSTRAINT users_fkey_last_names FOREIGN KEY
(last_name_id,last_name) REFERENCES last_names(id,last_name) ON UPDATE
CASCADE ON DELETE SET NULL;
ALTER TABLE users ADD CONSTRAINT users_fkey_referrals FOREIGN KEY
(ref_code_id,ref_code) REFERENCES referrals(id,ref_code) ON UPDATE
CASCADE ON DELETE SET NULL;
CREATE TRIGGER "auto_first_name_id" BEFORE INSERT OR UPDATE ON users
FOR EACH ROW EXECUTE PROCEDURE _auto_id('first_name_id');
CREATE TRIGGER "auto_last_name_id" BEFORE INSERT OR UPDATE ON users
FOR EACH ROW EXECUTE PROCEDURE _auto_id('last_name_id');
CREATE TRIGGER "auto_ref_code_id" BEFORE INSERT OR UPDATE ON users FOR
EACH ROW EXECUTE PROCEDURE _auto_id('ref_code_id');

If I would like to insert a new user, first name, last name I would
give the surrogates a value of NULL or -1. Their referral code must
exist so ill give that surrogate a value of 0.
INSERT INTO users (id, first_name_id, first_name, last_name_id,
last_name, ref_code_id, ref_code) VALUES (1,-1,'John',-1,'Doe',
0,'xyz') which would...

SELECT id FROM first_names WHERE first_name = 'John' INTO
NEW.first_name_id
IF NOT FOUND INSERT INTO first_names (first_name) VALUES ('John')
RETURNING id INTO NEW.first_name_id

SELECT id FROM last_names WHERE last_name = 'Doe' INTO
NEW.last_name_id
IF NOT FOUND INSERT INTO last_names (last_name) VALUES ('Doe')
RETURNING id INTO NEW.last_name_id

SELECT id FROM referral_codes WHERE ref_code = 'xyz' INTO
NEW.ref_code_id
IF NOT FOUND raise exception

If I want to insert the new user John Smith, and I already know the
surrogate value for John and I dont want to add a ref_code then I can
do...
INSERT INTO users (id, first_name_id, first_name, last_name_id,
last_name, ref_code_id, ref_code) VALUES
(2,1,NULL,-1,'Smith',NULL,NULL) which would...

SELECT first_name FROM first_names WHERE id = 1 INTO NEW.first_name
IF NOT FOUND raise exception

SELECT id FROM last_names WHERE last_name = 'Smith' INTO
NEW.last_name_id
IF NOT FOUND INSERT INTO last_names (last_name) VALUES ('Smith')
RETURNING id INTO NEW.last_name_id

So by adding both the surrogate and natural keys to users table and
toggling the surrogate on insert by 0 (must exist) or -1 (select or
insert) I can bypass a much more complex insert statement. Is this
frowned upon? I havent had many issues (but some ive been able to work
around) with this as a plperl trigger and am pleased with how much
easier it makes my inserts (besides the execution speed).

-Nick


On Nov 6, 6:28 am, mmonc...@gmail.com (Merlin Moncure) wrote:
> On Fri, Nov 5, 2010 at 5:27 PM, Nick <nboutel...@gmail.com> wrote:
> > Are there any existing trigger functions (preferably C) that could
> > retrieve a missing value for a compound foreign key on insert or
> > update? If this overall sounds like a really bad idea, please let me
> > know as well. This functionality could really speed my project up
> > though.
>
> I think your issues are really SQL issues.  See my comments below:
>
> > For example,
>
> > CREATE TABLE email_addresses (id BIGSERIAL, email_address VARCHAR);
> > ALTER TABLE email_addresses ADD CONSTRAINT _email_addresses_pkey
> > PRIMARY KEY (id,email_address);
>
> email addresses should be unique, so this is pretty silly.  You are
> not getting fast lookups on email which is what you need.  You have
> two choices here: keep the surrogate on email_addresses, in which case
> I'd do the tables like this:
> CREATE TABLE email_addresses
> (
>   email_address_id BIGSERIAL primary key,
>   email_address VARCHAR unique
> );
>
> create table users
> (
>   user_id BIGSERIAL primary key,
>   email_address_id BIGINT references email_addresses on delete cascade/set null,
> )
>
> your insert will look like this (pseudo code):
> select email_address_id from email_addresses where email_address =
> '...@foo.com';
>
> if not found then
>   insert into email_addresses(email_address) returning email_address_id;
> else
>   insert into users(email_address_id) values (resolved_id)
> end if;
>
> OR, you can go the natural route (which tend to prefer):
> CREATE TABLE email_addresses
> (
>   email_address VARCHAR primary key
> );
>
> create table users
> (
>   user_id BIGSERIAL primary key,
>   email_address VARCHAR references email_addresses on update cascade
> on delete cascade/set null,
> )
>
> your insert will look like this (pseudo code):
> insert into email_addresses(email_address)
>   select '...@foo.com' where not exists
>    (select 1 from email_addresses where email_address = '...@foo.com')
>
> insert into users (email_address) values ('...@foo.com');
>
> Obviously this is a rough sketch, you may need to consider locking,
> contention, etc.  But a trigger is overkill for this problem.
>
> merlin
>
> --
> Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org)
> To make changes to your subscription:http://www.postgresql.org/mailpref/pgsql-general



Re: Thoughts on a surrogate key lookup function?

От
Merlin Moncure
Дата:
On Sat, Nov 6, 2010 at 1:01 PM, Nick <nboutelier@gmail.com> wrote:
> Merlin, thanks for the reply. Yes, using email_addresses was a very
> silly example. Maybe the following is a better example...
>
> CREATE TABLE first_names (id INT, first_name VARCHAR);
> ALTER TABLE first_names ADD CONSTRAINT first_names_pkey PRIMARY KEY
> (id,first_name);
> ALTER TABLE first_names ADD CONSTRAINT first_names_unique_first_name
> UNIQUE (first_name);
>
> CREATE TABLE last_names (id INT, first_name VARCHAR);
> ALTER TABLE last_names ADD CONSTRAINT last_names_pkey PRIMARY KEY
> (id,last_name);
> ALTER TABLE last_names ADD CONSTRAINT last_names_unique_last_name
> UNIQUE (last_name);
>
> CREATE TABLE referrals (id INT, ref_code VARCHAR);
> ALTER TABLE referrals ADD CONSTRAINT referrals_pkey PRIMARY KEY
> (id,ref_code);
> ALTER TABLE referrals ADD CONSTRAINT referrals_unique_ref_code UNIQUE
> (ref_code);
>
> CREATE TABLE users (id INT, first_name_id INT, first_name VARCHAR,
> last_name_id INT, last_name VARCHAR, ref_code_id INT, ref_code DATE);
> ALTER TABLE users ADD CONSTRAINT users_fkey_first_names FOREIGN KEY
> (first_name_id,first_name) REFERENCES first_names(id,first_name) ON
> UPDATE CASCADE ON DELETE SET NULL;
> ALTER TABLE users ADD CONSTRAINT users_fkey_last_names FOREIGN KEY
> (last_name_id,last_name) REFERENCES last_names(id,last_name) ON UPDATE
> CASCADE ON DELETE SET NULL;
> ALTER TABLE users ADD CONSTRAINT users_fkey_referrals FOREIGN KEY
> (ref_code_id,ref_code) REFERENCES referrals(id,ref_code) ON UPDATE
> CASCADE ON DELETE SET NULL;
> CREATE TRIGGER "auto_first_name_id" BEFORE INSERT OR UPDATE ON users
> FOR EACH ROW EXECUTE PROCEDURE _auto_id('first_name_id');
> CREATE TRIGGER "auto_last_name_id" BEFORE INSERT OR UPDATE ON users
> FOR EACH ROW EXECUTE PROCEDURE _auto_id('last_name_id');
> CREATE TRIGGER "auto_ref_code_id" BEFORE INSERT OR UPDATE ON users FOR
> EACH ROW EXECUTE PROCEDURE _auto_id('ref_code_id');
>
> If I would like to insert a new user, first name, last name I would
> give the surrogates a value of NULL or -1. Their referral code must
> exist so ill give that surrogate a value of 0.
> INSERT INTO users (id, first_name_id, first_name, last_name_id,
> last_name, ref_code_id, ref_code) VALUES (1,-1,'John',-1,'Doe',
> 0,'xyz') which would...
>
> SELECT id FROM first_names WHERE first_name = 'John' INTO
> NEW.first_name_id
> IF NOT FOUND INSERT INTO first_names (first_name) VALUES ('John')
> RETURNING id INTO NEW.first_name_id
>
> SELECT id FROM last_names WHERE last_name = 'Doe' INTO
> NEW.last_name_id
> IF NOT FOUND INSERT INTO last_names (last_name) VALUES ('Doe')
> RETURNING id INTO NEW.last_name_id
>
> SELECT id FROM referral_codes WHERE ref_code = 'xyz' INTO
> NEW.ref_code_id
> IF NOT FOUND raise exception
>
> If I want to insert the new user John Smith, and I already know the
> surrogate value for John and I dont want to add a ref_code then I can
> do...
> INSERT INTO users (id, first_name_id, first_name, last_name_id,
> last_name, ref_code_id, ref_code) VALUES
> (2,1,NULL,-1,'Smith',NULL,NULL) which would...
>
> SELECT first_name FROM first_names WHERE id = 1 INTO NEW.first_name
> IF NOT FOUND raise exception
>
> SELECT id FROM last_names WHERE last_name = 'Smith' INTO
> NEW.last_name_id
> IF NOT FOUND INSERT INTO last_names (last_name) VALUES ('Smith')
> RETURNING id INTO NEW.last_name_id
>
> So by adding both the surrogate and natural keys to users table and
> toggling the surrogate on insert by 0 (must exist) or -1 (select or
> insert) I can bypass a much more complex insert statement. Is this
> frowned upon? I havent had many issues (but some ive been able to work
> around) with this as a plperl trigger and am pleased with how much
> easier it makes my inserts (besides the execution speed).

It's a neat idea, but all things considered, you are better off using
one of the two approaches I outlined above:
*) Your idea need extra composite index on two fields (each unique)
that serves no integrity purpose
*) Referring table has extra fields, pick natural or surrogate, but not both...
*) It's not faster.  Any way you slice this, you need lookup on the
master table, even if the system does it internally through RI in the
purely natural case. insert where not exists...will do exactly what
you are doing above, and does it in one statement, not two.

If you want to do this inside database, it's more common to do this in
regular function, not trigger function.  Just make a function
insert_user() that handles logic checking and dispense with all the
extra fields...

merlin