Обсуждение: Thoughts on a surrogate key lookup function?
Are there any existing trigger functions (preferably C) that could retrieve a missing value for a compound foreign key on insert or update? If this overall sounds like a really bad idea, please let me know as well. This functionality could really speed my project up though. For example, CREATE TABLE email_addresses (id BIGSERIAL, email_address VARCHAR); ALTER TABLE email_addresses ADD CONSTRAINT _email_addresses_pkey PRIMARY KEY (id,email_address); CREATE TABLE users (id BIGSERIAL, email_address_id BIGINT, email_address VARCHAR); ALTER TABLE users ADD CONSTRAINT users_fkey_email_address_id FOREIGN KEY (email_address_id,email_address) REFERENCES email_addresses(id,email_address) ON UPDATE CASCADE ON DELETE SET NULL; CREATE TRIGGER "1-auto_email_address_id" BEFORE INSERT OR UPDATE ON users FOR EACH ROW EXECUTE PROCEDURE _auto_id('email_address_id'); If I would like to insert a new user AND new email_address I would assign the email_address_id of NULL or -1. INSERT INTO users (id, email_address_id, email_address) VALUES (1,-1,'foo@bar.com') which would do... SELECT id FROM email_addresses WHERE email_address = 'foo@bar.com' INTO NEW.email_address_id If it cannot find a value, it then does... INSERT INTO email_addresses (email_address) VALUES ('foo@bar.com') RETURNING id INTO NEW.email_address_id If I would like to insert a new user and existing email address, I would assign the email_address_id of 0. INSERT INTO users (id, email_address_id, email_address) VALUES (2,0,'foo@bar.com') which would... SELECT id FROM email_addresses WHERE email_address = 'foo@bar.com' INTO NEW.email_address_id If it cannot find a value, it will raise an exception. If I insert or update users and email_address_id is > 0 then it gets the natual value by id... INSERT INTO users (id, email_address_id, email_address) VALUES (3,2,NULL) which will SELECT email_address FROM email_addresses WHERE id = 2 INTO NEW.email_address And if both email_address_id and email_address are NULL then, both values just get inserted into users as null. Declaring the surrogate as -1 (select or insert) or 0 (select) would save time having to lookup or create the value before inserting into users. Ive been using a plperl function for this and really like the results but im wondering if theres a faster way. The foreign key constraint already scans the email_addresses table for values so im wondering if theres a way to bootstrap that process or maybe thats too risky? Any thoughts would be greatly appreciated.
On Fri, Nov 5, 2010 at 5:27 PM, Nick <nboutelier@gmail.com> wrote: > Are there any existing trigger functions (preferably C) that could > retrieve a missing value for a compound foreign key on insert or > update? If this overall sounds like a really bad idea, please let me > know as well. This functionality could really speed my project up > though. I think your issues are really SQL issues. See my comments below: > For example, > > CREATE TABLE email_addresses (id BIGSERIAL, email_address VARCHAR); > ALTER TABLE email_addresses ADD CONSTRAINT _email_addresses_pkey > PRIMARY KEY (id,email_address); email addresses should be unique, so this is pretty silly. You are not getting fast lookups on email which is what you need. You have two choices here: keep the surrogate on email_addresses, in which case I'd do the tables like this: CREATE TABLE email_addresses ( email_address_id BIGSERIAL primary key, email_address VARCHAR unique ); create table users ( user_id BIGSERIAL primary key, email_address_id BIGINT references email_addresses on delete cascade/set null, ) your insert will look like this (pseudo code): select email_address_id from email_addresses where email_address = 'foo@foo.com'; if not found then insert into email_addresses(email_address) returning email_address_id; else insert into users(email_address_id) values (resolved_id) end if; OR, you can go the natural route (which tend to prefer): CREATE TABLE email_addresses ( email_address VARCHAR primary key ); create table users ( user_id BIGSERIAL primary key, email_address VARCHAR references email_addresses on update cascade on delete cascade/set null, ) your insert will look like this (pseudo code): insert into email_addresses(email_address) select 'foo@foo.com' where not exists (select 1 from email_addresses where email_address = 'foo@foo.com') insert into users (email_address) values ('foo@foo.com'); Obviously this is a rough sketch, you may need to consider locking, contention, etc. But a trigger is overkill for this problem. merlin
Merlin, thanks for the reply. Yes, using email_addresses was a very silly example. Maybe the following is a better example... CREATE TABLE first_names (id INT, first_name VARCHAR); ALTER TABLE first_names ADD CONSTRAINT first_names_pkey PRIMARY KEY (id,first_name); ALTER TABLE first_names ADD CONSTRAINT first_names_unique_first_name UNIQUE (first_name); CREATE TABLE last_names (id INT, first_name VARCHAR); ALTER TABLE last_names ADD CONSTRAINT last_names_pkey PRIMARY KEY (id,last_name); ALTER TABLE last_names ADD CONSTRAINT last_names_unique_last_name UNIQUE (last_name); CREATE TABLE referrals (id INT, ref_code VARCHAR); ALTER TABLE referrals ADD CONSTRAINT referrals_pkey PRIMARY KEY (id,ref_code); ALTER TABLE referrals ADD CONSTRAINT referrals_unique_ref_code UNIQUE (ref_code); CREATE TABLE users (id INT, first_name_id INT, first_name VARCHAR, last_name_id INT, last_name VARCHAR, ref_code_id INT, ref_code DATE); ALTER TABLE users ADD CONSTRAINT users_fkey_first_names FOREIGN KEY (first_name_id,first_name) REFERENCES first_names(id,first_name) ON UPDATE CASCADE ON DELETE SET NULL; ALTER TABLE users ADD CONSTRAINT users_fkey_last_names FOREIGN KEY (last_name_id,last_name) REFERENCES last_names(id,last_name) ON UPDATE CASCADE ON DELETE SET NULL; ALTER TABLE users ADD CONSTRAINT users_fkey_referrals FOREIGN KEY (ref_code_id,ref_code) REFERENCES referrals(id,ref_code) ON UPDATE CASCADE ON DELETE SET NULL; CREATE TRIGGER "auto_first_name_id" BEFORE INSERT OR UPDATE ON users FOR EACH ROW EXECUTE PROCEDURE _auto_id('first_name_id'); CREATE TRIGGER "auto_last_name_id" BEFORE INSERT OR UPDATE ON users FOR EACH ROW EXECUTE PROCEDURE _auto_id('last_name_id'); CREATE TRIGGER "auto_ref_code_id" BEFORE INSERT OR UPDATE ON users FOR EACH ROW EXECUTE PROCEDURE _auto_id('ref_code_id'); If I would like to insert a new user, first name, last name I would give the surrogates a value of NULL or -1. Their referral code must exist so ill give that surrogate a value of 0. INSERT INTO users (id, first_name_id, first_name, last_name_id, last_name, ref_code_id, ref_code) VALUES (1,-1,'John',-1,'Doe', 0,'xyz') which would... SELECT id FROM first_names WHERE first_name = 'John' INTO NEW.first_name_id IF NOT FOUND INSERT INTO first_names (first_name) VALUES ('John') RETURNING id INTO NEW.first_name_id SELECT id FROM last_names WHERE last_name = 'Doe' INTO NEW.last_name_id IF NOT FOUND INSERT INTO last_names (last_name) VALUES ('Doe') RETURNING id INTO NEW.last_name_id SELECT id FROM referral_codes WHERE ref_code = 'xyz' INTO NEW.ref_code_id IF NOT FOUND raise exception If I want to insert the new user John Smith, and I already know the surrogate value for John and I dont want to add a ref_code then I can do... INSERT INTO users (id, first_name_id, first_name, last_name_id, last_name, ref_code_id, ref_code) VALUES (2,1,NULL,-1,'Smith',NULL,NULL) which would... SELECT first_name FROM first_names WHERE id = 1 INTO NEW.first_name IF NOT FOUND raise exception SELECT id FROM last_names WHERE last_name = 'Smith' INTO NEW.last_name_id IF NOT FOUND INSERT INTO last_names (last_name) VALUES ('Smith') RETURNING id INTO NEW.last_name_id So by adding both the surrogate and natural keys to users table and toggling the surrogate on insert by 0 (must exist) or -1 (select or insert) I can bypass a much more complex insert statement. Is this frowned upon? I havent had many issues (but some ive been able to work around) with this as a plperl trigger and am pleased with how much easier it makes my inserts (besides the execution speed). -Nick On Nov 6, 6:28 am, mmonc...@gmail.com (Merlin Moncure) wrote: > On Fri, Nov 5, 2010 at 5:27 PM, Nick <nboutel...@gmail.com> wrote: > > Are there any existing trigger functions (preferably C) that could > > retrieve a missing value for a compound foreign key on insert or > > update? If this overall sounds like a really bad idea, please let me > > know as well. This functionality could really speed my project up > > though. > > I think your issues are really SQL issues. See my comments below: > > > For example, > > > CREATE TABLE email_addresses (id BIGSERIAL, email_address VARCHAR); > > ALTER TABLE email_addresses ADD CONSTRAINT _email_addresses_pkey > > PRIMARY KEY (id,email_address); > > email addresses should be unique, so this is pretty silly. You are > not getting fast lookups on email which is what you need. You have > two choices here: keep the surrogate on email_addresses, in which case > I'd do the tables like this: > CREATE TABLE email_addresses > ( > email_address_id BIGSERIAL primary key, > email_address VARCHAR unique > ); > > create table users > ( > user_id BIGSERIAL primary key, > email_address_id BIGINT references email_addresses on delete cascade/set null, > ) > > your insert will look like this (pseudo code): > select email_address_id from email_addresses where email_address = > '...@foo.com'; > > if not found then > insert into email_addresses(email_address) returning email_address_id; > else > insert into users(email_address_id) values (resolved_id) > end if; > > OR, you can go the natural route (which tend to prefer): > CREATE TABLE email_addresses > ( > email_address VARCHAR primary key > ); > > create table users > ( > user_id BIGSERIAL primary key, > email_address VARCHAR references email_addresses on update cascade > on delete cascade/set null, > ) > > your insert will look like this (pseudo code): > insert into email_addresses(email_address) > select '...@foo.com' where not exists > (select 1 from email_addresses where email_address = '...@foo.com') > > insert into users (email_address) values ('...@foo.com'); > > Obviously this is a rough sketch, you may need to consider locking, > contention, etc. But a trigger is overkill for this problem. > > merlin > > -- > Sent via pgsql-general mailing list (pgsql-gene...@postgresql.org) > To make changes to your subscription:http://www.postgresql.org/mailpref/pgsql-general
On Sat, Nov 6, 2010 at 1:01 PM, Nick <nboutelier@gmail.com> wrote: > Merlin, thanks for the reply. Yes, using email_addresses was a very > silly example. Maybe the following is a better example... > > CREATE TABLE first_names (id INT, first_name VARCHAR); > ALTER TABLE first_names ADD CONSTRAINT first_names_pkey PRIMARY KEY > (id,first_name); > ALTER TABLE first_names ADD CONSTRAINT first_names_unique_first_name > UNIQUE (first_name); > > CREATE TABLE last_names (id INT, first_name VARCHAR); > ALTER TABLE last_names ADD CONSTRAINT last_names_pkey PRIMARY KEY > (id,last_name); > ALTER TABLE last_names ADD CONSTRAINT last_names_unique_last_name > UNIQUE (last_name); > > CREATE TABLE referrals (id INT, ref_code VARCHAR); > ALTER TABLE referrals ADD CONSTRAINT referrals_pkey PRIMARY KEY > (id,ref_code); > ALTER TABLE referrals ADD CONSTRAINT referrals_unique_ref_code UNIQUE > (ref_code); > > CREATE TABLE users (id INT, first_name_id INT, first_name VARCHAR, > last_name_id INT, last_name VARCHAR, ref_code_id INT, ref_code DATE); > ALTER TABLE users ADD CONSTRAINT users_fkey_first_names FOREIGN KEY > (first_name_id,first_name) REFERENCES first_names(id,first_name) ON > UPDATE CASCADE ON DELETE SET NULL; > ALTER TABLE users ADD CONSTRAINT users_fkey_last_names FOREIGN KEY > (last_name_id,last_name) REFERENCES last_names(id,last_name) ON UPDATE > CASCADE ON DELETE SET NULL; > ALTER TABLE users ADD CONSTRAINT users_fkey_referrals FOREIGN KEY > (ref_code_id,ref_code) REFERENCES referrals(id,ref_code) ON UPDATE > CASCADE ON DELETE SET NULL; > CREATE TRIGGER "auto_first_name_id" BEFORE INSERT OR UPDATE ON users > FOR EACH ROW EXECUTE PROCEDURE _auto_id('first_name_id'); > CREATE TRIGGER "auto_last_name_id" BEFORE INSERT OR UPDATE ON users > FOR EACH ROW EXECUTE PROCEDURE _auto_id('last_name_id'); > CREATE TRIGGER "auto_ref_code_id" BEFORE INSERT OR UPDATE ON users FOR > EACH ROW EXECUTE PROCEDURE _auto_id('ref_code_id'); > > If I would like to insert a new user, first name, last name I would > give the surrogates a value of NULL or -1. Their referral code must > exist so ill give that surrogate a value of 0. > INSERT INTO users (id, first_name_id, first_name, last_name_id, > last_name, ref_code_id, ref_code) VALUES (1,-1,'John',-1,'Doe', > 0,'xyz') which would... > > SELECT id FROM first_names WHERE first_name = 'John' INTO > NEW.first_name_id > IF NOT FOUND INSERT INTO first_names (first_name) VALUES ('John') > RETURNING id INTO NEW.first_name_id > > SELECT id FROM last_names WHERE last_name = 'Doe' INTO > NEW.last_name_id > IF NOT FOUND INSERT INTO last_names (last_name) VALUES ('Doe') > RETURNING id INTO NEW.last_name_id > > SELECT id FROM referral_codes WHERE ref_code = 'xyz' INTO > NEW.ref_code_id > IF NOT FOUND raise exception > > If I want to insert the new user John Smith, and I already know the > surrogate value for John and I dont want to add a ref_code then I can > do... > INSERT INTO users (id, first_name_id, first_name, last_name_id, > last_name, ref_code_id, ref_code) VALUES > (2,1,NULL,-1,'Smith',NULL,NULL) which would... > > SELECT first_name FROM first_names WHERE id = 1 INTO NEW.first_name > IF NOT FOUND raise exception > > SELECT id FROM last_names WHERE last_name = 'Smith' INTO > NEW.last_name_id > IF NOT FOUND INSERT INTO last_names (last_name) VALUES ('Smith') > RETURNING id INTO NEW.last_name_id > > So by adding both the surrogate and natural keys to users table and > toggling the surrogate on insert by 0 (must exist) or -1 (select or > insert) I can bypass a much more complex insert statement. Is this > frowned upon? I havent had many issues (but some ive been able to work > around) with this as a plperl trigger and am pleased with how much > easier it makes my inserts (besides the execution speed). It's a neat idea, but all things considered, you are better off using one of the two approaches I outlined above: *) Your idea need extra composite index on two fields (each unique) that serves no integrity purpose *) Referring table has extra fields, pick natural or surrogate, but not both... *) It's not faster. Any way you slice this, you need lookup on the master table, even if the system does it internally through RI in the purely natural case. insert where not exists...will do exactly what you are doing above, and does it in one statement, not two. If you want to do this inside database, it's more common to do this in regular function, not trigger function. Just make a function insert_user() that handles logic checking and dispense with all the extra fields... merlin