Обсуждение: INSERT or UPDATE
I have spent the last couple of days reading up on SQL, of which I know very little, and PL/pgSQl, of which I know less. I am trying to decide how best to approach the following requirement. Given a legal name and a common name and associated details, we wish to insert this information into a table, entities. As well, we believe it useful to allow a specific entity more than one common name. So, at the moment we are considering having another table, identifiers, that takes entity_id = entity.id (synthetic sequenced PK for entities), the identifier_type (always 'AKNA' for this collection of identifiers) and identifier_value = entity.common_name. This seems straight forward enough when initially inserting an entity. However, it is conceivable that over the lifetime of the system a particular entity might change its common name. For example the former "John Tash Enterprises" might become popularly known as "JTE Inc." while the legal name remains unchanged. When we update the entity record and set the common_name = "JTE Inc." then we need insert an identifier row to match. However, identifiers for a given entity can be maintained separately from the entity itself. It is therefore possible, indeed likely, that the identifier "JTE Inc." for that entity already exists. Likely, but not certain. In any case, the old identifier row remains unchanged after the new is inserted. The issue then is how to determine on an UPDATE entities whether it is necessary to INSERT a new identifier using values provided from the entities row. From what I have gathered, what one does is simply insert the new identifiers row. If there is a primary key conflict then the update fails, which the function handles gracefully. If not, then it succeeds. I have also formed the opinion that what one does is write a function or functions, such as fn_aknau(entity_id, name), and tie these with triggers to the appropriate actions on entities such as: CREATE TRIGGER tr_entities_aioru AFTER INSERT OR UPDATE ON entities FOR EACH ROW EXECUTE fn_aknai(entities.id, 'AKNA', entities.common_name); Is my appreciation correct or am I missing the mark entirely? Is this considered the proper place and means to accomplish this sort of task in an RDBMS? Does it belong elsewhere? Am I correct in inferring that the values in the columns id and common_name will be those of entities AFTER the insert or update and that these will be available to the body of the function? Is the trigger dependent upon a SUCCESSFUL INSERT or UPDATE of entities or will it be called regardless? Must the function be written in PL/pgSQl (or similar PL) or could this function be written in straight SQL? Should it be straight SQL if possible? What should the function return, if anything? Fairly basic stuff I am sure but somewhat mystifying for me at the moment. Any help would be appreciated. -- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:ByrneJB@Harte-Lyne.ca Harte & Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3
On Mon, April 6, 2009 17:00, Dann Corbit wrote: . > > It is a difficult question. > > For instance, there are many possibilities when a collision occurs. > > I guess that for some collisions, sharing the name is OK. > I failed to explicitly state what the PK looked like. entity_id(entities.id) + identifier_type ('AKNA') + identifier_value(entities.common_name) There will only be a PK collision when we attempt to add a duplicate common name for the same entity, which means it already exists and does not need to be added again. -- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:ByrneJB@Harte-Lyne.ca Harte & Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3
> -----Original Message----- > From: James B. Byrne [mailto:byrnejb@harte-lyne.ca] > Sent: Monday, April 06, 2009 2:06 PM > To: Dann Corbit > Cc: pgsql-general@postgresql.org > Subject: RE: [GENERAL] INSERT or UPDATE > > On Mon, April 6, 2009 17:00, Dann Corbit wrote: > . > > > > It is a difficult question. > > > > For instance, there are many possibilities when a collision occurs. > > > > I guess that for some collisions, sharing the name is OK. > > > > I failed to explicitly state what the PK looked like. > > entity_id(entities.id) + > identifier_type ('AKNA') + > identifier_value(entities.common_name) > > There will only be a PK collision when we attempt to add a duplicate > common name for the same entity, which means it already exists and > does not need to be added again. The pedagogic solution for this type of problem is called merge. The last I knew, PostgreSQL did not directly support merge. So you can accomplish the same thing in two stages: 1. Check for existence and perform an update if the key is present 2. If the key is not present, then perform an insert. Again, this may or may not be the right thing to do.
Dann Corbit wrote on 06.04.2009 23:15: >>> I guess that for some collisions, sharing the name is OK. >>> >> I failed to explicitly state what the PK looked like. >> >> entity_id(entities.id) + >> identifier_type ('AKNA') + >> identifier_value(entities.common_name) >> >> There will only be a PK collision when we attempt to add a duplicate >> common name for the same entity, which means it already exists and >> does not need to be added again. > > The pedagogic solution for this type of problem is called merge. > The last I knew, PostgreSQL did not directly support merge. > So you can accomplish the same thing in two stages: > 1. Check for existence and perform an update if the key is present > 2. If the key is not present, then perform an insert. You don't actually need to check for existence. Just do the update, if no rows were updated, you can insert (UPDATE will do an existence check anyway) Thomas
On Mon, April 6, 2009 17:15, Dann Corbit wrote: > > The pedagogic solution for this type of problem is called merge. > The last I knew, PostgreSQL did not directly support merge. > So you can accomplish the same thing in two stages: > 1. Check for existence and perform an update if the key is present > 2. If the key is not present, then perform an insert. > > Again, this may or may not be the right thing to do. > Forgive my obtuseness, but what does the preliminary SELECT accomplish? When the trigger fires we already know whether or not the entities row existed previously, what we are deciding is how to handle the concurrent identifiers table entry. I initially thought along these lines (select insert/update depending on the return value) but I gradually realized that it did not matter whether the identifier row was already there or not. If it exists then an UNIQUE key constraint prevents duplicates. If it does not exist then the INSERT succeeds. The previous identifier associated with the original common name has to remain on file to allow lookups by former names. Thus, we never update an identifier row in this fashion. -- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:ByrneJB@Harte-Lyne.ca Harte & Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3
> -----Original Message----- > From: James B. Byrne [mailto:byrnejb@harte-lyne.ca] > Sent: Monday, April 06, 2009 5:16 PM > To: Dann Corbit > Cc: pgsql-general@postgresql.org > Subject: RE: [GENERAL] INSERT or UPDATE > > > On Mon, April 6, 2009 17:15, Dann Corbit wrote: > > > > > The pedagogic solution for this type of problem is called merge. > > The last I knew, PostgreSQL did not directly support merge. > > So you can accomplish the same thing in two stages: > > 1. Check for existence and perform an update if the key is present > > 2. If the key is not present, then perform an insert. > > > > Again, this may or may not be the right thing to do. > > > > Forgive my obtuseness, but what does the preliminary SELECT > accomplish? When the trigger fires we already know whether or not > the entities row existed previously, what we are deciding is how to > handle the concurrent identifiers table entry. > > I initially thought along these lines (select insert/update > depending on the return value) but I gradually realized that it did > not matter whether the identifier row was already there or not. If > it exists then an UNIQUE key constraint prevents duplicates. If it > does not exist then the INSERT succeeds. The previous identifier > associated with the original common name has to remain on file to > allow lookups by former names. Thus, we never update an identifier > row in this fashion. If a transaction involves rows where some succeed and some fail, all will roll back. If that is the desired behavior, or if all operations are singleton, then you won't see any problems.
On Mon, April 6, 2009 20:23, Dann Corbit wrote: > > If a transaction involves rows where some succeed and some fail, > all will roll back. If that is the desired behavior, or if all > operations are singleton, then you won't see any problems. > Do I understand correctly that this means that even if the function "handles" a failed insert then if the function occurs inside a transaction then that transaction fails and is rolled back regardless? -- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:ByrneJB@Harte-Lyne.ca Harte & Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3
> -----Original Message----- > From: James B. Byrne [mailto:byrnejb@harte-lyne.ca] > Sent: Monday, April 06, 2009 5:43 PM > To: Dann Corbit > Cc: pgsql-general@postgresql.org > Subject: RE: [GENERAL] INSERT or UPDATE > > > On Mon, April 6, 2009 20:23, Dann Corbit wrote: > > > > > If a transaction involves rows where some succeed and some fail, > > all will roll back. If that is the desired behavior, or if all > > operations are singleton, then you won't see any problems. > > > > Do I understand correctly that this means that even if the function > "handles" a failed insert then if the function occurs inside a > transaction then that transaction fails and is rolled back > regardless? It depends on how it is handled. You could certainly (for instance) use a cursor and perform the operations one by one.
This is what I have come up with. Comments are welcomed. CREATE OR REPLACE FUNCTION hll_pg_fn_ident_insert() RETURNS TRIGGER AS $pg_fn$ -- ROW AFTER TRIGGER -- trigger passes identifier_type, _value and _description -- received as ARGV[0], ARGV[1] and ARGV[2] BEGIN INSERT INTO identifiers( entity_id, identifier_type, identifier_value, identifier_description) VALUES( NEW.id, TG_ARGV[0], TG.ARGV[1], TG_ARGV[2]); -- Assume the INSERT fails because of a unique key violation, -- (entity_id + identifier_type + identifier_value) -- -- This does not matter since we only need ensure that this -- alias exists, so handle the exception and return: EXCEPTION WHEN unique_violation THEN -- do nothing NULL; END; $pg_fn$ LANGUAGE plpgsql; COMMENT ON FUNCTION hll_pg_fn_ident_insert IS 'Used by entities trigger. Inserts a corresponding identifiers row.' CREATE TRIGGER hll_pg_tr_entity_identifier_akna AFTER INSERT OR UPDATE ON entities FOR EACH ROW EXECUTE PROCEDURE hll_pg_fn_identifier_insert( "AKNA", entities.entity_common_name, "Common Name auto-insert"); COMMENT ON TRIGGER hll_pg_tr_entity_identifier_akna IS 'Inserts an alias identifier for common name if one does not exist' -- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:ByrneJB@Harte-Lyne.ca Harte & Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3
I am testing the trigger function that I wrote. Is there a way to increase the logging detail level for just a single database instance? The manual indicates not, but just in case I am misreading things I am asking here? -- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:ByrneJB@Harte-Lyne.ca Harte & Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3
Not in regards to logging detail, but that function in general... I'm pretty new to postgres, so I could be totally wrong in this, but I think this thread http://archives.postgresql.org/pgsql-performance/2008-03/msg00204.php may pertain if you see some performance degradation with that trigger. Like I said, somebody correct me if I'm way off base. Chris -----Original Message----- From: pgsql-general-owner@postgresql.org [mailto:pgsql-general-owner@postgresql.org] On Behalf Of James B. Byrne Sent: Tuesday, April 07, 2009 1:52 PM To: pgsql-general@postgresql.org Subject: Re: [GENERAL] INSERT or UPDATE TRIGGER I am testing the trigger function that I wrote. Is there a way to increase the logging detail level for just a single database instance? The manual indicates not, but just in case I am misreading things I am asking here? -- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:ByrneJB@Harte-Lyne.ca Harte & Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3 -- Sent via pgsql-general mailing list (pgsql-general@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general
"James B. Byrne" <byrnejb@harte-lyne.ca> writes: > I am testing the trigger function that I wrote. Is there a way to > increase the logging detail level for just a single database > instance? ALTER DATABASE foo SET log_min_messages = whatever; Note this will only affect subsequently-started sessions. Also, if memory serves, you have to be superuser to set this particular variable. regards, tom lane
On Tue, April 7, 2009 15:09, Tom Lane wrote: > > ALTER DATABASE foo SET log_min_messages = whatever; > > Note this will only affect subsequently-started sessions. Also, > if memory serves, you have to be superuser to set this particular > variable. Thanks. Am I correct to infer from the output this generates that log_min_messages = debug is primarily for developers of PG itself? I am poking in the dark here. What I want to do is to determine if the trigger is firing and whether the function works as intended. At the moment I am not seeing anything show up in the secondary table so I have done something wrong. Is there some way of getting PG to tell me what it is doing? -- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:ByrneJB@Harte-Lyne.ca Harte & Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3
"James B. Byrne" <byrnejb@harte-lyne.ca> writes: > I am poking in the dark here. What I want to do is to determine if > the trigger is firing and whether the function works as intended. > At the moment I am not seeing anything show up in the secondary > table so I have done something wrong. Is there some way of getting > PG to tell me what it is doing? You might find it more useful to add some elog(LOG) statements to the trigger body. regards, tom lane
On Tue, April 7, 2009 16:07, Tom Lane wrote: > > You might find it more useful to add some elog(LOG) statements to > the trigger body. > Thank you again. I will go through section 44.2 tonight. -- *** E-Mail is NOT a SECURE channel *** James B. Byrne mailto:ByrneJB@Harte-Lyne.ca Harte & Lyne Limited http://www.harte-lyne.ca 9 Brockley Drive vox: +1 905 561 1241 Hamilton, Ontario fax: +1 905 561 0757 Canada L8E 3C3
> -----Original Message----- > From: pgsql-general-owner@postgresql.org [mailto:pgsql-general- > owner@postgresql.org] On Behalf Of James B. Byrne > Sent: Monday, April 06, 2009 1:46 PM > To: pgsql-general@postgresql.org > Subject: [GENERAL] INSERT or UPDATE > > I have spent the last couple of days reading up on SQL, of which I > know very little, and PL/pgSQl, of which I know less. I am trying > to decide how best to approach the following requirement. > > Given a legal name and a common name and associated details, we wish > to insert this information into a table, entities. As well, we > believe it useful to allow a specific entity more than one common > name. So, at the moment we are considering having another table, > identifiers, that takes entity_id = entity.id (synthetic sequenced > PK for entities), the identifier_type (always 'AKNA' for this > collection of identifiers) and identifier_value = > entity.common_name. > > This seems straight forward enough when initially inserting an > entity. However, it is conceivable that over the lifetime of the > system a particular entity might change its common name. For example > the former "John Tash Enterprises" might become popularly known as > "JTE Inc." while the legal name remains unchanged. > > When we update the entity record and set the common_name = "JTE > Inc." then we need insert an identifier row to match. However, > identifiers for a given entity can be maintained separately from the > entity itself. It is therefore possible, indeed likely, that the > identifier "JTE Inc." for that entity already exists. Likely, but > not certain. In any case, the old identifier row remains unchanged > after the new is inserted. > > The issue then is how to determine on an UPDATE entities whether it > is necessary to INSERT a new identifier using values provided from > the entities row. > > From what I have gathered, what one does is simply insert the new > identifiers row. If there is a primary key conflict then the update > fails, which the function handles gracefully. If not, then it > succeeds. > > I have also formed the opinion that what one does is write a > function or functions, such as fn_aknau(entity_id, name), and tie > these with triggers to the appropriate actions on entities such as: > > CREATE TRIGGER tr_entities_aioru AFTER INSERT OR UPDATE > ON entities > FOR EACH ROW > EXECUTE fn_aknai(entities.id, 'AKNA', entities.common_name); > > Is my appreciation correct or am I missing the mark entirely? Is > this considered the proper place and means to accomplish this sort > of task in an RDBMS? Does it belong elsewhere? Am I correct in > inferring that the values in the columns id and common_name will be > those of entities AFTER the insert or update and that these will be > available to the body of the function? Is the trigger dependent > upon a SUCCESSFUL INSERT or UPDATE of entities or will it be called > regardless? Must the function be written in PL/pgSQl (or similar PL) > or could this function be written in straight SQL? Should it be > straight SQL if possible? What should the function return, if > anything? > > Fairly basic stuff I am sure but somewhat mystifying for me at the > moment. Any help would be appreciated. It is a difficult question. For instance, there are many possibilities when a collision occurs. I guess that for some collisions, sharing the name is OK. Consider two different fictional companies (hopefully in different domains): Jet Propulsion Industries Incorporated == JPI Inc. (makes jet engines) Journey Protection Investments Inc. == JPI Inc. (underwrites travel insurance) Probably, they don't have a legal battle because they have completely different domains. So it seems OK for both companies to relate to this entity if it is only used as a label. On the other hand, you may have a typographical error on data entry for a computer firm. If you label a company as "IBM" when it should have been "IBN" I guess that won't make anyone happy. I think that the real issue is that you must truly and carefully identify your business rules and model those in the database structure. Said another way, "How would a human handle this issue given a name collision?" If the answer is not obvious, then maybe you need to write an exceptions log and handle each case by hand that is not solved by a simple and clear to understand rule.