Обсуждение: Duplicate key insert question
I have a table with a primary field and a few other fields. What is the fastest way to do an insert into that table assuming that sometimes I might try to insert a record with a duplicate primary key and want that to fail? I know that if I try a plain insert this will work, but in the case where I am trying to insert a duplicate key, the insert fails (as it should) and an error is logged. I could first do a check to see if there is already an entry with the same key as the one I am trying to insert but that would mean quite a few extra operations. Is there a quick and clean way of doing something like: INSERT into table values(...) IF there isn't already a row with pkey=x Thanks, Jean-Christian Imbeault PS The reason I am asking is that an open source project is using MySQL as their DB and they have a port to PG that isn't very clean b/c the DB code keeps trying to insert duplicate primary keys. According to them MySQL doesn't complain and just drops the insert whereas PG (as is right) complains. I've offered to clean up their PG insertion code but they say that they don't want too many extra checks as their app writes to the DB a *lot* and any extra check is going to slow down the application noticeably ...
> -----Original Message-----
> From: Jean-Christian Imbeault [mailto:jc@mega-bucks.co.jp]
> Sent: Tuesday, July 01, 2003 5:10 PM
> To: pgsql-general@postgresql.org
> Subject: [GENERAL] Duplicate key insert question
>
>
> I have a table with a primary field and a few other fields.
> What is the
> fastest way to do an insert into that table assuming that sometimes I
> might try to insert a record with a duplicate primary key and
> want that
> to fail?
>
> I know that if I try a plain insert this will work, but in the case
> where I am trying to insert a duplicate key, the insert fails (as it
> should) and an error is logged.
>
> I could first do a check to see if there is already an entry with the
> same key as the one I am trying to insert but that would mean quite a
> few extra operations.
>
> Is there a quick and clean way of doing something like:
>
> INSERT into table values(...) IF there isn't already a row with pkey=x
>
> Thanks,
>
> Jean-Christian Imbeault
>
> PS The reason I am asking is that an open source project is
> using MySQL
> as their DB and they have a port to PG that isn't very clean
> b/c the DB
> code keeps trying to insert duplicate primary keys. According to them
> MySQL doesn't complain and just drops the insert whereas PG (as is
> right) complains. I've offered to clean up their PG insertion
> code but
> they say that they don't want too many extra checks as their
> app writes
> to the DB a *lot* and any extra check is going to slow down the
> application noticeably ...
SQL*Server has a nice feature for this.  When you create an index, you
can ignore duplicate key attempts (nothing happens, and an informational
level warning is raised).
I find that it is an extremely useful feature for things like creation
of dictionaries or domains.
========================================================================
====
Syntax
CREATE [UNIQUE] [CLUSTERED | NONCLUSTERED]
    INDEX index_name ON table (column [,...n])
[WITH
        [PAD_INDEX]
        [[,] FILLFACTOR = fillfactor]
        [[,] IGNORE_DUP_KEY]
        [[,] DROP_EXISTING]
        [[,] STATISTICS_NORECOMPUTE]
]
[ON filegroup]
IGNORE_DUP_KEY
Controls what happens when an attempt is made to insert a duplicate key
value into a column that is part of a unique clustered index. If
IGNORE_DUP_KEY was specified for the index and an INSERT statement that
creates a duplicate key is executed, SQL Server issues a warning message
and ignores (does not insert) the duplicate row.
If IGNORE_DUP_KEY was not specified for the index, SQL Server issues an
error message and rolls back the entire INSERT statement.
A unique index cannot be created on a column that already includes
duplicate values, whether or not IGNORE_DUP_KEY is set. If attempted,
SQL Server displays an error message and lists the duplicate values.
Eliminate the duplicate values before creating a unique index on the
column.
E. Use the IGNORE_DUP_KEY
This example creates a unique clustered index on the emp_pay table. If a
duplicate key is entered, the INSERT or UPDATE statement is ignored.
SET NOCOUNT ON
USE pubs
IF EXISTS (SELECT * FROM INFORMATION_SCHEMA.TABLES
        WHERE TABLE_NAME = 'emp_pay')
    DROP TABLE emp_pay
GO
USE pubs
IF EXISTS (SELECT name FROM sysindexes
        WHERE name = 'employeeID_ind')
    DROP INDEX emp_pay.employeeID_ind
GO
USE pubs
GO
CREATE TABLE emp_pay
(
 employeeID int NOT NULL,
 base_pay money NOT NULL,
 commission decimal(2, 2) NOT NULL
)
INSERT emp_pay
    VALUES (1, 500, .10)
INSERT emp_pay
    VALUES (2, 1000, .05)
INSERT emp_pay
    VALUES (3, 800, .07)
INSERT emp_pay
    VALUES (5, 1500, .03)
INSERT emp_pay
    VALUES (9, 750, .06)
GO
SET NOCOUNT OFF
GO
CREATE UNIQUE CLUSTERED INDEX employeeID_ind
    ON emp_pay(employeeID)
    WITH IGNORE_DUP_KEY
========================================================================
====
			
		Dann Corbit wrote: > > SQL*Server has a nice feature for this. Ok ... is there a way to achieve the same effect in postgres? Thanks, Jean-Christian Imbeault
> -----Original Message----- > From: Jean-Christian Imbeault [mailto:jc@mega-bucks.co.jp] > Sent: Tuesday, July 01, 2003 5:21 PM > To: Dann Corbit > Cc: pgsql-general@postgresql.org > Subject: Re: [GENERAL] Duplicate key insert question > > > Dann Corbit wrote: > > > > SQL*Server has a nice feature for this. > > Ok ... is there a way to achieve the same effect in postgres? No. MySQL is broken, unless they have some extension like SQL*Server. A duplicate insertion into a unique index must raise an error. (c)ISO/IEC ISO/IEC 9075-1:1999 (E) 4.6 SQL-schema objects 4.6.6.3 Table constraints A table constraint is an integrity constraint associated with a single base table. A table constraint is either a unique constraint, a primary key constraint, a referential constraint, or a check constraint. A unique constraint specifies one or more columns of the table as unique columns. A unique constraint is satisfied if and only if no two rows in a table have the same non-null values in the unique columns. A primary key constraint is a unique constraint that specifies PRIMARY KEY. A primary key constraint is satisfied if and only if no two rows in a table have the same non-null values in the unique columns and none of the values in the specified column or columns are the null value. A referential constraint specifies one or more columns as referencing columns and corresponding referenced columns in some (not necessarily distinct) base table, referred to as the referenced table. Such referenced columns are the unique columns of some unique constraint of the referenced table. A referential constraint is always satisfied if, for every row in the referencing table, the values of the referencing columns are equal to those of the corresponding referenced columns of some row in the referenced table. If null values are present, however, satisfaction of the referential constraint depends on the treatment specified for nulls (known as the match type). Referential actions may be specified to determine what changes are to be made to the referencing table if a change to the referenced table would otherwise cause the referential constraint to be violated. A table check constraint specifies a search condition. The constraint is violated if the result of the search condition is false for any row of the table (but not if it is unknown). 4.7 Integrity constraints and constraint checking 4.7.1 Constraint checking There are two kinds of schema object that describe constraints: assertions and table constraints (including domain constraints of any domains on which columns of that table may be based), and they are checked in the same way. Every constraint is either deferrable or not deferrable. In every SQL-session, every constraint has a constraint mode that is a property of that SQL-session. Each constraint has a (persistent) default constraint mode, with which the constraint starts each SQL-transaction in each SQL-session. A constraint mode is either deferred or immediate, and can be set by an SQL-statement, provided the constraint is deferrable. When a transaction is initiated, the constraint mode of each constraint is set to its default. On completion of execution of every SQL-statement, every constraint is checked whose constraint mode is immediate. Before termination of a transaction, every constraint mode is set to immediate (and therefore checked).
On Tuesday 01 July 2003 08:10 pm, Jean-Christian Imbeault wrote:
> I have a table with a primary field and a few other fields. What is the
> fastest way to do an insert into that table assuming that sometimes I
> might try to insert a record with a duplicate primary key and want that
> to fail?
>
> I know that if I try a plain insert this will work, but in the case
> where I am trying to insert a duplicate key, the insert fails (as it
> should) and an error is logged.
>
> I could first do a check to see if there is already an entry with the
> same key as the one I am trying to insert but that would mean quite a
> few extra operations.
>
> Is there a quick and clean way of doing something like:
>
> INSERT into table values(...) IF there isn't already a row with pkey=x
>
> Thanks,
>
> Jean-Christian Imbeault
Hi, not sure if this is answering your question, but I just asked similar
questions here. I asked about using INSERT WHERE NOT EXISTS (which you can do
in PostgreSQL). Here is what you can do:
INSERT INTO mytable
SELECT 'value1', 'value2'
   WHERE NOT EXISTS
        (SELECT NULL FROM mytable
                WHERE mycondition)
This will just return 0 when fails, but it does check first. Don't know if you
can really afford that. Just for reference, this brought up some discussion
here. Here is a link to the archive:
http://marc.theaimsgroup.com/?l=postgresql-general&w=2&r=1&s=WHERE+NOT+EXISTS&q=b
Hope that helps.
RDB
--
Reuben D. Budiardja
Department of Physics and Astronomy
The University of Tennessee, Knoxville, TN
-------------------------------------------------
/"\  ASCII Ribbon Campaign against HTML
\ /  email and proprietary format
 X   attachments.
/ \
-------------------------------------------------
Have you been used by Microsoft today?
Choose your life. Choose freedom.
Choose LINUX.
-------------------------------------------------
			
		Yes, but for MSSQL unique index with ignore duplicate in reality
will reject all duplicates.
Another word if you are trying to insert 2 identical values
you will insert none.
Do search using IF EXISTS SELECT ... or LEFT OUTER JOIN ... WHERE ... IS
NULL.
works pretty fast.
-----Original Message-----
From: Jean-Christian Imbeault [mailto:jc@mega-bucks.co.jp]
Sent: Tuesday, July 01, 2003 5:21 PM
To: Dann Corbit
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Duplicate key insert question
Dann Corbit wrote:
>
> SQL*Server has a nice feature for this.
Ok ... is there a way to achieve the same effect in postgres?
Thanks,
Jean-Christian Imbeault
---------------------------(end of broadcast)---------------------------
TIP 3: if posting/reading through Usenet, please send an appropriate
      subscribe-nomail command to majordomo@postgresql.org so that your
      message can get through to the mailing list cleanly
			
		Reuben D. Budiardja wrote: > > Hi, not sure if this is answering your question, but I just asked similar > questions here. I asked about using INSERT WHERE NOT EXISTS (which you can do > in PostgreSQL). Here is what you can do: > > INSERT INTO mytable > SELECT 'value1', 'value2' > WHERE NOT EXISTS > (SELECT NULL FROM mytable > WHERE mycondition) > > http://marc.theaimsgroup.com/?l=postgresql-general&w=2&r=1&s=WHERE+NOT+EXISTS&q=b Thanks for the link! I read the thread and it looks like even the above solution is not perfect because of a possible race condition where two inserts trying to insert a row with a pk not in the table will both get think it is ok to do so, try it and then both will fail? If I followed all the arguments correctly according to the thread there is *no* way to do what I (and you ;) want in one simple query. Do you agree? Or did I miss something? Jean-Christian Imbeault
> -----Original Message----- > From: Maksim Likharev [mailto:mlikharev@aurigin.com] > Sent: Tuesday, July 01, 2003 5:40 PM > To: Jean-Christian Imbeault; Dann Corbit > Cc: pgsql-general@postgresql.org > Subject: RE: [GENERAL] Duplicate key insert question > > > Yes, but for MSSQL unique index with ignore duplicate in > reality will reject all duplicates. Another word if you are > trying to insert 2 identical values > you will insert none. Their documentation is wrong then. From here: http://www.mysql.com/doc/en/ALTER_TABLE.html we have this: "IGNORE is a MySQL extension to SQL-92. It controls how ALTER TABLE works if there are duplicates on unique keys in the new table. If IGNORE isn't specified, the copy is aborted and rolled back. If IGNORE is specified, then for rows with duplicates on a unique key, only the first row is used; the others are deleted." > Do search using IF EXISTS SELECT ... or LEFT OUTER JOIN ... > WHERE ... IS NULL. works pretty fast. > > > -----Original Message----- > From: Jean-Christian Imbeault [mailto:jc@mega-bucks.co.jp] > Sent: Tuesday, July 01, 2003 5:21 PM > To: Dann Corbit > Cc: pgsql-general@postgresql.org > Subject: Re: [GENERAL] Duplicate key insert question > > > Dann Corbit wrote: > > > > SQL*Server has a nice feature for this. > > Ok ... is there a way to achieve the same effect in postgres? > > Thanks, > > Jean-Christian Imbeault > > > ---------------------------(end of > broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so > that your > message can get through to the mailing list cleanly >
Maksim Likharev wrote: > > Do search using IF EXISTS SELECT ... or LEFT OUTER JOIN ... WHERE ... IS > NULL. > works pretty fast. Sorry, I don't understand. Works pretty fast for what? Is that a way of finding if a value exists? or a way of doing the insertion? Thanks, Jean-Christian Imbeault
On Wed, Jul 02, 2003 at 09:45:23AM +0900, Jean-Christian Imbeault wrote: > Reuben D. Budiardja wrote: > > http://marc.theaimsgroup.com/?l=postgresql-general&w=2&r=1&s=WHERE+NOT+EXISTS&q=b > > Thanks for the link! > > I read the thread and it looks like even the above solution is not > perfect because of a possible race condition where two inserts trying to > insert a row with a pk not in the table will both get think it is ok to > do so, try it and then both will fail? No, only the "second" one will fail (though it's difficult which one is the second) > If I followed all the arguments correctly according to the thread there > is *no* way to do what I (and you ;) want in one simple query. No, there's not. You should check the returned value from the insertion function to see if it succeeded or not. Sadly, an error will cause the whole transaction to abort, but if they come from the MySQL side it will hardly matter. But you should try to use a sequence if at all possible to avoid all these problems. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "A wizard is never late, Frodo Baggins, nor is he early. He arrives precisely when he means to." (Gandalf, en LoTR FoTR)
Alvaro Herrera wrote: > > No, only the "second" one will fail (though it's difficult which one is > the second) From: http://marc.theaimsgroup.com/?l=postgresql-general&m=105656988915991&w=2 Ian Barwick wrote: [...] I proposed that same solution 3 years ago. Tom shoots it down: [...] I couldn't get the link to work so I couldn't read why Tom shot it down. But if Tom shot down this idea down ... then it mustn't be correct. >>If I followed all the arguments correctly according to the thread there >>is *no* way to do what I (and you ;) want in one simple query. > > > No, there's not. You say no, but at first you say that the proposed method works. The proposed method, if it is correct, is simple enough for me. By simple I mean all can be done with one query. > You should check the returned value from the insertion > function to see if it succeeded or not. No, what I want if to have one query that will *always* insert if there is no record with this primary key and *always* do nothing (not fail, not generate an error) if there is already a record with this primary key. I don't want to check return values :) Thanks, Jean-Christian Imbeault
Finding if the duplicate value exists and inserting if not.
As for the race condition ( your other post )
I do not know how that will work for PG, but in Microsoft SQL Server
you can do following
BEGIN TRANSACTION
UPDATE [val] = [val]
    WHERE ....
INSERT ...
COMMIT TRANSACTION
Pretty general approach tho, should work on any SQL system with
transaction and locking support.
so basically by updating specific row ( let say you have such row )
in transaction, row/page lock will be held until end of transaction
and concurrent UPDATE will wait until you are done.
Kind of semaphore.
Practical example table that holds unique rows, let say documents,
you can have extra row with let say [id] = -1 or whatever you like,
so during insert into that table you can update that row in a
transaction,
search/insert unique values, commit transaction.
-----Original Message-----
From: Jean-Christian Imbeault [mailto:jc@mega-bucks.co.jp]
Sent: Tuesday, July 01, 2003 5:47 PM
To: Maksim Likharev
Cc: pgsql-general@postgresql.org
Subject: Re: [GENERAL] Duplicate key insert question
Maksim Likharev wrote:
>
> Do search using IF EXISTS SELECT ... or LEFT OUTER JOIN ... WHERE ...
IS
> NULL.
> works pretty fast.
Sorry, I don't understand. Works pretty fast for what?
Is that a way of finding if a value exists? or a way of doing the
insertion?
Thanks,
Jean-Christian Imbeault
			
		Maksim Likharev wrote: > > Finding if the duplicate value exists and inserting if not. Ok, thanks but I think it is still vulnerable to a race condition. > I do not know how that will work for PG, but in Microsoft SQL Server > you can do following > BEGIN TRANSACTION > UPDATE [val] = [val] > WHERE .... > INSERT ... > COMMIT TRANSACTION > > so basically by updating specific row ( let say you have such row ) > in transaction, row/page lock will be held until end of transaction > and concurrent UPDATE will wait until you are done. > Kind of semaphore. Why the UPDATE? And in postgres every query runs in it's own transaction so no need for the explicit BEGIN / END block. So can't see how your solution is any better than the previous one :) Thanks, Jean-Christian Imbeault
On Wed, Jul 02, 2003 at 09:58:28AM +0900, Jean-Christian Imbeault wrote: > Alvaro Herrera wrote: > > > > No, only the "second" one will fail (though it's difficult which one is > > the second) > > I couldn't get the link to work so I couldn't read why Tom shot it down. > But if Tom shot down this idea down ... then it mustn't be correct. The thread is here: http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&threadm=3A4D6116.1A613402%40mascari.com&rnum=1&prev=/groups%3Fq%3DMike%2BMascari%2BINSERT%2BNOT%2BEXISTS%26ie%3D The solution is not correct in that there _is_ a race condition. > > You should check the returned value from the insertion > > function to see if it succeeded or not. > > No, what I want if to have one query that will *always* insert if there > is no record with this primary key and *always* do nothing (not fail, > not generate an error) if there is already a record with this primary > key. I don't want to check return values :) No way. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "No es bueno caminar con un hombre muerto"
On Tuesday 01 July 2003 08:45 pm, Jean-Christian Imbeault wrote: > Reuben D. Budiardja wrote: > > Hi, not sure if this is answering your question, but I just asked similar > > questions here. I asked about using INSERT WHERE NOT EXISTS (which you > > can do in PostgreSQL). Here is what you can do: > > > > INSERT INTO mytable > > SELECT 'value1', 'value2' > > WHERE NOT EXISTS > > (SELECT NULL FROM mytable > > WHERE mycondition) > > > > http://marc.theaimsgroup.com/?l=postgresql-general&w=2&r=1&s=WHERE+NOT+EX > >ISTS&q=b > > Thanks for the link! > > I read the thread and it looks like even the above solution is not > perfect because of a possible race condition where two inserts trying to > insert a row with a pk not in the table will both get think it is ok to > do so, try it and then both will fail? No, onlu *one* of them will fail, but yes, the other will then generate error. So it really is a trade off. Another way would be to lock the table, as other has suggested. But then there is disadvantages to that also. RDB -- Reuben D. Budiardja Department of Physics and Astronomy The University of Tennessee, Knoxville, TN ------------------------------------------------- /"\ ASCII Ribbon Campaign against HTML \ / email and proprietary format X attachments. / \ ------------------------------------------------- Have you been used by Microsoft today? Choose your life. Choose freedom. Choose LINUX. -------------------------------------------------
Alvaro Herrera wrote: > > The thread is here: > http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&oe=UTF-8&threadm=3A4D6116.1A613402%40mascari.com&rnum=1&prev=/groups%3Fq%3DMike%2BMascari%2BINSERT%2BNOT%2BEXISTS%26ie%3D Thanks! > The solution is not correct in that there _is_ a race condition. I thought so :( >>No, what I want if to have one query that will *always* insert if there >>is no record with this primary key and *always* do nothing (not fail, >>not generate an error) if there is already a record with this primary >>key. I don't want to check return values :) > > > No way. I was beginning to think so. Thanks for confirming my suspicions. In your opinion what is the best solution, if we define best as not generating any error messages and executing as quickly as possible? Thanks, Jean-Christian Imbeault
>> -----Original Message----- >> From: Alvaro Herrera [mailto:alvherre@dcc.uchile.cl] >> Sent: Tuesday, July 01, 2003 5:51 PM >> To: Jean-Christian Imbeault >> Cc: techlist@voyager.phys.utk.edu; pgsql-general@postgresql.org >> Subject: Re: [GENERAL] Duplicate key insert question >> >> >> On Wed, Jul 02, 2003 at 09:45:23AM +0900, Jean-Christian >> Imbeault wrote: >> Reuben D. Budiardja wrote: [snip] >> If I followed all the arguments correctly according to the thread >> there is *no* way to do what I (and you ;) want in one simple query. > > No, there's not. You should check the returned value from the insertion > function to see if it succeeded or not. Sadly, an error will cause the > whole transaction to abort, but if they come from the MySQL side it will > hardly matter. But you should try to use a sequence if at all possible > to avoid all these problems. Does not really avoid the named issue. Suppose that you have a dictionary of working part numbers (e.g. Boeing might have 3 million distinct parts in their database). They would like to create a domain for these parts. So, naturally, they take their list and do %cat list.dat|sort|uniq>list.sor And then bulk load list.sor. Unfortunately, the operation fails, because one part was duplicated: PartID PartDescription -------- --------------------------------- 94v-975b High speed saphire needle bearing 94V-975B High speed saphire needle bearing It would have been nice if after loading 1.7 million of the 3 million parts, it could simply skip over the obvious error instead of rolling everything back. Of course, it is also possible that 94v-975b and 94V-975B are distinct parts. So the one who designs the database must make that decision in allowing an IGNORE option. I think it would be a useful addition to PostgreSQL, but I have an easy work around for what I want to do by simply capitalizing the strings I am inserting into a dictionary or domain and use select distinct to filter. The rare times I want to do something like that incrementally, I can just request a table lock.
Reuben D. Budiardja wrote: > > No, onlu *one* of them will fail, but yes, the other will then generate error. > So it really is a trade off. Another way would be to lock the table, as other > has suggested. But then there is disadvantages to that also. Really? I just got a post form Alvaro Herrera saying; "The solution is not correct in that there _is_ a race condition." Maybe I misunderstood, but "not correct" doesn't sound good :) Jean-Christian Imbeault
On Wed, Jul 02, 2003 at 10:25:54AM +0900, Jean-Christian Imbeault wrote: > Reuben D. Budiardja wrote: > > > > No, onlu *one* of them will fail, but yes, the other will then generate error. > > So it really is a trade off. Another way would be to lock the table, as other > > has suggested. But then there is disadvantages to that also. > > Really? I just got a post form Alvaro Herrera saying; > > "The solution is not correct in that there _is_ a race condition." > > Maybe I misunderstood, but "not correct" doesn't sound good :) Well, he is right. One will fail, the other will not. The race condition is for the application. If you want to ignore it, you can do that, but there _will_ be an ERROR thrown and the transaction will be aborted. The other transaction _will_ insert the tuple, though, and it won't be aborted. Note that for the race condition to show there has to be a race, i.e. two backends trying to insert the same primary key at the same time. If one finishes half a second before the other, they will behave that way you want, i.e. there will one tuple inserted and no error generated. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "No deja de ser humillante para una persona de ingenio saber que no hay tonto que no le pueda enseñar algo." (Jean B. Say)
On Tue, Jul 01, 2003 at 06:26:08PM -0700, Dann Corbit wrote: > > But you should try to use a sequence if at all possible to avoid all > > these problems. > > Does not really avoid the named issue. > > Suppose that you have a dictionary of working part numbers (e.g. Boeing > might have 3 million distinct parts in their database). > They would like to create a domain for these parts. So, naturally, they > take their list and do > %cat list.dat|sort|uniq>list.sor > And then bulk load list.sor. Oh, sure. The sequence thing won't apply everywhere. But maybe it can be applied in his scenario, which I don't know. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) Voy a acabar con todos los humanos / con los humanos yo acabaré voy a acabar con todos / con todos los humanos acabaré (Bender)
Alvaro Herrera wrote: > > Well, he is right. One will fail, the other will not. The race > condition is for the application. If you want to ignore it, you can do > that, but there _will_ be an ERROR thrown and the transaction will be > aborted. Ah ... then maybe this solution is 'good enough'. It will still generate an error message some of the time (when there is a race condition) but will definitely generate fewer error messages than the current method used which is just to do the insert and let it fail if there is already a record with the same primary key. Thanks for the help! Jean-Christian Imbeault
> -----Original Message----- > From: Alvaro Herrera [mailto:alvherre@dcc.uchile.cl] > Sent: Tuesday, July 01, 2003 6:37 PM > To: Jean-Christian Imbeault > Cc: techlist@voyager.phys.utk.edu; pgsql-general@postgresql.org > Subject: Re: [GENERAL] Duplicate key insert question > > > On Wed, Jul 02, 2003 at 10:25:54AM +0900, Jean-Christian > Imbeault wrote: > > Reuben D. Budiardja wrote: > > > > > > No, onlu *one* of them will fail, but yes, the other will then > > > generate error. > > > So it really is a trade off. Another way would be to lock > the table, as other > > > has suggested. But then there is disadvantages to that also. > > > > Really? I just got a post form Alvaro Herrera saying; > > > > "The solution is not correct in that there _is_ a race condition." > > > > Maybe I misunderstood, but "not correct" doesn't sound good :) > > Well, he is right. One will fail, the other will not. The > race condition is for the application. If you want to ignore > it, you can do that, but there _will_ be an ERROR thrown and > the transaction will be aborted. The other transaction > _will_ insert the tuple, though, and it won't be aborted. > > Note that for the race condition to show there has to be a > race, i.e. two backends trying to insert the same primary key > at the same time. If one finishes half a second before the > other, they will behave that way you want, i.e. there will > one tuple inserted and no error generated. I assume that PostgreSQL would simply time out both transactions if it happened in a deadly-embrace pair? I searched the PG docs, but could not find a clear answer.
On Tue, Jul 01, 2003 at 06:48:29PM -0700, Dann Corbit wrote: > I assume that PostgreSQL would simply time out both transactions if it > happened in a deadly-embrace pair? > > I searched the PG docs, but could not find a clear answer. No, the deadlock will the detected and one of the transactions will be aborted. This should happen within a second or so (configurable, AFAIR). The other transaction will continue normally. -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "El que vive para el futuro es un iluso, y el que vive para el pasado, un imbécil" (Luis Adler, "Los tripulantes de la noche")
On Tuesday 01 July 2003 09:25 pm, Jean-Christian Imbeault wrote:
> Reuben D. Budiardja wrote:
> > No, onlu *one* of them will fail, but yes, the other will then generate
> > error. So it really is a trade off. Another way would be to lock the
> > table, as other has suggested. But then there is disadvantages to that
> > also.
>
> Really? I just got a post form Alvaro Herrera saying;
>
> "The solution is not correct in that there _is_ a race condition."
>
> Maybe I misunderstood, but "not correct" doesn't sound good :)
>
If you want to avoid the race condition as well, then use the locking
mechanism for transaction. Combine it with the previous INSERT ... SELECT ...
WHERE NOT EXISTS, it should give you what you want. I suspect it's slower
though. Eg:
BEGIN WORK;
INSERT INTO mytable
SELECT 'value1', 'value2'
   WHERE NOT EXISTS
                (SELECT NULL FROM mytable
                    WHERE mycondition)
COMMIT WORK;
This should solve the Race Condition, since other transaction have to wait.
But if the PK already exists, this will quit without error.
RDB
			
		Reuben D. Budiardja wrote: > > INSERT INTO mytable > SELECT 'value1', 'value2' > WHERE NOT EXISTS > (SELECT NULL FROM mytable > WHERE mycondition) Thank you to everyone who helped out on my question. I am trying to implement the above solution but I'm having problems getting this to work when I want to insert more than one value: TAL=# create table b (a text primary key, b text); NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index 'b_pkey' for table 'b' CREATE TABLE TAL=# insert into b select 'a'; INSERT 335311 1 TAL=# insert into b select 'b', select 'b'; ERROR: parser: parse error at or near "select" at character 27 Did I get the syntax wrong? Thanks, Jean-Christian Imbeault
On Tuesday 01 July 2003 11:08 pm, Jean-Christian Imbeault wrote: > Reuben D. Budiardja wrote: > > INSERT INTO mytable > > SELECT 'value1', 'value2' > > WHERE NOT EXISTS > > (SELECT NULL FROM mytable > > WHERE mycondition) > > Thank you to everyone who helped out on my question. I am trying to > implement the above solution but I'm having problems getting this to > work when I want to insert more than one value: > > TAL=# create table b (a text primary key, b text); > NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index 'b_pkey' > for table 'b' > CREATE TABLE > TAL=# insert into b select 'a'; > INSERT 335311 1 > TAL=# insert into b select 'b', select 'b'; > ERROR: parser: parse error at or near "select" at character 27 > I don't see what you're trying to do. Why do you have two select ? RDB
On Wed, Jul 02, 2003 at 12:08:56 +0900, Jean-Christian Imbeault <jc@mega-bucks.co.jp> wrote: > TAL=# insert into b select 'b', select 'b'; > ERROR: parser: parse error at or near "select" at character 27 You probably want: insert into b select 'b', 'b';
On Wednesday 02 July 2003 02:58, Jean-Christian Imbeault wrote: > Alvaro Herrera wrote: > > No, only the "second" one will fail (though it's difficult which one is > > the second) > > From: > > http://marc.theaimsgroup.com/?l=postgresql-general&m=105656988915991&w=2 > > Ian Barwick wrote: > > [...] > > I proposed that same solution 3 years ago. Tom shoots it down: (This quote is not from Mike Mascari, not me) > [...] > > I couldn't get the link to work so I couldn't read why Tom shot it down. > But if Tom shot down this idea down ... then it mustn't be correct. see: http://archives.postgresql.org/pgsql-general/2000-12/msg00970.php entire thread: http://archives.postgresql.org/pgsql-general/2000-12/msg00947.php Ian Barwick barwick@gmx.net
Jean-Christian Imbeault wrote: > Maksim Likharev wrote: > >>Finding if the duplicate value exists and inserting if not. > > Ok, thanks but I think it is still vulnerable to a race condition. > >>I do not know how that will work for PG, but in Microsoft SQL Server >>you can do following >>BEGIN TRANSACTION >>UPDATE [val] = [val] >> WHERE .... >>INSERT ... >>COMMIT TRANSACTION >> >>so basically by updating specific row ( let say you have such row ) >>in transaction, row/page lock will be held until end of transaction >>and concurrent UPDATE will wait until you are done. >>Kind of semaphore. > > Why the UPDATE? And in postgres every query runs in it's own transaction > so no need for the explicit BEGIN / END block. The transaction semantics of your application should dictate the use of BEGIN/END. If you aren't using BEGIN/END (as many non-InnoDB MySQL applications fail to do), I doubt you can guarantee logical consistency and performance will be less than what it should be. With WAL, the difference is less noticeable, but it is there. The use of BEGIN/END to guarantee logical consistency will help performance. You kill two birds with one stone. > So can't see how your solution is any better than the previous one :) The solution works because: Backend #1: BEGIN; UPDATE foo SET bar = bar WHERE keyid = 1; Backend #2: BEGIN; UPDATE foo SET bar = bar WHERE keyid = 1; <-- Now blocks because of #1 Backend #1: INSERT INTO weirdtable SELECT weirdkey, field1, field2 WHERE NOT EXISTS ( SELECT 1 FROM weirdtable w WHERE w.key = weirdkey ); <-- Performs the insert (or not) COMMIT; Backend #2: INSERT INTO weirdtable SELECT weirdkey, field1, field2 WHERE NOT EXISTS ( SELECT 1 FROM weirdtable w WHERE w.key = weirdkey ); <-- Does nothing, as INSERT was (possibly) caused by Backend #1 COMMIT; The UPDATE acts as a lock with row granularity, preventing the race condition caused by multiple INSERT..WHERE NOT EXISTS. Tuple visibility rules in READ COMMITTED will allow Backend #2 to see the new INSERT performed by Backend #1, since Backend #2 won't continue past the UPDATE until Backend #1 has issued a COMMIT or ABORT. However, the solution prohibits you from using SERIALIZABLE, since the UPDATE of the same lock row will cause a serialization error. If READ COMMITTED is sufficient, the question arises as to how to get row granularity from the lock. One could have a lock table composed of the keys being inserted but how do the keys get into the lock table without a race condition there? If row-lock granularity cannot be achieved, and you absolutely cannot handle a race condition causing a unique key violation, you might as well just use LOCK TABLE. You might also want to investigate the userlocks in /contrib, although I haven't used them so I cannot speak to their usefulness. Mike Mascari mascarm@mascari.com
On Wed, 2 Jul 2003, Jean-Christian Imbeault wrote: > I have a table with a primary field and a few other fields. What is the > fastest way to do an insert into that table assuming that sometimes I > might try to insert a record with a duplicate primary key and want that > to fail? > > I know that if I try a plain insert this will work, but in the case > where I am trying to insert a duplicate key, the insert fails (as it > should) and an error is logged. > > I could first do a check to see if there is already an entry with the > same key as the one I am trying to insert but that would mean quite a > few extra operations. > > Is there a quick and clean way of doing something like: > > INSERT into table values(...) IF there isn't already a row with pkey=x > > Thanks, > > Jean-Christian Imbeault > > PS The reason I am asking is that an open source project is using MySQL > as their DB and they have a port to PG that isn't very clean b/c the DB > code keeps trying to insert duplicate primary keys. According to them > MySQL doesn't complain and just drops the insert whereas PG (as is > right) complains. I've offered to clean up their PG insertion code but > they say that they don't want too many extra checks as their app writes > to the DB a *lot* and any extra check is going to slow down the > application noticeably ... Skipping ahead without reading all the messages in this thread (got to rush out) what about using a before insert trigger, doing the check in there, returning null if the insert would fail and see if they complain about the slow down :) Of course it's still got the race condition for the application unless you also lock the table and it'll screw up any use of currval(sequence) afterwards that expects to get the id of the row inserted with a id obtained from nextval(sequence) Nigel Andrews
Nigel J. Andrews wrote: > > Skipping ahead without reading all the messages in this thread (got to rush > out) what about using a before insert trigger, doing the check in there, > returning null if the insert would fail and see if they complain about the slow > down :) Oooh! I think I like that .... there would be no penalty for the MySQL version since this would be a trigger and only in the postgres database version of the code. Will a trigger still allow one insert to succeed if there are multiple backends trying to insert the same primary key in a table? There must be no 'lost' inserts .... > Of course it's still got the race condition for the application unless you also > lock the table and it'll screw up any use of currval(sequence) afterwards that > expects to get the id of the row inserted with a id obtained from > nextval(sequence) I just want two things from any valid solution: 1- if there is an insert and there is not row with the new insert's primary key then the insert is made. If there are multiple inserts one succeeds. i.e. No valid inserts will be 'lost'. 2- reduce the number of error messages logged as a result of 'collisions' between two backends trying to insert duplicate primary key rows. I am not very familiar with triggers so I would very much appreciate any feedback. But I think the use of a trigger might safe? Thanks, Jean-Christian Imbeault
On 2 Jul 2003 at 16:19, Jean-Christian Imbeault wrote: > I just want two things from any valid solution: > > 1- if there is an insert and there is not row with the new insert's > primary key then the insert is made. If there are multiple inserts one > succeeds. i.e. No valid inserts will be 'lost'. With postgresql, if you use transacations, no valid inserts are ever lost unless there is a deadlock. > > 2- reduce the number of error messages logged as a result of > 'collisions' between two backends trying to insert duplicate primary key > rows. Log of what? Postgresql will log all the errors. If you don't want to see them, tune the logging option. If your code is doing any checking for return value etc., check if it is a duplicate message and discard it. That way you can minimize your application logging. Otherwise use select/insert behaviour referred earlier. Besides if you are going to insert a duplicate rarely, why worry so much about performance? And if your dulicates are bit too frequent for comfort, you might have some more issues w.r.t database table design to look at. Bye Shridhar -- And 1.1.81 is officially BugFree(tm), so if you receive any bug-reportson it, you know they are just evil lies."(By Linus Torvalds, Linus.Torvalds@cs.helsinki.fi)
Shridhar Daithankar wrote: Again, please let me preface my email. The application in question is not written by me. I am trying to find a simple and easily implementable way of making the current implementation more postgres friendly :) > With postgresql, if you use transacations, no valid inserts are ever lost > unless there is a deadlock. Just needed to make sure as I don't know much about triggers. >>2- reduce the number of error messages logged as a result of >>'collisions' between two backends trying to insert duplicate primary key >>rows. > > Log of what? Postgresql will log all the errors. If you don't want to see them, > tune the logging option. I don't want to turn all error logging off, actually I don't want to turn any error logging off :) If errors are logged it usually means there is a bug some somewhere ... > If your code is doing any checking for return value etc., check if it is a > duplicate message and discard it. That way you can minimize your application > logging. It's not my application logging, it's postgres logging the fact that a duplicate insertion was attempted: ERROR: Cannot insert a duplicate key into unique index pot_documents_pkey > Otherwise use select/insert behaviour referred earlier. But the trigger is nicer since none of the application code would need to be changed at all. I would just add a trigger to the database creation script. > And if your dulicates are bit too frequent for comfort, you might > have some more issues w.r.t database table design to look at. They are frequent, and I agree that there are issues with the database design. But the application developers don't want to address them (if it's broke why fix it kind of thing) ... I just noticed that their app was generating a *lot* of log entries when used with a postgres database and I offered to try and find a way of reducing the number of errors generated. Just trying to find a way to make *their* app play nice with postgres so that people who do use their app will use it with postgres as a DB and not that *other* DB ... Jean-Christian Imbeault
On 2 Jul 2003 at 16:52, Jean-Christian Imbeault wrote: > They are frequent, and I agree that there are issues with the database > design. But the application developers don't want to address them (if > it's broke why fix it kind of thing) ... > > I just noticed that their app was generating a *lot* of log entries when > used with a postgres database and I offered to try and find a way of > reducing the number of errors generated. > > Just trying to find a way to make *their* app play nice with postgres so > that people who do use their app will use it with postgres as a DB and > not that *other* DB ... Your error message is ERROR: Cannot insert a duplicate key into unique index pot_documents_pkey Just restart the postmaster and redirect log to script which ignores this message. A simple pipe to grep -v would do the trick. No need to modify the app, no need to nodify the db as well. HTH Bye Shridhar -- COBOL: An exercise in Artificial Inelegance.
Shridhar Daithankar wrote: > > Just restart the postmaster and redirect log to script which ignores this > message. A simple pipe to grep -v would do the trick. No need to modify the > app, no need to nodify the db as well. True but each and every user would need to redirect their log, pipe to a filter, etc ... With a a trigger it's built into the app. You install the app and that's it. It's a lot more user friendly, no? Jean-Christian Imbeault
On 2 Jul 2003 at 17:09, Jean-Christian Imbeault wrote: > Shridhar Daithankar wrote: > > > > Just restart the postmaster and redirect log to script which ignores this > > message. A simple pipe to grep -v would do the trick. No need to > modify the > > app, no need to nodify the db as well. > > True but each and every user would need to redirect their log, pipe to a > filter, etc ... No. It's just your server log. > With a a trigger it's built into the app. You install the app and that's > it. It's a lot more user friendly, no? Triggers are not built into app. They are built in database so every app. would automatically get them without restarting database or application. Bye Shridhar -- Vulcans never bluff. -- Spock, "The Doomsday Machine", stardate 4202.1
Shridhar Daithankar wrote: > > No. It's just your server log. I was assuming that any users of this app who want it using postgres as a database would want to stop these error messages from being logged. (The developers know that these error messages are being logged, but they don't see them as real errors) > Triggers are not built into app. They are built in database so every app. would > automatically get them without restarting database or application. Sorry I forgot to mention that the app comes with a script to built the necessary tables, etc. So I would add to that script to create the necessary trigger. Jc
On 2 Jul 2003 at 17:20, Jean-Christian Imbeault wrote: > Shridhar Daithankar wrote: > > > > No. It's just your server log. > > I was assuming that any users of this app who want it using postgres as > a database would want to stop these error messages from being logged. > (The developers know that these error messages are being logged, but > they don't see them as real errors) Well, with grep -v in log pipe, you are just preventing these messages from reaching log file. If a client does error checking. He would get the message anyway. Bye Shridhar -- One-Shot Case Study, n.: The scientific equivalent of the four-leaf clover, from which it is concluded all cloverspossess four leaves and are sometimes green.
On Wed, 2 Jul 2003, Shridhar Daithankar wrote: > On 2 Jul 2003 at 17:20, Jean-Christian Imbeault wrote: > > > Shridhar Daithankar wrote: > > > > > > No. It's just your server log. > > > > I was assuming that any users of this app who want it using postgres as > > a database would want to stop these error messages from being logged. > > (The developers know that these error messages are being logged, but > > they don't see them as real errors) > > Well, with grep -v in log pipe, you are just preventing these messages from > reaching log file. If a client does error checking. He would get the message > anyway. You've also got the issue of the same pgsql cluster being used for more than one database and the potential for filtering out ERROR messages from other dbs in the cluster that really shouldn't be filtered out. -- Nigel J. Andrews
On 2 Jul 2003 at 9:30, Nigel J. Andrews wrote: > > Well, with grep -v in log pipe, you are just preventing these messages from > > reaching log file. If a client does error checking. He would get the message > > anyway. > You've also got the issue of the same pgsql cluster being used for more than > one database and the potential for filtering out ERROR messages from other dbs > in the cluster that really shouldn't be filtered out. Well, if you put the entire error message in grep string, that should not affect much as it contains index name. Of course, unless you have same index in two different databases. Tough luck then.. Bye Shridhar -- Fun experiments: Get a can of shaving cream, throw it in a freezer for about a week. Then take it out, peel the metal off and put it where you want... bedroom, car, etc. As it thaws, it expands an unbelievable amount.