Thread: Duplicated entries are not ignored even if a "do instead nothing" rule is added.


Duplicated entries are not ignored even if a "do instead nothing" rule is added.

From: 邓尧
Date:
Hi,

I'm new to pgsql. I need to do something like the "INSERT IGNORE" in mysql. After some searching I got a solution, which is adding a "do instead nothing" rule to the corresponding table, but it fails sometimes.
The table and the rule are created with the following sql statements:

create sequence ACCOUNT_ID_SEQ;
create table ACCOUNT (
    ID bigint primary key default nextval('ACCOUNT_ID_SEQ'),
    HOME char(255) not null,
    NAME char(255)
);
create unique index on ACCOUNT(HOME);
create index on ACCOUNT(NAME);

create rule IGNORE_DUPLICATED_ACCOUNT
as on insert to ACCOUNT
where exists(select 1 from ACCOUNT where HOME = NEW.HOME)
do instead nothing;

There are about 20 clients doing the following insertions (no UPDATE; some of them might DELETE):
begin transaction;
insert into ACCOUNT(HOME) values (v1);
insert into ACCOUNT(HOME) values (v2);
...
commit;

Sometimes I get an error saying the unique constraint "account_home_idx" is violated.

Any suggestions?

Thanks
-Yao

Re: Duplicated entries are not ignored even if a "do instead nothing" rule is added.

From: Alban Hertroys
Date:
On 3 Jan 2012, at 5:20, 邓尧 wrote:

> Hi,
>
> I'm new to pgsql. I need to do something like the "INSERT IGNORE" in mysql. After some searching I got a solution, which is adding a "do instead nothing" rule to the corresponding table, but it fails sometimes.

Yeah, if a concurrent transaction tries to create the same record, one of the transactions is going to find that it already exists on transaction commit. An INSERT-rule is not going to protect you against that.
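Laid out as a timeline, the race that the rule cannot see looks roughly like this (a sketch, not from the thread, with two concurrent sessions A and B both inserting the same HOME value 'x'):

```sql
-- A: BEGIN;
-- A: the rule's WHERE EXISTS(...) finds no row, so the INSERT proceeds
-- A: INSERT INTO account(home) VALUES ('x');
-- B: BEGIN;
-- B: A's row is not committed yet, so the rule's EXISTS check
--    still finds nothing and B's INSERT proceeds too
-- B: INSERT INTO account(home) VALUES ('x');  -- blocks on A's index entry
-- A: COMMIT;  -- A's row becomes visible
-- B: ERROR: duplicate key value violates unique constraint "account_home_idx"
```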

> The table and the rule is created with the following sql statements:
>
> create sequence ACCOUNT_ID_SEQ;
> create table ACCOUNT (
>     ID bigint primary key default nextval('ACCOUNT_ID_SEQ'),
>     HOME char(255) not null,
>     NAME char(255)
> );
> create unique index on ACCOUNT(HOME);
> create index on ACCOUNT(NAME);

It seems to me that account(home) is actually the PK - do you really need the artificial id column? That is a matter of personal preference; wars are waged over artificial vs natural keys. People in here will usually tell you to use what fits the problem best; both sides have benefits and drawbacks ;)

Another problem you'll have is that char columns are padded up to their full size with spaces - you'll end up trimming every value in your client applications. You probably want varchar(255) or, perhaps better, text. The latter also rids you of that 255-character limit from mysql.

> There are about 20 clients doing the following insertions (no UPDATE; some of them might DELETE):
> begin transaction;
> insert into ACCOUNT(HOME) values (v1);
> insert into ACCOUNT(HOME) values (v2);
> ...
> commit;
>
> Sometimes I get an error saying the unique constraint "account_home_idx" is violated.
>
> Any suggestions?

I assume you're talking about parallel inserts from a multi-process tool for importing this data? If that's the case then there are a number of solutions commonly used.

First of all, especially if you're inserting a lot of data like this, see if you can use COPY FROM STDIN instead. That loads the whole transaction contents in one go, which is a lot more efficient than thousands of sequential inserts. As it's a single statement that way, you don't even need to wrap it in a transaction anymore - you'll get an implicit transaction per statement, which is in this case exactly what you want for this single COPY statement.
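As a sketch (fed through psql here; v1 and v2 stand in for the real values, as in the original post):

```sql
COPY account (home) FROM STDIN;
v1
v2
\.
```

From Python, psycopg2 exposes the same mechanism as cursor.copy_from(file_obj, 'account', columns=('home',)). Note that COPY itself still errors out on a duplicate key, so it pairs naturally with a staging table that has no unique constraint.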

The other thing people usually do is to insert the data into a staging table without UNIQUE constraints. After that they issue:

INSERT INTO account(home)
SELECT DISTINCT home FROM staging_table
WHERE NOT EXISTS (SELECT 1 FROM account WHERE account.home = staging_table.home);
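End to end, that staging approach might look like this sketch (the staging_table name and a text home column are assumptions, not from the thread):

```sql
CREATE TEMP TABLE staging_table (home text);

COPY staging_table (home) FROM STDIN;  -- bulk-load the raw, possibly duplicated data

INSERT INTO account (home)
SELECT DISTINCT s.home
FROM staging_table s
WHERE NOT EXISTS (SELECT 1 FROM account WHERE account.home = s.home);
```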

Other options are to use external tools written for batch inserting large amounts of data. I seem to recall pgfouine is such an application, but I've never used it.

Alban Hertroys

--
The scale of a problem often equals the size of an ego.



Re: Duplicated entries are not ignored even if a "do instead nothing" rule is added.

From: Merlin Moncure
Date:
On Tue, Jan 3, 2012 at 1:42 AM, Alban Hertroys <haramrae@gmail.com> wrote:
> On 3 Jan 2012, at 5:20, 邓尧 wrote:
>
>> Hi,
>>
>> I'm new to pgsql. I need to do something like the "INSERT IGNORE" in mysql. After some searching I got a solution, which is adding a "do instead nothing" rule to the corresponding table, but it fails sometimes.
>
> Yeah, if a concurrent transaction tries to create the same record, one of the transactions is going to find that it already exists on transaction commit. An INSERT-rule is not going to protect you against that.

It will if you lock the table first in the same transaction... note that this will greatly hurt concurrency and you have to watch for deadlocks.

INSERT...SELECT...WHERE is going to be vastly superior to a rule-based approach, obviously.
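For completeness, a sketch of the lock-first variant (not from the thread; EXCLUSIVE MODE still allows concurrent reads but serializes all writers):

```sql
BEGIN;
LOCK TABLE account IN EXCLUSIVE MODE;  -- serializes writers; watch for deadlocks
INSERT INTO account (home)
SELECT 'v1'
WHERE NOT EXISTS (SELECT 1 FROM account WHERE home = 'v1');
COMMIT;
```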

merlin

Re: Duplicated entries are not ignored even if a "do instead nothing" rule is added.

From: Alban Hertroys
Date:
You accidentally clicked "Reply" instead of "Reply-all" ;)

On 4 Jan 2012, at 3:03, 邓尧 wrote:

>> On Tue, Jan 3, 2012 at 3:42 PM, Alban Hertroys <haramrae@gmail.com> wrote:
>> On 3 Jan 2012, at 5:20, 邓尧 wrote:
>>
>> > Hi,
>> >
>> > I'm new to pgsql. I need to do something like the "INSERT IGNORE" in mysql. After some searching I got a solution, which is adding a "do instead nothing" rule to the corresponding table, but it fails sometimes.
>>
>> Yeah, if a concurrent transaction tries to create the same record, one of the transactions is going to find that it already exists on transaction commit. An INSERT-rule is not going to protect you against that.
>>
> I modified the code to just commit after every insertion. Haven't seen this kind of error for a few hours. It might hurt the performance, but for my application the database isn't the bottleneck.
>
>
>> > The table and the rule is created with the following sql statements:
>> >
>> > create sequence ACCOUNT_ID_SEQ;
>> > create table ACCOUNT (
>> >       ID bigint primary key default nextval('ACCOUNT_ID_SEQ'),
>> >       HOME char(255) not null,
>> >       NAME char(255)
>> > );
>> > create unique index on ACCOUNT(HOME);
>> > create index on ACCOUNT(NAME);
>>
>> It seems to me that account(home) is actually the PK - do you really need the artificial id column? That is a matter of personal preference; wars are waged over artificial vs natural keys. People in here will usually tell you to use what fits the problem best; both sides have benefits and drawbacks ;)
>>
>> Another problem you'll have is that char columns are padded up to their full size with spaces - you'll end up trimming every value in your client applications. You probably want varchar(255) or, perhaps better, text. The latter also rids you of that 255-character limit from mysql.
>>
> Thanks for the advice, I actually ran into a bug because of the padded spaces ;p
>
>
>> > There are about 20 clients doing the following insertions (no UPDATE; some of them might DELETE):
>> > begin transaction;
>> > insert into ACCOUNT(HOME) values (v1);
>> > insert into ACCOUNT(HOME) values (v2);
>> > ...
>> > commit;
>> >
>> > Sometimes I get an error saying the unique constraint "account_home_idx" is violated.
>> >
>> > Any suggestions?
>>
>> I assume you're talking about parallel inserts from a multi-process tool for importing this data?
> Yes, I'm running a specific crawler, crawling a twitter-like web site. Python 3 + psycopg2 is my programming platform.
>>
>>
>> First of all, especially if you're inserting a lot of data like this, see if you can use COPY FROM STDIN instead. That loads the whole transaction contents in one go, which is a lot more efficient than thousands of sequential inserts. As it's a single statement that way, you don't even need to wrap it in a transaction anymore - you'll get an implicit transaction per statement, which is in this case exactly what you want for this single COPY statement.
> True, I don't need transactions, neither do I want them, but psycopg2 creates transactions for me automatically :-(

Well, if psycopg didn't, Postgres would wrap each statement in a transaction anyway. But that doesn't matter.

The problem was that you were verifying whether you were running into a conflict in one transaction (say 'A') before a potentially conflicting commit in a parallel transaction (say 'B').

So while you were verifying in 'A', transaction 'B' wasn't committed yet and therefore the conflicting row from 'B' wasn't visible to 'A'. When you then tried to commit 'A', the database found there was a conflicting record: namely the one from transaction 'B' that was just committed.

Alban Hertroys

P.S: Normally I would have trimmed down this mail to the bottom part of the conversation, but as the original mail didn't make it to the list I decided to keep all of it.

--
If you can't see the forest for the trees,
cut the trees and you'll find there is no forest.


Re: Duplicated entries are not ignored even if a "do instead nothing" rule is added.

From: Daniele Varrazzo
Date:
On Wed, Jan 4, 2012 at 8:09 PM, Alban Hertroys <haramrae@gmail.com> wrote:

> On 4 Jan 2012, at 3:03, 邓尧 wrote:

>> True, I don't need transactions, neither do I want them, but psycopg2 creates transactions for me automatically :-(
>
> Well, if psycopg didn't, Postgres would wrap each statement in a transaction anyway. But that doesn't matter.

FYI, you can have psycopg in autocommit mode if you need that:
<http://initd.org/psycopg/docs/connection.html#connection.autocommit>.

-- Daniele

Re: Duplicated entries are not ignored even if a "do instead nothing" rule is added.

From: 邓尧
Date:
I have enabled the autocommit feature of psycopg2, removed all the transactions from the source code, and changed the sql statement to the following:

insert into ACCOUNT(HOME)
    select 'v1' as HOME
    where not exists (select 1 from ACCOUNT where HOME = 'v1')

Surprisingly, I still get the unique constraint violation error, but far fewer than before (only 3 during the last 3 days).
Any suggestions? Could it be a bug in psycopg2 or postgresql?

Thanks
-Yao

On Wed, Jan 4, 2012 at 1:11 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
On Tue, Jan 3, 2012 at 1:42 AM, Alban Hertroys <haramrae@gmail.com> wrote:
> On 3 Jan 2012, at 5:20, 邓尧 wrote:
>
>> Hi,
>>
>> I'm new to pgsql, I need the do something like the "INSERT IGNORE" in mysql. After some searching I got a solution, which is adding a "do instead nothing" rule to the corresponding table, but it fails sometimes.
>
> Yeah, if a concurrent transaction tries to create the same record, one of the transactions is going to find that it already exists on transaction commit. An INSERT-rule is not going to protect you against that.

It will if you lock the table first in the same transaction...note
this will greatly hurt concurrency and you have to watch for
deadlocks.

INSERT...SELECT..WHERE is going to be vastly superior to a rule based
approach obviously.

merlin

Re: Duplicated entries are not ignored even if a "do instead nothing" rule is added.

From: Jon Nelson
Date:
On Sun, Jan 8, 2012 at 7:13 PM, 邓尧 <torshie@gmail.com> wrote:
> I have enabled the autocommit feature of psycopg2, and removed all the
> transactions in source code, also changed the sql statement to the
> following:
>
> insert into ACCOUNT(HOME)
>     select 'v1' as HOME
>     where not exists (select 1 from ACCOUNT where HOME = 'v1')
>
> Surprisingly, I still got the unique constraint violation error, but much
> fewer than before ( only 3 during the last 3 days).
> Any suggestions? Could it be a bug in psycopg2 or postgresql?

Presumably, you need to add a distinct.
insert into ... select DISTINCT .... where not exists.


--
Jon

Re: Duplicated entries are not ignored even if a "do instead nothing" rule is added.

From: 邓尧
Date:

On Mon, Jan 9, 2012 at 9:53 AM, Jon Nelson <jnelson+pgsql@jamponi.net> wrote:
On Sun, Jan 8, 2012 at 7:13 PM, 邓尧 <torshie@gmail.com> wrote:
> I have enabled the autocommit feature of psycopg2, and removed all the
> transactions in source code, also changed the sql statement to the
> following:
>
> insert into ACCOUNT(HOME)
>     select 'v1' as HOME
>     where not exists (select 1 from ACCOUNT where HOME = 'v1')
>
> Surprisingly, I still got the unique constraint violation error, but much
> fewer than before ( only 3 during the last 3 days).
> Any suggestions? Could it be a bug in psycopg2 or postgresql?

Presumably, you need to add a distinct.
insert into ... select DISTINCT .... where not exists.
 
I don't think so. The statement "select .. as .. where not exists" will never return more than one row.

Re: Duplicated entries are not ignored even if a "do instead nothing" rule is added.

From: Jasen Betts
Date:
On 2012-01-09, 邓尧 <torshie@gmail.com> wrote:

> insert into ACCOUNT(HOME)
>     select 'v1' as HOME
>     where not exists (select 1 from ACCOUNT where HOME = 'v1')
>
> Surprisingly, I still got the unique constraint violation error, but much
> fewer than before ( only 3 during the last 3 days).
> Any suggestions? Could it be a bug in psycopg2 or postgresql?

No bug, that's expected behavior (unless you switch to the serializable isolation level).

When I need to insert if missing and need to avoid all errors, I write it like the above but wrap it in a plpgsql function with an exception clause to handle the error.
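A minimal sketch of that wrapper (the function name is made up for illustration; it assumes the ACCOUNT table from earlier in the thread):

```sql
CREATE OR REPLACE FUNCTION insert_account_if_missing(p_home text)
RETURNS void AS $$
BEGIN
    INSERT INTO account (home)
    SELECT p_home
    WHERE NOT EXISTS (SELECT 1 FROM account WHERE home = p_home);
EXCEPTION
    WHEN unique_violation THEN
        NULL;  -- a concurrent transaction won the race; ignore the duplicate
END;
$$ LANGUAGE plpgsql;
```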

--
⚂⚃ 100% natural