Bulkloading using COPY - ignore duplicates?
From | Lee Kindness |
---|---|
Subject | Bulkloading using COPY - ignore duplicates? |
Date | |
Msg-id | 15297.40695.914645.327132@elsick.csl.co.uk |
In reply to | Bulkloading using COPY - ignore duplicates? (Lee Kindness <lkindness@csl.co.uk>) |
List | pgsql-hackers
Guys, I've made some inroads towards adding 'ignore duplicates'
functionality to PostgreSQL's COPY command. I've updated the parser
grammar for COPY FROM to now accept:

    COPY [ BINARY ] table [ WITH OIDS ]
        FROM { 'filename' | stdin }
        [ [USING] DELIMITERS 'delimiter' ]
        [ WITH [NULL AS 'null string'] [IGNORE DUPLICATES] ]

and added code to propagate this setting down to the CopyFrom function
in backend/commands/copy.c. I also played around with _bt_check_unique,
_bt_do_insert and btinsert to return NULL on duplicate rather than
elog(ERROR). Likewise ExecInsertIndexTuples and index_insert were
passed the ignore_duplicate flag, and index_insert was changed to
elog(ERROR) only if the return from the insert function was NULL and
the ignore_duplicate flag was false.

These changes worked and gave the desired result for the COPY FROM
command; however, as many mentioned, these changes are far too low
level... After assessing the situation more fully, I believe the
following change in CopyFrom would be more suitable:

    /* BEFORE ROW INSERT Triggers */
    if (resultRelInfo->ri_TrigDesc &&
        resultRelInfo->ri_TrigDesc->n_before_row[TRIGGER_EVENT_INSERT] > 0)
    {
        HeapTuple   newtuple;

        newtuple = ExecBRInsertTriggers(estate, resultRelInfo, tuple);

        if (newtuple == NULL)       /* "do nothing" */
            skip_tuple = true;
        else if (newtuple != tuple) /* modified by Trigger(s) */
        {
            heap_freetuple(tuple);
            tuple = newtuple;
        }
    }

    /* new code */
    if (ignore_duplicates == true)
    {
        if (duplicate index value)
            skip_tuple = true;
    }

    if (!skip_tuple)
    {

Now I imagine 'duplicate index value' would be functionally similar to
_bt_check_unique, but obviously higher level. Is there any existing
code with the functionality I desire? Can anyone point me in the right
way...

Thanks, Lee Kindness.

Lee Kindness writes:
> I'm in the process of porting a large application from Ingres to
> PostgreSQL. We make heavy use of bulkloading using the 'COPY'
> statement in ESQL/C. Consider the SQL statements below (in a psql
> session on an arbitrary database):
>
> CREATE TABLE copytest(f1 INTEGER, f2 INTEGER);
> CREATE UNIQUE INDEX copytest_idx ON copytest USING BTREE(f1, f2);
> COPY copytest FROM '/tmp/copytest';
>
> Given the file /tmp/copytest:
>
> 1	1
> 2	2
> 3	3
> 4	4
> 4	4
> 5	5
> 6	6
>
> this will result in the following output:
>
> ERROR: copy: line 5, Cannot insert a duplicate key into unique index copytest_idx
>
> However my application code is assuming that duplicate rows will
> simply be ignored (this is the case in Ingres, and I believe Oracle's
> bulkloader too). I propose modifying _bt_check_unique() in
> /backend/access/nbtree/nbtinsert.c to emit a NOTICE (rather than
> ERROR) elog() and return NULL (or appropriate) to the calling function
> if a duplicate key is detected and a 'COPY FROM' is in progress (add
> a new parameter to flag this).
>
> Would this seem a reasonable thing to do? Does anyone rely on COPY
> FROM causing an ERROR on duplicate input? Would:
>
> WITH ON_DUPLICATE = CONTINUE|TERMINATE (or similar)
>
> need to be added to the COPY command (I hope not)?
>
> Thanks,
>
> --
> Lee Kindness, Senior Software Engineer
> Concept Systems Limited.
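[Editor's note: the ignore-duplicates behaviour being requested can also be emulated at the SQL level, with no backend changes, by staging the load. This is a sketch only, not part of the proposed patch; it reuses the copytest table and /tmp/copytest file from the quoted example and assumes a hypothetical staging table name.]

```sql
-- Sketch: bulkload while ignoring duplicates, without touching COPY.
-- COPY into a temporary staging table (which has no unique index),
-- then move across only the rows that do not violate copytest_idx.
CREATE TEMP TABLE copytest_stage (f1 INTEGER, f2 INTEGER);
COPY copytest_stage FROM '/tmp/copytest';

INSERT INTO copytest
    SELECT DISTINCT f1, f2
      FROM copytest_stage s
     WHERE NOT EXISTS (SELECT 1 FROM copytest c
                        WHERE c.f1 = s.f1 AND c.f2 = s.f2);
```

DISTINCT collapses duplicates within the input file itself (the two "4 4" rows above), while the NOT EXISTS clause skips rows already present in the target table; the cost is an extra pass over the staged data, which the in-backend approach would avoid.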