Bulkloading using COPY - ignore duplicates?

From: Lee Kindness
Subject: Bulkloading using COPY - ignore duplicates?
Date:
Msg-id: 15297.40695.914645.327132@elsick.csl.co.uk
In reply to: Bulkloading using COPY - ignore duplicates?  (Lee Kindness <lkindness@csl.co.uk>)
List: pgsql-hackers
Guys,

I've made some inroads towards adding 'ignore duplicates'
functionality to PostgreSQL's COPY command. I've updated the parser
grammar for COPY FROM to now accept:
COPY [ BINARY ] table [ WITH OIDS ]
    FROM { 'filename' | stdin }
    [ [ USING ] DELIMITERS 'delimiter' ]
    [ WITH [ NULL AS 'null string' ]
           [ IGNORE DUPLICATES ] ]

and added code to propagate this setting down to the CopyFrom function
in backend/commands/copy.c.
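
To give an idea of the propagation (a simplified sketch only, not the
actual diff; the ignore_duplicates field is the only new bit, and the
existing CopyStmt fields are elided):

    typedef struct CopyStmt
    {
        NodeTag     type;
        /* ... existing fields (relname, binary, oids, ...) unchanged ... */
        bool        ignore_duplicates;  /* new: was IGNORE DUPLICATES given? */
    } CopyStmt;

CopyFrom then simply receives this boolean along with the other COPY
options.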

I also played around with _bt_check_unique, _bt_doinsert and btinsert
so that they return NULL on a duplicate rather than elog(ERROR).
Likewise, ExecInsertIndexTuples and index_insert were passed the
ignore_duplicate flag, and index_insert was changed to elog(ERROR)
only if the return from the insert function was NULL and the
ignore_duplicate flag was false.
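
In outline the index_insert change looks like this (a heavily
simplified sketch; am_insert_call is a stand-in for the real fmgr
dispatch to btinsert, and unrelated arguments are elided):

    InsertIndexResult
    index_insert(Relation indexRelation, /* ... */ bool ignore_duplicate)
    {
        InsertIndexResult result;

        /* btinsert & friends now return NULL on a duplicate key
         * instead of calling elog(ERROR) themselves */
        result = am_insert_call(indexRelation /* , ... */);

        if (result == NULL && !ignore_duplicate)
            elog(ERROR, "Cannot insert a duplicate key into unique index %s",
                 RelationGetRelationName(indexRelation));

        return result;
    }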

These changes worked and gave the desired result for the COPY FROM
command; however, as many of you mentioned, these changes are far too
low-level... After assessing the situation more fully, I believe the
following change in CopyFrom would be more suitable:
    /* BEFORE ROW INSERT Triggers */
    if (resultRelInfo->ri_TrigDesc &&
        resultRelInfo->ri_TrigDesc->n_before_row[TRIGGER_EVENT_INSERT] > 0)
    {
        HeapTuple   newtuple;

        newtuple = ExecBRInsertTriggers(estate, resultRelInfo, tuple);

        if (newtuple == NULL)           /* "do nothing" */
            skip_tuple = true;
        else if (newtuple != tuple)     /* modified by Trigger(s) */
        {
            heap_freetuple(tuple);
            tuple = newtuple;
        }
    }

    /* new code */
    if (ignore_duplicates == true)
    {
        if (duplicate index value)
            skip_tuple = true;
    }

    if (!skip_tuple)
    {


Now I imagine 'duplicate index value' would be functionally similar to
_bt_check_unique, but obviously at a higher level. Is there any
existing code with the functionality I desire? Can anyone point me in
the right direction?
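
To make the question concrete, the shape of helper I'm after is
something like this (entirely hypothetical, hence the question; it
would form the key for each unique index on the relation and probe it
much as _bt_check_unique does, but from the executor level):

    /* Hypothetical: true if inserting 'tuple' would violate any
     * unique index on the result relation */
    static bool
    duplicate_index_value(ResultRelInfo *resultRelInfo,
                          EState *estate, HeapTuple tuple);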

Thanks,

Lee Kindness.

Lee Kindness writes:
> I'm in the process of porting a large application from Ingres to
> PostgreSQL. We make heavy use of bulkloading using the 'COPY'
> statement in ESQL/C. Consider the SQL statements below (in a psql
> session on an arbitrary database):
>
>  CREATE TABLE copytest(f1 INTEGER, f2 INTEGER);
>  CREATE UNIQUE INDEX copytest_idx ON copytest USING BTREE(f1, f2);
>  COPY copytest FROM '/tmp/copytest';
>
> Given the file /tmp/copytest:
>
>  1    1
>  2    2
>  3    3
>  4    4
>  4    4
>  5    5
>  6    6
>
> will result in the following output:
>
>  ERROR:  copy: line 5, Cannot insert a duplicate key into unique index copytest_idx
>
> However my application code is assuming that duplicate rows will
> simply be ignored (this is the case in Ingres, and I believe Oracle's
> bulkloader too). I propose modifying _bt_check_unique() in
> /backend/access/nbtree/nbtinsert.c to emit a NOTICE (rather than
> ERROR) elog() and return NULL (or appropriate) to the calling function
> if a duplicate key is detected and a 'COPY FROM' is in progress (add
> a new parameter to flag this).
>
> Would this seem a reasonable thing to do? Does anyone rely on COPY
> FROM causing an ERROR on duplicate input? Would:
>
>  WITH ON_DUPLICATE = CONTINUE|TERMINATE (or similar)
>
> need to be added to the COPY command (I hope not)?
>
> Thanks,
>
> --
>  Lee Kindness, Senior Software Engineer
>  Concept Systems Limited.

