Re: Bulkloading using COPY - ignore duplicates?

From: Lee Kindness
Subject: Re: Bulkloading using COPY - ignore duplicates?
Date:
Msg-id: 15290.57850.68004.391312@elsick.csl.co.uk
In reply to: Re: Bulkloading using COPY - ignore duplicates?  (Peter Eisentraut <peter_e@gmx.net>)
Responses: Bulkloading using COPY - ignore duplicates?  (Lee Kindness <lkindness@csl.co.uk>)
           Re: Bulkloading using COPY - ignore duplicates?  (Peter Eisentraut <peter_e@gmx.net>)
List: pgsql-hackers
Peter Eisentraut writes:
> However, it seems to me that COPY ignoring duplicates can easily be
> done by preprocessing the input file.

Or by post-processing, like (error checking cut):
void import_shots(char *impfile, int lineshoot_id)
{
    char tab_name[128];
    char tab_temp[128];
    char cmd[256]; /* declaration was cut from the snippet */

    frig_file(impfile); /* add the postgres header */
    sprintf(tab_name, "shot_%d", lineshoot_id);
    sprintf(tab_temp, "shot_%d_tmp", lineshoot_id);

    sprintf(cmd, "CREATE TEMPORARY TABLE %s AS SELECT * FROM shot",
            tab_temp);
    EXEC SQL EXECUTE IMMEDIATE :cmd;
    EXEC SQL COMMIT WORK; /* will not work without commit here! */

    sprintf(cmd, "COPY BINARY %s FROM '%s'", tab_temp, impfile);
    append_page_alloc(cmd, tab_name, impfile, 1);
    EXEC SQL EXECUTE IMMEDIATE :cmd;

    sprintf(cmd, "INSERT INTO %s SELECT DISTINCT ON (shot_time) * FROM %s",
            tab_name, tab_temp);
    EXEC SQL EXECUTE IMMEDIATE :cmd;

    sprintf(cmd, "DROP TABLE %s", tab_temp);
    EXEC SQL EXECUTE IMMEDIATE :cmd;

    EXEC SQL COMMIT WORK;
    remove(impfile);
}

However, this adds significant time to the import operation. Likewise
I could loop over the input file first and hunt for duplicates, again
at a performance cost.

My main point is that Postgres already checks for duplicates during
the COPY, quickly and cheaply, so it would add zero execution time to
simply ignore the duplicate rows rather than abort. This is evidently
a useful feature, otherwise Oracle, Ingres and other commercial
relational databases wouldn't offer similar functionality.

Yes, in an ideal world the input to COPY should be clean and
consistent with defined indexes. However this is only really the case
when COPY is used for database/table backup and restore. It misses the
point that a major use of COPY is in speed optimisation on bulk
inserts...

Lee.

