Re: Bulkloading using COPY - ignore duplicates?

From: Peter Eisentraut
Subject: Re: Bulkloading using COPY - ignore duplicates?
Date:
Msg-id: Pine.LNX.4.30.0112131714310.647-100000@peter.localdomain
In reply to: Bulkloading using COPY - ignore duplicates?  (Lee Kindness <lkindness@csl.co.uk>)
List: pgsql-hackers
Lee Kindness writes:

>  1. Performance enhancements when doing bulk inserts - pre- or
> post-processing the data to remove duplicates is very time
> consuming. Likewise, the best tool should always be used for the job
> at hand, and for searching/removing things it's a database.

Arguably, a better tool for this is sort(1).  For instance, if you have a
typical copy input file with tab-separated fields and the primary key is
in columns 1 and 2, you can remove duplicates with

sort -k 1,2 -u INFILE > OUTFILE

To get a record of what duplicates were removed, use diff.
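The recipe above can be sketched end to end. This is a minimal illustration, not taken from the original mail: the sample rows and the file names SORTED and REMOVED are invented here; only INFILE/OUTFILE and the sort invocation come from the message.

```shell
# Invented sample data: tab-separated rows, primary key in columns 1 and 2.
# The second row duplicates the key of the first.
printf 'a\t1\tfirst\na\t1\tdup\nb\t2\tsecond\n' > INFILE

# A sorted copy with duplicates still present, for comparison below.
sort -k 1,2 INFILE > SORTED

# -u keeps one line per key (columns 1-2) and discards the rest.
sort -k 1,2 -u INFILE > OUTFILE

# diff exits non-zero when files differ, hence the trailing "|| true".
# Lines prefixed with '<' are the duplicates that were dropped.
diff SORTED OUTFILE > REMOVED || true
cat REMOVED
```

Note that which line of a duplicate group survives is an implementation detail of sort -u, so this approach is only safe when any copy of a duplicated key is acceptable.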

-- 
Peter Eisentraut   peter_e@gmx.net


