How to insert a bulk of data with unique-violations very fast

From: Torsten Zühlsdorff
Subject: How to insert a bulk of data with unique-violations very fast
Date:
Msg-id: hu37gm$qrl$1@news.eternal-september.org
Responses: Re: How to insert a bulk of data with unique-violations very fast  (Scott Marlowe)
Re: How to insert a bulk of data with unique-violations very fast  (Cédric Villemain)
Re: How to insert a bulk of data with unique-violations very fast  (Andy Colson)
List: pgsql-performance

Hello,

I have a set of unique data of about 150,000,000 rows. Regularly I
receive a new list of data which contains many times more rows than the
already stored set, often around 2,000,000,000 rows. Within these rows
there are many duplicates, and often much of the already stored data as well.
I want to store only the entries that are not already stored, and I do not
want to store duplicates. Example:

Already stored set:
a,b,c

Given set:
a,b,a,c,d,a,c,d,b

Expected set after import:
a,b,c,d

I am now looking for a faster way to do the import. At the moment I
import the new data with COPY into a table 'import', then I remove the
duplicates and insert every row which is not already known. After that,
'import' is truncated.
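
In rough SQL, the current workflow looks like this (assuming 'import' has
a single url column; the COPY source path is just a placeholder):

COPY import (url) FROM '/path/to/new_urls.txt';

-- keep only URLs that are neither duplicated within the batch
-- nor already present in urls
INSERT INTO urls (url)
SELECT DISTINCT i.url
FROM import i
WHERE NOT EXISTS (SELECT 1 FROM urls u WHERE u.url = i.url);

TRUNCATE import;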

Is there a faster way? Should I just insert every row and ignore it if
the unique constraint fails?
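
For illustration, by "ignore it" I mean a row-by-row insert that swallows
the error, roughly like this PL/pgSQL helper (the function name is made up):

CREATE OR REPLACE FUNCTION insert_url_ignore_dup(p_url text) RETURNS void AS $$
BEGIN
    INSERT INTO urls (url) VALUES (p_url);
EXCEPTION WHEN unique_violation THEN
    NULL;  -- row already exists, skip it
END;
$$ LANGUAGE plpgsql;

-- one call per row, driven from the staging table
SELECT insert_url_ignore_dup(url) FROM import;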

Here is the simplified table schema; in real life it is partitioned:
test=# \d urls
                          Table "public.urls"
 Column |   Type  |                       Modifiers
--------+---------+-------------------------------------------------------
 url_id | integer | not null default nextval('urls_url_id_seq'::regclass)
 url    | text    | not null
Indexes:
    "urls_url" UNIQUE, btree (url)
    "urls_url_id" btree (url_id)

Thanks for any hints or advice! :)

Greetings from Germany,
Torsten
--
http://www.dddbl.de - a database layer that abstracts working with 8
different database systems, separates queries from applications and can
automatically evaluate query results.

