Re: How to insert a bulk of data with unique-violations very fast

From: Pierre C
Subject: Re: How to insert a bulk of data with unique-violations very fast
Date:
Msg-id: op.vdxvs5l9eorkce@apollo13
In response to: Re: How to insert a bulk of data with unique-violations very fast  (Torsten Zühlsdorff)
Responses: Re: How to insert a bulk of data with unique-violations very fast  (Torsten Zühlsdorff)
List: pgsql-performance
Discussion thread:
How to insert a bulk of data with unique-violations very fast  (Torsten Zühlsdorff, )
 Re: How to insert a bulk of data with unique-violations very fast  (Scott Marlowe, )
 Re: How to insert a bulk of data with unique-violations very fast  (Scott Marlowe, )
  Re: How to insert a bulk of data with unique-violations very fast  (Torsten Zühlsdorff, )
   Re: How to insert a bulk of data with unique-violations very fast  (Scott Marlowe, )
    Re: How to insert a bulk of data with unique-violations very fast  ("Pierre C", )
     Re: How to insert a bulk of data with unique-violations very fast  (Torsten Zühlsdorff, )
      Re: How to insert a bulk of data with unique-violations very fast  ("Pierre C", )
       Re: How to insert a bulk of data with unique-violations very fast  (Torsten Zühlsdorff, )
        Re: How to insert a bulk of data with unique-violations very fast  ("Pierre C", )
 Re: How to insert a bulk of data with unique-violations very fast  (Cédric Villemain, )
  Re: How to insert a bulk of data with unique-violations very fast  (Torsten Zühlsdorff, )
 Re: How to insert a bulk of data with unique-violations very fast  (Andy Colson, )
> Within the data to import most rows have 20 to 50 duplicates. Sometimes
> much more, sometimes less.

In that case (the source data has lots of redundancy), after importing the
data chunks in parallel, you can run a first pass of de-duplication on the
chunks, also in parallel, something like:

CREATE TEMP TABLE foo_1_dedup AS SELECT DISTINCT * FROM foo_1;

or you could compute some aggregates, counts, etc. As before, no WAL is
needed, and you can use all your cores in parallel.
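For instance, if the duplicate counts are themselves useful, the dedup pass could be an aggregate instead of a plain DISTINCT. A sketch, assuming each chunk has a unique key column `key` and a `payload` column (names are illustrative, not from your schema):

```sql
-- De-duplicate chunk foo_1 while recording how many copies of
-- each key were present in the raw import. Temp table: no WAL.
CREATE TEMP TABLE foo_1_dedup AS
    SELECT key,
           min(payload) AS payload,   -- pick one representative row
           count(*)     AS n_copies   -- duplicate count per key
    FROM foo_1
    GROUP BY key;
```

One such statement per chunk, each on its own connection, uses all cores.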

From what you say this should reduce the size of your imported data by a
lot (and hence the time spent in the non-parallel operation).
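That final non-parallel step could then merge each dedup'd chunk into the main table while skipping keys that are already there, so no unique violation is ever raised. A sketch, again assuming a target table `foo` with unique column `key` (illustrative names):

```sql
-- Merge a de-duplicated chunk into the main table, skipping keys
-- that already exist there (avoids unique-constraint violations).
INSERT INTO foo (key, payload)
SELECT d.key, d.payload
FROM foo_1_dedup d
WHERE NOT EXISTS (SELECT 1 FROM foo f WHERE f.key = d.key);
```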

With a different distribution, i.e. duplicates only between the existing and
the imported data, and none within the imported data itself, this strategy
would be useless.

