Re: How to insert a bulk of data with unique-violations very fast

From: Pierre C
Subject: Re: How to insert a bulk of data with unique-violations very fast
Date:
Msg-id: op.vd04fifueorkce@apollo13
In response to: Re: How to insert a bulk of data with unique-violations very fast  (Torsten Zühlsdorff)
List: pgsql-performance
Discussion tree
How to insert a bulk of data with unique-violations very fast  (Torsten Zühlsdorff)
 Re: How to insert a bulk of data with unique-violations very fast  (Scott Marlowe)
 Re: How to insert a bulk of data with unique-violations very fast  (Scott Marlowe)
  Re: How to insert a bulk of data with unique-violations very fast  (Torsten Zühlsdorff)
   Re: How to insert a bulk of data with unique-violations very fast  (Scott Marlowe)
    Re: How to insert a bulk of data with unique-violations very fast  ("Pierre C")
     Re: How to insert a bulk of data with unique-violations very fast  (Torsten Zühlsdorff)
      Re: How to insert a bulk of data with unique-violations very fast  ("Pierre C")
       Re: How to insert a bulk of data with unique-violations very fast  (Torsten Zühlsdorff)
        Re: How to insert a bulk of data with unique-violations very fast  ("Pierre C")
 Re: How to insert a bulk of data with unique-violations very fast  (Cédric Villemain)
  Re: How to insert a bulk of data with unique-violations very fast  (Torsten Zühlsdorff)
 Re: How to insert a bulk of data with unique-violations very fast  (Andy Colson)
>>> Within the data to import, most rows have 20 to 50 duplicates.
>>> Sometimes much more, sometimes less.
>>  In that case (source data has lots of redundancy), after importing the
>> data chunks in parallel, you can run a first pass of de-duplication on
>> the chunks, also in parallel, something like:
>>  CREATE TEMP TABLE foo_1_dedup AS SELECT DISTINCT * FROM foo_1;
>>  or you could compute some aggregates, counts, etc. Same as before, no
>> WAL is needed, and you can use all your cores in parallel.
>>  From what you say, this should reduce the size of your imported data
>> by a lot (and hence the time spent in the non-parallel operation).
>
> Thank you very much for this advice. I've tried it in another project
> with similar import problems. It really sped up the import.

Glad it was useful ;)
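
For the archives, here is a minimal sketch of the whole sequence. The table
name, column names, and chunk file path (foo, id, data, /tmp/chunk_1.csv) are
made up for illustration; since 8.4 has no way to skip conflicting rows on
INSERT, the final merge filters out already-existing keys with an anti-join:

    -- One connection per core; each loads its own chunk into a TEMP
    -- table, so nothing is WAL-logged:
    CREATE TEMP TABLE foo_1 (id integer, data text);
    COPY foo_1 FROM '/tmp/chunk_1.csv' WITH CSV;

    -- First de-duplication pass, still per chunk and in parallel
    -- (DISTINCT removes fully identical rows):
    CREATE TEMP TABLE foo_1_dedup AS SELECT DISTINCT * FROM foo_1;

    -- Final merge into the real table. Run the merges one chunk at a
    -- time, otherwise two chunks could insert the same new key
    -- concurrently and hit the unique violation again:
    INSERT INTO foo
    SELECT d.*
    FROM foo_1_dedup d
    LEFT JOIN foo f ON f.id = d.id
    WHERE f.id IS NULL;

(Note that server-side COPY FROM a file needs superuser rights; psql's \copy
is the client-side equivalent.)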
