Re: COPY v. java performance comparison

From: Steve Atkins
Subject: Re: COPY v. java performance comparison
Msg-id: 58C84EBC-B929-49F1-A209-315F65E326F6@blighty.com
In response to: COPY v. java performance comparison (Rob Sargent <robjsargent@gmail.com>)
Responses: Re: COPY v. java performance comparison (Rob Sargent <robjsargent@gmail.com>)
List: pgsql-general

On Apr 2, 2014, at 12:37 PM, Rob Sargent <robjsargent@gmail.com> wrote:

> I'm playing with various data models to compare performance and practicalities and not sure if I should be surprised by the numbers I'm getting. I hope this report isn't too wishy-washy for reasoned comment.
>
> One model says a genotype is defined as follows:
>          Table "public.oldstyle"
> +-------------+--------------+-----------+
> |   Column    |     Type     | Modifiers |
> +-------------+--------------+-----------+
> | id          | uuid         | not null  |
> | sample_name | text         | not null  |
> | marker_name | text         | not null  |
> | allele1     | character(1) |           |
> | allele2     | character(1) |           |
> +-------------+--------------+-----------+
> (0. id is a Primary Key)
> (1. Take what you will from the table name.)
> (2. I hadn't thought of "char" type at this point)
> (3. Ultimately the names would become ids, RI included)
> (4. We're loading 39 samples and ~950K markers)
> I loaded 37M+ records using jOOQ (batching every 1000 lines) in 12+ hours (800+ records/sec).  Then I tried COPY and killed that after 11.25 hours when I realised that I had added a non-unique index on the name fields after the first load. By that point it was on line 28301887, so ~0.75 done, which implies it would have taken ~15 hours to complete.
>
> Would the overhead of the index likely explain this decrease in throughput?
>
> Impatience got the better of me and I killed the second COPY.  This time it had done 54% of the file in 6.75 hours, extrapolating to roughly 12 hours to do the whole thing.

That seems rather painfully slow. How exactly are you doing the bulk load? Are you CPU limited or disk limited?
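
For scale, the rates implied by the quoted figures can be worked out with a few lines of Python. This is only a sketch of the arithmetic: the function names are made up for illustration, and the row counts and fractions are the approximate ones given in the quoted message.

```python
# Back-of-envelope throughput from the figures quoted above.
# Function names are illustrative; the inputs are the approximate
# numbers from the quoted message, not measurements of my own.

def rows_per_second(rows, hours):
    """Average load rate over the given wall-clock time."""
    return rows / (hours * 3600)

def projected_hours(elapsed_hours, fraction_done):
    """Linear extrapolation of total run time from partial progress."""
    return elapsed_hours / fraction_done

# jOOQ load: "37M+ records" in "12+ hours"
jooq_rate = rows_per_second(37_000_000, 12)

# First COPY (index in place): line 28301887 reached after 11.25 hours
copy_indexed_rate = rows_per_second(28_301_887, 11.25)
copy_indexed_total = projected_hours(11.25, 0.75)

# Second COPY: 54% of the file in 6.75 hours
copy_second_total = projected_hours(6.75, 0.54)

print(f"jOOQ:          {jooq_rate:.0f} rows/sec")
print(f"COPY (index):  {copy_indexed_rate:.0f} rows/sec, "
      f"{copy_indexed_total:.1f} h projected")
print(f"COPY (second): {copy_second_total:.1f} h projected")
```

All three runs land in the same range of several hundred rows per second, which is why the question of where the time is going matters more than the difference between them.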

Have you read http://www.postgresql.org/docs/current/interactive/populate.html ?
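
Among other things, that page suggests using COPY, removing indexes before the load and recreating them afterwards, and running ANALYZE once the data is in. A rough sketch of that sequence, written as a small Python helper that only builds the SQL text, might look like this. The index name and the file path are hypothetical; the table and column names are taken from the quoted table definition, and the guess that the index covers both name columns is an assumption.

```python
# Sketch of the drop-index / COPY / recreate-index sequence.
# It only builds SQL strings; nothing is sent to a server.
# The index name and data path used below are hypothetical.

def bulk_load_statements(table, index_name, index_columns, data_path):
    """Return the SQL statements for a load with the index rebuilt afterwards."""
    cols = ", ".join(index_columns)
    return [
        f"DROP INDEX IF EXISTS {index_name};",
        # Server-side COPY; the file must be readable by the server process.
        f"COPY {table} FROM '{data_path}';",
        f"CREATE INDEX {index_name} ON {table} ({cols});",
        f"ANALYZE {table};",
    ]

statements = bulk_load_statements(
    table="oldstyle",
    index_name="oldstyle_names_idx",         # hypothetical index name
    index_columns=["sample_name", "marker_name"],
    data_path="/path/to/genotypes.txt",      # hypothetical path
)

for stmt in statements:
    print(stmt)
```

Building the index once over the loaded table is generally cheaper than maintaining it row by row during the load, which would be one way to test whether the index explains the slowdown.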

Cheers,
  Steve




