Re: Copy performance issues

Поиск
Список
Период
Сортировка
От Sergey Konoplev
Тема Re: Copy performance issues
Дата
Msg-id AANLkTi=4og2H8RAubEQDndxn8bMbPe0jPgaD93NiK5ps@mail.gmail.com
обсуждение исходный текст
Ответ на Copy performance issues  (Saadat Anwar <sanwar@asu.edu>)
Список pgsql-performance
Hi,

Try to split your data in small batches. It helped me in a similar
situation recently. I was loading about a million rows into the table
highly populated with indexes and different triggers and the batch
size was 100 (using COPY). The only thing I did with DDL is droped FKs
and recreated them after.

BTW question to gurus - why and in what cases small batch loading
could theoretically be faster then huge one if there is no another
load on the database but this?


On 18 August 2010 20:25, Saadat Anwar <sanwar@asu.edu> wrote:
> I am having severe COPY performance issues after adding indices. What used
> to take a few minutes (without indices) now takes several hours (with
> indices). I've tried to tweak the database configuration (based on Postgres
> documentation and forums), but it hasn't helped as yet. Perhaps, I haven't
> increased the limits sufficiently. Dropping and recreating indices may not
> be an option due to a long time it takes to rebuild all indices.
>
> I'll appreciate someone looking at my configuration and giving me a few
> ideas on how to increase the copy performance.
>
> Thanks.
> Saadat.
>
> Table structure:
> ===========
> table C:
>            Table "public.C"
>   Column  |       Type       | Modifiers
> ----------+------------------+-----------
>  sclk     | double precision | not null
>  chan     | smallint         | not null
>  det      | smallint         | not null
>  x        | real             | not null
>  y        | real             | not null
>  z        | real             | not null
>  r        | real             |
>  t        | real             |
>  lat      | real             |
>  lon      | real             |
>  a        | real             |
>  b        | real             |
>  c        | real             |
>  time     | real             |
>  qa       | smallint         | not null
>  qb       | smallint         | not null
>  qc       | smallint         | not null
> Indexes:
>     "C_pkey" PRIMARY KEY, btree (sclk, chan, det)
>
>
> partitioned into 19 sub-tables covering lat bands. For example:
>
> sub-table C0:
>    Inherits: C
>    Check constraints:
>        "C0_lat_check" CHECK (lat >= (-10::real) AND lat < 0::real)
>    Indexes:
>        "C0_pkey" PRIMARY KEY, btree (sclk, chan, det)
>        "C0_lat" btree (lat)
>        "C0_time" btree (time)
>        "C0_lon" btree (lon)
>
> sub-table C1:
>    Inherits: C
>    Check constraints:
>        "C1_lat_check" CHECK (lat >= (-20::real) AND lat < -10::real)
>    Indexes:
>        "C1_pkey" PRIMARY KEY, btree (sclk, chan, det)
>        "C1_lat" btree (lat)
>        "C1_time" btree (time)
>        "C1_lon" btree (lon)
>
> The partitions C?s are ~30G (328,000,000 rows) each except one, which is
> ~65G (909,000,000 rows). There are no rows in umbrella table C from which
> C1, C2, ..., C19 inherit. The data is partitioned in C1, C2, ..., C19 in
> order to promote better access. Most people will access the data in C by
> specifying a lat range. Also, C?s can become quite large over time.
>
> The COPY operation copies one file per partition, for each of the 19
> partitions. Each file is between 300,000 - 600,000 records.
>
>
> System configuration:
> ================
> 1. RHEL5 x86_64
> 2. 32G RAM
> 3. 8T RAID5 partition for database on a Dell PERC 5/E controller
>    (I understand that I'll never get fast inserts/updates on it based on
>     http://wiki.postgresql.org/wiki/SlowQueryQuestions but cannot change
>     to a RAID0+1 for now).
>     Database's filesystem is ext4 on LVM on RAID5.
> 4. Postgres 8.4.2
>     shared_buffers = 10GB
>     temp_buffers = 16MB
>     work_mem = 2GB
>     maintenance_work_mem = 256MB
>     max_files_per_process = 1000
>     effective_io_concurrency = 3
>     wal_buffers = 8MB
>     checkpoint_segments = 40
>     enable_seqscan = off
>     effective_cache_size = 16GB
> 5. analyze verbose; ran on the database before copy operation
>
> Bonnie++ output:
> =============
> Version  1.03       ------Sequential Output------ --Sequential Input-
> --Random-
>                     -Per Chr- --Block-- -Rewrite- -Per Chr- --Block--
> --Seeks--
> Machine        Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP  /sec
> %CP
> dbtest    64240M 78829  99 266172  42 47904   6 58410  72 116247   9 767.9
> 1
>                     ------Sequential Create------ --------Random
> Create--------
>                     -Create-- --Read--- -Delete-- -Create-- --Read---
> -Delete--
>               files  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec %CP  /sec
> %CP
>                 256 16229  98 371704  99 20258  36 16115  97 445680  99
> 17966  36
>
dbtest,64240M,78829,99,266172,42,47904,6,58410,72,116247,9,767.9,1,256,16229,98,371704,99,20258,36,16115,97,445680,99,17966,36
>
>
>



--
Sergey Konoplev

Blog: http://gray-hemp.blogspot.com /
Linkedin: http://ru.linkedin.com/in/grayhemp /
JID/GTalk: gray.ru@gmail.com / Skype: gray-hemp / ICQ: 29353802

В списке pgsql-performance по дате отправления:

Предыдущее
От: Greg Smith
Дата:
Сообщение: Re: Performance on new 64bit server compared to my 32bit desktop
Следующее
От: Dimitri
Дата:
Сообщение: Re: Fwd: Vacuum Full + Cluster + Vacuum full = non removable dead rows