Thread: Optimizing large data loads

Optimizing large data loads

From
"John Wells"
Date:
Hi guys,

We have a Java process that uses Hibernate to load approximately 14 GB of
data.  On a dual-proc 2.4 GHz Xeon with 2048 MB RAM, it's currently
taking over 13 hours to load (PostgreSQL 7.4.8).  We're flushing from
Hibernate every 50 records.
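
A minimal sketch of this kind of batched Hibernate load loop, with a flush,
clear and commit every N records; the class name, the record type and the
batch size of 1000 are illustrative assumptions, not the actual code:

    import org.hibernate.Session;
    import org.hibernate.SessionFactory;
    import org.hibernate.Transaction;

    public class BatchLoader {
        private static final int BATCH_SIZE = 1000;  // assumption; worth tuning

        public void load(SessionFactory sessionFactory, Iterable<Object> records) {
            Session session = sessionFactory.openSession();
            Transaction tx = session.beginTransaction();
            int count = 0;
            for (Object record : records) {
                session.save(record);
                if (++count % BATCH_SIZE == 0) {
                    session.flush();   // push the pending INSERTs to the database
                    session.clear();   // drop them from the session-level cache
                    tx.commit();       // one transaction per batch
                    tx = session.beginTransaction();
                }
            }
            tx.commit();
            session.close();
        }
    }

Setting hibernate.jdbc.batch_size to a similar value lets the JDBC driver
batch the statements as well.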

I've turned fsync to false in postgresql.conf, and we've turned autocommit
off in our code.  Is there anything else I could try to temporarily speed
up inserts?

Thanks very much for your help.

John


Re: Optimizing large data loads

From
Richard Huxton
Date:
John Wells wrote:
> Hi guys,
>
> We have a Java process that uses Hibernate to load approximately 14 GB of
> data.  On a dual-proc 2.4 GHz Xeon with 2048 MB RAM, it's currently
> taking over 13 hours to load (PostgreSQL 7.4.8).  We're flushing from
> Hibernate every 50 records.
>
> I've turned fsync to false in postgresql.conf, and we've turned autocommit
> off in our code.  Is there anything else I could try to temporarily speed
> up inserts?

You don't say what the limitations of Hibernate are. Usually you might
look to:
1. Use COPY not INSERTs
2. If not, batch INSERTs into BEGIN/COMMIT transactions of, say, 100-1000 rows
3. Turn fsync off
4. DROP/RESTORE constraints/triggers/indexes while you load your data
5. Increase sort_mem/work_mem in your postgresql.conf when recreating
indexes etc.
6. Use multiple processes to make sure the I/O is maxed out.

Any of those do-able?
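
As a rough illustration of point 1, recent PostgreSQL JDBC drivers expose
COPY through the CopyManager API; this means bypassing Hibernate for those
tables, and the connection URL, table name and input file below are made up:

    import java.io.FileReader;
    import java.io.Reader;
    import java.sql.Connection;
    import java.sql.DriverManager;

    import org.postgresql.copy.CopyManager;
    import org.postgresql.core.BaseConnection;

    public class CopyLoader {
        public static void main(String[] args) throws Exception {
            // COPY streams tab-separated rows straight into the table,
            // skipping the per-row INSERT overhead entirely.
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/mydb", "user", "password");
            CopyManager copier = new CopyManager((BaseConnection) conn);
            Reader in = new FileReader("items.tsv");
            long rows = copier.copyIn("COPY items FROM STDIN", in);
            System.out.println("Loaded " + rows + " rows");
            in.close();
            conn.close();
        }
    }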

--
   Richard Huxton
   Archonet Ltd

Re: Optimizing large data loads

From
"John Wells"
Date:
Richard Huxton said:
> You don't say what the limitations of Hibernate are. Usually you might
> look to:
> 1. Use COPY not INSERTs

Not an option, unfortunately.

> 2. If not, block INSERTS into BEGIN/COMMIT transactions of say 100-1000

We're using 50 per commit... we can easily up this, I suppose.

> 3. Turn fsync off

Done.

> 4. DROP/RESTORE constraints/triggers/indexes while you load your data

Hmmm... we'll have to think about this a bit. Not a bad idea, but I'm not
sure how we can make it work in our situation.
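
A rough sketch of how points 4 and 5 might be scripted around the load from
JDBC; the table, index name and sort_mem value are hypothetical (sort_mem is
the 7.4-era name for the setting, measured in kilobytes):

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class IndexRebuild {
        public static void main(String[] args) throws Exception {
            Connection conn = DriverManager.getConnection(
                    "jdbc:postgresql://localhost/mydb", "user", "password");
            Statement st = conn.createStatement();

            st.execute("DROP INDEX idx_items_sku");   // drop before the bulk load

            // ... run the Hibernate load here ...

            // give this session extra sort memory for the index build (in KB)
            st.execute("SET sort_mem = 131072");
            st.execute("CREATE INDEX idx_items_sku ON items (sku)");
            st.execute("ANALYZE items");              // refresh planner statistics

            st.close();
            conn.close();
        }
    }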

> 5. Increase sort_mem/work_mem in your postgresql.conf when recreating
> indexes etc.
> 6. Use multiple processes to make sure the I/O is maxed out.

5 falls in line with 4.  6 is definitely doable.
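
A possible shape for point 6, assuming the input can be split into
independent chunks up front; the thread count and the BatchLoader class from
the earlier sketch are assumptions, not actual code:

    import java.util.List;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import java.util.concurrent.TimeUnit;

    import org.hibernate.SessionFactory;

    public class ParallelLoader {
        public void loadAll(final SessionFactory sf, List<Iterable<Object>> chunks)
                throws InterruptedException {
            // a few workers, each with its own session and connection, help
            // keep the disks busy while other workers are busy in the JVM
            ExecutorService pool = Executors.newFixedThreadPool(4);
            for (final Iterable<Object> chunk : chunks) {
                pool.submit(new Runnable() {
                    public void run() {
                        new BatchLoader().load(sf, chunk);
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(24, TimeUnit.HOURS);
        }
    }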

Thanks for the suggestions!

John