Re: Load experimentation

Поиск
Список
Период
Сортировка
От Ben Brehmer
Тема Re: Load experimentation
Дата
Msg-id 4B2158E6.8080504@gmail.com
обсуждение исходный текст
Ответ на Re: Load experimentation  (Andy Colson <andy@squeakycode.net>)
Список pgsql-performance
Hi Andy,

Load is chugging along. We've optimized our postgres conf as much as
possible but are seeing the inevitable I/O bottleneck. I had the same
thought as you (converting inserts into copy's) a while back but
unfortunately each file has many inserts into many different tables.
Potentially I could rip through this with a little MapReduce job on
50-100 nodes, which is still something I might do.

One thought we are playing with was taking advantage of 4 x 414GB EBS
devices in a RAID0 configuration. This would spread disk writes across 4
block devices.

Right now I'm wrapping about 1500 inserts in a transaction block. Since
its an I/O bottlenecks, COPY statements might not give me much advantage.

Its definitely a work in progress :)

Ben


On 09/12/2009 5:31 AM, Andy Colson wrote:
> On 12/07/2009 12:12 PM, Ben Brehmer wrote:
>> Hello All,
>>
>> I'm in the process of loading a massive amount of data (500 GB). After
>> some initial timings, I'm looking at 260 hours to load the entire 500GB.
>> 10 days seems like an awfully long time so I'm searching for ways to
>> speed this up. The load is happening in the Amazon cloud (EC2), on a
>> m1.large instance:
>> -7.5 GB memory
>> -4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
>> -64-bit platform
>>
>>
>> So far I have modified my postgresql.conf file (PostgreSQL 8.1.3). The
>> modifications I have made are as follows:
>>
>> shared_buffers = 786432
>> work_mem = 10240
>> maintenance_work_mem = 6291456
>> max_fsm_pages = 3000000
>> wal_buffers = 2048
>> checkpoint_segments = 200
>> checkpoint_timeout = 300
>> checkpoint_warning = 30
>> autovacuum = off
>>
>>
>> There are a variety of instance types available in the Amazon cloud
>> (http://aws.amazon.com/ec2/instance-types/), including high memory and
>> high CPU. High memory instance types come with 34GB or 68GB of memory.
>> High CPU instance types have a lot less memory (7GB max) but up to 8
>> virtual cores. I am more than willing to change to any of the other
>> instance types.
>>
>> Also, there is nothing else happening on the loading server. It is
>> completely dedicated to the load.
>>
>> Any advice would be greatly appreciated.
>>
>> Thanks,
>>
>> Ben
>>
>
> I'm kind of curious, how goes the load?  Is it done yet?  Still
> looking at days'n'days to finish?
>
> I was thinking... If the .sql files are really nicely formatted, it
> would not be too hard to whip up a perl script to run as a filter to
> change the statements into copy's.
>
> Each file would have to only fill one table, and only contain inserts,
> and all the insert statements would have to set the same fields.  (And
> I'm sure there could be other problems).
>
> Also, just for the load, did you disable fsync?
>
> -Andy
>

В списке pgsql-performance по дате отправления:

Предыдущее
От: Robert Haas
Дата:
Сообщение: Re: Fw: Help me put 2 Gigs of RAM to use
Следующее
От: Michael Clemmons
Дата:
Сообщение: 8.4.1 ubuntu karmic slow createdb