Re: Load experimentation
От | Ben Brehmer |
---|---|
Тема | Re: Load experimentation |
Дата | |
Msg-id | 4B2158E6.8080504@gmail.com обсуждение исходный текст |
Ответ на | Re: Load experimentation (Andy Colson <andy@squeakycode.net>) |
Список | pgsql-performance |
Hi Andy, Load is chugging along. We've optimized our postgres conf as much as possible but are seeing the inevitable I/O bottleneck. I had the same thought as you (converting inserts into copy's) a while back but unfortunately each file has many inserts into many different tables. Potentially I could rip through this with a little MapReduce job on 50-100 nodes, which is still something I might do. One thought we are playing with was taking advantage of 4 x 414GB EBS devices in a RAID0 configuration. This would spread disk writes across 4 block devices. Right now I'm wrapping about 1500 inserts in a transaction block. Since its an I/O bottlenecks, COPY statements might not give me much advantage. Its definitely a work in progress :) Ben On 09/12/2009 5:31 AM, Andy Colson wrote: > On 12/07/2009 12:12 PM, Ben Brehmer wrote: >> Hello All, >> >> I'm in the process of loading a massive amount of data (500 GB). After >> some initial timings, I'm looking at 260 hours to load the entire 500GB. >> 10 days seems like an awfully long time so I'm searching for ways to >> speed this up. The load is happening in the Amazon cloud (EC2), on a >> m1.large instance: >> -7.5 GB memory >> -4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each) >> -64-bit platform >> >> >> So far I have modified my postgresql.conf file (PostgreSQL 8.1.3). The >> modifications I have made are as follows: >> >> shared_buffers = 786432 >> work_mem = 10240 >> maintenance_work_mem = 6291456 >> max_fsm_pages = 3000000 >> wal_buffers = 2048 >> checkpoint_segments = 200 >> checkpoint_timeout = 300 >> checkpoint_warning = 30 >> autovacuum = off >> >> >> There are a variety of instance types available in the Amazon cloud >> (http://aws.amazon.com/ec2/instance-types/), including high memory and >> high CPU. High memory instance types come with 34GB or 68GB of memory. >> High CPU instance types have a lot less memory (7GB max) but up to 8 >> virtual cores. I am more than willing to change to any of the other >> instance types. >> >> Also, there is nothing else happening on the loading server. It is >> completely dedicated to the load. >> >> Any advice would be greatly appreciated. >> >> Thanks, >> >> Ben >> > > I'm kind of curious, how goes the load? Is it done yet? Still > looking at days'n'days to finish? > > I was thinking... If the .sql files are really nicely formatted, it > would not be too hard to whip up a perl script to run as a filter to > change the statements into copy's. > > Each file would have to only fill one table, and only contain inserts, > and all the insert statements would have to set the same fields. (And > I'm sure there could be other problems). > > Also, just for the load, did you disable fsync? > > -Andy >
В списке pgsql-performance по дате отправления: