Re: Load experimentation

From Andy Colson
Subject Re: Load experimentation
Msg-id 4B1FA6A7.9050904@squeakycode.net
In reply to Load experimentation  (Ben Brehmer <benbrehmer@gmail.com>)
Responses Re: Load experimentation  (Ben Brehmer <benbrehmer@gmail.com>)
List pgsql-performance
On 12/07/2009 12:12 PM, Ben Brehmer wrote:
> Hello All,
>
> I'm in the process of loading a massive amount of data (500 GB). After
> some initial timings, I'm looking at 260 hours to load the entire 500GB.
> 10 days seems like an awfully long time so I'm searching for ways to
> speed this up. The load is happening in the Amazon cloud (EC2), on a
> m1.large instance:
> -7.5 GB memory
> -4 EC2 Compute Units (2 virtual cores with 2 EC2 Compute Units each)
> -64-bit platform
>
>
> So far I have modified my postgresql.conf file (PostgreSQL 8.1.3). The
> modifications I have made are as follows:
>
> shared_buffers = 786432
> work_mem = 10240
> maintenance_work_mem = 6291456
> max_fsm_pages = 3000000
> wal_buffers = 2048
> checkpoint_segments = 200
> checkpoint_timeout = 300
> checkpoint_warning = 30
> autovacuum = off
>
>
> There are a variety of instance types available in the Amazon cloud
> (http://aws.amazon.com/ec2/instance-types/), including high memory and
> high CPU. High memory instance types come with 34GB or 68GB of memory.
> High CPU instance types have a lot less memory (7GB max) but up to 8
> virtual cores. I am more than willing to change to any of the other
> instance types.
>
> Also, there is nothing else happening on the loading server. It is
> completely dedicated to the load.
>
> Any advice would be greatly appreciated.
>
> Thanks,
>
> Ben
>

I'm kind of curious, how goes the load?  Is it done yet?  Still looking at days'n'days to finish?

I was thinking... If the .sql files are really nicely formatted, it would not be too hard to whip up a Perl script to
run as a filter that changes the INSERT statements into COPYs.

Each file would have to fill only one table and contain only inserts, and all the insert statements would have to set
the same fields.  (And I'm sure there could be other problems.)
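
A rough sketch of such a filter, written in Python rather than Perl, might look like the following. It is only an illustration under several assumptions that are not confirmed anywhere in this thread: one single-row INSERT per line, an explicit column list, string literals quoted the standard SQL way (doubled single quotes), and no backslashes inside the literals (older servers may treat backslashes in literals as escapes, which this sketch does not handle). The names convert, split_values and to_copy_field are made up for the example.

```python
import re
import sys

# Assumed input shape, one statement per line:
#   INSERT INTO tbl (a, b) VALUES (1, 'x');
INSERT_RE = re.compile(
    r"^\s*INSERT\s+INTO\s+(?P<table>[\w.]+)\s*"
    r"\((?P<cols>[^)]*)\)\s*VALUES\s*\((?P<vals>.*)\)\s*;\s*$",
    re.IGNORECASE | re.DOTALL,
)


def split_values(vals):
    """Split a VALUES list on top-level commas, honoring '...' quoting."""
    fields, buf, in_quote, i = [], [], False, 0
    while i < len(vals):
        ch = vals[i]
        if in_quote:
            if ch == "'" and vals[i + 1:i + 2] == "'":
                buf.append("''")  # doubled quote stays inside the literal
                i += 1
            elif ch == "'":
                in_quote = False
                buf.append(ch)
            else:
                buf.append(ch)
        elif ch == "'":
            in_quote = True
            buf.append(ch)
        elif ch == ",":
            fields.append("".join(buf).strip())
            buf = []
        else:
            buf.append(ch)
        i += 1
    fields.append("".join(buf).strip())
    return fields


def to_copy_field(lit):
    """Turn one SQL literal into a field in COPY text format."""
    if lit.upper() == "NULL":
        return "\\N"  # COPY's default NULL marker
    if len(lit) >= 2 and lit.startswith("'") and lit.endswith("'"):
        lit = lit[1:-1].replace("''", "'")
    # Escape backslash first, then the characters COPY treats specially.
    return (lit.replace("\\", "\\\\").replace("\t", "\\t")
               .replace("\n", "\\n").replace("\r", "\\r"))


def convert(lines):
    """Yield COPY-format lines for an iterable of INSERT statement lines."""
    header = None
    for line in lines:
        if not line.strip():
            continue
        m = INSERT_RE.match(line)
        if m is None:
            raise ValueError("not a simple INSERT: %r" % line)
        cols = ", ".join(c.strip() for c in m.group("cols").split(","))
        this_header = "COPY %s (%s) FROM stdin;" % (m.group("table"), cols)
        if header is None:
            header = this_header
            yield header
        elif this_header != header:
            # Enforces the "one table, same fields" restriction.
            raise ValueError("table or column list changed: %r" % line)
        yield "\t".join(to_copy_field(v) for v in split_values(m.group("vals")))
    if header is not None:
        yield "\\."  # end-of-data marker for COPY ... FROM stdin


if __name__ == "__main__":
    for out in convert(sys.stdin):
        sys.stdout.write(out + "\n")
```

Saved under a hypothetical name such as insert2copy.py, it would sit in a pipeline between the .sql file and psql. It deliberately stops with an error on any line it does not recognize instead of guessing, since a silently mangled row is worse than a failed load.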

Also, just for the load, did you disable fsync?

-Andy
