Обсуждение: RFC : best way to distrubute IO from queries (low end server)

Поиск
Список
Период
Сортировка

RFC : best way to distrubute IO from queries (low end server)

От
Ow Mun Heng
Дата:
Final specs for the server is just an ordinary desktop fitted w/ 3
7200rpm 500GB drives & 1 7200 80GB drive / 1 GB ram / 2G processor
(single core)

number of records will be between 3 to 30 million rows.

Currently the process is

1. pull from mssql
2. \copy into PG temp table
3. insert into final table.

current tables are distributed via tablespaces. (current test server is
my laptop w/ 2 5400rpm drives hda & hdc.)  and I'm already seeing the
strain of the concurrent select/delete/insert/update and the additional
"client" pull.

So, I wanted to ask the list for advice on how to tread w/ regard to the
server.

Scenario 1.

1. temp table in it's own tablespace
2. final tables in it's own tablespace
3. pgxlog in the OS tablespace

scenario 2
1. temp table in it's own tablespace
2. final tables in it's own tablespace (Read Only Copy)
3. final tables in it's own tablespace (Read write Copy)
4. pgxlog in the OS tablespace

the idea of read and read/write copy is obtained from some
presentation/article I read whereby, all updates of new data is inserted
into the read/write copy up until 1 point (say lunch time / 5pm etc)
whereby the read-write copy[3] will be "renamed" and made into a
read-only copy and the previous  read-only copy[2] will be made into a
read-write copy.

The only thing I can't wrap my head around is how to keep these 2 copies
in sync. eg: when everything is being updated to [3] and users are
querying [2] and then at the switch over instance, how will the new data
be inputted to the read only copy? because if I switch back and forth
like this, there's bound to be gaps in the data.

Any comments appreciated.

Re: RFC : best way to distrubute IO from queries (low end server)

От
Ow Mun Heng
Дата:
Anyone? I know this is a low-end server so have to make the best out of
it..

On Tue, 2007-09-18 at 16:06 +0800, Ow Mun Heng wrote:
> Final specs for the server is just an ordinary desktop fitted w/ 3
> 7200rpm 500GB drives & 1 7200 80GB drive / 1 GB ram / 2G processor
> (single core)
>
> number of records will be between 3 to 30 million rows.
>
> Currently the process is
>
> 1. pull from mssql
> 2. \copy into PG temp table
> 3. insert into final table.
>
> current tables are distributed via tablespaces. (current test server is
> my laptop w/ 2 5400rpm drives hda & hdc.)  and I'm already seeing the
> strain of the concurrent select/delete/insert/update and the additional
> "client" pull.
>
> So, I wanted to ask the list for advice on how to tread w/ regard to the
> server.
>
> Scenario 1.
>
> 1. temp table in it's own tablespace
> 2. final tables in it's own tablespace
> 3. pgxlog in the OS tablespace
>
> scenario 2
> 1. temp table in it's own tablespace
> 2. final tables in it's own tablespace (Read Only Copy)
> 3. final tables in it's own tablespace (Read write Copy)
> 4. pgxlog in the OS tablespace
>
> the idea of read and read/write copy is obtained from some
> presentation/article I read whereby, all updates of new data is inserted
> into the read/write copy up until 1 point (say lunch time / 5pm etc)
> whereby the read-write copy[3] will be "renamed" and made into a
> read-only copy and the previous  read-only copy[2] will be made into a
> read-write copy.
>
> The only thing I can't wrap my head around is how to keep these 2 copies
> in sync. eg: when everything is being updated to [3] and users are
> querying [2] and then at the switch over instance, how will the new data
> be inputted to the read only copy? because if I switch back and forth
> like this, there's bound to be gaps in the data.
>
> Any comments appreciated.
>
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster