Обсуждение: Bulk Insert

Поиск
Список
Период
Сортировка

Bulk Insert

От
David Jarvis
Дата:
Hi,

What is the fastest way to insert 237 million records into a table that has rules (for distributing the data across 84 child tables)?

First I tried inserts. No go.
Then I tried inserts with BEGIN/COMMIT. Not nearly fast enough.
Next, I tried COPY FROM, but then noticed the documentation states that the rules are ignored. (And it was having difficulties with the column order and date format -- it said that '1984-07-1' was not a valid integer; true, but a bit unexpected.)

Here is some example data:

station_id,taken,amount,category_id,flag
1,'1984-07-1',0,4,
1,'1984-07-2',0,4,
1,'1984-07-3',0,4,
1,'1984-07-4',0,4,T

Here is the table structure (with one rule included):

CREATE TABLE climate.measurement
(
  id bigserial NOT NULL,
  station_id integer NOT NULL,
  taken date NOT NULL,
  amount numeric(8,2) NOT NULL,
  category_id smallint NOT NULL,
  flag character varying(1) NOT NULL DEFAULT ' '::character varying
)
WITH (
  OIDS=FALSE
);
ALTER TABLE climate.measurement OWNER TO postgres;

-- Rule: "i_measurement_01_001 ON climate.measurement"

-- DROP RULE i_measurement_01_001 ON climate.measurement;

CREATE OR REPLACE RULE i_measurement_01_001 AS
    ON INSERT TO climate.measurement
   WHERE date_part('month'::text, new.taken)::integer = 1 AND new.category_id = 1 DO INSTEAD  INSERT INTO climate.measurement_01_001 (id, station_id, taken, amount, category_id, flag)
  VALUES (new.id, new.station_id, new.taken, new.amount, new.category_id, new.flag);

I can generate the data into any format.

Am looking for something that won't take four days.

I originally had the data in MySQL (still do), but am hoping to get a performance increase by switching to PostgreSQL and am eager to use its PL/R extensions for stats.

I was also thinking about using:

Any help, tips, or guidance would be greatly appreciated.

Thank you!

Dave

Re: Bulk Insert

От
Jasen Betts
Дата:
On 2010-05-16, David Jarvis <thangalin@gmail.com> wrote:
> --000e0cd516808361a40486acd38a
> Content-Type: text/plain; charset=ISO-8859-1
>
> Hi,
>
> What is the fastest way to insert 237 million records into a table that has
> rules (for distributing the data across 84 child tables)?

segregate the data first and then copy it into the child tables.

else copy it into a temp table and insert it from there using

 insert into CHILD_NAME select * from TEMPNAME where SEGREGATION CONDITION;

oryou may have success with the simple copy if you increase the number of file
handles and buffers sufficiently.


Re: Bulk Insert

От
Mladen Gogala
Дата:
On 05/16/2010 06:44 AM, Jasen Betts wrote:
> On 2010-05-16, David Jarvis<thangalin@gmail.com>  wrote:
>
>> --000e0cd516808361a40486acd38a
>> Content-Type: text/plain; charset=ISO-8859-1
>>
>> Hi,
>>
>> What is the fastest way to insert 237 million records into a table that has
>> rules (for distributing the data across 84 child tables)?
>>
> segregate the data first and then copy it into the child tables.
>
> else copy it into a temp table and insert it from there using
>
>   insert into CHILD_NAME select * from TEMPNAME where SEGREGATION CONDITION;
>
> oryou may have success with the simple copy if you increase the number of file
> handles and buffers sufficiently.
>
>
>

If you segregate data, as Jasen suggests, you can even populate the
child tables in parallel, which should provide some performance improvement.

--
Mladen Gogala
Sr. Oracle DBA
1500 Broadway
New York, NY 10036
(212) 329-5251
www.vmsinfo.com
The Leader in integrated Media Intelligence Solutions