Thread: raw data into table process


raw data into table process

From
novice
Date:
I am trying to record the following entries into a table.  I'm curious
to know if there's a more efficient/effective way of doing this.  These
entries come from an ancient datalogger (note: fields are separated by
spaces, and dates are recorded in YY/MM/DD format).

Plain file sample.dat

3665   OK           BS 07/08/16 07:28
3665   CC           BS 07/08/16 07:29
3665   CS           BS 07/08/16 07:29
3665   CS           BS 07/08/16 07:29
4532   OK           BS 07/08/16 07:34
4004   OK           BS 07/08/16 07:51
3991   OK           BS 07/08/16 07:54


This is the table that I'm adding the entries to

CREATE TABLE maintenance
(
  maintenance_id SERIAL PRIMARY KEY,
  meter_id integer,
  status text,
  inspector text,
  inspection_date timestamp with time zone
);

--  Begin SQL Script
--  First table to dump the records in
CREATE TABLE dataload1
(data text);

-- Dump records using \copy
\copy dataload1 FROM sample.dat

-- Second table to import unique records ONLY
CREATE TABLE dataload2 AS
  SELECT DISTINCT
      data FROM dataload1;

-- Now I insert the unique records into the maintenance table
-- maintenance_id is SERIAL so it will be populated automatically
INSERT INTO maintenance(meter_id, status, inspector, inspection_date)
SELECT substr("data", 1, 4)::int
  , trim(substr("data", 8, 3))  -- trim so 2-char codes don't keep pad spaces
  , substr("data", 21, 2)
  , ('20'||substr("data", 24, 2) ||'-'|| substr("data", 27, 2) ||'-'||
substr("data", 30, 2)||' '||substr("data", 33, 5))::timestamp AS
inspection_date
  FROM dataload2
-- So the new records will also be in timestamp order
  ORDER BY inspection_date;

-- Some housekeeping
VACUUM FULL VERBOSE ANALYZE maintenance;

-- Finally, drop the temporary tables
DROP TABLE dataload1;
DROP TABLE dataload2;

--  End SQL script

Any thoughts and suggestions welcome.

Re: raw data into table process

From
Steve Midgley
Date:
Hi,

I'm not sure if you have access to a scripting language (like Perl or 
Ruby), but my experience is that if you transform the source text file 
into a secondary text file that Postgres' "copy" can read 
natively/directly into the data formats you want, the copy command will 
move everything into your primary table far faster than using SQL to 
do the transform.

I've had great experience with using Ruby/regex to do text file 
transforms such as this - it's amazingly fast (considering many say 
it's a performance dog).

The best way to get started is to copy some sample data OUT of PG to a 
file and then use your scripting language to build a transformation 
that formats it exactly like your sample. (I've imported polygon, circle 
and point types using this method, as well as dates, like what you're 
trying to do).
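For the datalogger format in the original post, that kind of transform might
look something like this (a Ruby sketch only; the field splitting and the
YY/MM/DD handling are assumptions based on the sample.dat shown above, not
tested against the real logger):

```ruby
# transform.rb -- rewrite the fixed-width datalogger dump into tab-separated
# rows that psql's \copy can load straight into the maintenance columns.

# One input line -> one tab-separated row with an ISO-style timestamp.
def transform(line)
  meter, status, inspector, date, time = line.split  # fields are space-separated
  yy, mm, dd = date.split('/')                       # datalogger writes YY/MM/DD
  [meter, status, inspector, "20#{yy}-#{mm}-#{dd} #{time}"].join("\t")
end

# Drop exact duplicate lines (the DISTINCT step) and emit the clean rows.
def transform_file(lines)
  lines.map(&:chomp).reject(&:empty?).uniq.map { |l| transform(l) }
end
```

You'd run something like `puts transform_file(File.readlines('sample.dat'))`,
redirect that to a file, and then in psql:
`\copy maintenance(meter_id, status, inspector, inspection_date) FROM 'sample.tsv'`.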

I hope that helps - drop me a line off-list if you'd like some sample 
Ruby code to read/write/transform your source.

Steve


At 06:33 AM 8/22/2007, pgsql-sql-owner@postgresql.org wrote:
>Date: Wed, 22 Aug 2007 14:36:15 +1000
>From: novice <user.postgresql@gmail.com>
>To: pgsql-general@postgresql.org, pgsql-sql@postgresql.org
>Subject: raw data into table process
>Message-ID: 
><ddcb1c340708212136x1c3a5168ya884b252fa434ed7@mail.gmail.com>