Re: raw data into table process

Поиск
Список
Период
Сортировка
От Steve Midgley
Тема Re: raw data into table process
Дата
Msg-id 20070822113752.C3F329F9584@postgresql.org
обсуждение исходный текст
Ответ на raw data into table process  (novice <user.postgresql@gmail.com>)
Список pgsql-sql
Hi,

I'm not sure if you have access to a scripting language (like perl or 
ruby) but my experience is that if you transform the source text file 
into a secondary text file that postgres "copy" can read 
natively/directly into the data formats you want, the copy command will 
move everything into your primary table and way faster than using sql 
to transform.

I've had great experience with using Ruby/regex to do text file 
transforms such as this - it's amazingly fast (considering many say 
it's a performance dog).

The best way to get started is copy some sample data OUT of PG to a 
file and then use your scripting language to build a transformation 
that formats exactly like your sample. (I've imported polygon, circle 
and point types using this method, as well as dates, like what you're 
trying to do).

I hope that helps - drop me a line off-list if you'd like some sample 
Ruby code to read/write/transform your source.

Steve


At 06:33 AM 8/22/2007, pgsql-sql-owner@postgresql.org wrote:
>Date: Wed, 22 Aug 2007 14:36:15 +1000
>From: novice <user.postgresql@gmail.com>
>To: pgsql-general@postgresql.org, pgsql-sql@postgresql.org
>Subject: raw data into table process
>Message-ID: 
><ddcb1c340708212136x1c3a5168ya884b252fa434ed7@mail.gmail.com>
>
>I am trying to record the following entries into a table.  I'm curious
>to know if there's an efficient/effective way of doing this?  This
>entries come from an ancient datalogger (note: separated by space and
>uses YY/MM/DD format to record date)
>
>Plain file sample.dat
>
>3665   OK           BS 07/08/16 07:28
>3665   CC           BS 07/08/16 07:29
>3665   CS           BS 07/08/16 07:29
>3665   CS           BS 07/08/16 07:29
>4532   OK           BS 07/08/16 07:34
>4004   OK           BS 07/08/16 07:51
>3991   OK           BS 07/08/16 07:54
>
>
>This is the table that I'm adding the entries to
>
>CREATE TABLE maintenance
>(
>   maintenance_id SERIAL PRIMARY KEY,
>   meter_id integer,
>   status text,
>   inspector text,
>   inspection_date timestamp with time zone,
>)
>
>--  Begin SQL Script
>--  First table to dump the records in
>CREATE TABLE dataload1
>(data text)
>
>-- Dump records using \copy
>\copy dataload1 FROM sample.dat
>
>-- Second table to import unique records ONLY
>CREATE TABLE dataload2 AS
>   SELECT DISTINCT
>       data FROM dataload1;
>
>-- Now I update unique records into the maintenance table
>-- maintenance_id is SERIAL so it will be populated automatically
>INSERT INTO maintenance(meter_id, status, inspector, inspection_date)
>SELECT substr("data", 1, 4)::int
>   , substr("data", 8, 3)
>   , substr("data", 21, 2)
>   , (20||substr("data", 24, 2) ||'-'|| substr("data", 27, 2) ||'-'||
>substr("data", 30, 2)||' '||substr("data", 33, 5))::timestamp as
>inspection_date
>   FROM dataload2
>-- So the new records will also be in timestamp order
>   ORDER BY inspection_date ;
>
>-- Some housekeeping
>VACUUM FULL VERBOSE ANALYZE maintenance;
>
>-- Finally, drop the temporary tables
>DROP TABLE dataload1
>DROP TABLE dataload2
>
>--  End SQL script
>
>Any thoughts and suggestions welcome.



В списке pgsql-sql по дате отправления:

Предыдущее
От: Dani Castaños
Дата:
Сообщение: Database creation script
Следующее
От: roopa perumalraja
Дата:
Сообщение: Re: Solution to retrieve first and last row for each minute