Discussion: Newbie question about importing text files...
So I've been looking at the documentation for COPY, and I'm curious about a number of features which do not appear to be included, and whether these functions are found someplace else:

1) How do I skip an arbitrary # of "header" lines (e.g. > 1 header line) to begin reading in data?
2) Is it possible to screen out lines which begin with a comment character (common outputs for csv/txt files from various programs)?
3) Is there a way to read in fixed width files?

Thanks!

--j

--
Jonathan A. Greenberg, PhD
NRC Research Associate
NASA Ames Research Center
MS 242-4
Moffett Field, CA 94035-1000
Office: 650-604-5896
Cell: 415-794-5043
AIM: jgrn307
MSN: jgrn307@hotmail.com
On 10/10/06, Jonathan Greenberg <jgreenberg@arc.nasa.gov> wrote:
> So I've been looking at the documentation for COPY, and I'm curious about a
> number of features which do not appear to be included, and whether these
> functions are found someplace else:
>
> 1) How do I skip an arbitrary # of "header" lines (e.g. > 1 header line) to
> begin reading in data?

If in 'csv' mode, you can set the header flag. If not, standard unix tools fit the bill:

cat import.txt | tail -n +2 > output.txt   <-- from bash
copy foo from '/home/output.txt';

On windows? Get cygwin!

Or, if you prefer a more sql-ish solution, load your text data into scratch tables (all text fields) as-is and do the filtering there. This works pretty well actually:

copy foo from '/home/import.txt';
create table bar as select * from foo offset 3;

There's a million ways to do this; most involve processing before or after the COPY statement, unless you happen to be importing csv (often, but not always, works as-is) or data generated from postgresql.

> 2) Is it possible to screen out lines which begin with a comment character
> (common outputs for csv/txt files from various programs)?

See notes above:
1. import all data to a scratch table
2. use sql

Alternative: master perl (I prefer the sql approach usually, perl scares me!)

> 3) Is there a way to read in fixed width files?

Here again you could load the data into postgresql, one big text field per row even, and process as such:

create table import(bigfield text);
copy import from 'foo.txt';
create table foo as
  select substr(bigfield, 1, 3)::int as a,
         substr(bigfield, 4, 2)::char(2) as b,
         [...];

voila!

merlin
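The same preprocessing (skip N header lines, drop comment lines, then hand the result to COPY) can be done in a few lines of Python as well — a minimal sketch; the helper name and the in-memory sample data are made up for illustration:

```python
# Skip an arbitrary number of header lines and drop comment lines,
# yielding cleaned text you could write out and feed to COPY.
import io

def clean_lines(src, skip_headers=0, comment_char="#"):
    """Yield data lines, skipping headers and lines starting with comment_char."""
    for i, line in enumerate(src):
        if i < skip_headers:
            continue
        if line.lstrip().startswith(comment_char):
            continue
        yield line

# Demo on an in-memory "file" with two header lines and one comment line.
raw = io.StringIO("header one\nheader two\n# a comment\n1,foo\n2,bar\n")
print("".join(clean_lines(raw, skip_headers=2)))
```

This keeps the whole transformation outside the database, which is handy when the scratch-table approach above feels like overkill.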
On 10/10/06 01:44, Merlin Moncure wrote:
> On 10/10/06, Jonathan Greenberg <jgreenberg@arc.nasa.gov> wrote:
>> So I've been looking at the documentation for COPY, and I'm curious
>> about a number of features which do not appear to be included, and
>> whether these functions are found someplace else:
>>
>> 1) How do I skip an arbitrary # of "header" lines (e.g. > 1 header
>> line) to begin reading in data?
>
> if in 'csv' mode, you can set the header flag. if not, standard unix
> tools fit the bill:
>
> cat import.txt | tail -n +2 > output.txt   <-- from bash

That's not a very efficient way to handle large data sets.

--
Ron Johnson, Jr.
Jefferson LA USA

Is "common sense" really valid? For example, it is "common sense" to white-power racists that whites are superior to blacks, and that those with brown skins are mud people. However, that "common sense" is obviously wrong.
On 10/09/06 22:43, Jonathan Greenberg wrote:
> So I've been looking at the documentation for COPY, and I'm curious about a
> number of features which do not appear to be included, and whether these
> functions are found someplace else:
>
> 1) How do I skip an arbitrary # of "header" lines (e.g. > 1 header line) to
> begin reading in data?
> 2) Is it possible to screen out lines which begin with a comment character
> (common outputs for csv/txt files from various programs)?
> 3) Is there a way to read in fixed width files?

Both Python & Perl have CSV parsing modules, and can of course deal with fixed-width data, let you skip comments, commit every N rows, skip over committed records in case the load crashes, etc, etc, etc.

Probably not what you wanted to read, though, since performance takes a big hit. But it definitely works...

--
Ron Johnson, Jr.
Jefferson LA USA
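A rough sketch of what Ron describes, using Python's csv module. The comment filter, the batch size, and the stubbed-out commit callback are illustrative assumptions — in real use the callback would run an INSERT batch and COMMIT against PostgreSQL:

```python
# Parse CSV, skip comment lines, and "commit" every batch_size rows.
# The commit argument is a stand-in for real database work.
import csv
import io

def load_in_batches(f, commit, batch_size=2):
    """Read CSV rows from f, passing them to commit() in batches."""
    rows, committed = [], 0
    reader = csv.reader(line for line in f if not line.startswith("#"))
    for row in reader:
        rows.append(row)
        if len(rows) >= batch_size:
            commit(rows)
            committed += len(rows)
            rows = []
    if rows:                      # flush the final partial batch
        commit(rows)
        committed += len(rows)
    return committed

batches = []
data = io.StringIO("# comment line\na,1\nb,2\nc,3\n")
n = load_in_batches(data, batches.append)
print(n, batches)  # -> 3 [[['a', '1'], ['b', '2']], [['c', '3']]]
```

Committing per batch is what makes it possible to resume after a crash: you only re-do the last uncommitted batch, not the whole file.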
On Tue, 2006-10-10 at 04:16, Ron Johnson wrote:
> On 10/09/06 22:43, Jonathan Greenberg wrote:
>> So I've been looking at the documentation for COPY, and I'm curious about a
>> number of features which do not appear to be included, and whether these
>> functions are found someplace else:
>>
>> 1) How do I skip an arbitrary # of "header" lines (e.g. > 1 header line) to
>> begin reading in data?

Using something like bash, you can do this:

tail -n $(( $(wc -l < bookability-pg.sql) - 2 )) bookability-pg.sql

Make it a function, call it skipper, and have it take arguments. Put this in .bashrc and re-source the .bashrc file ( . ~/.bashrc ):

skipper() {
  tail -n $(( $(wc -l < "$1") - $2 )) "$1"
}

>> 2) Is it possible to screen out lines which begin with a comment character
>> (common outputs for csv/txt files from various programs)?

grep -vP "^#" filename

will remove all lines that start with #. grep is your friend in unix. If you don't have unix, get cygwin as recommended elsewhere.

>> 3) Is there a way to read in fixed width files?

If you don't mind playing about with sed, you could use it and bash scripting to do it. I have before. It's ugly looking but easy enough to do. But I'd recommend a beginner use a scripting language they like; one of the ones that starts with p is usually a good choice (perl, python, php, ruby (wait, that's not a p!), etc...)

> Both Python & Perl have CSV parsing modules, and can of course deal
> with fixed-width data, let you skip comments, commit every N rows,
> skip over committed records in case the load crashes, etc, etc, etc.

php has fgetcsv() built in as well. It breaks down csv into an array and is really easy to work with.
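For question 3, the substr() trick Merlin shows on the SQL side has a direct equivalent in Python string slicing, done before the data ever reaches the database — a small sketch, with made-up column widths:

```python
# Split one fixed-width line into fields by column width,
# mirroring the substr(bigfield, start, len) approach in SQL.
def parse_fixed(line, widths):
    """Return the fields of a fixed-width record, stripped of padding."""
    fields, pos = [], 0
    for w in widths:
        fields.append(line[pos:pos + w].strip())
        pos += w
    return fields

print(parse_fixed("001ab", [3, 2]))  # -> ['001', 'ab']
```

From there you would cast each field and emit tab- or comma-separated output for COPY to consume.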
I made a HUGE mistake, and used 'UK' as the abbreviation for the United Kingdom (the ISO abbreviation is 'GB').

I've got a database where 8 tables have an FKEY on a table 'location_country', using the text 'uk' as the value -- so I've got 9 tables that I need to swap data out on.

Can anyone suggest a non-nightmarish way for me to do this?
> I made a HUGE mistake, and used 'UK' as the abbreviation for the
> United Kingdom (the ISO abbreviation is 'GB').
>
> I've got a database where 8 tables have an FKEY on a table
> 'location_country', using the text 'uk' as the value -- so I've got
> 9 tables that I need to swap data out on.
>
> Can anyone suggest a non-nightmarish way for me to do this?

If your tables are set up with "ON UPDATE CASCADE" then you are fine. Just update the main table and PostgreSQL will take care of the rest.

http://www.postgresql.org/docs/8.1/interactive/sql-createtable.html

Regards,
Richard Broersma Jr.
On 10/11/06 16:10, Richard Broersma Jr wrote:
>> I made a HUGE mistake, and used 'UK' as the abbreviation for the
>> United Kingdom (the ISO abbreviation is 'GB').
>>
>> I've got a database where 8 tables have an FKEY on a table
>> 'location_country', using the text 'uk' as the value -- so I've got
>> 9 tables that I need to swap data out on.
>>
>> Can anyone suggest a non-nightmarish way for me to do this?
>
> If your tables are set up with "ON UPDATE CASCADE" then you are fine.
>
> Just update the main table and PostgreSQL will take care of the rest.

It doesn't appear that ALTER TABLE can change constraint characteristics. You'd have to drop/recreate, no?

--
Ron Johnson, Jr.
Jefferson LA USA
>>> Can anyone suggest a non-nightmarish way for me to do this?
>> If your tables are set up with "ON UPDATE CASCADE" then you are fine.
>> Just update the main table and PostgreSQL will take care of the rest.
> It doesn't appear that ALTER TABLE can change constraint characteristics.
> You'd have to drop/recreate, no?

Now that you mention it, I've never tried it or seen it done. Here's what I came up with:

CREATE TABLE foo (
  id text constraint foo_pri_key Primary Key
);
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "foo_pri_key" for table "foo"
CREATE TABLE

INSERT INTO foo (id) values('hello');
INSERT 0 1

CREATE TABLE bar (
  id serial constraint bar_pri_key Primary Key,
  fooid text constraint bar_foo_for_key References foo (id)
);
NOTICE:  CREATE TABLE will create implicit sequence "bar_id_seq" for serial column "bar.id"
NOTICE:  CREATE TABLE / PRIMARY KEY will create implicit index "bar_pri_key" for table "bar"
CREATE TABLE

INSERT INTO bar (fooid) VALUES ('hello');
INSERT 0 1

update foo SET id = 'goodbye';
ERROR:  update or delete on "foo" violates foreign key constraint "bar_foo_for_key" on "bar"
DETAIL:  Key (id)=(hello) is still referenced from table "bar".

alter TABLE bar DROP CONSTRAINT bar_foo_for_key;
ALTER TABLE

ALTER TABLE bar ADD constraint new_bar_foo_for_key
  foreign key (fooid) references foo (id) on update cascade;
ALTER TABLE

update foo set id = 'goodbye';
UPDATE 1

select * from bar;
 id |  fooid
----+---------
  1 | goodbye
(1 row)

It is nice to see things work so well. :-)

Regards,
Richard Broersma Jr.
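For anyone who wants to replay the gist of Richard's session without a PostgreSQL server handy, roughly the same ON UPDATE CASCADE behavior can be demonstrated with Python's bundled sqlite3 module. This is a sketch, not the PostgreSQL session itself: sqlite stands in for PostgreSQL, and it only enforces foreign keys after PRAGMA foreign_keys = ON:

```python
# Demonstrate ON UPDATE CASCADE: updating the parent key rewrites
# the child rows automatically.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")   # sqlite-specific: FKs are off by default
db.execute("CREATE TABLE foo (id TEXT PRIMARY KEY)")
db.execute("""CREATE TABLE bar (
    id INTEGER PRIMARY KEY,
    fooid TEXT REFERENCES foo (id) ON UPDATE CASCADE)""")
db.execute("INSERT INTO foo (id) VALUES ('hello')")
db.execute("INSERT INTO bar (fooid) VALUES ('hello')")

db.execute("UPDATE foo SET id = 'goodbye'")          # cascades to bar
print(db.execute("SELECT fooid FROM bar").fetchone())  # -> ('goodbye',)
```

The syntax and the cascade semantics match what the PostgreSQL transcript shows; only the implicit-index NOTICE chatter is missing.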
On 10/11/06 18:53, Richard Broersma Jr wrote:
>>>> Can anyone suggest a non-nightmarish way for me to do this?
>>> If your tables are set up with "ON UPDATE CASCADE" then you are
>>> fine. Just update the main table and PostgreSQL will take
>>> care of the rest.
>> It doesn't appear that ALTER TABLE can change constraint
>> characteristics. You'd have to drop/recreate, no?
>
> Now that you mention it, I've never tried it or seen it done.
> Here's what I came up with:
> [snip]
>
> It is nice to see things work so well. :-)

It would be interesting to see how well it works on a 50M row table.

--
Ron Johnson, Jr.
Jefferson LA USA
>>> It doesn't appear that ALTER TABLE can change constraint
>>> characteristics. You'd have to drop/recreate, no?
>>
>> Now that you mention it, I've never tried it or seen it done.
>> Here's what I came up with:
>> [snip]
>>
>> It is nice to see things work so well. :-)
>
> It would be interesting to see how well it works on a 50M row table.

Good point, but hopefully only a small fraction of the 50M rows would be affected, and hopefully updating primary keys isn't a common occurrence.

Regards,
Richard Broersma Jr.
On 10/11/06 19:15, Richard Broersma Jr wrote:
>>>> It doesn't appear that ALTER TABLE can change constraint
>>>> characteristics. You'd have to drop/recreate, no?
>>> Now that you mention it, I've never tried it or seen it done.
>>> Here's what I came up with:
>> [snip]
>>>
>>> It is nice to see things work so well. :-)
>> It would be interesting to see how well it works on a 50M row
>> table.
>
> Good point, but hopefully only a small fraction of the 50M rows
> would be affected, and hopefully updating primary keys isn't a
> common occurrence.

I was thinking of the ALTER TABLE ... ADD CONSTRAINT step.

--
Ron Johnson, Jr.
Jefferson LA USA
Jonathan Vanasco wrote:
> I made a HUGE mistake, and used 'UK' as the abbreviation for the
> United Kingdom (the ISO abbreviation is 'GB').
>
> I've got a database where 8 tables have an FKEY on a table
> 'location_country', using the text 'uk' as the value -- so I've got
> 9 tables that I need to swap data out on.
>
> Can anyone suggest a non-nightmarish way for me to do this?

Umm, I think this should work, & isn't all that bad:

1) insert a 'gb' record in location_country
2) update <each of 8 tables> set country='gb' where country='uk'
3) delete the 'uk' record from location_country

Brent Wood
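Brent's three steps can be sanity-checked with Python's stdlib sqlite3 module as a stand-in for PostgreSQL. This is only a sketch: a single child table stands in for the eight real ones, and the table and column names are illustrative:

```python
# Three-step key swap without ON UPDATE CASCADE:
# insert the new parent row, repoint the children, drop the old parent row.
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")  # sqlite-specific: FKs are off by default
db.execute("CREATE TABLE location_country (code TEXT PRIMARY KEY)")
db.execute("""CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    country TEXT REFERENCES location_country (code))""")
db.execute("INSERT INTO location_country VALUES ('uk')")
db.execute("INSERT INTO orders (country) VALUES ('uk')")

# 1) insert a 'gb' record in location_country
db.execute("INSERT INTO location_country VALUES ('gb')")
# 2) update each referencing table
db.execute("UPDATE orders SET country = 'gb' WHERE country = 'uk'")
# 3) delete the 'uk' record, now unreferenced
db.execute("DELETE FROM location_country WHERE code = 'uk'")

print(db.execute("SELECT country FROM orders").fetchone())  # -> ('gb',)
```

The ordering matters: the 'gb' row must exist before the child updates, and the 'uk' row can only be deleted once nothing references it. Wrapping all three steps in one transaction would keep the data consistent throughout.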